Phi-3-mini-4k-instruct-onnx
Overview
Phi-3 Mini is a lightweight, state-of-the-art open model built on the same kind of data used for Phi-2 (synthetic data and filtered website data), with a focus on high-quality, reasoning-dense content. The model went through a post-training process that combines supervised fine-tuning and direct preference optimization to ensure precise instruction following and robust safety measures. This repository provides an optimized ONNX version of Phi-3 Mini that can be accelerated on CPUs and GPUs with ONNX Runtime across servers, Windows, Linux, and Mac, with the best precision configuration for each target. ONNX Runtime's DirectML support additionally lets developers bring hardware acceleration at scale to Windows devices with AMD, Intel, and NVIDIA GPUs.
Target Users
["- Business: Integration of Phi-3 Mini into various business applications to provide natural language processing capabilities","- Developers: Leverage the powerful generation capabilities of Phi-3 Mini to develop various language-related applications and services, such as conversational systems, Q&A systems, text generation, and data analysis","- Individual Users: Utilize Phi-3 Mini to produce high-quality natural language content to assist with writing, inquiries, and other needs"]
Total Visits: 29.7M
Top Region: US (17.94%)
Website Views: 59.6K
Use Cases
1. Integrate Phi-3 Mini into a business's intelligent assistant system to provide customers with natural-language interaction and generation services
2. Build automatic text generation and creative-assistance tools on top of Phi-3 Mini to support writers, content creators, and others
3. Use Phi-3 Mini's reasoning capabilities to build data analysis and report generation systems that produce analysis reports automatically
Features
- Supports accelerated inference on multiple hardware targets:
  - DirectML: for Windows devices with AMD, Intel, and NVIDIA GPUs, in int4 precision via AWQ quantization
  - FP16 CUDA: for NVIDIA GPUs, in FP16 precision
  - Int4 CUDA: for NVIDIA GPUs, in int4 precision via RTN quantization
  - Int4 CPU and mobile: int4 precision via RTN quantization, provided in two versions that trade off latency and accuracy for CPU and mobile devices
- Offers the new Generate() API for ONNX Runtime, which greatly simplifies integrating generative AI models into applications (see the sketch after this list)
- Strong performance: up to 10 times faster than PyTorch and up to 3 times faster than Llama.cpp
- Supports inference with large batch sizes, long prompts, and long outputs
- Small model size after quantization, which makes deployment easier
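As a rough illustration of how little code the Generate() API requires, here is a minimal token-streaming sketch assuming the Python onnxruntime-genai package and a locally downloaded model folder. The folder path and prompt are placeholders, and the exact call names follow the early examples that shipped alongside this model; the API has shifted slightly between releases, so check the current onnxruntime-genai documentation.

```python
import onnxruntime_genai as og

# Placeholder path: point this at whichever downloaded variant matches your hardware.
model = og.Model("./directml/directml-int4-awq-block-128")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

# Phi-3 chat template around a single user turn.
prompt = "<|user|>\nExplain in one sentence what ONNX Runtime does. <|end|>\n<|assistant|>"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode(prompt)

# Generate and stream tokens as they are produced.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(token), end="", flush=True)
print()
```

Streaming the tokens through the tokenizer's stream decoder is what makes the long-output case practical: text appears as it is generated rather than after the full sequence completes.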
How to Use
1. Download the required ONNX model files from the model's Hugging Face page
2. Install ONNX Runtime and the ONNX Runtime Generate() API packages
3. Load the ONNX model files in your code
4. Use the ONNX Runtime Generate() API to set inference parameters such as batch size and prompt/output length
5. Call the generation function with your text prompt
6. Retrieve the output and perform any subsequent processing (these steps are combined in the sketch below)
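Steps 1 and 2 typically amount to a model download plus a pip install, for example with `huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx` and `pip install onnxruntime-genai` (or the `onnxruntime-genai-cuda` / `onnxruntime-genai-directml` builds for GPU targets). Steps 3 through 6 are sketched below for a small batch of prompts. This follows the style of the package's early batch examples; the folder path, prompts, and the `model.generate` helper are illustrative assumptions to verify against the current onnxruntime-genai documentation.

```python
import onnxruntime_genai as og

# Step 3: load the downloaded model folder (placeholder path for the int4 CPU variant).
model = og.Model("./cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4")
tokenizer = og.Tokenizer(model)

# Phi-3 chat template applied to each prompt in a small batch.
template = "<|user|>\n{msg} <|end|>\n<|assistant|>"
prompts = [
    template.format(msg="Summarize the benefits of ONNX Runtime in two sentences."),
    template.format(msg="Write a haiku about quantization."),
]

# Step 4: set inference parameters; the batch size is implied by the number of
# encoded prompts, and max_length bounds prompt plus output tokens.
params = og.GeneratorParams(model)
params.set_search_options(max_length=512, do_sample=True, temperature=0.7)
params.input_ids = tokenizer.encode_batch(prompts)

# Step 5: run generation for the whole batch.
output_tokens = model.generate(params)

# Step 6: decode each sequence and post-process as needed.
for i in range(len(prompts)):
    print(f"--- response {i} ---")
    print(tokenizer.decode(output_tokens[i]))
```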