

Phi-3 Mini 128K Instruct ONNX
Overview:
Phi-3 Mini is a lightweight, state-of-the-art open model built on the synthetic data and filtered web data used for Phi-2, with a focus on high-quality, reasoning-dense data. It belongs to the Phi-3 family, and the Mini version comes in two variants supporting 4K and 128K context lengths. The model has undergone a rigorous enhancement process, including supervised fine-tuning and direct preference optimization, to ensure precise instruction following and robust safety measures. These ONNX-optimized Phi-3 Mini models run efficiently on CPUs, GPUs, and mobile devices. Microsoft has also introduced the ONNX Runtime Generate() API, which simplifies running Phi-3.
Target Users:
["? Machine learning researchers and developers can leverage this optimized model to enhance inference performance","? Enterprises and organizations that need to deploy large language models on various devices (servers, Windows, Linux, Mac, mobile devices)","? Professionals in dialogue systems, Q&A systems, content generation, and other tasks can utilize this model to generate high-quality outputs","? Any application requiring natural language processing can benefit from the model's powerful performance"]
Use Cases
1. A technology company can use the Phi-3 Mini model to build high-performance conversational agents to provide automated Q&A services to customers.
2. A news agency can leverage the model to automatically generate high-quality news article summaries and headlines.
3. Researchers can use the model to conduct natural language processing experiments and studies, exploring new applications of language models.
Features
- Supports the ONNX format to accelerate inference on CPUs, GPUs, and mobile devices
- Provides multiple optimization configurations, including int4 quantization for DirectML, fp16 and int4 quantization for NVIDIA GPUs, and int4 quantization for CPUs and mobile devices (see the download sketch after this list)
- Enhanced through supervised fine-tuning and direct preference optimization to ensure precise instruction following and robust safety
- A lightweight design focused on high-quality, reasoning-dense data
- Offers the new ONNX Runtime Generate() API to simplify running Phi-3
- Performance tested and optimized on a variety of hardware and platforms
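
As a rough illustration of fetching one of these optimization variants, the sketch below uses the huggingface_hub package to download only the CPU/mobile int4 files. The repository id is the published one; the subfolder name is an assumption based on the variant naming in that repository, so check the repository listing before relying on it.

```python
# Minimal sketch: download a single ONNX variant rather than the whole repo.
# Assumption: the subfolder "cpu_and_mobile/cpu-int4-rtn-block-32" matches the
# variant layout published in the Hugging Face repository; verify before use.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="microsoft/Phi-3-mini-128k-instruct-onnx",
    allow_patterns=["cpu_and_mobile/cpu-int4-rtn-block-32/*"],
)
print("Model files downloaded to:", local_dir)
```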
How to Use
1. Download the ONNX model files suited to your hardware configuration from the Hugging Face model repository.
2. Install the necessary Python packages, such as onnxruntime-genai, which provides the Generate() API.
3. Load the model and run inference through the ONNX Runtime Generate() API (see the sketch after this list).
4. Prepare your input text or instructions.
5. Call the model to make predictions or generate output.
6. Post-process the output results if necessary.
7. Integrate the generated output into your application or workflow.
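
A minimal sketch of steps 2 through 5, assuming the onnxruntime-genai package: the Model/Tokenizer/Generator flow below follows Microsoft's published examples, but exact call names have shifted between onnxruntime-genai releases, and the model path is a placeholder for whichever variant you downloaded.

```python
# Minimal sketch of text generation with the ONNX Runtime Generate() API,
# following the pattern in Microsoft's onnxruntime-genai examples.
# Assumptions: onnxruntime-genai ~0.3-era API; the model path is a placeholder.
import onnxruntime_genai as og

model = og.Model("cpu_and_mobile/cpu-int4-rtn-block-32")  # downloaded variant
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Phi-3 expects a chat template with <|user|> / <|assistant|> markers.
prompt = "<|user|>\nSummarize the benefits of int4 quantization.<|end|>\n<|assistant|>\n"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode(prompt)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    # Decode and print each token as it streams out.
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```

The same flow applies to the GPU variants; installing the CUDA or DirectML build of onnxruntime-genai selects the corresponding execution provider.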