Phi-3-vision-128k-instruct
Overview:
Phi-3 Vision is a lightweight, state-of-the-art open multimodal model built on a dataset encompassing synthetic data and curated publicly available websites. It focuses on exceptionally high-quality reasoning-intensive data for both text and vision. Belonging to the Phi-3 family of models, the multimodal version supports a 128K context length (in tokens) and has undergone rigorous enhancement processes, combining supervised fine-tuning and direct preference optimization to ensure precise instruction following and robust safety measures.
Target Users:
This model is geared towards a wide range of commercial and research use cases, particularly general-purpose AI systems and applications requiring both visual and text input. It is suited for memory- and compute-constrained environments, latency-sensitive scenarios, general image understanding, OCR, and chart and table understanding.
Use Cases
Used in the education sector to aid students in comprehending complex concepts.
In business environments, utilized for analyzing and processing image and text data.
In research, serves as a powerful foundational model for generative AI capabilities.
Features
4.2B parameters, comprising an image encoder, connector, projector, and the Phi-3 Mini language model.
Supports both text and image input, best utilized with chat-formatted prompts.
Context length of 128K tokens.
Trained using 512 H100-80G GPUs with a training duration of 1.5 days.
Training data consists of 500 billion visual and textual tokens.
Output is text generated in response to the input.
Model training dates from February to April 2024.
This is a static model trained on an offline dataset with a data cutoff of March 15, 2024.
How to Use
1. Access the Azure AI model hub and select the Phi-3-vision-128k-instruct model.
2. Download or deploy the model as needed.
3. Prepare input data, including text and images.
4. Set model parameters, such as temperature and maximum new tokens.
5. Pass the input data to the model and receive the output.
6. Analyze the model output and perform further processing based on the application scenario.
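The steps above can be sketched with the Hugging Face `transformers` library, which is the commonly documented route for running this model locally. The model id, image filename, and prompt template below are assumptions drawn from typical Phi-3 Vision usage; the download- and GPU-heavy part is gated behind an environment variable so the prompt-formatting logic can be read and run on its own.

```python
# Minimal inference sketch for Phi-3-vision-128k-instruct (assumptions:
# the Hugging Face model id below, a local "chart.png", and a CUDA GPU).
import os

MODEL_ID = "microsoft/Phi-3-vision-128k-instruct"  # assumed Hub id

def build_prompt(question: str) -> str:
    # Chat-formatted prompt with a single image placeholder, following the
    # <|user|> ... <|assistant|> template used by Phi-3 Vision.
    return f"<|user|>\n<|image_1|>\n{question}<|end|>\n<|assistant|>\n"

# Gate the heavy part: requires transformers, torch, Pillow, and a GPU.
if os.environ.get("RUN_PHI3_VISION"):
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Step 2: download/load the model and its processor.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="cuda"
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

    # Step 3: prepare text and image input.
    image = Image.open("chart.png")  # hypothetical input image
    inputs = processor(
        build_prompt("Summarize this chart."), [image], return_tensors="pt"
    ).to("cuda")

    # Step 4: set generation parameters (greedy decoding, capped new tokens).
    output_ids = model.generate(**inputs, max_new_tokens=500, do_sample=False)

    # Steps 5-6: decode only the newly generated tokens as the answer.
    answer = processor.batch_decode(
        output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0]
    print(answer)
```

When deploying via the Azure AI model hub instead, the same chat-formatted prompt and generation parameters apply; only the loading step differs.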