

Phi-3 Vision 128K Instruct
Overview:
Phi-3 Vision is a lightweight, state-of-the-art open multimodal model trained on datasets that include synthetic data and filtered, publicly available websites, with an emphasis on very high-quality, reasoning-dense data in both text and vision. As a member of the Phi-3 model family, the multimodal version supports a 128K-token context length and has undergone a rigorous enhancement process combining supervised fine-tuning and direct preference optimization, to ensure precise instruction following and robust safety measures.
Target Users:
The model is intended for broad commercial and research use, particularly in general-purpose AI systems and applications that require both visual and text input. It is suited to memory- and compute-constrained environments, latency-sensitive scenarios, general image understanding, OCR, and chart and table understanding.
Use Cases
In education, it helps students understand complex concepts.
In business environments, it is used to analyze and process combined image and text data.
In research, it serves as a strong foundation model for generative AI work.
Features
4.2B parameters, comprising an image encoder, connector, projector, and the Phi-3 Mini language model.
Supports both text and image input, best utilized with chat-formatted prompts.
Context length of 128K tokens.
Trained using 512 H100-80G GPUs with a training duration of 1.5 days.
Training data consists of 500 billion visual and textual tokens.
Output is text generated in response to the input.
Model training dates from February to April 2024.
This is a static model, trained on an offline dataset with a cutoff date of March 15, 2024.
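The chat-formatted prompts mentioned above use numbered image placeholder tags; the image pixels themselves are passed to the processor separately. A minimal sketch of how such a prompt can be assembled (treat this as an illustration; the authoritative template is in the model card):

```python
def build_phi3_vision_prompt(question: str, num_images: int = 1) -> str:
    """Assemble a chat-formatted Phi-3 Vision prompt.

    Images are referenced by numbered placeholders (<|image_1|>, ...);
    the actual image data is supplied to the processor alongside the text.
    """
    image_tags = "".join(f"<|image_{i}|>\n" for i in range(1, num_images + 1))
    return f"<|user|>\n{image_tags}{question}<|end|>\n<|assistant|>\n"


# Example: a single-image OCR-style question.
prompt = build_phi3_vision_prompt("What text appears in this image?")
```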
How to Use
1. Access the Azure AI model hub and select the Phi-3-vision-128k-instruct model.
2. Download or deploy the model as needed.
3. Prepare input data, including text and images.
4. Set model parameters, such as temperature and maximum new tokens.
5. Pass the input data to the model and receive the output.
6. Analyze the model output and perform further processing based on the application scenario.
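Steps 3–6 above can be sketched with the Hugging Face transformers API. This is a hedged sketch, not a verbatim recipe: the model id is real, but the exact processor and generation calls should be checked against the model card, and `chart.png` is a hypothetical input file.

```python
MODEL_ID = "microsoft/Phi-3-vision-128k-instruct"


def generation_args(temperature: float = 0.0, max_new_tokens: int = 500) -> dict:
    """Step 4: model parameters such as temperature and maximum new tokens."""
    return {
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "do_sample": temperature > 0.0,  # greedy decoding when temperature is 0
    }


def main() -> None:
    # Imports are kept inside main(): loading the model requires a GPU,
    # network access, and the transformers / Pillow packages.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="cuda"
    )

    # Step 3: prepare input data (text plus an image; chart.png is hypothetical).
    image = Image.open("chart.png")
    text = "<|user|>\n<|image_1|>\nSummarize this chart.<|end|>\n<|assistant|>\n"
    inputs = processor(text, [image], return_tensors="pt").to("cuda")

    # Steps 5-6: run generation, then decode only the newly generated tokens.
    output_ids = model.generate(**inputs, **generation_args())
    new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
    print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])


# To run end-to-end, call main() on a machine with a CUDA GPU:
# main()
```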