

WePOINTS
Overview
WePOINTS is a series of multimodal models developed by the WeChat AI team, aimed at building a unified framework that accommodates various modalities. The models draw on recent advances in multimodal modeling to integrate content understanding and generation seamlessly. Beyond the models themselves, the WePOINTS project provides pre-training datasets, evaluation tools, and usage tutorials, making it a significant contribution to the field of multimodal artificial intelligence.
Target Users
The target audience of WePOINTS primarily includes researchers, developers, and enterprise users in the field of artificial intelligence. Researchers can utilize the multimodal models provided by WePOINTS for academic research and technological innovation. Developers can quickly build and deploy multimodal applications based on WePOINTS. Enterprise users can leverage WePOINTS to enhance the intelligence of their products, such as intelligent customer service and content moderation.
Use Cases
- Use WePOINTS models for joint analysis of images and text to enhance the accuracy of content moderation systems.
- Develop intelligent customer service systems using WePOINTS models for image and text recognition and response functionalities.
- Combine WePOINTS models with the built-in CATTY image-splitting strategy for recognition and classification tasks on high-resolution images, improving processing efficiency and accuracy.
Features
- Provides a unified framework for multimodal models: WePOINTS integrates various modalities, such as vision and language, offering a standardized processing approach.
- Supports bilingual models: POINTS 1.5 models handle both Chinese and English, broadening their international applicability.
- SGLang integration: POINTS 1.5 will be integrated into SGLang, expanding the deployment scenarios for the models.
- Provides pre-training data: WePOINTS plans to release the pre-training dataset for POINTS 1.5, facilitating use by researchers and developers.
- Supports Hugging Face and ModelScope: WePOINTS models are available on Hugging Face and ModelScope, enabling quick access and utilization by developers.
- Provides evaluation tools: WePOINTS uses VLMEvalKit to assess model performance, ensuring a standardized evaluation process.
- Supports model fusion technology: WePOINTS introduces model fusion techniques, enhancing the performance of the final model through the integration of models fine-tuned on different instruction datasets.
- Offers CATTY image splitting: WePOINTS' CATTY technique splits high-resolution images into equally sized tiles while preserving the original image's aspect ratio (a simplified tiling sketch follows this list).
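The official CATTY implementation ships with the WePOINTS code; the sketch below is only a simplified illustration of the idea described above: pick a tile grid whose aspect ratio is closest to the input image's, resize the image to fill that grid exactly, then crop equally sized tiles. The 448-pixel tile size and 8-tile budget are illustrative defaults, not values taken from the project.

```python
# Simplified illustration of CATTY-style tiling (not the official implementation).
from PIL import Image


def split_into_tiles(image: Image.Image, tile_size: int = 448, max_tiles: int = 8):
    """Split an image into equal tiles on a grid close to its aspect ratio."""
    w, h = image.size
    aspect = w / h

    # Enumerate candidate grids (cols x rows) within the tile budget and pick
    # the one whose aspect ratio is closest to the original image's.
    candidates = [
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if cols * rows <= max_tiles
    ]
    cols, rows = min(candidates, key=lambda g: abs(g[0] / g[1] - aspect))

    # Resize to fill the chosen grid exactly, then crop equally sized tiles.
    resized = image.resize((cols * tile_size, rows * tile_size))
    return [
        resized.crop((c * tile_size, r * tile_size,
                      (c + 1) * tile_size, (r + 1) * tile_size))
        for r in range(rows)
        for c in range(cols)
    ]


# Example: tile a local image before feeding it to the vision encoder.
# tiles = split_into_tiles(Image.open("example.jpg"))
```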
How to Use
1. Clone the WePOINTS project to your local machine: Use git to clone the project repository.
2. Install dependencies: Navigate to the project directory and use pip to install the required dependencies.
3. Download models: Download the appropriate WePOINTS models as needed from Hugging Face or ModelScope.
4. Configure the environment: Set up the runtime environment according to model requirements, such as the CUDA version.
5. Run the model: Follow the WePOINTS tutorial to run the model on multimodal tasks (a minimal loading sketch follows these steps).
6. Evaluate the model: Use the VLMEvalKit tool to assess model performance.
7. Model fusion: If you have fine-tuned multiple models on different instruction datasets, merge them with WePOINTS' model fusion techniques (a generic weight-averaging sketch also follows these steps).
8. Image splitting: Apply CATTY to split high-resolution images into tiles before feeding them to the model.
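The exact inference API is defined in WePOINTS' own tutorials and model cards; the sketch below only assumes the released checkpoints follow the standard Hugging Face trust_remote_code pattern. The repository ID and the commented-out chat call are illustrative placeholders to verify against the official model card.

```python
# Minimal loading sketch, assuming a standard Hugging Face trust_remote_code checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "WePOINTS/POINTS-1-5-Qwen-2-5-7B-Chat"  # assumed repo ID; verify on Hugging Face/ModelScope

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # adjust to your GPU / CUDA setup
    device_map="auto",
    trust_remote_code=True,
)

# The remote code ships a chat-style helper for multimodal prompts; its exact
# signature (image handling, generation config) is defined in the model card.
# messages = [{"role": "user", "content": [{"type": "image", "image": "example.jpg"},
#                                          {"type": "text", "text": "Describe this image."}]}]
# print(model.chat(messages, tokenizer, ...))  # placeholder call -- see the official tutorial
```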
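WePOINTS' own fusion recipe is described in its papers and code; the sketch below only illustrates the general idea of merging checkpoints fine-tuned on different instruction datasets by uniformly averaging their weights. The checkpoint paths are hypothetical.

```python
# Generic weight-averaging sketch ("model soup" style), not the official fusion recipe.
import torch


def average_state_dicts(paths):
    """Uniformly average parameters of several checkpoints with identical shapes."""
    merged = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if merged is None:
            merged = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in merged:
                merged[k] += state[k].float()
    return {k: v / len(paths) for k, v in merged.items()}


# Example (hypothetical checkpoint paths):
# fused = average_state_dicts(["ckpt_instruct_a.pt", "ckpt_instruct_b.pt"])
# torch.save(fused, "fused_model.pt")
```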