# Fine-tuning

Animagine XL 4.0
Animagine XL 4.0 is an anime-themed generation model fine-tuned from Stable Diffusion XL 1.0. It was trained on 8.4 million diverse anime-style images for a total of 2,650 hours. The model focuses on generating and modifying anime-themed images from text prompts, supporting a variety of special tags that control different aspects of image generation. Its main advantages are high-quality output, rich anime-style detail, and faithful reproduction of specific characters and styles. The model was developed by Cagliostro Research Lab and is licensed under CreativeML Open RAIL++-M, which allows commercial use and modification.
Image Generation
72.0K

Flex.1 Alpha
Flex.1-alpha is a powerful text-to-image generation model built on an 8-billion-parameter rectified flow transformer architecture. It inherits features from FLUX.1-schnell and, via a trained guidance embedder, generates images without classifier-free guidance (CFG). The model supports fine-tuning, is open source (Apache 2.0), and runs in a variety of inference engines such as Diffusers and ComfyUI. Its main advantages are efficient generation of high-quality images, flexible fine-tuning capabilities, and strong community support. It was developed to address compression and optimization issues in image generation models, with performance continuing to improve through ongoing training.
Image Generation
73.4K

Llama 3.3 70B Instruct
Llama-3.3-70B-Instruct is a large language model with 70 billion parameters developed by Meta, specifically optimized for multilingual dialogue scenarios. This model utilizes an optimized transformer architecture and enhances its utility and safety through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). It supports multiple languages and is proficient in handling text generation tasks, making it a significant advancement in the field of natural language processing.
Chatbot
52.2K

Tülu 3
Tülu 3 is a family of open, advanced language models fine-tuned to adapt to a variety of tasks and user needs. Their training combines elements of proprietary post-training methods, innovative techniques, and established academic research. Tülu 3's success rests on careful data curation, rigorous experimentation, innovative methodology, and improved training infrastructure. By openly releasing its data, recipes, and findings, Tülu 3 aims to empower the community to explore new fine-tuning techniques.
Language Models
57.4K

WorkflowLLM
WorkflowLLM is a data-centric framework designed to enhance the workflow orchestration capabilities of large language models (LLMs). At its core is WorkflowBench, a large-scale supervised fine-tuning dataset containing 106,763 samples from 1,503 APIs across 83 applications and 28 categories. WorkflowLLM fine-tunes the Llama-3.1-8B model on this dataset to create WorkflowLlama, a model optimized specifically for workflow orchestration. Experiments show that WorkflowLlama excels at orchestrating complex workflows and generalizes well to unseen APIs.
Workflow Orchestration
47.2K

TableGPT2
TableGPT2 is a large multimodal model pre-trained and fine-tuned specifically for tabular data, addressing the challenge of integrating tables into practical applications. It was pre-trained and fine-tuned on more than 593.8K tables and 2.36M high-quality query-table-output tuples, a scale unprecedented for this task. A key innovation is its novel table encoder, designed to capture information at both the schema and cell level, which improves the model's handling of ambiguous queries, missing column names, and irregular tables. Across 23 benchmark metrics, TableGPT2 improves average performance by 35.20% for the 7B model and 49.32% for the 72B model while retaining strong general language and coding abilities.
AI Model
72.6K

Llama 3.2 1B
Llama-3.2-1B is a multilingual large language model released by Meta, focusing on text generation tasks. The model utilizes an optimized Transformer architecture and is fine-tuned through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for usefulness and safety. It supports eight languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, and demonstrates excellent performance across various conversational use cases.
AI Model
51.1K

Phi 3.5 Mini Instruct
Phi-3.5-mini-instruct is a lightweight, multilingual text generation model developed by Microsoft and built on high-quality, reasoning-dense data. It supports a context length of 128K tokens and has undergone a rigorous enhancement process, including supervised fine-tuning, proximal policy optimization, and direct preference optimization, to ensure precise instruction following and robust safety.
AI Model
61.0K

RAGFoundry
RAGFoundry is a library designed to enhance the ability of large language models (LLMs) to utilize external information by fine-tuning models on specially created RAG-augmented datasets. The library facilitates efficient model training using Parameter-Efficient Fine-Tuning (PEFT), allowing users to easily measure performance improvements with RAG-specific metrics. It features a modular design, enabling workflow customization through configuration files.
AI Development Assistant
51.1K

Finetune
Finetune is a developer-focused platform for fine-tuning AI intelligent agents. It allows developers to create synthetic users that reflect customer characteristics, enabling agents to test and learn in a simulated environment. The platform offers session reports and weighted execution graphs to help developers understand performance and optimize accordingly. Additionally, Finetune supports various popular AI models and frameworks, simplifying the integration and deployment process.
Development & Tools
45.3K

Mastering LLMs
Mastering LLMs is a free course featuring more than 25 industry veterans and covering topics such as evaluation, retrieval-augmented generation (RAG), and fine-tuning. The material is taught by experts in information retrieval, machine learning, recommendation systems, MLOps, and data science, with the aim of applying proven techniques from those fields to LLMs. The course is designed for technical individual contributors, including engineers and data scientists, who want guidance on improving AI products.
Education
49.1K

lmms-finetune
lmms-finetune is a unified codebase designed to simplify the fine-tuning process for large multimodal models (LMMs). It provides a structured framework that allows users to seamlessly integrate and fine-tune cutting-edge LMMs, supporting full fine-tuning as well as strategies like LoRA. The codebase is lightweight and straightforward, making it easy to understand and modify, and it supports various models including LLaVA-1.5, Phi-3-Vision, Qwen-VL-Chat, LLaVA-NeXT-Interleave, and LLaVA-NeXT-Video.
AI Development Assistant
50.2K

Meta Llama 3.1 70B
Meta Llama 3.1 is a large language model released by Meta, featuring 70 billion parameters and supporting text generation in eight languages. It employs an optimized Transformer architecture and is further refined through supervised fine-tuning and reinforcement learning from human feedback to align with human preferences for helpfulness and safety. The model excels in multilingual conversation use cases, outperforming many existing open-source and closed chatbot models.
AI Model
70.1K

AMchat
AMchat is a large language model that integrates mathematical knowledge with advanced mathematics exercises and their solutions. Built on the InternLM2-Math-7B model and fine-tuned with XTuner, it is designed specifically to answer advanced mathematics problems. The project placed in the Top 12 and won the Innovation and Creativity Award in the 2024 Puyu Large Model Series Competition (Spring Session), demonstrating its professional capability and innovation in advanced mathematics.
AI mathematical assistant
49.7K

EmoLLM
EmoLLM is a mental health large language model created by instruction fine-tuning, designed to understand and promote the mental health of individuals, groups, and even whole societies. It covers key components such as cognitive, emotional, and behavioral factors, the social environment, physical health, psychological resilience, prevention and intervention measures, and assessment and diagnostic tools. With appropriate fine-tuning configurations, EmoLLM can support mental health counseling tasks, helping users better understand and cope with psychological issues.
AI mental health
79.8K

AIKit
AIKit is an open-source tool designed to simplify the process of hosting, deploying, building, and fine-tuning large language models (LLMs). It offers a REST API compatible with the OpenAI API, supporting various inference capabilities and formats, allowing users to send requests using any compatible client. Furthermore, AIKit provides an extensible fine-tuning interface with Unsloth support, offering users a fast, memory-efficient, and user-friendly fine-tuning experience.
AI Development Assistant
51.1K
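Because AIKit's REST API mirrors the OpenAI API, a client only needs to construct a standard chat-completion request body. The sketch below is illustrative rather than AIKit's documented usage: the endpoint URL, port, and model name are placeholder assumptions for a hypothetical local deployment.

```python
import json

# Hypothetical endpoint for a local AIKit deployment; AIKit serves an
# OpenAI-compatible API, but this exact host/port is an assumption.
AIKIT_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(model, user_message, temperature=0.7):
    """Return a JSON-serializable body for an OpenAI-style chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
    }

# The model name below is a placeholder for whatever model the server hosts.
body = build_chat_request("llama-3.1-8b-instruct", "Summarize LoRA in one sentence.")
print(json.dumps(body, indent=2))
```

Posting this body against a running server (e.g. `requests.post(AIKIT_URL, json=body)`) should return an OpenAI-style response, which is why any OpenAI-compatible client library can be pointed at the same endpoint.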

Mistral Finetune
mistral-finetune is a lightweight codebase built on the LoRA training paradigm: most of the original weights are frozen, and only additional low-rank matrix perturbations, amounting to 1-2% of the total weights, are trained. It is optimized for multi-GPU, single-node training setups; for smaller models such as the 7B model, a single GPU is sufficient. The codebase aims to provide a simple, guided entry point to fine-tuning, particularly regarding data formatting, and does not attempt to cover a wide range of model architectures or hardware types.
AI Model
48.9K
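The arithmetic behind that "1-2% of the weights" figure can be sketched in a few lines of NumPy. This is a minimal illustration of the low-rank-perturbation idea, not mistral-finetune's actual code; the dimensions, rank, and scaling factor are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, rank, alpha = 2048, 2048, 16, 32
W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection (zero init)

def lora_forward(x):
    # Effective weight is W + (alpha / rank) * B @ A, applied without
    # ever materializing the merged matrix.
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T

trainable = A.size + B.size
total = W.size + trainable
# Prints a fraction of about 1.5%, consistent with the 1-2% claim above.
print(f"trainable fraction: {trainable / total:.2%}")
```

Because B starts at zero, the perturbation is initially a no-op: fine-tuning begins exactly at the pretrained model and only the small A and B factors receive gradient updates.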
Llama 3[8B] Meditron V1.0
Llama-3[8B] Meditron V1.0 is an 8-billion-parameter large language model (LLM) designed specifically for the biomedical field, fine-tuned within 24 hours of Meta's release of Llama-3. On standard benchmarks such as MedQA and MedMCQA, it exceeds all existing open models at the same parameter scale and approaches the performance of Llama-2[70B]-Meditron, the leading 70-billion-parameter open model in the medical field. This work demonstrates the innovative potential of open foundation models and is part of a broader initiative to ensure fair access to this technology in resource-poor regions.
AI Model
78.7K

Open Source Large Model Cookbook
This project is a comprehensive guide to using open-source large models, covering environment setup, model deployment, and efficient fine-tuning. It simplifies the use and application of open-source large models so that everyday learners can access and work with them. The project targets learners who are interested in open-source large models and want hands-on experience, providing detailed instructions on environment configuration, model deployment, and fine-tuning methods.
AI courses
118.7K

Orthogonal Finetuning (OFT)
The study 'Controlling Text-to-Image Diffusion' explores how to effectively guide or control powerful text-to-image generation models for various downstream tasks. It proposes orthogonal finetuning (OFT), a method that maintains the model's generative ability: because OFT preserves the hyperspherical energy between neurons, it keeps the model from collapsing during fine-tuning. The authors consider two important fine-tuning tasks, subject-driven generation and controllable generation, and show that OFT outperforms existing methods in both generation quality and convergence speed.
Image Generation
61.0K
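The core idea of rotating pretrained weights with an orthogonal matrix can be sketched as follows. This is a simplified illustration, assuming a Cayley parameterization and ignoring the paper's block structure; it only shows why an orthogonal transform preserves neuron norms (and pairwise angles, hence hyperspherical energy).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# A skew-symmetric S yields an orthogonal Q via the Cayley transform
# Q = (I - S)(I + S)^{-1}; Q is the only trainable quantity here.
M = rng.standard_normal((d, d)) * 0.1
S = M - M.T                          # skew-symmetric: S.T == -S
I = np.eye(d)
Q = (I - S) @ np.linalg.inv(I + S)   # orthogonal by construction

W = rng.standard_normal((d, 16))     # stand-in for a pretrained weight
W_ft = Q @ W                         # "fine-tuned" weight: a pure rotation

# Orthogonality check: Q.T @ Q == I, so column norms and angles survive.
print(np.allclose(Q.T @ Q, I))
```

Since the update is constrained to a rotation, every neuron keeps its magnitude and its angle to every other neuron, which is the geometric property the method relies on to avoid degrading the pretrained model.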

ASPIRE
ASPIRE is a framework designed to improve the selective prediction capability of large language models. It uses parameter-efficient fine-tuning to train LLMs to self-evaluate and assign confidence scores to their generated answers. Experiments show that ASPIRE significantly outperforms existing selective prediction methods across a range of question-answering datasets.
AI Model
46.1K
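The selective-prediction setting itself is easy to illustrate: given per-question confidence scores (however obtained; ASPIRE learns them via fine-tuning, which the toy scores below merely stand in for), the model abstains below a threshold, trading coverage for accuracy on the answers it keeps.

```python
# Illustrative selective prediction: abstain when confidence < tau.

def selective_eval(confidences, correct, tau):
    """Return (coverage, selective_accuracy) at threshold tau."""
    answered = [c >= tau for c in confidences]
    n_answered = sum(answered)
    if n_answered == 0:
        return 0.0, 0.0
    n_right = sum(ok for ok, a in zip(correct, answered) if a)
    return n_answered / len(confidences), n_right / n_answered

# Toy scores where higher confidence correlates with correctness.
confs   = [0.95, 0.90, 0.80, 0.55, 0.40, 0.30]
correct = [True, True, True, False, True, False]

print(selective_eval(confs, correct, tau=0.0))   # answer everything
print(selective_eval(confs, correct, tau=0.75))  # abstain on low confidence
```

Sweeping tau traces out a coverage-accuracy curve; a better confidence estimator (the part ASPIRE trains) yields higher selective accuracy at every coverage level.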

ReFT
ReFT is a simple yet effective method for enhancing the reasoning capabilities of large language models (LLMs). It first warms up the model with supervised fine-tuning (SFT), then further fine-tunes it with online reinforcement learning, specifically the PPO algorithm presented in the paper. ReFT automatically samples many reasoning paths for each problem and derives rewards directly from the ground-truth answers, significantly outperforming SFT. Its performance can be improved further by combining inference-time strategies such as majority voting and re-ranking. Notably, ReFT achieves these gains while learning from the same training questions as SFT, without additional or augmented training data, which demonstrates its stronger generalization ability.
AI model inference training
51.9K
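Two ingredients mentioned above, answer-derived rewards and majority voting, can be sketched in plain Python. This is an illustrative reduction, not the paper's implementation; a real pipeline would first parse the final answer out of each sampled reasoning path.

```python
from collections import Counter

def reward(predicted_answer: str, gold_answer: str) -> float:
    """Binary outcome reward: 1.0 iff the sampled final answer matches gold."""
    return 1.0 if predicted_answer.strip() == gold_answer.strip() else 0.0

def majority_vote(sampled_answers):
    """Pick the most frequent final answer across sampled reasoning paths."""
    return Counter(a.strip() for a in sampled_answers).most_common(1)[0][0]

# Five sampled paths for one problem, reduced to their final answers.
samples = ["42", "41", "42", "42", "7"]
print(majority_vote(samples))                 # "42"
print(reward(majority_vote(samples), "42"))   # 1.0
```

The reward function is what lets PPO score sampled paths without any extra labels, and majority voting is the inference-time aggregation that further lifts accuracy.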

Astraios
Astraios is a platform for fine-tuning large language models, offering a variety of parameter-efficient fine-tuning methods and a range of model sizes to choose from. Users can perform fine-tuning on large-scale language models on this platform and achieve the best cost-performance balance. The platform also provides rich models, datasets, and documentation to facilitate user research and development. The flexible pricing scheme caters to the needs of users of different scales.
AI Model
45.8K

Windows AI Studio
Windows AI Studio streamlines the development of generative AI applications by bringing together a curated collection of advanced AI development tools and models from sources like the Azure AI Studio Catalog and Hugging Face. You can browse the AI model catalog supported by Azure ML and Hugging Face, download models locally, fine-tune them, test them, and integrate them into your Windows applications. All computations are performed locally, so ensure your device has sufficient resources to handle the load. We plan to integrate ORT/DML into the Windows AI Studio workflow in the future, enabling developers to run AI models on any Windows hardware.
AI Development Assistant
56.9K

Emu
Emu is a quality-tuning approach designed to enhance the aesthetics of image generation models. Its key finding is that fine-tuning on a small number of carefully selected high-quality images can significantly improve generation quality. Emu was pre-trained on 110 million image-text pairs and fine-tuned on a few thousand hand-picked high-quality images. Compared to a pre-training-only model, Emu achieves an 82.9% win rate. In terms of visual appeal preference, Emu also outperforms the state-of-the-art SDXL 1.0, with preference rates of 68.4% and 71.3%. The approach also transfers to other architectures, including pixel diffusion and masked generative transformer models.
AI image generation
57.7K
## Featured AI Tools

Flow AI
Flow is an AI-driven movie-making tool designed for creators, utilizing Google DeepMind's advanced models to allow users to easily create excellent movie clips, scenes, and stories. The tool provides a seamless creative experience, supporting user-defined assets or generating content within Flow. In terms of pricing, the Google AI Pro and Google AI Ultra plans offer different functionalities suitable for various user needs.
Video Production
43.1K

NoCode
NoCode is a platform that requires no programming experience, allowing users to quickly generate applications by describing their ideas in natural language, aiming to lower development barriers so more people can realize their ideas. The platform provides real-time previews and one-click deployment features, making it very suitable for non-technical users to turn their ideas into reality.
Development Platform
46.6K

ListenHub
ListenHub is a lightweight AI podcast generation tool that supports both Chinese and English. Based on cutting-edge AI technology, it can quickly generate podcast content of interest to users. Its main advantages include natural dialogue and ultra-realistic voice effects, allowing users to enjoy high-quality auditory experiences anytime and anywhere. ListenHub not only improves the speed of content generation but also offers compatibility with mobile devices, making it convenient for users to use in different settings. The product is positioned as an efficient information acquisition tool, suitable for the needs of a wide range of listeners.
AI
43.9K

MiniMax Agent
MiniMax Agent is an intelligent AI companion that adopts the latest multimodal technology. The MCP multi-agent collaboration enables AI teams to efficiently solve complex problems. It provides features such as instant answers, visual analysis, and voice interaction, which can increase productivity by 10 times.
Multimodal technology
45.8K

Tencent Hunyuan Image 2.0
Tencent Hunyuan Image 2.0 is Tencent's latest AI image generation model, with significant improvements in generation speed and image quality. Thanks to an ultra-high-compression-ratio codec and a new diffusion architecture, it achieves millisecond-level image generation, eliminating the wait of traditional pipelines. At the same time, it combines reinforcement learning with human aesthetic knowledge to improve the realism and detail of its images, making it suitable for professional users such as designers and creators.
Image Generation
45.3K

OpenMemory MCP
OpenMemory is an open-source personal memory layer that provides private, portable memory management for large language models (LLMs). It ensures users have full control over their data, maintaining its security when building AI applications. This project supports Docker, Python, and Node.js, making it suitable for developers seeking personalized AI experiences. OpenMemory is particularly suited for users who wish to use AI without revealing personal information.
open source
44.2K

FastVLM
FastVLM is an efficient visual encoding model designed specifically for visual language models. It uses the innovative FastViTHD hybrid visual encoder to reduce the time required for encoding high-resolution images and the number of output tokens, resulting in excellent performance in both speed and accuracy. FastVLM is primarily positioned to provide developers with powerful visual language processing capabilities, applicable to various scenarios, particularly performing excellently on mobile devices that require rapid response.
Image Processing
42.2K

LiblibAI
LiblibAI is a leading Chinese AI creative platform offering powerful AI creative tools to help creators bring their imagination to life. The platform provides a vast library of free AI creative models, allowing users to search and utilize these models for image, text, and audio creations. Users can also train their own AI models on the platform. Focused on the diverse needs of creators, LiblibAI is committed to creating inclusive conditions and serving the creative industry, ensuring that everyone can enjoy the joy of creation.
AI Model
6.9M