

EXAONE 3.5 32B Instruct GGUF
Overview
EXAONE-3.5-32B-Instruct-GGUF belongs to a series of instruction-tuned bilingual (English and Korean) generative models developed by LG AI Research, with versions ranging from 2.4B to 32B parameters. The models support long-context processing of up to 32K tokens and show state-of-the-art performance in real-world use cases and long-context understanding, while remaining competitive in the general domain against other recently released models of similar scale. Detailed information is available through the accompanying technical reports, blogs, and GitHub repository. The 32B instruction-tuned model has the following specification: 30.95B parameters (excluding embeddings), 64 layers, grouped-query attention (GQA) with 40 query heads and 8 key-value heads, a vocabulary size of 102,400, and a context length of 32,768 tokens. It is released in GGUF format with quantization options such as Q8_0, Q6_0, Q5_K_M, Q4_K_M, and IQ4_XS, along with BF16 weights.
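The GQA layout above (8 key-value heads instead of one per query head) is what keeps the 32K-token KV cache manageable. A minimal sketch of the memory math, using the figures from the spec; the head dimension of 128 is an assumption, as it is not stated in the model card:

```python
# Rough KV-cache size estimate for the 32B model's GQA layout.
# From the spec above: 64 layers, 8 KV heads, 32,768-token context.
# head_dim = 128 is an ASSUMPTION (not stated in the model card).
def kv_cache_bytes(layers, kv_heads, head_dim, context, bytes_per_elem):
    # factor of 2 for the separate key and value tensors
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem

gqa = kv_cache_bytes(64, 8, 128, 32_768, 2)   # FP16 cache with 8 KV heads (GQA)
mha = kv_cache_bytes(64, 40, 128, 32_768, 2)  # hypothetical cache with 40 KV heads (full MHA)
print(f"GQA: {gqa / 2**30:.1f} GiB, full MHA: {mha / 2**30:.1f} GiB")
# → GQA: 8.0 GiB, full MHA: 40.0 GiB
```

Under these assumptions, GQA cuts the full-context KV cache from roughly 40 GiB to 8 GiB, a 5x reduction matching the 40:8 head ratio.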
Target Users
The target audience includes researchers, developers, and enterprises that require high-performance language models, especially in scenarios involving large-scale data processing and long-context information. With its strong performance and bilingual support, EXAONE-3.5-32B-Instruct-GGUF is well suited for tasks such as natural language processing, text generation, and machine translation, helping users work more efficiently and tackle complex problems.
Use Cases
Generate long articles and understand content using EXAONE-3.5-32B-Instruct-GGUF.
Utilize the model for cross-language text translation and information retrieval in a multilingual environment.
Apply the model in dialogue systems and chatbots to provide more natural and accurate language interactions.
Features
Supports long-context processing with a context length of up to 32K tokens.
Includes models of various precisions such as Q8_0, Q6_0, Q5_K_M, Q4_K_M, IQ4_XS, and BF16.
Optimized model deployment, including a 2.4B model tailored for small or resource-constrained devices.
Offers pre-quantized models utilizing AWQ and multiple quantization types.
Supports various deployment frameworks, including TensorRT-LLM, vLLM, SGLang, llama.cpp, and Ollama.
Model training incorporates system prompt usage, enhancing dialogue and interaction efficiency.
Generated text does not represent the views of LG AI Research.
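Because the model is trained with system prompt usage in mind, prompts should be assembled with the EXAONE chat template. A minimal sketch of building such a prompt; the role markers (`[|system|]`, `[|user|]`, `[|assistant|]`, `[|endofturn|]`) follow the template published with the EXAONE models, but you should confirm them against the `tokenizer_config.json` in the repository before relying on them:

```python
# Sketch of an EXAONE-style chat prompt with a system turn.
# Role markers are taken from the published EXAONE chat template;
# verify against the repository's tokenizer_config.json.
def build_prompt(system: str, user: str) -> str:
    return (
        f"[|system|]{system}[|endofturn|]\n"
        f"[|user|]{user}\n"
        f"[|assistant|]"
    )

prompt = build_prompt(
    "You are EXAONE model from LG AI Research, a helpful assistant.",
    "Explain grouped-query attention in one sentence.",
)
print(prompt)
```

In practice, frameworks such as llama.cpp or vLLM apply this template automatically when the GGUF metadata or tokenizer config includes it.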
How to Use
1. Install llama.cpp. Please refer to the installation guide in the llama.cpp GitHub repository.
2. Download the GGUF format file for the EXAONE 3.5 model.
3. (Optional) If using BF16 precision, you may need to merge the split files.
4. Run the model using llama.cpp and conduct tests in dialogue mode.
5. Use a system prompt as recommended by the model card to get the best performance, since the model was trained with system prompts.
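The steps above can be sketched as the following shell session. The Hugging Face repository name and GGUF filenames here are assumptions for illustration; substitute the files actually published for the release you want:

```shell
# 1. Build llama.cpp (see its GitHub repository for platform-specific guides).
git clone https://github.com/ggerganov/llama.cpp
cmake -B llama.cpp/build llama.cpp
cmake --build llama.cpp/build --config Release

# 2. Download a quantized GGUF file. Repo name and filename pattern are
#    assumptions; check the actual release for the exact names.
huggingface-cli download LGAI-EXAONE/EXAONE-3.5-32B-Instruct-GGUF \
    --include "*Q4_K_M*.gguf" --local-dir .

# 3. (BF16 only) merge split shards back into a single file, e.g.:
# llama.cpp/build/bin/llama-gguf-split --merge <first-shard>.gguf merged.gguf

# 4-5. Run in conversation mode with a system prompt.
llama.cpp/build/bin/llama-cli -m EXAONE-3.5-32B-Instruct-Q4_K_M.gguf \
    -cnv -p "You are EXAONE model from LG AI Research, a helpful assistant."
```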