EXAONE-3.5-32B-Instruct-GGUF
Overview
EXAONE-3.5-32B-Instruct-GGUF belongs to the EXAONE 3.5 series of instruction-tuned bilingual (English and Korean) generative models developed by LG AI Research, available in sizes from 2.4B to 32B parameters. These models support long-context processing of up to 32K tokens and deliver state-of-the-art performance in real-world use cases and long-context understanding, while remaining competitive in the general domain against other recently released models of similar scale. Detailed information is available through the technical report, blog, and GitHub repository.
The 32B instruction-tuned model has the following specifications: 30.95B parameters (excluding embeddings), 64 layers, grouped-query attention (GQA) with 40 query heads and 8 key-value heads, a vocabulary size of 102,400, and a context length of 32,768 tokens. It is distributed in GGUF format with quantization options including Q8_0, Q6_K, Q5_K_M, Q4_K_M, and IQ4_XS, alongside the original BF16 weights.
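To give a sense of what the GQA configuration means in practice, the sketch below estimates the key-value cache footprint at the full 32K context. The layer and head counts come from the specifications above; the head dimension of 128 is an assumption, as it is not stated on this page.

```python
# Rough KV-cache size estimate for EXAONE 3.5 32B at full context.
# Assumptions: head_dim = 128 (not stated above) and an FP16 cache.

N_LAYERS = 64      # from the spec
N_KV_HEADS = 8     # GQA key-value heads
N_Q_HEADS = 40     # query heads (used for the MHA comparison below)
HEAD_DIM = 128     # assumed head dimension
CONTEXT = 32_768   # maximum context length
BYTES = 2          # bytes per FP16 cache entry

def kv_cache_bytes(kv_heads: int) -> int:
    # Two tensors (K and V) per layer, one vector per head per token.
    return 2 * N_LAYERS * kv_heads * HEAD_DIM * BYTES * CONTEXT

gqa = kv_cache_bytes(N_KV_HEADS)
mha = kv_cache_bytes(N_Q_HEADS)  # hypothetical full multi-head attention
print(f"GQA cache at 32K tokens: {gqa / 2**30:.1f} GiB")  # ~8.0 GiB
print(f"Full MHA would need:     {mha / 2**30:.1f} GiB")  # ~40.0 GiB
```

Under these assumptions, GQA cuts the cache to one fifth of what full multi-head attention would require, which is a large part of what makes the 32K context practical on local hardware.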
Target Users
The target audience includes researchers, developers, and enterprises that require high-performance language models, especially in scenarios involving large-scale data processing and long-context information. With its strong performance and multilingual support, EXAONE-3.5-32B-Instruct-GGUF is well suited to tasks such as natural language processing, text generation, and machine translation, helping users improve their efficiency and tackle complex problems.
Use Cases
Generate long-form articles and understand long documents using EXAONE-3.5-32B-Instruct-GGUF.
Utilize the model for cross-language text translation and information retrieval in a multilingual environment.
Apply the model in dialogue systems and chatbots to provide more natural and accurate language interactions (a minimal client sketch follows this list).
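For the chatbot use case, the sketch below queries a llama.cpp `llama-server` instance assumed to be already running locally with an EXAONE GGUF file loaded; the port and the model filename in the comment are assumptions, not values from this page. The endpoint used is llama-server's OpenAI-compatible chat API.

```python
# Minimal chat client for a local llama.cpp server, assumed started with
# something like:
#   llama-server -m EXAONE-3.5-32B-Instruct-Q4_K_M.gguf --port 8080
# Requires: pip install requests
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [
            # System prompt in the style the model card recommends.
            {"role": "system",
             "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
            {"role": "user",
             "content": "Explain GGUF quantization in two sentences."},
        ],
        "temperature": 0.7,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```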
Features
Supports long-context processing with a context length of up to 32K tokens.
Includes models at various precisions, such as Q8_0, Q6_K, Q5_K_M, Q4_K_M, IQ4_XS, and BF16 (see the size estimate sketch after this list).
Optimized for deployment, including a 2.4B model tailored to small or resource-constrained devices.
Offers pre-quantized models utilizing AWQ and multiple quantization types.
Supports various deployment frameworks, including TensorRT-LLM, vLLM, SGLang, llama.cpp, and Ollama.
Trained with system-prompt usage in mind, improving dialogue quality and instruction following.
As a disclaimer, generated texts do not reflect the views of LG AI Research.
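To make the precision options concrete, the sketch below estimates on-disk file sizes from approximate bits-per-weight figures for each quantization type. The bits-per-weight values are rough community figures for llama.cpp quantizations, not numbers from this page, and the total parameter count (~32B including embeddings) is an assumption.

```python
# Rough GGUF file-size estimates for a ~32B-parameter model.
# Bits-per-weight values are approximate community figures for
# llama.cpp quantization types (assumptions, not official numbers).
PARAMS = 32e9  # assumed total parameter count, embeddings included

BITS_PER_WEIGHT = {
    "BF16":   16.0,
    "Q8_0":    8.5,
    "Q6_K":    6.6,
    "Q5_K_M":  5.7,
    "Q4_K_M":  4.8,
    "IQ4_XS":  4.3,
}

for name, bpw in BITS_PER_WEIGHT.items():
    size_gb = PARAMS * bpw / 8 / 1e9  # bits -> bytes -> GB
    print(f"{name:7s} ~{size_gb:3.0f} GB")
```

In practice, Q4_K_M is a common quality/size trade-off for local inference, while the BF16 weights are mainly useful as a reference or as a source for requantization.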
How to Use
1. Install llama.cpp. Please refer to the installation guide in the llama.cpp GitHub repository.
2. Download the GGUF format file for the EXAONE 3.5 model.
3. (Optional) If using BF16 precision, you may need to merge the split files.
4. Run the model with llama.cpp and test it in conversational (chat) mode.
5. Use the recommended system prompt during inference for optimal model performance. A Python sketch of these steps is shown below.
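A minimal sketch of steps 2 and 4 follows, using the `huggingface_hub` downloader and the `llama-cpp-python` bindings as a stand-in for the llama.cpp CLI. The repository ID and filename are assumptions based on the model name on this page; verify them against the actual Hugging Face file listing before downloading.

```python
# Steps 2 and 4 in Python: download a quantized GGUF file, then chat.
# Requires: pip install huggingface_hub llama-cpp-python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="LGAI-EXAONE/EXAONE-3.5-32B-Instruct-GGUF",  # assumed repo ID
    filename="EXAONE-3.5-32B-Instruct-Q4_K_M.gguf",      # assumed filename
)

llm = Llama(
    model_path=model_path,
    n_ctx=32_768,     # full 32K context; reduce if RAM/VRAM is tight
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[
        # System prompt in the style the model card recommends (step 5).
        {"role": "system",
         "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
        {"role": "user", "content": "Summarize the EXAONE 3.5 model family."},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

For step 3, the BF16 weights are typically shipped as split GGUF files; llama.cpp's `llama-gguf-split --merge` tool can combine them, and recent llama.cpp builds can also load split files directly.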