EXAONE-3.5-32B-Instruct-AWQ
Overview:
EXAONE-3.5-32B-Instruct-AWQ is the AWQ-quantized 32B instruction-tuned model in the EXAONE 3.5 series of bilingual (English and Korean) generation models developed by LG AI Research, a series spanning 2.4B to 32B parameters. These models support long-context handling of up to 32K tokens and demonstrate state-of-the-art performance in real-world use cases and long-context comprehension, while remaining competitive in general domains with similarly sized, recently released models. This variant applies AWQ (activation-aware weight quantization) to achieve 4-bit group-wise weight quantization, reducing memory footprint and improving deployment efficiency.
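To make the quantization claim concrete, here is a toy sketch of 4-bit group-wise weight quantization, the baseline scheme AWQ builds on. It is illustrative only (a symmetric variant with a made-up group size and random data) and omits AWQ's activation-aware channel rescaling:

```python
import numpy as np

# Toy 4-bit group-wise quantization: each group of 128 weights shares one
# float scale, and individual weights are rounded to 16 signed levels (-8..7).
rng = np.random.default_rng(0)
weights = rng.normal(size=128).astype(np.float32)  # one weight group

scale = np.abs(weights).max() / 7.0                # per-group scale (symmetric)
quantized = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
dequantized = quantized * scale                    # reconstructed at inference time

print("max abs error:", np.abs(weights - dequantized).max())
```

AWQ improves on this baseline by rescaling salient weight channels according to activation statistics before quantizing, which helps preserve accuracy at 4 bits.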
Target Users:
The target audience includes researchers, developers, and enterprises that require text generation and processing in bilingual environments. The model's long-context handling and English-Korean processing make it particularly suitable for applications involving large volumes of text and cross-lingual communication.
Use Cases
Researchers use the model for studies of bilingual translation and text generation.
Developers leverage its long-context handling to build intelligent assistant applications.
Businesses use it to improve automated response systems in customer service.
Features
Supports long-context handling of up to 32K tokens.
Achieves state-of-the-art performance among bilingual (English and Korean) generation models.
Applies AWQ quantization to achieve 4-bit group-wise weight quantization.
Comprises 30.95 billion parameters (excluding embeddings), with 64 layers and 40 attention heads.
Supports rapid startup and deployment, and is compatible with frameworks such as TensorRT-LLM and vLLM (see the sketch after this list).
Provides a pre-quantized EXAONE 3.5 model for easy deployment across different devices.
Text generated by the model does not represent the views of LG AI Research, per the model's usage disclaimer.
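As a minimal illustration of the vLLM compatibility noted above, the following sketch runs offline batch inference against the AWQ checkpoint. The repository id LGAI-EXAONE/EXAONE-3.5-32B-Instruct-AWQ and the reduced max_model_len are assumptions for the example; adjust them for your deployment:

```python
from vllm import LLM, SamplingParams

# Offline batch inference sketch; assumes a GPU with enough memory for the
# 4-bit weights and a vLLM build with AWQ kernel support.
llm = LLM(
    model="LGAI-EXAONE/EXAONE-3.5-32B-Instruct-AWQ",
    quantization="awq",
    trust_remote_code=True,  # EXAONE uses a custom architecture
    max_model_len=4096,      # illustrative cap; the model supports up to 32K
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize the benefits of 4-bit weight quantization."], params)
print(outputs[0].outputs[0].text)
```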
How to Use
1. Install the necessary libraries, such as transformers>=4.43 and autoawq>=0.2.7.post3.
2. Load the model and tokenizer from Hugging Face using AutoModelForCausalLM and AutoTokenizer.
3. Prepare the input prompt, which can be in English or Korean.
4. Use the tokenizer.apply_chat_template method to convert messages into the model's input format.
5. Call the model.generate method to produce text.
6. Use tokenizer.decode to convert the generated tokens into readable text.
7. Adjust model parameters such as max_new_tokens and do_sample as needed to control the length and diversity of the generated text.
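Putting the steps above together, here is a minimal sketch assuming the Hugging Face repository id LGAI-EXAONE/EXAONE-3.5-32B-Instruct-AWQ; the prompt and generation settings are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/EXAONE-3.5-32B-Instruct-AWQ"

# Step 2: load the AWQ-quantized model and its tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,  # required for the custom EXAONE architecture
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Step 3: the prompt may be in English or Korean.
messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": "Explain AWQ quantization in two sentences."},
]

# Step 4: convert the chat messages into the model's input format.
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)

# Steps 5-7: generate, then decode; tune max_new_tokens and do_sample as needed.
output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```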