Hallucination Leaderboard
Overview
Hallucination Leaderboard is an open-source project developed by Vectara to measure how often Large Language Models (LLMs) hallucinate when summarizing short documents. It uses Vectara's Hughes Hallucination Evaluation Model (HHEM-2.1) to detect hallucinations in each model's summaries and ranks models accordingly. The leaderboard supports research into more reliable LLMs by helping developers understand and improve the factual accuracy of their models.
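As a rough illustration of the core idea, the openly released HHEM checkpoint on Hugging Face (vectara/hallucination_evaluation_model) can score (source, summary) pairs for factual consistency. The snippet below is a minimal sketch assuming the interface described on that model card, namely a custom predict() helper loaded via trust_remote_code; the hosted HHEM-2.1 used to compute the official leaderboard may behave differently.

```python
# Minimal sketch: score a (source, summary) pair with the open HHEM checkpoint.
# Assumes the Hugging Face model "vectara/hallucination_evaluation_model" exposes a
# predict() helper via trust_remote_code, as described on its model card; the hosted
# HHEM-2.1 used by the leaderboard itself may differ.
from transformers import AutoModelForSequenceClassification

source = "The Eiffel Tower is located in Paris and was completed in 1889."
summary = "The Eiffel Tower, finished in 1889, stands in Paris."

model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

# predict() returns a consistency score per pair: close to 1.0 means the summary is
# supported by the source, close to 0.0 suggests hallucination.
scores = model.predict([(source, summary)])
print(f"Factual consistency score: {float(scores[0]):.3f}")
```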
Target Users
The leaderboard is aimed primarily at researchers, developers, and practitioners in Natural Language Processing (NLP) who want to evaluate LLM performance. It helps them understand how accurate and reliable different LLMs are when generating content, so they can choose the most suitable model for a given task.
Use Cases
Researchers can use this leaderboard to compare the hallucination rates of different LLM models when generating summaries, thereby selecting more reliable models.
Developers can use this tool to evaluate the performance of their own LLM models and optimize them to reduce hallucination.
Businesses can refer to this leaderboard to select LLM models suitable for their business needs, for applications such as content generation and customer service.
Features
Provides LLM hallucination evaluation based on the HHEM-2.1 model.
Supports comparison and ranking of various LLMs.
Uses the CNN/Daily Mail corpus for document summarization testing.
Evaluates each LLM by calling its API to generate summaries.
Reports key metrics such as hallucination rate, factuality rate, and answer rate (see the sketch after this list).
Supports evaluation of multilingual models, though testing currently focuses on English.
Regularly updated to reflect changes in model performance.
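To make these metrics concrete, the sketch below shows one plausible way they could be derived from per-summary HHEM scores. The 0.5 threshold, the record layout, and the field names are illustrative assumptions for this example, not the leaderboard's exact methodology.

```python
# Illustrative sketch of how leaderboard-style metrics relate to per-summary scores.
# The 0.5 threshold and the record layout are assumptions, not the leaderboard's
# exact methodology.
from dataclasses import dataclass

@dataclass
class SummaryResult:
    answered: bool           # did the model produce a usable summary at all?
    hhem_score: float = 0.0  # factual consistency score in [0, 1] (only if answered)

def leaderboard_metrics(results: list[SummaryResult], threshold: float = 0.5) -> dict:
    answered = [r for r in results if r.answered]
    answer_rate = len(answered) / len(results)
    # A summary counts as factually consistent if its score clears the threshold.
    consistent = sum(1 for r in answered if r.hhem_score >= threshold)
    factuality_rate = consistent / len(answered) if answered else 0.0
    return {
        "answer_rate": round(100 * answer_rate, 1),
        "factuality_rate": round(100 * factuality_rate, 1),
        "hallucination_rate": round(100 * (1 - factuality_rate), 1),
    }

# Example: 3 of 4 documents answered, 2 of those judged consistent.
results = [
    SummaryResult(answered=True, hhem_score=0.92),
    SummaryResult(answered=True, hhem_score=0.31),
    SummaryResult(answered=True, hhem_score=0.77),
    SummaryResult(answered=False),
]
print(leaderboard_metrics(results))
# {'answer_rate': 75.0, 'factuality_rate': 66.7, 'hallucination_rate': 33.3}
```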
How to Use
1. Visit the project homepage (https://github.com/vectara/hallucination-leaderboard) to understand the project background and usage instructions.
2. Refer to the README file to learn how to use the HHEM-2.1 model for evaluation.
3. Prepare the LLM models to be evaluated and their corresponding API interfaces.
4. Use the project's provided scripts or code to call the LLM models and generate summaries.
5. Evaluate the generated summaries with the HHEM-2.1 model to obtain metrics such as hallucination rate (steps 4 and 5 are sketched in the example after this list).
6. Analyze the evaluation results and compare the performance of different models.
7. Adjust the models or select a better model for applications based on your needs.
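For illustration only, the sketch below strings steps 4 and 5 together: it asks an OpenAI-compatible chat endpoint for a summary and then scores it with the open HHEM checkpoint. The model name, prompt wording, and client setup are assumptions made for this example; the repository's own scripts are the authoritative reference for how the leaderboard runs its evaluations.

```python
# Sketch of steps 4-5: generate a summary via an OpenAI-compatible API, then score it
# with the open HHEM checkpoint. Model name, prompt wording, and API key handling are
# illustrative assumptions; consult the repository's scripts for the real setup.
from openai import OpenAI
from transformers import AutoModelForSequenceClassification

client = OpenAI()  # reads OPENAI_API_KEY from the environment
hhem = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

def summarize(document: str, model: str = "gpt-4o-mini") -> str:
    """Ask the LLM for a summary that sticks strictly to the passage."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": (
                "Summarize the following passage, using only information "
                f"stated in the passage itself:\n\n{document}"
            ),
        }],
        temperature=0.0,
    )
    return response.choices[0].message.content

document = (
    "The Amazon River flows through South America and is one of the "
    "longest rivers in the world."
)
summary = summarize(document)
score = float(hhem.predict([(document, summary)])[0])
print(f"HHEM consistency score: {score:.3f}")
```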