Elimination Game: A benchmark framework for evaluating how large language models perform in complex social games, inspired by the game "Werewolf".


Category: AI Model Research Tools

Tags: #Artificial Intelligence, #Social Game, #Benchmark Test, #Werewolf, #Multi-round Interaction, #AI Education

Labels: Standard Picks, Open Source

Overview :

Elimination Game is a benchmark framework for evaluating how large language models (LLMs) behave in complex social environments. It simulates a multi-player competitive scenario similar to "Werewolf" and tests each model's social reasoning, strategy selection, and deception capabilities through open discussions, private communications, and elimination votes. The framework gives researchers a tool for studying AI behavior in social games and gives developers insight into how models might perform in real-world social scenarios. Its main advantages are a multi-round interactive design, dynamic alliance and betrayal mechanisms, and detailed evaluation metrics, which together support a broad assessment of a model's social capabilities.
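To make the round structure concrete, here is a minimal Python sketch of one round: public and private messages are recorded, then votes remove one player. The function names (`tally_votes`, `play_round`), the data shapes, and the alphabetical tie-break are all assumptions made for illustration and are not taken from the framework itself.

```python
from collections import Counter


def tally_votes(votes):
    """Return the player with the most votes, given a mapping voter -> target.

    Tie handling is an assumption: ties are broken alphabetically so the
    sketch stays deterministic. The real framework may do this differently.
    """
    counts = Counter(votes.values())
    top = max(counts.values())
    tied = sorted(name for name, n in counts.items() if n == top)
    return tied[0]


def play_round(players, public_msgs, private_msgs, votes):
    """One hypothetical round: record the discussion, then eliminate by vote."""
    transcript = {
        "public": list(public_msgs),    # messages every player can read
        "private": list(private_msgs),  # (sender, receiver, text) tuples
    }
    # Ignore votes cast by or against anyone who is not currently in the game.
    valid = {v: t for v, t in votes.items() if v in players and t in players}
    eliminated = tally_votes(valid)
    survivors = [p for p in players if p != eliminated]
    return survivors, eliminated, transcript


survivors, eliminated, transcript = play_round(
    players=["A", "B", "C", "D"],
    public_msgs=["A: I think C is playing both sides."],
    private_msgs=[("A", "B", "Vote C with me?")],
    votes={"A": "C", "B": "C", "C": "A", "D": "C"},
)
print(eliminated, survivors)
```

In this example three of the four votes target C, so C is eliminated and A, B, and D continue to the next round.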

Target Users :

Elimination Game is aimed at AI researchers, developers, and professionals interested in social games and the social capabilities of AI. It provides tools for studying how language models perform in complex social environments, supporting research and development in social intelligence.

Total Visits: 492.1M

Top Region: US (19.34%)

Website Views: 51.3K

Use Cases

Researchers use Elimination Game to compare the social reasoning and deception capabilities of different language models, producing data that supports model optimization.

Educational institutions use it as a teaching tool to help students understand the behavior patterns of AI in complex social scenarios.

Developers use this framework to evaluate and improve the strategic choices and social interaction capabilities of their own language models.

Features

Simulates a multi-player competitive environment to test a model's overall capabilities in social games.

Supports open discussions and private communications, simulating information transmission in real social scenarios.

Evaluates the model's strategic decision-making and social reasoning abilities through a voting elimination mechanism.

Provides detailed evaluation metrics, including betrayal rate and jury persuasiveness, to comprehensively measure model performance.

Allows multiple language models to take part in the same test, providing rich experimental data for AI research.
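The listing names betrayal rate and jury persuasiveness as metrics but does not define them. The sketch below shows one plausible way to compute each in Python. Both definitions are assumptions: betrayal rate is taken as the share of a game's votes cast against a player the voter had privately agreed to cooperate with, and jury persuasiveness as the share of jury ballots a finalist receives. The function names and input shapes are hypothetical.

```python
def betrayal_rate(votes, allies):
    """Fraction of votes cast against a declared ally (assumed definition).

    votes:  list of (voter, target) pairs collected across all rounds
    allies: set of frozenset({a, b}) pairs that privately agreed to cooperate
    """
    if not votes:
        return 0.0
    betrayals = sum(
        1 for voter, target in votes if frozenset((voter, target)) in allies
    )
    return betrayals / len(votes)


def jury_persuasiveness(jury_votes, finalist):
    """Share of jury ballots that went to `finalist` (assumed definition).

    jury_votes: mapping juror -> finalist they voted for
    """
    if not jury_votes:
        return 0.0
    won = sum(1 for choice in jury_votes.values() if choice == finalist)
    return won / len(jury_votes)


votes = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")]
allies = {frozenset(("A", "B"))}
print(betrayal_rate(votes, allies))  # 0.25: only A's vote hits an ally

jury = {"C": "A", "D": "A", "E": "B"}
print(round(jury_persuasiveness(jury, "A"), 3))  # 0.667: two of three jurors
```

Storing alliances as unordered pairs means a vote counts as a betrayal regardless of which side proposed the alliance. A per-model variant would filter `votes` by voter before calling the function.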

How to Use

1. Visit the Elimination Game official website or GitHub repository to read the framework's basic information and usage guide.

2. Prepare the language model that will take part in the test, and confirm that it is compatible with the testing framework and can interact with it.

3. Run Elimination Game in the testing environment, setting parameters such as the number of players and game rounds.

4. Observe the model's performance in the game, recording data from open discussions, private communications, and voting eliminations.

5. Analyze the model's social reasoning, strategy selection, and deception capabilities from the test results, then use the evaluation metrics to guide optimization.
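Steps 3 and 4 can be sketched as a small Python driver that takes the number of players and rounds as parameters and records each vote for later analysis. This is a toy stand-in, not the framework's actual interface: `GameConfig`, `run_game`, the random voting policy used in place of real models, and the rule that play stops when two players remain are all assumptions.

```python
import json
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class GameConfig:
    """Hypothetical run parameters; the real framework may name these differently."""
    num_players: int = 6
    max_rounds: int = 4
    seed: int = 0


def run_game(config):
    """Run a toy game with random stub players; return (survivors, event log)."""
    rng = random.Random(config.seed)
    players = [f"P{i}" for i in range(1, config.num_players + 1)]
    log = []
    for round_no in range(1, config.max_rounds + 1):
        if len(players) <= 2:  # assumed stopping rule: two finalists remain
            break
        # Stub policy: each player votes for a random other player.
        # A real run would query a language model here instead.
        votes = {p: rng.choice([q for q in players if q != p]) for p in players}
        counts = Counter(votes.values())
        top = max(counts.values())
        # Assumed tie-break: alphabetical order among the most-voted players.
        eliminated = sorted(p for p, n in counts.items() if n == top)[0]
        players.remove(eliminated)
        log.append({"round": round_no, "votes": votes, "eliminated": eliminated})
    return players, log


survivors, log = run_game(GameConfig(num_players=5, max_rounds=3, seed=42))
print(json.dumps(log[0], indent=2))  # first round's recorded votes
print("survivors:", survivors)
```

Keeping every round's votes in a plain list of dictionaries makes the log easy to save as JSON and to feed into whatever metrics are computed in step 5.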