TAG-Bench
Overview:
TAG-Bench is a benchmark for evaluating and researching how well language models answer natural language queries over databases. It is built on the BIRD Text2SQL benchmark, raising query complexity by requiring semantic reasoning that draws on world knowledge or goes beyond the information stored explicitly in the database. By simulating realistic database query scenarios, TAG-Bench aims to foster the integration of AI and database technologies and gives researchers a platform for stress-testing existing models.
Target Users:
TAG-Bench is primarily designed for researchers and developers in natural language processing and database systems. It suits professionals who want to evaluate and improve model performance on complex database queries. Using TAG-Bench, they can identify the strengths and weaknesses of their models and explore new algorithms and techniques for better reasoning and query processing.
Use Cases
Researchers use TAG-Bench to assess the performance of their newly developed natural language processing models in handling complex database queries.
Developers leverage TAG-Bench to test and tune their database query processing systems so they perform well in real-world applications.
Educational institutions utilize TAG-Bench as a teaching tool to help students understand the application of natural language processing in database queries.
Features
Offers 80 complex queries built on the BIRD Text2SQL benchmark, spanning match, comparison, ranking, and aggregation query types.
Requires models to possess world knowledge or perform semantic reasoning beyond database information.
Supports the use of Pandas DataFrames to simulate a database environment (see the conversion sketch after this list).
Recommends using a GPU when creating table indexes to improve query efficiency.
Provides detailed setup guidelines, including environment creation, database conversion, and index creation.
Supports multiple evaluation methods, including handwritten TAG, Text2SQL, Text2SQL+LM, RAG, and retrieval+LM ranking.
Offers detailed documentation for model configuration and evaluation through LOTUS.
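As a concrete picture of the DataFrame-backed setup, here is a minimal sketch of converting a downloaded BIRD SQLite database into Pandas DataFrames. The database path and the sqlite_to_dataframes helper are hypothetical; the TAG repository ships its own conversion script, so treat this only as an illustration of what that step produces.

```python
import sqlite3
import pandas as pd

def sqlite_to_dataframes(db_path: str) -> dict[str, pd.DataFrame]:
    """Load every table of a SQLite database into a Pandas DataFrame."""
    conn = sqlite3.connect(db_path)
    try:
        names = pd.read_sql(
            "SELECT name FROM sqlite_master WHERE type='table'", conn
        )["name"]
        return {name: pd.read_sql(f'SELECT * FROM "{name}"', conn) for name in names}
    finally:
        conn.close()

# Hypothetical path to one of the downloaded BIRD dev databases.
dfs = sqlite_to_dataframes("dev_databases/california_schools/california_schools.sqlite")
for name, df in dfs.items():
    print(name, df.shape)
```

Each resulting DataFrame then stands in for one database table, which is what the evaluation pipelines operate over.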
How to Use
Create a conda environment and install the dependencies.
Download the BIRD database and convert it to Pandas DataFrames (as in the conversion sketch under Features).
Create indexes for each table (GPU usage is recommended; an illustrative indexing sketch follows these steps).
Obtain Text2SQL prompts and modify the tag_queries.csv file.
Run the evaluation command in the tag directory to reproduce the results from the paper.
Edit the lm object as needed to point to the language model server being used (a configuration sketch follows these steps).
Configure the model and evaluate the accuracy and latency of each method, following the LOTUS documentation.
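For the index-creation step, the repository builds its indexes through LOTUS's retrieval models; purely as a standalone illustration of the underlying idea (a row-level embedding index per table), here is a hedged sketch using sentence-transformers and FAISS. The build_row_index helper, the embedding model choice, and the dfs variable are all hypothetical and are not the repository's actual code.

```python
import faiss
import pandas as pd
from sentence_transformers import SentenceTransformer

def build_row_index(df: pd.DataFrame, model: SentenceTransformer) -> faiss.Index:
    """Embed each serialized table row and store the vectors in a FAISS index."""
    rows = df.astype(str).apply(" | ".join, axis=1).tolist()
    # Embedding every row of a large table is the expensive step, which is why
    # a GPU is recommended; model.encode uses one automatically if available.
    emb = model.encode(rows, convert_to_numpy=True, normalize_embeddings=True)
    index = faiss.IndexFlatIP(emb.shape[1])  # inner product == cosine on unit vectors
    index.add(emb)
    return index

model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical embedding model
# `dfs` is the table-name -> DataFrame mapping from the conversion sketch above.
indexes = {name: build_row_index(df, model) for name, df in dfs.items()}
```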
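For pointing the lm object at your model server, a minimal configuration sketch is below, assuming a LOTUS version in which lotus.models.LM wraps litellm (parameter names can differ across versions); the model id and server URL are placeholders for whatever deployment you are running.

```python
import lotus
from lotus.models import LM

# Placeholder model id and server URL; swap in whatever your deployment uses.
lm = LM(
    model="hosted_vllm/meta-llama/Meta-Llama-3.1-70B-Instruct",
    api_base="http://localhost:8000/v1",
)
lotus.settings.configure(lm=lm)
```

Once configured this way, the LOTUS semantic operators used by the hand-written TAG pipelines route their model calls to that server; consult the LOTUS documentation for the exact options your version supports.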