MathPile
Overview:
MathPile is a mathematics-centric corpus of approximately 9.5 billion tokens. It draws its content from textbooks (including lecture notes), arXiv, Wikipedia, ProofWiki, StackExchange, and web pages, and covers material suited to K-12, university, graduate-level, and math competition applications. The corpus emphasizes data quality and ships with comprehensive data documentation, which improves transparency and lets users decide flexibly how to use the data. MathPile is released under the CC BY-NC-SA 4.0 license, and a version that permits commercial use is planned.
Target Users:
Researchers and developers who build foundation models for mathematics or want to improve the mathematical reasoning abilities of language models.
Total Visits: 1.8K
Top Region: US (77.23%)
Website Views: 56.0K
Use Cases
Research and development of models for university-level mathematics coursework
Training models for middle school mathematics competitions
Building language models to reason about mathematical problems
Features
A mathematics-centric corpus containing approximately 9.5 billion tokens
Mathematical content suitable for K-12, university, graduate-level, and math competition applications
High data quality with comprehensive data documentation