gmft
Overview:
gmft is a toolkit for converting tables in PDFs into various formats. It is lightweight, modular, and fast. gmft relies on Microsoft's Table Transformer (TATR), regarded as one of the best-performing and most reliable of the available alternatives. It runs without a GPU, offers high throughput, and installs with a single command. For PDF handling it uses PyPDFium2, chosen for its high throughput and permissive licensing. The TATR model used by gmft is trained on PubTables-1M, a diverse dataset, which supports its reliability.
Target Users:
The target audience for gmft includes data analysts, researchers, and anyone who needs to extract tabular data from PDF documents. Because it is lightweight and fast, gmft is especially suitable for those who need to process large volumes of PDF files and convert their data quickly.
Total Visits: 474.6M
Top Region: US (19.34%)
Website Views: 65.1K
Use Cases
Data analysts use gmft to extract data from research reports for further analysis.
Researchers utilize gmft to extract experimental data from academic papers.
Business users automate the process of extracting table data from contract documents using gmft.
Features
Supports converting PDF tables into various formats including Pandas DataFrame.
Can output lists of table text and positions.
Supports outputting cropped images of tables.
Extracts table titles.
Fast table extraction without OCR: text is read from the PDF's own text layer, so image-only or scanned PDFs will likely need a separate OCR step first.
High-throughput PDF processing via PyPDFium2.
Highly configurable, supporting custom models and extraction methods.
How to Use
Install gmft: Enter `pip install gmft` in the command line.
Import necessary modules: Import `CroppedTable`, `TableDetector`, `AutoTableFormatter`, etc., in your Python script.
Create PyPDFium2Document object: Create a document object using the file path of the PDF containing the tables to be extracted.
Use TableDetector for table detection: Traverse each page of the document and use the detector to extract tables.
Use AutoTableFormatter to format tables: Process the detected tables for formatting.
Convert the extracted table data into the desired format: For example, convert it to a Pandas DataFrame or other supported formats.
Close the document object: Call the close method of the document object to release resources after extraction.
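The steps above can be sketched roughly as follows. This is a minimal illustration, not the definitive usage: the import paths (`gmft` and `gmft.pdf_bindings`) and the `extract()`, `df()`, and `close()` calls are assumptions based on gmft's quickstart and may differ between versions, and the helper names `extract_tables` and `extract_from_pdf` are hypothetical.

```python
def extract_tables(doc, detector, formatter):
    """Detect tables on every page of doc and return one DataFrame per table.

    Hypothetical helper: doc is assumed to be iterable over pages,
    detector.extract(page) to return the cropped tables on that page,
    and formatter.extract(table) to return an object with a df() method.
    """
    frames = []
    for page in doc:
        for table in detector.extract(page):
            formatted = formatter.extract(table)
            frames.append(formatted.df())
    return frames


def extract_from_pdf(pdf_path):
    """Open a PDF, extract all tables, and always close the document."""
    # Assumed import paths; newer gmft releases may expose these elsewhere.
    from gmft import TableDetector, AutoTableFormatter
    from gmft.pdf_bindings import PyPDFium2Document

    doc = PyPDFium2Document(pdf_path)
    try:
        return extract_tables(doc, TableDetector(), AutoTableFormatter())
    finally:
        doc.close()  # release resources even if extraction fails
```

Calling `extract_from_pdf("path/to/file.pdf")` would then return a list of DataFrames, one per detected table. Keeping the loop in `extract_tables` separate from the gmft imports means a different detector or formatter can be passed in, which fits the configurable design described under Features.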