

Ollama OCR
Overview
Ollama-OCR is an OCR tool that uses vision-language models served by Ollama to extract text from images. It can return results as Markdown, plain text, JSON, structured data, or key-value pairs, and it can process images in batches. The project is available both as a Python package and as a Streamlit web application, so it can be used from code or through a browser.
Target Users
The target audience is anyone who needs to extract text from images, such as document managers, researchers, and developers. For these users, Ollama-OCR's recognition accuracy and choice of output formats aim to make text extraction faster and more reliable.
Use Cases
Researchers use Ollama-OCR to extract data from images of academic papers for further analysis.
Businesses utilize Ollama-OCR to process large volumes of customer documents for digital storage and retrieval.
Developers integrate Ollama-OCR into their applications to provide image-to-text conversion functionality.
Features
Supports several vision models, such as LLaVA 7B and Llama 3.2 Vision, so the model can be matched to the complexity of the documents being recognized.
Offers multiple output formats, including Markdown, plain text, JSON, structured data, and key-value pairs.
Provides batch processing that handles multiple images in parallel and tracks the processing status of each image.
Includes image pre-processing, such as resizing and normalization, to improve recognition accuracy.
Provides a user-friendly Streamlit web interface with drag-and-drop image upload, real-time processing, and download of the extracted text.
Extracts structured content from images, including tables and other organized data, as well as labeled information as key-value pairs.
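Parallel batch processing with per-image status tracking follows a common pattern, sketched here in Python. This is not Ollama-OCR's own implementation: `run_batch` and the `extract` callable are hypothetical names, and `extract` stands in for whatever single-image OCR call you use (for example a thin wrapper around `process_image`). The returned shape, with `results`, `errors`, `status`, and `statistics` keys, is likewise an assumption made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_batch(paths: Iterable[str], extract: Callable[[str], str], max_workers: int = 4) -> Dict[str, dict]:
    """Run `extract` on each path in parallel, recording a status for every image.

    Hypothetical helper: `extract` stands in for a single-image OCR call.
    """
    unique = list(dict.fromkeys(paths))  # drop duplicate paths, keep original order
    status = {p: "pending" for p in unique}
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract, p): p for p in unique}
        for future in as_completed(futures):  # handle each image as soon as it finishes
            path = futures[future]
            try:
                results[path] = future.result()
                status[path] = "done"
            except Exception as exc:  # one bad image should not stop the whole batch
                errors[path] = str(exc)
                status[path] = "failed"

    return {
        "results": results,
        "errors": errors,
        "status": status,
        "statistics": {"total": len(unique), "successful": len(results), "failed": len(errors)},
    }
```

Threads are used rather than processes on the assumption that each OCR call spends most of its time waiting for the Ollama server to respond. A failed image is recorded in `errors` and marked `failed` instead of aborting the remaining images.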
How to Use
1. Install Ollama-OCR: Run the command 'pip install ollama-ocr' in your terminal.
2. Pull the required model: With Ollama installed and running, use the command 'ollama pull llama3.2-vision:11b'.
3. Initialize the OCR processor: Import OCRProcessor in your Python code and create an instance, specifying the model name.
4. Process a single image: Call the process_image method, passing in the image path and desired output format.
5. Batch process images: Call the process_batch method with the path of the folder containing the images, and set the output format and processing options.
6. View results: Once processing is complete, you can view the extracted text by printing the results or saving them to a file.
7. Run the Streamlit application: Execute 'streamlit run app.py' in the project directory, then use the web interface in your browser.
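Steps 3 to 6 can be sketched in Python as follows. The method names `process_image` and `process_batch` come from the steps above; the module name `ollama_ocr`, the keyword arguments (`model_name`, `image_path`, `format_type`, `input_path`, `recursive`, `preprocess`), the format strings, and the `{"results": ..., "statistics": ...}` shape of the batch result are assumptions based on the project's README and may differ in your installed version. The helpers `extract_one`, `extract_folder`, `save_results`, and `main` are hypothetical names; they accept any object with those two methods, and the package is imported only inside `main()`, so the helpers can be tried without Ollama running.

```python
import json
from pathlib import Path

# Format names assumed from the project's README; check them against your installed version.
FORMATS = ("markdown", "text", "json", "structured", "key_value")


def _check_format(format_type):
    if format_type not in FORMATS:
        raise ValueError(f"unknown format {format_type!r}; expected one of {FORMATS}")


def extract_one(ocr, image_path, format_type="markdown"):
    """Step 4: OCR a single image with an OCRProcessor-like object."""
    _check_format(format_type)
    return ocr.process_image(image_path=str(image_path), format_type=format_type)


def extract_folder(ocr, folder, format_type="markdown"):
    """Step 5: OCR every image under a folder (assumed result: {'results': ..., 'statistics': ...})."""
    _check_format(format_type)
    return ocr.process_batch(
        input_path=str(folder),
        format_type=format_type,
        recursive=True,   # assumed option: also search subfolders
        preprocess=True,  # assumed option: resize/normalize images before OCR
    )


def save_results(results, out_dir):
    """Step 6: write each image's extracted text to <out_dir>/<image stem>.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path, text in results.items():
        target = out / (Path(image_path).stem + ".txt")
        # Non-string results (e.g. parsed JSON) are serialized so they can be written as text.
        body = text if isinstance(text, str) else json.dumps(text, indent=2)
        target.write_text(body, encoding="utf-8")
        written.append(target)
    return written


def main():
    # Imported here so the helpers above work without the package installed (assumed module name).
    from ollama_ocr import OCRProcessor

    ocr = OCRProcessor(model_name="llama3.2-vision:11b")  # step 3
    print(extract_one(ocr, "path/to/image.png"))  # placeholder path
    batch = extract_folder(ocr, "path/to/images")  # placeholder folder
    print(batch["statistics"])
    save_results(batch["results"], "ocr_output")
```

Call `main()` after replacing the placeholder paths. Because `save_results` names each output file after the image's stem, two images with the same name in different subfolders would overwrite each other; include the subfolder in the file name if that matters.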