Script · Computer Vision · v1.3

OCR Pipeline Script

by Community · open-source · Last verified 2026-03-17

Multi-engine OCR pipeline that routes documents to Tesseract, PaddleOCR, or a cloud OCR API based on image quality heuristics. Outputs structured JSON with bounding boxes, confidence scores, and reading-order-sorted text blocks ready for downstream NLP.
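
The routing step described above could be sketched roughly as follows. This is an illustration only, not the script's actual logic: the `ImageQuality` metrics, the `choose_engine` function, the thresholds, and the mapping of quality tiers to engines are all assumptions. In practice the metrics would be computed from the image itself (for example with OpenCV) before routing.

```python
from dataclasses import dataclass


@dataclass
class ImageQuality:
    """Hypothetical quality metrics, each normalized to the range 0..1."""
    sharpness: float
    contrast: float


def choose_engine(q: ImageQuality,
                  clean_threshold: float = 0.7,
                  poor_threshold: float = 0.3) -> str:
    """Pick an OCR engine name from quality metrics.

    Both thresholds and the tier-to-engine mapping are illustrative
    assumptions, not values taken from the script.
    """
    score = min(q.sharpness, q.contrast)  # the weakest metric decides
    if score >= clean_threshold:
        return "tesseract"   # assumed: clean scans go to the fast local engine
    if score >= poor_threshold:
        return "paddleocr"   # assumed: moderate quality goes to PaddleOCR
    return "cloud"           # assumed: degraded input falls back to the cloud API


if __name__ == "__main__":
    print(choose_engine(ImageQuality(sharpness=0.9, contrast=0.8)))
    print(choose_engine(ImageQuality(sharpness=0.1, contrast=0.9)))
```

Taking the minimum of the metrics means a single bad signal (for example heavy blur) is enough to escalate a document to a more robust engine, which is one simple way to keep the cheap engine for inputs that are good on every axis.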

https://github.com/PaddlePaddle/PaddleOCR
Grade: B (Above Average)
Adoption: A · Quality: B+ · Freshness: A · Citations: C+ · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: multi-engine-routing, bounding-box-output, reading-order-sort, confidence-scoring
Integrations: paddleocr, tesseract, opencv, google-cloud-vision
Use Cases: invoice-processing, id-document-extraction, historical-document-digitization
API Available: No
Language: python
Dependencies: paddlepaddle, paddleocr, pytesseract, opencv-python, pillow
Environment: Python 3.9+
Est. Runtime: 1-10 minutes
Tags: ocr, text-extraction, document-ai, tesseract, paddleocr
Added: 2026-03-17
Completeness: 100%
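
To make the reading-order-sort, bounding-box-output, and confidence-scoring capabilities concrete, here is a minimal sketch of what the structured output stage might look like. The `TextBlock` type, the `(x, y, width, height)` box layout, the `line_tolerance` parameter, and the JSON field names are hypothetical; the script's real schema is not documented here. The sort assumes a simple single-column page.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TextBlock:
    """Hypothetical output record for one recognized text block."""
    text: str
    bbox: tuple        # assumed layout: (x, y, width, height), origin top-left
    confidence: float  # assumed range: 0..1


def sort_reading_order(blocks, line_tolerance=10):
    """Order blocks top-to-bottom, then left-to-right within a line.

    A block joins the current line when its top edge is within
    line_tolerance pixels of the top edge of that line's first block.
    """
    lines = []
    for block in sorted(blocks, key=lambda b: b.bbox[1]):
        if lines and abs(block.bbox[1] - lines[-1][0].bbox[1]) <= line_tolerance:
            lines[-1].append(block)
        else:
            lines.append([block])
    return [b for line in lines for b in sorted(line, key=lambda b: b.bbox[0])]


def to_json(blocks):
    """Serialize blocks in reading order as a JSON document."""
    ordered = sort_reading_order(blocks)
    return json.dumps({"blocks": [asdict(b) for b in ordered]}, indent=2)


if __name__ == "__main__":
    sample = [
        TextBlock("world", (120, 12, 50, 20), 0.90),
        TextBlock("Total", (10, 60, 50, 20), 0.80),
        TextBlock("Hello", (10, 10, 50, 20), 0.95),
    ]
    print(to_json(sample))
```

Multi-column layouts or rotated text would need a more careful grouping step than this line-tolerance heuristic.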

Index Score

62.1
Adoption: 80
Quality: 78
Freshness: 82
Citations: 58
Engagement: 0
