Unstructured
by Unstructured.io · freemium · Last verified 2026-03-17
Open-source library and platform for preprocessing unstructured data for LLM applications. Extracts and transforms content from PDFs, images, HTML, and Office documents into structured, LLM-ready formats.
https://unstructured.io
Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: A · Citations: B · Engagement: F
Specifications
- License: Apache-2.0
- Pricing: freemium
- Capabilities: document-parsing, ocr, table-extraction, chunking, multi-format-support
- Integrations: langchain, llamaindex, pinecone, weaviate
- Use Cases: document-ingestion, rag-preprocessing, data-extraction, enterprise-search
- API Available: Yes
- SDK Languages: python
- Deployment: self-hosted, cloud-api, docker
- Rate Limits: Free tier: 1K pages/month; paid plans scale
- Data Privacy: SOC 2 Type II; self-hosted option for data control
- Tags: document-processing, etl, parsing, rag-ingestion
- Added: 2026-03-17
- Completeness: 100%
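As a rough illustration of the document-parsing and rag-preprocessing use cases listed above, the sketch below shows how parsed output might be flattened into plain records. It assumes the open-source `unstructured` Python package is installed and that its `partition` function returns elements exposing `text` and `category` attributes. The helper names `elements_to_records` and `parse_document` and the file name `report.pdf` are hypothetical and not part of the library.

```python
from typing import Any, Dict, Iterable, List


def elements_to_records(elements: Iterable[Any]) -> List[Dict[str, str]]:
    """Flatten parsed elements into plain dicts, skipping empty text.

    Hypothetical helper: it only assumes each element has `text` and
    `category` attributes.
    """
    records = []
    for el in elements:
        text = (getattr(el, "text", "") or "").strip()
        if not text:
            continue  # drop whitespace-only elements
        kind = getattr(el, "category", type(el).__name__)
        records.append({"type": kind, "text": text})
    return records


def parse_document(path: str) -> List[Dict[str, str]]:
    """Parse a file with Unstructured and return flat records.

    The import is lazy so the helper above stays usable without the
    library installed.
    """
    from unstructured.partition.auto import partition

    return elements_to_records(partition(filename=path))


if __name__ == "__main__":
    # "report.pdf" is a placeholder path, not a file shipped with anything.
    for record in parse_document("report.pdf"):
        print(record["type"], "|", record["text"][:80])
```

Records in this shape could then be chunked and passed to one of the listed integrations; that step is left out here because it depends on the chosen vector store.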
Index Score: 61.45
- Adoption: 72
- Quality: 82
- Freshness: 85
- Citations: 65
- Engagement: 0