Idefics 3
by Hugging Face · free · Last verified 2026-03-17
Idefics 3 is Hugging Face's third-generation open vision-language model, built on Llama 3.1 with a SigLIP vision encoder and an image-splitting strategy for handling high-resolution inputs. It is aimed at document understanding, OCR tasks, and visual question answering, and it keeps a fully open and reproducible training pipeline.
https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3
Grade: C (Below Average)
Adoption: C+ · Quality: A · Freshness: A · Citations: C+ · Engagement: F
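To make the image-splitting idea in the description concrete, here is a minimal sketch of cutting a high-resolution image into a grid of fixed-size tiles. The 364-pixel tile size and the function name `tile_grid` are assumptions chosen for illustration; the real processor may also resize the image first and add a downscaled global view, so this is not the library's implementation.

```python
import math


def tile_grid(width, height, tile=364):
    """Return (rows, cols, boxes) for splitting an image into tiles.

    `tile` is an assumed tile edge length in pixels. Each box is
    (left, top, right, bottom); tiles on the right and bottom edges
    are clipped to the image bounds.
    """
    if width <= 0 or height <= 0 or tile <= 0:
        raise ValueError("width, height, and tile must be positive")

    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)

    boxes = []
    for r in range(rows):
        for c in range(cols):
            left, top = c * tile, r * tile
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return rows, cols, boxes


if __name__ == "__main__":
    rows, cols, boxes = tile_grid(1000, 500)
    print(rows, cols, len(boxes))
```

Each box can be passed to an image library's crop call so that every tile is encoded separately, which is what lets a fixed-resolution vision encoder cover a large page or scan.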
Specifications
- License: Apache 2.0
- Pricing: free
- Capabilities: vision, visual-question-answering, document-understanding, ocr, image-captioning
- Integrations: Hugging Face, Transformers
- Use Cases: document-analysis, visual-qa, ocr, image-captioning, multimodal-research
- API Available: Yes
- Parameters: 8B
- Context Window: 128K
- Modalities: text, image
- Training Cutoff: 2024
- Tags: hugging-face, open-source, vision-language, document-understanding, ocr
- Added: 2026-03-17
- Completeness: 100%
Index Score: 49
- Adoption: 50
- Quality: 80
- Freshness: 86
- Citations: 52
- Engagement: 0