Modelmultimodalv3.0

Idefics 3

by Hugging Face · free · Last verified 2026-03-17

Idefics 3 is Hugging Face's third-generation open vision-language model, built on Llama 3 with a custom SigLIP vision encoder and a novel image splitting strategy called Anyres for handling high-resolution inputs. It excels at document understanding, OCR tasks, and visual question answering while maintaining a fully open and reproducible training pipeline.

https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3 ↗

C—Below Average

Adoption: C+Quality: AFreshness: ACitations: C+Engagement: F

Specifications

License: Apache 2.0
Pricing: free
Capabilities: vision, visual-question-answering, document-understanding, ocr, image-captioning
Integrations: Hugging Face, Transformers
Use Cases: document-analysis, visual-qa, ocr, image-captioning, multimodal-research
API Available: Yes
Parameters: 8B
Context Window: 128K
Modalities: text, image
Training Cutoff: 2024
Tags: hugging-face, open-source, vision-language, document-understanding, ocr
Added: 2026-03-17
Completeness: 100%

Index Score

Adoption

Quality

Freshness

Citations

Engagement

Need help choosing the right model?

Get Expert Guidance

Explore the full AI ecosystem on Agents as a Service