Skip to main content
Modelmultimodalv3.0

Idefics 3

by Hugging Face · free · Last verified 2026-03-17

Idefics 3 is Hugging Face's third-generation open vision-language model, built on Llama 3 with a custom SigLIP vision encoder and a novel image splitting strategy called Anyres for handling high-resolution inputs. It excels at document understanding, OCR tasks, and visual question answering while maintaining a fully open and reproducible training pipeline.

https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3
C
CBelow Average
Adoption: C+Quality: AFreshness: ACitations: C+Engagement: F

Specifications

License
Apache 2.0
Pricing
free
Capabilities
vision, visual-question-answering, document-understanding, ocr, image-captioning
Integrations
Hugging Face, Transformers
Use Cases
document-analysis, visual-qa, ocr, image-captioning, multimodal-research
API Available
Yes
Parameters
8B
Context Window
128K
Modalities
text, image
Training Cutoff
2024
Tags
hugging-face, open-source, vision-language, document-understanding, ocr
Added
2026-03-17
Completeness
100%

Index Score

49
Adoption
50
Quality
80
Freshness
86
Citations
52
Engagement
0

Explore the full AI ecosystem on Agents as a Service