Emu3
by BAAI (Beijing Academy of AI) · free · Last verified 2026-03-17
Emu3 is a unified multimodal model from BAAI that handles image understanding, image generation, and text generation within a single next-token prediction framework, representing every modality as discrete tokens. It is presented as evidence that one autoregressive model can stand in for separate diffusion and vision-language models across diverse generative tasks.
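As a rough illustration of what "discrete tokens for all modalities" can mean in practice, the sketch below merges text token ids and image codebook indices into one shared id space and then derives next-token prediction targets from the combined sequence. This is a toy under stated assumptions, not Emu3's actual tokenizer: the vocabulary sizes, the `BOI`/`EOI` boundary tokens, and the function names are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical sizes, chosen only for illustration (not Emu3's real values).
TEXT_VOCAB_SIZE = 1000
IMAGE_CODEBOOK_SIZE = 512

# Hypothetical special tokens marking where an image span begins and ends.
BOI = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE
EOI = BOI + 1
UNIFIED_VOCAB_SIZE = EOI + 1


def to_unified(text_ids: List[int], image_codes: List[int]) -> List[int]:
    """Merge text token ids and image codebook indices into one sequence."""
    if any(not 0 <= t < TEXT_VOCAB_SIZE for t in text_ids):
        raise ValueError("text id out of range")
    if any(not 0 <= c < IMAGE_CODEBOOK_SIZE for c in image_codes):
        raise ValueError("image code out of range")
    # Shift image codes past the text range so the two id spaces never collide.
    shifted = [TEXT_VOCAB_SIZE + c for c in image_codes]
    return text_ids + [BOI] + shifted + [EOI]


def modality(token_id: int) -> str:
    """Report which part of the unified vocabulary a token id falls in."""
    if 0 <= token_id < TEXT_VOCAB_SIZE:
        return "text"
    if TEXT_VOCAB_SIZE <= token_id < BOI:
        return "image"
    if token_id in (BOI, EOI):
        return "special"
    raise ValueError("token id outside the unified vocabulary")


def training_pairs(sequence: List[int]) -> List[Tuple[List[int], int]]:
    """Build (prefix, next token) pairs, the targets of next-token prediction."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


if __name__ == "__main__":
    seq = to_unified([5, 7], [0, 511])
    print(seq)
    print([modality(t) for t in seq])
    print(len(training_pairs(seq)))
```

Because text and image tokens share one id space in this sketch, a single model with one output layer of size `UNIFIED_VOCAB_SIZE` could in principle be trained on both, which is the general idea behind a unified autoregressive design.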
https://huggingface.co/BAAI/Emu3-Gen
Grade: D (Poor)
Adoption: D · Quality: B+ · Freshness: A · Citations: C · Engagement: F
Specifications
- License: Apache 2.0
- Pricing: free
- Capabilities: text-generation, vision, image-generation, visual-question-answering, image-captioning
- Integrations: Hugging Face
- Use Cases: multimodal-generation, image-understanding, image-generation, unified-ai-research
- API Available: No
- Parameters: 8B
- Context Window: 8K
- Modalities: text, image
- Training Cutoff: 2024
- Tags: baai, generalist, vision-language, image-generation, unified, open-source
- Added: 2026-03-17
- Completeness: 95%
Index Score: 38.6
- Adoption: 30
- Quality: 78
- Freshness: 84
- Citations: 44
- Engagement: 0