Model · multimodal · v3.0

Emu3

by BAAI (Beijing Academy of AI) · free · Last verified 2026-03-17

Emu3 is a unified multimodal model from BAAI (Beijing Academy of Artificial Intelligence). It handles image understanding, image generation, and text generation within a single next-token prediction framework: text and images are both represented as discrete tokens and modeled by one autoregressive network. The model is presented as evidence that a single autoregressive model can stand in for separate diffusion and vision-language models across diverse generative tasks.
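
To make the "discrete tokens for all modalities" idea concrete, here is a minimal toy sketch in Python. It is not Emu3's tokenizer or code: the vocabulary sizes, the `BOI`/`EOI` marker ids, and the helper names `to_unified`, `from_unified`, and `next_token_pairs` are all hypothetical, chosen only to show how text ids and image codes can share one sequence that a single next-token predictor would be trained on.

```python
# Toy illustration only: sizes and special-token ids are hypothetical, not Emu3's.
TEXT_VOCAB = 1000                 # hypothetical number of text tokens
IMAGE_CODES = 512                 # hypothetical visual codebook size
BOI = TEXT_VOCAB + IMAGE_CODES    # hypothetical begin-of-image marker
EOI = BOI + 1                     # hypothetical end-of-image marker


def to_unified(text_ids, image_codes):
    """Concatenate text ids and offset image codes into one token sequence."""
    assert all(0 <= t < TEXT_VOCAB for t in text_ids)
    assert all(0 <= c < IMAGE_CODES for c in image_codes)
    # Image codes are shifted past the text range so both share one id space.
    return list(text_ids) + [BOI] + [TEXT_VOCAB + c for c in image_codes] + [EOI]


def from_unified(tokens):
    """Split a unified sequence back into text ids and image codes."""
    text_ids, image_codes, in_image = [], [], False
    for tok in tokens:
        if tok == BOI:
            in_image = True
        elif tok == EOI:
            in_image = False
        elif in_image:
            image_codes.append(tok - TEXT_VOCAB)
        else:
            text_ids.append(tok)
    return text_ids, image_codes


def next_token_pairs(tokens):
    """Return (context, target) pairs, the training signal for next-token prediction."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


seq = to_unified([5, 42, 7], [3, 500, 0])
print(seq)
print(from_unified(seq))
```

Because every position in `seq` is just an integer id, the same loss (predict `target` given `context`) applies whether the next token belongs to a caption or to an image; that is the sense in which one autoregressive model covers both understanding and generation.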

https://huggingface.co/BAAI/Emu3-Gen
Grade: D (Poor)
Adoption: D · Quality: B+ · Freshness: A · Citations: C · Engagement: F

Specifications

License: Apache 2.0
Pricing: free
Capabilities: text-generation, vision, image-generation, visual-question-answering, image-captioning
Integrations: Hugging Face
Use Cases: multimodal-generation, image-understanding, image-generation, unified-ai-research
API Available: No
Parameters: 8B
Context Window: 8K
Modalities: text, image
Training Cutoff: 2024
Tags: baai, generalist, vision-language, image-generation, unified, open-source
Added: 2026-03-17
Completeness: 95%

Index Score

38.6
Adoption: 30
Quality: 78
Freshness: 84
Citations: 44
Engagement: 0
