Fuyu-8B
by Adept AI · open-source · Last verified 2026-03-17
A multimodal model from Adept AI with a simplified architecture: a decoder-only transformer that takes image patches as direct input, with no separate vision encoder. It is designed for digital-agent use cases such as UI understanding and screen parsing.
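To make the "no separate vision encoder" idea concrete, here is a minimal sketch in plain Python. It cuts an image into fixed-size patches, flattens them in raster order, and maps each patch to an embedding with a single linear projection. The patch size, the `ROW_END` marker name, and the weights are illustrative assumptions, not Fuyu's actual values or code.

```python
from typing import List, Union

# Hypothetical marker appended after each row of patches so that a flat
# sequence still records where one image row ends and the next begins.
ROW_END = "<row_end>"

Patch = List[float]


def patchify(image: List[List[float]], patch_size: int) -> List[Union[Patch, str]]:
    """Split a 2-D grayscale image into flattened square patches, raster order."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    sequence: List[Union[Patch, str]] = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch row by row into a single vector.
            patch = [
                image[y][x]
                for y in range(top, top + patch_size)
                for x in range(left, left + patch_size)
            ]
            sequence.append(patch)
        sequence.append(ROW_END)
    return sequence


def project(patch: Patch, weights: List[List[float]]) -> List[float]:
    """Linear projection of a flattened patch; each weight row yields one output dim."""
    return [sum(w * p for w, p in zip(row, patch)) for row in weights]


# A 4x6 toy image with patch_size=2 gives 2 rows of 3 patches plus 2 row markers.
image = [[float(y * 6 + x) for x in range(6)] for y in range(4)]
sequence = patchify(image, patch_size=2)

# Toy 2-dimensional "embedding" of the first patch (weights are made up).
toy_weights = [[1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0]]
first_embedding = project(sequence[0], toy_weights)
```

In a real model the projected patch vectors would sit alongside text token embeddings in the transformer's input sequence. This sketch covers only the patch-to-vector step.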
https://huggingface.co/adept/fuyu-8b
Overall grade: D (Poor)
Adoption: D · Quality: C+ · Freshness: C · Citations: D · Engagement: F
Specifications
- License: CC-BY-NC-4.0
- Pricing: open-source
- Capabilities: image-understanding, ui-understanding, screen-parsing, visual-qa, chart-reading
- Integrations: huggingface, transformers
- Use Cases: ui-automation, screen-understanding, digital-agent-vision, chart-analysis
- API Available: No
- Parameters: 8B
- Context Window: 16K tokens
- Modalities: text, image
- Training Cutoff: Mid 2023
- Tags: multimodal, vision, open-source, adept-ai, simplified-architecture
- Added: 2026-03-17
- Completeness: 82%
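Since the listing names Hugging Face `transformers` as the integration path and reports no hosted API, local inference might look like the sketch below. It assumes a recent `transformers` release that ships `FuyuProcessor` and `FuyuForCausalLM`, along with `torch`, `accelerate`, and Pillow, and a GPU with enough memory for an 8B model. The helper name `caption_image`, the prompt, and the defaults are hypothetical choices made for illustration.

```python
MODEL_ID = "adept/fuyu-8b"  # repository named in the listing URL


def caption_image(
    image_path: str,
    prompt: str = "Generate a coco-style caption.\n",
    max_new_tokens: int = 16,
    device: str = "cuda:0",
) -> str:
    """Run one image+text prompt through Fuyu-8B and return the generated text.

    Hypothetical helper: heavy imports are deferred to call time so that
    defining the function does not require the libraries or the weights.
    """
    from PIL import Image
    from transformers import FuyuForCausalLM, FuyuProcessor

    processor = FuyuProcessor.from_pretrained(MODEL_ID)
    model = FuyuForCausalLM.from_pretrained(MODEL_ID, device_map=device)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # generate() returns prompt + continuation; keep only the continuation.
    new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
```

A call such as `caption_image("screenshot.png", prompt="What does the blue button say?\n")` would download the weights on first use. The CC-BY-NC-4.0 license listed above restricts this to non-commercial use.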
Index Score: 33.5
- Adoption: 32
- Quality: 56
- Freshness: 40
- Citations: 38
- Engagement: 0