
Fuyu-8B

by Adept AI · open-source · Last verified 2026-03-17

Adept AI's multimodal model with a radically simplified architecture: a decoder-only transformer that linearly projects image patches directly into its first layer, with no separate vision encoder. It is designed for digital-agent use cases such as UI understanding and screen parsing.
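The "patches straight into the transformer" idea can be illustrated with a short, dependency-free sketch. It assumes raster (row-major) patch order, a plain linear projection per flattened patch, and a hypothetical `<image-newline>` marker appended after each row of patches. The function names, the marker string and the toy weights are illustrative assumptions, not Adept's code or the `transformers` API; real preprocessing also pads, normalizes and interleaves the result with text tokens. The published checkpoint is assumed to use 30 x 30 pixel patches, while the example below uses 2 x 2 patches to stay small.

```python
from typing import List, Sequence

# Hypothetical stand-in for the special token that marks the end of a patch row.
NEWLINE = "<image-newline>"

Image = List[List[List[float]]]  # H x W x C pixel values, already normalized


def patchify(image: Image, patch: int) -> List[List[List[float]]]:
    """Split an H x W x C image into rows of flattened patch x patch x C vectors.

    Assumes H and W are exact multiples of `patch`; a real preprocessor would pad.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be multiples of the patch size")
    rows = []
    for top in range(0, height, patch):
        row = []
        for left in range(0, width, patch):
            # Flatten pixels top-to-bottom, left-to-right, channels innermost.
            flat = [
                value
                for y in range(top, top + patch)
                for x in range(left, left + patch)
                for value in image[y][x]
            ]
            row.append(flat)
        rows.append(row)
    return rows


def linear(vector: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Matrix-vector product; `weights` has shape d_model x len(vector)."""
    return [sum(w * v for w, v in zip(w_row, vector)) for w_row in weights]


def image_to_sequence(image: Image, patch: int, weights: Sequence[Sequence[float]]) -> list:
    """Project each patch directly to d_model (no vision encoder) and mark row ends."""
    sequence: list = []
    for row in patchify(image, patch):
        sequence.extend(linear(p, weights) for p in row)
        sequence.append(NEWLINE)
    return sequence


if __name__ == "__main__":
    # Toy 4 x 6 RGB image whose pixel value encodes its position.
    img = [[[float(y * 6 + x)] * 3 for x in range(6)] for y in range(4)]
    toy_weights = [[1.0] * 12, [0.0] * 12, [0.5] * 12, [-1.0] * 12]  # d_model = 4
    seq = image_to_sequence(img, 2, toy_weights)
    print(len(seq), seq[0], seq[3])
```

With 2 x 2 patches the toy image yields two rows of three patches, so the sequence holds six projected vectors plus two newline markers. The point of the design is that the only image-specific component is the linear projection; everything after it is the same transformer that processes text.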

https://huggingface.co/adept/fuyu-8b
Overall grade: D (Poor)
Adoption: D · Quality: C+ · Freshness: C · Citations: D · Engagement: F

Specifications

License: CC-BY-NC-4.0
Pricing: open-source
Capabilities: image-understanding, ui-understanding, screen-parsing, visual-qa, chart-reading
Integrations: huggingface, transformers
Use Cases: ui-automation, screen-understanding, digital-agent-vision, chart-analysis
API Available: No
Parameters: 8B
Context Window: 16K tokens
Modalities: text, image
Training Cutoff: mid-2023
Tags: multimodal, vision, open-source, adept-ai, simplified-architecture
Added: 2026-03-17
Completeness: 82%
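Because image patches share the 16K-token context window with text, image resolution directly limits the room left for the prompt and the answer. The rough budget below is a sketch under stated assumptions: 30 x 30 pixel patches, images padded up to a whole number of patches, one newline token per patch row, and a window of exactly 16,384 tokens. None of these figures is confirmed by this listing, and the function names are hypothetical.

```python
import math

# Assumptions for illustration only (not taken from this listing).
PATCH = 30          # assumed patch edge length in pixels
CONTEXT = 16_384    # assumed reading of "16K tokens"


def image_token_count(height: int, width: int, patch: int = PATCH) -> int:
    """Patch tokens plus one row-end newline token per patch row.

    Partial patches at the edges are counted as full patches (padding assumed).
    """
    rows = math.ceil(height / patch)
    cols = math.ceil(width / patch)
    return rows * cols + rows


def remaining_text_budget(height: int, width: int,
                          context: int = CONTEXT, patch: int = PATCH) -> int:
    """Tokens left for prompt text and generated output after one image."""
    return context - image_token_count(height, width, patch)


if __name__ == "__main__":
    # A full-HD screenshot, a typical input for screen-parsing use cases.
    print(image_token_count(1080, 1920), remaining_text_budget(1080, 1920))
```

Under these assumptions a 1080 x 1920 screenshot becomes 36 rows of 64 patches, i.e. 2,304 patch tokens plus 36 newline tokens (2,340 total), leaving 14,044 tokens for text and generated output.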

Index Score: 33.5
Adoption: 32
Quality: 56
Freshness: 40
Citations: 38
Engagement: 0
