Model · LLMs · v1.0

Fuyu-8B

by Adept AI · open-source · Last verified 2026-03-17

Fuyu-8B is Adept AI's multimodal model. Its simplified, decoder-only architecture has no separate vision encoder: image patches are linearly projected and fed directly into the transformer. It is designed for digital agent use cases such as UI understanding and screen parsing.
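The patch-based input described above can be illustrated with a small, dependency-free sketch. It is not Adept's implementation: the function names, the `NEWLINE` marker string, the tiny patch size in the usage example, and the rule of one marker per row of patches are assumptions chosen for illustration. Only the general idea (raster-order patches, a linear projection, no vision encoder) comes from the description.

```python
# Hypothetical marker standing in for an "end of patch row" token.
NEWLINE = "<image-newline>"


def image_to_patch_sequence(image, patch):
    """Split an H x W grid of pixel values into flattened patches, raster order.

    A NEWLINE marker is appended after each row of patches so that a flat
    sequence still encodes where image rows end.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be multiples of the patch size")
    sequence = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = [
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ]
            sequence.append(flat)
        sequence.append(NEWLINE)
    return sequence


def project(flat_patch, weights):
    """Linear projection of one flattened patch: a dot product per output dim."""
    return [sum(w * x for w, x in zip(row, flat_patch)) for row in weights]


def image_token_count(height, width, patch):
    """Sequence positions an image would occupy under this sketch's scheme."""
    rows = -(-height // patch)  # ceiling division
    cols = -(-width // patch)
    return rows * (cols + 1)    # +1 per row for the NEWLINE marker


if __name__ == "__main__":
    image = [[r * 6 + c for c in range(6)] for r in range(4)]  # 4 x 6 toy image
    seq = image_to_patch_sequence(image, patch=2)
    print(len(seq), seq[0])
    # Assumed 30-pixel patches on a 1080 x 1920 screenshot.
    print(image_token_count(1080, 1920, 30))
```

Under these assumptions, a full-screen screenshot consumes a few thousand positions of the context window, which is worth budgeting for in screen-parsing workloads.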

https://huggingface.co/adept/fuyu-8b

Overall grade: D (Poor)

Adoption: D · Quality: C+ · Freshness: C · Citations: D · Engagement: F

Specifications

License: CC-BY-NC-4.0
Pricing: open-source
Capabilities: image-understanding, ui-understanding, screen-parsing, visual-qa, chart-reading
Integrations: huggingface, transformers
Use Cases: ui-automation, screen-understanding, digital-agent-vision, chart-analysis
API Available: No
Parameters: 8B
Context Window: 16K tokens
Modalities: text, image
Training Cutoff: Mid 2023
Tags: multimodal, vision, open-source, adept-ai, simplified-architecture
Added: 2026-03-17
Completeness: 82%
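Because the listing gives Hugging Face Transformers as the integration path and no hosted API, a local inference call might look like the sketch below. It follows the `FuyuProcessor` / `FuyuForCausalLM` classes from the transformers library. The helper name `describe_image`, the image path, the prompt, the device string, and the token limit are placeholders rather than recommended values. Running it requires `transformers`, `torch`, `accelerate`, `Pillow`, and enough GPU memory for an 8B model. The CC-BY-NC-4.0 license restricts this to non-commercial use.

```python
MODEL_ID = "adept/fuyu-8b"


def describe_image(image_path, prompt, device="cuda:0", max_new_tokens=32):
    """Run one image + text prompt through Fuyu-8B and return the generated text.

    Heavy imports are kept inside the function so the module can be imported
    on machines without the model dependencies installed.
    """
    from PIL import Image
    from transformers import FuyuForCausalLM, FuyuProcessor

    processor = FuyuProcessor.from_pretrained(MODEL_ID)
    model = FuyuForCausalLM.from_pretrained(MODEL_ID, device_map=device)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the prompt and image positions.
    new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]


if __name__ == "__main__":
    # "screenshot.png" is a placeholder path for a UI screenshot.
    print(describe_image("screenshot.png", "What is the title of this page?\n"))
```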

Index Score

33.5
