Skill · Computer Vision · v1.0

Visual Grounding

by AaaS · open-source · Last verified 2026-03-17

Trains agents to localize the specific image regions described by natural-language referring expressions, connecting language to spatial visual understanding. Covers grounding models (Grounding DINO, Grounded SAM), evaluation metrics (recall at k, R@k, and mean average precision, mAP), and integration into tool-use agents for UI automation and document analysis.
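As a rough illustration of the R@k metric mentioned above, the sketch below scores a referring expression as a hit when any of the top-k predicted boxes overlaps the ground-truth box with an intersection-over-union (IoU) at or above a threshold. The function names, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are assumptions made for this example (0.5 is a common convention, but this page does not specify one), and the code is not tied to any particular grounding model.

```python
from typing import Sequence, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    predictions: Sequence[Sequence[Box]],
    targets: Sequence[Box],
    k: int = 1,
    iou_threshold: float = 0.5,  # assumed threshold, not taken from this page
) -> float:
    """Fraction of expressions whose top-k boxes contain at least one hit.

    predictions[i] holds the ranked candidate boxes (best first) for
    expression i; targets[i] is its single ground-truth box.
    """
    if not targets:
        return 0.0
    hits = sum(
        any(iou(p, t) >= iou_threshold for p in preds[:k])
        for preds, t in zip(predictions, targets)
    )
    return hits / len(targets)


# Two hypothetical expressions: the first is matched by its top-ranked box,
# the second only by its second-ranked box.
targets = [(10, 10, 50, 50), (0, 0, 20, 20)]
predictions = [
    [(12, 12, 48, 48), (200, 200, 240, 240)],
    [(100, 100, 120, 120), (1, 1, 21, 21)],
]
r_at_1 = recall_at_k(predictions, targets, k=1)
r_at_2 = recall_at_k(predictions, targets, k=2)
```

With this toy data, `r_at_1` counts only the first expression as a hit, while `r_at_2` counts both, which is the usual reason R@k is reported at several values of k.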

https://aaas.blog/skill/visual-grounding
Overall grade: D (Poor)
Adoption: C+ · Quality: A · Freshness: A · Citations: F · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: referring-expression-comprehension, region-proposal, open-vocabulary-detection, phrase-grounding, spatial-reasoning
Integrations: grounding-dino, grounded-sam, huggingface, roboflow
Use Cases: ui-automation, robot-manipulation, document-region-extraction, visual-search
API Available: No
Difficulty: advanced
Prerequisites: object-detection, visual-question-answering
Supported Agents: computer-use
Tags: grounding, referring-expression, region, vision-language
Added: 2026-03-17
Completeness: 80%

Index Score

Overall: 37
Adoption: 52
Quality: 82
Freshness: 88
Citations: 0
Engagement: 0
