Skill · Computer Vision · v1.0

Visual Grounding

by AaaS · open-source · Last verified 2026-03-17

Trains agents to localize image regions described by natural-language referring expressions, linking language to spatial visual understanding. Covers grounding models (Grounding DINO, Grounded SAM), evaluation metrics (Recall@k, mAP), and integration into tool-use agents for UI automation and document analysis.
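The Recall@k (R@k) metric mentioned above is commonly computed for referring-expression comprehension as the fraction of expressions for which at least one of the model's top-k predicted boxes overlaps the ground-truth box with an IoU at or above a threshold (0.5 is a conventional choice). The sketch below is a minimal illustration under stated assumptions: boxes are `(x_min, y_min, x_max, y_max)` tuples, each expression has exactly one target box, and predictions arrive already sorted by descending confidence. The names `iou` and `recall_at_k` are hypothetical helpers, not part of Grounding DINO, Grounded SAM, or any other listed integration; mAP is not covered here.

```python
from typing import Sequence, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    ranked_preds: Sequence[Sequence[Box]],
    gt_boxes: Sequence[Box],
    k: int = 1,
    iou_thresh: float = 0.5,
) -> float:
    """Fraction of expressions whose target box is matched (IoU >= iou_thresh)
    by at least one of the top-k predictions.

    ranked_preds[i]: predictions for expression i, highest confidence first.
    gt_boxes[i]: the single ground-truth box for expression i.
    """
    if len(ranked_preds) != len(gt_boxes):
        raise ValueError("need one prediction list per ground-truth box")
    if not gt_boxes:
        return 0.0
    hits = sum(
        any(iou(p, gt) >= iou_thresh for p in preds[:k])
        for preds, gt in zip(ranked_preds, gt_boxes)
    )
    return hits / len(gt_boxes)


if __name__ == "__main__":
    preds = [
        [(10, 10, 50, 50), (0, 0, 5, 5)],          # top-1 overlaps the target
        [(100, 100, 120, 120), (60, 60, 90, 90)],  # only the 2nd box overlaps
    ]
    gts = [(12, 12, 50, 50), (60, 60, 90, 90)]
    print(recall_at_k(preds, gts, k=1))  # 0.5
    print(recall_at_k(preds, gts, k=2))  # 1.0
```

Raising k can only keep or increase the score, since each expression gets more chances to be matched; R@1 is therefore the strictest setting and the one most relevant to agents that act on a single predicted region, such as clicking a UI element.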

https://aaas.blog/skill/visual-grounding
Grade: C (Below Average)
Adoption: C+ · Quality: A · Freshness: A · Citations: C · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: referring-expression-comprehension, region-proposal, open-vocabulary-detection, phrase-grounding, spatial-reasoning
Integrations: grounding-dino, grounded-sam, huggingface, roboflow
Use Cases: ui-automation, robot-manipulation, document-region-extraction, visual-search
API Available: No
Difficulty: advanced
Prerequisites: object-detection, visual-question-answering
Supported Agents: computer-use
Tags: grounding, referring-expression, region, vision-language
Added: 2026-03-17
Completeness: 100%

Index Score: 49.2

Adoption: 52
Quality: 82
Freshness: 88
Citations: 48
Engagement: 0
