Visual Grounding
by AaaS · open-source · Last verified 2026-03-17
Trains agents to localize specific image regions described by natural language referring expressions, bridging the gap between language and spatial visual understanding. Covers grounding models (Grounding DINO, Grounded SAM), evaluation metrics (R@k, mAP), and integration into tool-use agents for UI automation and document analysis.
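As an illustration of the R@k metric mentioned above, here is a minimal sketch. It assumes boxes are given as (x1, y1, x2, y2) pixel coordinates, that each referring expression has one ground-truth box and a ranked list of predicted boxes, and that a prediction counts as a hit when its IoU with the ground truth is at least 0.5 (a common convention, not something this listing specifies). The names `iou` and `recall_at_k` are hypothetical helpers and are not part of Grounding DINO, Grounded SAM, or this skill.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    ranked_preds: Sequence[Sequence[Box]],
    targets: Sequence[Box],
    k: int = 1,
    iou_threshold: float = 0.5,
) -> float:
    """Fraction of expressions whose target box is matched by any of the top-k predictions."""
    if not targets:
        return 0.0
    hits = sum(
        any(iou(pred, target) >= iou_threshold for pred in preds[:k])
        for preds, target in zip(ranked_preds, targets)
    )
    return hits / len(targets)


# Two referring expressions: the first is matched at rank 1, the second only at rank 2.
targets = [(0, 0, 10, 10), (20, 20, 40, 40)]
ranked_preds = [
    [(0, 0, 10, 10)],
    [(100, 100, 110, 110), (22, 22, 40, 40)],
]
r_at_1 = recall_at_k(ranked_preds, targets, k=1)
r_at_2 = recall_at_k(ranked_preds, targets, k=2)
```

In this toy example R@1 is lower than R@2 because the second expression's correct region appears only as the second-ranked prediction.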
https://aaas.blog/skill/visual-grounding
Grade: C (Below Average)
Adoption: C+ · Quality: A · Freshness: A · Citations: C · Engagement: F
Specifications
- License: MIT
- Pricing: open-source
- Capabilities: referring-expression-comprehension, region-proposal, open-vocabulary-detection, phrase-grounding, spatial-reasoning
- Integrations: grounding-dino, grounded-sam, huggingface, roboflow
- Use Cases: ui-automation, robot-manipulation, document-region-extraction, visual-search
- API Available: No
- Difficulty: advanced
- Prerequisites: object-detection, visual-question-answering
- Supported Agents: computer-use
- Tags: grounding, referring-expression, region, vision-language
- Added: 2026-03-17
- Completeness: 100%
Index Score: 49.2
- Adoption: 52
- Quality: 82
- Freshness: 88
- Citations: 48
- Engagement: 0