Visual Grounding
by AaaS · open-source · Last verified 2026-03-17
Trains agents to localize specific image regions described by natural language referring expressions, bridging the gap between language and spatial visual understanding. Covers grounding models (Grounding DINO, Grounded SAM), evaluation metrics (R@k, mAP), and integration into tool-use agents for UI automation and document analysis.
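As a rough illustration of the R@k metric named above, the sketch below scores ranked box predictions against one ground-truth box per referring expression. It assumes boxes in `(x1, y1, x2, y2)` pixel format and an IoU threshold of 0.5. The helper names `iou` and `recall_at_k` are hypothetical and do not come from Grounding DINO, Grounded SAM, or any other library listed here.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    predictions: Sequence[Sequence[Box]],
    targets: Sequence[Box],
    k: int = 1,
    iou_threshold: float = 0.5,
) -> float:
    """Fraction of expressions whose top-k boxes contain a hit.

    predictions[i] holds the candidate boxes for expression i, sorted by
    descending model confidence; targets[i] is its ground-truth box.
    """
    if not targets:
        return 0.0
    hits = sum(
        any(iou(p, gt) >= iou_threshold for p in preds[:k])
        for preds, gt in zip(predictions, targets)
    )
    return hits / len(targets)


if __name__ == "__main__":
    # Two toy expressions: the second is only matched by its rank-2 box.
    preds = [
        [(0, 0, 10, 10)],
        [(50, 50, 60, 60), (0, 0, 10, 10)],
    ]
    gts = [(0, 0, 10, 10), (0, 0, 10, 10)]
    print(recall_at_k(preds, gts, k=1))
    print(recall_at_k(preds, gts, k=2))
```

Raising `k` can only keep recall the same or increase it, since a hit within the top-k boxes is still a hit within the top-(k+1).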
https://aaas.blog/skill/visual-grounding
Grade: D (Poor)
Adoption: C+ · Quality: A · Freshness: A · Citations: F · Engagement: F
Specifications
- License: MIT
- Pricing: open-source
- Capabilities: referring-expression-comprehension, region-proposal, open-vocabulary-detection, phrase-grounding, spatial-reasoning
- Integrations: grounding-dino, grounded-sam, huggingface, roboflow
- Use Cases: ui-automation, robot-manipulation, document-region-extraction, visual-search
- API Available: No
- Difficulty: advanced
- Prerequisites: object-detection, visual-question-answering
- Supported Agents: computer-use
- Tags: grounding, referring-expression, region, vision-language
- Added: 2026-03-17
- Completeness: 80%
Index Score: 37
- Adoption: 52
- Quality: 82
- Freshness: 88
- Citations: 0
- Engagement: 0