WebVoyager
by Zhejiang University / Tencent · open-source · Last verified 2026-03-17
Multimodal web agent that combines vision and language understanding to navigate and interact with real-world websites. Uses screenshot-based observation and structured action prediction to complete complex web tasks without relying on DOM access.
https://github.com/MinorJerry/WebVoyager
Overall grade: C+ (Average)
Adoption: C · Quality: B+ · Freshness: B+ · Citations: B+ · Engagement: F
Specifications
- License: Apache-2.0
- Pricing: open-source
- Capabilities: screenshot-understanding, web-navigation, action-prediction, multi-step-reasoning, cross-site-workflows
- Integrations: selenium, openai, anthropic
- Use Cases: web-task-automation, research-browsing, information-retrieval, benchmark-evaluation, accessibility-testing
- API Available: No
- Autonomy Level: fully-autonomous
- Tools Used: screenshot-capturer, vision-encoder, action-predictor, browser-driver
- Skills: visual-web-understanding, action-grounding, cross-site-navigation
- Trust Score: 66
- Tags: browser-agent, research, multimodal, web-navigation, open-source
- Added: 2026-03-17
- Completeness: 85%
Index Score: 52.4
- Adoption: 48
- Quality: 76
- Freshness: 78
- Citations: 72
- Engagement: 0
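Illustrative sketch
The description above mentions screenshot-based observation followed by structured action prediction. The Python sketch below shows one way such an observe, predict, act loop could be organized. It is not taken from the WebVoyager repository: the action grammar (`click [3]`, `type [5] text`, `scroll down`, `answer ...`), the `Action` class, and the `parse_action` and `run_episode` helpers are all hypothetical names chosen for illustration, and the browser and model are passed in as plain callables so the loop runs without any external dependency.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """A parsed action. The fields are assumptions made for this sketch."""
    kind: str                      # "click", "type", "scroll", or "answer"
    target: Optional[int] = None   # hypothetical numeric label of an on-screen element
    text: Optional[str] = None     # typed text, scroll direction, or final answer


# Hypothetical grammar: <verb> [optional numeric label] optional free text
ACTION_RE = re.compile(
    r"^(click|type|scroll|answer)(?:\s+\[(\d+)\])?(?:\s+(.*))?$",
    re.IGNORECASE,
)


def parse_action(raw: str) -> Action:
    """Turn a model's text output into a structured Action, or raise ValueError."""
    match = ACTION_RE.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized action: {raw!r}")
    kind, target, text = match.groups()
    return Action(kind.lower(), int(target) if target else None, text or None)


def run_episode(
    task: str,
    observe: Callable[[], bytes],                    # returns a screenshot (e.g. PNG bytes)
    predict: Callable[[str, bytes, List[str]], str],  # model call: task, screenshot, history
    act: Callable[[Action], None],                   # applies the action in the browser
    max_steps: int = 10,
) -> Optional[str]:
    """Loop until the model emits an 'answer' action or the step budget runs out."""
    history: List[str] = []
    for _ in range(max_steps):
        screenshot = observe()                  # observation comes from pixels, not the DOM
        raw = predict(task, screenshot, history)
        action = parse_action(raw)
        history.append(raw)
        if action.kind == "answer":
            return action.text                  # task finished
        act(action)
    return None                                 # no answer within max_steps
```

In a real setup, `observe` and `act` would wrap a browser driver such as Selenium and `predict` would call a multimodal model; here they stay abstract so that the control flow is the only thing being shown.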