
AgentBoard

by Ma et al. / Shanghai AI Lab · open-source · Last verified 2026-03-17

AgentBoard is a comprehensive evaluation framework for LLM-based agents covering nine diverse task environments spanning embodied AI, games, web browsing, and tool use. It measures both final task success and partial progress via a fine-grained sub-goal metric.

https://hkust-nlp.github.io/agentboard/
Overall Grade: B (Above Average)
Adoption: B · Quality: A · Freshness: A · Citations: B+ · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: evaluation, agent-evaluation, multi-task-agent
Integrations:
Use Cases: model-evaluation, ai-agents, autonomous-agents
API Available: No
Evaluated Models: gpt-4o, claude-opus-4, gemini-2-5-pro, llama-3-70b
Metrics: success-rate, progress-rate
Methodology: Nine task environments, each annotated with multiple sub-goals. Success rate is the fraction of tasks fully resolved; progress rate is the average fraction of sub-goals completed, giving partial credit that discriminates between agent capability levels (see the sketch below the specifications).
Last Run: 2026-02-22
Tags: agents, multi-task, web, games, tool-use, evaluation
Added: 2026-03-17
Completeness: 100%
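
To make the two metrics concrete, below is a minimal sketch of how a success rate and a progress rate could be computed from per-task sub-goal outcomes, assuming each task run is reduced to a list of booleans. The TaskResult structure and its field names are illustrative assumptions, not AgentBoard's actual API or data format.

```python
# Sketch of AgentBoard-style metrics over per-task sub-goal outcomes.
# TaskResult and its fields are illustrative, not AgentBoard's real schema.
from dataclasses import dataclass
from typing import List


@dataclass
class TaskResult:
    # One True/False entry per annotated sub-goal of the task, in order.
    subgoals_completed: List[bool]

    @property
    def success(self) -> bool:
        # A task counts as a success only if every sub-goal is completed.
        return all(self.subgoals_completed)

    @property
    def progress(self) -> float:
        # Fraction of sub-goals completed (partial credit for this task).
        if not self.subgoals_completed:
            return 0.0
        return sum(self.subgoals_completed) / len(self.subgoals_completed)


def success_rate(results: List[TaskResult]) -> float:
    # Fraction of tasks fully resolved.
    return sum(r.success for r in results) / len(results)


def progress_rate(results: List[TaskResult]) -> float:
    # Average per-task fraction of sub-goals completed.
    return sum(r.progress for r in results) / len(results)


if __name__ == "__main__":
    runs = [
        TaskResult([True, True, True]),    # fully solved
        TaskResult([True, False, False]),  # partial progress
        TaskResult([False, False]),        # no progress
    ]
    print(f"success rate:  {success_rate(runs):.2f}")   # 0.33
    print(f"progress rate: {progress_rate(runs):.2f}")  # 0.44
```

The toy run illustrates why the two numbers differ: only one of three tasks is fully solved (success rate 0.33), but the partially solved task still contributes its completed sub-goals to the progress rate (0.44).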

Index Score: 61.1

Adoption: 65
Quality: 88
Freshness: 82
Citations: 70
Engagement: 0
