Benchmark · AI Agents · v1.0

ToolBench

by Qin et al. / Tsinghua University · open-source · Last verified 2026-03-17

ToolBench evaluates LLMs on their ability to use real-world REST APIs to complete user instructions. It provides 16,000+ real APIs from RapidAPI Hub across 49 categories and 12,000+ instruction–API solution pairs, measuring whether models can plan and execute multi-step API call sequences.

https://github.com/OpenBMB/ToolBench
Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: B+ · Engagement: F

Specifications

License: Apache-2.0
Pricing: open-source
Capabilities: evaluation, tool-use, api-integration, agent-planning
Integrations: rapidapi
Use Cases: model-evaluation, ai-agents, tool-augmented-llm
API Available: No
Evaluated Models: gpt-4o, claude-opus-4, toolllama, llama-3-70b
Metrics: pass-rate, win-rate, solvable-pass-rate
Methodology: Instructions require single-tool or multi-tool API call sequences. Models interact with live or cached APIs; solutions are evaluated by ChatGPT preference scoring (win rate) and functional correctness (pass rate). Solvable pass rate is the pass rate computed only over instructions that have a valid API solution.
Last Run: 2026-02-14
Tags: tool-use, api, agents, rest, planning
Added: 2026-03-17
Completeness: 100%
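
The three metrics named under Methodology can be illustrated with a small sketch. The `Result` record, its field names, and the sample data below are hypothetical and are not ToolBench's own schema or evaluator. The sketch also assumes that the win rate counts how often a judge prefers the model's answer over a reference answer, which the listing does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One evaluated instruction (hypothetical layout, not ToolBench's schema)."""
    instruction_id: str
    solvable: bool   # a valid API solution exists for this instruction
    passed: bool     # the model's call sequence completed the instruction
    preferred: bool  # the judge preferred this answer over the reference (assumed)


def rate(flags):
    """Fraction of True values; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def summarize(results):
    """Compute pass rate, win rate, and solvable pass rate for a result list."""
    solvable = [r for r in results if r.solvable]
    return {
        "pass_rate": rate(r.passed for r in results),
        "win_rate": rate(r.preferred for r in results),
        # Same as pass rate, but unsolvable instructions are excluded first.
        "solvable_pass_rate": rate(r.passed for r in solvable),
    }


# Made-up sample data for illustration only.
results = [
    Result("q1", solvable=True, passed=True, preferred=True),
    Result("q2", solvable=True, passed=False, preferred=False),
    Result("q3", solvable=False, passed=False, preferred=True),
    Result("q4", solvable=True, passed=True, preferred=False),
]
scores = summarize(results)
print(scores)
```

In this sample the unsolvable instruction `q3` lowers the plain pass rate but is dropped before the solvable pass rate is computed, so the two numbers differ.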

Index Score

67
Adoption: 74
Quality: 88
Freshness: 79
Citations: 79
Engagement: 0
