
Chatbot Arena

by LMSYS · open-source · Last verified 2026-03-01

Crowdsourced platform where users chat with two anonymous models side-by-side and vote for the better response. Produces Elo ratings reflecting real-world human preferences across open-ended conversation, instruction following, and creative tasks.

https://chat.lmsys.org
Overall: B+ (Good)
Adoption: A+ · Quality: A+ · Freshness: A+ · Citations: A+ · Engagement: F

Specifications

License
Apache-2.0
Pricing
open-source
Capabilities
model-evaluation, human-preference-testing, elo-ranking
Integrations
Use Cases
model-ranking, human-preference-evaluation, chat-quality-assessment
API Available
No
Evaluated Models
claude-4, gpt-5, gemini-2.5-pro, deepseek-v3, llama-4-405b
Metrics
elo-rating, win-rate, confidence-interval
Methodology
Blind A/B testing with crowdsourced human judges. Users chat with two anonymous models and vote for the preferred response. Elo ratings computed from pairwise comparisons.
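The Elo update from blind pairwise votes can be sketched as follows. This is a minimal illustration of the standard Elo scheme, not Chatbot Arena's published implementation; the K-factor, base rating, and model names are assumptions chosen for the example.

```python
# Sketch: Elo ratings from crowdsourced pairwise votes.
# BASE_RATING and K are illustrative, not Chatbot Arena's actual parameters.
BASE_RATING = 1000
K = 32

def expected_score(r_a: float, r_b: float) -> float:
    """Elo-model probability that the model rated r_a beats the one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def record_vote(ratings: dict, winner: str, loser: str) -> None:
    """Update both models' ratings after one blind A/B vote."""
    r_w = ratings.setdefault(winner, BASE_RATING)
    r_l = ratings.setdefault(loser, BASE_RATING)
    e_w = expected_score(r_w, r_l)
    ratings[winner] = r_w + K * (1 - e_w)  # winner expected e_w, scored 1
    ratings[loser] = r_l + K * (e_w - 1)   # loser expected 1 - e_w, scored 0

# Hypothetical vote stream: each tuple is (winner, loser).
ratings = {}
for winner, loser in [("model-a", "model-b"),
                      ("model-a", "model-b"),
                      ("model-b", "model-a")]:
    record_vote(ratings, winner, loser)
```

Because each vote transfers rating points from loser to winner, the model that wins more head-to-head comparisons ends up ranked higher, which is what the leaderboard's elo-rating metric reflects.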
Last Run
2026-03-15
Tags
benchmark, evaluation, chat, elo, human-preference
Added
2026-03-17
Completeness
100%

Index Score

78.6
Adoption
94
Quality
90
Freshness
94
Citations
92
Engagement
0
