
MedQA

by Jin et al. / UC San Diego · open-source · Last verified 2026-03-17

MedQA tests medical knowledge using multiple-choice questions drawn from the US Medical Licensing Examination (USMLE). It evaluates whether language models can reason through complex clinical scenarios that require deep biomedical knowledge.

https://github.com/jind11/MedQA
Overall grade: B+ (Good)
Adoption: A · Quality: A+ · Freshness: B+ · Citations: A · Engagement: F

Specifications

License
Apache-2.0
Pricing
open-source
Capabilities
evaluation, benchmarking, medical-reasoning
Integrations
Use Cases
model-evaluation, medical-ai, clinical-nlp
API Available
No
Evaluated Models
gpt-4o, claude-opus-4, gemini-2-5-pro, meditron-70b, llama-3-70b
Metrics
accuracy, pass-rate
Methodology
Models are presented with four-option multiple-choice clinical vignettes from USMLE Steps 1-3. Accuracy is measured on the 1,273-question test split. No chain-of-thought is required by default.
Last Run
2026-02-15
Tags
medical, qa, clinical, multiple-choice, usmle
Added
2026-03-17
Completeness
100%
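The methodology above (four-option vignettes, accuracy on the 1,273-question test split) can be sketched as a simple evaluation loop. This is an illustrative sketch only: the item schema, the `format_question` helper, and the toy model below are assumptions, not MedQA's actual data format or the harness used for the listed runs.

```python
def format_question(item):
    """Render a four-option clinical vignette as a single prompt string."""
    options = "\n".join(
        f"{label}. {text}" for label, text in sorted(item["options"].items())
    )
    return f"{item['question']}\n{options}\nAnswer:"

def evaluate(model, dataset):
    """Return accuracy: the fraction of items where the model's letter
    choice matches the answer key (no chain-of-thought required)."""
    correct = 0
    for item in dataset:
        prediction = model(format_question(item))  # expected to return "A".."D"
        if prediction.strip().upper() == item["answer"]:
            correct += 1
    return correct / len(dataset)

# Toy split with hypothetical items; a real run would iterate the
# 1,273-question MedQA test set instead.
sample = [
    {"question": "A 45-year-old man presents with crushing substernal "
                 "chest pain. Most likely diagnosis?",
     "options": {"A": "Myocardial infarction", "B": "GERD",
                 "C": "Costochondritis", "D": "Panic attack"},
     "answer": "A"},
    {"question": "First-line pharmacotherapy for uncomplicated "
                 "essential hypertension?",
     "options": {"A": "Beta-blocker", "B": "Thiazide diuretic",
                 "C": "Loop diuretic", "D": "Alpha-blocker"},
     "answer": "B"},
]

always_a = lambda prompt: "A"  # stand-in "model" that always picks A
print(evaluate(always_a, sample))  # 0.5 on this toy split
```

Reported accuracy and pass-rate metrics reduce to this kind of exact-match count over the test split; only the model call changes per evaluated system.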

Index Score

72.8
Adoption
82
Quality
90
Freshness
74
Citations
88
Engagement
0
