MedQA
by Jin et al. / UC San Diego · open-source · Last verified 2026-03-17
MedQA tests medical knowledge with free-form multiple-choice questions drawn from the United States Medical Licensing Examination (USMLE). It evaluates whether language models can reason through complex clinical vignettes that demand deep biomedical knowledge.
https://github.com/jind11/MedQA
B+ (Good)
Adoption: A · Quality: A+ · Freshness: B+ · Citations: A · Engagement: F
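To make the setup concrete, here is a minimal sketch of loading a MedQA split and rendering one question as a zero-shot prompt. It assumes the JSONL layout shipped in the repo's data archive (a `question` string, an `options` letter-to-text mapping, and the gold letter in `answer_idx`) and the `data_clean/questions/US/test.jsonl` path; verify both against your own download.

```python
import json

def load_medqa(path):
    """Read one MedQA split from a JSONL file (one question per line).

    Assumes the schema distributed with the jind11/MedQA data download:
    each record carries "question", an "options" mapping of letter -> choice
    text, and the gold letter under "answer_idx". Check the field names
    against your copy of the data.
    """
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def format_prompt(record):
    """Render a four-option vignette as a plain zero-shot prompt."""
    choices = "\n".join(
        f"{letter}. {text}" for letter, text in sorted(record["options"].items())
    )
    return (
        f"{record['question']}\n\n"
        f"{choices}\n\n"
        "Answer with the letter of the single best option."
    )

# Path as laid out in the repo's data archive; adjust to where you unpacked it.
questions = load_medqa("data_clean/questions/US/test.jsonl")
print(format_prompt(questions[0]))
```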
Specifications
- License: Apache-2.0
- Pricing: open-source
- Capabilities: evaluation, benchmarking, medical-reasoning
- Integrations: (none)
- Use Cases: model-evaluation, medical-ai, clinical-nlp
- API Available: No
- Evaluated Models: gpt-4o, claude-opus-4, gemini-2-5-pro, meditron-70b, llama-3-70b
- Metrics: accuracy, pass-rate
- Methodology: Models are presented with four-option multiple-choice clinical vignettes spanning USMLE Steps 1-3; accuracy is measured on the 1,273-question test split, and no chain-of-thought is required by default (a minimal scoring sketch follows this list).
- Last Run: 2026-02-15
- Tags: medical, qa, clinical, multiple-choice, usmle
- Added: 2026-03-17
- Completeness: 100%
Index Score: 72.8
- Adoption: 82
- Quality: 90
- Freshness: 74
- Citations: 88
- Engagement: 0
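For orientation only: the directory does not publish how the five sub-scores combine into the 72.8 index, so the weights in this sketch are placeholders. An equal-weight mean comes out at 66.8, which suggests the real formula down-weights the zero Engagement score.

```python
subscores = {"adoption": 82, "quality": 90, "freshness": 74, "citations": 88, "engagement": 0}

# Placeholder weights: the actual weighting behind the 72.8 index is not
# published here, so this is illustrative arithmetic, not the real formula.
weights = {name: 0.2 for name in subscores}

index = sum(weights[name] * subscores[name] for name in subscores)
print(f"index = {index:.1f}")  # 66.8 under equal weights, vs. the published 72.8
```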