BenchmarkLLMs v1.0

PubMedQA

by Jin et al. / Carnegie Mellon University · open-source · Last verified 2026-03-17

PubMedQA is a biomedical question-answering dataset sourced from PubMed abstracts. Models must answer yes/no/maybe questions about biomedical research findings, testing the ability to reason over scientific literature.
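The task format can be sketched in Python. This is a minimal illustration, not the benchmark's official harness. The `PubMedQAItem` record, the prompt template, and the `parse_answer` heuristic are all hypothetical, and the field names do not correspond to the keys in the released JSON files. Each item pairs a question with abstract passages (the abstract's conclusion is held out, since it usually states the answer) and a gold label of yes, no, or maybe.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

LABELS = ("yes", "no", "maybe")


# Hypothetical record type: field names are illustrative, not official JSON keys.
@dataclass
class PubMedQAItem:
    question: str
    context: List[str]  # abstract passages, conclusion excluded
    label: str          # gold answer: "yes", "no", or "maybe"


def build_prompt(item: PubMedQAItem) -> str:
    """Format one item as a zero-shot prompt (the template is an assumption)."""
    context = "\n".join(item.context)
    return (
        f"Context:\n{context}\n\n"
        f"Question: {item.question}\n"
        "Answer with exactly one word: yes, no, or maybe.\n"
        "Answer:"
    )


def parse_answer(text: str) -> Optional[str]:
    """Return the first whole-word yes/no/maybe in the output, or None if absent."""
    match = re.search(r"\b(yes|no|maybe)\b", text.lower())
    return match.group(1) if match else None
```

For example, `parse_answer("Yes. The trial showed benefit.")` returns `"yes"`. Taking the first label word is a deliberately simple heuristic: an output such as "no evidence either way, so maybe" would be mapped to `"no"`, so constraining the model's output format may be preferable in practice.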

https://pubmedqa.github.io
Overall grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: A · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: evaluation, biomedical-reasoning, reading-comprehension
Integrations: none listed
Use Cases: model-evaluation, biomedical-nlp, research-qa
API Available: No
Evaluated Models: gpt-4o, claude-opus-4, gemini-2-5-pro, biogpt, llama-3-70b
Metrics: accuracy, f1-score
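Methodology: Each question is derived from a PubMed article title phrased as a question, and models answer yes/no/maybe using the article's abstract, with its conclusion removed, as context. The expert-labeled split (PQA-L) contains 1,000 annotated QA pairs; the artificially generated split (PQA-A) contains 211,269 pairs. An unlabeled split (PQA-U) is also released.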
Last Run: 2026-01-20
Tags: medical, biomedical, research, yes-no, pubmed
Added: 2026-03-17
Completeness: 100%
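The two listed metrics can be computed as follows. This is a minimal sketch that assumes macro-averaged F1 over the three labels and treats unparseable model outputs (`None`) as incorrect; the official evaluation script may differ in these choices.

```python
from typing import Optional, Sequence

LABELS = ("yes", "no", "maybe")


def _check(gold: Sequence[str], pred: Sequence[Optional[str]]) -> None:
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")


def accuracy(gold: Sequence[str], pred: Sequence[Optional[str]]) -> float:
    """Fraction of items whose predicted label equals the gold label."""
    _check(gold, pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Sequence[str], pred: Sequence[Optional[str]],
             labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of per-label F1; a None prediction counts as a miss."""
    _check(gold, pred)
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        # Precision, recall, and F1 are defined as 0 when a denominator is 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = ["yes", "no", "maybe", "yes"]
    pred = ["yes", "no", "yes", None]  # None: output could not be parsed
    print(accuracy(gold, pred))  # 0.5
    print(macro_f1(gold, pred))  # 0.5
```

In the example, per-label F1 is 0.5 for yes, 1.0 for no, and 0.0 for maybe, so the macro average is 0.5. Because macro-F1 weights the three labels equally, it penalizes a model that rarely predicts the minority "maybe" class more than accuracy does.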

Index Score: 68.4
Adoption: 76 · Quality: 85 · Freshness: 70 · Citations: 84 · Engagement: 0
