PubMedQA
by Jin et al. / Carnegie Mellon University · open-source · Last verified 2026-03-17
PubMedQA is a biomedical question-answering dataset sourced from PubMed abstracts. Models must answer yes/no/maybe questions about biomedical research findings, testing the ability to reason over scientific literature.
https://pubmedqa.github.io
Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: A · Engagement: F
Specifications
- License
- MIT
- Pricing
- open-source
- Capabilities
- evaluation, biomedical-reasoning, reading-comprehension
- Integrations
- Use Cases
- model-evaluation, biomedical-nlp, research-qa
- API Available
- No
- Evaluated Models
- gpt-4o, claude-opus-4, gemini-2-5-pro, biogpt, llama-3-70b
- Metrics
- accuracy, f1-score
- Methodology
- Questions are derived from PubMed abstracts. Models answer yes/no/maybe using the abstract as context. The labeled split contains 1,000 expert-annotated QA pairs; the artificially generated split contains 211,269 pairs.
- Last Run
- 2026-01-20
- Tags
- medical, biomedical, research, yes-no, pubmed
- Added
- 2026-03-17
- Completeness
- 100%
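The two listed metrics can be illustrated with a minimal sketch. The listing says only "f1-score" and does not state how it is averaged; the macro average over the three answer classes used here is an assumption. The function names and the toy gold/predicted labels are hypothetical and are not taken from the dataset or from any evaluation run.

```python
# Illustrative scoring for a yes/no/maybe task.
# Assumption: F1 is macro-averaged over the three labels; the listing
# does not specify the averaging scheme.
LABELS = ("yes", "no", "maybe")


def accuracy(gold, pred):
    """Fraction of predictions that exactly match the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-label F1 scores."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    per_label = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A label with no gold and no predicted instances scores 0.0 here.
        per_label.append(2 * tp / denom if denom else 0.0)
    return sum(per_label) / len(per_label)


if __name__ == "__main__":
    # Toy labels for illustration only (not real PubMedQA data).
    gold = ["yes", "no", "maybe", "yes"]
    pred = ["yes", "no", "yes", "no"]
    print("accuracy:", accuracy(gold, pred))
    print("macro F1:", round(macro_f1(gold, pred), 4))
```

Macro averaging weights the three classes equally, so a model that never predicts "maybe" is penalized even when its overall accuracy looks reasonable.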
Index Score
68.4
Adoption
76
Quality
85
Freshness
70
Citations
84
Engagement
0