Benchmark · LLMs · v1.0

PubMedQA

by Jin et al. / Carnegie Mellon University · open-source · Last verified 2026-03-17

PubMedQA is a biomedical question-answering dataset sourced from PubMed abstracts. Models must answer yes/no/maybe questions about biomedical research findings, testing the ability to reason over scientific literature.

https://pubmedqa.github.io

Index Score: C (Below Average)

Adoption: B+ · Quality: A · Freshness: B+ · Citations: F · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: evaluation, biomedical-reasoning, reading-comprehension
Integrations
Use Cases: model-evaluation, biomedical-nlp, research-qa
API Available: No
Evaluated Models: gpt-4o, claude-opus-4, gemini-2-5-pro, biogpt, llama-3-70b
Metrics: accuracy, f1-score
Methodology: Each question is a PubMed article title or is derived from one. Models answer yes/no/maybe using the corresponding abstract, with its conclusion removed, as context. The labeled split contains 1,000 expert-annotated QA pairs; the artificially generated split contains 211,269 pairs.
Last Run: 2026-01-20
Tags: medical, biomedical, research, yes-no, pubmed
Added: 2026-03-17
Completeness: 80%
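
Scoring sketch: the two metrics listed above can be computed from gold and predicted yes/no/maybe labels as shown below. This is a minimal illustration, not the benchmark's official evaluation script. The function names are hypothetical, and macro-averaging over the three labels is an assumption, since the listing only says "f1-score".

```python
# Minimal sketch of accuracy and macro-F1 for yes/no/maybe answers.
# Assumptions: labels are lowercase strings; F1 is macro-averaged
# over the three classes (the listing does not state the averaging).

LABELS = ("yes", "no", "maybe")


def accuracy(gold, pred):
    """Fraction of predictions that exactly match the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equal in length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-label F1 scores."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equal in length")
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A label with no gold and no predicted instances scores 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration only, not benchmark data.
    gold = ["yes", "no", "maybe", "yes"]
    pred = ["yes", "no", "yes", "yes"]
    print(f"accuracy={accuracy(gold, pred):.2f} macro_f1={macro_f1(gold, pred):.2f}")
```

Because "maybe" is typically the rarest answer, macro-F1 penalizes a model that never predicts it, which plain accuracy can hide.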
