
BioASQ

by Tsatsaronis et al. / BioASQ Challenge · open-source · Last verified 2026-03-17

BioASQ is a large-scale biomedical semantic question-answering benchmark that combines document retrieval, concept mapping, and answer extraction from PubMed literature. It challenges models across yes/no, factoid, list, and summary answer types, with gold-standard answers curated by biomedical experts.
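As a rough illustration of the four answer types, a BioASQ-style question record might look like the sketch below. The field names (`type`, `body`, `exact_answer`, `ideal_answer`) are illustrative conventions, not guaranteed to match the official BioASQ schema, and the example questions are invented.

```python
# Illustrative BioASQ-style question records, one per answer type.
# Field names and question text are assumptions for demonstration only.
questions = [
    {"type": "yesno",
     "body": "Is gene X expressed in liver tissue?",
     "exact_answer": "yes"},
    {"type": "factoid",
     "body": "Which gene is most commonly mutated in disease Y?",
     "exact_answer": ["GENE1"]},          # one or more accepted variants
    {"type": "list",
     "body": "Which drugs target protein Z?",
     "exact_answer": [["drug-a"], ["drug-b"]]},  # list of accepted items
    {"type": "summary",
     "body": "What is known about mechanism W?",
     "ideal_answer": "A free-text expert summary is expected here."},
]

for q in questions:
    print(q["type"], "->", q.get("exact_answer", "free-text summary"))
```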

http://bioasq.org
Overall Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: A · Citations: A · Engagement: F

Specifications

License
CC BY 2.5
Pricing
open-source
Capabilities
evaluation, biomedical-qa, information-retrieval, scientific-reasoning
Integrations
Use Cases
model-evaluation, biomedical-nlp, clinical-decision-support
API Available
No
Evaluated Models
gpt-4o, claude-opus-4, biogpt, llama-3-70b
Metrics
exact-match, mean-average-precision, f-measure
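Of the metrics listed, mean average precision (MAP) scores the retrieval phase: for each question, precision is taken at the rank of every relevant document retrieved, averaged, and then averaged again across questions. A minimal sketch (the document IDs are made up):

```python
def average_precision(retrieved, relevant):
    """AP for one question: mean of precision@k at each relevant hit,
    normalized by the total number of relevant documents."""
    relevant = set(relevant)
    hits, precisions = 0, []
    for k, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over (retrieved, relevant) pairs, one pair per question."""
    return sum(average_precision(r, g) for r, g in runs) / len(runs)

runs = [
    (["d1", "d2", "d3"], ["d1", "d3"]),  # AP = (1/1 + 2/3) / 2
    (["d4", "d5"], ["d5"]),              # AP = 1/2
]
print(round(mean_average_precision(runs), 4))  # → 0.6667
```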
Methodology
Annual challenge with Phase A (document/snippet retrieval) and Phase B (answer generation). Expert biomedical curators create the questions; gold snippets and answers are used for evaluation. Factoid exact match, list F-measure, and yes/no accuracy are reported per question type.
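The per-type Phase B scores named above reduce to standard formulas. The sketch below shows one plausible form of each, with answer normalization simplified to trimming and lowercasing (real harnesses typically normalize more carefully):

```python
def yesno_accuracy(preds, golds):
    """Fraction of yes/no questions answered exactly right."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def factoid_exact_match(pred, gold_variants):
    """Credit if the prediction matches any accepted gold variant."""
    return pred.strip().lower() in {g.strip().lower() for g in gold_variants}

def list_f_measure(pred_items, gold_items):
    """F1 between predicted and gold item sets for a list question."""
    pred = {p.strip().lower() for p in pred_items}
    gold = {g.strip().lower() for g in gold_items}
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

print(round(yesno_accuracy(["yes", "no", "yes"], ["yes", "yes", "yes"]), 2))  # → 0.67
print(factoid_exact_match("MECP2 ", ["mecp2"]))                               # → True
print(round(list_f_measure(["A", "B", "C"], ["b", "c", "d"]), 2))             # → 0.67
```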
Last Run
2026-03-05
Tags
biomedical, qa, information-retrieval, pubmed, expert
Added
2026-03-17
Completeness
100%

Index Score

67.7
Adoption
74
Quality
88
Freshness
80
Citations
82
Engagement
0
