Dataset · scientific · v20230827

Semantic Scholar Open Research Corpus (S2ORC)

by Allen Institute for AI (AI2) · open-source · Last verified 2026-03-17

The Semantic Scholar Open Research Corpus (S2ORC) is a large English-language corpus of 136 million academic papers with structured metadata, abstracts, citation graphs, and full-text body paragraphs where licensing allows. Maintained by the Allen Institute for AI, it covers 19 scientific fields and is widely used for scientific NLP tasks including citation prediction, claim verification, and scientific QA.

https://api.semanticscholar.org/api-docs/graph
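As a rough illustration of how the API linked above might be queried, the sketch below builds a paper-search request using only the Python standard library. It assumes the Graph API's `/graph/v1/paper/search` endpoint with `query`, `fields`, and `limit` parameters and a JSON response whose results sit under a `data` key; check these against the API docs before relying on them. The helper names `build_search_url` and `search_papers` are hypothetical, and this reaches the paper metadata API, not a bulk full-text download.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed base path for the Graph API; verify against the linked API docs.
GRAPH_API_BASE = "https://api.semanticscholar.org/graph/v1"


def build_search_url(query, fields=("title", "year", "citationCount"), limit=5):
    """Build a paper-search URL (hypothetical helper, no network access)."""
    params = {"query": query, "fields": ",".join(fields), "limit": limit}
    return f"{GRAPH_API_BASE}/paper/search?{urlencode(params)}"


def search_papers(query, fields=("title", "year", "citationCount"), limit=5, timeout=10.0):
    """Fetch matching papers; assumes results are returned under a 'data' key."""
    request = Request(
        build_search_url(query, fields, limit),
        headers={"User-Agent": "s2-listing-example"},
    )
    with urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload.get("data", [])
```

Calling `search_papers("citation prediction")` would perform the network request; unauthenticated use is typically rate-limited, so production code would likely add retries and an API key header.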
Grade: B+ (Good)
Adoption: A · Quality: A · Freshness: A · Citations: A · Engagement: F

Specifications

License: ODC-By 1.0
Pricing: open-source
Capabilities: citation-graph-search, full-text-retrieval, scientific-nlp
Integrations: huggingface-datasets, elasticsearch
Use Cases: scientific-lm-pretraining, citation-prediction, fact-verification
API Available: Yes
Tags: scientific-papers, open-research, full-text, citations, nlp
Added: 2026-03-17
Completeness: 100%

Index Score

71.7
Adoption: 82
Quality: 88
Freshness: 85
Citations: 85
Engagement: 0
