Semantic Scholar Open Research Corpus (S2ORC)
by Allen Institute for AI (AI2) · open-source · Last verified 2026-03-17
The Semantic Scholar Open Research Corpus (S2ORC) is a large English-language corpus of 136 million academic papers with structured metadata, abstracts, citation graphs, and full-text body paragraphs where licensing allows. Maintained by the Allen Institute for AI, it covers 19 scientific fields and is widely used for scientific NLP tasks including citation prediction, claim verification, and scientific QA.
https://api.semanticscholar.org/api-docs/graph
Grade: B+ (Good)
Adoption: A · Quality: A · Freshness: A · Citations: A · Engagement: F
Specifications
- License: ODC-By 1.0
- Pricing: open-source
- Capabilities: citation-graph-search, full-text-retrieval, scientific-nlp
- Integrations: huggingface-datasets, elasticsearch
- Use Cases: scientific-lm-pretraining, citation-prediction, fact-verification
- API Available: Yes
- Tags: scientific-papers, open-research, full-text, citations, nlp
- Added: 2026-03-17
- Completeness: 100%
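Since the listing marks an API as available, here is a minimal sketch of a paper search against the Semantic Scholar Graph API. It assumes the `/graph/v1/paper/search` endpoint and the `title`, `year`, and `citationCount` field names as described in the linked API documentation. The function names `build_paper_search_url` and `search_papers` are illustrative and not part of any official client. Note that this queries the live Graph API rather than downloading the bulk corpus.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the Semantic Scholar Graph API (see the docs link in this listing).
GRAPH_API_BASE = "https://api.semanticscholar.org/graph/v1"


def build_paper_search_url(query, fields=("title", "year", "citationCount"), limit=10):
    """Return a paper-search URL for the Graph API. No request is sent."""
    params = {"query": query, "fields": ",".join(fields), "limit": limit}
    return f"{GRAPH_API_BASE}/paper/search?{urlencode(params)}"


def search_papers(query, **kwargs):
    """Fetch one page of search results. Requires network access."""
    with urlopen(build_paper_search_url(query, **kwargs), timeout=30) as resp:
        # The response is assumed to carry matching papers under the "data" key.
        return json.load(resp).get("data", [])
```

Calling `search_papers("claim verification", limit=5)` would return a list of paper records if the request succeeds. Unauthenticated requests may be rate limited, so a production client would likely add retries and an API key.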
Index Score: 71.7
- Adoption: 82
- Quality: 88
- Freshness: 85
- Citations: 85
- Engagement: 0