Skip to main content
Datasetsyntheticv2.0

OpenMathInstruct

by NVIDIA · open-source · Last verified 2026-03-17

OpenMathInstruct is a large-scale synthetic mathematics instruction dataset produced by NVIDIA, containing over 1.8 million math problem-solution pairs covering arithmetic, algebra, geometry, calculus, and competition mathematics. Solutions are generated by Mixtral models and verified for correctness to provide reliable step-by-step reasoning chains for training math-capable language models.

https://huggingface.co/datasets/nvidia/OpenMathInstruct-2
B
BAbove Average
Adoption: B+Quality: AFreshness: ACitations: B+Engagement: F

Specifications

License
CC BY 4.0
Pricing
open-source
Capabilities
math-reasoning, instruction-tuning, chain-of-thought
Integrations
huggingface-datasets
Use Cases
math-model-training, reasoning-finetuning, competition-math
API Available
Yes
Tags
synthetic, math, instruction-tuning, reasoning, step-by-step
Added
2026-03-17
Completeness
100%

Index Score

64.2
Adoption
73
Quality
85
Freshness
88
Citations
72
Engagement
0

Put AI to work for your business

Deploy this dataset alongside autonomous AaaS agents that handle tasks end-to-end — no babysitting required.

Explore the full AI ecosystem on Agents as a Service