Tulu V2 Mix
by Allen Institute for AI (AI2) · open-source · Last verified 2026-03-17
Tulu V2 Mix is a curated 326,000-sample mixture of instruction-tuning datasets assembled by AI2 for training the Tulu 2 family of models, blending FLAN, Open Assistant, ShareGPT, GPT-4 Alpaca, Code Alpaca, and other sources. The careful mixing strategy and dataset ablations make Tulu V2 Mix a reference benchmark for understanding the contribution of different instruction data sources to final model quality.
https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture ↗B
B—Above Average
Adoption: B+Quality: AFreshness: B+Citations: B+Engagement: F
Specifications
- License
- ODC-By 1.0
- Pricing
- open-source
- Capabilities
- instruction-tuning, sft-training, data-mixing
- Integrations
- huggingface-datasets
- Use Cases
- sft-finetuning, data-mixture-research, open-source-alignment
- API Available
- Yes
- Tags
- instruction-tuning, mixed, sft, diverse, open-source
- Added
- 2026-03-17
- Completeness
- 100%
Index Score
63.1Adoption
72
Quality
84
Freshness
78
Citations
70
Engagement
0
Put AI to work for your business
Deploy this dataset alongside autonomous AaaS agents that handle tasks end-to-end — no babysitting required.