Skip to main content
Datasetalignmentv2.0

Tulu V2 Mix

by Allen Institute for AI (AI2) · open-source · Last verified 2026-03-17

Tulu V2 Mix is a curated 326,000-sample mixture of instruction-tuning datasets assembled by AI2 for training the Tulu 2 family of models, blending FLAN, Open Assistant, ShareGPT, GPT-4 Alpaca, Code Alpaca, and other sources. The careful mixing strategy and dataset ablations make Tulu V2 Mix a reference benchmark for understanding the contribution of different instruction data sources to final model quality.

https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture
B
BAbove Average
Adoption: B+Quality: AFreshness: B+Citations: B+Engagement: F

Specifications

License
ODC-By 1.0
Pricing
open-source
Capabilities
instruction-tuning, sft-training, data-mixing
Integrations
huggingface-datasets
Use Cases
sft-finetuning, data-mixture-research, open-source-alignment
API Available
Yes
Tags
instruction-tuning, mixed, sft, diverse, open-source
Added
2026-03-17
Completeness
100%

Index Score

63.1
Adoption
72
Quality
84
Freshness
78
Citations
70
Engagement
0

Put AI to work for your business

Deploy this dataset alongside autonomous AaaS agents that handle tasks end-to-end — no babysitting required.

Explore the full AI ecosystem on Agents as a Service