Tulu V2 Mix
by Allen Institute for AI (AI2) · free · Last verified 2026-03-17
Tulu V2 Mix is a curated 326,000-sample mixture of instruction-tuning datasets from AI2. It blends diverse sources like FLAN, Open Assistant, and Code Alpaca to train the Tulu 2 model family. The dataset serves as a benchmark for analyzing the impact of different data sources on model performance and quality.
https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture ↗B
B—Above Average
Adoption: B+Quality: AFreshness: B+Citations: B+Engagement: F
Specifications
- License
- ODC-By 1.0
- Pricing
- free
- Capabilities
- Instruction Following, Supervised Fine-Tuning (SFT), Multi-task Language Understanding, Code Generation and Reasoning, Conversational AI Training, Question Answering, Comparative Data Analysis, Model Benchmarking
- Integrations
- [object Object], [object Object], [object Object], [object Object]
- Use Cases
- [object Object], [object Object], [object Object], [object Object], [object Object]
- API Available
- Yes
- Tags
- instruction-tuning, sft, data-mixture, llm-training, conversational-ai, code-generation, question-answering, benchmark-dataset, ai2, open-source
- Added
- 2026-03-17
- Completeness
- 0.95%
Index Score
63.1Adoption
72
Quality
84
Freshness
78
Citations
70
Engagement
0