Deita 6K
by HKUST / Community · open-source · Last verified 2026-03-17
Deita 6K is a compact, high-quality instruction-tuning dataset of 6,000 samples selected by the DEITA (Data-Efficient Instruction Tuning for Alignment) framework. DEITA scores candidate instruction data for complexity and quality with LLM-based scorers, then filters the ranked pool for diversity. Despite the dataset's small size, models fine-tuned on Deita 6K are reported to match or outperform models trained on datasets more than 10x larger, which suggests that principled data selection can substitute for sheer scale.
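The score-then-filter idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the DEITA implementation. It assumes each candidate already carries a complexity score, a quality score, and an embedding vector. In DEITA these come from trained scorer models and model embeddings; here they are hand-written numbers. The sketch combines the two scores by multiplication, then greedily keeps the highest-scoring samples while skipping any sample whose cosine similarity to an already-kept one reaches a threshold. The function name `select`, the tuple layout, and the 0.9 default threshold are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical candidate layout: (sample_id, complexity, quality, embedding)
Candidate = Tuple[str, float, float, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select(samples: List[Candidate], budget: int, threshold: float = 0.9) -> List[str]:
    """Greedy score-first, diversity-aware selection (illustrative sketch only).

    Scores and embeddings are assumed to be precomputed by external models.
    """
    # Rank by combined score, complexity * quality, highest first.
    ranked = sorted(samples, key=lambda s: s[1] * s[2], reverse=True)
    chosen: List[Tuple[str, Sequence[float]]] = []
    for sample_id, _, _, emb in ranked:
        if len(chosen) >= budget:
            break
        # Skip near-duplicates of anything already selected.
        if any(cosine(emb, kept) >= threshold for _, kept in chosen):
            continue
        chosen.append((sample_id, emb))
    return [sample_id for sample_id, _ in chosen]


# Toy pool with made-up scores and 2-D embeddings.
pool: List[Candidate] = [
    ("a", 4.0, 5.0, [1.0, 0.0]),    # combined score 20
    ("b", 5.0, 3.0, [0.99, 0.05]),  # score 15, nearly identical direction to "a"
    ("c", 2.0, 4.0, [0.0, 1.0]),    # score 8, orthogonal to "a"
    ("d", 1.0, 1.0, [0.7, 0.7]),    # score 1
]

print(select(pool, budget=2))  # ['a', 'c']
```

Here "b" outranks "c" on score but is dropped as a near-duplicate of "a", so the budget goes to a more diverse sample. Raising the threshold above 1.0 disables the diversity filter and reduces the procedure to plain top-k by score.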
https://huggingface.co/datasets/hkust-nlp/deita-6k-v0
Grade: C+ (Average)
Adoption: B · Quality: A · Freshness: A · Citations: B · Engagement: F
Specifications
- License: Apache 2.0
- Pricing: open-source
- Capabilities: instruction-tuning, data-efficient-sft, quality-scoring
- Integrations: huggingface-datasets
- Use Cases: efficient-sft-training, data-selection-research, quality-filtering
- API Available: Yes
- Tags: instruction-tuning, data-selection, quality-filtering, sft, efficient
- Added: 2026-03-17
- Completeness: 100%
Index Score: 58.6
- Adoption: 62
- Quality: 88
- Freshness: 80
- Citations: 65
- Engagement: 0