Dataset · alignment · v1.0

Deita 6K

by HKUST / Community · open-source · Last verified 2026-03-17

Deita 6K is a compact, high-quality instruction-tuning dataset of 6,000 samples selected by the Data-Efficient Instruction Tuning for Alignment (DEITA) framework, which uses LLM judges to score instruction data for complexity and quality and filters on those scores. Despite its small size, models trained on Deita 6K are reported to match or outperform models trained on datasets 10 to 100 times larger, which supports principled data selection over sheer scale.
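
The score-and-filter idea described above can be illustrated with a minimal sketch. It assumes each sample already carries a complexity score and a quality score from a judge model. The names ScoredSample and select_top_k, the example scores, and the choice to combine the two scores by multiplication are illustrative assumptions, not the DEITA implementation.

```python
from dataclasses import dataclass


@dataclass
class ScoredSample:
    """One instruction sample with judge scores (hypothetical structure)."""
    instruction: str
    complexity: float  # assumed to come from an LLM judge
    quality: float     # assumed to come from an LLM judge


def select_top_k(samples, k):
    """Rank samples by complexity * quality and keep the k highest.

    The product is an assumed way to combine the two scores; other
    combinations (sums, thresholds per score) would also be plausible.
    """
    ranked = sorted(
        samples,
        key=lambda s: s.complexity * s.quality,
        reverse=True,
    )
    return ranked[:k]


# Toy pool with made-up scores, purely for illustration.
pool = [
    ScoredSample("Say hello.", 1.0, 4.0),
    ScoredSample("Explain quicksort and analyze its worst case.", 4.5, 5.0),
    ScoredSample("Write a haiku about rain.", 2.0, 3.0),
    ScoredSample("Prove that sqrt(2) is irrational.", 5.0, 4.0),
]

selected = select_top_k(pool, k=2)
for sample in selected:
    print(sample.instruction)
```

In this toy pool the two samples with the largest combined score are kept and the trivial prompts are dropped, which is the general effect a score-based filter is meant to have.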

https://huggingface.co/datasets/hkust-nlp/deita-6k-v0
C+ (Average)
Adoption: B · Quality: A · Freshness: A · Citations: B · Engagement: F

Specifications

License: Apache 2.0
Pricing: open-source
Capabilities: instruction-tuning, data-efficient-sft, quality-scoring
Integrations: huggingface-datasets
Use Cases: efficient-sft-training, data-selection-research, quality-filtering
API Available: Yes
Tags: instruction-tuning, data-selection, quality-filtering, sft, efficient
Added: 2026-03-17
Completeness: 100%

Index Score

58.6
Adoption: 62
Quality: 88
Freshness: 80
Citations: 65
Engagement: 0
