Dataset · alignment · v1.0

Deita 6K

by HKUST / Community · open-source · Last verified 2026-03-17

Deita 6K is a compact instruction-tuning dataset of 6,000 samples selected by the Data-Efficient Instruction Tuning for Alignment (DEITA) framework, which uses LLM judges to score instruction data for complexity and quality and then filters on those scores. Despite its small size, models trained on Deita 6K are reported to match or outperform models trained on datasets 10 to 100 times larger, which supports principled data selection over sheer scale.

https://huggingface.co/datasets/hkust-nlp/deita-6k-v0
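
The score-and-filter idea in the description can be illustrated with a minimal sketch. This is not the DEITA implementation: the `Sample` class, the `select_top_k` helper, and the scores are hypothetical, and combining the two scores by multiplication is an assumption made for illustration. In practice the scores would come from LLM judges rather than being written by hand.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    instruction: str
    response: str
    complexity: float  # hypothetical judge score for the instruction
    quality: float     # hypothetical judge score for the response


def select_top_k(samples, k):
    """Keep the k samples with the highest combined score.

    Assumption: the combined score is complexity * quality. The real
    framework's scoring and any further filtering steps are not reproduced.
    """
    ranked = sorted(samples, key=lambda s: s.complexity * s.quality, reverse=True)
    return ranked[:k]


# Hand-written scores, for illustration only.
pool = [
    Sample("Say hi.", "Hi!", complexity=1.0, quality=4.0),
    Sample("Explain quicksort and its worst case.", "Quicksort partitions ...", complexity=4.5, quality=5.0),
    Sample("Prove that sqrt(2) is irrational.", "Assume it is rational ...", complexity=5.0, quality=2.0),
    Sample("Summarize this paragraph.", "The paragraph says ...", complexity=2.5, quality=4.5),
]

selected = select_top_k(pool, k=2)
for s in selected:
    print(f"{s.complexity * s.quality:6.2f}  {s.instruction}")
```

Multiplying the scores means a sample has to do well on both axes to rank highly: the hard but poorly answered proof prompt falls below the easier, well answered summary.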

Overall grade: C+ (Average)

Adoption: B
Quality: A
Freshness: A
Citations: B
Engagement: F

Specifications

License: Apache 2.0
Pricing: open-source
Capabilities: instruction-tuning, data-efficient-sft, quality-scoring
Integrations: huggingface-datasets
Use Cases: efficient-sft-training, data-selection-research, quality-filtering
API Available: Yes
Tags: instruction-tuning, data-selection, quality-filtering, sft, efficient
Added: 2026-03-17
Completeness: 100%

Index Score: 58.6
