
Orca DPO Pairs

by Intel Labs / Community · free · Last verified 2026-03-17

Orca DPO Pairs is a synthetic dataset containing 12,000 instruction-following examples. Each example includes a prompt, a high-quality response from GPT-4 (chosen), and a lower-quality response from GPT-3.5 (rejected). It is designed for efficiently aligning language models using Direct Preference Optimization (DPO) without a reward model.

https://huggingface.co/datasets/Intel/orca_dpo_pairs
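
The dataset can be pulled straight from the Hugging Face Hub with the `datasets` library. The sketch below is a minimal loading example; the column names (`system`, `question`, `chosen`, `rejected`) are assumed from the published schema, so verify them against the dataset card before relying on them.

```python
# Minimal sketch: load Orca DPO Pairs and inspect one record.
# Assumes `datasets` is installed (pip install datasets) and that the dataset
# exposes `system`, `question`, `chosen`, and `rejected` columns -- check the
# dataset card if these names differ.
from datasets import load_dataset

dataset = load_dataset("Intel/orca_dpo_pairs", split="train")

print(dataset)                    # row count and column names
example = dataset[0]
print(example["question"][:200])  # the instruction/prompt
print(example["chosen"][:200])    # higher-quality (chosen) response
print(example["rejected"][:200])  # lower-quality (rejected) response
```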
Overall Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: B · Engagement: F

Specifications

License: MIT
Pricing: Free
Capabilities: Direct Preference Optimization (DPO) Training, Reward-Free Reinforcement Learning from Human Feedback (RLHF), Instruction Following Alignment, Model Preference Learning, Comparative Data Training, Safety and Helpfulness Fine-Tuning, Style and Tone Alignment (a training sketch follows these specifications)
API Available: Yes
Tags: dpo, preference, alignment, synthetic, rlhf, instruction-tuning, llm-training, comparative-data, fine-tuning, chatbot-training
Added: 2026-03-17
Completeness: 85%
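
As a sketch of the DPO Training capability listed above, the example below reshapes the pairs into the `prompt` / `chosen` / `rejected` text fields expected by TRL's `DPOTrainer` and runs a short fine-tuning pass. The base model, hyperparameters, and column names are illustrative assumptions, not recommendations from the dataset card, and the exact trainer arguments vary across TRL versions.

```python
# Minimal DPO fine-tuning sketch using TRL's DPOTrainer.
# Assumptions: `trl`, `transformers`, and `datasets` are installed; the dataset's
# system/question/chosen/rejected columns match the published schema; and a small
# causal LM ("Qwen/Qwen2.5-0.5B-Instruct") stands in for your own model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative placeholder
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def to_dpo_format(row):
    # Fold the system message into the prompt; DPOTrainer expects plain
    # `prompt`, `chosen`, and `rejected` text fields.
    prompt = (row["system"] + "\n\n" if row["system"] else "") + row["question"]
    return {"prompt": prompt, "chosen": row["chosen"], "rejected": row["rejected"]}

dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.map(to_dpo_format, remove_columns=dataset.column_names)

training_args = DPOConfig(
    output_dir="orca-dpo-sketch",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    beta=0.1,  # strength of the KL penalty against the frozen reference model
)

trainer = DPOTrainer(
    model=model,                 # a reference copy is created automatically
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL releases use `tokenizer=` instead
)
trainer.train()
```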

Index Score: 60.2
Adoption: 70 · Quality: 80 · Freshness: 76 · Citations: 65 · Engagement: 0
