Dataset · instruction-tuning · v1.0

UltraFeedback

by Tsinghua University · open-source · Last verified 2026-03-17

A large-scale, high-quality preference dataset of 64,000 instructions, each answered by 4 LLMs and rated by GPT-4 on four aspects: instruction-following, truthfulness, honesty, and helpfulness. UltraFeedback is the backbone of the Zephyr and Tulu 2 DPO models.

https://huggingface.co/datasets/openbmb/UltraFeedback
Overall grade: B+ (Good)

Adoption: B+ · Quality: A · Freshness: B+ · Citations: A · Engagement: F
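
A minimal sketch of loading the dataset via the huggingface-datasets integration listed under Specifications below. The field names ("instruction", "completions", "model", "overall_score") are assumptions taken from the dataset card and should be verified against the live schema.

```python
# Minimal sketch: load UltraFeedback from the Hugging Face Hub and
# inspect one record. Field names are assumptions from the dataset
# card; check them against the live schema.
from datasets import load_dataset

ds = load_dataset("openbmb/UltraFeedback", split="train")

example = ds[0]
print(example["instruction"])  # the prompt shared by all 4 completions
for completion in example["completions"]:
    # each completion records the generating model, its response text,
    # and GPT-4 scores for the four rated aspects
    print(completion["model"], completion["overall_score"])
```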

Specifications

License: MIT
Pricing: open-source
Capabilities: reward-model-training, rlhf, dpo-training, preference-learning
Integrations: huggingface-datasets
Use Cases: rlhf, dpo, reward-modeling, alignment-research
API Available: No
Tags: rlhf, preference-data, gpt-4-annotated, reward-model, alignment
Added: 2026-03-17
Completeness: 100%
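
Since the Capabilities row lists dpo-training, here is a hedged sketch of the common binarization step used to turn UltraFeedback into DPO training pairs: collapse the four rated completions per instruction into a chosen/rejected pair. Field names are assumptions from the dataset card, and the recipe here (top score as chosen, bottom score as rejected) is one simple variant; Zephyr's actual pipeline sampled the rejected response rather than always taking the lowest-scored one.

```python
# Hedged sketch: binarize UltraFeedback into DPO preference pairs by
# taking the highest-scored completion as "chosen" and the lowest as
# "rejected". Field names are assumptions from the dataset card.
from datasets import load_dataset

ds = load_dataset("openbmb/UltraFeedback", split="train")

# keep only records with at least two completions to compare
ds = ds.filter(lambda e: len(e["completions"]) >= 2)

def to_preference_pair(example):
    ranked = sorted(example["completions"], key=lambda c: c["overall_score"])
    return {
        "prompt": example["instruction"],
        "chosen": ranked[-1]["response"],   # top GPT-4 overall score
        "rejected": ranked[0]["response"],  # bottom GPT-4 overall score
    }

pairs = ds.map(to_preference_pair, remove_columns=ds.column_names)
```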

Index Score: 70.2

Adoption: 79 · Quality: 88 · Freshness: 74 · Citations: 84 · Engagement: 0
