Training Language Models to Follow Instructions with Human Feedback (InstructGPT)
by OpenAI · free · Last verified 2026-03-17
Introduces InstructGPT, which fine-tunes GPT-3 with Reinforcement Learning from Human Feedback (RLHF) to follow user instructions. Human labelers prefer outputs from the 1.3B-parameter InstructGPT model over outputs from the 175B-parameter GPT-3, and the approach helped establish RLHF as a standard alignment technique.
https://arxiv.org/abs/2203.02155
Overall rating: B+ (Good)
Adoption: A+ · Quality: A+ · Freshness: C+ · Citations: A · Engagement: F
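The RLHF pipeline summarized above rests on two core objectives: a pairwise ranking loss for training the reward model on labeler preferences, and a KL-penalized reward that the PPO policy maximizes against the supervised fine-tuned (SFT) baseline. The snippet below is a minimal NumPy sketch of those two formulas, not the paper's implementation; the function names, the `beta` coefficient, and all numeric values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reward_model_pairwise_loss(score_chosen, score_rejected):
    """Pairwise ranking loss for the reward model: -log sigmoid(r(x, y_w) - r(x, y_l)),
    where y_w is the labeler-preferred completion and y_l the rejected one."""
    return -np.log(sigmoid(score_chosen - score_rejected))

def kl_penalized_reward(rm_score, logprobs_rl, logprobs_sft, beta=0.02):
    """Per-episode reward for the PPO stage: the reward model's scalar score minus
    a KL penalty keeping the RL policy close to the SFT policy.
    `beta` here is an illustrative value, not one taken from the paper."""
    kl_term = beta * float(np.sum(logprobs_rl - logprobs_sft))
    return rm_score - kl_term

# Toy numbers for a 4-token completion (purely illustrative).
print(reward_model_pairwise_loss(score_chosen=2.3, score_rejected=0.9))
print(kl_penalized_reward(
    rm_score=1.7,
    logprobs_rl=np.array([-1.2, -0.8, -2.1, -0.5]),   # log pi_RL(token | context)
    logprobs_sft=np.array([-1.5, -0.9, -1.8, -0.6]),  # log pi_SFT(token | context)
))
```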
Specifications
- License: Open Access
- Pricing: free
- Capabilities: instruction-following, alignment, human-preference-learning
- Integrations:
- Use Cases: instruction-following, alignment, safe-language-modeling
- API Available: No
- Tags: rlhf, instructgpt, alignment, human-feedback, ppo, instruction-following
- Added: 2026-03-17
- Completeness: 100%
Index Score: 77
- Adoption: 90
- Quality: 95
- Freshness: 58
- Citations: 88
- Engagement: 0