Paper · training · v1.0

Training Language Models to Follow Instructions with Human Feedback (InstructGPT)

by OpenAI · free · Last verified 2026-03-17

Introduces InstructGPT, which fine-tunes GPT-3 to follow instructions using Reinforcement Learning from Human Feedback (RLHF): supervised fine-tuning on labeler demonstrations, reward-model training on ranked comparisons, then PPO against that reward model. Human labelers prefer outputs from the 1.3B-parameter InstructGPT model over those from the 175B GPT-3, establishing RLHF as the dominant alignment technique.

https://arxiv.org/abs/2203.02155
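
The RL stage's per-response reward combines the reward model's score with a KL penalty that keeps the policy close to the supervised fine-tuned (SFT) model. Below is a minimal sketch of that quantity in Python; the function name rlhf_reward, the argument names, and the beta value are illustrative, not from the paper:

```python
# Sketch of the reward signal maximized in InstructGPT's RL stage:
#   r_theta(x, y) - beta * log(pi_RL(y|x) / pi_SFT(y|x))
# Names and toy numbers below are illustrative assumptions.

from typing import List

def rlhf_reward(
    rm_score: float,               # reward model score r_theta(x, y) for the response
    policy_logprobs: List[float],  # log pi_RL(y_t | x, y_<t) per generated token
    sft_logprobs: List[float],     # log pi_SFT(y_t | x, y_<t) per generated token
    beta: float = 0.02,            # KL penalty coefficient (illustrative value)
) -> float:
    """Reward-model score minus a KL penalty toward the SFT policy."""
    # Summing per-token log-prob differences gives the sequence log ratio,
    # i.e. an estimate of the KL penalty term for this response.
    kl_estimate = sum(p - s for p, s in zip(policy_logprobs, sft_logprobs))
    return rm_score - beta * kl_estimate

# Toy usage: a response the policy assigns slightly higher likelihood
# than the SFT model does, so the KL term slightly reduces the reward.
print(rlhf_reward(1.7, [-0.9, -1.1, -0.4], [-1.0, -1.3, -0.5]))
```

PPO then maximizes this reward; the paper's full objective additionally mixes in a pretraining log-likelihood term (the "PPO-ptx" variant) to limit regressions on public NLP tasks.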
Grade: B+ (Good)
Adoption: A+ · Quality: A+ · Freshness: C+ · Citations: A · Engagement: F

Specifications

License
Open Access
Pricing
free
Capabilities
instruction-following, alignment, human-preference-learning
Integrations
Use Cases
instruction-following, alignment, safe-language-modeling
API Available
No
Tags
rlhf, instructgpt, alignment, human-feedback, ppo, instruction-following
Added
2026-03-17
Completeness
100%

Index Score

77
Adoption
90
Quality
95
Freshness
58
Citations
88
Engagement
0
