
Training Language Models to Follow Instructions with Human Feedback

by OpenAI · free · Last verified 2026-03-17

Presents InstructGPT, which uses Reinforcement Learning from Human Feedback (RLHF) to align GPT-3 with human intent. By fine-tuning on human demonstrations and training a reward model on human preference comparisons, InstructGPT produces outputs that human evaluators prefer to those of GPT-3, even though the model has 100× fewer parameters.

https://arxiv.org/abs/2203.02155
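
As a rough illustration of the reward-modeling step described above, the sketch below implements the pairwise preference loss from the paper in PyTorch. The RewardModel class, feature dimensions, and tensors are hypothetical placeholders: in the paper the reward model is a language model with a scalar head scoring (prompt, completion) pairs, not a linear layer over fixed-size features.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for a reward model r_theta(x, y). In the paper this is a
# transformer with a scalar output head; here it is a linear layer over
# made-up fixed-size feature vectors (a hypothetical simplification).
class RewardModel(torch.nn.Module):
    def __init__(self, feature_dim: int = 16):
        super().__init__()
        self.scalar_head = torch.nn.Linear(feature_dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Returns one scalar reward per (prompt, completion) pair.
        return self.scalar_head(features).squeeze(-1)

def pairwise_preference_loss(reward_chosen: torch.Tensor,
                             reward_rejected: torch.Tensor) -> torch.Tensor:
    # Comparison loss used to train the reward model:
    #   -log sigmoid(r(x, y_w) - r(x, y_l))
    # where y_w is the completion the human labeler preferred over y_l.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Dummy batch: 8 preference comparisons with illustrative feature vectors.
reward_model = RewardModel()
chosen_feats = torch.randn(8, 16)
rejected_feats = torch.randn(8, 16)

loss = pairwise_preference_loss(reward_model(chosen_feats),
                                reward_model(rejected_feats))
loss.backward()  # gradients flow into the scalar head as in ordinary training
print(f"pairwise loss: {loss.item():.4f}")
```

The trained reward model's scalar output then serves as the reward signal when the policy is further fine-tuned with PPO, the reinforcement-learning step of the RLHF pipeline.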
Grade: A (Great)
Adoption: A+ · Quality: A+ · Freshness: B · Citations: A+ · Engagement: F

Specifications

License: Open Access
Pricing: free
Capabilities: instruction-following, alignment, reward-modeling, human-feedback
Integrations: —
Use Cases: ai-alignment, safety-training, instruction-tuning, research
API Available: No
Tags: rlhf, alignment, instruction-following, human-feedback, openai
Added: 2026-03-17
Completeness: 100%

Index Score

Overall: 81.8
Adoption: 95
Quality: 95
Freshness: 60
Citations: 99
Engagement: 0
