Training Language Models to Follow Instructions with Human Feedback
by OpenAI · free · Last verified 2026-03-17
Presents InstructGPT, which uses Reinforcement Learning from Human Feedback (RLHF) to align GPT-3 with human intent. By fine-tuning on human-written demonstrations and then training a reward model on human preference comparisons, InstructGPT produces outputs that human evaluators prefer to those of the 175B-parameter GPT-3, even though the 1.3B InstructGPT model has 100× fewer parameters.
https://arxiv.org/abs/2203.02155
Overall Grade: A (Great)
Adoption: A+ · Quality: A+ · Freshness: B · Citations: A+ · Engagement: F
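The reward model described above is trained on pairwise human preference comparisons: it should assign a higher scalar reward to the completion the labeler preferred. Below is a minimal sketch of that pairwise (Bradley-Terry style) objective; the function name and the toy reward tensors are illustrative assumptions, not code released with the paper.

```python
import torch
import torch.nn.functional as F

def preference_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise loss over human preference comparisons.

    chosen_rewards / rejected_rewards: scalar reward-model scores for the
    preferred and dispreferred completions of the same prompt, shape (batch,).
    Minimizing -log sigmoid(r_chosen - r_rejected) pushes the reward model
    to score human-preferred outputs higher.
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up reward scores (illustrative only):
chosen = torch.tensor([1.2, 0.3, 0.8])
rejected = torch.tensor([0.4, 0.5, -0.1])
print(preference_loss(chosen, rejected))
```

In the full RLHF pipeline, this reward model then supplies the scalar reward signal used to fine-tune the policy with reinforcement learning (PPO).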
Specifications
- License: Open Access
- Pricing: free
- Capabilities: instruction-following, alignment, reward-modeling, human-feedback
- Integrations: none listed
- Use Cases: ai-alignment, safety-training, instruction-tuning, research
- API Available: No
- Tags: rlhf, alignment, instruction-following, human-feedback, openai
- Added: 2026-03-17
- Completeness: 100%
Index Score: 81.8
- Adoption: 95
- Quality: 95
- Freshness: 60
- Citations: 99
- Engagement: 0