Training Language Models to Follow Instructions with Human Feedback
by OpenAI · free · Last verified 2026-03-17
Presents InstructGPT, which uses Reinforcement Learning from Human Feedback (RLHF) to align GPT-3 with human intent. By fine-tuning on human-written demonstrations and then training a reward model on human preference comparisons, InstructGPT produces outputs that human evaluators prefer to those of the 175B-parameter GPT-3, even though the 1.3B InstructGPT model has 100× fewer parameters.
https://arxiv.org/abs/2203.02155
Overall Grade: A (Great)
Adoption: A+ · Quality: A+ · Freshness: B · Citations: A+ · Engagement: F
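The reward model described above is trained on pairwise human preference comparisons: it should assign a higher scalar reward to the completion the labeler preferred. Below is a minimal sketch of that pairwise (Bradley-Terry style) objective; the function name and the toy reward tensors are illustrative assumptions, not code released with the paper.

```python
import torch
import torch.nn.functional as F

def preference_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise loss over human preference comparisons.

    chosen_rewards / rejected_rewards: scalar reward-model scores for the
    preferred and dispreferred completions of the same prompt, shape (batch,).
    Minimizing -log sigmoid(r_chosen - r_rejected) pushes the reward model
    to score human-preferred outputs higher.
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up reward scores (illustrative only):
chosen = torch.tensor([1.2, 0.3, 0.8])
rejected = torch.tensor([0.4, 0.5, -0.1])
print(preference_loss(chosen, rejected))
```

In the full RLHF pipeline, this reward model then supplies the scalar reward signal used to fine-tune the policy with reinforcement learning (PPO).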
Specifications
- License: Open Access
- Pricing: free
- Capabilities: instruction-following, alignment, reward-modeling, human-feedback
- Integrations: none listed
- Use Cases: ai-alignment, safety-training, instruction-tuning, research
- API Available: No
- Tags: rlhf, alignment, instruction-following, human-feedback, openai
- Added: 2026-03-17
- Completeness: 100%
Index Score: 81.8
- Adoption: 95
- Quality: 95
- Freshness: 60
- Citations: 99
- Engagement: 0