Constitutional AI: Harmlessness from AI Feedback
by Anthropic · free · Last verified 2026-03-17
Introduces Constitutional AI (CAI), a method for training harmless AI assistants that uses a set of written principles (a 'constitution') to guide both supervised learning and reinforcement learning from AI feedback (RLAIF). CAI reduces reliance on human-labeled harm data while maintaining helpfulness and making the model's reasoning about harmlessness explicit.
https://arxiv.org/abs/2212.08073
Overall grade: B+ (Good)
Adoption: A · Quality: A+ · Freshness: B · Citations: A+ · Engagement: F
Specifications
- License: Open Access
- Pricing: free
- Capabilities: alignment, harmlessness-training, rlaif, principle-based-feedback
- Integrations: none listed
- Use Cases: ai-alignment, safety-training, research
- API Available: No
- Tags: alignment, safety, constitutional-ai, rlhf, harmlessness, anthropic
- Added: 2026-03-17
- Completeness: 100%
Index Score: 74.7
- Adoption: 84
- Quality: 93
- Freshness: 63
- Citations: 90
- Engagement: 0