Paper · AI Ethics & Safety · v1.0

Constitutional AI: Harmlessness from AI Feedback

by Anthropic · free · Last verified 2026-03-17

Introduces Constitutional AI (CAI), a method for training harmless AI assistants that uses a set of written principles (a 'constitution') to guide both a supervised learning stage and a reinforcement learning stage driven by AI feedback (RLAIF). CAI reduces reliance on human-provided harm labels while maintaining helpfulness, and it makes the model's reasoning about harmlessness explicit.
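As a rough illustration of the two stages described above, the sketch below shows a critique-and-revision loop (which would produce data for the supervised stage) and a pairwise AI preference label (which would feed the RLAIF stage). Everything named here is hypothetical: `toy_model` is a deterministic stand-in for a real language model call, and the prompt wording is an assumption, not the paper's actual prompts or constitution.

```python
from typing import Callable, List

# A "model" here is any function from a text prompt to a text completion.
Model = Callable[[str], str]


def toy_model(text: str) -> str:
    """Hypothetical stand-in for a language model, so the sketch runs offline."""
    if "Answer A or B" in text:
        return "B"
    if text.startswith("Critique"):
        return "The response could be more careful."
    if text.startswith("Revise"):
        return "revised response"
    return "draft response"


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> str:
    """Supervised-stage sketch: draft, then critique and revise once per principle."""
    response = model(prompt)
    for principle in principles:
        critique = model(
            f"Critique this response using the principle '{principle}'.\n"
            f"Prompt: {prompt}\nResponse: {response}"
        )
        response = model(
            f"Revise the response to address the critique.\n"
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}"
        )
    # The final revision would become a fine-tuning target.
    return response


def ai_preference(model: Model, prompt: str, a: str, b: str, principle: str) -> int:
    """RLAIF-stage sketch: ask the model which response better fits a principle.

    Returns 0 if response A is preferred, 1 otherwise.
    """
    verdict = model(
        f"Consider the principle '{principle}'.\n"
        f"Prompt: {prompt}\n(A) {a}\n(B) {b}\n"
        f"Which response better follows the principle? Answer A or B."
    )
    return 0 if verdict.strip().upper().startswith("A") else 1
```

In a real pipeline, the revised responses would be collected into a fine-tuning dataset and the preference labels would train a preference model used as the reward signal; both steps are omitted here.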

https://arxiv.org/abs/2212.08073
Overall grade: B+ (Good)
Adoption: A · Quality: A+ · Freshness: B · Citations: A+ · Engagement: F

Specifications

License: Open Access
Pricing: free
Capabilities: alignment, harmlessness-training, rlaif, principle-based-feedback
Integrations: none listed
Use Cases: ai-alignment, safety-training, research
API Available: No
Tags: alignment, safety, constitutional-ai, rlhf, harmlessness, anthropic
Added: 2026-03-17
Completeness: 100%

Index Score

Overall: 74.7
Adoption: 84
Quality: 93
Freshness: 63
Citations: 90
Engagement: 0
