Paper · AI Ethics & Safety · v1.0

Constitutional AI: Harmlessness from AI Feedback

by Anthropic · free · Last verified 2026-03-17

Introduces Constitutional AI (CAI), a method for training harmless AI assistants using a set of written principles (a 'constitution') to guide both supervised learning and reinforcement learning from AI feedback (RLAIF). CAI enables Anthropic to reduce reliance on human harm labels while maintaining helpfulness and making AI reasoning about harmlessness explicit.

https://arxiv.org/abs/2212.08073 ↗
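
The two stages named in the summary above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `generate` and `score` callables stand in for a language model and a feedback model, the two principles are hypothetical paraphrases rather than text from the actual constitution, and principles are cycled in order here only to keep the example deterministic.

```python
from typing import Callable, List, Tuple

# Hypothetical principles for illustration only; not quoted from the paper.
CONSTITUTION: List[str] = [
    "Identify ways the response is harmful, unethical, or dangerous.",
    "Identify ways the response could be more honest and less toxic.",
]


def critique_and_revise(
    prompt: str,
    generate: Callable[[str], str],
    principles: List[str],
    rounds: int = 1,
) -> str:
    """Supervised stage (sketch): draft, critique against a principle, revise.

    The final revisions would serve as fine-tuning targets.
    """
    response = generate(prompt)
    for i in range(rounds):
        principle = principles[i % len(principles)]  # assumption: round-robin
        critique = generate(
            f"{prompt}\n{response}\nCritique request: {principle}"
        )
        response = generate(
            f"{prompt}\n{response}\nCritique: {critique}\n"
            "Revision request: rewrite the response to address the critique."
        )
    return response


def ai_preference_label(
    prompt: str,
    response_a: str,
    response_b: str,
    principle: str,
    score: Callable[[str, str, str], float],
) -> Tuple[str, str]:
    """RLAIF stage (sketch): a feedback model picks the better response.

    Returns (chosen, rejected); such pairs would train a preference model
    whose output is then used as the reward signal for RL.
    """
    if score(prompt, response_a, principle) >= score(prompt, response_b, principle):
        return response_a, response_b
    return response_b, response_a
```

In practice `generate` would wrap a real model call and `score` would be derived from the feedback model's judgment of each response; both are left abstract here because the listing does not specify them.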

Grade: C+ (Average)

Adoption: A · Quality: A+ · Freshness: B · Citations: F · Engagement: F

Specifications

License: Open Access
Pricing: free
Capabilities: alignment, harmlessness-training, rlaif, principle-based-feedback
Integrations
Use Cases: ai-alignment, safety-training, research
API Available: No
Tags: alignment, safety, constitutional-ai, rlhf, harmlessness, anthropic
Added: 2026-03-17
Completeness: 100%
