Paper · AI Ethics & Safety · v1.0

Representation Engineering: A Top-Down Approach to AI Transparency

by Center for AI Safety / UC Berkeley · free · Last verified 2026-03-17

Introduces Representation Engineering (RepE), a top-down approach to AI transparency that identifies and manipulates high-level representations of cognitive phenomena in LLMs. RepE reads and controls a model's internal representations of truthfulness, honesty, emotion, and power-seeking without fine-tuning, using linear probes over activation differences between contrastive stimulus pairs.
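The idea of a linear direction built from contrastive activation differences can be illustrated with a small, self-contained sketch. This is not the paper's implementation: it runs on synthetic vectors rather than real LLM hidden states, and it uses the mean of the pairwise differences as the direction, a simple stand-in chosen for brevity. All function names (`mean_difference_direction`, `read_score`, `steer`, `make_synthetic_pairs`) and the choice of a planted concept axis are assumptions made for illustration.

```python
import math
import random


def mean_difference_direction(pos_acts, neg_acts):
    """Unit vector along the mean of (positive - negative) activation differences."""
    dim = len(pos_acts[0])
    mean_diff = [0.0] * dim
    for p, n in zip(pos_acts, neg_acts):
        for i in range(dim):
            mean_diff[i] += (p[i] - n[i]) / len(pos_acts)
    norm = math.sqrt(sum(x * x for x in mean_diff))
    return [x / norm for x in mean_diff]


def read_score(activation, direction):
    """'Reading': project an activation onto the concept direction."""
    return sum(a * d for a, d in zip(activation, direction))


def steer(activation, direction, alpha):
    """'Control': shift an activation along the direction by strength alpha."""
    return [a + alpha * d for a, d in zip(activation, direction)]


def make_synthetic_pairs(n_pairs=200, dim=16, seed=0):
    """Hypothetical stand-in for hidden states of contrastive prompt pairs.

    Each pair shares a random base vector and differs mainly along one
    planted concept axis (index 3, chosen arbitrarily).
    """
    rng = random.Random(seed)
    concept = [0.0] * dim
    concept[3] = 1.0
    pos, neg = [], []
    for _ in range(n_pairs):
        base = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        pos.append([b + 2.0 * c + rng.gauss(0.0, 0.1) for b, c in zip(base, concept)])
        neg.append([b - 2.0 * c + rng.gauss(0.0, 0.1) for b, c in zip(base, concept)])
    return pos, neg, concept


pos, neg, concept = make_synthetic_pairs()
direction = mean_difference_direction(pos, neg)

# Cosine similarity between the recovered direction and the planted axis.
cosine = read_score(concept, direction)

# Average projection for each side of the contrast.
avg_pos = sum(read_score(a, direction) for a in pos) / len(pos)
avg_neg = sum(read_score(a, direction) for a in neg) / len(neg)

# Steering one negative example toward the positive side.
steered = steer(neg[0], direction, alpha=4.0)
```

With real models, the vectors in `pos` and `neg` would instead come from a chosen layer's hidden states for the two sides of each contrastive prompt pair, and the steering shift would be applied to activations during the forward pass rather than to stored vectors.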

https://arxiv.org/abs/2310.01405
B (Above Average)
Adoption: B+ · Quality: A+ · Freshness: B · Citations: B+ · Engagement: F

Specifications

License: Open Access
Pricing: free
Capabilities: interpretability, representation-reading, model-control, transparency, alignment
Integrations: none listed
Use Cases: ai-safety-research, interpretability, model-steering, research
API Available: No
Tags: interpretability, transparency, representation, alignment, control
Added: 2026-03-17
Completeness: 100%

Index Score

65.2
Adoption: 70
Quality: 91
Freshness: 68
Citations: 76
Engagement: 0
