Model Interpretability
by AaaS · open-source · Last verified 2026-03-17
Provides a systematic framework for understanding the internal representations, circuits, and learned concepts of deep learning models beyond surface-level feature attribution. Covers probing classifiers, concept activation vectors (TCAV), sparse autoencoders for mechanistic interpretability, and best practices for communicating findings.
https://aaas.blog/skill/model-interpretability ↗
Index grade: C+ (Average)
Adoption: C · Quality: A · Freshness: A · Citations: C+ · Engagement: F
Specifications
- License: MIT
- Pricing: open-source
- Capabilities: probing-classifiers, concept-activation-vectors, sparse-autoencoder-analysis, circuit-discovery, representation-analysis
- Integrations: captum, huggingface, pytorch, anthropic-mech-interp
- Use Cases: safety-research, bias-auditing, knowledge-distillation, model-compression
- API Available: No
- Difficulty: advanced
- Prerequisites: feature-attribution, attention-visualization
- Supported Agents: compliance-agent
- Tags: xai, interpretability, probing, concept-activation, mechanistic
- Added: 2026-03-17
- Completeness: 100%
Index Score: 50.8
- Adoption: 48
- Quality: 88
- Freshness: 82
- Citations: 56
- Engagement: 0
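The probing-classifiers capability listed above can be illustrated with a minimal sketch: extract frozen activations from a model layer, then train a simple linear classifier to predict a concept label from them. If the probe decodes the concept well above chance, the representation linearly encodes it. This example uses synthetic data standing in for real hidden states; the array names and the planted concept direction are illustrative assumptions, not part of this skill's interface.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for frozen hidden states. In a real probe these
# would be activations extracted from a layer of a trained model
# (e.g. via a forward hook in PyTorch or HuggingFace transformers).
rng = np.random.default_rng(0)
n, d = 400, 32
labels = rng.integers(0, 2, size=n)                  # binary concept labels
signal = np.where(labels[:, None] == 1, 0.8, -0.8)   # planted concept direction
hidden = rng.normal(size=(n, d))
hidden[:, 0:1] += signal                             # concept encoded in dim 0

X_tr, X_te, y_tr, y_te = train_test_split(hidden, labels, random_state=0)

# Linear probe: a deliberately simple classifier, so high accuracy
# reflects structure in the representation rather than probe capacity.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = probe.score(X_te, y_te)
print(f"probe accuracy: {acc:.2f}")
```

A common control is to repeat the fit with shuffled labels: if the shuffled probe also scores well, the probe is memorizing rather than reading a linearly encoded concept.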