LLM Regression Testing
by AaaS · open-source · Last verified 2026-03-01
Detects regressions in LLM behavior across model updates, prompt changes, or configuration modifications. Runs golden test sets, compares outputs using semantic similarity and LLM judges, and flags significant quality degradation with detailed diff reports.
https://aaas.blog/script/regression-testing-llm
Grade: D (Poor)
Adoption: C+ · Quality: A · Freshness: A · Citations: F · Engagement: F
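The golden-set comparison described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `GoldenCase`, `find_regressions`, `bow_cosine`, and the 0.8 threshold are all hypothetical names and values. The bag-of-words cosine is a dependency-free stand-in for embedding-based semantic similarity; in practice an embedding model (for example via sentence-transformers) would presumably be passed in as the `similarity` callable.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def bow_cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (stand-in for embedding similarity)."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class GoldenCase:
    prompt: str
    expected: str  # reference output recorded from a known-good run


def find_regressions(cases, generate, similarity=bow_cosine, threshold=0.8):
    """Return (prompt, score, output) for each case scoring below threshold.

    `generate` is any callable mapping a prompt to the candidate model's
    output; `threshold` is an assumed cutoff, to be tuned per test set.
    """
    flagged = []
    for case in cases:
        output = generate(case.prompt)
        score = similarity(case.expected, output)
        if score < threshold:
            flagged.append((case.prompt, round(score, 3), output))
    return flagged
```

Calling `find_regressions(cases, my_model)` with a list of `GoldenCase` objects returns only the cases whose output drifted from the reference, which is the raw material for a diff report. An LLM judge could be layered on top by re-scoring just the flagged cases.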
Specifications
- License: MIT
- Pricing: open-source
- Capabilities: golden-set-evaluation, semantic-comparison, llm-judging, regression-detection, diff-reporting
- Integrations: openai, anthropic, sentence-transformers, pytest
- Use Cases: model-update-validation, prompt-change-testing, quality-monitoring, deployment-gating
- API Available: No
- Language: python
- Dependencies: openai, anthropic, sentence-transformers, pytest, numpy
- Environment: Python 3.11+
- Est. Runtime: 5-20 minutes depending on test set size
- Tags: script, automation, regression, testing, quality
- Added: 2026-03-17
- Completeness: 80%
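For the deployment-gating use case listed above, one plausible shape is a check that compares per-case quality scores from a baseline run against a candidate run and fails the pipeline when the mean drops too far. The sketch below is an assumption about how such a gate might look; `gate`, `main`, and the 0.05 tolerance are hypothetical and not taken from the script.

```python
from statistics import mean


def gate(baseline_scores, candidate_scores, max_drop=0.05):
    """Return (passed, drop), where drop is the fall in mean quality score.

    `max_drop` is an assumed tolerance; a negative drop means the
    candidate scored better than the baseline.
    """
    if not baseline_scores or not candidate_scores:
        raise ValueError("both score lists must be non-empty")
    drop = mean(baseline_scores) - mean(candidate_scores)
    return drop <= max_drop, drop


def main(baseline_scores, candidate_scores):
    """Print a one-line verdict and return a process exit code."""
    passed, drop = gate(baseline_scores, candidate_scores)
    verdict = "PASS" if passed else "FAIL"
    print(f"mean score drop: {drop:+.3f} -> {verdict}")
    return 0 if passed else 1  # a non-zero exit code blocks the deploy step
```

Under pytest, the same check could be expressed as a test function containing `assert gate(baseline, candidate)[0]`, so that a failing gate shows up as a failed test in CI.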
Index Score: 37
Adoption: 52
Quality: 82
Freshness: 80
Citations: 0
Engagement: 0