LLM Regression Testing
by AaaS · open-source · Last verified 2026-03-01
Detects regressions in LLM behavior across model updates, prompt changes, or configuration modifications. Runs golden test sets, compares outputs using semantic similarity and LLM judges, and flags significant quality degradation with detailed diff reports.
https://aaas.blog/script/regression-testing-llm
Overall grade: C (Below Average)
Adoption: C+ · Quality: A · Freshness: A · Citations: C · Engagement: F
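The description above mentions running a golden test set and comparing outputs by semantic similarity. The following is only a minimal sketch of that idea, not the script's actual code: the names `embed`, `cosine`, and `find_regressions` and the `0.7` threshold are hypothetical, and the toy bag-of-words vector stands in for a real sentence-embedding model (such as one loaded through sentence-transformers) so the example runs with the standard library alone.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector. ASSUMPTION: a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def find_regressions(
    golden: dict[str, str],
    candidate: dict[str, str],
    threshold: float = 0.7,  # hypothetical cut-off, tune per test set
) -> list[tuple[str, float]]:
    """Return (case_id, similarity) for every case scoring below the threshold."""
    flagged = []
    for case_id, expected in golden.items():
        actual = candidate.get(case_id, "")  # a missing output scores 0.0
        score = cosine(embed(expected), embed(actual))
        if score < threshold:
            flagged.append((case_id, round(score, 3)))
    return flagged


# Golden outputs from the previous model version vs. outputs from the new one.
golden = {
    "capital": "The capital of France is Paris.",
    "refund": "Refunds are issued within 14 days of purchase.",
}
candidate = {
    "capital": "Paris is the capital of France.",
    "refund": "Please contact support for help.",
}

print(find_regressions(golden, candidate))  # → [('refund', 0.0)]
```

The reworded "capital" answer passes because it shares every term with the golden output, while the "refund" answer shares none and is flagged. A bag-of-words comparison would miss true paraphrases, which is why a learned embedding or an LLM judge is the more plausible choice in practice.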
Specifications
- License: MIT
- Pricing: open-source
- Capabilities: golden-set-evaluation, semantic-comparison, llm-judging, regression-detection, diff-reporting
- Integrations: openai, anthropic, sentence-transformers, pytest
- Use Cases: model-update-validation, prompt-change-testing, quality-monitoring, deployment-gating
- API Available: No
- Language: python
- Dependencies: openai, anthropic, sentence-transformers, pytest, numpy
- Environment: Python 3.11+
- Est. Runtime: 5-20 minutes depending on test set size
- Tags: script, automation, regression, testing, quality
- Added: 2026-03-17
- Completeness: 100%
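The diff-reporting capability and the deployment-gating use case listed above could be combined roughly as follows. This is a hedged sketch under stated assumptions: `diff_report`, `gate`, and the 5% `max_regression_rate` are hypothetical names and values, not taken from the script. It relies only on `difflib` from the standard library.

```python
import difflib


def diff_report(case_id: str, expected: str, actual: str) -> str:
    """Unified diff between the golden output and the candidate output."""
    lines = difflib.unified_diff(
        expected.splitlines(),
        actual.splitlines(),
        fromfile=f"{case_id}/golden",
        tofile=f"{case_id}/candidate",
        lineterm="",
    )
    return "\n".join(lines)


def gate(
    total_cases: int,
    regressed_cases: int,
    max_regression_rate: float = 0.05,  # hypothetical tolerance
) -> bool:
    """Return True when the share of regressed cases is within tolerance."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return regressed_cases / total_cases <= max_regression_rate


report = diff_report(
    "refund",
    "Refunds are issued within 14 days of purchase.",
    "Please contact support for help.",
)
print(report)
print("deploy allowed:", gate(total_cases=100, regressed_cases=6))
```

In a pytest run, a plain `assert gate(total, regressed), report` inside a test function would fail the build and show the diff whenever the regression rate exceeds the tolerance.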
Index Score: 48.7
- Adoption: 52
- Quality: 82
- Freshness: 80
- Citations: 46
- Engagement: 0