Script · AI Tools & APIs · v1.0

LLM Regression Testing

by AaaS · open-source · Last verified 2026-03-01

Detects regressions in LLM behavior across model updates, prompt changes, or configuration modifications. Runs golden test sets, compares outputs using semantic similarity and LLM judges, and flags significant quality degradation with detailed diff reports.
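The comparison step described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual implementation: the names `embed`, `cosine`, `CaseResult`, and `compare_to_golden` are hypothetical, the 0.8 threshold is an arbitrary assumption, and a bag-of-words vector stands in for a real sentence embedding (for example one from sentence-transformers) so that the sketch runs with the standard library alone. The LLM-judge step is omitted.

```python
import difflib
import math
import re
from collections import Counter
from dataclasses import dataclass


def embed(text: str) -> Counter:
    """Stand-in embedding (assumption): lowercase bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class CaseResult:
    case_id: str
    similarity: float
    regressed: bool
    diff: str  # unified diff of golden vs. candidate output, empty if identical


def compare_to_golden(golden: dict[str, str],
                      candidate: dict[str, str],
                      threshold: float = 0.8) -> list[CaseResult]:
    """Score each candidate output against its golden output and flag drops."""
    results = []
    for case_id, expected in golden.items():
        actual = candidate.get(case_id, "")  # a missing output scores 0.0
        similarity = cosine(embed(expected), embed(actual))
        diff = "\n".join(difflib.unified_diff(
            expected.splitlines(), actual.splitlines(),
            fromfile="golden", tofile="candidate", lineterm=""))
        results.append(CaseResult(case_id, similarity, similarity < threshold, diff))
    return results


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    golden = {"capital": "Paris is the capital of France."}
    candidate = {"capital": "I cannot answer that question."}
    for result in compare_to_golden(golden, candidate):
        status = "REGRESSED" if result.regressed else "ok"
        print(f"{result.case_id}: {status} (similarity={result.similarity:.2f})")
        if result.diff:
            print(result.diff)
```

Replacing `embed` with a real embedding model would let paraphrases score as similar; token overlap, as used here, penalizes any rewording.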

https://aaas.blog/script/regression-testing-llm

Index Score: D (Poor)

Adoption: C+ · Quality: A · Freshness: A · Citations: F · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: golden-set-evaluation, semantic-comparison, llm-judging, regression-detection, diff-reporting
Integrations: openai, anthropic, sentence-transformers, pytest
Use Cases: model-update-validation, prompt-change-testing, quality-monitoring, deployment-gating
API Available: No
Language: python
Dependencies: openai, anthropic, sentence-transformers, pytest, numpy
Environment: Python 3.11+
Est. Runtime: 5-20 minutes depending on test set size
Tags: script, automation, regression, testing, quality
Added: 2026-03-17
Completeness: 80%
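For the deployment-gating use case and the pytest integration listed above, one plausible shape is a gate function wrapped in a test that pytest can collect. This is a sketch under stated assumptions: `gate`, `min_score`, and `max_fail_rate` are hypothetical names, the default thresholds are arbitrary, and the scores are made-up sample values rather than output from the script.

```python
def gate(similarities: dict[str, float],
         min_score: float = 0.8,
         max_fail_rate: float = 0.0) -> tuple[bool, list[str]]:
    """Return (passed, failing_case_ids) for per-case similarity scores.

    A case fails when its score is below min_score; the gate passes when
    the share of failing cases does not exceed max_fail_rate.
    """
    failing = sorted(cid for cid, score in similarities.items() if score < min_score)
    fail_rate = len(failing) / len(similarities) if similarities else 0.0
    return fail_rate <= max_fail_rate, failing


def test_no_regressions():
    # pytest collects functions named test_*; a failed assert fails the CI job.
    # Hypothetical scores; in practice they would come from the comparison step.
    scores = {"greeting": 0.97, "refund-policy": 0.91, "unit-conversion": 0.88}
    passed, failing = gate(scores)
    assert passed, f"regressed cases: {failing}"
```

Running `pytest` on a file containing this test returns a non-zero exit code when the gate fails, which is what a CI pipeline typically uses to block a deployment.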

