GreenAI Benchmark
by Schwartz et al. / AI2 / University of Washington · open-source · Last verified 2026-03-17
GreenAI Benchmark evaluates the efficiency of AI training and inference by reporting accuracy alongside FLOPs, parameter counts, and CO2 emissions. It promotes the Green AI paradigm, in which reporting results without their computational cost is considered incomplete science.
https://arxiv.org/abs/1907.10597
Overall: C (Below Average)
Adoption: C · Quality: B+ · Freshness: B · Citations: B · Engagement: F
Specifications
- License: Apache-2.0
- Pricing: open-source
- Capabilities: evaluation, efficiency-measurement, flops-counting
- Integrations: codecarbon, fvcore
- Use Cases: model-evaluation, sustainable-ai, research-reporting
- API Available: No
- Evaluated Models: phi-3-mini, mistral-7b, gpt-4o, llama-3-70b
- Metrics: accuracy, training-flops, inference-flops-per-token, co2-kg
- Methodology: Standardized accuracy measured on GLUE/SuperGLUE. Training and inference FLOPs computed with fvcore. CO2 estimated as hardware TDP × training time × PUE × grid carbon intensity. Results plotted on efficiency Pareto frontiers.
- Last Run: 2025-11-10
- Tags: green-ai, efficiency, flops, sustainability, training
- Added: 2026-03-17
- Completeness: 100%
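The methodology above combines a simple energy-to-carbon formula (TDP × training time × PUE × carbon intensity) with a Pareto-frontier view of accuracy versus cost. A minimal sketch of both, assuming illustrative hardware numbers; the function names and example values here are mine, not part of GreenAI Benchmark's actual API:

```python
def estimate_co2_kg(tdp_watts, training_hours, pue, carbon_intensity_kg_per_kwh):
    """Estimate training emissions in kg CO2, following the stated
    methodology: hardware TDP x training time x PUE x grid carbon
    intensity. All parameter values are caller-supplied assumptions."""
    energy_kwh = (tdp_watts / 1000.0) * training_hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh


def pareto_frontier(points):
    """Keep the non-dominated (cost, accuracy) points: a point survives
    unless some other point is at least as cheap AND at least as accurate,
    and strictly better on one axis."""
    return [
        (cost, acc)
        for cost, acc in points
        if not any(
            c <= cost and a >= acc and (c < cost or a > acc)
            for c, a in points
        )
    ]


# Example: 8 GPUs at 400 W TDP for 24 h, PUE 1.1, grid at 0.4 kg CO2/kWh
co2 = estimate_co2_kg(
    tdp_watts=8 * 400, training_hours=24, pue=1.1,
    carbon_intensity_kg_per_kwh=0.4,
)

# Hypothetical (training-FLOPs, accuracy) points; the middle entry
# (3, 0.75) is dominated by (2, 0.80) and drops off the frontier.
frontier = pareto_frontier([(1, 0.70), (2, 0.80), (3, 0.75), (4, 0.90)])
```

This mirrors how the benchmark's Pareto plots are read: a model off the frontier is strictly worse than some alternative on both accuracy and cost.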
Index Score: 48.5
- Adoption: 44
- Quality: 77
- Freshness: 64
- Citations: 62
- Engagement: 0