GreenAI Benchmark
by Schwartz et al. / AI2 / University of Washington · open-source · Last verified 2026-03-17
GreenAI Benchmark evaluates the efficiency of AI training and inference by reporting accuracy alongside FLOPs, parameter counts, and CO2 emissions. It promotes an efficiency-aware reporting paradigm in which results published without their computational cost are considered incomplete science.
https://arxiv.org/abs/1907.10597
D—Poor
Adoption: C · Quality: B+ · Freshness: B · Citations: F · Engagement: F
Specifications
- License
- Apache-2.0
- Pricing
- open-source
- Capabilities
- evaluation, efficiency-measurement, flops-counting
- Integrations
- codecarbon, fvcore
- Use Cases
- model-evaluation, sustainable-ai, research-reporting
- API Available
- No
- Evaluated Models
- phi-3-mini, mistral-7b, gpt-4o, llama-3-70b
- Metrics
- accuracy, training-flops, inference-flops-per-token, co2-kg
- Methodology
- Standardized accuracy is measured on GLUE/SuperGLUE. Training and inference FLOPs are computed with fvcore. CO2 is estimated as hardware TDP × training time × PUE (power usage effectiveness) × grid carbon intensity. Results are plotted on efficiency Pareto frontiers.
- Last Run
- 2025-11-10
- Tags
- green-ai, efficiency, flops, sustainability, training
- Added
- 2026-03-17
- Completeness
- 80%
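The CO2 estimate and the Pareto-frontier step listed under Methodology can be sketched roughly as follows. This is a minimal illustration, not the benchmark's own code: the function and type names (`estimate_co2_kg`, `Result`, `pareto_frontier`) are hypothetical, the units (watts, hours, kg CO2 per kWh) are assumptions, and the `num_devices` multiplier is an added assumption for multi-accelerator runs that the listing does not mention.

```python
from typing import NamedTuple


def estimate_co2_kg(
    tdp_watts: float,
    hours: float,
    pue: float,
    carbon_intensity_kg_per_kwh: float,
    num_devices: int = 1,  # assumption: not part of the listed formula
) -> float:
    """Estimate emissions as TDP x training time x PUE x carbon intensity."""
    # Convert device power draw (W) and runtime (h) to facility energy (kWh).
    energy_kwh = (tdp_watts / 1000.0) * hours * num_devices * pue
    return energy_kwh * carbon_intensity_kg_per_kwh


class Result(NamedTuple):
    name: str
    cost: float      # e.g. training FLOPs or kg CO2; lower is better
    accuracy: float  # higher is better


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Keep results that no other result beats on both cost and accuracy."""
    # Sort by ascending cost; among equal costs, highest accuracy first.
    ordered = sorted(results, key=lambda r: (r.cost, -r.accuracy))
    frontier: list[Result] = []
    best_accuracy = float("-inf")
    for r in ordered:
        # A result survives only if it improves on every cheaper result.
        if r.accuracy > best_accuracy:
            frontier.append(r)
            best_accuracy = r.accuracy
    return frontier
```

Passing the CO2 figure as `cost` gives an accuracy-versus-emissions frontier; passing FLOPs gives an accuracy-versus-compute one. Real TDP, PUE, and carbon-intensity values depend on the hardware and data center and would need to be supplied by the user.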
Index Score
33
Adoption
44
Quality
77
Freshness
64
Citations
0
Engagement
0