
GreenAI Benchmark

by Schwartz et al. / AI2 / University of Washington · open-source · Last verified 2026-03-17

GreenAI Benchmark evaluates the efficiency of AI training and inference by reporting accuracy alongside FLOPs, parameter counts, and CO2 emissions. It promotes the Green AI paradigm, in which reporting results without their computational cost is treated as incomplete science.

https://arxiv.org/abs/1907.10597
Overall Grade: C (Below Average)
Adoption: C · Quality: B+ · Freshness: B · Citations: B · Engagement: F

Specifications

License
Apache-2.0
Pricing
open-source
Capabilities
evaluation, efficiency-measurement, flops-counting
Integrations
codecarbon, fvcore
Use Cases
model-evaluation, sustainable-ai, research-reporting
API Available
No
Evaluated Models
phi-3-mini, mistral-7b, gpt-4o, llama-3-70b
Metrics
accuracy, training-flops, inference-flops-per-token, co2-kg
Methodology
Standardized accuracy is measured on GLUE/SuperGLUE. Training and inference FLOPs are computed with fvcore. CO2 is estimated as hardware TDP × training time × PUE × grid carbon intensity. Results are plotted on efficiency Pareto frontiers.
Last Run
2025-11-10
Tags
green-ai, efficiency, flops, sustainability, training
Added
2026-03-17
Completeness
100%
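The Methodology above combines a simple emissions formula (TDP × training time × PUE × carbon intensity) with Pareto-frontier plots of accuracy against compute. A minimal sketch of both, assuming illustrative function and parameter names not taken from the benchmark's codebase:

```python
def co2_kg(tdp_watts, hours, pue, carbon_intensity_kg_per_kwh):
    """Estimate emissions as TDP x time x PUE x grid carbon intensity.

    Names and units here are illustrative assumptions: power in watts,
    time in hours, carbon intensity in kg CO2 per kWh.
    """
    energy_kwh = (tdp_watts / 1000.0) * hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh


def pareto_front(points):
    """Keep (flops, accuracy) points not dominated by any other point.

    A point is dominated when some other point has FLOPs <= its FLOPs
    and accuracy >= its accuracy (and differs in at least one value).
    """
    return [
        (f, a)
        for f, a in points
        if not any(
            f2 <= f and a2 >= a and (f2, a2) != (f, a)
            for f2, a2 in points
        )
    ]


# Example: 8 GPUs at 400 W each for 24 h, PUE 1.5, 0.4 kg CO2/kWh
print(co2_kg(8 * 400, 24, 1.5, 0.4))  # ~46 kg CO2

# Example: the (3e9, 0.85) model is dominated by the (2e9, 0.90) one
print(pareto_front([(1e9, 0.80), (2e9, 0.90), (3e9, 0.85)]))
```

The frontier check is quadratic in the number of models, which is fine for benchmark-sized lists; sorting by FLOPs first would allow a linear sweep if needed.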

Index Score

48.5
Adoption
44
Quality
77
Freshness
64
Citations
62
Engagement
0
