Benchmark · LLMs · v1.0

GreenAI Benchmark

by Schwartz et al. / AI2 / University of Washington · open-source · Last verified 2026-03-17

GreenAI Benchmark evaluates the efficiency of AI training and inference by reporting accuracy alongside FLOPs, parameter counts, and CO2 emissions. It promotes an efficiency-aware reporting standard under which results published without their computational cost are considered incomplete science.

https://arxiv.org/abs/1907.10597
Overall grade: D (Poor)
Adoption: C · Quality: B+ · Freshness: B · Citations: F · Engagement: F

Specifications

License: Apache-2.0
Pricing: open-source
Capabilities: evaluation, efficiency-measurement, flops-counting
Integrations: codecarbon, fvcore
Use Cases: model-evaluation, sustainable-ai, research-reporting
API Available: No
Evaluated Models: phi-3-mini, mistral-7b, gpt-4o, llama-3-70b
Metrics: accuracy, training-flops, inference-flops-per-token, co2-kg
Methodology: Standardized accuracy is measured on GLUE/SuperGLUE. Training and inference FLOPs are computed with fvcore. CO2 is estimated as hardware TDP × training time × PUE (power usage effectiveness) × carbon intensity. Results are plotted on efficiency Pareto frontiers.
Last Run: 2025-11-10
Tags: green-ai, efficiency, flops, sustainability, training
Added: 2026-03-17
Completeness: 80%
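
The CO2 estimate and the Pareto-frontier step in the methodology can be illustrated with a minimal Python sketch. It is not the benchmark's own code. The function names, the units (TDP in watts, time in hours, carbon intensity in kg CO2 per kWh), and the (name, accuracy, cost) tuple layout are assumptions made for illustration; the listing does not specify them.

```python
def estimate_co2_kg(tdp_watts, training_hours, pue, carbon_intensity_kg_per_kwh):
    """Estimate emissions as TDP x training time x PUE x carbon intensity.

    Assumed units (not stated in the listing): watts, hours,
    a dimensionless PUE >= 1, and kg CO2 per kWh.
    """
    if tdp_watts < 0 or training_hours < 0 or carbon_intensity_kg_per_kwh < 0:
        raise ValueError("TDP, time, and carbon intensity must be non-negative")
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    # Convert watts to kilowatts so that energy comes out in kWh.
    energy_kwh = (tdp_watts / 1000.0) * training_hours
    # PUE scales device energy up to total facility energy.
    return energy_kwh * pue * carbon_intensity_kg_per_kwh


def pareto_frontier(results):
    """Return names of entries that no other entry dominates.

    `results` is an iterable of (name, accuracy, cost) tuples, where higher
    accuracy is better and lower cost (e.g. FLOPs or kg CO2) is better.
    Exact duplicates in both accuracy and cost keep only the first entry.
    """
    # Sort by cost ascending; break ties by accuracy descending.
    ordered = sorted(results, key=lambda r: (r[2], -r[1]))
    frontier = []
    best_accuracy = float("-inf")
    for name, accuracy, _cost in ordered:
        # An entry survives only if it beats every cheaper entry on accuracy.
        if accuracy > best_accuracy:
            frontier.append(name)
            best_accuracy = accuracy
    return frontier


if __name__ == "__main__":
    # Placeholder inputs, not measurements from the benchmark.
    print(estimate_co2_kg(400, 100, 1.5, 0.5))
    runs = [("run-a", 0.80, 1.0), ("run-b", 0.85, 2.0),
            ("run-c", 0.82, 3.0), ("run-d", 0.90, 4.0)]
    print(pareto_frontier(runs))
```

Because TDP is a rated upper bound rather than measured power draw, an estimate of this form is a rough ceiling for the device's share of the energy; a tracker such as codecarbon, listed under Integrations, measures consumption during the run instead.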

Index Score

Overall: 33
Adoption: 44
Quality: 77
Freshness: 64
Citations: 0
Engagement: 0
