
GreenAI Benchmark

by Schwartz et al. / AI2 / University of Washington · open-source · Last verified 2026-03-17

GreenAI Benchmark evaluates the efficiency of AI training and inference by reporting accuracy alongside FLOPs, parameter counts, and CO2 emissions. It promotes the Green AI paradigm, in which reporting results without their computational cost is treated as incomplete science.

https://arxiv.org/abs/1907.10597
Overall Grade: C (Below Average)
Adoption: C · Quality: B+ · Freshness: B · Citations: B · Engagement: F

Specifications

License
Apache-2.0
Pricing
open-source
Capabilities
evaluation, efficiency-measurement, flops-counting
Integrations
codecarbon, fvcore
Use Cases
model-evaluation, sustainable-ai, research-reporting
API Available
No
Evaluated Models
phi-3-mini, mistral-7b, gpt-4o, llama-3-70b
Metrics
accuracy, training-flops, inference-flops-per-token, co2-kg
Methodology
Standardized accuracy is measured on GLUE/SuperGLUE. Training and inference FLOPs are computed with fvcore. CO2 is estimated as hardware TDP × training time × PUE × grid carbon intensity. Results are plotted on efficiency Pareto frontiers.
Last Run
2025-11-10
Tags
green-ai, efficiency, flops, sustainability, training
Added
2026-03-17
Completeness
100%
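The Methodology above combines a simple emissions formula (TDP × training time × PUE × carbon intensity) with Pareto-frontier plots of accuracy against compute. A minimal sketch of both, assuming illustrative function and parameter names not taken from the benchmark's codebase:

```python
def co2_kg(tdp_watts, hours, pue, carbon_intensity_kg_per_kwh):
    """Estimate emissions as TDP x time x PUE x grid carbon intensity.

    Names and units here are illustrative assumptions: power in watts,
    time in hours, carbon intensity in kg CO2 per kWh.
    """
    energy_kwh = (tdp_watts / 1000.0) * hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh


def pareto_front(points):
    """Keep (flops, accuracy) points not dominated by any other point.

    A point is dominated when some other point has FLOPs <= its FLOPs
    and accuracy >= its accuracy (and differs in at least one value).
    """
    return [
        (f, a)
        for f, a in points
        if not any(
            f2 <= f and a2 >= a and (f2, a2) != (f, a)
            for f2, a2 in points
        )
    ]


# Example: 8 GPUs at 400 W each for 24 h, PUE 1.5, 0.4 kg CO2/kWh
print(co2_kg(8 * 400, 24, 1.5, 0.4))  # ~46 kg CO2

# Example: the (3e9, 0.85) model is dominated by the (2e9, 0.90) one
print(pareto_front([(1e9, 0.80), (2e9, 0.90), (3e9, 0.85)]))
```

The frontier check is quadratic in the number of models, which is fine for benchmark-sized lists; sorting by FLOPs first would allow a linear sweep if needed.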

Index Score

48.5
Adoption
44
Quality
77
Freshness
64
Citations
62
Engagement
0
