Benchmark · LLMs · v4.1

MLPerf Inference

by MLCommons · open-source · Last verified 2026-03-17

MLPerf Inference is the industry-standard benchmark for measuring AI inference performance across hardware platforms. It covers image classification, object detection, NLP, speech recognition, and generative AI workloads, enabling fair apples-to-apples comparison of accelerators and inference stacks.

https://mlcommons.org/benchmarks/inference/
Overall Grade: B+ (Good)

Adoption: A · Quality: A+ · Freshness: A · Citations: A · Engagement: F

Specifications

License
Apache-2.0
Pricing
open-source
Capabilities
evaluation, inference-benchmarking, hardware-evaluation
Integrations
tensorrt, onnx, triton
Use Cases
model-evaluation, hardware-selection, inference-optimization
API Available
No
Evaluated Models
bert-large, llama-3-70b, stable-diffusion-xl, resnet-50
Metrics
samples-per-second, latency-ms, tokens-per-second
Methodology
Standardized scenarios (single-stream, multi-stream, server, offline) are run on physical hardware with reproducible software stacks. Results are submitted to MLCommons and peer-reviewed before publication. Closed and open divisions permit different levels of optimization.
Last Run
2026-02-25
Tags
inference, throughput, latency, hardware, mlcommons
Added
2026-03-17
Completeness
100%
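The metrics listed above (samples-per-second, latency-ms, tokens-per-second) are all derived from per-query timing records collected by the benchmark harness. As a minimal sketch of how such a summary might be computed, here is a hypothetical helper (not part of the official MLPerf LoadGen API) that reduces a list of per-query latencies to those three metric names:

```python
from statistics import quantiles

def summarize_run(latencies_s, tokens_per_query=None):
    """Summarize per-query wall-clock latencies (in seconds) into the
    metric names listed in the Specifications table. Hypothetical
    illustration only; real MLPerf results come from LoadGen logs."""
    n = len(latencies_s)
    total_s = sum(latencies_s)  # total serial time, as in single-stream
    # 90th-percentile latency: the last of the 9 decile cut points.
    p90_s = quantiles(latencies_s, n=10)[-1]
    summary = {
        "samples-per-second": n / total_s,
        "latency-ms": p90_s * 1000.0,
    }
    if tokens_per_query is not None:
        # Generative workloads additionally report token throughput.
        summary["tokens-per-second"] = sum(tokens_per_query) / total_s
    return summary
```

For example, two queries of 0.5 s each yield 2.0 samples-per-second; with 100 generated tokens per query, 200.0 tokens-per-second. Which statistic is reported (mean, P90, P99) varies by scenario in the actual benchmark rules.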

Index Score

73.1
Adoption
85
Quality
93
Freshness
85
Citations
82
Engagement
0
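The index score above presumably aggregates the five 0-100 component scores, but the page does not publish the formula. A minimal sketch of one plausible aggregation, a weighted average with assumed weights, is shown below; note that equal weighting yields 69.0, not the listed 73.1, so the site's actual formula must weight components differently:

```python
def index_score(components, weights):
    """Weighted average of 0-100 component scores.
    The weights are illustrative assumptions; the site's real
    formula is not published on this page."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(components[k] * w for k, w in weights.items())

components = {"adoption": 85, "quality": 93, "freshness": 85,
              "citations": 82, "engagement": 0}
# Hypothetical equal weighting across the five components:
equal_weights = {k: 0.2 for k in components}
```

Usage: `index_score(components, equal_weights)` returns 69.0 under equal weighting.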
