Benchmark · LLMs · v4.1

MLPerf Inference

by MLCommons · open-source · Last verified 2026-03-17

MLPerf Inference is the industry-standard benchmark for measuring AI inference performance across hardware platforms. It covers image classification, object detection, NLP, speech recognition, and generative AI workloads, enabling fair apples-to-apples comparison of accelerators and inference stacks.

https://mlcommons.org/benchmarks/inference/
Overall Grade: B+ (Good)

Adoption: A · Quality: A+ · Freshness: A · Citations: A · Engagement: F

Specifications

License
Apache-2.0
Pricing
open-source
Capabilities
evaluation, inference-benchmarking, hardware-evaluation
Integrations
tensorrt, onnx, triton
Use Cases
model-evaluation, hardware-selection, inference-optimization
API Available
No
Evaluated Models
bert-large, llama-3-70b, stable-diffusion-xl, resnet-50
Metrics
samples-per-second, latency-ms, tokens-per-second
Methodology
Standardized scenarios (single-stream, multi-stream, server, offline) are run on physical hardware with reproducible software stacks. Results are submitted to MLCommons and peer-reviewed before publication. Closed and open divisions permit different levels of optimization.
Last Run
2026-02-25
Tags
inference, throughput, latency, hardware, mlcommons
Added
2026-03-17
Completeness
100%
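The metrics listed above (samples-per-second, latency-ms, tokens-per-second) are all derived from per-query timing records collected by the benchmark harness. As a minimal sketch of how such a summary might be computed, here is a hypothetical helper (not part of the official MLPerf LoadGen API) that reduces a list of per-query latencies to those three metric names:

```python
from statistics import quantiles

def summarize_run(latencies_s, tokens_per_query=None):
    """Summarize per-query wall-clock latencies (in seconds) into the
    metric names listed in the Specifications table. Hypothetical
    illustration only; real MLPerf results come from LoadGen logs."""
    n = len(latencies_s)
    total_s = sum(latencies_s)  # total serial time, as in single-stream
    # 90th-percentile latency: the last of the 9 decile cut points.
    p90_s = quantiles(latencies_s, n=10)[-1]
    summary = {
        "samples-per-second": n / total_s,
        "latency-ms": p90_s * 1000.0,
    }
    if tokens_per_query is not None:
        # Generative workloads additionally report token throughput.
        summary["tokens-per-second"] = sum(tokens_per_query) / total_s
    return summary
```

For example, two queries of 0.5 s each yield 2.0 samples-per-second; with 100 generated tokens per query, 200.0 tokens-per-second. Which statistic is reported (mean, P90, P99) varies by scenario in the actual benchmark rules.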

Index Score

73.1
Adoption
85
Quality
93
Freshness
85
Citations
82
Engagement
0
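The index score above presumably aggregates the five 0-100 component scores, but the page does not publish the formula. A minimal sketch of one plausible aggregation, a weighted average with assumed weights, is shown below; note that equal weighting yields 69.0, not the listed 73.1, so the site's actual formula must weight components differently:

```python
def index_score(components, weights):
    """Weighted average of 0-100 component scores.
    The weights are illustrative assumptions; the site's real
    formula is not published on this page."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(components[k] * w for k, w in weights.items())

components = {"adoption": 85, "quality": 93, "freshness": 85,
              "citations": 82, "engagement": 0}
# Hypothetical equal weighting across the five components:
equal_weights = {k: 0.2 for k in components}
```

Usage: `index_score(components, equal_weights)` returns 69.0 under equal weighting.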
