CoreWeave, the AI Hyperscaler™, has announced its latest MLPerf v5.0 results, establishing a new industry benchmark for AI inference. Leveraging NVIDIA’s powerful GB200 Grace Blackwell Superchips, CoreWeave achieved 800 tokens per second (TPS) on the open-source Llama 3.1 405B model, setting a high bar for large-model inference performance in the cloud. This milestone solidifies CoreWeave’s role as a top-tier cloud infrastructure provider whose platform is purpose-built to meet the demands of cutting-edge AI workloads.
Unmatched Performance with NVIDIA GB200 Superchips
In its MLPerf v5.0 submission, CoreWeave utilized a cloud instance powered by:
- 2x NVIDIA Grace CPUs
- 4x NVIDIA Blackwell GPUs
This setup delivered 800 TPS on Llama 3.1 405B—one of the most demanding open-source models to date. This performance leap showcases CoreWeave’s commitment to building infrastructure specifically optimized for inference at scale.
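For a rough sense of scale, 800 TPS aggregate across the instance works out to roughly 200 TPS per Blackwell GPU, assuming throughput is spread evenly across the four GPUs.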
“These MLPerf benchmark results reinforce CoreWeave’s position as a preferred cloud provider for leading AI labs and enterprises,” said Peter Salanki, Chief Technology Officer at CoreWeave.
40% Throughput Gain on NVIDIA H200 Instances
In addition to the GB200 benchmark, CoreWeave submitted MLPerf v5.0 results for NVIDIA H200 GPU instances, achieving:
- 33,000 TPS on the Llama 2 70B model
- A 40% increase in throughput over equivalent NVIDIA H100 deployments
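As a back-of-the-envelope check, a 40% gain implies the comparable H100 configuration delivered on the order of 33,000 / 1.4 ≈ 23,600 TPS on the same workload.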
These results demonstrate CoreWeave’s ability to deliver high-performance, cost-efficient AI infrastructure across various configurations and model types.
First to Market with GB200 NVL72 Availability
In 2024, CoreWeave became the first provider to offer general availability of NVIDIA GB200 NVL72-based instances. The company continues to lead the market by rapidly adopting and scaling the most advanced GPU architectures, including:
- Early access to NVIDIA H100 and H200
- One of the first to demo NVIDIA GB200 NVL72
This innovation pipeline ensures that CoreWeave clients stay ahead of the curve in performance, scalability, and time-to-deployment for LLMs and other inference-heavy AI applications.
Why MLPerf Matters
MLPerf Inference is the industry-standard benchmark suite for evaluating machine learning performance in real-world deployments. It measures how quickly and efficiently systems process inputs and deliver results using trained models—critical for applications in conversational AI, vision, search, and more.
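MLPerf publishes results from a standardized, audited harness; as a loose illustration only (not the MLPerf harness), the sketch below shows the basic idea behind a tokens-per-second measurement: time a batch of generation requests and divide the total generated tokens by wall-clock time. The `generate` callable and `dummy_generate` stand-in are hypothetical placeholders for a real inference endpoint.

```python
import time

def measure_tps(generate, prompts):
    """Rough tokens-per-second estimate: total generated tokens / wall-clock time.

    `generate` is a hypothetical placeholder for whatever inference call you use
    (an OpenAI-compatible client, a local runtime, etc.); it is assumed here to
    return the number of tokens produced for a single prompt.
    """
    start = time.perf_counter()
    total_tokens = sum(generate(p) for p in prompts)
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed

if __name__ == "__main__":
    # Dummy generator standing in for a real model server.
    def dummy_generate(prompt):
        time.sleep(0.01)   # simulate inference latency
        return 128         # pretend 128 tokens were generated

    tps = measure_tps(dummy_generate, ["hello"] * 20)
    print(f"Estimated throughput: {tps:.1f} tokens/sec")
```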
By consistently leading in MLPerf scores, CoreWeave proves its infrastructure delivers real-world speed, scalability, and reliability for enterprises and AI labs alike.
As AI model sizes continue to grow, efficient inference performance becomes critical to delivering real-time experiences and minimizing infrastructure costs. CoreWeave’s record-setting MLPerf v5.0 results—powered by NVIDIA’s most advanced silicon—highlight its dedication to optimizing for high-throughput, low-latency inference.
With a purpose-built cloud platform and early access to next-gen hardware, CoreWeave is redefining what’s possible in AI infrastructure and inference at scale.