F5 has announced powerful new capabilities for its BIG-IP Next for Kubernetes platform, now running natively on NVIDIA BlueField-3 DPUs using the NVIDIA DOCA software framework. The integration brings enterprise-grade performance, multi-tenancy, and security to large-scale AI infrastructure, validated by Sesterce, a European operator focused on sovereign AI and next-gen infrastructure.
This collaboration marks a significant step toward optimizing GPU utilization, streamlining AI inference pipelines, and strengthening the security and performance of LLM traffic, all in real time.
Highlights of the F5 + NVIDIA Integration
1. LLM Traffic Optimization and Smart Routing
With the explosion of multi-model AI architectures, routing AI tasks efficiently is critical. BIG-IP Next, deployed on BlueField-3 DPUs, enables:
- Intelligent LLM routing that directs simple tasks to lightweight models and reserves high-capacity models for complex queries (see the sketch after this list).
- Integration with NVIDIA NIM microservices for seamless workload orchestration.
- Latency reduction and improved time-to-first-token via real-time traffic shaping.
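To make the routing idea concrete, here is a minimal Python sketch of complexity-based model selection. The endpoints, threshold, and scoring heuristic are illustrative assumptions, not F5's implementation; in the actual product, the routing logic is programmed into BIG-IP Next running on the BlueField-3 DPU.

```python
# Illustrative sketch of complexity-based LLM routing. The endpoints,
# threshold, and scoring heuristic are hypothetical; F5's routing logic
# runs inside BIG-IP Next on the BlueField-3 DPU, not in application code.

LIGHTWEIGHT_MODEL = "http://small-llm.models.svc/v1"    # hypothetical endpoint
HIGH_CAPACITY_MODEL = "http://large-llm.models.svc/v1"  # hypothetical endpoint

def estimate_complexity(prompt: str) -> float:
    """Crude proxy for task complexity: long, multi-step prompts score higher."""
    length_score = min(len(prompt) / 2000, 1.0)
    step_score = 0.3 if any(k in prompt.lower()
                            for k in ("step by step", "analyze", "prove")) else 0.0
    return min(length_score + step_score, 1.0)

def route(prompt: str) -> str:
    """Send simple tasks to the lightweight model, complex ones to the large one."""
    return HIGH_CAPACITY_MODEL if estimate_complexity(prompt) > 0.5 else LIGHTWEIGHT_MODEL
```

In practice the classifier could range from simple header inspection to a small scoring model; the point is that the decision happens at the traffic layer, before any GPU cycle is spent.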
“By programming routing logic directly on NVIDIA BlueField-3 DPUs, F5 BIG-IP Next for Kubernetes is the most efficient approach for delivering and securing LLM traffic,” said Kunal Anand, Chief Innovation Officer, F5.
2. Integration with NVIDIA Dynamo and KV Cache Manager
To meet the compute demands of distributed inference at scale, the platform leverages NVIDIA Dynamo and KV Cache Manager to:
- Reduce recomputation latency through cache-based inference memory management (sketched after this list).
- Boost GPU efficiency by offloading memory routing logic to DPUs.
- Enable real-time LLM reasoning with cost-effective data storage alternatives to high-end GPU memory.
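The recomputation saving comes from reusing attention key/value (KV) state for prompt prefixes the system has already processed. The sketch below shows the idea with a simple in-process LRU cache; the class and eviction policy are illustrative assumptions, whereas Dynamo's KV Cache Manager additionally tiers entries across GPU memory, host memory, and storage.

```python
# Simplified sketch of prefix-based KV-cache reuse. The class and LRU
# policy are illustrative; NVIDIA Dynamo's KV Cache Manager tiers cached
# state across GPU, host, and storage memory.
from collections import OrderedDict

class KVCache:
    """Maps a prompt prefix to its precomputed attention KV state."""
    def __init__(self, capacity: int = 1024):
        self.entries: OrderedDict[str, object] = OrderedDict()
        self.capacity = capacity

    def lookup(self, prompt: str):
        """Return (prefix, kv_state) for the longest cached prefix, else None."""
        for prefix in sorted(self.entries, key=len, reverse=True):
            if prompt.startswith(prefix):
                self.entries.move_to_end(prefix)   # refresh LRU position
                return prefix, self.entries[prefix]
        return None  # cache miss: full prefill recomputation required

    def store(self, prefix: str, kv_state: object) -> None:
        self.entries[prefix] = kv_state
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)       # evict least recently used
```

On a cache hit, the model only needs to prefill the uncached suffix of the prompt, which is where the latency and GPU savings come from.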
“BIG-IP Next for Kubernetes gives enterprises a single point of control for routing AI traffic—from model training to agentic inference,” noted Ash Bhalgat of NVIDIA.
3. Model Context Protocol (MCP) Security Enhancements
With increasing adoption of the Model Context Protocol (MCP), an Anthropic-led open standard, deploying F5 as a reverse proxy in front of MCP servers adds:
- Application-layer protections for MCP servers.
- Traffic filtering, threat mitigation, and adaptive rule enforcement via F5 iRules (illustrated after this list).
- Enhanced security posture for agentic AI and RAG-based systems.
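MCP messages are JSON-RPC 2.0, which is what makes application-layer inspection at a reverse proxy practical. The Python sketch below shows the kind of policy such a proxy can enforce; the allowlist and blocked tool name are illustrative assumptions, and F5's deployment expresses comparable logic in iRules rather than Python.

```python
# Hedged sketch of application-layer MCP request filtering at a reverse
# proxy. Method names come from the MCP (JSON-RPC 2.0) spec; the
# allowlist and blocked tool are hypothetical policy choices, and F5's
# deployment implements equivalent checks in iRules on the DPU.
import json

ALLOWED_METHODS = {"initialize", "tools/list", "tools/call", "resources/list"}
BLOCKED_TOOLS = {"shell_exec"}   # hypothetical tool name denied by policy

def filter_mcp_request(raw_body: bytes) -> tuple[bool, str]:
    """Return (allow, reason) for an inbound MCP JSON-RPC request."""
    try:
        msg = json.loads(raw_body)
    except json.JSONDecodeError:
        return False, "malformed JSON-RPC payload"
    if not isinstance(msg, dict):
        return False, "unsupported message shape"
    method = msg.get("method", "")
    if method not in ALLOWED_METHODS:
        return False, f"method not permitted: {method}"
    if method == "tools/call" and msg.get("params", {}).get("name") in BLOCKED_TOOLS:
        return False, "tool call blocked by policy"
    return True, "ok"
```

Because the proxy sits in front of the MCP server, requests that fail these checks never reach the agentic backend.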
“F5 and NVIDIA are delivering AI security and programmability that we’re not seeing elsewhere in the industry,” said Greg Schoeny, SVP at World Wide Technology.
Sesterce Validation: Enterprise-Ready, Performance-Proven
Sesterce’s deployment validated the full stack, including NVIDIA Dynamo and F5 BIG-IP Next for Kubernetes, and achieved:
- 20% improvement in GPU utilization
- Seamless scaling and LLM task routing with Kubernetes ingress/egress control
- Secure multi-tenant environments ideal for sovereign AI deployments in Europe
“This approach empowers us to optimize GPU use while delivering unique value to our customers,” said Youssef El Manssouri, CEO of Sesterce.
Availability and What’s Next
The solution is generally available today, with more innovation expected as enterprises build out agentic AI and sovereign cloud environments. F5 and NVIDIA are showcasing the technology at NVIDIA GTC Paris, part of VivaTech 2025.