Rafay Systems, a leader in cloud-native and AI infrastructure orchestration, has announced the general availability of its Serverless Inference offering. This token-metered API allows organizations to run open-source and privately trained large language models (LLMs) without the overhead of managing GPU infrastructure. Rafay’s platform streamlines the AI application development process, offering multi-tenant, Platform-as-a-Service (PaaS) solutions to customers across the growing GenAI market.
1. The Serverless Inference Advantage
- Simplified AI Application Development: Serverless Inference enables rapid deployment of GenAI models as a service. This makes it easier for organizations to integrate advanced AI capabilities into their applications without dealing with the complexities of infrastructure setup or management.
- Free for NCPs and GPU Clouds: Rafay’s offering is available at no additional cost to NCPs and GPU Cloud providers, empowering them to expand their service offerings and cater to the booming AI inference market.
- Growth of the AI Inference Market: The global AI inference market is projected to reach $106 billion by 2025 and $254 billion by 2030, representing a massive opportunity for GPU Cloud Providers and NCPs.
2. Key Features of Rafay’s Serverless Inference
- Developer-Friendly Integration: The platform is OpenAI-compatible, offering secure RESTful APIs with streaming support, so developers can integrate GenAI workflows quickly, often without changing existing OpenAI-based application code.
- Intelligent Infrastructure Management: Rafay automatically manages GPU nodes and optimizes resource allocation, with both multi-tenant and dedicated isolation options, delivering efficient performance without over-provisioning.
- Comprehensive Billing and Metering: The token-based billing system tracks both input and output usage, providing granular analytics and integration with existing billing platforms for a transparent, consumption-based pricing model.
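Because the API is OpenAI-compatible, the request and response shapes follow the familiar chat-completions format, and the `usage` object in each response carries the input/output token counts that consumption-based billing relies on. The sketch below is illustrative only: the endpoint, key, and model name are hypothetical placeholders, and the response is a hard-coded sample in the OpenAI-compatible format rather than a live call.

```python
# Hypothetical endpoint, key, and model; substitute the values supplied
# by your Rafay-backed GPU cloud provider.
BASE_URL = "https://inference.example-gpu-cloud.com/v1"
API_KEY = "rfy_example_bearer_token"


def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build an OpenAI-compatible /chat/completions request payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def extract_token_usage(response: dict) -> tuple[int, int]:
    """Pull (input, output) token counts from an OpenAI-style response body."""
    usage = response.get("usage", {})
    return usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0)


# Sample response body in the OpenAI-compatible format (not a live reply).
sample_response = {
    "choices": [{"message": {"role": "assistant", "content": "Hello!"}}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 4, "total_tokens": 16},
}

payload = build_chat_request("llama-3-8b-instruct", "Say hello.")
prompt_toks, completion_toks = extract_token_usage(sample_response)
print(prompt_toks, completion_toks)  # 12 4
```

Because the payload matches the OpenAI wire format, existing OpenAI client libraries can typically be pointed at such an endpoint by overriding the base URL and API key.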
3. Enterprise-Grade Security and Governance
- Security Controls: Rafay’s platform ensures robust protection with HTTPS-only API endpoints, rotating bearer-token authentication, and configurable token quotas per team or application, meeting enterprise-level security and compliance requirements.
- Performance and Observability: Rafay offers comprehensive performance monitoring and logging, with support for object storage systems like MinIO and Weka, enabling end-to-end visibility into both infrastructure and model performance.
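The per-team token quotas described above amount to metered ceilings on combined input and output tokens. As a minimal sketch, assuming quotas are simple token ceilings tracked per team (Rafay's actual enforcement is platform-side and more sophisticated), the accounting logic looks like this:

```python
from collections import defaultdict


class TokenQuota:
    """Toy per-team token quota tracker; limits map team name -> token ceiling."""

    def __init__(self, limits: dict[str, int]):
        self.limits = limits
        self.used = defaultdict(int)  # tokens consumed so far, per team

    def record(self, team: str, input_tokens: int, output_tokens: int) -> bool:
        """Record a request's usage; return False if it would exceed the quota."""
        total = input_tokens + output_tokens
        if self.used[team] + total > self.limits.get(team, 0):
            return False
        self.used[team] += total
        return True


quota = TokenQuota({"ml-team": 1_000})
print(quota.record("ml-team", 400, 100))  # True: 500 of 1000 used
print(quota.record("ml-team", 600, 0))    # False: would exceed the ceiling
```

Counting both input and output tokens against the ceiling mirrors the billing model described above, where both sides of each request are metered.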
4. Empowering NCPs and GPU Clouds to Shift to AI-as-a-Service
- Rafay is pioneering the shift from GPU-as-a-Service to AI-as-a-Service. By offering Serverless Inference, NCPs and GPU Cloud providers can give customers scalable, cost-effective access to GenAI capabilities, streamlining the adoption of AI across industries.
Rafay Systems’ Serverless Inference offering is a game-changer for GPU Cloud Providers and NCPs looking to capitalize on the rapidly growing AI inference market. By providing a seamless, secure, and cost-effective solution for running GenAI models, Rafay enables developers to focus on building applications rather than managing infrastructure. This innovation positions Rafay as a leader in AI infrastructure orchestration, advancing the future of AI-as-a-Service.