FoundryOS
Deploy, unify and scale your local-AI infrastructure - powered by vLLM, SGLang or llama.cpp - with cloud-level reliability and the control, monitoring and privacy of self-hosting. Run it on private cloud, air-gapped systems or on-premises hardware.
Built in Go 1.24 for performance, FoundryOS uses a distributed architecture with three core components working in harmony to deliver enterprise-grade AI inference at scale.
Fleet
The central orchestration layer that coordinates your entire AI infrastructure. Fleet manages model deployments, load balancing and request routing across all nodes, with Redis-backed distributed state for consistency and resilience.
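To make the routing role concrete, here is a minimal sketch of one load-balancing policy an orchestrator like Fleet could apply - least-loaded routing across healthy nodes. The Node type and pickLeastLoaded function are illustrative assumptions, not FoundryOS's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical view of an inference node as an orchestrator
// might track it; FoundryOS's real internal types are not shown here.
type Node struct {
	ID       string
	Healthy  bool
	InFlight int // requests currently being served
}

// pickLeastLoaded returns the healthy node with the fewest in-flight
// requests - one simple load-balancing policy among several.
func pickLeastLoaded(nodes []Node) (Node, error) {
	best := -1
	for i, n := range nodes {
		if !n.Healthy {
			continue
		}
		if best == -1 || n.InFlight < nodes[best].InFlight {
			best = i
		}
	}
	if best == -1 {
		return Node{}, errors.New("no healthy nodes available")
	}
	return nodes[best], nil
}

func main() {
	nodes := []Node{
		{ID: "gpu-0", Healthy: true, InFlight: 12},
		{ID: "gpu-1", Healthy: true, InFlight: 3},
		{ID: "gpu-2", Healthy: false, InFlight: 0},
	}
	if n, err := pickLeastLoaded(nodes); err == nil {
		fmt.Println("route request to", n.ID) // route request to gpu-1
	}
}
```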
Scout
Lightweight monitoring agents deployed on each inference node. Scout collects real-time metrics including GPU utilisation, memory usage and latency percentiles (p50/p90/p99) for comprehensive observability.
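For reference, latency percentiles like those Scout reports can be computed with the nearest-rank method shown in this sketch; the percentile helper is an illustration, and Scout's actual aggregation may differ.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// percentile returns the nearest-rank percentile of a latency sample
// set. Nearest-rank is one common definition among several.
func percentile(samples []time.Duration, p float64) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := make([]time.Duration, len(samples))
	copy(sorted, samples)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(math.Ceil(p/100*float64(len(sorted)))) - 1
	if rank < 0 {
		rank = 0
	}
	return sorted[rank]
}

func main() {
	samples := []time.Duration{
		8 * time.Millisecond, 9 * time.Millisecond, 11 * time.Millisecond,
		12 * time.Millisecond, 15 * time.Millisecond, 40 * time.Millisecond,
	}
	for _, p := range []float64{50, 90, 99} {
		fmt.Printf("p%.0f = %v\n", p, percentile(samples, p))
	}
}
```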
Relay
High-performance gRPC communication layer enabling seamless inter-component messaging. Relay ensures low-latency coordination between Fleet and Scout agents with built-in circuit breakers and exponential backoff for resilience.
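The retry pattern Relay describes looks roughly like this sketch: exponential backoff with jitter around a fallible call, such as a gRPC request. The retryWithBackoff helper, delays and attempt counts are illustrative, not FoundryOS defaults.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// retryWithBackoff retries fn with exponential backoff and full jitter,
// doubling the delay cap after each failed attempt.
func retryWithBackoff(ctx context.Context, attempts int, base time.Duration, fn func() error) error {
	delay := base
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no sleep after the final attempt
		}
		// Full jitter: sleep a random duration up to the current cap.
		sleep := time.Duration(rand.Int63n(int64(delay) + 1))
		select {
		case <-time.After(sleep):
			delay *= 2
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("all %d attempts failed: %w", attempts, err)
}

func main() {
	calls := 0
	err := retryWithBackoff(context.Background(), 5, 50*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errors.New("transient RPC failure") // e.g. a failed gRPC call
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

Jittering the backoff avoids synchronised retry storms when many agents reconnect at once.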
FoundryOS is built for teams deploying AI infrastructure on internal clouds, private networks or air-gapped systems. Native support for vLLM, SGLang and llama.cpp, combined with robust management features, helps you deploy, manage and scale your AI workloads efficiently - with low overhead and high performance at scale.
FoundryOS unifies multiple inference backends and models under a single API with protocol-level translation. Applications and users can query using either OpenAI or Anthropic API formats - FoundryOS handles the translation at the protocol level, preserving streaming, tool use and all advanced features.
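As a simplified illustration of protocol-level translation, this sketch maps a text-only OpenAI-style chat request onto an Anthropic-style one by hoisting the system message into Anthropic's top-level field. The struct definitions are pared down for clarity; the real translation layer must also handle streaming events, tool use and structured content blocks.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified OpenAI-style chat completion request (text-only fields).
type OpenAIRequest struct {
	Model     string    `json:"model"`
	MaxTokens int       `json:"max_tokens"`
	Messages  []Message `json:"messages"`
}

type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// Simplified Anthropic-style messages request. Anthropic carries the
// system prompt as a top-level field rather than as a message role.
type AnthropicRequest struct {
	Model     string    `json:"model"`
	MaxTokens int       `json:"max_tokens"`
	System    string    `json:"system,omitempty"`
	Messages  []Message `json:"messages"`
}

// translate hoists the system message into Anthropic's top-level field
// and passes the remaining turns through unchanged.
func translate(in OpenAIRequest) AnthropicRequest {
	out := AnthropicRequest{Model: in.Model, MaxTokens: in.MaxTokens}
	for _, m := range in.Messages {
		if m.Role == "system" {
			out.System = m.Content
			continue
		}
		out.Messages = append(out.Messages, m)
	}
	return out
}

func main() {
	req := OpenAIRequest{
		Model:     "my-local-model",
		MaxTokens: 256,
		Messages: []Message{
			{Role: "system", Content: "You are concise."},
			{Role: "user", Content: "Summarise this log."},
		},
	}
	b, _ := json.MarshalIndent(translate(req), "", "  ")
	fmt.Println(string(b))
}
```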
FoundryOS unifies your AI models across multiple inference backends under a single API and management plane. Seamlessly switch between vLLM, SGLang and llama.cpp to validate which backend works best for your workload.
FoundryOS integrates natively with inference backends to provide monitoring and management capabilities.
FoundryOS provides intelligent health checks with circuit breakers, exponential backoff and gradual recovery to ensure your AI infrastructure stays online and available for your users, applications and customers.
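The circuit-breaker pattern behind these health checks can be sketched as a small state machine: closed while a node is healthy, open (failing fast) after repeated failures, and half-open to probe for recovery after a cooldown. The Breaker type and thresholds below are illustrative, and a production implementation would also need locking.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Breaker is a minimal circuit breaker with closed, open and half-open
// behaviour; thresholds here are illustrative, not FoundryOS defaults.
type Breaker struct {
	failures    int
	maxFailures int
	openedAt    time.Time
	cooldown    time.Duration
}

var ErrOpen = errors.New("circuit open: request rejected")

func (b *Breaker) Call(fn func() error) error {
	if b.failures >= b.maxFailures {
		if time.Since(b.openedAt) < b.cooldown {
			return ErrOpen // open: fail fast without hitting the node
		}
		// half-open: allow a single probe through to test recovery
	}
	if err := fn(); err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openedAt = time.Now() // (re)open the circuit
		}
		return err
	}
	b.failures = 0 // success closes the circuit again
	return nil
}

func main() {
	b := &Breaker{maxFailures: 3, cooldown: 2 * time.Second}
	fail := func() error { return errors.New("node unhealthy") }
	for i := 0; i < 5; i++ {
		fmt.Println(b.Call(fail)) // three failures, then fast rejections
	}
}
```

Failing fast while open is what keeps an unhealthy node from dragging down callers; the half-open probe is the gradual-recovery step.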
Get complete visibility into your AI infrastructure. Scout agents collect detailed metrics from every node, providing real-time insights into performance and resource utilisation.
FoundryOS provides enterprise-grade security controls with Redis-backed distributed state management for consistent configuration and access control across your entire AI infrastructure.
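As a sketch of how Redis-backed shared state works in practice, here is a minimal example using the go-redis client: any control-plane instance that writes a value can rely on every other instance reading the same one. The key layout is hypothetical, not FoundryOS's actual schema.

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	// Connect to the Redis instance backing the control plane.
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Write a piece of shared deployment state; this key layout is a
	// hypothetical example, not FoundryOS's actual schema.
	if err := rdb.Set(ctx, "deployments:llama-3:replicas", 4, 0).Err(); err != nil {
		panic(err)
	}

	// Any other control-plane instance can now read the same value, so
	// configuration and access-control decisions stay consistent.
	replicas, err := rdb.Get(ctx, "deployments:llama-3:replicas").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("replicas:", replicas)
}
```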
Be among the first to deploy enterprise AI infrastructure when FoundryOS launches in Q2 2026.