Coming Q2 2026
FoundryOS

Scalable. Secure. Private.

Deploy, unify and scale your local-AI infrastructure - powered by vLLM, SGLang or llama.cpp - with cloud-level reliability and the control, monitoring and privacy of self-hosting. Run it on private cloud, air-gapped systems or on-premises hardware.

[Diagram: clients such as Cursor, Claude Code, Open WebUI and your own app connect via the OpenAI or Anthropic API to FoundryOS, which provides API translation, load balancing, health monitoring and model-aware request routing across vLLM, SGLang and llama.cpp backend instances.]

Architecture Overview

FoundryOS is built on a distributed architecture with three core components working in harmony to deliver enterprise-grade AI inference at scale. Built with Go 1.24 for maximum performance.

Fleet

The central orchestration layer that coordinates your entire AI infrastructure. Fleet manages model deployments, load balancing and request routing across all nodes with Redis-backed distributed state for consistency and resilience.

Scout

Lightweight monitoring agents deployed on each inference node. Scout collects real-time metrics including GPU utilisation, memory usage and latency percentiles (p50/p90/p99) for comprehensive observability.

Relay

High-performance gRPC communication layer enabling seamless inter-component messaging. Relay ensures low-latency coordination between Fleet and Scout agents with built-in circuit breakers and exponential backoff for resilience.
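As an illustration of the exponential backoff mentioned above, here is a minimal Go sketch of a capped doubling delay schedule; the function name and parameters are assumptions for the example, not FoundryOS's actual implementation (production code would typically also add random jitter to avoid synchronised retries):

```go
package main

import (
	"fmt"
	"time"
)

// retryDelay returns the wait before retry number `attempt` (0-indexed),
// doubling from `base` and capping at `max` so repeated failures back off
// quickly without the delay growing unbounded.
func retryDelay(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	return d
}

func main() {
	// 100ms, 200ms, 400ms, 800ms, then capped at 1s.
	for attempt := 0; attempt < 5; attempt++ {
		fmt.Println(retryDelay(attempt, 100*time.Millisecond, time.Second))
	}
}
```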

Enterprise Ready Inference

FoundryOS is built for teams deploying AI infrastructure on internal clouds, private networks or air-gapped systems. Native vLLM, SGLang and llama.cpp support, combined with robust management features, helps you deploy, manage and scale your AI workloads efficiently - with low overhead and high performance at scale.

  • Container-based deployment (not SaaS)
  • Monitoring and management tools for real-time analysis
  • Air-gapped and on-premises / own-cloud support
  • Enterprise GPU optimisation
  • Built with Go 1.24

Unified API

FoundryOS unifies multiple inference backends and models under a single API with protocol-level translation. Applications and users can query using either OpenAI or Anthropic API formats - FoundryOS handles the translation at the protocol level, preserving streaming, tool use and all advanced features.

  • Protocol-level API translation (not just format conversion)
  • Query using OpenAI or Anthropic API formats interchangeably
  • Full streaming and tool use support across translations
  • Use Claude Code and Cursor with on-premises models
  • Automatic request/response transformation
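To make protocol-level translation concrete, here is a minimal Go sketch mapping an OpenAI-style chat request onto the Anthropic messages shape. The struct names, the trimmed field set and the fixed max_tokens default are assumptions for illustration, not FoundryOS's actual types; field names follow the public OpenAI and Anthropic schemas:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type openAIMessage struct {
	Role    string `json:"role"` // "system", "user" or "assistant"
	Content string `json:"content"`
}

type openAIRequest struct {
	Model    string          `json:"model"`
	Messages []openAIMessage `json:"messages"`
	Stream   bool            `json:"stream,omitempty"`
}

type anthropicMessage struct {
	Role    string `json:"role"` // "user" or "assistant" only
	Content string `json:"content"`
}

type anthropicRequest struct {
	Model     string             `json:"model"`
	System    string             `json:"system,omitempty"`
	Messages  []anthropicMessage `json:"messages"`
	MaxTokens int                `json:"max_tokens"`
	Stream    bool               `json:"stream,omitempty"`
}

// translateToAnthropic maps an OpenAI-style chat request onto the Anthropic
// messages shape: system messages move to the top-level "system" field,
// everything else passes through unchanged.
func translateToAnthropic(req openAIRequest) anthropicRequest {
	out := anthropicRequest{Model: req.Model, MaxTokens: 1024, Stream: req.Stream}
	for _, m := range req.Messages {
		if m.Role == "system" {
			out.System = m.Content
			continue
		}
		out.Messages = append(out.Messages, anthropicMessage{Role: m.Role, Content: m.Content})
	}
	return out
}

func main() {
	in := openAIRequest{
		Model: "phi-4",
		Messages: []openAIMessage{
			{Role: "system", Content: "You are terse."},
			{Role: "user", Content: "Hello"},
		},
	}
	b, _ := json.Marshal(translateToAnthropic(in))
	fmt.Println(string(b))
}
```

Real protocol translation also has to rewrite streaming events and tool-use blocks in flight, which is why doing it at the protocol level matters.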

Model Unification

FoundryOS unifies your AI models across multiple inference backends under a single API and management plane. Seamlessly switch between vLLM, SGLang and llama.cpp to validate which backend works best for your workload.

  • Multi-backend model unification
  • Test & validate models across backends
  • Runtime backend switching with zero downtime
  • Model redundancy across multiple nodes
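A model-aware routing decision like the one described above can be sketched in a few lines of Go. The instance fields and the least-loaded selection policy are assumptions for the example, not FoundryOS's actual scheduler:

```go
package main

import "fmt"

// backendInstance is a hypothetical view of one inference node as an
// orchestrator might see it: which model it serves and how loaded it is.
type backendInstance struct {
	Name    string  // e.g. "vllm-1", "sglang-1"
	Model   string  // model currently loaded
	Load    float64 // utilisation in [0, 1]
	Healthy bool
}

// pickInstance returns the least-loaded healthy instance serving the
// requested model, or nil if none is available.
func pickInstance(instances []backendInstance, model string) *backendInstance {
	var best *backendInstance
	for i := range instances {
		inst := &instances[i]
		if !inst.Healthy || inst.Model != model {
			continue
		}
		if best == nil || inst.Load < best.Load {
			best = inst
		}
	}
	return best
}

func main() {
	fleet := []backendInstance{
		{"vllm-1", "phi-4", 0.35, true},
		{"vllm-2", "phi-4", 0.72, true},
		{"sglang-1", "glm-4-6", 0.45, true},
	}
	if inst := pickInstance(fleet, "phi-4"); inst != nil {
		fmt.Println("routing to", inst.Name) // vllm-1: lowest load for phi-4
	}
}
```

Returning nil when no healthy instance serves the model is what lets the router fall back to another node, giving the redundancy the list above describes.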

Native Inference Backends

FoundryOS integrates natively with inference backends to provide monitoring and management capabilities.

vLLM
SGLang
llama.cpp
TensorRT-LLM coming soon

Health Monitoring

FoundryOS provides intelligent health checks with circuit breakers, exponential backoff and gradual recovery to ensure your AI infrastructure stays online and available for your users, applications and customers.

  • Circuit breakers with configurable failure thresholds
  • Exponential backoff to prevent cascading failures
  • Gradual recovery with incremental traffic restoration
  • Self-healing and automatic failover mechanisms
  • Real-time health dashboards and alerting
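The circuit-breaker behaviour described above follows a well-known pattern: trip open after a threshold of consecutive failures, reject traffic during a cooldown, then let a single half-open probe through before fully recovering. Here is a minimal Go sketch of that pattern; the type and thresholds are assumptions for illustration, not FoundryOS's implementation:

```go
package main

import (
	"fmt"
	"time"
)

// Breaker states.
const (
	closed = iota // traffic flows normally
	open          // traffic is rejected until the cooldown elapses
	halfOpen      // a single probe request is allowed through
)

type breaker struct {
	state     int
	failures  int           // consecutive failures seen while closed
	threshold int           // failures before tripping open
	cooldown  time.Duration // how long to stay open
	openedAt  time.Time
	now       func() time.Time // injectable clock, handy for testing
}

func newBreaker(threshold int, cooldown time.Duration) *breaker {
	return &breaker{threshold: threshold, cooldown: cooldown, now: time.Now}
}

// Allow reports whether a request may pass. Once the cooldown has elapsed,
// the breaker moves to half-open and admits one probe.
func (b *breaker) Allow() bool {
	if b.state == open {
		if b.now().Sub(b.openedAt) >= b.cooldown {
			b.state = halfOpen
			return true
		}
		return false
	}
	return true
}

// Report records a request outcome: success closes the breaker, failure
// increments the count and trips it at the threshold (or re-opens a
// half-open breaker immediately).
func (b *breaker) Report(ok bool) {
	if ok {
		b.failures = 0
		b.state = closed
		return
	}
	b.failures++
	if b.state == halfOpen || b.failures >= b.threshold {
		b.state = open
		b.openedAt = b.now()
		b.failures = 0
	}
}

func main() {
	b := newBreaker(3, 30*time.Second)
	for i := 0; i < 3; i++ {
		b.Report(false) // three consecutive failures trip the breaker
	}
	fmt.Println("allowed after trip:", b.Allow()) // false
}
```

Gradual recovery extends the same idea: instead of one probe, admit an increasing fraction of traffic while the node keeps succeeding.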

Inference Observability

Get complete visibility into your AI infrastructure with comprehensive observability. Scout agents collect detailed metrics from every node, providing real-time insights into performance and resource utilisation.

  • GPU utilisation and memory tracking per node
  • Latency percentiles: p50, p90 and p99 for SLA compliance
  • Tokens per second and throughput metrics
  • Distributed tracing across Fleet, Scout and Relay
  • Model performance analysis with usage attribution
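The latency percentiles listed above can be computed from a window of samples with the simple nearest-rank method, sketched here in Go; this is an illustration of the metric, not Scout's actual aggregation code:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the value at quantile q (in (0, 1]) from latency
// samples, using nearest-rank on a sorted copy of the input.
func percentile(samplesMs []float64, q float64) float64 {
	if len(samplesMs) == 0 {
		return 0
	}
	s := append([]float64(nil), samplesMs...) // leave the caller's slice intact
	sort.Float64s(s)
	idx := int(math.Ceil(q*float64(len(s)))) - 1
	if idx < 0 {
		idx = 0
	}
	return s[idx]
}

func main() {
	// Synthetic latencies: 1ms..100ms.
	samples := make([]float64, 0, 100)
	for i := 1; i <= 100; i++ {
		samples = append(samples, float64(i))
	}
	fmt.Printf("p50=%vms p90=%vms p99=%vms\n",
		percentile(samples, 0.50),
		percentile(samples, 0.90),
		percentile(samples, 0.99))
}
```

In practice agents usually report percentiles over a sliding window or via a streaming sketch (e.g. t-digest) rather than sorting raw samples, but the definition is the same.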

Enterprise Control

FoundryOS provides enterprise-grade security controls with Redis-backed distributed state management for consistent configuration and access control across your entire AI infrastructure.

  • Token authentication with role-based access control (RBAC) coming soon
  • Redis-backed distributed state management
  • Comprehensive audit logging for compliance
  • Consistent configuration across all nodes

Join the FoundryOS Waitlist

Be among the first to deploy enterprise AI infrastructure when FoundryOS launches in Q2 2026.