Unified AI Infrastructure.
From Lab to Enterprise.
We build the foundational tools and platforms that empower organisations to deploy, manage and scale AI applications with confidence, on their own terms.
From user to backend - every layer
TensorFoundry covers the full LLM stack. Pick the layers you need.
Start with what you need
From open-source tools to enterprise platforms
Building Data-led AI Innovation Since 2010
From ML pipelines to enterprise orchestration - more than a decade of innovation.
Why Local AI?
Deploy AI where it matters most - at the edge or on-premises, on your own terms.
Privacy First
Keep sensitive data on-premises. Fine-tune models and maintain complete sovereignty without the cloud.
Low Latency
Sub-millisecond response times with edge deployment. Eliminate network overhead for real-time AI.
Cost Effective
Reduce cloud API costs by up to 90%. Pay once for hardware, run inference & training indefinitely.
Optimised Performance
Hardware-accelerated inference with support for CUDA, Metal and custom accelerators.
Join the Olla Community
Open-source AI inference proxy perfect for small businesses and development teams. Unified interface for Ollama, LM Studio, vLLM and others with load balancing and failover.

Get Early Access to the Full Stack
Join the waitlist for priority access to Alloy, FoundryOS, Pivotal and Forge as they launch through 2026.
Get exclusive updates, beta access and founding member pricing.
Ready to deploy AI at the edge?
Let TensorFoundry build your AI Inference & Training Lab for you.