Olla v0.0.28: oMLX, Auth & CORS

Olla v0.0.28 is a big one. It adds native support for the oMLX runtime, which brings fast multi-model inference on Apple Silicon behind Olla with native Anthropic Messages passthrough. It also lands per-endpoint authentication, the most-requested feature so far, and adds opt-in CORS for browser-based clients.

What's changed

Native oMLX backend with Anthropic Messages passthrough, for multi-model inference on Apple Silicon
Per-endpoint authentication for local backends: bearer, API key and basic auth, from inline values, environment variables or file siblings
Opt-in CORS support for browser-based clients such as OpenWebUI and dashboards
Anthropic translation hardening: reasoning now maps to thinking blocks, with more reliable streaming, usage and tool-call handling
Backend fixes: Docker Model Runner Anthropic passthrough, Lemonade model routing, and OpenAI alias model listings
New timeout tunables (read-header, response-header, keep-alive, TLS handshake), all environment-overridable
Reliability hardening: atomic config snapshots per request, race-free endpoint pools, and panic-safe cleanup loops

Heads up on three behaviour changes: the default proxy engine is now Olla with least-connections load balancing, /internal/status/models now returns endpoint names for every entry, and upstream X-Olla-* response headers are stripped.

Resources

GitHub Release: Full release notes on GitHub

oMLX Integration Guide: Documentation for the new backend

Running MLX behind Olla: One endpoint for your Mac AI stack

Olla Product Page: Learn more about Olla