
Olla v0.0.28: Native oMLX Support, Per-Endpoint Auth & CORS

Olla v0.0.28 is a big one. It adds native support for the oMLX runtime, which puts fast multi-model inference on Apple Silicon behind Olla with native Anthropic Messages passthrough. It also lands per-endpoint authentication, the most-requested feature so far, and adds opt-in CORS for browser-based clients.

What's changed

  • Native oMLX backend with Anthropic Messages passthrough, for multi-model inference on Apple Silicon
  • Per-endpoint authentication for local backends: bearer, API key and basic auth, from inline values, environment variables or file siblings
  • Opt-in CORS support for browser-based clients such as OpenWebUI and dashboards
  • Anthropic translation hardening: reasoning now maps to thinking blocks, with more reliable streaming, usage and tool-call handling
  • Backend fixes: Docker Model Runner Anthropic passthrough, Lemonade model routing, and OpenAI alias model listings
  • New timeout tunables (read-header, response-header, keep-alive, TLS handshake), all environment-overridable
  • Reliability hardening: atomic config snapshots per request, race-free endpoint pools, and panic-safe cleanup loops

Heads up on behaviour changes: the default proxy engine is now Olla with least-connections load balancing, and /internal/status/models now returns endpoint names for every entry. Upstream X-Olla-* response headers are also stripped.

Resources