Olla v0.0.28 is a big one. It adds native support for the oMLX runtime, which brings fast multi-model inference on Apple Silicon behind Olla with native Anthropic Messages passthrough. It also lands per-endpoint authentication, the most-requested feature so far, and adds opt-in CORS for browser-based clients.
What's changed
- Native oMLX backend with Anthropic Messages passthrough, for multi-model inference on Apple Silicon
- Per-endpoint authentication for local backends: bearer, API key and basic auth, from inline values, environment variables or file siblings
- Opt-in CORS support for browser-based clients such as OpenWebUI and dashboards
- Anthropic translation hardening: reasoning now maps to thinking blocks, with more reliable streaming, usage and tool-call handling
- Backend fixes: Docker Model Runner Anthropic passthrough, Lemonade model routing, and OpenAI alias model listings
- New timeout tunables (read-header, response-header, keep-alive, TLS handshake), all environment-overridable
- Reliability hardening: atomic config snapshots per request, race-free endpoint pools, and panic-safe cleanup loops
Heads up on behaviour changes: the default proxy engine is now Olla with least-connections load balancing, and /internal/status/models now returns endpoint names for every entry. Upstream X-Olla-* response headers are also stripped.
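If least-connections is new to you, the rough idea is to send each request to the backend with the fewest requests currently in flight. The Go sketch below illustrates the general technique only and makes no claim about how Olla implements it; `Endpoint`, `pick`, and `acquire` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Endpoint is a hypothetical backend with an in-flight request counter.
type Endpoint struct {
	Name   string
	active atomic.Int64
}

// pick returns the endpoint with the fewest in-flight requests.
// Ties go to the earliest entry; an empty pool yields nil.
func pick(pool []*Endpoint) *Endpoint {
	var best *Endpoint
	for _, ep := range pool {
		if best == nil || ep.active.Load() < best.active.Load() {
			best = ep
		}
	}
	return best
}

// acquire picks an endpoint, counts the request as in flight, and returns a
// release function to call when the request finishes. Selection and increment
// are separate steps, so under heavy concurrency the choice is approximate.
func acquire(pool []*Endpoint) (*Endpoint, func()) {
	ep := pick(pool)
	if ep == nil {
		return nil, func() {}
	}
	ep.active.Add(1)
	return ep, func() { ep.active.Add(-1) }
}

func main() {
	pool := []*Endpoint{{Name: "a"}, {Name: "b"}}

	first, done := acquire(pool) // both idle, so "a" wins the tie
	second, _ := acquire(pool)   // "a" is busy, so "b"
	done()                       // the request on "a" finishes
	third, _ := acquire(pool)    // "a" is idle again

	fmt.Println(first.Name, second.Name, third.Name)
}
```

Compared with round-robin, this tends to keep slow or long-streaming backends from piling up requests, which matters for LLM workloads where response times vary widely.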