Olla v0.0.27 brings native support for LMDeploy, a high-throughput inference engine popular in production deployments, alongside a round of sticky session fixes that improve KV cache alignment across multi-endpoint setups.
What's changed
- Add LMDeploy backend with model converter, response parser, and provider constants
- Add LMDeploy profile and wire it into the backend factory and routes
- Fix sticky sessions for provider-scoped routes
- Fix sticky diversity check to ignore failed requests
- Fix prefix hash fallback for empty messages array
- Add AI mock test harness for sticky session verification
- Clarify provider prefix helpers return no trailing slash
LMDeploy TPS metrics are still in progress and will follow in a later release.