Olla v0.0.27: LMDeploy & Session Fixes

Olla v0.0.27 brings native support for LMDeploy, a high-throughput inference engine popular in production deployments, alongside a round of sticky session fixes that improve KV cache alignment across multi-endpoint setups.

What's changed

Add LMDeploy backend with model converter, response parser, and provider constants
Add LMDeploy profile and wire it into the backend factory and routes
Fix sticky sessions for provider-scoped routes
Fix sticky diversity check to ignore failed requests
Fix prefix hash fallback for empty messages array
Add AI mock test harness for sticky session verification
Clarify provider prefix helpers return no trailing slash

LMDeploy TPS metrics are still in progress and will follow in a later release.

Resources

GitHub Release Full release notes on GitHub

LMDeploy Integration Guide Documentation for the new backend

Sticky Sessions How KV cache alignment works in Olla

Olla Product Page Learn more about Olla