Back to News

Olla v0.0.27: Native LMDeploy Support & Sticky Session Fixes

Olla v0.0.27 brings native support for LMDeploy, a high-throughput inference engine popular in production deployments, alongside a round of sticky session fixes that improve KV cache alignment across multi-endpoint setups.

What's changed

  • Add LMDeploy backend with model converter, response parser, and provider constants
  • Add LMDeploy profile and wire it into the backend factory and routes
  • Fix sticky sessions for provider-scoped routes
  • Fix sticky diversity check to ignore failed requests
  • Fix prefix hash fallback for empty messages array
  • Add AI mock test harness for sticky session verification
  • Clarify provider prefix helpers return no trailing slash

LMDeploy TPS metrics are still in progress and will follow in a later release.

Resources