Olla v0.0.25 is a substantial release with two headline features: model aliasing lets you map friendly names to model identifiers in config, and sticky sessions keep requests pinned to the same endpoint for better KV cache utilisation. A set of inspector fixes from community contributor dnnspaul also lands in this release.
What's changed
Features
- Add model aliasing - map arbitrary names to model identifiers via configuration, with validation and test coverage
- Add sticky sessions - keep requests aligned to the same endpoint to improve KV cache hit rates across multi-endpoint deployments
Bug fixes
- Fix inspector buffer copy before pool return to avoid aliasing issues
- Fix inspector incremental scan with token-level skipping for field-order independence
- Fix inspector to use a proper ReadCloser and preserve body Close delegation
- Fix inspector body restoration on error and decoder state in model field extraction
- Fix model name extraction from large requests via streaming JSON prefix scan
- Address various CodeRabbit review issues