Olla 0.3.5


We're excited to release Olla 0.3.5, bringing significant performance improvements and new model support. This release focuses on optimising inference speed, especially on ARM-based devices.

New Features

  • Llama 3.2 Support: full support for Meta's latest Llama 3.2 models
  • ARM Optimisation: 25% faster inference on Apple Silicon and other ARM devices
  • Memory Pooling: improved memory management reduces allocation overhead
  • Enhanced Metrics: more detailed performance metrics and logging
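As a rough illustration of the buffer-pooling technique behind the memory management change, the sketch below reuses scratch buffers across simulated requests with Go's `sync.Pool` instead of allocating a fresh slice each time. The pool, the `handle` function, and the 4 KiB buffer size are illustrative assumptions, not Olla's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable scratch buffers, cutting per-request
// allocations and the GC pressure they create. (Illustrative only.)
var bufPool = sync.Pool{
	New: func() any { return make([]byte, 0, 4096) },
}

// handle simulates one request that needs a temporary buffer.
func handle(payload string) string {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf[:0]) // reset length, keep capacity for reuse
	buf = append(buf, payload...)
	return fmt.Sprintf("processed %d bytes", len(buf))
}

func main() {
	fmt.Println(handle("hello"))
}
```

The key detail is returning the buffer with its length reset but its capacity intact, so the next caller gets a pre-grown slice for free.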

Performance Improvements

  • 15% reduced latency for first token
  • Improved throughput for batch processing
  • Optimised quantisation quality/speed trade-off
  • Faster model loading times

🐛 Bug Fixes

  • Fixed memory leak in long-running sessions
  • Resolved issue with model switching
  • Corrected token counting for certain models
  • Fixed crash on malformed requests

📦 Upgrade Instructions

Docker

docker pull ghcr.io/thushan/olla:0.3.5

Homebrew (macOS)

brew upgrade olla

Cargo

cargo install olla --version 0.3.5

Breaking Changes

None! This release is fully backward compatible with 0.3.x versions.