We're excited to release Olla 0.3.5, bringing significant performance improvements and new model support. This release focuses on optimising inference speed, especially on ARM-based devices.
New Features
- Llama 3.2 Support: Full support for Meta's latest Llama 3.2 models
- ARM Optimisation: 25% faster inference on Apple Silicon and other ARM devices
- Memory Pooling: Improved memory management reduces allocation overhead
- Enhanced Metrics: More detailed performance metrics and logging
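The memory pooling feature reduces allocation overhead by recycling buffers rather than allocating a fresh one per request. Olla's internal implementation isn't shown in these notes, so the following is only a minimal sketch of the general technique; the `BufferPool` type and its `acquire`/`release` methods are hypothetical names, not Olla's API.

```rust
// Hypothetical buffer pool sketch (not Olla's actual API): hands out
// recycled byte buffers instead of allocating a new one each time.
struct BufferPool {
    free: Vec<Vec<u8>>, // buffers waiting to be reused
    buf_size: usize,    // fixed size of every buffer in this pool
}

impl BufferPool {
    fn new(buf_size: usize) -> Self {
        BufferPool { free: Vec::new(), buf_size }
    }

    // Reuse a returned buffer when one is available; otherwise allocate.
    fn acquire(&mut self) -> Vec<u8> {
        self.free
            .pop()
            .unwrap_or_else(|| vec![0u8; self.buf_size])
    }

    // Hand a buffer back to the pool so later callers can reuse it.
    fn release(&mut self, buf: Vec<u8>) {
        self.free.push(buf);
    }
}

fn main() {
    let mut pool = BufferPool::new(4096);
    let a = pool.acquire(); // first call allocates
    pool.release(a);
    let b = pool.acquire(); // second call reuses the returned buffer
    assert_eq!(b.len(), 4096);
    println!("pool reuse ok");
}
```

In steady state the pool reaches a working set of buffers and the allocator drops out of the hot path, which is where the reduced allocation overhead comes from.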
Performance Improvements
- 15% reduced latency for first token
- Improved throughput for batch processing
- Optimised quantisation quality/speed trade-off
- Faster model loading times
Bug Fixes
- Fixed memory leak in long-running sessions
- Resolved issue with model switching
- Corrected token counting for certain models
- Fixed crash on malformed requests
Upgrade Instructions
Docker

```shell
docker pull ghcr.io/thushan/olla:0.3.5
```

Homebrew (macOS)

```shell
brew upgrade olla
```

Cargo

```shell
cargo install olla --version 0.3.5
```

Breaking Changes
None! This release is fully backward compatible with 0.3.x versions.