Ollama v0.19

Ollama Powered by MLX: High-Performance AI Inference for Apple Silicon Macs

Introduction:

Discover the new Ollama powered by MLX, Apple's machine learning framework. This update brings unprecedented speeds to Apple Silicon, leveraging unified memory and the GPU Neural Accelerators on M5 chips. Featuring NVFP4 support for production-grade accuracy and an upgraded intelligent caching system, Ollama 0.19 optimizes coding agents like Claude Code and personal assistants like OpenClaw. Experience faster time to first token (TTFT) and higher token-generation speeds for demanding local AI workloads on macOS.

Added On:

2026-04-03

Ollama v0.19 Product Information

Ollama Powered by MLX: Accelerating AI on Apple Silicon

In a landmark update for local AI enthusiasts and developers, Ollama is now powered by MLX on Apple Silicon in a new preview release. This integration marks the fastest way to run Ollama on macOS, leveraging Apple’s dedicated machine learning framework to push the boundaries of performance. By utilizing the unique strengths of Apple’s hardware, Ollama provides an optimized environment for running complex language models with unprecedented speed and efficiency.

What's Ollama Powered by MLX?

Ollama powered by MLX is a specialized version of the Ollama inference engine designed specifically for Apple Silicon. MLX is Apple’s native machine learning framework, and by building Ollama on top of it, the platform can now take full advantage of the unified memory architecture found in Mac devices.

This update is particularly transformative for users of Apple’s latest hardware. On M5, M5 Pro, and M5 Max chips, Ollama leverages new GPU Neural Accelerators. These hardware improvements significantly accelerate both the time to first token (TTFT) and the overall generation speed (tokens per second). Whether you are running a personal assistant or a heavy-duty coding agent, Ollama powered by MLX ensures your local models respond with production-level agility.

Key Features of Ollama 0.19

MLX Acceleration and Performance

The core of this release is the integration with MLX. By moving to this framework, Ollama achieves a substantial speedup across all Apple Silicon devices. Testing conducted on March 29, 2026, showed that the Qwen3.5-35B-A3B model can reach a prefill speed of 1,851 tokens/s and a decode speed of 134 tokens/s when running with int4 quantization in Ollama 0.19.
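As a rough illustration of how prefill and decode rates like these are computed, the sketch below derives tokens per second from the timing fields that Ollama's HTTP API reports in its final response (prompt_eval_count, prompt_eval_duration, eval_count, eval_duration, with durations in nanoseconds). The response fragment here is reconstructed from the quoted rates, not measured data:

```python
# Sketch: deriving prefill/decode throughput from Ollama API timing fields.
# The field names match Ollama's /api/generate final response; durations
# are in nanoseconds. The sample values below are illustrative only.

def throughput(resp: dict) -> tuple[float, float]:
    """Return (prefill_tok_s, decode_tok_s) from an Ollama response dict."""
    ns = 1e9
    prefill = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / ns)
    decode = resp["eval_count"] / (resp["eval_duration"] / ns)
    return prefill, decode

# Illustrative response fragment (durations back-computed from the
# quoted ~1,851 tok/s prefill and ~134 tok/s decode figures):
resp = {
    "prompt_eval_count": 4096,
    "prompt_eval_duration": int(4096 / 1851 * 1e9),
    "eval_count": 512,
    "eval_duration": int(512 / 134 * 1e9),
}
prefill_tps, decode_tps = throughput(resp)
print(f"prefill: {prefill_tps:.0f} tok/s, decode: {decode_tps:.0f} tok/s")
```

The same arithmetic applies to any response from the local API, so it is an easy way to reproduce throughput comparisons on your own hardware.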

NVFP4 Support

Ollama now supports NVIDIA’s NVFP4, a 4-bit floating-point quantization format. This inclusion allows Ollama to:

  • Maintain high model accuracy while reducing memory bandwidth.
  • Lower storage requirements for intensive inference workloads.
  • Achieve production parity, allowing users to see the same results locally as they would in a scaled production environment.
  • Run models specifically optimized by NVIDIA’s model optimizer.
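To give a sense of the memory-bandwidth and storage savings, the back-of-the-envelope sketch below compares an FP16 footprint with NVFP4, assuming NVFP4's published layout of 4-bit values with one FP8 scale per 16-element block (this block size and scale format are assumptions about the format, and the parameter count is illustrative):

```python
# Back-of-the-envelope comparison of FP16 vs. NVFP4 weight storage.
# Assumes NVFP4 stores 4-bit values plus one 8-bit scale per 16-element
# block; parameter count is illustrative (~35B, as in Qwen3.5-35B-A3B).

def weight_bytes(n_params: int, bits_per_value: float,
                 block: int = 0, scale_bits: int = 0) -> float:
    """Bytes needed for n_params weights, with optional per-block scales."""
    bits = n_params * bits_per_value
    if block:
        bits += (n_params / block) * scale_bits  # scale overhead
    return bits / 8

n = 35_000_000_000
fp16 = weight_bytes(n, 16)
nvfp4 = weight_bytes(n, 4, block=16, scale_bits=8)

print(f"FP16:  {fp16 / 2**30:.1f} GiB")
print(f"NVFP4: {nvfp4 / 2**30:.1f} GiB ({fp16 / nvfp4:.2f}x smaller)")
```

The roughly 3.5x reduction is what makes a 35B-class model practical within the unified memory of a consumer Mac.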

Improved Caching System

Efficiency is at the heart of the new Ollama update. The upgraded cache makes agentic tasks much smoother through:

  • Lower Memory Utilization: Cache is now reused across different conversations, leading to more cache hits when using shared system prompts.
  • Intelligent Checkpoints: Ollama stores snapshots of the cache at strategic locations in the prompt, reducing processing time.
  • Smarter Eviction: Shared prefixes are retained longer in memory, even when older branches are dropped, ensuring faster responses for branching dialogues.
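The three ideas above can be sketched in a few lines. This is an illustrative toy, not Ollama's actual implementation: checkpoints are keyed by prompt prefix, a lookup reuses the longest matching prefix across conversations, and eviction drops the least-shared entry first so a common system prompt survives longest:

```python
# Toy sketch of the caching ideas described above (not Ollama's real code):
# prefix-keyed checkpoints, cross-conversation reuse, share-aware eviction.

class PrefixCache:
    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.checkpoints: dict[str, int] = {}  # prefix -> number of users

    def lookup(self, prompt: str) -> str:
        """Return the longest checkpointed prefix of `prompt` (may be '')."""
        best = ""
        for prefix in self.checkpoints:
            if prompt.startswith(prefix) and len(prefix) > len(best):
                best = prefix
        return best

    def checkpoint(self, prefix: str) -> None:
        """Snapshot a prefix; when full, evict the least-shared entry."""
        if prefix in self.checkpoints:
            self.checkpoints[prefix] += 1
            return
        if len(self.checkpoints) >= self.capacity:
            # Widely shared prefixes (e.g. system prompts) survive longest.
            victim = min(self.checkpoints, key=self.checkpoints.get)
            del self.checkpoints[victim]
        self.checkpoints[prefix] = 1

cache = PrefixCache()
system = "You are a coding assistant. "
cache.checkpoint(system)   # first conversation stores the system prompt
cache.checkpoint(system)   # second conversation reuses the same prefix
hit = cache.lookup(system + "Fix this bug: ...")
print(f"reusable prefix length: {len(hit)}")  # only the suffix is re-processed
```

In a real engine the checkpoints would hold KV-cache tensors rather than strings, but the reuse and eviction logic follows the same shape.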

Use Case Scenarios

Coding Agents

With the release of Ollama 0.19, coding agents like Claude Code, OpenCode, Codex, and Pi see a massive boost. The improved caching and MLX acceleration allow these tools to process large codebases and respond to complex queries almost instantaneously.

Personal Assistants

Personal assistants like OpenClaw benefit from the reduced latency. The faster TTFT ensures that interacting with an AI assistant feels natural and fluid, making it a viable tool for daily productivity on macOS.

Production Model Testing

Because Ollama now supports NVFP4, developers can test models locally with the confidence that the performance and accuracy will match the scale of NVIDIA-optimized production environments.

How to Use Ollama 0.19

To get started with the preview release of Ollama powered by MLX, ensure you are using a Mac with at least 32GB of unified memory.

  1. Download Ollama 0.19 from the official source.
  2. To launch specific agents or models, use the following commands in your terminal:
  • For Claude Code: ollama launch claude --model qwen3.5:35b-a3b-coding-nvfp4
  • For OpenClaw: ollama launch openclaw --model qwen3.5:35b-a3b-coding-nvfp4
  • To chat directly with the model: ollama run qwen3.5:35b-a3b-coding-nvfp4

Note: This version is specifically tuned for the Qwen3.5-35B-A3B model with sampling parameters optimized for coding tasks.
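Besides the CLI, a running Ollama instance exposes a local HTTP API, which is how coding agents typically talk to it. The sketch below builds a request body for the standard /api/chat endpoint using the model tag from the commands above; it only constructs the payload, which you would POST to http://localhost:11434/api/chat with any HTTP client:

```python
import json

# Sketch: building a chat request for Ollama's local HTTP API.
# The /api/chat endpoint and payload shape are standard Ollama API;
# the model tag is the one used in the launch commands above.

payload = {
    "model": "qwen3.5:35b-a3b-coding-nvfp4",
    "messages": [
        {"role": "user", "content": "Write a function that reverses a string."}
    ],
    "stream": False,  # request one complete JSON response, not a stream
}
body = json.dumps(payload)
print(body)
```

With the requests library, for example, requests.post("http://localhost:11434/api/chat", data=body) would send it to the local server.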

FAQ

What hardware is required for Ollama powered by MLX?

You need an Apple Silicon Mac (M1, M2, M3, M4, or M5). For the preview release of the Qwen3.5-35B model, it is recommended to have more than 32GB of unified memory.

What is the benefit of NVFP4 support in Ollama?

NVFP4 reduces memory and storage requirements without sacrificing model quality. It allows Ollama users to run models that are optimized by NVIDIA's tools and ensures that local results match production environment outputs.

Will more models be supported in the future?

Yes. While this release focuses on the Qwen3.5 architecture, the team is actively working to expand supported architectures and will introduce easier ways to import custom fine-tuned models into Ollama.

How does the new caching system work?

Ollama now reuses its cache across conversations and stores "intelligent checkpoints" in the prompt. This means if you use a shared system prompt (common in tools like Claude Code), Ollama doesn't have to re-process the entire prompt every time, leading to faster responses.
