Check that it's using GPU acceleration:
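A quick check, assuming you have already pulled a model (the model name below is just an example — substitute whichever one you installed):

```shell
# Load a model briefly, then inspect the PROCESSOR column of `ollama ps`.
# On Apple Silicon it should read "100% GPU" when Metal acceleration is active.
ollama run gemma3 "hello" >/dev/null
ollama ps
```

If the PROCESSOR column shows a CPU percentage instead, the model may be too large for your unified memory and is being partially offloaded.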
Step 5: Configure Auto-Start on Login
5a. Ollama App — Launch at Login
Alternatively, go to System Settings > General > Login Items and add Ollama.
5b. Auto-Preload Gemma 4 on Startup
Create a launch agent that loads the model into memory after Ollama starts and keeps it warm:
This sends an empty prompt to ollama run every 5 minutes, keeping the model warm in memory.
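One way to sketch this launch agent, assuming a placeholder label (`com.example.ollama-warm`) and model name — adjust the `ollama` binary path for your install (Homebrew on Apple Silicon typically uses `/opt/homebrew/bin/ollama`):

```shell
# Write a launch agent that runs `ollama run <model> ""` every 300 seconds,
# keeping the model resident in memory. Label and model are placeholders.
cat > ~/Library/LaunchAgents/com.example.ollama-warm.plist <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.ollama-warm</string>
  <key>ProgramArguments</key>
  <array>
    <string>/opt/homebrew/bin/ollama</string>
    <string>run</string>
    <string>gemma3</string>
    <string></string>
  </array>
  <key>StartInterval</key>
  <integer>300</integer>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>
EOF
# Register the agent with launchd for the current user session.
launchctl load ~/Library/LaunchAgents/com.example.ollama-warm.plist
```

`RunAtLoad` fires the first warm-up immediately at login; `StartInterval` repeats it every five minutes thereafter.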
5c. Keep Models Loaded Indefinitely
By default, Ollama unloads models after 5 minutes of inactivity.
To keep them loaded forever:
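A minimal sketch using `launchctl setenv`, which makes the variable visible to GUI apps (including the Ollama menu-bar app) for the current session:

```shell
# -1 tells the Ollama server to keep loaded models in memory indefinitely
# instead of unloading them after the default 5-minute idle timeout.
launchctl setenv OLLAMA_KEEP_ALIVE "-1"
```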
Then restart Ollama for the change to take effect.
Note: This environment variable is session-scoped.
To persist across reboots, add export OLLAMA_KEEP_ALIVE="-1" to your ~/.zshrc, or set it via a dedicated launch agent.
Step 6: Verify Everything Works
Expected output from ollama ps:
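Illustrative output only — the name, ID, and size are placeholders and will differ on your machine; the key things to look for are "100% GPU" in the PROCESSOR column and "Forever" under UNTIL when OLLAMA_KEEP_ALIVE is -1:

```
NAME           ID              SIZE      PROCESSOR    UNTIL
gemma3:latest  a2af6cc3eb7f    6.6 GB    100% GPU     Forever
```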
Ollama exposes a local API at http://localhost:11434.
Use it with coding agents:
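A minimal request against the local API (the model name is an example — use one you have pulled):

```shell
# POST a one-shot generation request; "stream": false returns a single
# JSON object instead of a stream of partial responses.
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Write a Python hello world",
  "stream": false
}'
```

Many coding agents and editor plugins can also talk to Ollama through its OpenAI-compatible endpoint at http://localhost:11434/v1 by pointing their base URL there.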
What's New in Ollama v0.19+ (March 31, 2026)
On Apple Silicon, Ollama automatically uses Apple's MLX framework for faster inference — no manual configuration needed.
M5/M5 Pro/M5 Max chips get additional acceleration via GPU Neural Accelerators.
M4 and earlier still benefit from general MLX speedups.
Improved Caching for Coding and Agentic Tasks
Source: This article was originally published by Hacker News