April 2026 TLDR Setup for Ollama and Gemma 4 26B on a Mac mini

Article URL: https://gist.github.com/greenstevester/fc49b4e60a4fef9effc79066c1033ae5
Comments URL: https://news.ycombinator.com/item?id=47624731
Points: 101 | Comments: 31


Check that it's using GPU acceleration:
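The exact command for this check was lost in the scrape. A minimal sketch, assuming the check is done with ollama ps, whose PROCESSOR column reads "100% GPU" when Metal acceleration is active:

```shell
# Sketch: verify the loaded model is running on the GPU.
# Assumes Ollama is installed and a model has been started (e.g. `ollama run gemma4:26b`).
if command -v ollama >/dev/null 2>&1; then
  ollama ps   # look for "100% GPU" in the PROCESSOR column
else
  echo "ollama not found on PATH"
fi
```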
Step 5: Configure Auto-Start on Login
5a. Ollama App — Launch at Login
Alternatively, go to System Settings > General > Login Items and add Ollama.

5b. Auto-Preload Gemma 4 on Startup
Create a launch agent that loads the model into memory after Ollama starts and keeps it warm:
This sends an empty prompt to ollama run every 5 minutes, keeping the model warm in memory.
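The launch agent itself is missing from the scrape. A minimal sketch under stated assumptions: the label com.local.ollama-keepwarm, the model tag gemma4:26b, and the Homebrew binary path are all placeholders, not values from the original gist. It runs ollama run with an empty prompt every 300 seconds, matching the description above:

```shell
# Sketch: write a launchd agent that re-touches the model every 5 minutes.
# Assumptions: model tag "gemma4:26b", label "com.local.ollama-keepwarm",
# and Apple Silicon Homebrew path (/usr/local/bin/ollama on Intel Macs).
mkdir -p ~/Library/LaunchAgents
cat > ~/Library/LaunchAgents/com.local.ollama-keepwarm.plist <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.local.ollama-keepwarm</string>
  <key>ProgramArguments</key>
  <array>
    <string>/opt/homebrew/bin/ollama</string>
    <string>run</string>
    <string>gemma4:26b</string>
    <string></string>
  </array>
  <key>StartInterval</key>
  <integer>300</integer>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>
EOF
# Then enable it (macOS): launchctl load ~/Library/LaunchAgents/com.local.ollama-keepwarm.plist
```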

5c. Keep Models Loaded Indefinitely
By default, Ollama unloads models after 5 minutes of inactivity.

To keep them loaded forever:
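The command here was dropped by the scrape; the note below and the persistence tip name the variable involved, Ollama's documented OLLAMA_KEEP_ALIVE, which disables the idle unload when set to -1. For the current macOS login session:

```shell
# Session-scoped config: make launchd-spawned processes (including the Ollama
# menu-bar app) see the variable. -1 disables the 5-minute idle unload.
launchctl setenv OLLAMA_KEEP_ALIVE "-1"
```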
Then restart Ollama for the change to take effect.

Note: This environment variable is session-scoped.

To persist across reboots, add export OLLAMA_KEEP_ALIVE="-1" to your ~/.zshrc, or set it via a dedicated launch agent.
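The shell-profile route, as a sketch:

```shell
# Append the setting to ~/.zshrc so new terminal sessions inherit it.
echo 'export OLLAMA_KEEP_ALIVE="-1"' >> ~/.zshrc
```

Note that GUI apps launched from Finder do not read ~/.zshrc; they only see variables set via launchctl setenv or a launch agent, which is why the launch-agent route exists.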

Step 6: Verify Everything Works
Expected output from ollama ps:
Ollama exposes a local API at http://localhost:11434.

Use it with coding agents:
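The agent configuration from the original gist is not shown here. As a generic sketch: the API can be queried directly, and most coding tools can point at Ollama's OpenAI-compatible endpoint. The model tag gemma4:26b is an assumption:

```shell
# Sketch: query the local Ollama API directly (assumes the server is running
# and the model tag is "gemma4:26b").
curl -s http://localhost:11434/api/generate -d '{
  "model": "gemma4:26b",
  "prompt": "Write a one-line commit message for a typo fix.",
  "stream": false
}'
# Coding agents that speak the OpenAI API can instead use the base URL
#   http://localhost:11434/v1
# with any placeholder string as the API key.
```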
What's New in Ollama v0.19+ (March 31, 2026)
On Apple Silicon, Ollama automatically uses Apple's MLX framework for faster inference — no manual configuration needed.

M5/M5 Pro/M5 Max chips get additional acceleration via GPU Neural Accelerators.

M4 and earlier still benefit from general MLX speedups.

Improved Caching for Coding and Agentic Tasks

Source: This article was originally published by Hacker News
