Running GPT-OSS Locally on macOS with Ollama + Open WebUI

Tags: ollama, gpt-oss, llm

Author: Christian Wittmann

Published: August 8, 2025

Wouldn’t it be nice to have a ChatGPT-grade assistant running locally on your laptop? That’s now possible with OpenAI’s first open-weight models, GPT-OSS. The 20B version is small enough to run on a modern consumer laptop.

My MacBook Pro has an M2 Max with 32 GB of unified memory, far from the latest and greatest, but the experience is… nice, running at around 33 output tokens per second. It’s not going to outrun GPT-5 in the cloud, and it’s not multimodal, but when you just need text-based conversations, it’s a powerful model. And running fully offline is a real advantage when you’re disconnected or want to keep everything on-device.

Back in February 2024, I wrote about running Llama 2 locally on my Mac. Fast-forward ~18 months and a lot has changed, not just model performance (it’s now far superior on the same hardware), but also the tooling. This time I chose Ollama for its active community and ecosystem of add-ons. In this post, I’ll walk you through all the steps to get GPT-OSS up and running on a Mac (as long as you have at least 16 GB of RAM), complete with a nice web UI via Open WebUI.

I recorded the process as I went, so this guide is intentionally more verbose than OpenAI’s official Cookbook: I wanted to be explicit about every step you’ll need to take.

Install Ollama

Ollama is the runtime that will download, run, and manage your local models. You can install it in two ways:

Option A — macOS installer

  • Download from https://ollama.com/download
  • Run the installer — this gives you both the Ollama app and the ollama CLI.
  • To upgrade: simply download the latest installer and run it again.

Option B — Homebrew (my recommendation)

If you’re comfortable with Terminal, Homebrew makes installation and upgrades much quicker:

brew install ollama  # installs Ollama
ollama --version     # check your installed version
brew upgrade ollama  # upgrade Ollama to the latest version

Starting and stopping Ollama

With the Homebrew installation, I run Ollama as a background service so it’s always ready:

brew services start ollama   # starts in the background, auto-starts after reboot
brew services restart ollama # restarts after upgrading
brew services stop ollama    # stops completely

If you’d rather start it manually, run:

ollama serve

This will keep it running in that terminal until you press Ctrl+C.
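Either way, you can verify that the server is up: Ollama listens on port 11434 by default and answers a plain GET request on its root.

curl http://127.0.0.1:11434   # should respond with "Ollama is running"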

Managing models

Ollama has its own model library, and GPT-OSS lives here: https://ollama.com/library/gpt-oss.

You can manage the models on your Mac with the following commands (of course, you can replace gpt-oss with other model names from the library):

ollama list           # see which models are installed
ollama pull gpt-oss   # download GPT-OSS
ollama rm gpt-oss     # delete GPT-OSS
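One note on tags: at the time of writing, the bare name gpt-oss resolves to the 20B variant; the library also hosts a 120b tag, which needs far more memory than a 32 GB laptop offers. You can pull a tag explicitly and inspect what you’ve downloaded:

ollama pull gpt-oss:20b   # explicit tag for the 20B variant
ollama show gpt-oss       # prints architecture, parameter count, context length, etc.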

Testing GPT-OSS from the command line

It pays to test as early as possible, so before adding the UI, let’s confirm everything is working from the command line:

ollama run gpt-oss

This will start a chat-like session with the model. Try out prompts like:

  • List the planets of the solar system
  • Reverse the list

Once you’re done, you can exit the chat with the command /bye (or by pressing Ctrl+D).

You can also do a one-off prompt without entering chat mode:

ollama run gpt-oss "What is the meaning of life in one sentence?"

Installing Open WebUI

To make the whole experience more user-friendly, let’s add a web UI. I use Open WebUI, which provides a clean interface in the style of ChatGPT, Claude, or Gemini, including chat history and multiple chat sessions.

The official documentation leads with a Docker setup, but the method below (listed as “recommended” further down in their docs) is easier.

First, install uv, a modern Python package manager whose uvx command can run tools in isolated environments:

brew install uv

Then start Open WebUI with:

DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve

Let’s break it down:

  • DATA_DIR=~/.open-webui: Sets an environment variable for the command that follows, telling Open WebUI where to store its chat history and configuration so your data persists between runs.
  • uvx: uv’s tool runner (comparable to pipx run). It installs Open WebUI into an isolated, cached environment and runs it from there.
  • --python 3.11: Specifies the Python version to use. The Open WebUI team recommends Python 3.11.
  • open-webui@latest serve: Installs (if necessary) and starts the latest version of Open WebUI.
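If you plan to start Open WebUI regularly, you could wrap the full command in a shell alias. A minimal sketch, assuming zsh (the macOS default shell); the alias name webui is my own choice:

# add to ~/.zshrc, then reload with: source ~/.zshrc
alias webui='DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve'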

Going Offline

For full offline capability, you need to pre-install a specific (pinned) version of Open WebUI once while you’re still online. This ensures you can run it later without any network connection. (Get the latest version number from PyPI.)

uv tool install --python 3.11 open-webui==0.6.18   

Afterwards, you can launch this version offline with:

DATA_DIR=~/.open-webui ~/.local/bin/open-webui serve
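uv also provides housekeeping commands for tools installed this way; to upgrade, simply re-run the install command above with a newer pinned version:

uv tool list                   # show installed tools and their versions
uv tool uninstall open-webui   # remove the pinned installation entirely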

Running Open WebUI

Loading and setting up the virtual environment takes around 30 seconds (longer the very first time you run it). Once it’s ready, open:

http://localhost:8080
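If something else on your machine already occupies port 8080, the serve command accepts a --port flag:

DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve --port 8081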

Tip: Ollama must be running first (brew services start ollama). You can check whether it’s active by visiting http://127.0.0.1:11434 — if it answers with “Ollama is running”, you’re good.

From here, the UI should feel familiar if you’ve used ChatGPT. Start a new chat, select GPT-OSS, and you’re ready to go.

When you’re finished, press Ctrl+C in the terminal where you started Open WebUI.

Note: Leaving Ollama running as a background service is fine — it uses almost no CPU or battery when idle.
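You can confirm its status at any time:

brew services list   # the ollama row should show status "started"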

Conclusion

Yes, it took a few steps, but the whole setup can be done in 30–60 minutes, even if you pause to test things along the way. In return, you get your own personal ChatGPT-style assistant running entirely on your Mac: now you’re ready for curiosity projects, privacy-sensitive work, or building your next idea without sending data to the cloud.