Running an AI Model Locally
Using AI for coding has proven quite helpful. I must admit that I use it for writing simple utility functions and for autocompletion. This is only the beginning, but it already helps a lot, and I would not want to miss it.
But what happens when prices for AI usage rise? This is a plausible scenario once the market consolidates.
Wouldn’t it be neat to run a model locally? That’s what I thought, so I tried it. I have a powerful home computer with 32 GB of RAM and a mighty NVIDIA graphics card, and I am using VS Code with the GitHub Copilot Chat extension.
Using Ollama with VS Code
- Download and install Ollama
- Download a suitable model, e.g. qwen3-coder (see the commands sketched below)
- Launch the VS Code integration and point it at the local model
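On the command line, the first two steps boil down to a few Ollama commands. A minimal sketch, assuming Ollama is installed and on your PATH and that the model name matches the one in the Ollama library:

```sh
# Download the coding model from the Ollama library
ollama pull qwen3-coder

# Check that the model is now available locally
ollama list

# Optional: try the model interactively in the terminal
ollama run qwen3-coder
```

With the model pulled, the Copilot Chat extension can offer it through its model picker; the exact steps depend on the extension version.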
The results were disappointing. Completing a simple task like adding documentation takes minutes and puts a heavy load on my CPU. The output is sometimes totally misleading and not usable. So for the time being, I must rely on the cloud models, which run far more efficiently and quickly.
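A heavy CPU load usually suggests that the model does not fit entirely into the graphics card's VRAM, in which case Ollama offloads part of the work to the CPU. One way to check, sketched here under the assumption that the model is currently loaded:

```sh
# Show loaded models and how the work is split between GPU and CPU
ollama ps
```

If the processor column shows a large CPU share, a smaller model or a more aggressive quantization may run noticeably faster.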
Running a Chatbot Locally
What works quite well is running a general-purpose model and asking it questions, the well-known basic use case. This is also possible with Ollama. The big advantage is that your data stays on your machine and is not sent to servers somewhere in the world.
ollama run llama3.1:8b
>>> Who was prime minister in UK in 1992?
The Prime Minister of the United Kingdom in 1992 was John Major. He served as Prime Minister from November 28, 1990 to May 27, 1997. In 1992, he led the Conservative Party to victory in the general election held on April 9, 1992.
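Beyond the interactive prompt, the local Ollama server also exposes an HTTP API on localhost (port 11434 by default), so scripts can query the model without any data leaving the machine. A minimal sketch, assuming the llama3.1:8b model from above is already pulled:

```sh
# Ask the local model a question via Ollama's HTTP API;
# the request never leaves localhost
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Who was prime minister in the UK in 1992?",
  "stream": false
}'
```

The reply comes back as JSON, with the model's answer in the `response` field.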