[LLM] Running Qwen3:8B with Ollama on Windows 11

Overview

A complete guide to installing Ollama and running the Qwen3:8B model locally on Windows 11.

Steps

1. What is Ollama

Ollama is a tool that lets you run LLMs locally. You can use conversational AI directly from the terminal without cloud APIs, and it automatically leverages NVIDIA GPU acceleration when available.

The advantages of running LLMs locally are as follows.

Works without an internet connection once the model has been downloaded
Data never leaves your machine
No API costs
Provides a REST API for integration with other apps

2. System Requirements

Ollama runs on the CPU alone, but inference is much faster with a GPU.

Component	Minimum	Recommended
OS	Windows 10 or later	Windows 11
RAM	8GB	16GB or more
GPU	Not required	NVIDIA (8GB+ VRAM)
Disk	10GB free space	SSD recommended
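
The disk requirement in the table can be checked with a short script before installing. The sketch below is only an illustration: `has_free_space` is a hypothetical helper name, and the 10GB threshold is simply the minimum from the table.

```python
import shutil

GIB = 1024 ** 3  # bytes per gibibyte


def has_free_space(path: str, required_gb: float) -> bool:
    """Return True if the drive holding `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * GIB


# "." is the current drive; point this at the drive where models will be stored.
print("Enough disk space:", has_free_space(".", 10))
```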

The environment used in this post is as follows.

CPU: AMD Ryzen 5 7500F
GPU: NVIDIA RTX 4070 (12GB VRAM)
RAM: 32GB
OS: Windows 11 64-bit

3. Installing Ollama

Download the Windows installer from ollama.com/download. Run OllamaSetup.exe and the installation completes without any additional configuration.

After installation, verify the version in PowerShell.

ollama --version

If you see output like ollama version is 0.16.3, the installation was successful.
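
If a script depends on Ollama, the same check can be automated. The following is a minimal sketch, assuming the version string has the `major.minor.patch` shape shown above; `parse_ollama_version` and `installed_ollama_version` are hypothetical helper names, not part of Ollama.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_ollama_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from `ollama --version` output, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def installed_ollama_version() -> Optional[Tuple[int, int, int]]:
    """Run `ollama --version`; return None if the CLI is missing or fails."""
    try:
        result = subprocess.run(
            ["ollama", "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return parse_ollama_version(result.stdout + result.stderr)


# Parsing the sample output quoted above:
print(parse_ollama_version("ollama version is 0.16.3"))  # (0, 16, 3)
```

Calling `installed_ollama_version()` returns `None` when the `ollama` command is not on the PATH, which is a convenient signal that the installation step still needs to be done.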

4. Running Qwen3:8B

Qwen3:8B is an 8-billion-parameter model from Alibaba Cloud's Qwen family. It supports many languages, including Korean, and performs well for its size.

A single command downloads the model and starts an interactive chat session.

ollama run qwen3:8b

On first run, the model will be downloaded (approximately 5GB). Once the download is complete, you can start chatting immediately.
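
The roughly 5GB download is consistent with an 8-billion-parameter model stored in a 4-bit quantized format. As a back-of-the-envelope sketch (the 8.2 billion parameter count and the 4.85 bits per weight are assumptions chosen for illustration, not figures from this post):

```python
def estimate_weights_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (10**9 bytes)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed values: ~8.2e9 parameters, ~4.85 bits per weight for a 4-bit quantization.
size_gb = estimate_weights_gb(8.2e9, 4.85)
print(f"Estimated weights: {size_gb:.1f} GB")
```

The estimate comes out at about 5 GB. Actual memory use while the model is running is higher, because the context cache and runtime buffers are allocated on top of the weights.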

Qwen3 has thinking mode enabled by default, so it goes through a Thinking... process before responding. Type /bye to exit the conversation.
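
When consuming Qwen3 output from a script, it can help to separate the reasoning from the final answer. The sketch below assumes the reasoning arrives inside `<think>...</think>` tags in the response text. Depending on the Ollama version and settings, the reasoning may instead be returned separately, in which case this step is unnecessary. `split_thinking` is a hypothetical helper name.

```python
import re
from typing import Tuple

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> Tuple[str, str]:
    """Return (thinking, answer) from text that may contain <think> blocks."""
    thinking = "\n".join(part.strip() for part in THINK_BLOCK.findall(text))
    answer = THINK_BLOCK.sub("", text).strip()
    return thinking, answer


sample = "<think>The user greets me.</think>\n\nHello! How can I help?"
thinking, answer = split_thinking(sample)
print(answer)  # Hello! How can I help?
```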

5. Verifying GPU Usage

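Ollama detects and uses NVIDIA GPUs automatically. While the model is loaded, you can check VRAM usage with the nvidia-smi command. The ollama ps command also reports whether a loaded model is running on the GPU or the CPU.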

nvidia-smi

The Qwen3:8B model uses approximately 6GB of VRAM. On an RTX 4070 with 12GB, there’s plenty of headroom.
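
To read VRAM usage from a script rather than by eye, nvidia-smi can emit machine-readable CSV. The following is a minimal sketch using its `--query-gpu` and `--format` options; the helper names are hypothetical and the sample line is made up to resemble a 12GB card with the model loaded.

```python
import subprocess
from typing import List, Tuple

QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_memory_csv(output: str) -> List[Tuple[int, int]]:
    """Parse 'used, total' lines (MiB) into a list with one tuple per GPU."""
    rows = []
    for line in output.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory_mib() -> List[Tuple[int, int]]:
    """Query nvidia-smi; requires an NVIDIA driver with nvidia-smi on PATH."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)


# Hypothetical sample output for a single GPU:
print(parse_memory_csv("6144, 12282\n"))  # [(6144, 12282)]
```

Running `gpu_memory_mib()` before and after `ollama run qwen3:8b` gives a rough measure of how much VRAM the model occupies.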

6. Basic Usage

6.1. Interactive Chat

Enter interactive mode with the ollama run command.

ollama run qwen3:8b

Commands available during a conversation are as follows.

/bye — Exit the conversation
/clear — Clear conversation history
/set parameter temperature 0.7 — Change parameters

6.2. REST API

After installation, Ollama runs in the background and serves a REST API at http://localhost:11434. You can call it directly from other apps or scripts.

$body = '{"model":"qwen3:8b","prompt":"Explain how to set environment variables on Windows.","stream":false}'
Invoke-RestMethod -Uri http://localhost:11434/api/generate -Method Post -ContentType "application/json" -Body $body
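
The same request can be made from any language with an HTTP client. As a rough Python equivalent using only the standard library (`build_generate_payload` and `generate` are hypothetical helper names; the optional `options.temperature` field is included here as the API counterpart of the `/set parameter temperature` command):

```python
import json
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(model: str, prompt: str,
                           temperature: Optional[float] = None) -> dict:
    """Build the JSON body for a non-streaming /api/generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if temperature is not None:
        payload["options"] = {"temperature": temperature}
    return payload


def generate(model: str, prompt: str, temperature: Optional[float] = None) -> str:
    """Send the request to a locally running Ollama server and return the text."""
    body = json.dumps(build_generate_payload(model, prompt, temperature)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

With the server running, `generate("qwen3:8b", "Explain how to set environment variables on Windows.")` should return the model's answer as a string. Nothing is sent until `generate` is called, so the payload builder can be inspected on its own.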

7. Useful Commands

Command	Description
`ollama list`	List installed models
`ollama pull qwen3:8b`	Download a model without running it
`ollama rm qwen3:8b`	Delete a model
`ollama show qwen3:8b`	Show model details
`ollama ps`	Show running models
`ollama stop qwen3:8b`	Stop a running model and unload it from memory
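
Model management is also available over the REST API. As a hedged sketch, the following lists installed models through the `/api/tags` endpoint, which corresponds roughly to `ollama list`. `summarize_models` and `list_models` are hypothetical helper names, and the sample data is made up for illustration.

```python
import json
import urllib.request
from typing import List, Tuple

TAGS_URL = "http://localhost:11434/api/tags"


def summarize_models(tags_response: dict) -> List[Tuple[str, float]]:
    """Turn an /api/tags response into (name, size in GB) pairs."""
    return [
        (model["name"], round(model["size"] / 1e9, 1))
        for model in tags_response.get("models", [])
    ]


def list_models() -> List[Tuple[str, float]]:
    """Fetch installed models from a locally running Ollama server."""
    with urllib.request.urlopen(TAGS_URL) as response:
        return summarize_models(json.loads(response.read().decode("utf-8")))


# Made-up sample shaped like an /api/tags response:
sample = {"models": [{"name": "qwen3:8b", "size": 5200000000}]}
print(summarize_models(sample))  # [('qwen3:8b', 5.2)]
```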

binaryloader