How to Run AI Models Locally on Windows, Mac, and Linux
Running a Large Language Model (LLM) like Llama 3 or Mistral on your own computer might sound like something only software engineers can do. It's not. If you can download an app and type a few words, you can run an AI locally.
This guide will take you from absolute zero to chatting with a powerful AI on your own hardware in under 10 minutes.
Quick answer: install Ollama, open a terminal, run ollama run llama3.2:3b, wait for the model to download, then type your prompt. This runs the model on your own computer instead of sending every prompt to a cloud AI API.
We recommend starting with the small llama3.2:3b model, then moving to an 8B model if your machine has enough memory.
What You Need Before You Start
- RAM: 8GB can run very small models, but 16GB or more is recommended for a smoother experience.
- Disk space: keep at least 5GB free for your first small model. Larger models can use 10GB to 40GB+.
- Internet: required only for downloading the model the first time.
- GPU: optional for small models, but strongly recommended for faster generation. NVIDIA users should install recent GPU drivers (a quick way to check follows this list).
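If you have an NVIDIA card and want to confirm the drivers are working before you start, run one command in Command Prompt or a terminal. This assumes the nvidia-smi tool that ships with NVIDIA's standard drivers is on your PATH, which the official installers normally handle:
nvidia-smi
The output lists your GPU model, driver version, and total VRAM. VRAM is the number to compare against model sizes later in this guide.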
Method 1: The Easiest Way (Using Ollama)
Ollama is widely considered the best tool for beginners. It runs in the background and lets you download and chat with models using a single command. There's no complex configuration required.
Jump to your operating system below, or scroll through all the steps. Each operating system gets its own complete walkthrough, so you never need to piece together instructions from different sections.
Windows: Run a Local AI Model with Ollama
Download the Ollama Installer
Open your web browser and navigate to the official Ollama download page: ollama.com/download.
Click the large Windows button. This will download a file named OllamaSetup.exe to your Downloads folder.
Install Ollama
Locate OllamaSetup.exe in your Downloads folder and double-click it. Follow the standard Windows installation prompts (just click "Install" or "Next").
Once installed, Ollama will automatically start running in the background. You might see a small llama icon appear in your Windows system tray (bottom right corner, near the clock).
Open the Command Prompt
Ollama doesn't have a traditional graphical window; you interact with it through the Command Prompt.
- Click the Start Menu (Windows icon) in the bottom left.
- Type the letters cmd.
- You will see "Command Prompt" appear in the search results. Click it or press Enter.
A black window with white text will appear. This is normal!
Download and Run Your First AI Model
In the black Command Prompt window, type the following exact command and press Enter:
ollama run llama3.2:3b
What happens now?
- Ollama will connect to the internet and begin downloading the "Llama 3.2 3B" model.
- You will see a progress bar. The file is much smaller than most 8B models, so it is a safer first choice for laptops and older PCs.
Start Chatting!
Once the download is complete, the prompt will change to show three arrows: >>>. This means the AI is loaded into your computer's memory and is ready to talk.
Type a question and press Enter. For example:
>>> Write a short poem about a robot learning to paint.
The AI will generate the response right before your eyes. Congratulations! You are now running AI locally.
To exit the chat: Type /bye and press Enter. The model will unload from memory.
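If you prefer a quick one-off answer instead of an interactive chat, Ollama also accepts a prompt directly on the command line and exits when the answer is done (swap in any model you have downloaded):
ollama run llama3.2:3b "Summarize the plot of Romeo and Juliet in two sentences."
The model prints its answer and returns you to the normal Command Prompt.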
macOS: Run a Local AI Model with Ollama
Download the Ollama App
Open Safari or Chrome and go to the official Ollama download page: ollama.com/download.
Click the macOS button. This will download a file named Ollama-darwin.zip to your Downloads folder.
Install and Start Ollama
- Double-click the Ollama-darwin.zip file to extract it. You will see an application icon named Ollama.
- Drag the Ollama application into your Applications folder.
- Double-click the Ollama app in your Applications folder to run it.
- A prompt may appear asking if you want to open an app downloaded from the internet. Click Open.
- Ollama will ask for permission to install its command-line tool. Follow the prompts and enter your Mac password if requested.
You will now see a small llama icon in your Mac's top menu bar (near the WiFi icon). Ollama is running in the background.
Open the Terminal
You interact with Ollama using the Mac Terminal.
- Press Command (⌘) + Spacebar to open Spotlight Search.
- Type Terminal and press Return.
A white or black window with text will open. This is your command line.
Download and Run Your First AI Model
In the Terminal window, type the following exact command and press Return:
ollama run llama3.2:3b
What happens now?
- Ollama will connect to the internet and begin downloading the "Llama 3.2 3B" model.
- You will see a progress bar. This model is a safe first test for most Apple Silicon Macs and many Intel Macs.
Start Chatting!
Once the download is complete, the prompt will change to show three arrows: >>>. This means the AI is loaded into your Mac's Unified Memory and is ready to talk.
Type a question and press Return. For example:
>>> Explain quantum computing to a 5-year-old.
The AI will generate the response right before your eyes. Because Apple Silicon Macs use unified memory that the GPU can access directly, performance is usually excellent. Congratulations! You are now running AI locally.
To exit the chat: Type /bye and press Return. The model will unload from memory.
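If you are curious how fast your Mac is generating, the run command accepts a --verbose flag that prints timing statistics, including the generation rate in tokens per second, after each response:
ollama run llama3.2:3b --verbose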
Linux: Run a Local AI Model with Ollama
Open Your Terminal
Press Ctrl + Alt + T (on most distributions) to open your terminal.
Install Ollama using the Official Script
Ollama provides a single command that downloads the software, installs it, and configures it to run as a background service. It also automatically detects if you have NVIDIA or AMD GPU drivers installed.
Paste the following command into your terminal and press Enter:
curl -fsSL https://ollama.com/install.sh | sh
You may be prompted to enter your sudo password.
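On most distributions the script registers Ollama as a systemd background service, so you can verify the installation before downloading anything (the service name below assumes the default install):
ollama --version
systemctl status ollama
The first command prints the installed version; the second should report the ollama service as active (running).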
Download and Run Your First AI Model
Once the installation finishes, you can immediately run a model. Type the following command and press Enter:
ollama run llama3.2:3b
What happens now?
- Ollama will connect to the internet and begin downloading the "Llama 3.2 3B" model.
- You will see a progress bar. This is a good first model before trying larger 7B, 8B, or 13B models.
Start Chatting!
Once the download is complete, the prompt will change to show three arrows: >>>. This means the AI is loaded into your computer's memory and is ready to talk.
Type a question and press Enter. For example:
>>> Write a bash script to backup my documents folder.
The AI will generate the response right before your eyes. Congratulations! You are now running AI locally.
To exit the chat: Type /bye and press Enter. The model will unload from memory.
How to Confirm the Model Is Running Locally
After the first download finishes, your prompts are handled by the local Ollama runtime on your own machine. Here are practical ways to confirm that the model is installed and running locally.
List the models installed on your computer
Open Command Prompt or Terminal and run:
ollama list
You should see llama3.2:3b in the output. This means the model files are stored on your computer.
Check which model is currently running
While the chat is active, open a second Command Prompt or Terminal window and run:
ollama ps
This shows the currently loaded model and how much memory it is using.
Watch your computer's resource usage
On Windows, open Task Manager. On macOS, open Activity Monitor. On Linux, use System Monitor or tools like htop and nvidia-smi. When the model is generating text, you should see CPU, GPU, RAM, or VRAM usage increase locally.
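One more proof that everything is local: Ollama exposes a small HTTP API on your own machine at port 11434. While Ollama is running, try this request from a terminal (the quoting shown works in macOS and Linux shells; the model name assumes you downloaded llama3.2:3b earlier):
curl http://localhost:11434/api/generate -d '{"model": "llama3.2:3b", "prompt": "Say hello in five words.", "stream": false}'
The JSON response is generated entirely on your machine. If you disconnect from the internet first, the request still works.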
Ollama Commands Cheat Sheet
These are the most useful commands once your first local AI model is working.
ollama run llama3.2:3b
Download and start a small beginner-friendly model.
ollama list
Show models already downloaded to your computer.
ollama ps
Show models currently loaded in memory.
ollama stop llama3.2:3b
Stop a running model and free memory.
ollama rm llama3.2:3b
Delete a model from disk if you no longer need it.
ollama run mistral
Try another popular local model after your first test works.
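If you want to fetch a model ahead of time without opening a chat, one more command is worth adding to the list:
ollama pull llama3.2:3b
Download or update a model without starting a chat session.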
Method 2: Using a Graphical Interface (LM Studio)
If you would rather avoid the command line entirely and want a ChatGPT-style interface, LM Studio is the best choice. It is slightly more complex to set up than Ollama, but gives you more control over model files.
Download LM Studio
Go to lmstudio.ai and click the download button for your operating system (Windows, Mac, or Linux). Install the application normally.
Search for a Model
Open LM Studio. At the top of the home screen, you will see a search bar. Type the name of a model, for example, Mistral Instruct, and press Enter.
Choose the Right File (Crucial Step!)
LM Studio searches Hugging Face and will present you with a list of files on the right side. These are different quantizations (compressions) of the model.
- Look at the Size column (a rough rule of thumb follows this list).
- Use our compatibility checker to ensure the file size is smaller than your GPU's VRAM.
- As a general rule, look for files ending in Q4_K_M.gguf. This quantization offers the best balance of small size and output quality.
- Click the Download button next to that specific file.
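As a rough way to sanity-check the Size column yourself: a Q4-class quantization stores roughly 0.5 to 0.6 GB per billion parameters. That puts an 8B model at Q4_K_M around 5 GB, which fits in 8 GB of VRAM with room for context, while a 70B model at the same quantization is roughly 40 GB and needs far more memory than a typical gaming GPU has.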
Load the Model and Chat
Once the download is complete:
- Click the Chat icon (a speech bubble) on the far left sidebar.
- At the top center of the screen, click the dropdown menu that says "Select a model to load".
- Select the model you just downloaded. Wait a few seconds for it to load into your RAM/VRAM.
- Type your message in the chat box at the bottom and press Enter!
Troubleshooting & Next Steps
The AI is generating text incredibly slowly (1 word per second or less)
This usually means the model is too big for your GPU's VRAM, so your computer falls back to standard system RAM, which is much slower. Solution: Use our homepage tool to find a smaller model, or download a more heavily compressed version (e.g., Q3 instead of Q4).
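If you are using Ollama, you can confirm the spillover directly. The output of ollama ps includes a processor column (the exact layout varies by version) showing how the model is split between GPU and CPU:
ollama ps
If you see something like 48%/52% CPU/GPU instead of 100% GPU, part of the model is running on the CPU, which explains the slowdown.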
My computer crashed or froze when loading the model
You ran out of System RAM entirely. Solution: Restart your computer, close all other applications (especially web browsers with many tabs), and try a significantly smaller model.
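For example, if llama3.2:3b was too much for your machine, the same family has a 1B variant in the official Ollama library that needs roughly a third of the memory:
ollama run llama3.2:1b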
How do I try other models in Ollama?
You can find a list of available models at ollama.com/library. To run them, just use the command ollama run [model-name]. For example: ollama run gemma:7b or ollama run mistral.
What is the easiest way to run an AI model locally?
The easiest way for most beginners is to install Ollama, open Command Prompt or Terminal, and run ollama run llama3.2:3b. Ollama downloads the model, starts it, and opens a chat prompt in one step.
Do I need a GPU to run local AI models?
You do not strictly need a GPU for small models, but a GPU makes generation much faster. If your machine has limited RAM or VRAM, start with a smaller model like llama3.2:3b before trying 7B, 8B, or larger models.
How do I know if the AI model is running locally?
Use ollama list to confirm the model is downloaded on your computer, and ollama ps to see which model is currently loaded. You can also watch local CPU, GPU, RAM, or VRAM activity while the model generates text.
Ready to explore more models?
Return to Compatibility Checker