Open-weights & local models
What they are, how to run one, and when it’s worth it
Most of this course uses frontier closed models (Claude, GPT, Gemini) through a harness. But you should know the alternative: open-weights models you can download and run yourself. This page is a short orientation — enough to make an informed choice, not a deployment guide.
Closed vs open-weights — the distinction
| Closed / frontier | Open-weights | |
|---|---|---|
| Examples | Claude Opus, GPT-5, Gemini 3 | Llama, Qwen, Mistral, DeepSeek, Kimi |
| How you use it | API or app, you rent it per token | Download the weights, run on your own (or rented) hardware |
| Capability | State of the art | Roughly SOTA-minus-(4–9 months) |
| Cost | Per-token, ongoing | Hardware/electricity, or cheap hosted inference |
| Privacy | Data leaves your machine (zero-retention only on paid/enterprise tiers) | Data can stay fully local |
| Reproducibility | Model can change under you | Pin a version, it never changes |
“Open-weights” is not the same as “open-source.” Most of these models release the trained weights (so you can run and fine-tune them) but not the training data or full recipe. Truly open-source-everything models exist but are rarer.
Why a data analyst might care
- Privacy / compliance — sensitive data (health, HR, confidential firm data) that legally or contractually cannot be sent to a third-party API. A local model keeps everything on your machine.
- Cost at scale — if you’re classifying hundreds of thousands of texts, hosted open-weights inference can be far cheaper per token than frontier APIs.
- Reproducibility — a pinned local model gives you a frozen, citable artifact that won’t drift between when you run your analysis and when a referee re-runs it.
- Offline / air-gapped work.
The trade-off is capability and convenience: you do more setup, get somewhat weaker reasoning, and own the ops.
How to actually run one
The easiest on-ramp for a laptop:
- Ollama — one install, then
ollama run llama3.2(orqwen2.5,mistral, etc.). Pulls the weights and gives you a local chat + a local API endpoint. - LM Studio — a GUI for the same thing, with a model browser.
- Hosted inference — if your laptop is too small, services like OpenRouter or Together let you call open-weights models through an API without owning a GPU.
A 7–8B model runs on a modern laptop; 70B+ models need a serious GPU or hosted inference.
The key payoff for this course: open-weights models speak the same API shape as the closed ones. You can point a harness or an API script at a local Ollama endpoint and reuse almost all of your code — only the base URL and model name change.
When to use what (rule of thumb)
- Default to a frontier model for everyday analysis, reasoning-heavy work, and anything where capability matters more than cost or privacy.
- Reach for open-weights when privacy is non-negotiable, when you’re running the same prompt at massive scale, or when you need a frozen, reproducible model.
See also
- Which AI model — the broader model landscape and recommendations.
- Open-code tools — open-source harnesses you can point at these models.
- Glossary of LLM terms