Fix "unknown model architecture" for gemma4 and diffusion-gemma in llama.cpp

If you see one of these errors:
error loading model: unknown model architecture: 'gemma4'
error loading model: unknown model architecture: 'diffusion-gemma'
they look similar but have completely different causes. Do not apply the same fix to both.
Which error do you have?
| Error string | What it means | Fix |
|---|---|---|
| unknown model architecture: 'gemma4' | Your runtime predates Gemma 4's release. | Update llama.cpp, Ollama, or your app. |
| unknown model architecture: 'diffusion-gemma' | DiffusionGemma is not in any released version of llama.cpp yet. | Build from PR #24423 or use a different runtime. |
The key difference: gemma4 support exists in main-branch llama.cpp and in current Ollama and LM Studio releases, so updating solves it. diffusion-gemma support does not exist in any official release. It lives in an unmerged pull request (#24423), so updating to the latest official release will not fix it.
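If you triage these errors often, the decision above can be sketched as a small shell function. This is a minimal illustration, not part of llama.cpp: the function name diagnose_load_error is hypothetical, and it assumes the loader output contains the error strings exactly as quoted in this guide.

```shell
# Map loader output to the fix described in this guide.
# diagnose_load_error is a hypothetical helper name, not a llama.cpp command.
diagnose_load_error() {
  case "$1" in
    *"unknown model architecture: 'gemma4'"*)
      echo "Update llama.cpp, Ollama, or your app." ;;
    *"unknown model architecture: 'diffusion-gemma'"*)
      echo "Build from PR #24423 or use a different runtime." ;;
    *"unknown model architecture"*)
      echo "Unknown architecture not covered by this guide." ;;
    *)
      echo "No architecture error found." ;;
  esac
}
```

To use it, pass the captured output of a failed load as the single argument, for example `diagnose_load_error "$(llama-cli -m /path/to/your/model.gguf -p "Hello." -n 10 2>&1)"`. The gemma4 and diffusion-gemma patterns are checked before the generic one so that each error gets its own fix.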
Fix for unknown model architecture: 'gemma4'
This error means your runtime is older than Gemma 4's April 2026 release date. The fix is to update.
llama.cpp
cd llama.cpp
git pull
cmake -B build
cmake --build build --config Release -j
# Verify the binary you're running is the new one
./build/bin/llama-cli --version
Then confirm your shell is using the right binary:
which llama-cli
llama-cli --version
If the version is old, your system is still finding an older installed copy. Call the binary from the build directory directly, or update the installed version.
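To see every copy the shell could pick up, not just the first one, a helper along these lines may help. It is a sketch that assumes a POSIX-style shell (sh or bash; zsh does not split $PATH this way by default), and list_on_path is a hypothetical name.

```shell
# Print every executable called "$1" on PATH, in the order the shell searches.
# The first line printed is the copy that a bare command name resolves to.
# list_on_path is a hypothetical helper name.
list_on_path() {
  name=$1
  old_ifs=$IFS
  IFS=:
  for dir in $PATH; do
    # An empty PATH entry means the current directory.
    candidate="${dir:-.}/$name"
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
    fi
  done
  IFS=$old_ifs
}
```

Running `list_on_path llama-cli` and then calling each printed path with `--version` shows which copy is stale. If an older copy appears above your fresh build, either remove it or put the build directory earlier on PATH.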
Homebrew
brew update && brew upgrade llama.cpp
llama-cli --version
If the Homebrew package lags behind, build from source temporarily.
Ollama
ollama pull gemma4
ollama run gemma4
Ollama maintains a managed gemma4 model that it serves through its own runtime. Pulling through Ollama is usually easier than managing a custom GGUF if you just want to run standard Gemma 4.
LM Studio
Update LM Studio through its built-in updater. Current versions support the gemma4 architecture.
Fix for unknown model architecture: 'diffusion-gemma'
This error is different. DiffusionGemma support has not merged into llama.cpp main as of this writing. It exists only in PR #24423, which also introduces a separate dedicated binary called llama-diffusion-cli.
Updating llama.cpp to the latest official release will not fix this. You need to either:
- Build from the PR #24423 branch, or
- Use a different runtime that already supports DiffusionGemma
Option A: Build from PR #24423
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git fetch origin pull/24423/head:diffusion-gemma-pr
git checkout diffusion-gemma-pr
# For CPU only:
cmake -B build
cmake --build build --config Release -j
# For NVIDIA CUDA:
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
# The binary you need:
./build/bin/llama-diffusion-cli
Note: you must use llama-diffusion-cli, not llama-cli. Running llama-cli against a DiffusionGemma GGUF will still fail even from this branch.
Option B: Use a different runtime
| Runtime | DiffusionGemma support |
|---|---|
| Unsloth Studio | Yes, since June 12, 2026 (v0.1.463-beta). Easiest option. |
| vLLM | Yes, since June 10, 2026. Best for serving. |
| HF Transformers | Yes, via official Google weights. |
| Ollama | No. Issue #16664 is open. |
| LM Studio | No. Bundled runtime does not include PR #24423. Bug #2037 is open. |
Things that will not fix either error
- Renaming the GGUF file. Architecture metadata is inside the file, not the filename.
- Changing context length or sampling settings. The loader fails before inference begins.
- Using a different prompt. Same reason.
- Downloading from a different source but using the same runtime. If the runtime does not know the architecture, no GGUF will load.
- Running through Ollama if you have the diffusion-gemma error. Ollama's bundled runtime has the same limitation.
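Because the architecture name is stored inside the file, you can read it directly without loading the model. The sketch below rests on several assumptions: a GGUF version 2 or later file (64-bit string lengths), a little-endian machine, the general.architecture key appearing within the first 1 MiB, and its value being stored as a string (value type 8). The function name gguf_arch is hypothetical.

```shell
# Print the general.architecture string stored in a GGUF file's metadata.
# gguf_arch is a hypothetical helper; see the assumptions in the text above.
gguf_arch() {
  file=$1
  # Every GGUF file starts with the four ASCII bytes "GGUF".
  [ "$(head -c 4 "$file")" = "GGUF" ] || { echo "not a GGUF file" >&2; return 1; }
  # Byte offset of the first occurrence of the key in the first 1 MiB.
  key_off=$(head -c 1048576 "$file" | LC_ALL=C grep -aboF 'general.architecture' | head -n 1 | cut -d: -f1)
  [ -n "$key_off" ] || { echo "general.architecture key not found" >&2; return 1; }
  # After the 20-byte key: 4-byte value type, 8-byte string length, string bytes.
  type_off=$((key_off + 20))
  vtype=$(od -An -tu4 -j "$type_off" -N 4 "$file" | tr -d '[:space:]')
  [ "$vtype" = "8" ] || { echo "architecture value is not a string" >&2; return 1; }
  len_off=$((type_off + 4))
  len=$(od -An -tu8 -j "$len_off" -N 8 "$file" | tr -d '[:space:]')
  # tail -c +N is 1-indexed, so add 1 to the zero-based offset of the string.
  tail -c +"$((len_off + 8 + 1))" "$file" | head -c "$len"
  echo
}
```

Calling `gguf_arch /path/to/your/model.gguf` should print a name such as gemma4 or diffusion-gemma. That name is what the loader compares against the architectures it knows, which is why renaming the file changes nothing.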
How to confirm whether your file is valid
If you are not sure whether the problem is the runtime or the file itself:
# Test with latest main llama.cpp
./build/bin/llama-cli -m /path/to/your/model.gguf -p "Hello." -n 10
| Result | Meaning |
|---|---|
| Loads successfully | Your runtime is up to date. If another app fails, that app's runtime is behind. |
| unknown model architecture: 'gemma4' | Update your runtime. |
| unknown model architecture: 'diffusion-gemma' | You need PR #24423 or a different runtime. |
| Other error (corrupt file, wrong format) | Your file may be incomplete or from an untrusted source. |
Trusted GGUF sources for Gemma 4: ggml-org, Unsloth, bartowski, mradermacher.
FAQ
I updated to the latest llama.cpp and still get diffusion-gemma. Why?
Because the PR has not merged. Latest main does not include DiffusionGemma support. You need the PR branch specifically.
Is it safe to build from a PR branch?
For personal testing, yes. For production use, treat it as pre-release code that has not gone through the project's full review process.
Can I use the same GGUF for both llama-cli and llama-diffusion-cli?
No. They handle different architectures. A DiffusionGemma GGUF requires llama-diffusion-cli. A standard Gemma 4 GGUF uses the standard llama-cli.
My app says it uses llama.cpp but still gets the error. What do I do?
The app bundles its own llama.cpp version that may be weeks behind upstream. Check the app's release notes for DiffusionGemma or diffusion-gemma architecture support. Until the app updates its bundled runtime, you cannot use DiffusionGemma through that app.
Related guides

Does DiffusionGemma Work with llama.cpp? The Actual Status
Standard llama.cpp cannot run DiffusionGemma. Support lives in PR #24423, which ships a separate llama-diffusion-cli binary. Here is what actually works right now.

Does llama.cpp Support Gemma 4? GGUF Status, Fixes, and What Works
A practical answer to whether llama.cpp supports Gemma 4, with the official GGUF links, current support status, and what 'supported' really means.

How to Run Gemma 4 with llama.cpp: GGUF Setup, Hardware & Quantization Guide
Everything you need to get Gemma 4 running locally with llama.cpp: hardware tables, copy-paste build commands, quantization guide, and multimodal setup.
