Fix "unknown model architecture" for gemma4 and diffusion-gemma in llama.cpp

If you see one of these errors:

error loading model: unknown model architecture: 'gemma4'

error loading model: unknown model architecture: 'diffusion-gemma'

they look similar but have completely different causes. Do not apply the same fix to both.

Which error do you have?

Error string	What it means	Fix
`unknown model architecture: 'gemma4'`	Your runtime predates Gemma 4's release.	Update llama.cpp, Ollama, or your app.
`unknown model architecture: 'diffusion-gemma'`	DiffusionGemma is not in any released version of llama.cpp yet.	Build from PR #24423 or use a different runtime.

The key difference: gemma4 support exists in main-branch llama.cpp and in current Ollama and LM Studio releases, so updating solves it. diffusion-gemma support does not exist in any official release. It lives in an unmerged pull request (#24423), so updating to the latest official release will not fix it.

Fix for `unknown model architecture: 'gemma4'`

This error means your runtime is older than Gemma 4's April 2026 release date. The fix is to update.

llama.cpp

cd llama.cpp
git pull
cmake -B build
cmake --build build --config Release -j

# Verify the binary you're running is the new one
./build/bin/llama-cli --version

Then confirm your shell is using the right binary:

which llama-cli
llama-cli --version

If llama-cli --version reports an older build than ./build/bin/llama-cli --version, your shell is resolving llama-cli to an older installed copy on your PATH. Call the binary from the build directory directly, or update the installed version.
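When several copies are installed, it can help to list every one that PATH can resolve, not only the first. The following is a minimal sketch using plain shell built-ins; the function name `list_path_copies` is made up for illustration and is not part of llama.cpp.

```shell
# list_path_copies NAME: print every executable file called NAME found on PATH,
# in search order. The first line printed is the copy the shell actually runs.
# (Hypothetical helper name, used only for this example.)
list_path_copies() {
    local name="$1" dir
    local IFS=':'
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -z "$dir" ] && dir="."
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
}
```

Running `list_path_copies llama-cli` prints one path per line. If the first line is not your fresh build, that earlier entry is the stale copy to update or remove.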

Homebrew

brew update && brew upgrade llama.cpp
llama-cli --version

If the Homebrew package lags behind, build from source temporarily.

Ollama

ollama pull gemma4
ollama run gemma4

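If the error came from Ollama itself, update Ollama before pulling, since an outdated Ollama install carries an outdated runtime. Ollama maintains a managed gemma4 model that it serves through its own runtime. Pulling through Ollama is usually easier than managing a custom GGUF if you just want to run standard Gemma 4.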

LM Studio

Update LM Studio through its built-in updater. Current versions support the gemma4 architecture.

Fix for `unknown model architecture: 'diffusion-gemma'`

This error is different. DiffusionGemma support has not merged into llama.cpp main as of this writing. It exists only in PR #24423, which also introduces a separate dedicated binary called llama-diffusion-cli.

Updating llama.cpp to the latest official release will not fix this. You need to either:

Build from the PR #24423 branch, or
Use a different runtime that already supports DiffusionGemma

Option A: Build from PR #24423

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git fetch origin pull/24423/head:diffusion-gemma-pr
git checkout diffusion-gemma-pr

# CPU only (use this or the CUDA variant below, not both):
cmake -B build
cmake --build build --config Release -j

# NVIDIA CUDA (instead of the CPU-only configure above):
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# The binary you need:
./build/bin/llama-diffusion-cli

Note: you must use llama-diffusion-cli, not llama-cli. Running llama-cli against a DiffusionGemma GGUF will still fail even from this branch.
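To keep the pairing between architecture and binary straight in scripts, a small lookup like the one below may help. It is only a sketch of the mapping this guide describes; `binary_for_arch` is a hypothetical helper name, not something shipped with llama.cpp.

```shell
# binary_for_arch ARCH: print the binary this guide pairs with ARCH.
# Returns non-zero for any architecture the guide does not cover.
binary_for_arch() {
    case "$1" in
        gemma4)
            printf '%s\n' "llama-cli" ;;
        diffusion-gemma)
            printf '%s\n' "llama-diffusion-cli" ;;
        *)
            printf 'no mapping for architecture: %s\n' "$1" >&2
            return 1 ;;
    esac
}
```

For example, `ls -l "./build/bin/$(binary_for_arch diffusion-gemma)"` confirms that the PR build actually produced the diffusion binary before you try to load a model with it.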

Option B: Use a different runtime

Runtime	DiffusionGemma support
Unsloth Studio	Yes, since June 12, 2026 (v0.1.463-beta). Easiest option.
vLLM	Yes, since June 10, 2026. Best for serving.
HF Transformers	Yes, via official Google weights.
Ollama	No. Issue #16664 is open.
LM Studio	No. Bundled runtime does not include PR #24423. Bug #2037 is open.

Things that will not fix either error

Renaming the GGUF file. Architecture metadata is inside the file, not the filename.
Changing context length or sampling settings. The loader fails before inference begins.
Using a different prompt. Same reason.
Downloading from a different source but using the same runtime. If the runtime does not know the architecture, no GGUF of that architecture will load, wherever it came from.
Running through Ollama if you have the diffusion-gemma error. Ollama's bundled runtime has the same limitation.
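The point about renaming can be checked by looking at the file's bytes rather than its name. The sketch below is a crude text search, not a GGUF parser: it assumes the file starts with the 4-byte magic `GGUF` and that the architecture string appears within the first 1 MiB, where the metadata normally begins. Both function names are invented for this example.

```shell
# has_gguf_magic FILE: succeed if FILE starts with the 4-byte magic "GGUF".
has_gguf_magic() {
    [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

# mentions_arch FILE ARCH: succeed if the first 1 MiB of FILE contains the
# literal string ARCH. Assumption: the architecture metadata sits near the
# start of the file. A match elsewhere in the metadata (for example in a
# model name) would also count, so treat the result as a hint only.
mentions_arch() {
    head -c 1048576 "$1" | grep -a -q -F -- "$2"
}
```

A file copied to a new name gives the same answers from both checks, which is why renaming a DiffusionGemma GGUF to something containing "gemma4" changes nothing for the loader.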

How to confirm whether your file is valid

If you are not sure whether the problem is the runtime or the file itself:

# Test with latest main llama.cpp
./build/bin/llama-cli -m /path/to/your/model.gguf -p "Hello." -n 10

Result	Meaning
Loads successfully	This llama.cpp build supports the architecture and the file is readable. If another app fails on the same file, that app's bundled runtime is behind.
`unknown model architecture: 'gemma4'`	Update your runtime.
`unknown model architecture: 'diffusion-gemma'`	You need PR #24423 or a different runtime.
Other error (corrupt file, wrong format)	Your file may be incomplete or from an untrusted source.
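The table can be turned into a small triage helper that reads loader output on standard input. This is a sketch under the assumption that the error text matches the strings quoted in this guide; `classify_load_log` and its result labels are made up for illustration.

```shell
# classify_load_log: read loader output on stdin and print one label
# describing which row of the table above applies.
classify_load_log() {
    local log
    log=$(cat)
    case "$log" in
        *"unknown model architecture: 'gemma4'"*)
            printf '%s\n' "update-runtime" ;;
        *"unknown model architecture: 'diffusion-gemma'"*)
            printf '%s\n' "need-pr-24423-or-other-runtime" ;;
        *"unknown model architecture"*)
            printf '%s\n' "unsupported-architecture" ;;
        *)
            printf '%s\n' "no-architecture-error" ;;
    esac
}
```

One way to use it is to pipe the test command into it, for example `./build/bin/llama-cli -m /path/to/your/model.gguf -p "Hello." -n 10 2>&1 | classify_load_log`. A result of `no-architecture-error` only rules out the two errors covered here; a corrupt or incomplete file still needs to be checked separately.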

Trusted GGUF sources for Gemma 4: ggml-org, Unsloth, bartowski, mradermacher.

FAQ

I updated to the latest llama.cpp and still get diffusion-gemma. Why?
Because the PR has not merged. Latest main does not include DiffusionGemma support. You need the PR branch specifically.

Is it safe to build from a PR branch?
For personal testing, yes. For production use, treat it as pre-release code that has not gone through the project's full review process.

Can I use the same GGUF for both llama-cli and llama-diffusion-cli?
No. They handle different architectures. A DiffusionGemma GGUF requires llama-diffusion-cli. A standard Gemma 4 GGUF uses the standard llama-cli.

My app says it uses llama.cpp but still gets the error. What do I do?
The app bundles its own llama.cpp version, which may be weeks behind upstream. For the gemma4 error, update the app. For the diffusion-gemma error, check the app's release notes for DiffusionGemma or diffusion-gemma architecture support. Until the app updates its bundled runtime, you cannot use DiffusionGemma through that app.

Related guides:

DiffusionGemma + llama.cpp: Yes, Here's How to Run It (2026)

Does llama.cpp Support Gemma 4? GGUF Status, Fixes, and What Works

Run Gemma 4 with llama.cpp: Complete GGUF Setup Guide (2026)
