Does DiffusionGemma Work in LM Studio? Current Status (June 2026)

No, DiffusionGemma does not work in LM Studio right now. This is not a configuration issue or a file problem. LM Studio's bundled runtimes (both the llama.cpp engine and the MLX engine for Apple Silicon) do not support the diffusion-gemma architecture. Two open bug reports on GitHub track the problem.
What actually happens when you try
On Apple Silicon (MLX path)
When you try to load DiffusionGemma through LM Studio's MLX engine (version 1.8.5), you get:
Failed to load model.
Error when loading model: ValueError: Model type diffusion_gemma not supported.
Error: No module named 'mlx_vlm.speculative.drafters.diffusion_gemma'
The cause is a version gap: LM Studio bundles mlx-vlm 0.4.5 (an April 2026 dev build), while DiffusionGemma requires mlx-vlm 0.6.3 or later. Updating LM Studio's engines from within the app does not fix this, because the bundled library stays at that version until LM Studio ships a release with a newer one.
Tracked in: lmstudio-bug-tracker #2037
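If you also run mlx-vlm yourself outside LM Studio, you can check whether that environment clears the 0.6.3 bar. The following is a minimal Python sketch, not part of LM Studio: it inspects the Python environment you run it in, not the copy bundled inside the app. The helper names (`version_tuple`, `meets_minimum`, `check_local_mlx_vlm`) are made up for illustration, and the parser only compares leading numeric components, so a pre-release such as 0.6.3rc1 is treated like 0.6.3.

```python
from importlib import metadata

MIN_MLX_VLM = "0.6.3"  # minimum version this guide says DiffusionGemma needs


def version_tuple(version: str) -> tuple:
    """Turn '0.6.3' or '0.4.5.dev0' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_MLX_VLM) -> bool:
    """True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_local_mlx_vlm() -> str:
    """Describe the mlx-vlm found in the current Python environment."""
    try:
        installed = metadata.version("mlx-vlm")
    except metadata.PackageNotFoundError:
        return "mlx-vlm is not installed in this Python environment"
    verdict = "new enough" if meets_minimum(installed) else "too old"
    return f"mlx-vlm {installed} is {verdict} (need {MIN_MLX_VLM}+)"


print(check_local_mlx_vlm())
```

A passing result here says nothing about LM Studio itself, which keeps using its own bundled copy.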
On Windows / Linux (llama.cpp path)
LM Studio's llama.cpp engine (Metal llama.cpp v2.21.0 on macOS, or the equivalent build on Windows and Linux) fails with:
error loading model: unknown model architecture: 'diffusion-gemma'
This is because DiffusionGemma support in llama.cpp lives in PR #24423, which is unmerged. LM Studio bundles a released version of llama.cpp, so it does not include the PR.
Tracked in: lmstudio-ai/lms #583
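To confirm that a failing GGUF file really declares the diffusion-gemma architecture, and is not simply corrupt, you can read the `general.architecture` key from its header. The sketch below is a minimal Python reader written against the published GGUF layout (magic, version, tensor count, metadata count, then key/value pairs). It assumes a little-endian GGUF file of version 2 or later; the function names are illustrative and it is not a replacement for the official `gguf` tooling.

```python
import struct
from typing import BinaryIO, Optional

# Byte sizes of the fixed-width GGUF metadata value types (type id -> size).
_FIXED = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> int:
    """Read one little-endian integer described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _skip_value(f: BinaryIO, vtype: int) -> None:
    """Advance past a metadata value we do not care about."""
    if vtype in _FIXED:
        f.seek(_FIXED[vtype], 1)
    elif vtype == _STRING:
        f.seek(_read(f, "<Q"), 1)
    elif vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        if item_type in _FIXED:
            f.seek(_FIXED[item_type] * count, 1)
        else:
            for _ in range(count):
                _skip_value(f, item_type)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")


def read_architecture(f: BinaryIO) -> Optional[str]:
    """Return the value of general.architecture, or None if absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    if _read(f, "<I") < 2:
        raise ValueError("GGUF v1 uses a different layout")
    _read(f, "<Q")             # tensor count, unused here
    kv_count = _read(f, "<Q")  # number of metadata key/value pairs
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "<I")
        if key == "general.architecture" and vtype == _STRING:
            return _read_string(f)
        _skip_value(f, vtype)
    return None


# Usage (the path is a placeholder):
# with open("model.gguf", "rb") as fh:
#     print(read_architecture(fh))
```

If this prints `diffusion-gemma`, the file is intact and the error comes from the runtime, which matches the diagnosis above.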
When will LM Studio support DiffusionGemma?
LM Studio support depends on two changes:
- PR #24423 merging into llama.cpp main (for the llama.cpp path)
- mlx-vlm 0.6.3+ being bundled (for the Apple MLX path)
Neither has happened yet. Each path also needs a new LM Studio release that includes the change. Realistically this is weeks away, not days.
What actually works right now
| Runtime | DiffusionGemma support | Notes |
|---|---|---|
| Unsloth Studio | Yes | Easiest local path. Works on macOS/Windows/Linux. Supported since June 12, 2026 (v0.1.463-beta). |
| vLLM | Yes | Best for serving. Native support since June 10, 2026. Requires Linux + NVIDIA GPU. |
| HF Transformers | Yes | Python-only. Official Google weights at google/diffusiongemma-26B-A4B-it. |
| llama.cpp (PR #24423 branch) | Yes | CLI only. Must build from the PR branch. Uses llama-diffusion-cli, not llama-cli. |
| LM Studio | No | Both MLX and llama.cpp engines fail. |
| Ollama | No | Issue #16664 open. |
Recommended path by what you are trying to do
You want a desktop GUI: Unsloth Studio is currently the only working local GUI. Install it and search for DiffusionGemma in the model browser.
You are on Apple Silicon: Unsloth Studio supports macOS. The MLX path in LM Studio does not work yet.
You are comfortable with the command line: Build llama.cpp from PR #24423 and use llama-diffusion-cli. You get the most control over diffusion step count and other parameters.
You are running Python and want quick experimentation: Use Hugging Face Transformers with the official google/diffusiongemma-26B-A4B-it weights.
You need to serve DiffusionGemma to multiple users: vLLM has native support and published benchmarks.
You use Ollama: Wait. No workaround exists without building custom binaries.
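For the command-line route, the steps to build from the PR branch can be sketched as a list of commands. The Python snippet below only prints the plan and runs nothing. It relies on GitHub's standard `pull/<number>/head` ref for fetching a pull request and on llama.cpp's CMake build. The repository URL, the local branch name, and the directory names are assumptions for illustration, and any GPU backend flags for your hardware are left out.

```python
import shlex

PR_NUMBER = 24423                                   # PR named in this guide
REPO_URL = "https://github.com/ggml-org/llama.cpp"  # assumption: upstream repo URL
LOCAL_BRANCH = "diffusion-gemma-pr"                 # hypothetical local branch name
CHECKOUT_DIR = "llama.cpp"                          # hypothetical checkout directory


def build_plan(pr: int = PR_NUMBER, branch: str = LOCAL_BRANCH) -> list:
    """Return the commands to fetch a PR branch and build it with CMake."""
    build_dir = f"{CHECKOUT_DIR}/build"
    return [
        ["git", "clone", REPO_URL, CHECKOUT_DIR],
        # GitHub exposes every pull request as refs/pull/<number>/head.
        ["git", "-C", CHECKOUT_DIR, "fetch", "origin", f"pull/{pr}/head:{branch}"],
        ["git", "-C", CHECKOUT_DIR, "checkout", branch],
        ["cmake", "-S", CHECKOUT_DIR, "-B", build_dir],
        ["cmake", "--build", build_dir, "--config", "Release"],
    ]


# Print the plan so each step can be reviewed before running it by hand.
for command in build_plan():
    print(shlex.join(command))
```

After a successful build, look for llama-diffusion-cli among the built binaries. Its exact location and options depend on the PR, so check the PR description before running it.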
Before you commit to DiffusionGemma: what to know
DiffusionGemma has real speed advantages in the right environment. On NVIDIA RTX 3090/4090 and higher-end cards, generation can be several times faster than standard autoregressive Gemma 4 at low concurrency. On lower-end NVIDIA cards (3060, 4060) and on Apple Silicon, the speed advantage may not appear at all. The model shifts from memory-bandwidth-bound inference (where Apple Silicon excels) to compute-bound inference (where high-end discrete NVIDIA GPUs excel).
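The shift from memory-bound to compute-bound can be illustrated with a simple roofline estimate: one forward pass takes as long as the slower of "read the active weights once" and "do the arithmetic for every token processed in parallel". The Python sketch below uses round, made-up hardware numbers and assumes roughly 4 billion active parameters at one byte each (a guess based on the "A4B" in the model name). None of these figures are measurements, and the model ignores everything except weight reads and matrix arithmetic.

```python
def forward_pass_time(weight_bytes, flops_per_token, tokens_in_parallel,
                      bandwidth_bytes_per_s, peak_flops_per_s):
    """Roofline estimate: a pass takes as long as its slower resource."""
    memory_time = weight_bytes / bandwidth_bytes_per_s
    compute_time = flops_per_token * tokens_in_parallel / peak_flops_per_s
    bound = "memory" if memory_time >= compute_time else "compute"
    return max(memory_time, compute_time), bound


def crossover_tokens(weight_bytes, flops_per_token,
                     bandwidth_bytes_per_s, peak_flops_per_s):
    """Parallel token count at which compute time equals memory time."""
    memory_time = weight_bytes / bandwidth_bytes_per_s
    return memory_time * peak_flops_per_s / flops_per_token


# Illustrative assumptions only, not benchmarks of any real device.
WEIGHT_BYTES = 4e9      # ~4B active parameters at 1 byte each
FLOPS_PER_TOKEN = 8e9   # ~2 FLOPs per active parameter per token
BANDWIDTH = 1e12        # 1 TB/s memory bandwidth
PEAK_FLOPS = 1e14       # 100 TFLOP/s usable compute

for tokens in (1, 16, 256):
    seconds, bound = forward_pass_time(
        WEIGHT_BYTES, FLOPS_PER_TOKEN, tokens, BANDWIDTH, PEAK_FLOPS)
    print(f"{tokens:>4} tokens per pass: {seconds * 1e3:.2f} ms, {bound}-bound")
```

Under these assumptions, a pass over one token is limited by memory bandwidth, which is the situation for standard autoregressive decoding. A pass that works on many positions at once, as diffusion decoding does, is limited by compute instead, so hardware with high bandwidth but modest compute gains less from the approach.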
More importantly: Google explicitly states that DiffusionGemma's output quality is lower than standard Gemma 4. This is not a temporary limitation. The speed-quality tradeoff is the fundamental characteristic of the diffusion approach. If you need maximum quality, standard Gemma 4 is the right model.
DiffusionGemma is best suited for:
- Code infilling (filling in the middle of existing code)
- Inline editing where you provide before/after context
- Interactive local applications where latency matters and you can accept some quality reduction
It is less suited for:
- Tasks that require maximum factual accuracy
- Complex multi-step reasoning where small errors compound
- Any use case where you would compare outputs critically to standard Gemma 4
FAQ
Will updating LM Studio fix this?
Not until LM Studio ships a release that bundles either mlx-vlm 0.6.3+ (for Apple) or a llama.cpp build incorporating PR #24423 (for others). No current release does this.
Can I point LM Studio at a custom runtime?
LM Studio does not currently support swapping in a custom llama.cpp binary. The bundled runtime is what you get.
Is standard Gemma 4 in LM Studio working?
Yes. The gemma4 architecture is supported in current LM Studio releases. The limitation is specific to diffusion-gemma.
How long will this take to resolve?
Hard to predict. It depends on PR #24423 merging into llama.cpp, LM Studio shipping an update with the new llama.cpp version, and the MLX team releasing and LM Studio bundling a newer mlx-vlm. Best estimate is weeks rather than days.
Related guides

Does DiffusionGemma Work with llama.cpp? The Actual Status
Standard llama.cpp cannot run DiffusionGemma. Support lives in PR #24423, which ships a separate llama-diffusion-cli binary. Here is what actually works right now.

Fix "unknown model architecture" for gemma4 and diffusion-gemma in llama.cpp
The gemma4 and diffusion-gemma architecture errors have different causes and different fixes. Treating them the same way will waste your time.

Does llama.cpp Support Gemma 4? GGUF Status, Fixes, and What Works
A practical answer to whether llama.cpp supports Gemma 4, with the official GGUF links, current support status, and what 'supported' really means.
