Does DiffusionGemma Work in LM Studio? Current Status (June 2026)

No, DiffusionGemma does not work in LM Studio right now. This is not a configuration issue or a file problem. LM Studio's bundled runtimes — both the llama.cpp engine and the MLX engine for Apple Silicon — do not support the diffusion-gemma architecture. Two confirmed bug reports are open on GitHub tracking this.

What actually happens when you try

On Apple Silicon (MLX path)

When you try to load DiffusionGemma through LM Studio's MLX engine (version 1.8.5), you get:

Failed to load model.
Error when loading model: ValueError: Model type diffusion_gemma not supported.
Error: No module named 'mlx_vlm.speculative.drafters.diffusion_gemma'

This is because LM Studio bundles mlx-vlm 0.4.5 (an April 2026 dev build). DiffusionGemma requires mlx-vlm 0.6.3 or later. You cannot fix this by updating LM Studio's engines from within the app — the bundled library version is what it is until LM Studio ships an update.

Tracked in: lmstudio-bug-tracker #2037
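
The version gap described above can be expressed as a small check. This is a minimal sketch, not part of LM Studio or mlx-vlm: the helper names `parse_version` and `supports_diffusion_gemma` are hypothetical, and the only facts it encodes are the two version numbers quoted in this section (0.4.5 bundled, 0.6.3 required).

```python
import re

# Minimum mlx-vlm version this article says DiffusionGemma requires.
REQUIRED_MLX_VLM = (0, 6, 3)


def parse_version(text):
    """Turn '0.4.5' or '0.4.5.dev0' into (0, 4, 5), ignoring any suffix."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_diffusion_gemma(mlx_vlm_version):
    """Hypothetical helper: True if the version meets the stated minimum."""
    return parse_version(mlx_vlm_version) >= REQUIRED_MLX_VLM


if __name__ == "__main__":
    # 0.4.5 is the version this article reports LM Studio bundling.
    print(supports_diffusion_gemma("0.4.5"))
    print(supports_diffusion_gemma("0.6.3"))
```

Comparing integer tuples rather than raw strings matters here, because a string comparison would rank "0.10.0" below "0.6.3". If you run mlx-vlm in your own Python environment instead of through LM Studio, `importlib.metadata.version("mlx-vlm")` from the standard library returns the installed version string that this check expects.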

On Windows / Linux (llama.cpp path)

LM Studio's llama.cpp engine (for example Metal llama.cpp v2.21.0 on macOS, or the equivalent build for your platform) fails with:

error loading model: unknown model architecture: 'diffusion-gemma'

This is because DiffusionGemma support in llama.cpp lives in PR #24423, which is unmerged. LM Studio bundles a released version of llama.cpp, so it does not include the PR.

Tracked in: lmstudio-ai/lms #583
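
If you are sorting through load failures, the two error messages quoted in this article can be matched against a log to tell which engine failed and which issue tracks it. The sketch below is illustrative only: `KNOWN_FAILURES` and `diagnose` are hypothetical names, and the match strings are copied from the errors shown above rather than from any official list.

```python
# Error fragments and trackers as quoted in this article.
KNOWN_FAILURES = [
    {
        "needle": "Model type diffusion_gemma not supported",
        "engine": "MLX",
        "cause": "bundled mlx-vlm predates DiffusionGemma support",
        "tracker": "lmstudio-bug-tracker #2037",
    },
    {
        "needle": "unknown model architecture: 'diffusion-gemma'",
        "engine": "llama.cpp",
        "cause": "bundled llama.cpp does not include PR #24423",
        "tracker": "lmstudio-ai/lms #583",
    },
]


def diagnose(log_text):
    """Return the matching known failure, or None if the log is something else."""
    for failure in KNOWN_FAILURES:
        if failure["needle"] in log_text:
            return failure
    return None


if __name__ == "__main__":
    log = "error loading model: unknown model architecture: 'diffusion-gemma'"
    result = diagnose(log)
    if result is None:
        print("Not one of the two known DiffusionGemma failures.")
    else:
        print(f"{result['engine']} engine: {result['cause']} ({result['tracker']})")
```

A `None` result means the failure is something other than the missing architecture support, such as a corrupt download, and is worth investigating separately.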

When will LM Studio support DiffusionGemma?

LM Studio support depends on two changes, one per engine:

PR #24423 merging into llama.cpp main (for the llama.cpp path)
mlx-vlm 0.6.3+ being bundled (for the Apple MLX path)

Neither has happened yet, and each path also needs a new LM Studio release that bundles the updated runtime. Realistically this is weeks away, not days.

What actually works right now

Runtime	DiffusionGemma support	Notes
Unsloth Studio	Yes	Easiest local path. Works on macOS/Windows/Linux. Supported since June 12, 2026 (v0.1.463-beta).
vLLM	Yes	Best for serving. Native support since June 10, 2026. Requires Linux + NVIDIA GPU.
HF Transformers	Yes	Python-only. Official Google weights at `google/diffusiongemma-26B-A4B-it`.
llama.cpp (PR #24423 branch)	Yes	CLI only. Must build from the PR branch. Uses `llama-diffusion-cli`, not `llama-cli`.
LM Studio	No	Both MLX and llama.cpp engines fail.
Ollama	No	Issue #16664 open.

Recommended path by what you are trying to do

You want a desktop GUI: Unsloth Studio is currently the only working local GUI. Install it and search for DiffusionGemma in the model browser.

You are on Apple Silicon: Unsloth Studio supports macOS. The MLX path in LM Studio does not work yet.

You are comfortable with the command line: Build llama.cpp from PR #24423 and use llama-diffusion-cli. You get the most control over diffusion step count and other parameters.

You are running Python and want quick experimentation: Use Hugging Face Transformers with the official google/diffusiongemma-26B-A4B-it weights.

You need to serve DiffusionGemma to multiple users: vLLM has native support and published benchmarks.

You use Ollama: Wait. No workaround exists without building custom binaries.
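
The recommendations above amount to a lookup from goal to runtime. Here is a minimal sketch of that lookup, with the support flags copied from the table in this article. The goal labels (`desktop_gui`, `serving`, and so on) and the `recommend` function are invented for this example and do not correspond to any tool's API.

```python
# Support status as listed in this article's table (June 2026).
SUPPORT = {
    "Unsloth Studio": True,
    "vLLM": True,
    "HF Transformers": True,
    "llama.cpp (PR #24423 branch)": True,
    "LM Studio": False,
    "Ollama": False,
}

# Hypothetical goal labels mapped to the runtime this article recommends.
RECOMMENDATION = {
    "desktop_gui": "Unsloth Studio",
    "apple_silicon": "Unsloth Studio",
    "command_line": "llama.cpp (PR #24423 branch)",
    "python": "HF Transformers",
    "serving": "vLLM",
}


def recommend(goal):
    """Return the recommended runtime for a goal, or None if nothing works yet."""
    runtime = RECOMMENDATION.get(goal)
    if runtime is None or not SUPPORT.get(runtime, False):
        return None
    return runtime


if __name__ == "__main__":
    for goal in ("desktop_gui", "serving", "ollama_user"):
        print(goal, "->", recommend(goal))
```

Goals with no working option, such as staying on Ollama, fall through to `None`, which matches the "wait" advice above.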

Before you commit to DiffusionGemma: what to know

DiffusionGemma has real speed advantages in the right environment. On NVIDIA RTX 3090/4090 and higher-end cards, generation can be several times faster than standard autoregressive Gemma 4 at low concurrency. On lower-end NVIDIA cards (3060, 4060) and on Apple Silicon, the speed advantage may not appear at all. The model shifts from memory-bandwidth-bound inference (where Apple Silicon excels) to compute-bound inference (where high-end discrete NVIDIA GPUs excel).
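
A rough roofline-style estimate shows why the bottleneck shifts. The sketch below assumes that an autoregressive model runs one forward pass per token, while a diffusion model runs a fixed number of denoising passes that each process every token at once. Every number in it is invented for illustration: the two device profiles are not real hardware, and the step count, weight size, and FLOPs per token are not DiffusionGemma's actual figures. It also ignores attention cost, the KV cache, and mixture-of-experts routing.

```python
def pass_time(bytes_read, flops, bandwidth, compute):
    """One forward pass takes as long as its slower resource: memory or math."""
    return max(bytes_read / bandwidth, flops / compute)


def autoregressive_time(n_tokens, weight_bytes, flops_per_token, bandwidth, compute):
    # One pass per token: weights are re-read every time, with little math per pass.
    return n_tokens * pass_time(weight_bytes, flops_per_token, bandwidth, compute)


def diffusion_time(n_tokens, n_steps, weight_bytes, flops_per_token, bandwidth, compute):
    # One pass per denoising step: weights are read once, math scales with n_tokens.
    per_pass_flops = flops_per_token * n_tokens
    return n_steps * pass_time(weight_bytes, per_pass_flops, bandwidth, compute)


def diffusion_speedup(device, n_tokens=256, n_steps=16,
                      weight_bytes=2e9, flops_per_token=8e9):
    """Ratio above 1 means diffusion is faster under these assumptions."""
    ar = autoregressive_time(n_tokens, weight_bytes, flops_per_token, **device)
    df = diffusion_time(n_tokens, n_steps, weight_bytes, flops_per_token, **device)
    return ar / df


# Hypothetical device profiles (bytes/s and FLOP/s), not measurements.
COMPUTE_RICH = {"bandwidth": 1.0e12, "compute": 1.0e14}
COMPUTE_POOR = {"bandwidth": 4.0e11, "compute": 1.0e13}

if __name__ == "__main__":
    print("compute-rich speedup:", round(diffusion_speedup(COMPUTE_RICH), 2))
    print("compute-poor speedup:", round(diffusion_speedup(COMPUTE_POOR), 2))
```

With these made-up inputs the compute-rich profile comes out ahead with diffusion (a ratio of roughly 1.6), while the compute-poor profile comes out behind (roughly 0.4), because its arithmetic throughput becomes the limit once each pass handles many tokens. Treat the output as a picture of the mechanism, not a prediction for any particular card.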

More importantly: Google explicitly states that DiffusionGemma's output quality is lower than standard Gemma 4. This is not a temporary limitation. The speed-quality tradeoff is the fundamental characteristic of the diffusion approach. If you need maximum quality, standard Gemma 4 is the right model.

DiffusionGemma is best suited for:

Code infilling (filling in the middle of existing code)
Inline editing where you provide before/after context
Interactive local applications where latency matters and you can accept some quality reduction

It is less suited for:

Tasks that require maximum factual accuracy
Complex multi-step reasoning where small errors accumulate
Any use case where you would compare outputs critically to standard Gemma 4

FAQ

Will updating LM Studio fix this?
Not until LM Studio ships a release that bundles either mlx-vlm 0.6.3+ (for Apple) or a llama.cpp build incorporating PR #24423 (for others). No current release does this.

Can I point LM Studio at a custom runtime?
LM Studio does not currently support swapping in a custom llama.cpp binary. The bundled runtime is what you get.

Does standard Gemma 4 work in LM Studio?
Yes. The gemma4 architecture is supported in current LM Studio releases. The limitation is specific to diffusion-gemma.

How long will this take to resolve?
Hard to predict. It depends on PR #24423 merging into llama.cpp, LM Studio shipping an update with the new llama.cpp version, and the MLX team releasing and LM Studio bundling a newer mlx-vlm. Best estimate is weeks rather than days.

Related guides:

DiffusionGemma + llama.cpp: Yes, Here's How to Run It (2026)

Fix "unknown model architecture" for gemma4 and diffusion-gemma in llama.cpp

Does llama.cpp Support Gemma 4? GGUF Status, Fixes, and What Works
