Installation Guide for Z-Image Turbo

This page explains how to set up a local environment for running Z-Image Turbo, the distilled text-to-image model in the Z-Image family. The instructions assume that you have access to a recent GPU with sufficient memory and are comfortable working with Python and a terminal.

1. Hardware and Software Requirements

Z-Image Turbo is designed to run on graphics cards with around 16 GB of VRAM. Lower-memory devices may still work with CPU offloading and reduced resolution, at the cost of longer generation times.

You will need:

- A GPU with around 16 GB of VRAM (for example, a recent consumer or data center card).
- A recent Python 3 environment (Python 3.9 or later is a good starting point).
- A working installation of CUDA and a compatible PyTorch build.
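
The requirements above can be checked from Python before installing anything else. The following is a minimal sketch, not an official tool: the function name check_environment and the 15 GiB threshold are assumptions made for illustration. The threshold sits slightly below 16 because cards sold as 16 GB usually report a little less usable memory.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 9), min_vram_gib=15.0):
    """Return a small report on whether this machine meets the requirements."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": False,
        "cuda_available": False,
        "vram_gib": None,
        "vram_ok": False,
    }
    # Skip the GPU checks when PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        # Total memory of the first visible GPU, converted from bytes to GiB.
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        report["vram_gib"] = round(total_bytes / 1024**3, 1)
        report["vram_ok"] = report["vram_gib"] >= min_vram_gib
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If vram_ok is False on a machine with a working GPU, the CPU offloading option described later in this guide is the first thing to try.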

2. Create and Activate a Virtual Environment

To keep dependencies isolated from your system Python, use a virtual environment. The example below uses venv, the virtual environment module that ships with Python.

python -m venv zimage-env
source zimage-env/bin/activate  # On Windows: zimage-env\Scripts\activate
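
To confirm that the activation worked, you can ask the interpreter whether it is running inside a virtual environment. This is a small sketch; the helper name in_virtualenv is made up for this guide, and the check applies to environments created with venv.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```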

3. Install PyTorch and Core Libraries

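Install a GPU-enabled build of PyTorch and the core libraries required by Z-Image Turbo. The index URL below selects the CUDA 12.4 build of PyTorch (cu124); change the suffix if your CUDA version differs. The diffusers library is installed from its GitHub source so that you get the most recent pipeline classes.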

pip install torch --index-url https://download.pytorch.org/whl/cu124
pip install git+https://github.com/huggingface/diffusers
pip install transformers accelerate safetensors
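
After the install commands finish, it can help to confirm which versions ended up in the environment and whether the pipeline class is importable. The sketch below uses only the standard library for the version lookup; the function names installed_versions and has_zimage_pipeline are assumptions for illustration.

```python
from importlib import metadata

REQUIRED_PACKAGES = ("torch", "diffusers", "transformers", "accelerate", "safetensors")


def installed_versions(packages=REQUIRED_PACKAGES):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def has_zimage_pipeline():
    """Return True if the installed diffusers build exposes ZImagePipeline."""
    try:
        from diffusers import ZImagePipeline  # noqa: F401
    except Exception:
        # Covers a missing diffusers install as well as a build without the class.
        return False
    return True


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
    print("ZImagePipeline available:", has_zimage_pipeline())
```

If ZImagePipeline is reported as unavailable even though diffusers is installed, reinstalling diffusers from source with the command above is a reasonable next step.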

4. Load the Z-Image Turbo Pipeline

Once the environment is prepared, you can load the dedicated pipeline for Z-Image Turbo. The diffusers library includes a ZImagePipeline class that wraps all components needed to run the model.

import torch
from diffusers import ZImagePipeline

pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")

5. Generate Your First Image

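The following example shows how to generate an image with a fixed seed so that the result is reproducible. It sets num_inference_steps to 9, which results in eight forward passes through the diffusion transformer, and sets guidance_scale to 0.0, the value used with the Turbo model.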

prompt = "City street at night with clear bilingual store signs, warm lighting, and detailed reflections on wet pavement."

image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(123),
).images[0]

image.save("z_image_turbo_city.png")
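
A common next step is to render the same prompt with several seeds and compare the results. The sketch below wraps the call shown above in a loop. The function name generate_with_seeds and the make_generator parameter are hypothetical; the parameter exists only so the seed handling can be swapped out, and by default it builds a CUDA generator exactly as in the example above.

```python
def generate_with_seeds(pipe, prompt, seeds, make_generator=None, **overrides):
    """Run the pipeline once per seed and return a dict of seed -> image."""
    if make_generator is None:
        import torch

        def make_generator(seed):
            return torch.Generator("cuda").manual_seed(seed)

    # Same settings as the single-image example; overrides can change any of them.
    settings = dict(height=1024, width=1024, num_inference_steps=9, guidance_scale=0.0)
    settings.update(overrides)

    images = {}
    for seed in seeds:
        result = pipe(prompt=prompt, generator=make_generator(seed), **settings)
        images[seed] = result.images[0]
    return images
```

With the pipeline loaded as pipe, calling generate_with_seeds(pipe, prompt, [123, 124, 125]) returns three images keyed by seed, which you can then save with file names that include the seed.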

6. Optional: Attention Backends and Compilation

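On some hardware, you can switch the internal attention implementation or compile the transformer for additional speedups. Neither option is required to use Z-Image Turbo. The Flash Attention backends only work when the corresponding kernels are installed and supported by your GPU, and compilation makes the first generation slower while the model compiles. Both can help reduce latency in production settings.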

# Switch attention backend if supported
pipe.transformer.set_attention_backend("flash")      # Flash-Attention-2
# pipe.transformer.set_attention_backend("_flash_3")  # Flash-Attention-3

# Optionally compile the transformer module
# pipe.transformer.compile()
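
Because backend availability depends on the machine, one option is to try the faster backends in order and keep the default when none of them work. This is a sketch under one stated assumption: that set_attention_backend raises an exception when the requested backend is unavailable. The helper name enable_first_supported_backend is made up for this guide.

```python
def enable_first_supported_backend(transformer, candidates=("_flash_3", "flash")):
    """Try each backend name in order; return the first that is accepted, else None."""
    for name in candidates:
        try:
            transformer.set_attention_backend(name)
        except Exception as error:
            # Assumption: an unavailable backend is reported by raising here.
            print(f"attention backend {name!r} unavailable: {error}")
            continue
        return name
    # Nothing was accepted, so the pipeline keeps its default attention backend.
    return None
```

Usage would be enable_first_supported_backend(pipe.transformer), called once after the pipeline is loaded and before the first generation.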

7. Optional: CPU Offloading for Memory-Constrained Devices

If your GPU memory is limited, you can enable model offloading so that parts of the model move between CPU and GPU during generation. This increases generation time but allows Z-Image Turbo to run on smaller devices. Call it instead of pipe.to("cuda"), since offloading manages device placement itself.

pipe.enable_model_cpu_offload()
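
Since moving the pipeline to the GPU and enabling offloading are alternatives, it can be convenient to pick one of them in a single place. The helper below is a small sketch; the name place_pipeline and its offload flag are assumptions for illustration, and the two pipeline methods it calls are the ones already shown in this guide.

```python
def place_pipeline(pipe, offload=False, device="cuda"):
    """Either enable CPU offloading or move the whole pipeline to one device."""
    if offload:
        # Offloading handles device placement, so pipe.to() is not called.
        pipe.enable_model_cpu_offload()
    else:
        pipe.to(device)
    return pipe
```

For example, place_pipeline(pipe, offload=True) on a card with less memory, and place_pipeline(pipe) on a card with around 16 GB of VRAM.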

8. Integrating Z-Image Turbo into Applications

After you are comfortable running the basic script, you can embed Z-Image Turbo inside larger systems. Some examples include a lightweight web interface, a job queue that processes prompts from a database, or an internal tool that helps designers explore prompt variations.

The key pattern remains the same: prepare the pipeline once at startup, reuse it across requests, control seeds and prompts carefully, and monitor latency and memory use. Because Z-Image Turbo is designed for few-step sampling, it naturally fits interactive tools where quick responses are helpful.
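
The pattern described above can be sketched as a small service object. This is an illustration rather than a prescribed design: the class name PipelineService and the load_pipeline callable are hypothetical, and the lock is one simple way to keep concurrent requests from using the GPU at the same time.

```python
import threading
import time


class PipelineService:
    """Load the pipeline once and reuse it for every request."""

    def __init__(self, load_pipeline):
        # load_pipeline is any callable that returns a ready-to-use pipeline.
        self._pipe = load_pipeline()
        self._lock = threading.Lock()
        self.latencies = []  # seconds per request, for monitoring

    def generate(self, prompt, **settings):
        # Serialize requests so only one generation runs at a time.
        with self._lock:
            start = time.perf_counter()
            image = self._pipe(prompt=prompt, **settings).images[0]
            self.latencies.append(time.perf_counter() - start)
        return image
```

In a web application, you would create one PipelineService when the process starts, passing a function that wraps the loading code from step 4, and call generate from each request handler with an explicit generator so that results stay reproducible.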

For more background on the model family and training methods such as Decoupled-DMD and DMDR, you can read the main Z-Image documentation and research material, which explain the reasoning behind the architecture and distillation choices that power Z-Image Turbo.