| Documentation | Intel® Gaudi® Documentation | Optimizing Training Platform Guide |
Latest News 🔥
- [2026/04] Version 0.19.0 is now available, built on vLLM 0.19.0 and fully compatible with Intel® Gaudi® Software v1.24.0 with PyTorch 2.10.
  This release introduces Qwen 3.5 model support, Mamba prefix caching for hybrid models, MxFP4 weight dequantization, LMCache integration, and a custom depthwise conv1d TPC kernel for MambaMixer2. Performance improvements include torch.compile-compatible online defragmentation, improved warmup time, and optimized hybrid KV cache visibility.
- [2026/04] Version 0.17.1 is now available, built on vLLM 0.17.1 and fully compatible with Intel® Gaudi® v1.23.0.
  This patch release backports critical fixes and improvements, including MxFP4 weight loading, Granite 4.0-h calibration, prefix caching for HPUMambaMixer2, OOM crash fixes, and SDL secure error handling improvements.
- [2026/03] Version 0.16.0 is now available, built on vLLM 0.16.0 and fully compatible with Intel® Gaudi® v1.23.0.
  This release introduces validated support and critical stability fixes for Qwen3-VL models leveraging HPUMMEncoderAttention. Performance and stability were improved through backported Mamba architecture optimizations, Docker and UBI infrastructure enhancements, and a forced CPU loading mechanism for INC quantization to prevent OOM errors.
The vLLM Hardware Plugin for Intel® Gaudi® integrates Intel® Gaudi® AI accelerators with vLLM to optimize large language model inference. It follows the [RFC]: Hardware pluggable and [RFC]: Enhancing vLLM Plugin Architecture principles, providing a modular interface for Intel® Gaudi® hardware. For more information, see the Plugin System document.
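vLLM discovers hardware backends through Python entry points, so once the plugin package is installed, the Gaudi platform is selected automatically. As a minimal sketch (assuming Python 3.10+ and the `vllm.platform_plugins` entry-point group described in the hardware-pluggable RFC), you can list the platform plugins visible to vLLM:

```bash
# List registered vLLM platform plugins; after installing vllm-gaudi,
# a Gaudi entry should appear in the output
python -c "
from importlib.metadata import entry_points
for ep in entry_points(group='vllm.platform_plugins'):
    print(ep.name, '->', ep.value)
"
```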
- Set up your execution environment. Additionally, to achieve the best performance on HPU, follow the methods outlined in the Optimizing Training Platform Guide.
- Get the last verified vLLM commit. While the vLLM Hardware Plugin for Intel® Gaudi® follows the latest vLLM commits, upstream API updates may introduce compatibility issues. The saved commit has been thoroughly validated.

  ```bash
  git clone https://github.com/vllm-project/vllm-gaudi
  cd vllm-gaudi
  export VLLM_COMMIT_HASH=$(git show "origin/vllm/last-good-commit-for-vllm-gaudi:VLLM_STABLE_COMMIT" 2>/dev/null)
  cd ..
  ```
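  As a quick sanity check (a sketch, not part of the official steps), make sure the lookup actually produced a commit SHA; the `2>/dev/null` above hides failures, so an empty variable would otherwise go unnoticed:

  ```bash
  # Abort with a message if VLLM_COMMIT_HASH is empty or unset
  echo "${VLLM_COMMIT_HASH:?empty - verify that origin/vllm/last-good-commit-for-vllm-gaudi exists}"
  ```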
- Install vLLM using `pip` or build it from source:

  ```bash
  # Build vLLM from source for empty platform, reusing existing torch installation
  git clone https://github.com/vllm-project/vllm
  cd vllm
  git checkout $VLLM_COMMIT_HASH
  pip install -r <(sed '/^torch/d' requirements/build/cuda.txt)
  VLLM_TARGET_DEVICE=empty pip install --no-build-isolation -e .
  cd ..
  ```
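  To confirm the build is importable before moving on (an optional check, not part of the official steps):

  ```bash
  # The "empty" target builds vLLM without device-specific extensions,
  # so this import should succeed even before the Gaudi plugin is installed
  python -c "import vllm; print(vllm.__version__)"
  ```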
- Install vLLM Hardware Plugin for Intel® Gaudi® from source:

  ```bash
  cd vllm-gaudi
  pip install -e .
  cd ..
  ```
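  Optionally, verify the editable install (assuming the distribution is named `vllm-gaudi`):

  ```bash
  # Should report the package metadata and point at your local checkout
  pip show vllm-gaudi
  ```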
- Install torchaudio (required by some upstream vLLM models such as QWEN3_5). Use the CPU wheel with `--no-deps` to avoid pulling a conflicting CUDA torch:

  ```bash
  pip install --no-deps torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
  ```
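  Because `--no-deps` skips dependency resolution, it is worth confirming afterwards that the pre-installed torch build was left untouched (an optional check):

  ```bash
  # Both imports should succeed, and torch should still report
  # the version that was installed before the torchaudio step
  python -c "import torch, torchaudio; print(torch.__version__, torchaudio.__version__)"
  ```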
For all available installation methods, such as NIXL, see the Installation guide.
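Once everything is installed, a minimal offline-inference run can serve as a smoke test. The sketch below assumes a visible HPU device and access to the model weights; the model name is purely illustrative:

```bash
# Offline-inference smoke test (the model name is illustrative)
python -c "
from vllm import LLM, SamplingParams
llm = LLM(model='facebook/opt-125m')
outputs = llm.generate(['Hello, my name is'], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
"
```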
We welcome and value any contributions and collaborations.
- For technical questions and feature requests, please use GitHub Issues.

