Fixed Win11 install issues of compile flags and isfnite() crashing. by original-doc · Pull Request #1977 · NVIDIA/apex

original-doc · 2026-01-18T07:50:39Z

Build Fixes for NVIDIA Apex on Windows 11 (CUDA 12.8 / MSVC 2022)

Installation Command

Run the commands below in the x64 Native Tools Command Prompt for VS 2022 (use Windows 11 search to find it). Before installing, make sure your environment has the necessary dependencies, such as PyTorch and ninja.

git clone https://github.com/NVIDIA/apex.git
cd apex
set APEX_CPP_EXT=1
set APEX_CUDA_EXT=1
set DISTUTILS_USE_SDK=1
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation ./
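
Before running the commands above, it can help to confirm that the prerequisites are in place. The following is a minimal Python sketch, not part of the PR. The helper name `preflight` is hypothetical. It only checks that PyTorch and ninja are discoverable and that the three variables set above have the value 1.

```python
import importlib.util
import os
import shutil


def preflight(env=os.environ):
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []
    # PyTorch must be importable because the build uses --no-build-isolation.
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not importable in this environment")
    # ninja may be installed as an executable on PATH or as a Python package.
    if shutil.which("ninja") is None and importlib.util.find_spec("ninja") is None:
        problems.append("ninja was not found on PATH or as a Python package")
    # These are the variables set in the installation commands above.
    for name in ("APEX_CPP_EXT", "APEX_CUDA_EXT", "DISTUTILS_USE_SDK"):
        if env.get(name) != "1":
            problems.append(f"{name} is not set to 1")
    return problems


for problem in preflight():
    print("WARNING:", problem)
```

Run it from the same command prompt session in which the `set` commands were issued, since those variables are not persisted across sessions.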

Troubleshooting

If the build fails in compiled_autograd.h (at line 1134, 1108, or 1181), PyTorch issue #148317 suggests editing the header in your environment, for example \anaconda\envs\basic\lib\site-packages\torch\include\torch\csrc\dynamo\compiled_autograd.h. Go to the reported line (1134 in this setup) and change it from:

} else if constexpr (::std::is_same_v<T, ::std::string>) {
  return at::StringType::get();

to

// } else if constexpr (::std::is_same_v<T, ::std::string>) {
//   return at::StringType::get();

Note: Building NVIDIA Apex on Windows is challenging, and you may hit different errors on different machines. This guide documents a successful build on Windows 11 with an RTX 5070 (sm_120) and CUDA 12.8.

Build Environment

Component	Version
OS	Windows 11
CUDA Toolkit	12.8 (Blackwell / SM_100 / SM_120)
CUDA Path	`E:\CUDA128`
Compiler	MSVC 2022 (Visual Studio Build Tools)
Python	3.10
PyTorch	2.9.1+cu128
Build Flags	`APEX_CPP_EXT=1`, `APEX_CUDA_EXT=1`

NVCC Version Info

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Jan_15_19:38:46_Pacific_Standard_Time_2025
Cuda compilation tools, release 12.8, V12.8.61
Build cuda_12.8.r12.8/compiler.35404655_0

Summary of Changes

This patch addresses four categories of build failures encountered on Windows:

Standard type definitions (uint)
MSVC-specific compiler flags for memory alignment
Explicit library linking for cuBLAS
Device function compatibility (isfinite)

1. `setup.py` Configuration

Changes

Added libraries=["cublas", "cublasLt"] and extra_compile_args with -D_DISABLE_EXTENDED_ALIGNED_STORAGE to several CUDA extensions.

Affected Extensions

mlp_cuda
fused_dense_cuda
fused_weight_gradient_mlp_cuda
(And potentially others using cuBLAS or aligned storage)

Code Diff

ext_modules.append(
    CUDAExtension(
        name="module_name",
        sources=["..."],
        # Fix 1: Explicitly link cuBLAS for Windows
        libraries=["cublas", "cublasLt"], 
        extra_compile_args={
            # Fix 2: Disable extended aligned storage to fix static assertion errors on VS 2017 15.8+
            "cxx": ["-O3", "-D_DISABLE_EXTENDED_ALIGNED_STORAGE"],
            "nvcc": ["-O3", "-D_DISABLE_EXTENDED_ALIGNED_STORAGE", ...],
        },
    )
)

Reasoning

Issue	Explanation
Linker Errors (`LNK2001`)	Unlike Linux, the Windows build environment does not automatically link `cublas.lib` and `cublasLt.lib` when these headers are used. Explicit linking resolves unresolved external symbols for `cublasGemmEx`, `cublasLtMatmul`, etc.
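Alignment Errors	Visual Studio 2017 (15.8 update) and later changed `std::aligned_storage` to honor extended alignments (greater than `alignof(max_align_t)`). Because this changes type layout, the MSVC standard library raises a static assertion unless the code explicitly opts in or out. Defining `_DISABLE_EXTENDED_ALIGNED_STORAGE` silences the assertion and keeps the older layout, which lets compilation succeed.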
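
Since both fixes target MSVC only, one way to keep other platforms unaffected is to apply them conditionally. The following is a minimal sketch under that assumption. The helper `windows_build_kwargs` and the constant `MSVC_ALIGN_FLAG` are hypothetical names, not part of the PR or of Apex's setup.py, and the description does not state whether the PR itself gates these settings by platform.

```python
import sys

# Hypothetical constant: the MSVC-only define described in the table above.
MSVC_ALIGN_FLAG = "-D_DISABLE_EXTENDED_ALIGNED_STORAGE"


def windows_build_kwargs(cxx_flags, nvcc_flags, platform=sys.platform):
    """Build keyword arguments for an extension, adding the Windows fixes only on win32."""
    cxx = list(cxx_flags)    # copy so the caller's lists are not mutated
    nvcc = list(nvcc_flags)
    kwargs = {"extra_compile_args": {"cxx": cxx, "nvcc": nvcc}}
    if platform == "win32":
        # Fix 1: link the cuBLAS import libraries explicitly.
        kwargs["libraries"] = ["cublas", "cublasLt"]
        # Fix 2: opt out of extended aligned storage for both compilers.
        cxx.append(MSVC_ALIGN_FLAG)
        nvcc.append(MSVC_ALIGN_FLAG)
    return kwargs


print(windows_build_kwargs(["-O3"], ["-O3"], platform="win32"))
print(windows_build_kwargs(["-O3"], ["-O3"], platform="linux"))
```

The result could then be unpacked into the extension definition, e.g. `CUDAExtension(name="module_name", sources=["..."], **windows_build_kwargs(["-O3"], ["-O3"]))`. On non-Windows platforms no extra libraries or defines are added.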

2. Source Code Fixes (`csrc/`)

A. Type Definition Fix (`uint`)

File: csrc/mlp_cuda.cu

Change: Replaced uint with unsigned int.

Reasoning: The type alias uint is provided by Linux system headers (for example glibc's sys/types.h) but is not part of standard C++ and is not defined by default in the MSVC (Windows) environment. Using the standard C++ type unsigned int ensures cross-platform compatibility.

B. Device Function Compatibility (`isfinite`)

Files:

csrc/multi_tensor_scale_kernel.cu
csrc/multi_tensor_axpby_kernel.cu

Change: Replaced the isfinite() check with an explicit range check using fabsf. The affected variables include r_in[ii], r_x[ii] and r_y[ii].

// Before
finite = finite && (isfinite(r_in[ii])); ...

// After
finite = finite && (fabsf((float)r_in[ii]) <= 3.40282e+38f); ... 
// Checks if value is within finite float range

Reasoning: With NVCC on Windows, isfinite often resolves to the host-only C++ standard library function (std::isfinite) rather than the device intrinsic, causing a "calling a host function from a device function" error. Replacing it with fabsf (which is correctly mapped to a device intrinsic) bypasses this restriction. The comparison still rejects NaN (any comparison with NaN is false) and infinity (which exceeds the threshold).
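
As a sanity check on the threshold comparison, the following Python sketch (not from the PR) emulates single-precision rounding with `struct` and mimics the patched check on the host. The names `to_float32`, `FLT_MAX`, `THRESHOLD` and `finite_by_threshold` are illustrative. It shows that NaN and infinity are rejected, and also that the literal 3.40282e+38f rounds to a value slightly below the largest finite float, so a small band of very large finite values would be reported as non-finite unless the exact maximum is used as the limit.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


FLT_MAX = to_float32(3.4028234663852886e38)  # largest finite float32
THRESHOLD = to_float32(3.40282e38)           # literal used in the patched kernels


def finite_by_threshold(x, limit=THRESHOLD):
    """Mimic the patched check fabsf(x) <= limit; NaN compares false, so it is rejected."""
    return abs(x) <= limit


print("threshold below FLT_MAX:", THRESHOLD < FLT_MAX)
print("NaN accepted:", finite_by_threshold(float("nan")))
print("inf accepted:", finite_by_threshold(float("inf")))
print("FLT_MAX accepted with literal:", finite_by_threshold(FLT_MAX))
print("FLT_MAX accepted with exact limit:", finite_by_threshold(FLT_MAX, limit=FLT_MAX))
```

For gradient-scaling overflow checks this edge band is unlikely to matter in practice, but comparing against the exact single-precision maximum would make the check equivalent to isfinite for float inputs.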


crcrpar · 2026-01-21T04:00:49Z

                "csrc/megatron/fused_weight_gradient_dense_cuda.cu",
                "csrc/megatron/fused_weight_gradient_dense_16bit_prec_cuda.cu",
            ],
+            libraries=["cublas", "cublasLt"],


At a glance this change doesn't look relevant. Why do we need it?

original-doc · 2026-02-13

Hi @crcrpar,

Sorry for the late reply. This change is necessary to fix build failures on Windows.

Unlike on Linux, where linking against CUDA libraries can sometimes be handled implicitly or via shared object dependencies, the MSVC linker (link.exe) on Windows is strict and requires explicit linking of the import libraries (.lib) for any external functions used.

Without explicitly adding cublas and cublasLt to libraries, the build fails during the linking stage with LNK2001 (Unresolved External Symbol) errors because the linker cannot find the definitions for functions like cublasGemmEx or cublasLtMatmul.

mlp_cuda.obj : error LNK2001: unresolved external symbol cublasGetStream_v2
mlp_cuda.obj : error LNK2001: unresolved external symbol cublasGetMathMode
mlp_cuda.obj : error LNK2001: unresolved external symbol cublasGemmEx
mlp_cuda.obj : error LNK2001: unresolved external symbol cublasLtMatmul
mlp_cuda.obj : error LNK2001: unresolved external symbol cublasLtMatrixLayoutInit_internal
...
build\lib.win-amd64-cpython-310\mlp_cuda.cp310-win_amd64.pyd : fatal error LNK1120: 10 unresolved externals

crcrpar · 2026-01-21T04:02:01Z

I'm not quite sure if this file should be in the repo

original-doc and others added 4 commits January 17, 2026 20:59

2f2e78a win11 adapted
800bf1a win11 adapted
ce883ae win11 adapted
4f5f7b0 [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)

original-doc wants to merge 4 commits into NVIDIA:master from original-doc:master
