Fixed Win11 install issues of compile flags and isfinite() crashing. #1977
original-doc wants to merge 4 commits into NVIDIA:master
"csrc/megatron/fused_weight_gradient_dense_cuda.cu",
"csrc/megatron/fused_weight_gradient_dense_16bit_prec_cuda.cu",
],
libraries=["cublas", "cublasLt"],
At a glance, this change doesn't look relevant. Why would we need this?
Hi @crcrpar,
Sorry for the late reply. This change is necessary to fix build failures on Windows.
Unlike on Linux, where linking against CUDA libraries can sometimes be handled implicitly or via shared object dependencies, the MSVC linker (link.exe) on Windows is strict and requires explicit linking of the import libraries (.lib) for any external functions used.
Without explicitly adding cublas and cublasLt to libraries, the build fails during the linking stage with LNK2001 (Unresolved External Symbol) errors because the linker cannot find the definitions for functions like cublasGemmEx or cublasLtMatmul.
mlp_cuda.obj : error LNK2001: unresolved external symbol cublasGetStream_v2
mlp_cuda.obj : error LNK2001: unresolved external symbol cublasGetMathMode
mlp_cuda.obj : error LNK2001: unresolved external symbol cublasGemmEx
mlp_cuda.obj : error LNK2001: unresolved external symbol cublasLtMatmul
mlp_cuda.obj : error LNK2001: unresolved external symbol cublasLtMatrixLayoutInit_internal
...
build\lib.win-amd64-cpython-310\mlp_cuda.cp310-win_amd64.pyd : fatal error LNK1120: 10 unresolved externals
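The explicit-linking requirement described above can be sketched as a small helper that builds the extra keyword arguments for an extension. This is a minimal sketch, not the code from this PR. The helper name windows_link_kwargs is hypothetical, applying the settings only on Windows is an assumption, and passing the define to both the cxx and nvcc argument lists is also an assumption. The library names and the -D_DISABLE_EXTENDED_ALIGNED_STORAGE define are the ones named in the PR's notes.

```python
import sys


def windows_link_kwargs(platform=sys.platform):
    """Return extra keyword arguments for a CUDA extension (hypothetical helper)."""
    if not platform.startswith("win"):
        # Assumption: other platforms need no extra settings from this helper.
        return {}
    define = "-D_DISABLE_EXTENDED_ALIGNED_STORAGE"
    return {
        # Import libraries the MSVC linker must be told about explicitly.
        "libraries": ["cublas", "cublasLt"],
        # Assumption: the define is passed to both the host and device compilers.
        "extra_compile_args": {"cxx": [define], "nvcc": [define]},
    }


if __name__ == "__main__":
    # Show what would be passed on a Windows build.
    print(windows_link_kwargs("win32"))
```

In a setup.py these arguments could be unpacked into an extension definition, for example CUDAExtension(name=..., sources=[...], **windows_link_kwargs()). An extension that already sets extra_compile_args would need the lists merged rather than replaced.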
I'm not quite sure if this file should be in the repo
Build Fixes for NVIDIA Apex on Windows 11 (CUDA 12.8 / MSVC 2022)

Installation Command
Make sure you run the commands below in the x64 Native Tools Command Prompt for VS 2022 (use Windows 11 search to find it). Before installing, make sure your environment has the necessary dependencies, such as PyTorch and ninja.

Troubleshooting
If you encounter trouble with compiled_autograd.h(1134 / 1108 / 1181), then based on PyTorch issue #148317 you may need to open \anaconda\envs\basic\lib\site-packages\torch\include\torch\csrc\dynamo\compiled_autograd.h, go to line 1134, and change it from:
to

Build Environment
E:\CUDA128
APEX_CPP_EXT=1, APEX_CUDA_EXT=1

NVCC Version Info

Summary of Changes
This patch addresses three primary categories of build failures encountered on Windows:

1. setup.py Configuration

Changes
Added libraries=["cublas", "cublasLt"] and extra_compile_args with -D_DISABLE_EXTENDED_ALIGNED_STORAGE to several CUDA extensions.

Affected Extensions
- mlp_cuda
- fused_dense_cuda
- fused_weight_gradient_mlp_cuda

Code Diff

Reasoning
- Linker errors (LNK2001): the MSVC linker does not automatically link cublas.lib and cublasLt.lib when these headers are used. Explicit linking resolves unresolved external symbols for cublasGemmEx, cublasLtMatmul, etc.
- Aligned storage: newer MSVC versions changed how std::aligned_storage works, causing standard-conformance errors with older CUDA headers. The flag _DISABLE_EXTENDED_ALIGNED_STORAGE restores the necessary behavior for compilation to succeed.

2. Source Code Fixes (csrc/)

A. Type Definition Fix (uint)
File: csrc/mlp_cuda.cu
Change: Replaced uint with unsigned int.
Reasoning: The type alias uint is commonly provided by Linux system headers but is not defined by default in the MSVC (Windows) environment. Using the standard C++ type unsigned int ensures cross-platform compatibility.

B. Device Function Compatibility (isfinite)
Files:
- csrc/multi_tensor_scale_kernel.cu
- csrc/multi_tensor_axpby_kernel.cu
Change: Replaced the isfinite() check with a robust floating-point check using fabsf. Affected variables include r_in[ii], r_x[ii] and r_y[ii].
Reasoning: On Windows NVCC, isfinite often resolves to the host-only C++ standard library function (std::isfinite) rather than the device intrinsic, causing a "calling a host function from a device function" error. Replacing it with fabsf (which is correctly mapped to a device intrinsic) bypasses this restriction while maintaining logical correctness.
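
The claim that an absolute-value check preserves the behavior of isfinite() can be sanity-checked on the host. The sketch below is in Python rather than CUDA, and the exact expression used in the patched kernels is not shown here. Comparing the absolute value against the largest finite value is an assumed form of such a check, and the name is_finite_via_abs is hypothetical.

```python
import math
import sys

# Largest finite double; stands in for the float maximum used on the device.
FLOAT_MAX = sys.float_info.max


def is_finite_via_abs(x):
    """Finite check built only from abs() and a comparison (hypothetical name)."""
    # NaN fails every ordered comparison and abs(inf) exceeds FLOAT_MAX,
    # so both cases yield False, matching what isfinite() reports.
    return abs(x) <= FLOAT_MAX


samples = [0.0, -1.5, FLOAT_MAX, -FLOAT_MAX, math.inf, -math.inf, math.nan]

# Compare the hand-written check against the library function on each sample.
agrees = all(is_finite_via_abs(x) == math.isfinite(x) for x in samples)
print(agrees)
```

In the kernels, the analogous expression would presumably use fabsf and the single-precision maximum, applied to r_in[ii], r_x[ii] and r_y[ii].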