
Add 64-bit indexing fallback for large multi_tensor_l2norm kernels #1989

Open
SongXiaoXi wants to merge 3 commits into NVIDIA:master from SongXiaoXi:master

Conversation

@SongXiaoXi

Summary

This PR adds a 64-bit indexing fallback for the multi_tensor_l2norm kernel family when any input tensor has numel() above INT_MAX.

The existing int32 fast path is preserved for normal tensor sizes, while large tensors are dispatched to an int64-indexed path.

Problem

Apex's multi-tensor metadata stores tensor sizes in int64, but the l2norm family still narrows sizes and chunk indexing to int32 inside the device functors.

For tensors larger than INT_MAX elements, this can produce incorrect norm results and may also lead to out-of-bounds accesses once chunk offsets overflow 32-bit indexing.

Fix

  • add a shared helper to detect when tensor lists require 64-bit indexing
  • template the l2norm family functors on index type
  • dispatch to int64 indexing only for large tensors
  • preserve the existing int32 fast path for the common case

Affected ops

  • multi_tensor_l2norm
  • multi_tensor_l2norm_mp
  • multi_tensor_l2norm_scale

Testing

Added large-tensor regression tests covering:

  • multi_tensor_l2norm
  • multi_tensor_l2norm_mp
  • multi_tensor_l2norm_scale

The new tests verify correctness for tensors larger than INT_MAX elements while keeping the existing small/normal tensor path unchanged.

Comment thread tests/L0/run_optimizers/test_large_tensor_l2norm.py
Copilot AI (Contributor) left a comment


Pull request overview

Adds a safe 64-bit indexing fallback for Apex’s multi-tensor L2-norm CUDA kernels when tensor sizes exceed INT_MAX, preventing incorrect results and potential OOB accesses while preserving the existing int32 fast path for typical tensor sizes.

Changes:

  • Introduces a shared host-side helper to detect when any tensor list requires 64-bit indexing.
  • Templates the L2-norm kernel functors on an index_t and dispatches to int64_t only when needed.
  • Adds large-tensor regression tests covering multi_tensor_l2norm, multi_tensor_l2norm_mp, and multi_tensor_l2norm_scale.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:

  • tests/L0/run_optimizers/test_large_tensor_l2norm.py: Adds regression tests for tensors with numel() > INT_MAX across the L2-norm kernel family.
  • csrc/multi_tensor_l2norm_scale_kernel.cu: Adds an index_t-templated functor plus runtime dispatch to int64 indexing for large tensors.
  • csrc/multi_tensor_l2norm_kernel_mp.cu: Adds an index_t-templated functor plus runtime dispatch to int64 indexing for large tensors.
  • csrc/multi_tensor_l2norm_kernel.cu: Adds index_t-templated functors plus runtime dispatch (including the unscale and norm_out paths).
  • csrc/multi_tensor_apply.cuh: Adds the tensor_lists_require_64bit_indexing(...) helper used by the updated kernels.


Comment thread tests/L0/run_optimizers/test_large_tensor_l2norm.py Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>


3 participants