Version
1.0.0
Version
13.1
Which installation method(s) does this occur on?
Pip
Describe the bug.
Launching a kernel with an FP8 input tensor appears to leak PyTorch memory: the memory occupied by the input tensor is not released after the launch.
The issue has not been observed when the input tensor has any other data type.
Test Environment:
GPU: RTX 5090
CUDA Version: 13.1
cuTile Version: 1.0.0
Minimum reproducible example
Run the following script:
import cuda.tile as ct
import torch

@ct.kernel
def kernel(A: ct.Array):
    pass

for i in range(100000):
    x = torch.zeros(size=[4025, 5394], device="cuda", dtype=torch.float8_e4m3fn)
    ct.launch(torch.cuda.current_stream(), (1, 1, 1), kernel, (x, ))
Relevant log output
Traceback (most recent call last):
  File "/root/autodl-tmp/super.py", line 9, in <module>
    x = torch.zeros(size=[4025, 5394], device="cuda", dtype=torch.float8_e4m3fn)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 22.00 MiB. GPU 0 has a total capacity of 31.36 GiB of which 17.06 MiB is free. Including non-PyTorch memory, this process has 31.33 GiB memory in use. Of the allocated memory 28.96 GiB is allocated by PyTorch, and 1.81 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
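The numbers in this message are consistent with every iteration's tensor being retained, which the following back-of-the-envelope check illustrates. The rounding constants (512-byte block granularity, 2 MiB segment granularity for large allocations) are assumptions based on the default behaviour of PyTorch's CUDA caching allocator and may differ between versions.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3

# One FP8 element is one byte, so the tensor needs rows * cols bytes.
tensor_bytes = 4025 * 5394

# Assumed allocator rounding (PyTorch defaults, not taken from the report).
block_bytes = math.ceil(tensor_bytes / 512) * 512                # per-block size
segment_bytes = math.ceil(block_bytes / (2 * MIB)) * (2 * MIB)   # per cudaMalloc request

# "28.96 GiB is allocated by PyTorch" in the error message.
allocated_bytes = 28.96 * GIB
leaked_tensors = int(allocated_bytes // block_bytes)

# Unused tail of each segment, summed over all retained tensors.
slack_gib = leaked_tensors * (segment_bytes - block_bytes) / GIB

print(f"tensor size:     {tensor_bytes / MIB:.2f} MiB")
print(f"segment request: {segment_bytes / MIB:.2f} MiB")
print(f"tensors held:    {leaked_tensors}")
print(f"reserved slack:  {slack_gib:.2f} GiB")
```

Under these assumptions the 22.00 MiB request matches one rounded-up tensor, about 1432 tensors are still alive when the allocation fails, and the leftover tails add up to roughly the 1.81 GiB reported as reserved but unallocated. So the loop appears to exhaust memory after about 1400 iterations, not after fragmentation builds up over a long run.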
Full env printout
Other/Misc.
No response
Contributing Guidelines