Describe the bug
The SafeTensors loader reads data_offsets from JSON metadata but does not validate the entry count, ordering, or consistency with the declared tensor shape and dtype. Either through malice or incompetence, a user can declare a large shape (e.g., 1000×1000 float32 = 4 MB) while specifying data_offsets that cover only 4 bytes of actual data. When the tensor is evaluated, the loader reads far beyond the provided data, producing out-of-bounds memory access.
To Reproduce
Include code snippet
import mlx.core as mx

arrays, _ = mx.load("crash_v3006.safetensors", return_metadata=True)
t = arrays["t"]
mx.eval(t)
print(t.shape)   # [1000, 1000]
print(t.nbytes)  # 4000000
print(t[0, :10]) # whatever was in memory past the file
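A file with the same header/payload mismatch can be built by hand, which may be easier to inspect than the gzipped attachment (a sketch, not a byte-for-byte copy of crash_v3006.safetensors; the output filename and the 4-byte payload are illustrative):

```python
# Build a safetensors file whose header declares a 1000x1000 float32
# tensor (4,000,000 bytes) while data_offsets cover only 4 bytes.
import json
import struct

header = {
    "t": {
        "dtype": "F32",
        "shape": [1000, 1000],
        "data_offsets": [0, 4],  # only 4 bytes of actual payload
    }
}
header_bytes = json.dumps(header).encode("utf-8")

with open("crash.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte LE header size
    f.write(header_bytes)                          # JSON header
    f.write(b"\x00\x00\x00\x00")                   # the 4 payload bytes
```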
Expected behavior
We are fetching 4000 bytes but passed in only 82 bytes. On macOS the OS fills the out-of-bounds read with 0s, but I don't think that is guaranteed in any way. Best practice would be to throw an error whenever a read falls outside the provided data.
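One possible shape of the missing check, done purely at the header level before any data is read (a sketch only, not MLX's internals; `validate_safetensors_header` and `DTYPE_SIZES` are hypothetical names):

```python
import json
import struct

# Byte sizes for the safetensors dtype strings (subset, for illustration).
DTYPE_SIZES = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2,
               "I64": 8, "I32": 4, "I16": 2, "I8": 1, "U8": 1, "BOOL": 1}

def validate_safetensors_header(path):
    """Raise ValueError if any tensor's data_offsets disagree with its
    declared shape/dtype or fall outside the file's data section."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        f.seek(0, 2)
        data_len = f.tell() - 8 - header_len  # bytes after the JSON header

    for name, info in header.items():
        if name == "__metadata__":
            continue
        start, end = info["data_offsets"]
        expected = DTYPE_SIZES[info["dtype"]]
        for dim in info["shape"]:
            expected *= dim
        if not (0 <= start <= end <= data_len):
            raise ValueError(f"{name}: data_offsets [{start}, {end}) "
                             f"outside data section of {data_len} bytes")
        if end - start != expected:
            raise ValueError(f"{name}: data_offsets cover {end - start} bytes "
                             f"but shape/dtype require {expected}")
```

A full check would also verify that entries are non-overlapping and contiguous, but even this per-entry size comparison would turn the out-of-bounds read above into a clean error.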
Desktop (please complete the following information):
- OS Version: macOS 26.3.1 (a) on M2 16GB
crash_v3006.safetensors.gz