[BUG] SafeTensors data_offsets Not Validated #3363

@MillaFleurs

Description

Describe the bug

The SafeTensors loader reads data_offsets from the JSON metadata but does not validate the entry count, ordering, or consistency with the declared tensor shape and dtype. Whether through malice or error, a user can declare a large shape (e.g., 1000×1000 float32 = 4 MB) while specifying data_offsets that cover only 4 bytes of actual data. When the tensor is evaluated, the loader reads far past the end of the provided data, producing an out-of-bounds memory access.
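As a sketch of the missing check, the snippet below parses a safetensors blob and verifies each tensor's data_offsets against its declared shape and dtype before any data is read. The function name and the partial dtype table are illustrative, not part of MLX's API; the actual fix would live in the C++ loader.

```python
import json
import struct

# Bytes per element for the dtypes this sketch handles; the safetensors
# format defines more (BF16, F64, I8, ...) -- this table is deliberately partial.
DTYPE_SIZES = {"F32": 4, "F16": 2, "I32": 4, "I64": 8, "U8": 1, "BOOL": 1}

def validate_safetensors_header(blob: bytes) -> dict:
    """Check every tensor's data_offsets against its shape/dtype.

    Raises ValueError on any inconsistency; returns the parsed header
    if the metadata is self-consistent.
    """
    if len(blob) < 8:
        raise ValueError("file too small for the 8-byte header length")
    (header_len,) = struct.unpack("<Q", blob[:8])  # little-endian u64
    if 8 + header_len > len(blob):
        raise ValueError("declared header length exceeds file size")
    header = json.loads(blob[8 : 8 + header_len])
    data_len = len(blob) - 8 - header_len  # bytes actually available
    for name, info in header.items():
        if name == "__metadata__":
            continue
        begin, end = info["data_offsets"]
        if not (0 <= begin <= end <= data_len):
            raise ValueError(f"{name}: data_offsets out of bounds")
        expected = DTYPE_SIZES[info["dtype"]]
        for dim in info["shape"]:
            expected *= dim
        if end - begin != expected:
            raise ValueError(
                f"{name}: data_offsets span {end - begin} bytes, "
                f"but shape/dtype require {expected}"
            )
    return header
```

With this check in place, a file declaring a 1000×1000 F32 tensor backed by a 4-byte span would be rejected at load time instead of triggering an out-of-bounds read at evaluation.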

To Reproduce

Include code snippet

import mlx.core as mx

arrays, _ = mx.load("crash_v3006.safetensors", return_metadata=True)
t = arrays["t"]
mx.eval(t)        # forces the out-of-bounds read
print(t.shape)    # [1000, 1000]
print(t.nbytes)   # 4000000
print(t[0, :10])  # whatever was in memory past the end of the file
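For reference, a file with the same mismatch can be constructed in a few lines. This is a hypothetical reconstruction of the attached reproducer (the actual crash_v3006.safetensors attachment may differ in detail): the header declares a 1000×1000 F32 tensor (4,000,000 bytes) while data_offsets and the data section cover only 4 bytes.

```python
import json
import struct

# Header declares far more data than the file contains.
header = {"t": {"dtype": "F32", "shape": [1000, 1000], "data_offsets": [0, 4]}}
header_json = json.dumps(header).encode()

with open("crash_v3006.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(header_json)))  # little-endian u64 header size
    f.write(header_json)                           # JSON metadata
    f.write(b"\x00" * 4)                           # only 4 bytes of tensor data
```

Loading this file with the snippet above reproduces the out-of-bounds read.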

Expected behavior
The tensor requires 4,000,000 bytes, but the file supplies only 82 bytes. On macOS the OS fills the out-of-bounds read with zeros, but I don't think that is in any way guaranteed. Best practice would be to throw an error whenever data_offsets do not cover the declared tensor size, instead of reading outside the array.

Desktop (please complete the following information):

  • OS Version: macOS 26.3.1 (a) on M2 16GB

crash_v3006.safetensors.gz
