Problem
_to_edge_and_lower_llama_xnnpack uses to_edge_transform_and_lower(). The generic _to_edge_and_lower_llama (CoreML/MPS/QNN/Vulkan) uses the deprecated export_to_edge() + to_backend() split. CoreMLPartitioner emits a deprecation warning about this on every invocation.
For LFM2.5 hybrid models the split path desynchronises subgraph output-node names from the parent program's buffers_to_mutate map (short-conv self.conv_state.copy_(...) decomposes to slice_copy + index_put, only one of which the partitioner records as the mutation source). The verifier then raises:
torch._export.verifier.SpecViolationError: Mutation node aten_index_put_default_N is neither a buffer nor a user input.
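To make the failure mode concrete, here is a minimal sketch of the kind of check that produces this message. It is a simplified stand-in, not the actual torch._export.verifier code: ToySignature, check_mutations, and the node names are hypothetical, and the real verifier works on an ExportGraphSignature rather than plain dicts. The assumption is that every mutation output node must appear as a key in buffers_to_mutate (or the user-input equivalent), so a renamed output node fails the lookup.

```python
from dataclasses import dataclass, field


class SpecViolationError(Exception):
    """Local stand-in for torch._export.verifier.SpecViolationError."""


@dataclass
class ToySignature:
    # Names of buffers registered on the parent program.
    buffers: set
    # Maps an output-node name to the buffer that node writes back to.
    buffers_to_mutate: dict
    # Maps an output-node name to the user input that node writes back to.
    user_inputs_to_mutate: dict = field(default_factory=dict)


def check_mutations(sig, mutation_output_nodes):
    """Reject any mutation output node that the signature does not know about."""
    for node in mutation_output_nodes:
        if node in sig.buffers_to_mutate:
            target = sig.buffers_to_mutate[node]
            if target not in sig.buffers:
                raise SpecViolationError(f"{target} is not a buffer.")
        elif node in sig.user_inputs_to_mutate:
            continue
        else:
            raise SpecViolationError(
                f"Mutation node {node} is neither a buffer nor a user input."
            )


# Hypothetical names: the signature records one node as the mutation source...
sig = ToySignature(
    buffers={"conv_state"},
    buffers_to_mutate={"recorded_mutation_node": "conv_state"},
)
check_mutations(sig, ["recorded_mutation_node"])  # names agree, no error

# ...but the lowered graph's output node carries a different name.
try:
    check_mutations(sig, ["aten_index_put_default_N"])
except SpecViolationError as err:
    print(err)
```

In this toy model, the error disappears as soon as the output-node name and the buffers_to_mutate key agree, which is the invariant the split path appears to break.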
Reproduce
git clone https://github.com/pytorch/executorch && cd executorch
./install_executorch.sh
source .venv/bin/activate
pip install coremltools
cat > examples/models/lfm2/config/lfm2_coreml.yaml <<'EOF'
base:
  metadata: '{"get_bos_id": 1, "get_eos_ids":[7]}'
model:
  use_kv_cache: True
  enable_dynamic_shape: False
  dtype_override: fp32
backend:
  coreml:
    enabled: True
    ios: 18
    enable_state: True
    preserve_sdpa: True
    compute_units: cpu_and_ne
EOF
python -m extension.llm.export.export_llm \
--config examples/models/lfm2/config/lfm2_coreml.yaml \
+base.model_class=lfm2_5_1_2b \
+base.params=examples/models/lfm2/config/lfm2_5_1_2b_config.json \
+export.max_seq_length=2048 \
+export.max_context_length=2048 \
+export.output_name=lfm2_coreml.pte
Suggested fix
Add a CoreML helper analogous to _to_edge_and_lower_llama_xnnpack, or short-circuit _to_edge_and_lower_llama when coreml=True:
if coreml:
    coreml_partitioner = get_coreml_partitioner(
        coreml_ios, embedding_quantize, pt2e_quantize,
        coreml_quantize, coreml_compute_units,
    )
    builder = builder_exported.pt2e_quantize(quantizers).to_edge_transform_and_lower(
        [coreml_partitioner]
    )
    return builder.to_executorch(passes=additional_passes)
The same migration likely applies to MPS, QNN, and Vulkan branches; only CoreML has been exercised here.