Problem
After applying the workaround in #19634, an LFM2.5 1.2B CoreML PTE loads via executorch.runtime.Runtime.load_program(...) and all metadata methods (get_eos_ids, get_max_seq_len, use_kv_cache, …) succeed. prog.load_method("forward") then fails:
[ETCoreMLModelManager.mm:495] Successfully got compiled model ...
[ETCoreMLModelAnalyzer.mm:68] [Core ML] Failed to create model profiler.
Failed to build the model execution plan using a model architecture file '.../model.mil'
[coreml_backend_delegate.mm:324] CoreMLBackend: Failed to init the model.
[method.cpp:114] Init failed for backend CoreMLBackend: 0x23
E5RT encountered an STL exception. msg = MILCompilerForANE error:
failed to compile ANE model using ANEF. Error=_ANECompiler : ANECCompile() FAILED.
MIL → mlmodelc compilation succeeds; the ANE-specific execution-plan build fails. Reproduces via executorch.runtime on macOS and on iPhone 17 Pro / iOS 26.4.2 (surfaces in react-native-executorch as code: 35 "Failed to load LLM runner"), so it's a CoreML/ANE-side issue rather than a runtime one. compute_units: cpu_only and cpu_and_gpu succeed, but XNNPACK already covers the CPU case at higher throughput — the value of CoreML for this model is the ANE.
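When comparing runs across compute units, the distinction drawn above (MIL compilation succeeds, the ANE plan build fails) can be checked mechanically against a captured log. The following is a minimal sketch, assuming the delegate log has been saved as text. The stage labels and the helper names `triage` and `looks_ane_specific` are invented for this example, and the patterns are copied from the log lines quoted in this report rather than from any documented CoreML log format.

```python
import re

# Patterns copied from the log in this report. The labels on the left are
# informal names chosen for this sketch, not CoreML or ExecuTorch terms.
STAGE_MARKERS = {
    "mil_compile_ok": re.compile(r"Successfully got compiled model"),
    "execution_plan_failed": re.compile(r"Failed to build the model execution plan"),
    "ane_compile_failed": re.compile(r"ANECCompile\(\) FAILED"),
    "backend_init_failed": re.compile(r"Init failed for backend \w+: 0x[0-9a-fA-F]+"),
}


def triage(log_text):
    """Return the set of stage labels whose marker appears in log_text."""
    return {
        label
        for label, pattern in STAGE_MARKERS.items()
        if pattern.search(log_text)
    }


def looks_ane_specific(stages):
    """True when MIL compilation succeeded but the ANE compile step failed."""
    return "mil_compile_ok" in stages and "ane_compile_failed" in stages


SAMPLE_LOG = """\
[ETCoreMLModelManager.mm:495] Successfully got compiled model ...
[ETCoreMLModelAnalyzer.mm:68] [Core ML] Failed to create model profiler.
Failed to build the model execution plan using a model architecture file '.../model.mil'
[coreml_backend_delegate.mm:324] CoreMLBackend: Failed to init the model.
[method.cpp:114] Init failed for backend CoreMLBackend: 0x23
E5RT encountered an STL exception. msg = MILCompilerForANE error:
failed to compile ANE model using ANEF. Error=_ANECompiler : ANECCompile() FAILED.
"""

if __name__ == "__main__":
    found = triage(SAMPLE_LOG)
    print(sorted(found))
    print("ANE-specific failure:", looks_ane_specific(found))
```

Running the same check on logs from `cpu_only` and `cpu_and_gpu` runs gives a quick way to confirm that only the `cpu_and_ne` run hits the ANE compile step.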
Reproduces with two different quantisation modes, ruling out a quantiser-specific cause:
- unquantised fp16 (no quantization: block)
- weight-only 4-bit via the documented torchao quantize_ path (qmode: 4w, see docs/source/backends/coreml/coreml-quantization.md)
So the failure is in lowering LFM2's short-conv conv_state mutation (self.conv_state.copy_(new_state) in examples/models/lfm2/short_conv.py, which decomposes to slice_copy + index_put) to an ANE-compatible MIL representation. The same model graph works on cpu_and_gpu.
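The compute-unit and quantisation combinations described above can be enumerated as one config per combination, mirroring the config under Reproduce. This is only a sketch under stated assumptions: the output file names are hypothetical, the YAML nesting follows the sections implied by the `+base.…` and `+export.…` overrides, and `dtype_override: fp16` for the unquantised variant is a guess, since the report only shows `fp32` together with `qmode: 4w`.

```python
from itertools import product

# Compute units named in this report.
COMPUTE_UNITS = ["cpu_only", "cpu_and_gpu", "cpu_and_ne"]

# Two variants from the report. The fp16 dtype_override is an assumption;
# the report only shows dtype_override: fp32 alongside qmode: 4w.
VARIANTS = {
    "fp16": {"dtype": "fp16", "quant": ""},  # no quantization: block
    "4w": {
        "dtype": "fp32",
        "quant": "quantization:\n  qmode: 4w\n  group_size: 32\n",
    },
}

# %-formatting is used so the JSON braces in the metadata line stay literal.
TEMPLATE = """\
base:
  metadata: '{"get_bos_id": 1, "get_eos_ids":[7]}'
model:
  use_kv_cache: True
  enable_dynamic_shape: False
  dtype_override: %(dtype)s
%(quant)sbackend:
  coreml:
    enabled: True
    ios: 18
    enable_state: True
    preserve_sdpa: True
    compute_units: %(cu)s
"""


def render_configs():
    """Map a hypothetical file name to YAML text for every combination."""
    configs = {}
    for (name, variant), cu in product(VARIANTS.items(), COMPUTE_UNITS):
        text = TEMPLATE % {
            "dtype": variant["dtype"],
            "quant": variant["quant"],
            "cu": cu,
        }
        configs["lfm2_coreml_%s_%s.yaml" % (name, cu)] = text
    return configs


if __name__ == "__main__":
    for filename in render_configs():
        print(filename)
```

Each rendered text can be written to disk and passed to `extension.llm.export.export_llm` through `--config`, as in the Reproduce steps, to check which combinations fail in `load_method('forward')`.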
Reproduce
# Apply workaround from #19634 first.
cat > examples/models/lfm2/config/lfm2_coreml_4w.yaml <<'EOF'
base:
  metadata: '{"get_bos_id": 1, "get_eos_ids":[7]}'
model:
  use_kv_cache: True
  enable_dynamic_shape: False
  dtype_override: fp32
quantization:
  qmode: 4w
  group_size: 32
backend:
  coreml:
    enabled: True
    ios: 18
    enable_state: True
    preserve_sdpa: True
    compute_units: cpu_and_ne
EOF
python -m extension.llm.export.export_llm \
--config examples/models/lfm2/config/lfm2_coreml_4w.yaml \
+base.model_class=lfm2_5_1_2b \
+base.params=examples/models/lfm2/config/lfm2_5_1_2b_config.json \
+export.max_seq_length=2048 \
+export.max_context_length=2048 \
+export.output_name=lfm2_coreml_4w.pte
python -c "
from executorch.runtime import Runtime
prog = Runtime.get().load_program('lfm2_coreml_4w.pte')
prog.load_method('get_eos_ids') # OK
prog.load_method('forward') # fails with backend init 0x23 + ANECCompile FAILED
"
Asks
- Is the short-conv .copy_(...) mutation pattern expected to lower to ANE-compatible MIL? If not, what's the recommended rewrite of examples/models/lfm2/short_conv.py to produce an ANE-friendly graph?
- Is there a documented way to identify, before compilation, which ops in a model will block ANE compilation?
cc @kimishpatel @YifanShenSZ @cymbalrush @metascroy