Feat: AICPU launch via dispatcher upload + Mode A type 2 #537

Open
puddingfjz wants to merge 1 commit into hw-native-sys:main from puddingfjz:feat/issue-356-aicpu-launch-new-interface
@puddingfjz puddingfjz commented Apr 13, 2026

Summary

Two-phase architecture for loading AICPU kernels on CANN 9.0+ without tar.gz / sudo pre-deployment, and without per-task indirection through the dispatcher SO.

Bootstrap (per-DeviceRunner, idempotent across instances in a process)

Host bundles dispatcher SO bytes + runtime SO bytes into a single rtAicpuKernelLaunchExWithArgs (kernel_type = KERNEL_TYPE_AICPU_KFC) targeting CANN's preinstalled libaicpu_extend_kernels.so. libaicpu_extend_kernels dlopens our dispatcher and invokes its Init; the dispatcher reads the runtime SO bytes from extended DeviceArgs (new inner_so_bin / inner_so_len fields at offsets 120/128, which libaicpu_extend_kernels ignores) and writes them to:

/usr/lib64/aicpu_kernels/0/aicpu_kernels_device/simpler_inner_<fp>.so

…using the sched-thread (HwHiAiUser) write permission. The dispatcher SO itself is never written to the preinstall directory.
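A minimal sketch of the extended DeviceArgs layout described above. Only the two offsets (120 and 128) come from this PR; the struct name, the opaque 120-byte prefix, and the 8-byte field widths are assumptions for illustration, not the actual definition in the tree.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the extended DeviceArgs. The leading 120 bytes
// represent whatever fields libaicpu_extend_kernels already reads; they are
// kept opaque here because this PR does not list them.
struct DeviceArgsSketch {
    std::uint8_t legacy_fields[120];  // assumed: existing fields, untouched
    std::uint64_t inner_so_bin;       // assumed 8 bytes: address of runtime SO bytes
    std::uint64_t inner_so_len;       // assumed 8 bytes: runtime SO length
};

// Compile-time guards so an accidental layout change fails the build
// instead of silently shifting the fields the dispatcher reads.
static_assert(offsetof(DeviceArgsSketch, inner_so_bin) == 120,
              "inner_so_bin must sit at offset 120");
static_assert(offsetof(DeviceArgsSketch, inner_so_len) == 128,
              "inner_so_len must sit at offset 128");

// The dispatcher only has work to do when both fields are populated.
inline bool has_inner_so(const DeviceArgsSketch& args) {
    return args.inner_so_bin != 0 && args.inner_so_len != 0;
}
```

Appending the new fields after the existing ones is what lets a consumer that only knows the old layout ignore them.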

The runtime SO basename embeds an FNV-1a content fingerprint, so two host processes uploading the same runtime SO produce the same file. Writes go via atomic tmp+rename — no truncation window visible to concurrent aicpu_scheduler readers. A process-level fingerprint cache in LoadAicpuOp skips redundant libaicpu_extend_kernels invocations within a single host process — each runtime is bootstrapped at most once per process.
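The fingerprinted basename and the tmp+rename write can be sketched as follows. This is an illustration under stated assumptions: the PR does not say whether the FNV-1a variant is 32-bit or 64-bit or how the fingerprint is formatted, so the 64-bit hash, the 16-digit hex encoding, and the helper names `fnv1a64`, `inner_so_basename`, and `publish_atomically` are all hypothetical.

```cpp
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// 64-bit FNV-1a over a byte buffer (width is an assumption, see above).
inline std::uint64_t fnv1a64(const std::vector<std::uint8_t>& bytes) {
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
    for (std::uint8_t b : bytes) {
        h ^= b;
        h *= 0x100000001b3ULL;  // FNV-1a 64-bit prime
    }
    return h;
}

// Same bytes give the same name, so repeated uploads target one file.
inline std::string inner_so_basename(std::uint64_t fp) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "simpler_inner_%016llx.so",
                  static_cast<unsigned long long>(fp));
    return buf;
}

// Write to a temporary sibling, then rename over the final name. On POSIX a
// rename within one filesystem is atomic, so a reader sees either no file or
// the complete file, never a truncated one.
inline std::filesystem::path publish_atomically(
        const std::filesystem::path& dir,
        const std::vector<std::uint8_t>& bytes) {
    const std::filesystem::path final_path =
        dir / inner_so_basename(fnv1a64(bytes));
    std::filesystem::path tmp_path = final_path;
    tmp_path += ".tmp";  // a real writer would want a per-writer unique suffix
    {
        std::ofstream out(tmp_path, std::ios::binary | std::ios::trunc);
        out.write(reinterpret_cast<const char*>(bytes.data()),
                  static_cast<std::streamsize>(bytes.size()));
        if (!out) throw std::runtime_error("failed to write temporary file");
    }  // stream is closed before the rename
    std::filesystem::rename(tmp_path, final_path);
    return final_path;
}
```

Because the name is derived from the content, a second writer with identical bytes replaces the file with an identical copy, which is what makes the write idempotent.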

Per-task launches (direct Mode A type 2, no dispatcher hop)

Host calls rtAicpuKernelLaunchExWithArgs with kernel_type = KERNEL_TYPE_AICPU, so_name = "simpler_inner_<fp>.so", kernel_name = "simpler_aicpu_init" / "simpler_aicpu_exec". The main aicpu_scheduler dlopens the preinstall file on first invocation and caches the handle; subsequent launches reuse it.

No JSON descriptors, no rtsBinaryLoadFromFile / rtsFuncGetByName lifecycle, no global op registry, no per-launch handle bookkeeping.

Cleanup

  • Removes BUILD_WITH_NEW_CANN CMake option and all ifdef branches.
  • Deletes the legacy AicpuLoader stub (src/{a2a3,a5}/platform/onboard/host/aicpu_loader.{cpp,h}) — its only role was the OFF-path fallback and nothing tested that path.
  • Skips so_info_ allocation on the new path (the runtime SO no longer reads device_args.aicpu_so_bin / aicpu_so_len). This saves roughly one inner SO's worth of device memory per DeviceRunner; previously this memory accumulated across many ChipWorker/DeviceRunner instances and triggered AICORE OOM in long test sessions.
  • Widens the aicpu_op_timeout regression test to accept the new error codes surfaced by Mode A type 2 (the dispatcher / main aicpu_scheduler path can race the STARS watchdog and return 507018/507000 before the AICore stream sync emits 507046).
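The widened timeout check in the last bullet amounts to accepting any of three codes. A hedged sketch follows: the three numeric codes are taken from this description, but the helper name and the idea of a single predicate are illustrative, and the real regression test may be structured differently.

```cpp
#include <algorithm>
#include <array>

// Codes named in this PR: 507046 is the one the AICore stream sync emits;
// 507018 and 507000 can surface first when the launch path races the
// STARS watchdog.
constexpr std::array<int, 3> kAcceptedTimeoutCodes = {507046, 507018, 507000};

// Hypothetical helper: true when `code` is one of the timeout outcomes the
// test should treat as a pass.
inline bool is_expected_timeout_error(int code) {
    return std::find(kAcceptedTimeoutCodes.begin(),
                     kAcceptedTimeoutCodes.end(),
                     code) != kAcceptedTimeoutCodes.end();
}
```

Keeping the accepted codes in one list keeps the test strict about everything else: a success code or an unrelated failure still fails the assertion.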

Why this design (vs. earlier Mode B)

Earlier revisions of this PR routed per-task launches through Mode B (rtsBinaryLoadFromFile + rtsFuncGetByName + rtsLaunchCpuKernel). Making Mode B work across multi-process / multi-runtime / long-test scenarios required several global-state workarounds (per-process JSON paths, opType collision avoidance via fingerprint suffix, atomic rename on preinstall writes). With Mode A type 2:

  • per-task call carries its own (so_name, kernel_name) — no shared state
  • no JSON descriptor on disk
  • no global op-type registry
  • no binary_handle_ / rtFuncHandle lifecycle
  • per-call overhead is two strncpy + the syscall (irrelevant for µs–ms kernels)
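The "two strncpy" cost in the last bullet can be pictured with the sketch below. It deliberately does not reproduce the rtAicpuKernelLaunchExWithArgs argument layout, which this description does not show; `LaunchNamesSketch`, `kNameCap`, and `fill_launch_names` are hypothetical names, and the 64-byte capacity is an assumption.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

constexpr std::size_t kNameCap = 64;  // assumed buffer size, not from CANN

// Hypothetical per-launch name buffers: every call carries its own
// (so_name, kernel_name) pair, so nothing is shared between launches.
struct LaunchNamesSketch {
    char so_name[kNameCap];
    char kernel_name[kNameCap];
};

// Fills both buffers; returns false if either name would not fit with its
// terminating NUL. `fp_hex` is the fingerprint already rendered as text.
inline bool fill_launch_names(LaunchNamesSketch& out,
                              const std::string& fp_hex,
                              const char* kernel_name) {
    const std::string so = "simpler_inner_" + fp_hex + ".so";
    if (so.size() >= kNameCap || std::strlen(kernel_name) >= kNameCap) {
        return false;
    }
    // Both sources are shorter than kNameCap, so strncpy NUL-pads the rest
    // of each buffer and the results are always terminated.
    std::strncpy(out.so_name, so.c_str(), kNameCap);
    std::strncpy(out.kernel_name, kernel_name, kNameCap);
    return true;
}
```

A caller would fill the buffers once per launch, for example with "simpler_aicpu_init" on the first call and "simpler_aicpu_exec" afterwards, and hand them to the launch API.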

The dispatcher SO + Mode A KFC bootstrap is the only thing we keep from the previous approach — it remains the cleanest way to get our runtime SO into the preinstall path without sudo.

Testing

  • a2a3 vector_example: 10/10 sequential PASS, 8/8 4-way concurrent PASS
  • grep for BUILD_WITH_NEW_CANN across src/ returns 0 matches
  • Net ~250 LOC removed (deleted aicpu_loader, removed ifdef gates)
  • Dispatcher SO: ~14.5 KB (unchanged from previous iteration)

Fixes #356.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces an AicpuLoader abstraction to support both legacy and new CANN 7.0+ interfaces for launching AICPU kernels across the a2a3 and a5 platforms. The implementation includes build system updates, runtime JSON descriptor generation, and integration into the DeviceRunner. Feedback focuses on improving build portability by avoiding hardcoded architecture paths and enhancing the robustness of manual JSON construction. Additionally, the removal of a default parameter in the a2a3 platform's header is identified as a breaking change that violates cross-platform consistency. Suggestions were also made to reduce coupling in the kernel name mapping.

Comment thread src/a2a3/platform/onboard/host/CMakeLists.txt Outdated
Comment thread src/a2a3/platform/onboard/host/aicpu_loader.cpp Outdated
Comment thread src/a2a3/platform/onboard/host/aicpu_loader.cpp Outdated
Comment thread src/a2a3/platform/onboard/host/device_runner.h Outdated
Comment thread src/a5/platform/onboard/host/CMakeLists.txt Outdated
Comment thread src/a5/platform/onboard/host/aicpu_loader.cpp Outdated
puddingfjz added a commit to puddingfjz/simpler that referenced this pull request Apr 13, 2026
- Revert hardcoded aarch64-linux path in CMakeLists.txt, use portable paths
- Restore default parameter for launch_aicpu_num in device_runner.h
- Add documentation explaining JSON construction and name_mapping design

The JSON construction uses manual string concatenation without a library.
This is safe because kernel names are controlled strings without special
characters, matching pypto's approach for similar AICPU op descriptors.

The name_mapping from opType to functionName is specific to the Ascend
tile framework kernels and is unlikely to change.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ChaoWao ChaoWao force-pushed the feat/issue-356-aicpu-launch-new-interface branch from 5c35216 to f30e69c Compare May 21, 2026 02:05
@hw-native-sys-bot hw-native-sys-bot changed the title from "Feat/issue 356 aicpu launch new interface" to "Feat: migrate AICPU launch to rtsLaunchCpuKernel + zero-deploy dispatcher" May 21, 2026
@ChaoWao ChaoWao force-pushed the feat/issue-356-aicpu-launch-new-interface branch 3 times, most recently from d4e918c to 3567417 Compare May 21, 2026 07:19
ChaoWao added a commit to puddingfjz/simpler that referenced this pull request May 21, 2026
…cher

Migrates host-side AICPU launches from Mode A
(rtAicpuKernelLaunchExWithArgs) to Mode B (rtsBinaryLoadFromFile +
rtsFuncGetByName + rtsLaunchCpuKernel), and removes the tar.gz / sudo
pre-deployment step for the AICPU SO.

Bootstrap (one Mode A call per DeviceRunner)
============================================
Host bundles dispatcher SO bytes + runtime SO bytes into a single
rtAicpuKernelLaunchExWithArgs targeting CANN's preinstalled
libaicpu_extend_kernels.so. libaicpu_extend_kernels writes the
dispatcher to its own private path, dlopens it, dlsym's the three CANN
contract symbols (Static + DynInit + Dyn) and invokes our DynInit.

Our dispatcher Init reads the runtime SO bytes from the extended
DeviceArgs (new fields inner_so_bin/inner_so_len at offsets 120/128,
which libaicpu_extend_kernels ignores) and writes them to
  /usr/lib64/aicpu_kernels/0/aicpu_kernels_device/simpler_inner_<fp>.so
using sched-thread (HwHiAiUser) write permission. The dispatcher SO
itself is never persisted to disk — only its transient libaicpu_extend_kernels
dlopen.

Per-task launches (direct Mode B, no dispatcher hop)
====================================================
Host computes the same FNV-1a fingerprint locally, generates a JSON
descriptor with kernelSo=simpler_inner_<fp>.so and functionName=
simpler_aicpu_init / simpler_aicpu_exec (the runtime SO's actual
exports), and calls rtsBinaryLoadFromFile + rtsFuncGetByName.
LaunchBuiltInOp invokes the runtime SO's symbols directly via
rtsLaunchCpuKernel — there's no per-task dispatcher hop and the
dispatcher SO is never referenced again.

Multi-runtime in one host process: each DeviceRunner bootstraps with
the same dispatcher bytes + its own runtime SO bytes. The dispatcher
upload path hits libaicpu_extend_kernels' firstCreatSo_ one-shot latch
only once (subsequent calls reuse the cached dlopen — same content
fingerprint); each runtime gets its own JSON registration with a
unique opType (symbol_name + fingerprint suffix) so CANN's global
op registry doesn't collide.

Reference: PR hw-native-sys#537.
@ChaoWao ChaoWao force-pushed the feat/issue-356-aicpu-launch-new-interface branch from 3567417 to 90e71ed Compare May 21, 2026 07:20
@ChaoWao ChaoWao force-pushed the feat/issue-356-aicpu-launch-new-interface branch from 90e71ed to 7b9e506 Compare May 21, 2026 09:47
@ChaoWao ChaoWao force-pushed the feat/issue-356-aicpu-launch-new-interface branch from 7b9e506 to b4dd9b1 Compare May 21, 2026 10:27
@ChaoWao ChaoWao force-pushed the feat/issue-356-aicpu-launch-new-interface branch from b4dd9b1 to bb65c0c Compare May 21, 2026 10:54
@ChaoWao ChaoWao force-pushed the feat/issue-356-aicpu-launch-new-interface branch from bb65c0c to f173a99 Compare May 21, 2026 11:35
ChaoWao added a commit to puddingfjz/simpler that referenced this pull request May 22, 2026
Two-phase architecture for loading AICPU kernels on CANN 9.0+ without
tar.gz / sudo pre-deployment, and without per-task indirection through
the dispatcher SO.

Bootstrap (per-DeviceRunner, idempotent across instances in a process)
======================================================================
Host bundles dispatcher SO bytes + runtime SO bytes into a single
rtAicpuKernelLaunchExWithArgs (kernel_type = KERNEL_TYPE_AICPU_KFC)
targeting CANN's preinstalled libaicpu_extend_kernels.so.
libaicpu_extend_kernels dlopens our dispatcher and invokes its Init;
the dispatcher reads the runtime SO bytes from extended DeviceArgs
(inner_so_bin/inner_so_len at offsets 120/128, which
libaicpu_extend_kernels ignores) and writes them to
  /usr/lib64/aicpu_kernels/0/aicpu_kernels_device/simpler_inner_<fp>.so
using sched-thread (HwHiAiUser) write permission. The dispatcher SO
itself never lands at preinstall.

The runtime SO basename embeds an FNV-1a content fingerprint, so two
host processes uploading the same runtime SO produce the same file
(idempotent writes via atomic tmp+rename, no truncation window
visible to concurrent aicpu_scheduler readers). A process-level
fingerprint cache in LoadAicpuOp skips redundant
libaicpu_extend_kernels invocations within a single host process —
each runtime is bootstrapped at most once per process.

Per-task launches (direct Mode A type 2, no dispatcher hop)
===========================================================
Host calls rtAicpuKernelLaunchExWithArgs with kernel_type =
KERNEL_TYPE_AICPU, so_name = "simpler_inner_<fp>.so",
kernel_name = "simpler_aicpu_init" / "simpler_aicpu_exec". The main
aicpu_scheduler dlopens the preinstall file on first invocation and
caches the handle; subsequent launches reuse it. No JSON descriptors,
no rtsBinaryLoadFromFile / rtsFuncGetByName lifecycle, no global op
registry, no per-launch handle bookkeeping.

Cleanup
=======
- Removes BUILD_WITH_NEW_CANN CMake option and all ifdef branches.
- Deletes the legacy AicpuLoader stub (src/{a2a3,a5}/platform/onboard/
  host/aicpu_loader.{cpp,h}) — its only role was the OFF-path
  fallback and nothing tested that path.
- Skips so_info_ allocation on the new path (the runtime SO no longer
  reads device_args.aicpu_so_bin / aicpu_so_len). Saves ~inner-SO-size
  device memory per DeviceRunner; previously this accumulated across
  many ChipWorker/DeviceRunner instances and triggered AICORE OOM in
  long test sessions.
- Widens the aicpu_op_timeout regression test to accept the new error
  code surfaced by Mode A type 2 (the dispatcher / main aicpu_scheduler
  path can race the STARS watchdog and return 507018/507000 before the
  AICore stream sync emits 507046).

Reference: PR hw-native-sys#537.
@ChaoWao ChaoWao force-pushed the feat/issue-356-aicpu-launch-new-interface branch from f173a99 to 473d8f6 Compare May 22, 2026 02:47
@hw-native-sys-bot hw-native-sys-bot changed the title from "Feat: migrate AICPU launch to rtsLaunchCpuKernel + zero-deploy dispatcher" to "Feat: AICPU launch via dispatcher upload + Mode A type 2" May 22, 2026
@ChaoWao ChaoWao force-pushed the feat/issue-356-aicpu-launch-new-interface branch from 473d8f6 to 2c220d3 Compare May 22, 2026 02:56
Development

Successfully merging this pull request may close these issues.

[Feature] Migrate AICPU launch to new rtsLaunchCpuKernel interface (BUILD_WITH_NEW_CANN)

3 participants