Summary
Follow-up to PR #792: --dump-args currently only exports orchestrator-level arguments to tensor_dump/args_dump.json.
Downstream Insight Trace needs the actual per-dispatch kernel_entry(args) layout for individual incore kernels so it can replay a single kernel dispatch directly.
Motivation / Use Case
The current args_dump.json is useful for orchestration-level inspection, but it is not sufficient to reconstruct one real kernel dispatch such as QK / SF / PV / UP.
Insight Trace needs the finalized args after scheduler payload construction, including the real slot ordering and per-dispatch metadata. Without that, downstream tooling cannot reliably replay one incore kernel from dump artifacts.
Proposed API / Behavior
Add a separate kernel-level dump artifact, for example:
tensor_dump/kernel_args_dump.json
This new dump should:
- keep existing tensor_dump/args_dump.json unchanged for compatibility
- capture records after scheduler payload construction, using the actual kernel_entry(args) layout
- include per-dispatch identifiers such as:
  - dispatch_id
  - func_id
  - task_id
  - subtask_id
  - core_type
  - core_id
  - block_idx
- mark the capture stage as before_dispatch
- preserve the real arg_index ordering seen by the kernel
- include tensor arg metadata:
  - dtype
  - ndims
  - shape
  - pointer value if needed
- include scalar arg raw values with enough information to distinguish value/bits semantics
- include context pointer args separately from normal tensor/scalar args
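To make the requirements above concrete, here is a minimal Python sketch of what one dispatch record could look like. It is only an illustration, not the runtime's actual format: the helper functions, the field names kind, value, bits, ptr, stage and args, and all sample values (including the core_type string) are assumptions. Only the identifiers listed above come from this proposal.

```python
import struct
from typing import Optional, Sequence

# Hypothetical mapping: dtype name -> (format for the value, format for its raw bits).
_SCALAR_FORMATS = {
    "float32": ("<f", "<I"),
    "int32": ("<i", "<I"),
    "uint64": ("<Q", "<Q"),
}


def tensor_arg(arg_index: int, dtype: str, shape: Sequence[int],
               ptr: Optional[int] = None) -> dict:
    """Tensor arg metadata: dtype, ndims, shape, and the pointer only if given."""
    rec = {"arg_index": arg_index, "kind": "tensor", "dtype": dtype,
           "ndims": len(shape), "shape": list(shape)}
    if ptr is not None:
        rec["ptr"] = f"0x{ptr:x}"
    return rec


def scalar_arg(arg_index: int, dtype: str, value) -> dict:
    """Scalar arg carrying both the logical value and its raw bit pattern."""
    value_fmt, bits_fmt = _SCALAR_FORMATS[dtype]
    (bits,) = struct.unpack(bits_fmt, struct.pack(value_fmt, value))
    hex_width = struct.calcsize(bits_fmt) * 2
    return {"arg_index": arg_index, "kind": "scalar", "dtype": dtype,
            "value": value, "bits": f"0x{bits:0{hex_width}x}"}


def context_arg(arg_index: int, ptr: int) -> dict:
    """Context pointer arg, tagged separately from tensor/scalar args."""
    return {"arg_index": arg_index, "kind": "context", "ptr": f"0x{ptr:x}"}


def dispatch_record(ids: dict, args: list) -> dict:
    """Combine identifiers and args; reject records whose slot order is not preserved."""
    for position, arg in enumerate(args):
        if arg["arg_index"] != position:
            raise ValueError(f"arg at position {position} has arg_index {arg['arg_index']}")
    return {**ids, "stage": "before_dispatch", "args": args}


# Placeholder values for illustration only.
record = dispatch_record(
    {"dispatch_id": 0, "func_id": 3, "task_id": 7, "subtask_id": 0,
     "core_type": "example_core", "core_id": 2, "block_idx": 0},
    [
        tensor_arg(0, "float16", [128, 64], ptr=0x1000),
        scalar_arg(1, "float32", 1.0),
        context_arg(2, 0x2000),
    ],
)
```

Storing both value and bits for scalars lets a replay tool write the exact bit pattern back into the arg slot without guessing whether a number was meant as an integer or a float.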
A possible top-level schema would group args by dispatch and include:
- schema_version
- total_dispatches
- total_args
- dispatches[]
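As a rough sketch of how the top-level fields could fit together, the Python snippet below builds the wrapper around a list of dispatch records and checks that the totals agree with the contents. The schema version value, the function names, and the per-dispatch args key are assumptions made for illustration.

```python
import json

SCHEMA_VERSION = 1  # hypothetical starting version


def build_dump(dispatches: list) -> dict:
    """Group args by dispatch and derive the totals from the records themselves."""
    return {
        "schema_version": SCHEMA_VERSION,
        "total_dispatches": len(dispatches),
        "total_args": sum(len(d["args"]) for d in dispatches),
        "dispatches": dispatches,
    }


def check_dump(doc: dict) -> None:
    """Consumer-side sanity check: the totals must match the dispatch list."""
    if doc["total_dispatches"] != len(doc["dispatches"]):
        raise ValueError("total_dispatches does not match dispatches[]")
    if doc["total_args"] != sum(len(d["args"]) for d in doc["dispatches"]):
        raise ValueError("total_args does not match the per-dispatch args")


# Two stub dispatches with placeholder content.
sample = build_dump([
    {"dispatch_id": 0, "args": [{"arg_index": 0}, {"arg_index": 1}]},
    {"dispatch_id": 1, "args": [{"arg_index": 0}]},
])

# Round-trip through JSON, as a downstream reader of the file would.
reloaded = json.loads(json.dumps(sample))
check_dump(reloaded)
```

Keeping the totals redundant with dispatches[] gives downstream tools a cheap way to detect a truncated or partially written dump.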
Alternatives Considered
- Reusing only args_dump.json: insufficient, because it reflects orchestration-level arguments rather than real per-kernel dispatch payload layout.
- Reconstructing dispatch args offline from existing dump artifacts: possible only heuristically, and too fragile for downstream replay tooling.
Additional Context