[Feature] Add per-kernel dispatch args dump for Insight Trace #837

@vegetabledoww

Summary

Follow-up to PR #792: --dump-args currently exports only orchestrator-level arguments, written to tensor_dump/args_dump.json.

Downstream Insight Trace needs the actual per-dispatch kernel_entry(args) layout for individual incore kernels so it can replay a single kernel dispatch directly.

Motivation / Use Case

The current args_dump.json is useful for orchestration-level inspection, but it is not sufficient to reconstruct one real kernel dispatch such as QK / SF / PV / UP.

Insight Trace needs the finalized args after scheduler payload construction, including the real slot ordering and per-dispatch metadata. Without that, downstream tooling cannot reliably replay one incore kernel from dump artifacts.

Proposed API / Behavior

Add a separate kernel-level dump artifact, for example:

tensor_dump/kernel_args_dump.json

This new dump should:

  • keep existing tensor_dump/args_dump.json unchanged for compatibility
  • capture records after scheduler payload construction, using the actual kernel_entry(args) layout
  • include per-dispatch identifiers such as:
    • dispatch_id
    • func_id
    • task_id
    • subtask_id
    • core_type
    • core_id
    • block_idx
  • mark the capture stage as before_dispatch
  • preserve the real arg_index ordering seen by the kernel
  • include tensor arg metadata:
    • dtype
    • ndims
    • shape
    • pointer value if needed
  • include raw scalar arg values with enough information to distinguish value semantics from raw-bits semantics
  • include context pointer args separately from normal tensor/scalar args
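
To make the bullets above concrete, here is a minimal Python sketch of what one per-dispatch record could look like. It is only an illustration of the requested shape, not an existing implementation: the helper names (tensor_arg, scalar_arg, context_arg), the keys kind, value, bits, ptr, stage, args and context_args, and all sample values (including the core_type string) are assumptions made for this example.

```python
import json
import struct


def tensor_arg(arg_index, dtype, shape, ptr=None):
    """Tensor arg record: dtype, ndims, shape, and the pointer only if supplied."""
    rec = {"arg_index": arg_index, "kind": "tensor", "dtype": dtype,
           "ndims": len(shape), "shape": list(shape)}
    if ptr is not None:
        rec["ptr"] = hex(ptr)
    return rec


def scalar_arg(arg_index, dtype, value):
    """Scalar arg record carrying both the interpreted value and its raw bits."""
    if dtype == "float32":
        # Reinterpret the float32 encoding as an unsigned 32-bit integer.
        bits = struct.unpack("<I", struct.pack("<f", value))[0]
    elif dtype == "int32":
        # Two's-complement view of a signed 32-bit integer.
        bits = value & 0xFFFFFFFF
    else:
        raise ValueError(f"unsupported scalar dtype in this sketch: {dtype}")
    return {"arg_index": arg_index, "kind": "scalar", "dtype": dtype,
            "value": value, "bits": hex(bits)}


def context_arg(arg_index, ptr):
    """Context pointer record, kept apart from tensor/scalar args."""
    return {"arg_index": arg_index, "kind": "context", "ptr": hex(ptr)}


# One hypothetical dispatch; every value below is made up for illustration.
record = {
    "dispatch_id": 0, "func_id": 3, "task_id": 12, "subtask_id": 0,
    "core_type": "example_core", "core_id": 5, "block_idx": 0,
    "stage": "before_dispatch",
    # arg_index follows the slot order the kernel would actually see.
    "args": [
        tensor_arg(0, "float16", (128, 64), ptr=0x1000),
        tensor_arg(1, "float16", (64, 128)),
        scalar_arg(2, "float32", 0.125),
        scalar_arg(3, "int32", -1),
    ],
    "context_args": [context_arg(4, 0x2000)],
}

print(json.dumps(record, indent=2))
```

Storing both value and bits lets a replay tool restore the exact scalar slot contents even when the declared dtype and the intended interpretation differ.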

A possible top-level schema would group args by dispatch and include:

  • schema_version
  • total_dispatches
  • total_args
  • dispatches[]
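
As a rough sketch of the top-level grouping listed above, the Python below wraps a list of per-dispatch records and derives the two counters. The function name build_kernel_args_dump, the per-dispatch args key, the integer schema_version, and the choice to count only entries under args in total_args are all assumptions for illustration; whether context pointer args should count toward total_args is left open by this proposal.

```python
import json


def build_kernel_args_dump(dispatches, schema_version=1):
    """Wrap per-dispatch records in the proposed top-level layout."""
    # Assumption: total_args counts only the entries under each dispatch's "args".
    total_args = sum(len(d.get("args", [])) for d in dispatches)
    return {
        "schema_version": schema_version,
        "total_dispatches": len(dispatches),
        "total_args": total_args,
        "dispatches": dispatches,
    }


# Two hypothetical dispatches with trimmed-down arg records.
dispatches = [
    {"dispatch_id": 0, "func_id": 3, "args": [{"arg_index": 0}, {"arg_index": 1}]},
    {"dispatch_id": 1, "func_id": 5, "args": [{"arg_index": 0}]},
]

dump = build_kernel_args_dump(dispatches)
text = json.dumps(dump, indent=2)
print(text)
```

Deriving total_dispatches and total_args from the records, rather than tracking them separately, gives downstream tooling a cheap consistency check when it loads the file.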

Alternatives Considered

  • Reusing only args_dump.json: insufficient, because it reflects orchestration-level arguments rather than real per-kernel dispatch payload layout.
  • Reconstructing dispatch args offline from existing dump artifacts: possible only heuristically, and too fragile for downstream replay tooling.

Additional Context
