dstack-mr: diagnose subcommand for operator-facing MR verification#679

Draft
Leechael wants to merge 1 commit into fix/dstack-mr-ovmf-202505-events from feat/dstack-mr-diagnose

@Leechael
Collaborator

Summary

Adds a dstack-mr diagnose subcommand that takes a VmConfig JSON (the same
payload VMM serializes into KMS metadata) plus an image directory, computes
the expected MRTD / RTMR0-2 via Machine::measure_with_logs(), and prints
each RTMR0 event log entry with a semantic label and what it varies with.
Optionally compares against --actual-{mrtd,rtmr0,rtmr1,rtmr2} hex from a
quote and reports MATCH / MISMATCH per measurement.

Stacked on top of #678 (fix/dstack-mr-ovmf-202505-events) because the
labels follow the two OVMF event-log layouts introduced there.

Why

While debugging the 0.5.9 → 0.5.10 RTMR0 mismatch, the iteration cost of
"rebuild KMS image → redeploy → trigger onboard → read mismatch" was the
main bottleneck. A small offline tool that consumes the same VmConfig
schema as the verifier and prints each RTMR0 event entry with a label lets
operators:

  • Verify whether a given (VmConfig, image) pair produces the expected MRs
    without rebuilding any service image.
  • Locate which RTMR0 event entry drifted when actual ≠ expected — e.g. the
    new edk2-stable202505 events (fwcfg:BootMenu, fwcfg:bootorder,
    variable_authority, Boot0001) vs the legacy acpi_* and Boot0000.

Approach

  • New Diagnose(DiagnoseConfig) subcommand alongside existing Measure.
  • OvmfVariant resolution mirrors the verifier order:
    vm_config.ovmf_variant > image_info.ovmf_variant >
    ovmf_variant_for_version(image_info.version) >
    ovmf_variant_for_image(vm_config.image).
  • Labels are tabulated per variant (13 entries for Pre202505, 17 for
    Stable202505), each row tagged with whether the hash is fixed or which
    inputs it varies with.
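
For reviewers, the precedence chain above can be sketched as a first-`Some`-wins fold. This is an illustration only, not the code in this PR: the `OvmfVariant` enum here is a local stand-in, `resolve_variant` and `rtmr0_event_count` are hypothetical names, and the sketch assumes the version-based lookup can fail (returns `Option`) while the image-name lookup always yields a variant.

```rust
/// Local stand-in for the OVMF event-log layout selector (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OvmfVariant {
    Pre202505,
    Stable202505,
}

impl OvmfVariant {
    /// Number of labeled RTMR0 event entries for this layout,
    /// as stated in the PR description (13 legacy, 17 for edk2-stable202505).
    pub fn rtmr0_event_count(self) -> usize {
        match self {
            OvmfVariant::Pre202505 => 13,
            OvmfVariant::Stable202505 => 17,
        }
    }
}

/// Hypothetical helper: pick the variant by precedence.
///
/// Arguments are listed from highest to lowest priority:
/// 1. explicit `vm_config.ovmf_variant`
/// 2. `image_info.ovmf_variant`
/// 3. result of a version-based lookup (assumed to be optional)
/// 4. result of an image-name-based lookup (assumed to always succeed)
pub fn resolve_variant(
    vm_config_variant: Option<OvmfVariant>,
    image_info_variant: Option<OvmfVariant>,
    from_image_version: Option<OvmfVariant>,
    from_image_name: OvmfVariant,
) -> OvmfVariant {
    vm_config_variant
        .or(image_info_variant)
        .or(from_image_version)
        .unwrap_or(from_image_name)
}
```

With this shape, an explicit `vm_config.ovmf_variant` always wins, and the image-name fallback is only consulted when every earlier source is absent.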

Usage

dstack-mr diagnose \
    --vm-config /path/to/vm_config.json \
    --image-dir /opt/dstack/dstack-images/dstack-0.5.10-... \
    [--actual-mrtd HEX] [--actual-rtmr0 HEX] \
    [--actual-rtmr1 HEX] [--actual-rtmr2 HEX] \
    [--json]

When any --actual-* flag is provided, the exit code is non-zero if any of the supplied measurements does not match the computed value.
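
A minimal sketch of the compare-and-exit behavior, for illustration only. The names `Verdict`, `decode_hex`, `compare`, and `exit_code` are hypothetical, the exit value `1` is an assumption (the PR only promises "non-zero"), and treating an omitted `--actual-*` flag as "skipped" is also an assumption. Hex is decoded to bytes before comparing so that letter case does not matter.

```rust
/// Outcome of comparing one measurement (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Match,
    Mismatch,
    /// No --actual-* value was supplied for this measurement (assumed behavior).
    Skipped,
}

/// Decode a plain hex string (no "0x" prefix) into bytes.
pub fn decode_hex(s: &str) -> Result<Vec<u8>, String> {
    let s = s.trim();
    if s.len() % 2 != 0 {
        return Err(format!("odd-length hex string ({} chars)", s.len()));
    }
    // Rejecting non-hex input up front also guarantees ASCII,
    // so the byte-index slicing below cannot split a character.
    if !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("non-hex character in input".to_string());
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}

/// Compare the computed measurement against an optional user-supplied hex value.
pub fn compare(expected: &[u8], actual_hex: Option<&str>) -> Result<Verdict, String> {
    match actual_hex {
        None => Ok(Verdict::Skipped),
        Some(hex) => {
            let actual = decode_hex(hex)?;
            if actual.as_slice() == expected {
                Ok(Verdict::Match)
            } else {
                Ok(Verdict::Mismatch)
            }
        }
    }
}

/// Process exit code: non-zero as soon as any measurement mismatches.
/// The concrete value 1 is an assumption.
pub fn exit_code(verdicts: &[Verdict]) -> i32 {
    if verdicts.iter().any(|v| *v == Verdict::Mismatch) {
        1
    } else {
        0
    }
}
```

Skipped measurements do not affect the exit code in this sketch, so passing only `--actual-rtmr0` would check RTMR0 alone.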

Test plan

  • cargo fmt -p dstack-mr-cli
  • cargo clippy -p dstack-mr-cli --all-features -- -D warnings clean
  • cargo check -p dstack-mr-cli clean
  • End-to-end on a TDX host: dstack-mr diagnose against a real VmConfig
    and 0.5.10 image produces the expected MRTD / RTMR0-2 plus the 17-entry
    labeled event log
  • --actual-rtmr0 <self> reports MATCH; intentionally wrong actual
    reports MISMATCH with both hex values printed and exits non-zero

Backward compatibility

Draft pending #678 merge.

Takes a VmConfig JSON (the same payload VMM serializes into KMS metadata)
plus an image directory, computes the expected MRTD/RTMR0-2 via
Machine::measure_with_logs(), and prints each RTMR0 event log entry with a
semantic label and what it varies with. Labels switch between the legacy
13-event layout (Pre202505) and the edk2-stable202505 17-event layout based
on the resolved OvmfVariant.

OvmfVariant resolution follows the verifier order: explicit
vm_config.ovmf_variant > image_info.ovmf_variant > parse image_info.version
> ovmf_variant_for_image(vm_config.image) fallback.

Optionally accepts --actual-{mrtd,rtmr0,rtmr1,rtmr2} hex strings to compare
against and report MATCH/MISMATCH per measurement.

Intended as an operator-facing acceptance tool: validate a given VmConfig +
image combination produces the expected MRs without rebuilding or
redeploying KMS. When a quote-side mismatch shows up, this lets you locate
which RTMR0 event entry drifted (e.g. acpi_* group vs new
fwcfg:* / variable_authority / Boot* introduced by edk2-stable202505).