Skip to content

dtls: reject malformed handshake record tails#139

Open
zRedShift wants to merge 1 commit into
algesten:mainfrom
zRedShift:fix/handshake-record-malformed-tail
Open

dtls: reject malformed handshake record tails#139
zRedShift wants to merge 1 commit into
algesten:mainfrom
zRedShift:fix/handshake-record-malformed-tail

Conversation

@zRedShift
Copy link
Copy Markdown
Contributor

Handshake record parsing could accept a valid leading handshake while ignoring a
malformed trailing handshake in the same record. That can leave receive/replay
state committed for a record whose full handshake payload was not valid.

This requires exact same-record handshake parsing for DTLS 1.2 and DTLS 1.3,
defers replay commit until protected records decrypt and parse cleanly, and
discards malformed buffered protected handshakes so they cannot pin recovery.

This intentionally does not try to solve broader multi-record/datagram replay
transactionality.

Line delta:

area added removed
non-test code 293 68
tests 100 0
docs/changelog 1 0
total 394 68

Validation:

  • focused malformed-tail regressions
  • cargo fmt --check
  • git diff --check
  • git diff --check upstream/main...HEAD
  • /home/ronen/.codex/skills/dimpl/scripts/check-snowflake-local.pl upstream/main
  • cargo test --all-targets --features rcgen
  • cargo clippy --all-targets --features rcgen -- -D warnings
  • cargo test --no-default-features --features rust-crypto
  • cargo clippy --no-default-features --features rust-crypto -- -D warnings
  • cargo test --doc --features rcgen

@algesten
Copy link
Copy Markdown
Owner

This looks almost exactly like that other case where we decided to not consider packets all or nothing.

Is this really guarding for a real concern?

@zRedShift
Copy link
Copy Markdown
Contributor Author

The reason I think this is different from the broader all-or-nothing datagram question is that this is scoped to one Handshake record, not the whole UDP datagram.

If the record header says the fragment contains N bytes of handshake messages, then accepting only a valid prefix and ignoring a malformed trailing handshake means we are accepting a record whose declared handshake payload was not actually parseable.

For plaintext this is mostly strict framing hygiene. For protected records it matters more, because the record has authenticated and replay/bookkeeping may advance. I think replay state should only be committed after the decrypted record payload is fully accepted, otherwise a malformed authenticated record can partially mutate handshake/replay state and make recovery depend on retransmission edge cases.

The fallout from not doing this is limited: mostly permissive acceptance of a malformed record, and in protected-record edge cases possibly replay/handshake state advancing for a record that later has to be treated as malformed.

@algesten
Copy link
Copy Markdown
Owner

Thanks — but I don't think the protected-record part holds up, and that's the part that carried the security argument.

replay_update is already gated on AEAD success: a record that fails to decrypt returns Ok(None) and never reaches the window update (the existing RFC 6347 §4.1.2.6 comment spells this out). So garbage sprayed at a protected epoch never advances replay state in either ordering — the reorder doesn't move that boundary.

That means the only record that hits the new gap — authenticates fine, but the decrypted payload fails to parse as handshakes — has to come from the genuine peer, since forging an authenticated record needs the key. That's a buggy peer, not an attacker.

And the "consume the retransmission slot for a clean record" concern doesn't match DTLS framing: retransmissions don't reuse the record sequence number. The handshake message_seq stays fixed but each on-wire record gets a fresh, increasing record sequence. The replay window blocks exact-sequence replays, not retransmissions — so advancing the window for the malformed record's seq N can't block the retransmit, which arrives at seq > N. There's no clean record waiting for slot N.

I think we are largely in the same bucket as the all-or-nothing-datagram case. I'm inclined to pass on this unless there's a concrete case where an authenticated-but-unparseable record from the real peer actually wedges recovery.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants