Skip per-boot update-initramfs when nouveau blacklist is baked into the VHD #161

Draft
ganeshkumarashok wants to merge 1 commit into Azure:main from ganeshkumarashok:gpu-nouveau-bake-skip

@ganeshkumarashok
Collaborator

What

install.sh blacklists nouveau and runs update-initramfs -u on every node boot (~10-30s). This change skips that rebuild when AgentBaker has already baked the nouveau blacklist into the VHD initramfs for the running kernel.

The fast path engages only when both of the following hold:

  • the kernel-gated marker /opt/azure/aks-gpu/nouveau-blacklist-marker exists and its kernel= line matches uname -r, and
  • the on-disk /etc/modprobe.d/blacklist-nouveau.conf is byte-identical (cmp) to the image's /opt/gpu/blacklist-nouveau.conf.

Any mismatch (an older VHD without the marker, kernel drift between VHD build and node boot, or altered blacklist content) falls back to the original cp + update-initramfs path. The change is therefore compatible in both directions: a new image on an old VHD and an old image on a new VHD both behave correctly.
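
For reviewers, the gating logic described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the three paths come from the description, while the variable names and the functions `nouveau_blacklist_is_baked` and `apply_nouveau_blacklist` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the fast-path decision; NOT the actual install.sh code.
# Paths are taken from the PR description; all names below are hypothetical.
MARKER="${MARKER:-/opt/azure/aks-gpu/nouveau-blacklist-marker}"
IMAGE_CONF="${IMAGE_CONF:-/opt/gpu/blacklist-nouveau.conf}"
DISK_CONF="${DISK_CONF:-/etc/modprobe.d/blacklist-nouveau.conf}"

# Returns 0 only when the marker names the given kernel AND the on-disk
# blacklist is byte-identical to the image's copy.
nouveau_blacklist_is_baked() {
  local kernel="$1"
  [ -f "$MARKER" ] || return 1
  # -x: match the whole line, -F: fixed string (no regex), -q: quiet
  grep -qxF "kernel=${kernel}" "$MARKER" || return 1
  [ -f "$DISK_CONF" ] || return 1
  # cmp -s exits non-zero on any difference or on a missing file
  cmp -s "$DISK_CONF" "$IMAGE_CONF"
}

apply_nouveau_blacklist() {
  if nouveau_blacklist_is_baked "$(uname -r)"; then
    echo "nouveau blacklist already baked for $(uname -r); skipping update-initramfs"
  else
    # Original slow path: copy the blacklist and rebuild the initramfs.
    cp "$IMAGE_CONF" "$DISK_CONF"
    update-initramfs -u
  fi
}
```

Matching the `kernel=` line as a whole fixed string avoids a prefix of one kernel version accidentally matching another, and every failed check falls through to the slow path rather than aborting.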

Why

Reduces GPU node provisioning time. It removes a ~10-30s cost that is currently paid on every boot, and nouveau is blacklisted from first boot because the blacklist is already in the VHD initramfs.

Cross-repo dependency

The speedup only materializes once the VHD writes the marker. This PR pairs with an Azure/AgentBaker PR (bake nouveau blacklist into the Ubuntu VHD) and stays in draft until that lands and is validated on a VHD build. This PR is safe to ship independently (no marker ⇒ unchanged behavior).
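
To make the contract between the two repos concrete, here is one way the VHD side might write the marker. This is an assumption for illustration: only the marker path and the `kernel=` line format come from this PR's description, the function `write_nouveau_marker` is hypothetical, and the AgentBaker implementation may differ.

```bash
#!/usr/bin/env bash
# Hypothetical VHD build step; the real AgentBaker change may look different.
# Records which kernel's initramfs already contains the nouveau blacklist.
write_nouveau_marker() {
  local kernel="$1" marker="$2"
  mkdir -p "$(dirname "$marker")"
  # One line of the form kernel=<version>, compared against `uname -r` at boot.
  printf 'kernel=%s\n' "$kernel" > "$marker"
}

# Example call, assumed to run only after the initramfs for that kernel
# has been rebuilt with the blacklist in place:
#   write_nouveau_marker "$target_kernel" /opt/azure/aks-gpu/nouveau-blacklist-marker
```

Writing the marker only after the initramfs rebuild succeeds keeps the marker from claiming more than the VHD actually provides.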

…he VHD

install.sh blacklists nouveau and runs `update-initramfs -u` on every node boot
(~10-30s). When AgentBaker has already baked the nouveau blacklist into the VHD
initramfs for the running kernel, skip the rebuild. The fast path engages only
when a kernel-gated marker (/opt/azure/aks-gpu/nouveau-blacklist-marker) matches
the running kernel AND the on-disk blacklist content matches the image's copy,
so any VHD/image version skew, kernel drift, or content mismatch falls back to
the original copy + update-initramfs path. Backward/forward compatible with VHDs
that don't write the marker.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ganeshkumarashok
Collaborator Author

Paired AgentBaker PR that writes the VHD marker: Azure/AgentBaker#8614
