[26.04_linux-nvidia-bos] DGX-16136: backport CXL Type-2 dependencies #426

Closed
kobak2026 wants to merge 25 commits into NVIDIA:26.04_linux-nvidia-bos from kobak2026:bug-DGX-16136/cxl-backport-26.04-bos

Conversation

@kobak2026
Collaborator

@kobak2026 kobak2026 commented May 18, 2026

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-7.0/+bug/2153819

Summary

Backport the CXL Type-2 dependency stack onto 26.04_linux-nvidia-bos.

This series brings in CXL Type-2 enablement for NVIDIA/SFC CXL accelerator plumbing, updates ATS always-on handling to Nicolin Chen's v4 series, includes required CXL region fixes, and adds the NVIDIA CXL config annotations needed by the stack.

Included changes

  • Add CXL Type-2 prerequisites:
    • export internal CXL structs for external Type-2 drivers
    • support Type-2 initialization through cxl_dev_state
  • Backport Alejandro Lucero's CXL Type-2 v26 series:
    • SFC CXL support
    • CXL register mapping for SFC
    • DPA initialization without mailbox
    • Type-2 memdev creation
    • accelerator region attachment
    • DAX avoidance for accelerators
    • SFC PIO mapping based on CXL
  • Replace stale ATS v1 import with Nicolin Chen's ATS v4 series:
    • CXL.cache ATS always-on
    • pre-CXL device ATS always-on quirks
    • arm-smmu-v3 ATS always-on support
  • Add CXL region fixes:
    • skip decoder reset on detach for autodiscovered regions
    • support multi-level interleaving with smaller granularities for lower levels
  • Add NVIDIA config updates:
    • Type-2 / RAS config annotations
    • CXL DAX/KMEM config
    • PCI_CXL annotation for CXL state save/restore

Not included

  • Vishal Aslot's zero-sized decoder patches are already present in 26.04-bos:
    • 20ff7877b5a5 cxl: Allow zero sized HDM decoders
    • 7e237452e5f7 cxl_test: enable zero sized decoders under hb0
  • Koba's DPA partition discovery infinite-loop fix is already present in the target branch.
    • Upstream equivalent: d4026a446264 (cxl/hdm: Fix potential infinite loop in __cxl_dpa_reserve())
    • No duplicate backport added.
  • [NACK] Koba Ko's patch that validates the CXL region partition index before array access is not included.

Verification

Build verification completed:

  • Full kernel build passed.
  • Full modules build passed.

Runtime verification completed:

  • Installed and booted the built kernel on the Type-2 CXL host.
  • Booted kernel:
    • 7.0.0-vfio-cxl-downstream-2026-05-14
  • Type-2 GPU CXL DVSEC verification:
    • visible GPUs: 2
    • CXL DVSEC blocks: 2
    • Range1 Valid+ Active+: 2
  • cxl list -BMRDu exits successfully on the Type-2 host.
  • No unknown-symbol / undefined-symbol errors observed.
  • No cxl_pci probe failure observed.

Additional Type-3 evidence:

  • veraos-43 exposes two CXL Type-3 devices under PCI domain 0003.
  • Current firmware exposes two committed 1-way RAM regions, not one 2-way interleaved region.
  • ACPI CEDT confirms two single-target CFMWS windows.
  • No known Robert interleave failure signatures were observed:
    • no invalid granularity calculation (16384 * 2)
    • no failed to attach decoder... -22/-6
    • no failed to find decoder mapping

@github-actions
Contributor

github-actions Bot commented May 18, 2026

PR Validation Report

Patchscan ✅ No Missing Fixes

All cherry-picked commits checked — no missing upstream fixes found.

PR Lint ✅ All checks passed

Details
Checking 25 commits...

Cherry-pick digest:
┌──────────────┬──────────────────────────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local        │ Referenced upstream / Patch subject                              │ Patch-ID   │ Subject │ SoB chain                 │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 5268ad904265 │ [SAUCE] [config] add pci_cxl annotation for cxl state save/resto │ N/A        │ N/A     │ jan, bfigg, kobak         │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ e3aadf637883 │ [SAUCE] [config] enable cxl dax and kmem built-in for cxl memory │ N/A        │ N/A     │ jan, bfigg, kobak         │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ a2eba9dce2b6 │ [SAUCE] [config] cxl config annotations for type-2 device and ra │ N/A        │ N/A     │ jan, bfigg, kobak         │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 7a6aebb18177 │ cxl/region: support multi-level interleaving with smaller granul │ noted      │ found   │ ok, backporter: kobak     │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 8cb4feab45af │ [SAUCE] dax/hmem: reintroduce soft reserved ranges back into the │ N/A        │ N/A     │ schofiel, lizhijia, Koral │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ c0db8c8c8fda │ [SAUCE] dax/hmem, cxl: defer and resolve ownership of soft reser │ N/A        │ N/A     │ williams, Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 2e433461671f │ [SAUCE] dax: add deferred-work helpers for dax_hmem and dax_cxl  │ N/A        │ N/A     │ Koralaha, kobak           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 3a6c6f56dc2c │ cxl/region: add helper to check soft reserved containment by cxl │ noted      │ found   │ ok, backporter: kobak     │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 3a52fc67323f │ [SAUCE] dax: track all dax_region allocations under a global res │ N/A        │ N/A     │ williams, Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 79356d9fd78d │ [SAUCE] dax/cxl, hmem: initialize hmem early and defer dax_cxl b │ N/A        │ N/A     │ williams, Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ d8ce89e1116e │ [SAUCE] cxl/region: skip decoder reset on detach for autodiscove │ N/A        │ N/A     │ Koralaha, kobak           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 77ac7f9272ba │ [SAUCE] dax/hmem: gate soft reserved deferral on dev_dax_cxl     │ N/A        │ N/A     │ williams, Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 1d651dbee5bc │ [SAUCE] dax/hmem: request cxl_acpi and cxl_pci before walking so │ N/A        │ N/A     │ williams, Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 0fdb2010fcfc │ sfc: support pio mapping based on cxl                            │ noted      │ found   │ ok, backporter: kobak     │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ d6acd1ddf9f6 │ [SAUCE] cxl: avoid dax creation for accelerators                 │ N/A        │ N/A     │ alucerop, kobak           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 388dc8c0c90a │ cxl: attach region to an accelerator/type2 memdev                │ noted      │ found   │ ok, backporter: kobak     │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 9c9123639c35 │ [SAUCE] sfc: create type2 cxl memdev                             │ N/A        │ N/A     │ alucerop, kobak           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 79654508fdaf │ [SAUCE] cxl: prepare memdev creation for type2                   │ N/A        │ N/A     │ alucerop, kobak           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ cb8c41246ed1 │ [SAUCE] cxl/sfc: initialize dpa without a mailbox                │ N/A        │ N/A     │ alucerop, kobak           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ ba9972ccf965 │ cxl/sfc: map cxl regs                                            │ noted      │ found   │ ok, backporter: kobak     │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 513a2a51e219 │ [SAUCE] sfc: add cxl support                                     │ N/A        │ N/A     │ alucerop, kobak           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ d3059e8eab00 │ d537d953c478 cxl/pci: Remove redundant cxl_pci_find_port() call  │ match      │ match   │ preserved + kobak added   │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 8b61bd238bbb │ 58f28930c7fb cxl: Move pci generic code from cxl_pci to core/cxl │ match      │ match   │ preserved + kobak added   │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 2bc7fd2fd558 │ 005869886d1d cxl: export internal structs for external Type2 dri │ match      │ match   │ preserved + kobak added   │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 645dd5c6be9f │ 9a775c07bb04 cxl: support Type2 when initializing cxl_dev_state  │ match      │ match   │ preserved + kobak added   │
└──────────────┴──────────────────────────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘

Lint: all checks passed.

@nirmoy
Collaborator

nirmoy commented May 18, 2026

Boro review

Latest watcher review: open review

Head: 5268ad904265

This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review.

@nvmochs
Collaborator

nvmochs commented May 19, 2026

@kobak2026 Several comments...

General comment, please use this format for the pick tag: (Note: no need to include “linux” when the SHA is the upstream SHA)

(cherry picked from commit c6890f36fc49848c61d2113a3442eb1b59e0bc4b)
Signed-off-by: XXXX
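
A minimal sketch of how the pick-tag format above could be checked mechanically. This is illustrative only: `is_upstream_pick_tag()` is a hypothetical helper, not part of the PR lint tooling, and the requirement of a full 40-character SHA is an assumption drawn from the example tag.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not from the PR tooling): returns true when the
 * line is exactly "(cherry picked from commit <40 hex digits>)".
 * Assumption: a full SHA-1 is expected, as in the example tag.
 */
bool is_upstream_pick_tag(const char *line)
{
    static const char prefix[] = "(cherry picked from commit ";
    const size_t plen = sizeof(prefix) - 1;
    size_t i;

    if (strncmp(line, prefix, plen) != 0)
        return false;

    /* A NUL byte is not a hex digit, so a short line fails here safely. */
    for (i = 0; i < 40; i++) {
        if (!isxdigit((unsigned char)line[plen + i]))
            return false;
    }

    /* The tag must close right after the SHA, with nothing trailing. */
    return line[plen + 40] == ')' && line[plen + 41] == '\0';
}
```

With this sketch, a tag that carries an extra "linux" token or an abbreviated SHA is rejected, while the example tag above is accepted.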

1ce2941 NVIDIA: SAUCE: iommu/arm-smmu-v3: Allow ATS to be always on
a575cdf NVIDIA: SAUCE: PCI: Allow ATS to be always on for pre-CXL devices
3765488 NVIDIA: SAUCE: PCI: Allow ATS to be always on for CXL.cache capable devices
afabb59 Revert "NVIDIA: VR: SAUCE: PCI: Allow ATS to be always on for CXL.cache capable devices"

Nirmoy already handled these via #401

Please rebase to most recent version of branch and you’ll get them.


6f5df25 cxl: export internal structs for external Type2 drivers

Why does this patch list you as the author? Where is the upstream provenance?

Also, this patch differs from the source. If that is intentional, can you add a backport note?


For both this patch

2cb8ec3 NVIDIA: SAUCE: cxl/region: Skip decoder reset on detach for autodiscovered regions

and these patches

54d3cbf cxl: support Type2 when initializing cxl_dev_state
6f5df25 cxl: export internal structs for external Type2 drivers

They were part of a larger series. How did you determine that only these patches were needed and that the other patches from the series can be ignored?


20eddbc NVIDIA: SAUCE: cxl: attach region to an accelerator/type2 memdev

Codex found these critical issues in this patch:

  1. drivers/net/ethernet/sfc/efx_cxl.c:116 dereferences probe_data->cxl before it is assigned at line 123. This can crash immediately during CXL-capable SFC probe.
  2. drivers/cxl/core/region.c:4270 assumes cxlmd->endpoint is valid and then locks &endpoint->dev. If the memdev registered but topology attach failed, endpoint can still be ERR_PTR(-ENXIO), causing a crash.
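
The two findings above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the driver code: every `toy_*` name is hypothetical, and `err_ptr()`/`is_err()` are stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's ERR_PTR() and IS_ERR(). */
static inline void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static inline bool is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, simplified stand-ins for the structures in the findings. */
struct toy_port { int id; };
struct toy_memdev { struct toy_port *endpoint; };
struct toy_cxl { int mapped; };
struct toy_probe_data { struct toy_cxl *cxl; };

/* Attach callback: it uses probe_data->cxl, so that must already be set. */
static int toy_attach_cb(void *data)
{
    struct toy_probe_data *pd = data;

    if (!pd->cxl)
        return -EINVAL; /* the reported bug would dereference NULL here */
    pd->cxl->mapped = 1;
    return 0;
}

/* Finding 2: refuse to touch the endpoint unless it is a valid pointer. */
int toy_attach_region(struct toy_memdev *md, int (*attach)(void *), void *data)
{
    if (!md->endpoint || is_err(md->endpoint))
        return -ENXIO;
    return attach(data);
}

/* Finding 1: publish probe_data->cxl before anything can dereference it. */
int toy_probe(struct toy_probe_data *pd, struct toy_cxl *cxl,
              struct toy_memdev *md)
{
    int rc;

    pd->cxl = cxl;
    rc = toy_attach_region(md, toy_attach_cb, pd);
    if (rc)
        pd->cxl = NULL; /* propagate the failure, leave nothing dangling */
    return rc;
}
```

In this model, a memdev whose endpoint is still `err_ptr(-ENXIO)` makes `toy_probe()` fail cleanly instead of locking through an invalid pointer.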

Comment thread drivers/net/ethernet/sfc/efx_cxl.c Outdated
@kobak2026 kobak2026 force-pushed the bug-DGX-16136/cxl-backport-26.04-bos branch from 02b7f23 to 5c1f6a4 Compare May 19, 2026 07:44
@kobak2026
Collaborator Author

kobak2026 commented May 19, 2026

@jamieNguyenNVIDIA @nvmochs thanks

Update pushed for DGX-16136 CXL backport:

  • Reworked the branch as whole source-series chunks instead of isolated picks.
  • Type2 prep is now the full 4-patch upstream series, followed by the Alejandro Type2/SFC series.
  • Smita CXL EINJ/Soft Reserved handling is now the full v6 9-patch series, contiguous and in source order.
  • Addressed Matt review items: pick-tag format, upstream author/provenance, whole-series concern, and the SFC probe / endpoint ERR_PTR crash findings.
  • Addressed Jamie review item: cxl_memdev_attach_region() now unwinds attach side effects if devres region cleanup registration fails.
  • Metadata cleanup: added missing Koba signoffs, required Source URLs, and explicit backported-from wording where needed.
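
A minimal sketch of the "unwind attach side effects if registration fails" pattern mentioned above, using goto-based cleanup. It is a userspace model only: the `unwind_*` names and the fault-injection flags are hypothetical and do not reflect the actual cxl_memdev_attach_region() implementation.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state that an attach sequence builds up. */
struct unwind_region {
    bool attached;              /* side effect of the attach callback */
    bool autoremove_registered; /* first cleanup action registered */
    bool detach_registered;     /* second cleanup action registered */
};

/* Registration step with fault injection, standing in for devm_add_action(). */
static int unwind_register(bool *slot, bool inject_failure)
{
    if (inject_failure)
        return -ENOMEM;
    *slot = true;
    return 0;
}

int unwind_attach(struct unwind_region *r, bool fail_first, bool fail_second)
{
    int rc;

    r->attached = true;

    rc = unwind_register(&r->autoremove_registered, fail_first);
    if (rc)
        goto err_detach;

    rc = unwind_register(&r->detach_registered, fail_second);
    if (rc)
        goto err_unregister;

    return 0;

err_unregister:
    /* Undo the first registration before undoing the attach itself. */
    r->autoremove_registered = false;
err_detach:
    r->attached = false;
    return rc;
}
```

Whichever registration fails, the caller sees an error and the region is left exactly as it was before the attach attempt.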

Validation:

  • Commit-message/provenance audit passed after the fixes.
  • git diff --check passed for the branch delta.
  • Whole-kernel arm64 build passed earlier on the reworked branch before the final metadata/unwind amend; no install target was run.

Latest pushed branch tip:
5c1f6a4

@clsotog
Collaborator

clsotog commented May 19, 2026

I got this feedback with codex: drivers/cxl/core/region.c:2645: DEVMEM regions no longer get the port->uport_dev unregister action, but existing cleanup paths still assume it exists. For example delete_region_store() at drivers/cxl/core/region.c:2778 calls devm_release_action(port->uport_dev, unregister_region, cxlr) and returns success; for DEVMEM this does not unregister anything. The construct failure path at drivers/cxl/core/region.c:3953 has the same problem and can leak the region.

@nvmochs
Collaborator

nvmochs commented May 19, 2026

@kobak2026 Thanks for addressing my last round of comments.

Some additional feedback...

General comment: Can you add VR to the commit title tags? e.g.: NVIDIA: VR: SAUCE:


293d110fe04f NVIDIA: SAUCE: dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree
7511ba2ae268 NVIDIA: SAUCE: dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
5e706ddfead3 NVIDIA: SAUCE: dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination
518319bcfdee NVIDIA: SAUCE: cxl/region: Add helper to check Soft Reserved containment by CXL regions
cbcf25ace9a2 NVIDIA: SAUCE: dax: Track all dax_region allocations under a global resource tree
9ee45264978f NVIDIA: SAUCE: dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
b98635c71697 NVIDIA: SAUCE: cxl/region: Skip decoder reset on detach for autodiscovered regions
26fa27c832b0 NVIDIA: SAUCE: dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
6ef116cb2ed0 NVIDIA: SAUCE: dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges

Nit: Several of these patches are missing “backported from” tags.


8c1b27b NVIDIA: SAUCE: cxl: attach region to an accelerator/type2 memdev

Can the note be updated to also include the changes to the devres registration flow?

e.g.

[kobak: Check cxl_memdev_attach_region() errors and propagate failure so SFC probe does not continue after CXL core tears down the attached region. Set probe_data->cxl before attaching so the attach callback can use it, guard attach attempts before a valid endpoint exists, and explicitly unwind attach/autoremove side effects if devres action registration fails.]

314511028df9 NVIDIA: SAUCE: sfc: create type2 cxl memdev
d525164daa4e NVIDIA: SAUCE: cxl: Prepare memdev creation for type2
32e1b559c6b5 NVIDIA: SAUCE: cxl/sfc: Initialize dpa without a mailbox
b348bc471b1e NVIDIA: SAUCE: cxl/sfc: Map cxl regs
13aa2f47dd89 NVIDIA: SAUCE: sfc: add cxl support

Nit: A few of these commits have extra blank lines between the pick tags and sign-off.

e.g.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
    
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-6-alejandro.lucero-palau@amd.com)
    
Signed-off-by: Koba Ko <kobak@nvidia.com>

vs.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-6-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>

alucerop and others added 9 commits May 20, 2026 14:21

In preparation for type2 drivers add function and macro for
differentiating CXL memory expanders (type 3) from CXL device
accelerators (type 2) helping drivers built from public headers
to embed struct cxl_dev_state inside a private struct.

Update type3 driver for using this same initialization.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260306164741.3796372-2-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit 9a775c0)
Signed-off-by: Koba Ko <kobak@nvidia.com>

In preparation for type2 support, move structs and functions a type2
driver will need to access to into a new shared header file.

Differentiate between public and private data to be preserved by type2
drivers.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260306164741.3796372-3-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit 0058698)
Signed-off-by: Koba Ko <kobak@nvidia.com>

Inside cxl/core/pci.c there are helpers for CXL PCIe initialization
meanwhile cxl/pci_drv.c implements the functionality for a Type3 device
initialization.

In preparation for type2 support, move helper functions from cxl/pci.c to
cxl/core/pci.c in order to be exported and used by type2 drivers.

[ dj: Clarified subject. ]

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Gregory Price <gourry@gourry.net>
Link: https://patch.msgid.link/20260306164741.3796372-4-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit 58f2893)
Signed-off-by: Koba Ko <kobak@nvidia.com>

Remove the redundant port lookup from cxl_rcrb_get_comp_regs() and use the
dport parameter directly. The caller has already validated the port is
non-NULL before invoking this function, and dport is given as a param.
This is simpler than getting dport in the callee and return the pointer
to the caller what would require more changes.

Signed-off-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://patch.msgid.link/20260306164741.3796372-5-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit d537d95)
Signed-off-by: Koba Ko <kobak@nvidia.com>

Add CXL initialization based on new CXL API for accel drivers and make
it dependent on kernel CXL configuration.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-2-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>

Export cxl core functions for a Type2 driver being able to discover and
map the device registers.

Use it in sfc driver cxl initialization.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
(backported from https://lore.kernel.org/r/20260423180528.17166-3-alejandro.lucero-palau@amd.com)
[kobak: Kept cxl_pci_setup_regs() in the core/pci provider added by the full Type2 prerequisite series and dropped the duplicate provider hunk from drivers/cxl/pci.c.]
Signed-off-by: Koba Ko <kobak@nvidia.com>

Type3 relies on mailbox CXL_MBOX_OP_IDENTIFY command for initializing
memdev state params which end up being used for DPA initialization.

Allow a Type2 driver to initialize DPA simply by giving the size of its
volatile hardware partition.

Move related functions to memdev.

Add sfc driver as the client.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-4-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>

Current cxl core is relying on a CXL_DEVTYPE_CLASSMEM type device when
creating a memdev leading to problems when obtaining cxl_memdev_state
references from a CXL_DEVTYPE_DEVMEM type.

Modify check for obtaining cxl_memdev_state adding CXL_DEVTYPE_DEVMEM
support.

Make devm_cxl_add_memdev accessible from an accel driver.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-5-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>

Use cxl API for creating a cxl memory device using the type2
cxl_dev_state struct.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-6-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
@kobak2026 kobak2026 force-pushed the bug-DGX-16136/cxl-backport-26.04-bos branch from 5c1f6a4 to 918f703 Compare May 20, 2026 06:51
@kobak2026
Collaborator Author

@nvmochs @clsotog thanks for the review feedback.

Updated the DGX-16136 CXL backport branch.

Addressed Matt's latest comments:

  • Added VR to NVIDIA SAUCE commit title tags.
  • Fixed provenance trailers: exact matches use cherry picked from, local
    adaptations use backported from.
  • Expanded the Type2 attach [kobak: ...] note to cover the devres
    registration-flow changes.
  • Removed extra blank lines between review/pick/sign-off trailers.

Addressed Carol's DEVMEM cleanup feedback:

  • DEVMEM autodiscovery now preserves the endpoint decoder target type.
  • delete_region_store() and construct-failure cleanup now use an owner-aware
    cleanup helper.
  • HOSTONLY regions still release the root uport_dev action.
  • DEVMEM regions release endpoint-owned detach/autoremove actions under the
    endpoint device lock.

Validation:

  • Commit-message/provenance audit passed.
  • git diff --check passed.
  • scripts/checkpatch.pl --strict passed.
  • drivers/cxl/core/region.o compile passed with CONFIG_CXL_REGION=y.
  • Remote whole-kernel arm64 build passed.
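
The owner-aware cleanup idea can be sketched in userspace as follows. This is a simplified model, assuming each device holds a single cleanup slot; the `own_*` names are hypothetical and the real helper in drivers/cxl/core/region.c is not shown in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

enum own_region_type { OWN_HOSTONLY, OWN_DEVMEM };

typedef void (*own_action_fn)(void *data);

/* Hypothetical device with a single cleanup slot, standing in for devres. */
struct own_dev {
    own_action_fn action;
    void *data;
};

struct own_region {
    enum own_region_type type;
    struct own_dev *root;     /* models port->uport_dev */
    struct own_dev *endpoint; /* models the endpoint device */
    bool registered;
};

static void own_unregister_region(void *data)
{
    ((struct own_region *)data)->registered = false;
}

void own_add_action(struct own_dev *dev, own_action_fn fn, void *data)
{
    dev->action = fn;
    dev->data = data;
}

/* Runs the action only if this device actually owns it. */
bool own_release_action(struct own_dev *dev, own_action_fn fn, void *data)
{
    if (dev->action != fn || dev->data != data)
        return false; /* wrong owner: nothing is unregistered */
    dev->action = NULL;
    dev->data = NULL;
    fn(data);
    return true;
}

/* Owner-aware cleanup: choose the device that holds the action. */
bool own_region_release(struct own_region *r)
{
    struct own_dev *owner = r->type == OWN_DEVMEM ? r->endpoint : r->root;

    return own_release_action(owner, own_unregister_region, r);
}
```

Releasing a DEVMEM region against the root device is a no-op in this model, which mirrors the leak described in the feedback; routing through the owner lookup unregisters it.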

@nvmochs
Collaborator

nvmochs commented May 20, 2026

Thanks Koba!

I confirmed that you addressed my findings and reviewed the DEVMEM cleanup with codex and did not spot any issues.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

Comment thread drivers/cxl/core/region.c
struct cxl_endpoint_decoder *cxled = cxlr->params.targets[0];
struct cxl_port *endpoint = cxled_to_port(cxled);

guard(device)(&endpoint->dev);
Collaborator

Claude suggests that there still might be a race here:

  T1: cxl_region_release_action(cxlr)        T2: cxl_memdev_attach_region(cxlmd, &attach)                               
  ─────────────────────────────────────       ───────────────────────────────────────────                               
  cxlr->type == DEVMEM ✓                                                                                                
  nr_targets > 0, cxlr->detach == NULL                                                                                  
  acquire endpoint->dev lock                                                                                            
  release endpoint->dev lock (end of block)   acquire endpoint->dev lock                                                
                                              device_find_child → cxled, cxlr                                           
                                              attach->attach() → ioremap succeeds                                       
                                              devm_add_action(&endpoint->dev,                                           
                                                cxl_endpoint_region_autoremove, cxlr)                                   
                                              devm_add_action_or_reset(detach)                                          
                                              cxlr->detach = attach->detach                                             
                                              return 0; release endpoint->dev lock                                      
  unregister_region(cxlr)                                                                                               
    device_del(&cxlr->dev)                                                                                              
    detach_target loop drops cxld->region ref                                                                           
    put_device → cxlr FREED                                                                                             
                                                                                                                        
  [later] &endpoint->dev teardown →                                                                                     
    cxl_endpoint_region_autoremove(cxlr)  ← UAF on freed cxlr  

It further suggests keeping the teardown inside the guarded scope:

  --- a/drivers/cxl/core/region.c                                                                                       
  +++ b/drivers/cxl/core/region.c                                                                                       
  @@ -2481,8 +2481,8 @@ static void cxl_region_release_action(struct cxl_region *cxlr)                                  
        if (cxlr->params.nr_targets) {                                                                                  
                struct cxl_endpoint_decoder *cxled = cxlr->params.targets[0];                                          
                struct cxl_port *endpoint = cxled_to_port(cxled);                                             
  +             guard(device)(&endpoint->dev);                                                                          
                                                                                                                        
  -             guard(device)(&endpoint->dev);                                                                          
                if (cxlr->detach) {                                                                                     
                        void (*detach)(void *data) = cxlr->detach;                                                      
                        void *detach_data = cxlr->detach_data;                                                
  @@ -2493,11 +2493,11 @@ static void cxl_region_release_action(struct cxl_region *cxlr)                      
                        devm_release_action(&endpoint->dev,                                                             
                                            cxl_endpoint_region_autoremove,                                             
                                            cxlr);                                                                      
  -                     return;                                                                                         
  +             } else {                                                                                       
  +                     unregister_region(cxlr);                                                                        
                }                                                                                                       
  +             return;                                                                                                 
        }                                                    
                                                             
        unregister_region(cxlr);
   }
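
The interleaving and the suggested fix can be replayed in a single-threaded model where each function stands for one critical section under the endpoint device lock. This is a sketch under stated assumptions: the `race_*` names are hypothetical, and a freed region is modeled with a flag rather than real memory reuse.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct race_region {
    bool alive;      /* false once unregister_region() has freed it */
    bool has_detach; /* models cxlr->detach != NULL */
};

struct race_endpoint {
    /* Models the endpoint-owned autoremove devres action. */
    struct race_region *autoremove_target;
};

/* T2, one critical section: attach only finds regions that still exist. */
int race_attach(struct race_endpoint *ep, struct race_region *r)
{
    if (!r->alive)
        return -ENXIO;
    ep->autoremove_target = r;
    r->has_detach = true;
    return 0;
}

/* T1, racy variant, step 1: observe detach under the lock, then drop it. */
bool race_release_check(const struct race_region *r)
{
    return r->has_detach;
}

/* T1, racy variant, step 2: unregister after the lock was dropped. */
void race_release_unregister(struct race_region *r)
{
    r->alive = false;
}

/* T1, guarded variant: check and unregister in one critical section. */
void race_release_guarded(struct race_endpoint *ep, struct race_region *r)
{
    if (r->has_detach && ep->autoremove_target == r)
        ep->autoremove_target = NULL; /* models releasing the action */
    r->alive = false;
}

/* True when the endpoint still holds an action for a freed region. */
bool race_has_dangling_action(const struct race_endpoint *ep)
{
    return ep->autoremove_target && !ep->autoremove_target->alive;
}
```

Calling `race_release_check()`, then `race_attach()`, then `race_release_unregister()` reproduces the dangling action; with `race_release_guarded()` the attach can only run before or after the whole teardown.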

Collaborator Author

@jamieNguyenNVIDIA thanks.
This race is fixed in the currently pushed DGX-16136 branch.
The DEVMEM no-detach path now keeps unregister_region(cxlr) inside the
endpoint->dev guard in cxl_region_release_action(). That means
cxl_memdev_attach_region() cannot acquire the same endpoint lock and install
endpoint devres actions between observing cxlr->detach == NULL and freeing the
region.
Verified pushed tip:
d24cf0e

@clsotog clsotog self-requested a review May 20, 2026 17:36
Collaborator

@clsotog clsotog left a comment

Acked-by: Carol L Soto <csoto@nvidia.com>

@kobak2026 kobak2026 force-pushed the bug-DGX-16136/cxl-backport-26.04-bos branch from 918f703 to d24cf0e Compare May 21, 2026 03:34
@jamieNguyenNVIDIA
Collaborator

Acked-by: Jamie Nguyen <jamien@nvidia.com>

Thanks @kobak2026!

@clsotog
Collaborator

clsotog commented May 21, 2026

Acked-by: Carol L Soto <csoto@nvidia.com>

@nvmochs
Collaborator

nvmochs commented May 21, 2026

@kobak2026

Here is an updated version of what Codex found...

The finding is in 1a24ae4, in drivers/cxl/core/region.c:cxl_memdev_attach_region().

Current flow:

  guard(device)(&endpoint->dev);
  guard(rwsem_read)(&cxl_rwsem.region);
  guard(rwsem_read)(&cxl_rwsem.dpa);

  rc = attach->attach(attach->data);
  if (rc)
          return rc;

  rc = devm_add_action(&endpoint->dev, cxl_endpoint_region_autoremove, cxlr);
  if (rc) {
          attach->detach(attach->data);
          cxl_endpoint_region_autoremove(cxlr);
          return rc;
  }

  rc = devm_add_action_or_reset(&endpoint->dev, attach->detach, attach->data);
  if (rc) {
          devm_release_action(&endpoint->dev,
                              cxl_endpoint_region_autoremove, cxlr);
          return rc;
  }

The problem is that both failure paths can run cxl_endpoint_region_autoremove() while cxl_rwsem.region is still held for read.

Call path:

  cxl_memdev_attach_region()
    holds cxl_rwsem.region read lock
    cxl_endpoint_region_autoremove(cxlr)
      unregister_region(cxlr)
        detach_target(cxlr, i)
          cxl_decoder_detach(... DETACH_ONLY)
            down_write_killable(&cxl_rwsem.region)

That is a same-thread lock upgrade from read to write. The read lock cannot be dropped because the thread is blocked trying to acquire the write lock, so this can deadlock.
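To make the upgrade problem concrete, here is a minimal userspace sketch. It is a single-threaded toy model, not the kernel rwsem, and every `toy_*` name is hypothetical. It only shows that a write acquisition cannot succeed while the same caller still counts as a reader.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a reader/writer semaphore (NOT the kernel rwsem). */
struct toy_rwsem {
	int readers;	/* number of current read holders */
	bool writer;	/* true while a writer holds the lock */
};

bool toy_down_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer)
		return false;
	sem->readers++;
	return true;
}

void toy_up_read(struct toy_rwsem *sem)
{
	sem->readers--;
}

/* A writer needs exclusive ownership: no readers and no other writer. */
bool toy_down_write_trylock(struct toy_rwsem *sem)
{
	if (sem->writer || sem->readers > 0)
		return false;
	sem->writer = true;
	return true;
}

void toy_up_write(struct toy_rwsem *sem)
{
	sem->writer = false;
}

/*
 * Stand-in for a cleanup path that needs the write lock. Returns true if
 * it could take the write lock, given whether the caller itself still
 * holds the read lock.
 */
bool toy_cleanup_can_progress(bool caller_holds_read)
{
	struct toy_rwsem sem = { 0, false };
	bool ok;

	if (caller_holds_read)
		toy_down_read_trylock(&sem);

	ok = toy_down_write_trylock(&sem);
	if (ok)
		toy_up_write(&sem);

	if (caller_holds_read)
		toy_up_read(&sem);
	return ok;
}
```

In the toy model the write attempt simply fails. With a blocking down_write_killable() the same situation would wait on a reader that can never release.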

The second failure path has the same issue because devm_release_action() runs the action. It does not merely unregister the action. So this line:

devm_release_action(&endpoint->dev, cxl_endpoint_region_autoremove, cxlr);

also calls cxl_endpoint_region_autoremove(cxlr) while the region read lock is still held.
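The difference between the two devres calls can be sketched with a toy one-slot action list. This is a userspace model under simplifying assumptions, not the kernel devres implementation, and all `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*toy_action_fn)(void *data);

/* Toy model: a device with room for exactly one registered action. */
struct toy_devres {
	toy_action_fn action;
	void *data;
	int registered;
};

void toy_add_action(struct toy_devres *dr, toy_action_fn action, void *data)
{
	dr->action = action;
	dr->data = data;
	dr->registered = 1;
}

/* Models devm_remove_action(): forget the action, do not call it. */
void toy_remove_action(struct toy_devres *dr)
{
	dr->registered = 0;
}

/* Models devm_release_action(): forget the action AND call it now. */
void toy_release_action(struct toy_devres *dr)
{
	if (!dr->registered)
		return;
	dr->registered = 0;
	dr->action(dr->data);
}

/* Example action: counts how often it was invoked. */
void toy_count_call(void *data)
{
	(*(int *)data)++;
}

/* Returns how many times the action ran after release (1) or remove (0). */
int toy_calls_after(int use_release)
{
	struct toy_devres dr = { NULL, NULL, 0 };
	int calls = 0;

	toy_add_action(&dr, toy_count_call, &calls);
	if (use_release)
		toy_release_action(&dr);
	else
		toy_remove_action(&dr);
	return calls;
}
```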

The fix should preserve two things:

  1. Do not run unregister_region() while holding cxl_rwsem.region for read.
  2. Keep the endpoint device lock held while unregistering, so the no-detach race Koba just fixed does not come back.

The shape I would suggest is:

  guard(device)(&endpoint->dev);

  {
          guard(rwsem_read)(&cxl_rwsem.region);
          guard(rwsem_read)(&cxl_rwsem.dpa);

          ...
          rc = attach->attach(attach->data);
          if (rc)
                  return rc;

          rc = devm_add_action(&endpoint->dev,
                               cxl_endpoint_region_autoremove, cxlr);
          if (rc) {
                  attach->detach(attach->data);
                  goto err_unregister;
          }

          rc = devm_add_action_or_reset(&endpoint->dev,
                                        attach->detach, attach->data);
          if (rc) {
                  devm_remove_action(&endpoint->dev,
                                     cxl_endpoint_region_autoremove, cxlr);
                  goto err_unregister;
          }

          cxlr->detach = attach->detach;
          cxlr->detach_data = attach->data;
          return 0;
  }

  err_unregister:
          unregister_region(cxlr);
          return rc;

The important parts are the inner scope and devm_remove_action().

The inner scope releases cxl_rwsem.region and cxl_rwsem.dpa before err_unregister runs. The outer endpoint lock remains held across unregister_region(cxlr), which keeps the recent no-detach race fix intact. devm_remove_action() removes the endpoint autoremove action without invoking it; then the explicit unregister_region(cxlr) runs once, after the read locks are gone.
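The release ordering of the nested scopes can be checked with a small userspace sketch. It assumes a GCC or Clang compiler, since it uses the GNU `cleanup` variable attribute. The `TOY_GUARD` macro and all `toy_*` names are hypothetical stand-ins, not the kernel guard() API. Each release appends one letter to a log: `P` for the DPA lock, `R` for the region lock, `D` for the device lock, and `U` marks the unregister step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Event log: one character per event, in order. */
char toy_log[32];
size_t toy_log_len;

void toy_log_event(char c)
{
	if (toy_log_len < sizeof(toy_log) - 1) {
		toy_log[toy_log_len++] = c;
		toy_log[toy_log_len] = '\0';
	}
}

/* Cleanup handlers, run automatically when the guard variable leaves scope. */
void toy_unlock_device(int *unused) { (void)unused; toy_log_event('D'); }
void toy_unlock_region(int *unused) { (void)unused; toy_log_event('R'); }
void toy_unlock_dpa(int *unused)    { (void)unused; toy_log_event('P'); }

/* Hypothetical guard macro built on the GNU cleanup attribute. */
#define TOY_GUARD(name, fn) \
	int name __attribute__((cleanup(fn), unused)) = 0

/* Returns the event log for a successful (fail == 0) or failed attach. */
const char *toy_attach(int fail)
{
	int rc = 0;

	toy_log_len = 0;
	toy_log[0] = '\0';

	TOY_GUARD(dev_guard, toy_unlock_device);	/* outer: device lock */
	{
		TOY_GUARD(region_guard, toy_unlock_region);
		TOY_GUARD(dpa_guard, toy_unlock_dpa);

		if (fail)
			rc = -1;
	}	/* dpa, then region, are released here (reverse order) */

	if (rc)
		toy_log_event('U');	/* stand-in for unregister_region() */

	return toy_log;	/* device guard is released after this is evaluated */
}
```

On the failure path the log reads `PRUD`: the unregister step runs after both read locks are gone and before the device lock is dropped.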

alucerop added 3 commits May 21, 2026 23:30
Allow an accelerator driver to safely work with an autodiscovered
region from a committed HDM decoder through:

        1) an accelerator driver cxl_attach_region struct with attach
           and detach callbacks,

        2) a specific function, cxl_memdev_attach_region(), which holds
           the required locks while finding a region linked to the memdev
           endpoint,

        3) invoking the attach callback with those locks held, so the
           accelerator driver can work with the related physical range
           (ioremap and other internal setup), and

        4) linking a detach callback to the endpoint device removal, where
           the accelerator driver can stop using the region range.

This covers the cases of a potential removal of the cxl_acpi module or an
accelerator memdev unbinding from the cxl_mem driver through sysfs.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
(backported from https://lore.kernel.org/r/20260423180528.17166-7-alejandro.lucero-palau@amd.com)
[kobak: Check cxl_memdev_attach_region() errors and propagate failure so SFC probe does not continue after CXL core tears down the attached region. Set probe_data->cxl before attaching so the attach callback can use it, guard attach attempts before a valid endpoint exists, explicitly unwind attach/autoremove side effects if devres action registration fails, preserve DEVMEM target type for autodiscovered regions, and route delete / construct-failure cleanup through endpoint-owned devres actions.]
[kobak: Keep no-detach DEVMEM unregister under the endpoint-device guard so attach cannot install endpoint devres actions for a region being freed.]
[kobak: Avoid devres-registration failure cleanup under cxl_rwsem.region read lock: keep endpoint->dev locked, drop the region/DPA read guards before unregister_region(), and use devm_remove_action() so failed detach-action registration does not run cxl_endpoint_region_autoremove() under the read lock.]
Signed-off-by: Koba Ko <kobak@nvidia.com>

By definition a type2 cxl device will use the host managed memory for
specific functionality, therefore it should not be available to other
uses like DAX.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <daves@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-8-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>

A PIO buffer is a region of device memory to which the driver can write a
packet for TX, with the device handling the transmit doorbell without
requiring a DMA to fetch the packet data, which helps reduce latency
in certain exchanges. With the CXL.mem protocol this latency can be lowered
further.

When a device supports CXL and has been successfully initialised, use the
cxl region to map the memory range and use this mapping for PIO buffers.

Disable those CXL-based PIO buffers when the CXL core invokes the callback
for a potential cxl endpoint removal.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
(backported from https://lore.kernel.org/r/20260423180528.17166-9-alejandro.lucero-palau@amd.com)
[kobak: Added a !EFX_USE_PIO same-module stub for efx_ef10_disable_piobufs() so non-x86 builds that still enable CONFIG_SFC_CXL do not leave efx_cxl.o unresolved.]
Signed-off-by: Koba Ko <kobak@nvidia.com>

djbw and others added 13 commits May 21, 2026 23:30
…ing Soft Reserved ranges

Ensure cxl_acpi has published CXL Window resources before HMEM walks Soft
Reserved ranges.

Replace MODULE_SOFTDEP("pre: cxl_acpi") with an explicit, synchronous
request_module("cxl_acpi"). MODULE_SOFTDEP() only guarantees eventual
loading; it does not enforce that the dependency has finished init
before the current module runs. This can cause HMEM to start before
cxl_acpi has populated the resource tree, breaking detection of overlaps
between Soft Reserved and CXL Windows.

Also, request cxl_pci before HMEM walks Soft Reserved ranges. Unlike
cxl_acpi, cxl_pci attach is asynchronous and creates dependent devices
that trigger further module loads. Asynchronous probe flushing
(wait_for_device_probe()) is added later in the series in a deferred
context before HMEM makes ownership decisions for Soft Reserved ranges.

Add an additional explicit Kconfig ordering so that CXL_ACPI and CXL_PCI
must be initialized before DEV_DAX_HMEM. This prevents HMEM from consuming
Soft Reserved ranges before CXL drivers have had a chance to claim them.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-2-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-2-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>

Replace IS_ENABLED(CONFIG_CXL_REGION) with IS_ENABLED(CONFIG_DEV_DAX_CXL)
so that HMEM only defers Soft Reserved ranges when CXL DAX support is
enabled. This makes the coordination between HMEM and the CXL stack more
precise and prevents deferral in unrelated CXL configurations.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-3-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-3-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
…iscovered regions

__cxl_decoder_detach() currently resets decoder programming whenever a
region is detached if cxl_config_state is beyond CXL_CONFIG_ACTIVE. For
autodiscovered regions, this can incorrectly tear down decoder state
that may be relied upon by other consumers or by subsequent ownership
decisions.

Skip cxl_region_decode_reset() during detach when CXL_REGION_F_AUTO is
set.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-4-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-4-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
…_cxl binding

Move hmem/ earlier in the dax Makefile so that hmem_init() runs before
dax_cxl.

In addition, defer registration of the dax_cxl driver to a workqueue
instead of using module_cxl_driver(). This ensures that dax_hmem has
an opportunity to initialize and register its deferred callback and make
ownership decisions before dax_cxl begins probing and claiming Soft
Reserved ranges.

Mark the dax_cxl driver as PROBE_PREFER_ASYNCHRONOUS so its probe runs
out of line from other synchronous probing, avoiding ordering
dependencies while coordinating ownership decisions with dax_hmem.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-5-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-5-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
…al resource tree

Introduce a global "DAX Regions" resource root and register each
dax_region->res under it via request_resource(). Release the resource on
dax_region teardown.

By enforcing a single global namespace for dax_region allocations, this
ensures only one of dax_hmem or dax_cxl can successfully register a
dax_region for a given range.

Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-6-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-6-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
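The exclusivity argument in the commit above can be illustrated with a small userspace sketch. It is a toy model under stated assumptions, not the kernel resource tree: a flat array stands in for the "DAX Regions" root, ranges are inclusive, and all `toy_*` names are hypothetical. A second claim that overlaps an existing one is rejected with -EBUSY, which is the error request_resource() reports on conflict.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct toy_res {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

#define TOY_MAX_RES 8

/* Toy stand-in for a global resource root. */
struct toy_root {
	struct toy_res res[TOY_MAX_RES];
	size_t count;
};

/* Claim [start, end]; fail with -EBUSY if it overlaps an existing claim. */
int toy_request_resource(struct toy_root *root, uint64_t start, uint64_t end)
{
	size_t i;

	if (start > end)
		return -EINVAL;

	for (i = 0; i < root->count; i++) {
		if (start <= root->res[i].end && root->res[i].start <= end)
			return -EBUSY;
	}

	if (root->count == TOY_MAX_RES)
		return -ENOMEM;

	root->res[root->count].start = start;
	root->res[root->count].end = end;
	root->count++;
	return 0;
}

/* Demo helper: claim [0x1000, 0x1fff] first, then try a second range. */
int toy_second_claim(uint64_t start, uint64_t end)
{
	struct toy_root root = { .count = 0 };

	if (toy_request_resource(&root, 0x1000, 0x1fff))
		return -EINVAL;
	return toy_request_resource(&root, start, end);
}
```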
…ainment by CXL regions

Add a helper to determine whether a given Soft Reserved memory range is
fully contained within the committed CXL region.

This helper provides a primitive for policy decisions in subsequent
patches such as co-ordination with dax_hmem to determine whether CXL has
fully claimed ownership of Soft Reserved memory ranges.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-7-Smita.KoralahalliChannabasappa@amd.com
(backported from https://lore.kernel.org/r/20260210064501.157591-7-Smita.KoralahalliChannabasappa@amd.com)
[kobak: Added the Soft Reserved declaration to the existing Type2 include/cxl/cxl.h header instead of recreating that header.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
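A containment check of this kind can be sketched in userspace as follows. This is not the helper from the patch: the `toy_*` names are hypothetical, ranges are inclusive, and it assumes that "contained" means contained within at least one single committed region rather than within a union of regions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* True if inner lies entirely within outer. */
bool toy_range_contains(const struct toy_range *outer,
			const struct toy_range *inner)
{
	return outer->start <= inner->start && inner->end <= outer->end;
}

/* True if the Soft Reserved range sr is fully inside one committed region. */
bool toy_contained_by_any(const struct toy_range *regions, size_t nregions,
			  const struct toy_range *sr)
{
	size_t i;

	for (i = 0; i < nregions; i++) {
		if (toy_range_contains(&regions[i], sr))
			return true;
	}
	return false;
}
```

A range that only partially overlaps a region is reported as not contained, which is the case the later ownership policy has to treat differently.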
…x_cxl coordination

Add helpers to register, queue and flush the deferred work.

These helpers allow dax_hmem to execute ownership resolution outside the
probe context before dax_cxl binds.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-8-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-8-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
… Reserved memory ranges

The current probe time ownership check for Soft Reserved memory based
solely on CXL window intersection is insufficient. dax_hmem probing is not
always guaranteed to run after CXL enumeration and region assembly, which
can lead to incorrect ownership decisions before the CXL stack has
finished publishing windows and assembling committed regions.

Introduce deferred ownership handling for Soft Reserved ranges that
intersect CXL windows. When such a range is encountered during dax_hmem
probe, schedule deferred work and wait for the CXL stack to complete
enumeration and region assembly before deciding ownership.

Evaluate ownership of Soft Reserved ranges based on CXL region
containment.

   - If all Soft Reserved ranges are fully contained within committed CXL
     regions, DROP handling Soft Reserved ranges from dax_hmem and allow
     dax_cxl to bind.

   - If any Soft Reserved range is not fully claimed by committed CXL
     region, REGISTER the Soft Reserved ranges with dax_hmem.

Use dax_cxl_mode to coordinate ownership decisions for Soft Reserved
ranges. Once ownership resolution is complete, flush the deferred work
from dax_cxl before allowing dax_cxl to bind.

This enforces strict ownership: either CXL fully claims the Soft
Reserved ranges or it relinquishes them entirely.

Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-9-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-9-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
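The all-or-nothing policy described in the commit above reduces to a simple rule, sketched here in userspace C. The enum and function names are hypothetical, and the input is assumed to be a precomputed per-range containment result. Treating an empty list as DROP is a choice of this sketch, not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_owner {
	TOY_DROP,	/* dax_hmem drops the ranges; dax_cxl may bind */
	TOY_REGISTER,	/* dax_hmem registers the ranges itself */
};

/*
 * contained[i] says whether Soft Reserved range i is fully contained in a
 * committed CXL region. One uncontained range is enough to register all
 * ranges with dax_hmem.
 */
enum toy_owner toy_resolve_ownership(const bool *contained, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!contained[i])
			return TOY_REGISTER;
	}
	return TOY_DROP;
}
```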
…to the iomem tree

Reworked from a patch by Alison Schofield <alison.schofield@intel.com>

Reintroduce Soft Reserved range into the iomem_resource tree for HMEM
to consume.

This restores visibility in /proc/iomem for ranges actively in use, while
avoiding the early-boot conflicts that occurred when Soft Reserved was
published into iomem before CXL window and region discovery.

Link: https://lore.kernel.org/linux-cxl/29312c0765224ae76862d59a17748c8188fb95f1.1692638817.git.alison.schofield@intel.com/
Co-developed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Co-developed-by: Zhijian Li <lizhijian@fujitsu.com>
Signed-off-by: Zhijian Li <lizhijian@fujitsu.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-10-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-10-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
…smaller granularities for lower levels

The CXL specification supports multi-level interleaving "as long as
all the levels use different, but consecutive, HPA bits to select the
target and no Interleave Set has more than 8 devices" (from 3.2).

Currently the kernel expects that a decoder's "interleave granularity
is a multiple of @parent_port granularity". That is, the granularity
of a lower level is bigger than those of the parent and uses the outer
HPA bits as selector. It works e.g. for the following 8-way config:

 * cross-link (cross-hostbridge config in CFMWS):
   * 4-way
   * 256 granularity
   * Selector: HPA[8:9]
 * sub-link (CXL Host bridge config of the HDM):
   * 2-way
   * 1024 granularity
   * Selector: HPA[10]

Now, if the outer HPA bits are used for the cross-hostbridge, an 8-way
config could look like this:

 * cross-link (cross-hostbridge config in CFMWS):
   * 4-way
   * 512 granularity
   * Selector: HPA[9:10]
 * sub-link (CXL Host bridge config of the HDM):
   * 2-way
   * 256 granularity
   * Selector: HPA[8]

The enumeration of decoders for this configuration then fails with the
following error:

 cxl region0: pci0000:00:port1 cxl_port_setup_targets expected iw: 2 ig: 1024 [mem 0x10000000000-0x1ffffffffff flags 0x200]
 cxl region0: pci0000:00:port1 cxl_port_setup_targets got iw: 2 ig: 256 state: enabled 0x10000000000:0x1ffffffffff
 cxl_port endpoint12: failed to attach decoder12.0 to region0: -6

Note that this happens only if firmware is setting up the decoders
(CXL_REGION_F_AUTO). For userspace region assembly the granularities
are chosen to increase from root down to the lower levels. That is,
outer HPA bits are always used for lower interleaving levels.

Rework the implementation to also support multi-level interleaving
with smaller granularities for lower levels. Determine the interleave
set of autodetected decoders. Check that it is a subset of the root
interleave.

The HPA selector bits are extracted for all decoders of the set and
checked to ensure that they do not overlap and that they are consecutive.
All decoders can now be programmed to use any bit range within the
region's target selector.

Signed-off-by: Robert Richter <rrichter@amd.com>
(backported from https://lore.kernel.org/all/20251028094754.72816-1-rrichter@amd.com/)
[kobak: resolved conflicts with cxlr->cxlrd and spa_maps_hpa()]
Signed-off-by: Koba Ko <kobak@nvidia.com>
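The selector arithmetic in the two 8-way examples above can be reproduced with a short userspace sketch. It is not the code from the patch: the `toy_*` names are hypothetical, and it assumes power-of-two ways and granularities only, so it does not cover 3-, 6- or 12-way interleaves. A level with `ways` targets at a given granularity selects log2(ways) HPA bits starting at bit log2(granularity).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* log2 of a power of two; returns -1 for any other value. */
int toy_ilog2_pow2(unsigned int v)
{
	int n = 0;

	if (v == 0 || (v & (v - 1)) != 0)
		return -1;
	while (v > 1) {
		v >>= 1;
		n++;
	}
	return n;
}

/* Mask of the HPA bits that select the target at one interleave level. */
uint64_t toy_selector_mask(unsigned int ways, unsigned int granularity)
{
	int nbits = toy_ilog2_pow2(ways);
	int low = toy_ilog2_pow2(granularity);

	if (nbits < 0 || low < 0)
		return 0;
	return ((UINT64_C(1) << nbits) - 1) << low;
}

/*
 * Two levels are compatible if their selector bits do not overlap and the
 * combined bits form one consecutive run.
 */
bool toy_levels_compatible(uint64_t a, uint64_t b)
{
	uint64_t all = a | b;

	if (a == 0 || b == 0 || (a & b) != 0)
		return false;

	/* Shift out trailing zeros, then test for the form 2^k - 1. */
	all /= all & (~all + 1);
	return (all & (all + 1)) == 0;
}
```

Both configurations from the commit message yield the same combined selector, HPA[8:10]. Only the assignment of bits to levels differs, which is why a check based purely on "child granularity is a multiple of the parent's" rejects the second one.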
…and RAS support

BugLink: https://bugs.launchpad.net/bugs/2143032

Source: NVIDIA@f80636d

Add Ubuntu kernel config annotations for CXL-related configs introduced
or changed by the CXL Type-2, RAS, and autodiscovered-region support
backports.

CONFIG_CXL_BUS, CONFIG_CXL_PCI, CONFIG_CXL_MEM, and CONFIG_CXL_PORT are
built in for Type-2 device support. CONFIG_CXL_RAS and the EINJ symbols
cover CXL RAS/error-injection support. CONFIG_SFC_CXL remains disabled
for NVIDIA platforms.

Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit f80636d nv-kernels/24.04_linux-nvidia-6.17-next)
[kobak: Backported annotation overrides from debian.nvidia-6.17 to debian.nvidia-bos; PCIEAER_CXL is overridden as removed instead of editing debian.master.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
…memory access

BugLink: https://bugs.launchpad.net/bugs/2143032

Source: NVIDIA@c5c11cf

Override debian.master policy for DEV_DAX, DEV_DAX_CXL, and
DEV_DAX_KMEM so CXL memory regions are available as raw DAX devices and
as hotplugged System-RAM without relying on module load ordering.

Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit c5c11cf nv-kernels/24.04_linux-nvidia-6.17-next)
[kobak: Backported annotation overrides from debian.nvidia-6.17 to debian.nvidia-bos.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
…/restore

BugLink: https://bugs.launchpad.net/bugs/2143032

Source: NVIDIA@a5544cb

Add Ubuntu kernel config annotation for CONFIG_PCI_CXL introduced by
the CXL DVSEC and HDM state save/restore series.

CONFIG_PCI_CXL is a hidden bool auto-enabled when CXL_BUS=y. It gates
compilation of drivers/pci/cxl.o, which saves and restores CXL DVSEC
control/range registers and HDM decoder state across PCI resets and
link transitions.

Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit a5544cb nv-kernels/24.04_linux-nvidia-6.17-next)
[kobak: Backported annotation override from debian.nvidia-6.17 to debian.nvidia-bos.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
@kobak2026 kobak2026 force-pushed the bug-DGX-16136/cxl-backport-26.04-bos branch from d24cf0e to 5268ad9 Compare May 21, 2026 16:20
@kobak2026
Collaborator Author

@nvmochs thanks, I folded the devres cleanup fix into the attach-region commit and will rebase PR2.

The failure cleanup now keeps the endpoint device lock held for unregister
serialization, but moves the region/DPA read guards into an inner scope.
That means unregister_region(cxlr) runs only after cxl_rwsem.region and
cxl_rwsem.dpa have been released, while still preserving the no-detach race
fix.

I also changed the second devres failure path from devm_release_action() to
devm_remove_action(), so cxl_endpoint_region_autoremove() is removed without
being invoked under the region read lock.

@nvmochs
Collaborator

nvmochs commented May 21, 2026

> @nvmochs thanks, I folded the devres cleanup fix into the attach-region commit and will rebase PR2.
>
> The failure cleanup now keeps the endpoint device lock held for unregister serialization, but moves the region/DPA read guards into an inner scope. That means unregister_region(cxlr) runs only after cxl_rwsem.region and cxl_rwsem.dpa have been released, while still preserving the no-detach race fix.
>
> I also changed the second devres failure path from devm_release_action() to devm_remove_action(), so cxl_endpoint_region_autoremove() is removed without being invoked under the region read lock.

Thanks Koba, no further issues from me!

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

@nvmochs
Collaborator

nvmochs commented May 21, 2026

@kobak2026 Can you create a LP and then we can get this applied?

@jamieNguyenNVIDIA
Collaborator

Re-adding my ACK:

Acked-by: Jamie Nguyen <jamien@nvidia.com>

@kobak2026
Collaborator Author

> @kobak2026 Can you create a LP and then we can get this applied?

Sure, but one question: what kind of information should I put in the LP?
Could you give me an example?
Thanks

@nvmochs
Collaborator

nvmochs commented May 21, 2026

> > @kobak2026 Can you create a LP and then we can get this applied?
>
> sure but one question what kind of information I can put in LP? could you give me a example. thanks

Typically I put the content from the PR description. What is being backported, why it's needed, where it came from, etc.

@nvmochs nvmochs changed the title DGX-16136: backport CXL Type-2 dependencies [26.04_linux-nvidia-bos] DGX-16136: backport CXL Type-2 dependencies May 21, 2026
@nirmoy nirmoy added help wanted Extra attention is needed question Further information is requested labels May 21, 2026
@nvmochs
Collaborator

nvmochs commented May 21, 2026

Merged, closing PR.

3353d1e04b38 NVIDIA: VR: SAUCE: [Config] Add PCI_CXL annotation for CXL state save/restore
55e78b240ed7 NVIDIA: VR: SAUCE: [Config] Enable CXL DAX and KMEM built-in for CXL memory access
32214ea51795 NVIDIA: VR: SAUCE: [Config] CXL config annotations for Type-2 device and RAS support
b40f1cbaf3e7 NVIDIA: VR: SAUCE: cxl/region: Support multi-level interleaving with smaller granularities for lower levels
f5cc28f01865 NVIDIA: VR: SAUCE: dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree
e694b2cfa727 NVIDIA: VR: SAUCE: dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
8f050560bff3 NVIDIA: VR: SAUCE: dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination
df2664403b6e NVIDIA: VR: SAUCE: cxl/region: Add helper to check Soft Reserved containment by CXL regions
bf0d72eba0cf NVIDIA: VR: SAUCE: dax: Track all dax_region allocations under a global resource tree
f1e7ea86c16d NVIDIA: VR: SAUCE: dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
e21a965b457c NVIDIA: VR: SAUCE: cxl/region: Skip decoder reset on detach for autodiscovered regions
7391cb17c5fe NVIDIA: VR: SAUCE: dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
ee96203d5e4c NVIDIA: VR: SAUCE: dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges
1ddb3564f604 NVIDIA: VR: SAUCE: sfc: support pio mapping based on cxl
5ba804a71a76 NVIDIA: VR: SAUCE: cxl: Avoid dax creation for accelerators
48e604250ee8 NVIDIA: VR: SAUCE: cxl: attach region to an accelerator/type2 memdev
b1dbe9e15ddb NVIDIA: VR: SAUCE: sfc: create type2 cxl memdev
42b3ecbbeaeb NVIDIA: VR: SAUCE: cxl: Prepare memdev creation for type2
2beeedf46338 NVIDIA: VR: SAUCE: cxl/sfc: Initialize dpa without a mailbox
f1d9c24b3762 NVIDIA: VR: SAUCE: cxl/sfc: Map cxl regs
8b46db5f88f3 NVIDIA: VR: SAUCE: sfc: add cxl support
5fa362d71c97 cxl/pci: Remove redundant cxl_pci_find_port() call
19651939fff5 cxl: Move pci generic code from cxl_pci to core/cxl_pci
38034e611c99 cxl: export internal structs for external Type2 drivers
a3bb9bcd510f cxl: support Type2 when initializing cxl_dev_state

@nvmochs nvmochs closed this May 21, 2026
9 participants