[26.04_linux-nvidia-bos] DGX-16136: backport CXL Type-2 dependencies #426
kobak2026 wants to merge 25 commits into
PR Validation Report
Patchscan ✅ No Missing Fixes
All cherry-picked commits checked; no missing upstream fixes found.
PR Lint ✅ All checks passed
Checking 25 commits...
Cherry-pick digest:
┌──────────────┬──────────────────────────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local        │ Referenced upstream / Patch subject                              │ Patch-ID   │ Subject │ SoB chain                 │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 5268ad904265 │ [SAUCE] [config] add pci_cxl annotation for cxl state save/resto │ N/A        │ N/A     │ jan, bfigg, kobak         │
│ e3aadf637883 │ [SAUCE] [config] enable cxl dax and kmem built-in for cxl memory │ N/A        │ N/A     │ jan, bfigg, kobak         │
│ a2eba9dce2b6 │ [SAUCE] [config] cxl config annotations for type-2 device and ra │ N/A        │ N/A     │ jan, bfigg, kobak         │
│ 7a6aebb18177 │ cxl/region: support multi-level interleaving with smaller granul │ noted      │ found   │ ok, backporter: kobak     │
│ 8cb4feab45af │ [SAUCE] dax/hmem: reintroduce soft reserved ranges back into the │ N/A        │ N/A     │ schofiel, lizhijia, Koral │
│ c0db8c8c8fda │ [SAUCE] dax/hmem, cxl: defer and resolve ownership of soft reser │ N/A        │ N/A     │ williams, Koralaha, kobak │
│ 2e433461671f │ [SAUCE] dax: add deferred-work helpers for dax_hmem and dax_cxl  │ N/A        │ N/A     │ Koralaha, kobak           │
│ 3a6c6f56dc2c │ cxl/region: add helper to check soft reserved containment by cxl │ noted      │ found   │ ok, backporter: kobak     │
│ 3a52fc67323f │ [SAUCE] dax: track all dax_region allocations under a global res │ N/A        │ N/A     │ williams, Koralaha, kobak │
│ 79356d9fd78d │ [SAUCE] dax/cxl, hmem: initialize hmem early and defer dax_cxl b │ N/A        │ N/A     │ williams, Koralaha, kobak │
│ d8ce89e1116e │ [SAUCE] cxl/region: skip decoder reset on detach for autodiscove │ N/A        │ N/A     │ Koralaha, kobak           │
│ 77ac7f9272ba │ [SAUCE] dax/hmem: gate soft reserved deferral on dev_dax_cxl     │ N/A        │ N/A     │ williams, Koralaha, kobak │
│ 1d651dbee5bc │ [SAUCE] dax/hmem: request cxl_acpi and cxl_pci before walking so │ N/A        │ N/A     │ williams, Koralaha, kobak │
│ 0fdb2010fcfc │ sfc: support pio mapping based on cxl                            │ noted      │ found   │ ok, backporter: kobak     │
│ d6acd1ddf9f6 │ [SAUCE] cxl: avoid dax creation for accelerators                 │ N/A        │ N/A     │ alucerop, kobak           │
│ 388dc8c0c90a │ cxl: attach region to an accelerator/type2 memdev                │ noted      │ found   │ ok, backporter: kobak     │
│ 9c9123639c35 │ [SAUCE] sfc: create type2 cxl memdev                             │ N/A        │ N/A     │ alucerop, kobak           │
│ 79654508fdaf │ [SAUCE] cxl: prepare memdev creation for type2                   │ N/A        │ N/A     │ alucerop, kobak           │
│ cb8c41246ed1 │ [SAUCE] cxl/sfc: initialize dpa without a mailbox                │ N/A        │ N/A     │ alucerop, kobak           │
│ ba9972ccf965 │ cxl/sfc: map cxl regs                                            │ noted      │ found   │ ok, backporter: kobak     │
│ 513a2a51e219 │ [SAUCE] sfc: add cxl support                                     │ N/A        │ N/A     │ alucerop, kobak           │
│ d3059e8eab00 │ d537d953c478 cxl/pci: Remove redundant cxl_pci_find_port() call  │ match      │ match   │ preserved + kobak added   │
│ 8b61bd238bbb │ 58f28930c7fb cxl: Move pci generic code from cxl_pci to core/cxl │ match      │ match   │ preserved + kobak added   │
│ 2bc7fd2fd558 │ 005869886d1d cxl: export internal structs for external Type2 dri │ match      │ match   │ preserved + kobak added   │
│ 645dd5c6be9f │ 9a775c07bb04 cxl: support Type2 when initializing cxl_dev_state  │ match      │ match   │ preserved + kobak added   │
└──────────────┴──────────────────────────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘
Lint: all checks passed.
Boro review
Latest watcher review: open review
Head:
This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review.
@kobak2026 Several comments...
General comment: please use this format for the pick tag: (Note: no need to include “linux” when the SHA is the upstream SHA)
1ce2941 NVIDIA: SAUCE: iommu/arm-smmu-v3: Allow ATS to be always on
Nirmoy already handled these via #401. Please rebase to the most recent version of the branch and you’ll get them.
6f5df25 cxl: export internal structs for external Type2 drivers
Why does this patch list you as the author? Where is the upstream provenance? Also, this patch differs from the source. If that is intentional, can you add a backport note?
For both this patch
2cb8ec3 NVIDIA: SAUCE: cxl/region: Skip decoder reset on detach for autodiscovered regions
and these patches
54d3cbf cxl: support Type2 when initializing cxl_dev_state
They were part of a larger series. How did you determine that only these patches were needed and that the other patches from the series can be ignored?
20eddbc NVIDIA: SAUCE: cxl: attach region to an accelerator/type2 memdev
Codex found these critical issues in this patch:
02b7f23 to 5c1f6a4
@jamieNguyenNVIDIA @nvmochs thanks. Update pushed for DGX-16136 CXL backport:
Validation:
Latest pushed branch tip:
I got this feedback from Codex:
drivers/cxl/core/region.c:2645: DEVMEM regions no longer get the port->uport_dev unregister action, but existing cleanup paths still assume it exists.
For example, delete_region_store() at drivers/cxl/core/region.c:2778 calls devm_release_action(port->uport_dev, unregister_region, cxlr) and returns success; for DEVMEM this does not unregister anything.
The construct failure path at drivers/cxl/core/region.c:3953 has the same problem and can leak the region.
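The finding above rests on devres actions being looked up per device: an action registered on one device cannot be released through a different one. The following is a minimal userspace sketch of that behaviour, not kernel code. `toy_dev`, `toy_add_action()`, `toy_release_action()` and `toy_unregister_region()` are hypothetical stand-ins for `struct device`, `devm_add_action()`, `devm_release_action()` and `unregister_region()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ACTIONS 8

typedef void (*toy_action_fn)(void *data);

struct toy_action {
	toy_action_fn fn;
	void *data;
};

/* Hypothetical stand-in for struct device and its devres list. */
struct toy_dev {
	struct toy_action actions[TOY_MAX_ACTIONS];
	size_t nr_actions;
};

/* Model of devm_add_action(): remember (fn, data) on this device only. */
static bool toy_add_action(struct toy_dev *dev, toy_action_fn fn, void *data)
{
	assert(dev && fn);
	if (dev->nr_actions == TOY_MAX_ACTIONS)
		return false;
	dev->actions[dev->nr_actions].fn = fn;
	dev->actions[dev->nr_actions].data = data;
	dev->nr_actions++;
	return true;
}

/*
 * Model of devm_release_action(): find (fn, data) on this device, drop it
 * from the list and run it. Returns false when nothing matched, in which
 * case the callback is never invoked.
 */
static bool toy_release_action(struct toy_dev *dev, toy_action_fn fn, void *data)
{
	assert(dev && fn);
	for (size_t i = 0; i < dev->nr_actions; i++) {
		if (dev->actions[i].fn != fn || dev->actions[i].data != data)
			continue;
		dev->nr_actions--;
		dev->actions[i] = dev->actions[dev->nr_actions];
		fn(data);
		return true;
	}
	return false;
}

/* Stand-in for unregister_region(): mark the toy region as gone. */
static void toy_unregister_region(void *data)
{
	*(bool *)data = false;
}
```

In this model, registering `toy_unregister_region` on an endpoint device and then releasing it through a different device leaves the region registered, which mirrors the leak described for the DEVMEM delete and construct-failure paths.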
@kobak2026 Thanks for addressing my last round of comments. Some additional feedback...
General comment: Can you add VR to the commit title tags? e.g.: NVIDIA: VR: SAUCE:
Nit: Several of these patches are missing “backported from” tags.
8c1b27b NVIDIA: SAUCE: cxl: attach region to an accelerator/type2 memdev
Can the note be updated to also include the changes to the devres registration flow? e.g.
Nit: A few of these commits have extra blank lines between the pick tags and sign-off. e.g. vs.
In preparation for type2 drivers add function and macro for differentiating CXL memory expanders (type 3) from CXL device accelerators (type 2) helping drivers built from public headers to embed struct cxl_dev_state inside a private struct.
Update type3 driver for using this same initialization.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260306164741.3796372-2-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit 9a775c0)
Signed-off-by: Koba Ko <kobak@nvidia.com>
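The embedding pattern this commit message describes can be sketched in userspace C. This is only an illustration under stated assumptions: `toy_dev_state`, `toy_accel`, `toy_dev_state_alloc()`, `toy_dev_state_create()` and `toy_is_accelerator()` are hypothetical names, not the helpers the patch adds. The sketch shows a driver-private struct that embeds the shared state, an allocator that sizes the allocation for the private struct, and a type field that separates accelerators from memory expanders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mirror of the DEVMEM (type 2) / CLASSMEM (type 3) split. */
enum toy_devtype { TOY_DEVTYPE_DEVMEM, TOY_DEVTYPE_CLASSMEM };

/* Hypothetical stand-in for the shared core state. */
struct toy_dev_state {
	enum toy_devtype type;
	unsigned long long serial;
};

/* Hypothetical accelerator-private struct embedding the core state. */
struct toy_accel {
	int pio_buffers;
	struct toy_dev_state cxlds;
};

/* Recover the private struct from a pointer to the embedded member. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Allocate the whole private struct and initialise the embedded state. */
static void *toy_dev_state_alloc(size_t size, size_t offset,
				 enum toy_devtype type,
				 unsigned long long serial)
{
	struct toy_dev_state *cxlds;
	char *drv;

	assert(offset + sizeof(*cxlds) <= size);
	drv = calloc(1, size);
	if (!drv)
		return NULL;
	cxlds = (struct toy_dev_state *)(drv + offset);
	cxlds->type = type;
	cxlds->serial = serial;
	return drv;
}

/* Typed wrapper so callers get back their own struct type. */
#define toy_dev_state_create(drv_type, member, type, serial) \
	((drv_type *)toy_dev_state_alloc(sizeof(drv_type), \
					 offsetof(drv_type, member), \
					 (type), (serial)))

/* Differentiate accelerators (type 2) from memory expanders (type 3). */
static int toy_is_accelerator(const struct toy_dev_state *cxlds)
{
	return cxlds->type == TOY_DEVTYPE_DEVMEM;
}
```

A caller would write `toy_dev_state_create(struct toy_accel, cxlds, TOY_DEVTYPE_DEVMEM, serial)` and later use `toy_container_of()` to get from the embedded state back to its private data.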
In preparation for type2 support, move structs and functions a type2 driver will need to access to into a new shared header file.
Differentiate between public and private data to be preserved by type2 drivers.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260306164741.3796372-3-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit 0058698)
Signed-off-by: Koba Ko <kobak@nvidia.com>
Inside cxl/core/pci.c there are helpers for CXL PCIe initialization meanwhile cxl/pci_drv.c implements the functionality for a Type3 device initialization.
In preparation for type2 support, move helper functions from cxl/pci.c to cxl/core/pci.c in order to be exported and used by type2 drivers.
[ dj: Clarified subject. ]
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Gregory Price <gourry@gourry.net>
Link: https://patch.msgid.link/20260306164741.3796372-4-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit 58f2893)
Signed-off-by: Koba Ko <kobak@nvidia.com>
Remove the redundant port lookup from cxl_rcrb_get_comp_regs() and use the dport parameter directly. The caller has already validated the port is non-NULL before invoking this function, and dport is given as a param.
This is simpler than getting dport in the callee and return the pointer to the caller what would require more changes.
Signed-off-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://patch.msgid.link/20260306164741.3796372-5-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit d537d95)
Signed-off-by: Koba Ko <kobak@nvidia.com>
Add CXL initialization based on new CXL API for accel drivers and make it dependent on kernel CXL configuration.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-2-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
Export cxl core functions for a Type2 driver being able to discover and map the device registers.
Use it in sfc driver cxl initialization.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
(backported from https://lore.kernel.org/r/20260423180528.17166-3-alejandro.lucero-palau@amd.com)
[kobak: Kept cxl_pci_setup_regs() in the core/pci provider added by the full Type2 prerequisite series and dropped the duplicate provider hunk from drivers/cxl/pci.c.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
Type3 relies on mailbox CXL_MBOX_OP_IDENTIFY command for initializing memdev state params which end up being used for DPA initialization.
Allow a Type2 driver to initialize DPA simply by giving the size of its volatile hardware partition.
Move related functions to memdev.
Add sfc driver as the client.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-4-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
Current cxl core is relying on a CXL_DEVTYPE_CLASSMEM type device when creating a memdev leading to problems when obtaining cxl_memdev_state references from a CXL_DEVTYPE_DEVMEM type.
Modify check for obtaining cxl_memdev_state adding CXL_DEVTYPE_DEVMEM support.
Make devm_cxl_add_memdev accessible from an accel driver.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-5-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
Use cxl API for creating a cxl memory device using the type2 cxl_dev_state struct.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-6-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
5c1f6a4 to 918f703
@nvmochs @clsotog thanks for the review feedback.
Thanks Koba! I confirmed that you addressed my findings and reviewed the DEVMEM cleanup with codex and did not spot any issues.
struct cxl_endpoint_decoder *cxled = cxlr->params.targets[0];
struct cxl_port *endpoint = cxled_to_port(cxled);
guard(device)(&endpoint->dev);
Claude suggests that there still might be a race here:
T1: cxl_region_release_action(cxlr)          T2: cxl_memdev_attach_region(cxlmd, &attach)
─────────────────────────────────────        ───────────────────────────────────────────
cxlr->type == DEVMEM ✓
nr_targets > 0, cxlr->detach == NULL
acquire endpoint->dev lock
release endpoint->dev lock (end of block)    acquire endpoint->dev lock
                                             device_find_child → cxled, cxlr
                                             attach->attach() → ioremap succeeds
                                             devm_add_action(&endpoint->dev,
                                                 cxl_endpoint_region_autoremove, cxlr)
                                             devm_add_action_or_reset(detach)
                                             cxlr->detach = attach->detach
                                             return 0; release endpoint->dev lock
unregister_region(cxlr)
  device_del(&cxlr->dev)
  detach_target loop drops cxld->region ref
  put_device → cxlr FREED
                                             [later] &endpoint->dev teardown →
                                               cxl_endpoint_region_autoremove(cxlr) ← UAF on freed cxlr
It further suggests keeping the teardown inside the guarded scope:
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2481,8 +2481,8 @@ static void cxl_region_release_action(struct cxl_region *cxlr)
 	if (cxlr->params.nr_targets) {
 		struct cxl_endpoint_decoder *cxled = cxlr->params.targets[0];
 		struct cxl_port *endpoint = cxled_to_port(cxled);
+		guard(device)(&endpoint->dev);
-		guard(device)(&endpoint->dev);
 		if (cxlr->detach) {
 			void (*detach)(void *data) = cxlr->detach;
 			void *detach_data = cxlr->detach_data;
@@ -2493,11 +2493,11 @@ static void cxl_region_release_action(struct cxl_region *cxlr)
 			devm_release_action(&endpoint->dev,
 					    cxl_endpoint_region_autoremove,
 					    cxlr);
-			return;
+		} else {
+			unregister_region(cxlr);
 		}
+		return;
 	}
 	unregister_region(cxlr);
 }
@jamieNguyenNVIDIA thanks, this race is fixed in the currently pushed DGX-16136 branch.
The DEVMEM no-detach path now keeps unregister_region(cxlr) inside the
endpoint->dev guard in cxl_region_release_action(). That means
cxl_memdev_attach_region() cannot acquire the same endpoint lock and install
endpoint devres actions between observing cxlr->detach == NULL and freeing the
region.
Verified pushed tip:
d24cf0e
clsotog left a comment
Acked-by: Carol L Soto <csoto@nvidia.com>
918f703 to d24cf0e
Thanks @kobak2026!
Here is an updated version of what Codex found...
The finding is in 1a24ae4, in drivers/cxl/core/region.c:cxl_memdev_attach_region().
Current flow:
The problem is that both failure paths can run cxl_endpoint_region_autoremove() while cxl_rwsem.region is still held for read.
Call path: cxl_memdev_attach_region()
That is a same-thread lock upgrade from read to write. The read lock cannot be dropped because the thread is blocked trying to acquire the write lock, so this can deadlock.
The second failure path has the same issue because devm_release_action() runs the action. It does not merely unregister the action. So this line:
devm_release_action(&endpoint->dev, cxl_endpoint_region_autoremove, cxlr);
also calls cxl_endpoint_region_autoremove(cxlr) while the region read lock is still held.
The fix should preserve two things:
The shape I would suggest is:
The important parts are the inner scope and devm_remove_action(). The inner scope releases cxl_rwsem.region and cxl_rwsem.dpa before err_unregister runs. The outer endpoint lock remains held across unregister_region(cxlr), which keeps the recent no-detach race fix intact. devm_remove_action() removes the endpoint autoremove action without invoking it; then the explicit unregister_region(cxlr) runs once, after the read locks are gone.
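The difference between the two cleanup shapes can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the suggested patch: `toy_ctx` and every `toy_*` function are hypothetical, the read lock is reduced to a counter, and "would deadlock" simply records that the unregister step ran while that counter was non-zero.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_ctx {
	int readers;		/* > 0 while the toy region read lock is held */
	bool action_armed;	/* endpoint autoremove action is registered */
	int unregister_calls;	/* how often the region was unregistered */
	bool would_deadlock;	/* write side requested under the read lock */
};

/* Stand-in for unregister_region(): needs the region lock for write. */
static void toy_unregister_region(struct toy_ctx *c)
{
	if (c->readers > 0)
		c->would_deadlock = true;	/* same-thread read -> write upgrade */
	c->unregister_calls++;
}

/* Model of devm_release_action(): drops the action and runs it. */
static void toy_release_action(struct toy_ctx *c)
{
	assert(c->action_armed);
	c->action_armed = false;
	toy_unregister_region(c);
}

/* Model of devm_remove_action(): drops the action without running it. */
static void toy_remove_action(struct toy_ctx *c)
{
	assert(c->action_armed);
	c->action_armed = false;
}

/* Failure path as described in the finding: cleanup under the read lock. */
static void toy_fail_path_old(struct toy_ctx *c)
{
	c->readers++;
	c->action_armed = true;
	/* registering the detach action fails here */
	toy_release_action(c);
	c->readers--;
}

/* Suggested shape: leave the read-locked inner scope, then clean up once. */
static void toy_fail_path_new(struct toy_ctx *c)
{
	bool failed;

	{
		c->readers++;
		c->action_armed = true;
		failed = true;	/* registering the detach action fails here */
		c->readers--;
	}
	if (failed) {
		toy_remove_action(c);
		toy_unregister_region(c);
	}
}
```

Running the old path sets `would_deadlock`; the new path unregisters exactly once with no readers outstanding, which is the property the comment asks the fix to preserve.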
Support an accelerator driver to safely work with an autodiscovered
region from a committed HDM decoder through:
1) an accelerator driver cxl_attach_region struct with attach
and detach callbacks.
2) a specific function, cxl_memdev_attach_region() keeping the
required locks for finding a region linked to the memdev
endpoint, and
3) invoking attach callback while keeping the locking allowing to
work (ioremap and other internal stuff) with the related physical
range by the accelerator driver, and
4) linking a detach callback to the endpoint device removal where
the accelerator driver can stop using the region range.
This covers the cases of a potential removal of cxl_acpi module or a
accelerator memdev unbinding from cxl_mem driver through sysfs.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
(backported from https://lore.kernel.org/r/20260423180528.17166-7-alejandro.lucero-palau@amd.com)
[kobak: Check cxl_memdev_attach_region() errors and propagate failure so SFC probe does not continue after CXL core tears down the attached region. Set probe_data->cxl before attaching so the attach callback can use it, guard attach attempts before a valid endpoint exists, explicitly unwind attach/autoremove side effects if devres action registration fails, preserve DEVMEM target type for autodiscovered regions, and route delete / construct-failure cleanup through endpoint-owned devres actions.]
[kobak: Keep no-detach DEVMEM unregister under the endpoint-device guard so attach cannot install endpoint devres actions for a region being freed.]
[kobak: Avoid devres-registration failure cleanup under cxl_rwsem.region read lock: keep endpoint->dev locked, drop the region/DPA read guards before unregister_region(), and use devm_remove_action() so failed detach-action registration does not run cxl_endpoint_region_autoremove() under the read lock.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
By definition a type2 cxl device will use the host managed memory for specific functionality, therefore it should not be available to other uses like DAX.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <daves@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-8-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
A PIO buffer is a region of device memory to which the driver can write a packet for TX, with the device handling the transmit doorbell without requiring a DMA for getting the packet data, which helps reducing latency in certain exchanges. With CXL mem protocol this latency can be lowered further.
With a device supporting CXL and successfully initialised, use the cxl region to map the memory range and use this mapping for PIO buffers.
Add the disabling of those CXL-based PIO buffers if the callback for potential cxl endpoint removal by the CXL core happens.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
(backported from https://lore.kernel.org/r/20260423180528.17166-9-alejandro.lucero-palau@amd.com)
[kobak: Added a !EFX_USE_PIO same-module stub for efx_ef10_disable_piobufs() so non-x86 builds that still enable CONFIG_SFC_CXL do not leave efx_cxl.o unresolved.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
…ing Soft Reserved ranges
Ensure cxl_acpi has published CXL Window resources before HMEM walks Soft
Reserved ranges.
Replace MODULE_SOFTDEP("pre: cxl_acpi") with an explicit, synchronous
request_module("cxl_acpi"). MODULE_SOFTDEP() only guarantees eventual
loading, it does not enforce that the dependency has finished init
before the current module runs. This can cause HMEM to start before
cxl_acpi has populated the resource tree, breaking detection of overlaps
between Soft Reserved and CXL Windows.
Also, request cxl_pci before HMEM walks Soft Reserved ranges. Unlike
cxl_acpi, cxl_pci attach is asynchronous and creates dependent devices
that trigger further module loads. Asynchronous probe flushing
(wait_for_device_probe()) is added later in the series in a deferred
context before HMEM makes ownership decisions for Soft Reserved ranges.
Add an additional explicit Kconfig ordering so that CXL_ACPI and CXL_PCI
must be initialized before DEV_DAX_HMEM. This prevents HMEM from consuming
Soft Reserved ranges before CXL drivers have had a chance to claim them.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-2-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-2-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
Replace IS_ENABLED(CONFIG_CXL_REGION) with IS_ENABLED(CONFIG_DEV_DAX_CXL) so that HMEM only defers Soft Reserved ranges when CXL DAX support is enabled. This makes the coordination between HMEM and the CXL stack more precise and prevents deferral in unrelated CXL configurations.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-3-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-3-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
…iscovered regions
__cxl_decoder_detach() currently resets decoder programming whenever a region is detached if cxl_config_state is beyond CXL_CONFIG_ACTIVE. For autodiscovered regions, this can incorrectly tear down decoder state that may be relied upon by other consumers or by subsequent ownership decisions.
Skip cxl_region_decode_reset() during detach when CXL_REGION_F_AUTO is set.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-4-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-4-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
…_cxl binding
Move hmem/ earlier in the dax Makefile so that hmem_init() runs before dax_cxl.
In addition, defer registration of the dax_cxl driver to a workqueue instead of using module_cxl_driver(). This ensures that dax_hmem has an opportunity to initialize and register its deferred callback and make ownership decisions before dax_cxl begins probing and claiming Soft Reserved ranges.
Mark the dax_cxl driver as PROBE_PREFER_ASYNCHRONOUS so its probe runs out of line from other synchronous probing avoiding ordering dependencies while coordinating ownership decisions with dax_hmem.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-5-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-5-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
…al resource tree
Introduce a global "DAX Regions" resource root and register each dax_region->res under it via request_resource(). Release the resource on dax_region teardown.
By enforcing a single global namespace for dax_region allocations, this ensures only one of dax_hmem or dax_cxl can successfully register a dax_region for a given range.
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-6-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-6-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
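How a single resource root can arbitrate between two would-be owners is sketched below in userspace C. This is a simplified model, assuming a flat child list and inclusive ranges; `toy_root`, `toy_res`, `toy_request_resource()` and `toy_release_resource()` are hypothetical stand-ins and do not reproduce the kernel resource tree.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct resource; ranges are inclusive. */
struct toy_res {
	unsigned long long start, end;
	const char *name;
	struct toy_res *sibling;
};

/* Hypothetical stand-in for the global "DAX Regions" root. */
struct toy_root {
	struct toy_res *child;
};

/* Claim a range under the root; fail with -EBUSY if it overlaps a child. */
static int toy_request_resource(struct toy_root *root, struct toy_res *res)
{
	assert(res->start <= res->end);
	for (struct toy_res *r = root->child; r; r = r->sibling)
		if (res->start <= r->end && r->start <= res->end)
			return -EBUSY;
	res->sibling = root->child;
	root->child = res;
	return 0;
}

/* Give the range back on teardown so another owner may claim it. */
static void toy_release_resource(struct toy_root *root, struct toy_res *res)
{
	for (struct toy_res **p = &root->child; *p; p = &(*p)->sibling) {
		if (*p == res) {
			*p = res->sibling;
			res->sibling = NULL;
			return;
		}
	}
}
```

With this model, if a dax_hmem-style range is claimed first, an overlapping dax_cxl-style claim fails until the first owner releases its range.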
…ainment by CXL regions
Add a helper to determine whether a given Soft Reserved memory range is fully contained within the committed CXL region.
This helper provides a primitive for policy decisions in subsequent patches such as co-ordination with dax_hmem to determine whether CXL has fully claimed ownership of Soft Reserved memory ranges.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-7-Smita.KoralahalliChannabasappa@amd.com
(backported from https://lore.kernel.org/r/20260210064501.157591-7-Smita.KoralahalliChannabasappa@amd.com)
[kobak: Added the Soft Reserved declaration to the existing Type2 include/cxl/cxl.h header instead of recreating that header.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
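The containment test this helper performs can be illustrated with a small userspace sketch. The types and names here (`toy_range`, `toy_region`, `toy_soft_reserved_contained()`) are hypothetical and the region list is a plain array; only the comparison itself is meant to carry over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Inclusive address range. */
struct toy_range {
	unsigned long long start, end;
};

/* Hypothetical region: an HPA range plus whether its decoders are committed. */
struct toy_region {
	struct toy_range hpa;
	bool committed;
};

/* True when inner lies entirely inside outer. */
static bool toy_range_contains(const struct toy_range *outer,
			       const struct toy_range *inner)
{
	return outer->start <= inner->start && inner->end <= outer->end;
}

/* True when some committed region fully covers the Soft Reserved range. */
static bool toy_soft_reserved_contained(const struct toy_region *regions,
					size_t nr_regions,
					const struct toy_range *sr)
{
	assert(sr->start <= sr->end);
	for (size_t i = 0; i < nr_regions; i++)
		if (regions[i].committed &&
		    toy_range_contains(&regions[i].hpa, sr))
			return true;
	return false;
}
```

A range that only partially overlaps a committed region, or that sits inside an uncommitted one, is reported as not contained.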
…x_cxl coordination
Add helpers to register, queue and flush the deferred work.
These helpers allow dax_hmem to execute ownership resolution outside the probe context before dax_cxl binds.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-8-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-8-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
… Reserved memory ranges
The current probe time ownership check for Soft Reserved memory based
solely on CXL window intersection is insufficient. dax_hmem probing is not
always guaranteed to run after CXL enumeration and region assembly, which
can lead to incorrect ownership decisions before the CXL stack has
finished publishing windows and assembling committed regions.
Introduce deferred ownership handling for Soft Reserved ranges that
intersect CXL windows. When such a range is encountered during dax_hmem
probe, schedule deferred work and wait for the CXL stack to complete
enumeration and region assembly before deciding ownership.
Evaluate ownership of Soft Reserved ranges based on CXL region
containment.
- If all Soft Reserved ranges are fully contained within committed CXL
regions, DROP handling Soft Reserved ranges from dax_hmem and allow
dax_cxl to bind.
- If any Soft Reserved range is not fully claimed by committed CXL
region, REGISTER the Soft Reserved ranges with dax_hmem.
Use dax_cxl_mode to coordinate ownership decisions for Soft Reserved
ranges. Once, ownership resolution is complete, flush the deferred work
from dax_cxl before allowing dax_cxl to bind.
This enforces a strict ownership. Either CXL fully claims the Soft
reserved ranges or it relinquishes it entirely.
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-9-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-9-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
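The all-or-nothing ownership rule in the commit message above can be sketched as a small decision function. This is a userspace illustration under assumptions: `toy_range`, `toy_decision`, `toy_contained()` and `toy_resolve_ownership()` are hypothetical names, every region in the array is taken to be committed, and the deferred-work and probe-flush machinery is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Inclusive address range. */
struct toy_range {
	unsigned long long start, end;
};

/* Outcome of the deferred ownership check. */
enum toy_decision {
	TOY_DROP,	/* CXL owns everything: dax_hmem steps aside */
	TOY_REGISTER,	/* at least one range unclaimed: dax_hmem keeps them */
};

/* True when some (committed) region fully covers the Soft Reserved range. */
static bool toy_contained(const struct toy_range *regions, size_t nr_regions,
			  const struct toy_range *sr)
{
	for (size_t i = 0; i < nr_regions; i++)
		if (regions[i].start <= sr->start && sr->end <= regions[i].end)
			return true;
	return false;
}

/* DROP only if every Soft Reserved range is contained; otherwise REGISTER. */
static enum toy_decision
toy_resolve_ownership(const struct toy_range *regions, size_t nr_regions,
		      const struct toy_range *soft_reserved, size_t nr_sr)
{
	assert(nr_sr > 0);
	for (size_t i = 0; i < nr_sr; i++)
		if (!toy_contained(regions, nr_regions, &soft_reserved[i]))
			return TOY_REGISTER;
	return TOY_DROP;
}
```

A single Soft Reserved range that spills outside the committed regions is enough to flip the result from DROP to REGISTER for all ranges, matching the strict ownership the message describes.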
…to the iomem tree
Reworked from a patch by Alison Schofield <alison.schofield@intel.com>
Reintroduce Soft Reserved range into the iomem_resource tree for HMEM to consume.
This restores visibility in /proc/iomem for ranges actively in use, while avoiding the early-boot conflicts that occurred when Soft Reserved was published into iomem before CXL window and region discovery.
Link: https://lore.kernel.org/linux-cxl/29312c0765224ae76862d59a17748c8188fb95f1.1692638817.git.alison.schofield@intel.com/
Co-developed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Co-developed-by: Zhijian Li <lizhijian@fujitsu.com>
Signed-off-by: Zhijian Li <lizhijian@fujitsu.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-10-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-10-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
…smaller granularities for lower levels
The CXL specification supports multi-level interleaving "as long as all the levels use different, but consecutive, HPA bits to select the target and no Interleave Set has more than 8 devices" (from 3.2).
Currently the kernel expects that a decoder's "interleave granularity is a multiple of @parent_port granularity". That is, the granularity of a lower level is bigger than those of the parent and uses the outer HPA bits as selector. It works e.g. for the following 8-way config:
 * cross-link (cross-hostbridge config in CFMWS):
   * 4-way
   * 256 granularity
   * Selector: HPA[8:9]
 * sub-link (CXL Host bridge config of the HDM):
   * 2-way
   * 1024 granularity
   * Selector: HPA[10]
Now, if the outer HPA bits are used for the cross-hostbridge, an 8-way config could look like this:
 * cross-link (cross-hostbridge config in CFMWS):
   * 4-way
   * 512 granularity
   * Selector: HPA[9:10]
 * sub-link (CXL Host bridge config of the HDM):
   * 2-way
   * 256 granularity
   * Selector: HPA[8]
The enumeration of decoders for this configuration fails then with following error:
 cxl region0: pci0000:00:port1 cxl_port_setup_targets expected iw: 2 ig: 1024 [mem 0x10000000000-0x1ffffffffff flags 0x200]
 cxl region0: pci0000:00:port1 cxl_port_setup_targets got iw: 2 ig: 256 state: enabled 0x10000000000:0x1ffffffffff
 cxl_port endpoint12: failed to attach decoder12.0 to region0: -6
Note that this happens only if firmware is setting up the decoders (CXL_REGION_F_AUTO). For userspace region assembly the granularities are chosen to increase from root down to the lower levels. That is, outer HPA bits are always used for lower interleaving levels.
Rework the implementation to also support multi-level interleaving with smaller granularities for lower levels. Determine the interleave set of autodetected decoders. Check that it is a subset of the root interleave. The HPA selector bits are extracted for all decoders of the set and checked that there is no overlap and bits are consecutive.
All decoders can be programmed now to use any bit range within the region's target selector.
Signed-off-by: Robert Richter <rrichter@amd.com>
(backported from https://lore.kernel.org/all/20251028094754.72816-1-rrichter@amd.com/)
[kobak: resolved conflicts with cxlr->cxlrd and spa_maps_hpa()]
Signed-off-by: Koba Ko <kobak@nvidia.com>
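The selector arithmetic in the two 8-way examples can be reproduced with a short sketch. This is not the kernel code: `selector_bits()` and `check_levels()` are hypothetical helpers, and the sketch assumes power-of-two ways and granularities only, so that the selector starts at bit log2(granularity) and spans log2(ways) bits.

```python
def selector_bits(ways: int, granularity: int) -> range:
    """Return the HPA bit positions that select the target for one interleave level."""
    for name, value in (("ways", ways), ("granularity", granularity)):
        if value < 1 or value & (value - 1):
            raise ValueError(f"{name} must be a power of two, got {value}")
    low = granularity.bit_length() - 1  # log2(granularity): lowest selector bit
    count = ways.bit_length() - 1       # log2(ways): number of selector bits
    return range(low, low + count)


def check_levels(levels):
    """Check that all levels use different, but consecutive, HPA bits.

    `levels` is a list of (ways, granularity) pairs, one per interleave level.
    Returns the sorted selector bits of the whole set, or raises ValueError.
    """
    bits = []
    for ways, granularity in levels:
        bits.extend(selector_bits(ways, granularity))
    if len(set(bits)) != len(bits):
        raise ValueError("selector bits overlap between levels")
    bits.sort()
    if bits and bits != list(range(bits[0], bits[-1] + 1)):
        raise ValueError("selector bits are not consecutive")
    return bits


if __name__ == "__main__":
    # Lower level uses the outer bits: HPA[8:9] then HPA[10].
    print(check_levels([(4, 256), (2, 1024)]))
    # Lower level uses the inner bit: HPA[9:10] then HPA[8].
    print(check_levels([(4, 512), (2, 256)]))
```

Both configurations yield the same combined selector, HPA bits 8 through 10, which is why a check on overlap and consecutiveness accepts either ordering while a "multiple of the parent granularity" check accepts only the first.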
…and RAS support
BugLink: https://bugs.launchpad.net/bugs/2143032
Source: NVIDIA@f80636d
Add Ubuntu kernel config annotations for CXL-related configs introduced or changed by the CXL Type-2, RAS, and autodiscovered-region support backports.
CONFIG_CXL_BUS, CONFIG_CXL_PCI, CONFIG_CXL_MEM, and CONFIG_CXL_PORT are built in for Type-2 device support. CONFIG_CXL_RAS and the EINJ symbols cover CXL RAS/error-injection support. CONFIG_SFC_CXL remains disabled for NVIDIA platforms.
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit f80636d nv-kernels/24.04_linux-nvidia-6.17-next)
[kobak: Backported annotation overrides from debian.nvidia-6.17 to debian.nvidia-bos; PCIEAER_CXL is overridden as removed instead of editing debian.master.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
…memory access
BugLink: https://bugs.launchpad.net/bugs/2143032
Source: NVIDIA@c5c11cf
Override debian.master policy for DEV_DAX, DEV_DAX_CXL, and DEV_DAX_KMEM so CXL memory regions are available as raw DAX devices and as hotplugged System-RAM without relying on module load ordering.
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit c5c11cf nv-kernels/24.04_linux-nvidia-6.17-next)
[kobak: Backported annotation overrides from debian.nvidia-6.17 to debian.nvidia-bos.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
…/restore
BugLink: https://bugs.launchpad.net/bugs/2143032
Source: NVIDIA@a5544cb
Add Ubuntu kernel config annotation for CONFIG_PCI_CXL introduced by the CXL DVSEC and HDM state save/restore series.
CONFIG_PCI_CXL is a hidden bool auto-enabled when CXL_BUS=y. It gates compilation of drivers/pci/cxl.o, which saves and restores CXL DVSEC control/range registers and HDM decoder state across PCI resets and link transitions.
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit a5544cb nv-kernels/24.04_linux-nvidia-6.17-next)
[kobak: Backported annotation override from debian.nvidia-6.17 to debian.nvidia-bos.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
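One way to sanity-check a built kernel against the three config commits above is to compare its config text with the values they describe. The sketch below is an assumption-laden illustration: `EXPECTED`, `parse_config()`, and `mismatches()` are hypothetical names, the `y` values are inferred from the "built in" wording of the commit messages, and symbols whose exact value is not stated (CONFIG_CXL_RAS, the EINJ symbols) are left out on purpose.

```python
# Expected values inferred from the commit messages; None means "not set".
EXPECTED = {
    "CONFIG_CXL_BUS": "y",
    "CONFIG_CXL_PCI": "y",
    "CONFIG_CXL_MEM": "y",
    "CONFIG_CXL_PORT": "y",
    "CONFIG_PCI_CXL": "y",
    "CONFIG_DEV_DAX": "y",
    "CONFIG_DEV_DAX_CXL": "y",
    "CONFIG_DEV_DAX_KMEM": "y",
    "CONFIG_SFC_CXL": None,
}


def parse_config(text):
    """Parse kernel .config text into {symbol: value}; '# ... is not set' lines are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            values[key] = value
    return values


def mismatches(text, expected=EXPECTED):
    """Return {symbol: actual_value} for every symbol that differs from the expectation."""
    actual = parse_config(text)
    return {key: actual.get(key) for key, want in expected.items() if actual.get(key) != want}


if __name__ == "__main__":
    sample = "CONFIG_CXL_BUS=y\nCONFIG_CXL_MEM=m\n# CONFIG_SFC_CXL is not set\n"
    print(mismatches(sample))
```

The text passed to `mismatches()` could come from the config file installed alongside the kernel image; an empty result means every listed symbol matches.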
d24cf0e to 5268ad9
@nvmochs thanks, I folded the devres cleanup fix into the attach-region commit and will rebase PR2. The failure cleanup now keeps the endpoint device lock held for unregister I also changed the second devres failure path from devm_release_action() to
Thanks Koba, no further issues from me!
@kobak2026 Can you create a LP and then we can get this applied?
Re-adding my ACK:
sure but one question
Typically I put the content from the PR description. What is being backported, why it's needed, where it came from, etc.
Merged, closing PR.
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-7.0/+bug/2153819
Summary
Backport the CXL Type-2 dependency stack onto 26.04_linux-nvidia-bos.
This series brings in CXL Type-2 enablement for NVIDIA/SFC CXL accelerator plumbing, updates ATS always-on handling to Nicolin Chen's v4 series, includes required CXL region fixes, and adds the NVIDIA CXL config annotations needed by the stack.
Included changes
cxl_dev_state
Not included
26.04-bos:
20ff7877b5a5 - cxl: Allow zero sized HDM decoders
7e237452e5f7 - cxl_test: enable zero sized decoders under hb0
d4026a446264 (cxl/hdm: Fix potential infinite loop in __cxl_dpa_reserve())
[NACK] Koba Ko's "cxl region partition index validation before array access" patch is not included.
Verification
Build verification completed:
Runtime verification completed:
7.0.0-vfio-cxl-downstream-2026-05-1422
Valid+ Active+:2
cxl list -BMRDu exits successfully on the Type-2 host.
cxl_pci probe failure observed.
Additional Type-3 evidence:
veraos-43 exposes two CXL Type-3 devices under PCI domain 0003.
invalid granularity calculation (16384 * 2)
failed to attach decoder... -22/-6
failed to find decoder mapping