[linux-nvidia-6.17-next] CXL VFIO: Add CXL Type-2 device passthrough support#407
JiandiAnNVIDIA wants to merge 51 commits into
On platforms newer than QM_HW_V3, the configuration region for the accelerator device's live migration function is no longer placed in the VF; it is placed in the PF instead. The live migration configuration region therefore needs to be opened when the QM driver is loaded, and the driver needs to clear this configuration when it is unloaded.
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Reviewed-by: Shameer Kolothum <shameerkolothum@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20251030015744.131771-2-liulongfang@huawei.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 4868d2d)
Signed-off-by: Jiandi An <jan@nvidia.com>
On platforms newer than QM_HW_V3, the migration region has been relocated from the VF to the PF. The VF's own configuration space is restored to the complete 64KB, and the BAR configuration space no longer needs to be split into two equal halves. The driver should be modified accordingly to adapt to the new hardware.
On the older hardware platform QM_HW_V3, the live migration configuration region is placed in the latter 32K portion of the VF's BAR2 configuration space.
On the new hardware platform QM_HW_V4, the live migration configuration region also exists in the same 32K area immediately following the VF's BAR2, just like on QM_HW_V3. However, access to this region is now controlled by hardware. Additionally, a copy of the live migration configuration region is present in the PF's BAR2 configuration space.
On QM_HW_V4, when an older version of the driver is loaded, it behaves as on QM_HW_V3 and uses the configuration region in the VF, so the live migration function continues to work normally. When the new version of the driver is loaded, it uses the configuration region in the PF directly. In that case the hardware configuration disables the live migration configuration region in the VF's BAR2: reads return all 0xF values, and writes are silently ignored.
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Reviewed-by: Shameer Kolothum <shameerkolothum@gmail.com>
Link: https://lore.kernel.org/r/20251030015744.131771-3-liulongfang@huawei.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 2131c15)
Signed-off-by: Jiandi An <jan@nvidia.com>
PR Validation Report
PR Lint ✅ All checks passed
Checking 51 commits...
Cherry-pick digest:
┌──────────────┬──────────────────────────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local        │ Referenced upstream / Patch subject                              │ Patch-ID   │ Subject │ SoB chain                 │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 74b6b99bcd80 │ [SAUCE] config: enable config_vfio_cxl_core for cxl type-2 passt │ N/A        │ N/A     │ jan                       │
│ 67c66e735df5 │ [SAUCE] vfio/cxl: implement vfio_cxl_reset()                     │ N/A        │ N/A     │ mhonap, jan               │
│ 14fbdcb4d592 │ [SAUCE] vfio/cxl: virtualize dvsec status2 register in vconfig s │ N/A        │ N/A     │ mhonap, jan               │
│ 9e0e291bfc29 │ [SAUCE] vfio/cxl: preserve hdm decoder base addresses across res │ N/A        │ N/A     │ mhonap, jan               │
│ 5071d3b07627 │ [SAUCE] vfio/cxl: ensure pci memory space is enabled before post │ N/A        │ N/A     │ mhonap, jan               │
│ 0bd9c4c7ab7c │ [SAUCE] vfio/pci: wire cxl dpa reset handling                    │ N/A        │ N/A     │ mhonap, jan               │
│ 2d40efbb4f42 │ [SAUCE] cxl: export the cxl reset helpers for vfio users         │ N/A        │ N/A     │ mhonap, jan               │
│ 696f0b100f03 │ docs: vfio-pci: document cxl type-2 device passthrough           │ noted      │ found   │ ok, backporter: jan       │
│ 595c1ad9c3cf │ vfio/cxl: provide opt-out for cxl feature                        │ noted      │ found   │ ok, backporter: jan       │
│ 9cd924807287 │ vfio/pci: advertise cxl cap and sparse component bar to userspac │ noted      │ found   │ ok, backporter: jan       │
│ 6e2d9e5f273d │ vfio/cxl: register regions with vfio layer                       │ noted      │ found   │ ok, backporter: jan       │
│ 3ff6c19fc517 │ vfio/cxl: virtualize cxl dvsec config writes                     │ noted      │ found   │ ok, backporter: jan       │
│ f5e419121227 │ vfio/cxl: dpa vfio region with demand fault mmap and reset zap   │ noted      │ found   │ ok, backporter: jan       │
│ 799c46dc1495 │ vfio/cxl: cxl region management support                          │ noted      │ found   │ ok, backporter: jan       │
│ 537d8a2414cf │ vfio/cxl: wait for hdm ranges and create memdev                  │ match      │ found   │ ok, backporter: jan       │
│ 4ab495542be1 │ vfio/cxl: introduce hdm decoder register emulation framework     │ noted      │ found   │ ok, backporter: jan       │
│ 07d714144702 │ vfio/pci: export config access helpers                           │ match      │ found   │ ok, backporter: jan       │
│ 939ebb73d430 │ vfio/cxl: detect cxl dvsec and probe hdm block                   │ noted      │ found   │ ok, backporter: jan       │
│ 336a1448463a │ vfio/pci: add config_vfio_cxl_core and stub cxl hooks            │ match      │ found   │ ok, backporter: jan       │
│ 87b80cc08c26 │ vfio/pci: add cxl state to vfio_pci_core_device                  │ noted      │ found   │ ok, backporter: jan       │
│ c0f4d247a0e7 │ vfio: uapi for cxl-capable pci device assignment                 │ match      │ found   │ ok, backporter: jan       │
│ 947749bd1b8d │ cxl: record bir and bar offset in cxl_register_map               │ noted      │ found   │ ok, backporter: jan       │
│ 023bae337329 │ cxl: split cxl_await_range_active() from media-ready wait        │ noted      │ found   │ ok, backporter: jan       │
│ 52ead24ed8ad │ cxl: move component/hdm register defines to uapi/cxl/cxl_regs.h  │ noted      │ found   │ ok, backporter: jan       │
│ e02c1b7ac02a │ cxl: declare cxl_find_regblock and cxl_probe_component_regs in p │ noted      │ found   │ ok, backporter: jan       │
│ fd317b86093e │ cxl: add cxl_get_hdm_info() for hdm decoder metadata             │ match      │ found   │ ok, backporter: jan       │
│ 54d50bbc6111 │ 56c069307dfd vfio: Remove the get_region_info op                 │ match      │ match   │ preserved + jan added     │
│ 21085759fbcd │ dc10734610e2 vfio: Move the remaining drivers to get_region_info │ match      │ match   │ preserved + jan added     │
│ c0ad388ba741 │ 182c62861ba5 vfio/platform: Convert to get_region_info_caps      │ match      │ match   │ preserved + jan added     │
│ 2bf5a2cbb154 │ 1b0ecb5baf4a vfio/pci: Convert all PCI drivers to get_region_inf │ match      │ match   │ preserved + jan added     │
│ bc1c993e783d │ 973af0c40eaf vfio/ccw: Convert to get_region_info_caps           │ match      │ match   │ preserved + jan added     │
│ 0282af066b10 │ 93165757c023 vfio/gvt: Convert to get_region_info_caps           │ match      │ match   │ preserved + jan added     │
│ 29e1217fd909 │ 45f9fa18109d vfio/mbochs: Convert mbochs to use vfio_info_add_ca │ match      │ match   │ preserved + jan added     │
│ 7dd77b841190 │ 775f726a742a vfio: Add get_region_info_caps op                   │ match      │ match   │ preserved + jan added     │
│ e7da10685f7f │ f97859503859 vfio: Require drivers to implement get_region_info  │ match      │ match   │ preserved + jan added     │
│ 6c250ce18f9e │ e664067b6035 vfio/gvt: Provide a get_region_info op              │ match      │ match   │ preserved + jan added     │
│ 76b5171d117d │ 61b3f7b5a729 vfio/ccw: Provide a get_region_info op              │ match      │ match   │ preserved + jan added     │
│ 619333df0ce8 │ b9827eff6b4a vfio/cdx: Provide a get_region_info op              │ match      │ match   │ preserved + jan added     │
│ 8ba94bf6a94e │ 6cdae5d0c326 vfio/fsl: Provide a get_region_info op              │ match      │ match   │ preserved + jan added     │
│ 073f13c17982 │ d4635df279f5 vfio/platform: Provide a get_region_info op         │ match      │ match   │ preserved + jan added     │
│ 554dca9a1de1 │ 8339fccda837 vfio/mbochs: Provide a get_region_info op           │ match      │ match   │ preserved + jan added     │
│ 0fbfd736592c │ cf16acc0af09 vfio/mdpy: Provide a get_region_info op             │ match      │ match   │ preserved + jan added     │
│ 4df20815cb64 │ 078775527109 vfio/mtty: Provide a get_region_info op             │ match      │ match   │ preserved + jan added     │
│ e54b8e086acd │ f3fddb71dd50 vfio/pci: Fill in the missing get_region_info ops   │ match      │ match   │ preserved + jan added     │
│ 702622746ce4 │ 5ac720647477 vfio/nvgrace: Convert to the get_region_info op     │ match      │ match   │ preserved + jan added     │
│ fad0d0d38ca4 │ c044eefa4786 vfio/virtio: Convert to the get_region_info op      │ match      │ match   │ preserved + jan added     │
│ 6b97c1b33bef │ e238f147d517 vfio/hisi: Convert to the get_region_info op        │ match      │ match   │ preserved + jan added     │
│ 897cefa739f7 │ 113557b04068 vfio: Provide a get_region_info op                  │ match      │ match   │ preserved + jan added     │
│ 449e051b54c2 │ 767b1ed8b980 vfio/nvgrace-gpu: fix grammatical error             │ match      │ match   │ preserved + jan added     │
│ 0c7d38232410 │ 2131c1517f30 hisi_acc_vfio_pci: adapt to new migration configura │ match      │ match   │ preserved + jan added     │
│ 38c6eb3eed52 │ 4868d2d52df6 crypto: hisilicon - qm updates BAR configuration    │ match      │ match   │ preserved + jan added     │
└──────────────┴──────────────────────────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘
Lint: all checks passed.
The word "as" in the comment should be replaced with "is", and there is an extra space in the comment.
Signed-off-by: Morduan Zang <zhangdandan@uniontech.com>
Reviewed-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/54E1ED6C5A2682C8+20250814110358.285412-1-zhangdandan@uniontech.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
(cherry picked from commit 767b1ed)
Signed-off-by: Jiandi An <jan@nvidia.com>
Instead of hooking the general ioctl op, have the core code directly decode VFIO_DEVICE_GET_REGION_INFO and call an op just for it.
This is intended to allow mechanical changes to the drivers to pull their VFIO_DEVICE_GET_REGION_INFO handling into a function. Later patches will improve the function signature to consolidate more code.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 113557b)
Signed-off-by: Jiandi An <jan@nvidia.com>
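The dispatch change described above can be pictured with a small userspace model. This is a minimal sketch, not the kernel code: every `demo_` name, the command numbers, and the struct layouts are hypothetical stand-ins, and the fall-through to the catch-all `ioctl` op models the transitional fallback that a later commit in this series removes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command numbers; the real VFIO ioctl values differ. */
enum demo_cmd { DEMO_GET_REGION_INFO = 1, DEMO_OTHER = 2 };

/* Hypothetical, heavily reduced region-info payload. */
struct demo_region_info {
	unsigned int index;
	unsigned long long size;
};

struct demo_device;

struct demo_device_ops {
	/* Catch-all handler, as used before the refactor. */
	int (*ioctl)(struct demo_device *dev, unsigned int cmd, void *arg);
	/* Dedicated handler that the core calls for region-info requests. */
	int (*get_region_info)(struct demo_device *dev,
			       struct demo_region_info *info);
};

struct demo_device {
	const struct demo_device_ops *ops;
};

/* Core dispatcher: decode the region-info command here, pass the rest on. */
static int demo_core_ioctl(struct demo_device *dev, unsigned int cmd, void *arg)
{
	if (cmd == DEMO_GET_REGION_INFO && dev->ops->get_region_info)
		return dev->ops->get_region_info(dev, arg);
	if (dev->ops->ioctl)
		return dev->ops->ioctl(dev, cmd, arg);
	return -ENOTTY;
}

/* Example driver callback: only region 0 exists, and it is 4 KiB. */
static int demo_drv_get_region_info(struct demo_device *dev,
				    struct demo_region_info *info)
{
	(void)dev;
	if (info->index != 0)
		return -EINVAL;
	info->size = 4096;
	return 0;
}

static const struct demo_device_ops demo_drv_ops = {
	.get_region_info = demo_drv_get_region_info,
};
```

A driver that fills in only `get_region_info` no longer needs to decode the command itself; in this model, any other command on such a driver returns `-ENOTTY`.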
Change the function signature of hisi_acc_vfio_pci_ioctl() and re-indent it.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(backported from commit e238f14)
[jan: resolve minor conflict in hisi_acc_vfio_pci_ioctl()]
Signed-off-by: Jiandi An <jan@nvidia.com>
Remove virtiovf_vfio_pci_core_ioctl() and change the signature of virtiovf_pci_ioctl_get_region_info().
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit c044eef)
Signed-off-by: Jiandi An <jan@nvidia.com>
Change the signature of nvgrace_gpu_ioctl_get_region_info().
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 5ac7206)
Signed-off-by: Jiandi An <jan@nvidia.com>
Now that every variant driver provides a get_region_info op, remove the ioctl-based dispatch from vfio_pci_core_ioctl().
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit f3fddb7)
Signed-off-by: Jiandi An <jan@nvidia.com>
Move it out of mtty_ioctl() and re-indent it.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 0787755)
Signed-off-by: Jiandi An <jan@nvidia.com>
Move it out of mdpy_ioctl() and re-indent it.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit cf16acc)
Signed-off-by: Jiandi An <jan@nvidia.com>
Move it out of mbochs_ioctl() and re-indent it.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/8-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 8339fcc)
Signed-off-by: Jiandi An <jan@nvidia.com>
Move it out of vfio_platform_ioctl() and re-indent it. Add it to all platform drivers.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/9-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit d4635df)
Signed-off-by: Jiandi An <jan@nvidia.com>
Move it out of vfio_fsl_mc_ioctl() and re-indent it.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/10-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 6cdae5d)
Signed-off-by: Jiandi An <jan@nvidia.com>
Change the signature of vfio_cdx_ioctl_get_region_info() and hook it to the op.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/11-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit b9827ef)
Signed-off-by: Jiandi An <jan@nvidia.com>
Move it out of vfio_ccw_mdev_ioctl() and re-indent it.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/12-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 61b3f7b)
Signed-off-by: Jiandi An <jan@nvidia.com>
Move it out of intel_vgpu_ioctl() and re-indent it.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/13-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit e664067)
Signed-off-by: Jiandi An <jan@nvidia.com>
Remove the fallback through the ioctl callback; no drivers use it now.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/14-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit f978595)
Signed-off-by: Jiandi An <jan@nvidia.com>
This op does the copy to/from user for the info and can return a cap chain through a vfio_info_cap * result.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/15-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 775f726)
Signed-off-by: Jiandi An <jan@nvidia.com>
This driver open-codes the cap chain manipulations. Instead, use vfio_info_add_capability() and the get_region_info_caps() op.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/16-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 45f9fa1)
Signed-off-by: Jiandi An <jan@nvidia.com>
Remove the duplicate code and change info to a pointer.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/17-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 9316575)
Signed-off-by: Jiandi An <jan@nvidia.com>
Remove the duplicate code and flatten the call chain.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/18-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 973af0c)
Signed-off-by: Jiandi An <jan@nvidia.com>
Since the core function signature changes, the change has to flow up to all drivers.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/19-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 1b0ecb5)
Signed-off-by: Jiandi An <jan@nvidia.com>
Remove the duplicate code and change info to a pointer. The caps are not used.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 182c628)
Signed-off-by: Jiandi An <jan@nvidia.com>
Remove the duplicate code and change info to a pointer. The caps are not used.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/21-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit dc10734)
Signed-off-by: Jiandi An <jan@nvidia.com>
No driver uses it now; all are using get_region_info_caps().
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/22-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 56c0693)
Signed-off-by: Jiandi An <jan@nvidia.com>
cxl_probe_component_regs() finds the HDM decoder block during device probe and caches its location, but does not record the decoder count and does not expose the result outside drivers/cxl/. vfio-cxl needs the decoder count and the byte offset and size of the HDM block without re-running the probe sequence.
Record decoder_cnt in rmap->count when parsing the HDM capability in cxl_probe_component_regs(), extend struct cxl_reg_map with a count member, and add cxl_get_hdm_info() to return offset, size, and count from the cached map. Export under the CXL namespace; stub to -EOPNOTSUPP when CONFIG_CXL_BUS is off.
Co-developed-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
…nent_regs in public header
vfio-cxl lives outside drivers/cxl/ but still needs to locate the component register block and fill cxl_component_reg_map. Those prototypes were stuck in the internal drivers/cxl/cxl.h.
Move the declarations to include/cxl/cxl.h next to the other vfio-facing hooks, with stubs when CXL bus support is disabled. Drop the duplicate prototypes from the private header.
Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
[jan: Move cxl_probe_component_regs() to include/cxl/pci.h instead of include/cxl/cxl.h to align with existing Srirangan/Alejandro convention; skip cxl_find_regblock() move as it is already in include/cxl/pci.h; add struct cxl_component_reg_map forward declaration]
Signed-off-by: Jiandi An <jan@nvidia.com>
3eede80 to 502020b
I didn't spot anything concerning.
Boro watcher review skipped
The GitHub watcher skips automatic boro reviews for PRs with more than 50 commits. This PR currently has 51 commits. To run the review anyway, ask Head:
This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher sees a newer PR head.
…xl/cxl_regs.h
VFIO and other code outside the CXL core needs the same offset/mask constants the core uses for the component register block and HDM decoders.
Pull them into a new include/uapi/cxl/cxl_regs.h (GPL-2.0 WITH Linux-syscall-note) and include it from include/cxl/cxl.h. Use the uapi-friendly __GENMASK helpers where needed. Section comments in the new file reference CXL spec r4.0 numbering.
For the UAPI change, SZ_64K is replaced with its literal value because the macro is not available to userspace programs.
Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
[jan: Remove defines from include/cxl/cxl.h instead of drivers/cxl/cxl.h as they were already moved there by Srirangan's SAUCE commit; add #include <asm/bitsperlong.h> needed by __GENMASK() in uapi header]
Signed-off-by: Jiandi An <jan@nvidia.com>
…dy wait
Before accessing CXL device memory after reset/power-on, the driver must ensure media is ready. Not every CXL device implements the CXL Memory Device register group (many Type-2 devices do not). cxl_await_media_ready() reads cxlds->regs.memdev, and accessing the memory device registers on a Type-2 device may result in a kernel panic.
Split the HDM DVSEC range-active poll out of cxl_await_media_ready() into a new function, cxl_await_range_active(). Type-2 devices often lack the CXLMDEV status register, so they need the range check without the memdev read.
cxl_await_media_ready() now calls cxl_await_range_active() for the DVSEC poll, then reads the memory device status as before.
Co-developed-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Manish Honap <mhonap@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
[jan: Add cxl_await_range_active() declaration to include/cxl/pci.h unconditionally instead of include/cxl/cxl.h with CONFIG_CXL_BUS guards, consistent with existing convention]
Signed-off-by: Jiandi An <jan@nvidia.com>
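The shape of this split can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions: the `demo_` struct and functions are hypothetical, the real helpers poll hardware with timeouts, and the `-ENXIO` return merely stands in for the invalid register access that the commit message describes on devices without the memory device register group.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model; not the kernel's struct cxl_dev_state. */
struct demo_cxl_dev {
	bool range_active;       /* outcome of the HDM DVSEC range-active poll */
	bool has_memdev_regs;    /* false for many Type-2 devices */
	bool memdev_media_ready; /* memory device status, if the registers exist */
};

/* Range poll only: usable on devices without memory device registers. */
static int demo_await_range_active(const struct demo_cxl_dev *dev)
{
	return dev->range_active ? 0 : -ETIMEDOUT;
}

/* Full wait: the range poll first, then the memory device status read. */
static int demo_await_media_ready(const struct demo_cxl_dev *dev)
{
	int rc = demo_await_range_active(dev);

	if (rc)
		return rc;
	/* Modeling assumption: flag the access that would be invalid. */
	if (!dev->has_memdev_regs)
		return -ENXIO;
	return dev->memdev_media_ready ? 0 : -EIO;
}
```

In this model a Type-2 caller uses only `demo_await_range_active()`, while callers of the full wait see the same sequence as before the split.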
The Register Locator DVSEC (CXL 4.0 8.1.9) describes register blocks by BAR index (BIR) and offset within the BAR. CXL core currently only stores the resolved HPA (resource + offset) in struct cxl_register_map, so callers that need to use pci_iomap() or report the BAR to userspace must reverse-engineer the BAR from the HPA.
Add bar_index and bar_offset to struct cxl_register_map and fill them in cxl_decode_regblock() when the regblock is BAR-backed (BIR 0-5). Add cxl_regblock_get_bar_info() so callers (e.g. vfio-cxl) can get the BAR index and offset directly and use pci_iomap() instead of ioremap(HPA); it returns -EINVAL if the map is not BAR-backed.
Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
[jan: Add cxl_regblock_get_bar_info() declaration to include/cxl/pci.h unconditionally instead of include/cxl/cxl.h with CONFIG_CXL_BUS guards, consistent with existing convention; add BIR range validation (reject BIR >= PCI_STD_NUM_BARS) and bar_index bounds check in cxl_regblock_get_bar_info()]
Signed-off-by: Jiandi An <jan@nvidia.com>
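As an illustration of what recording the BIR and offset involves, here is a minimal userspace sketch of decoding one Register Locator entry. The field positions (BIR in bits[2:0], block identifier in bits[15:8], offset low in bits[31:16], offset high in the second dword) are my reading of the entry layout and should be checked against the specification; the `demo_` names are hypothetical and this is not the kernel's cxl_decode_regblock().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_PCI_STD_NUM_BARS 6 /* standard PCI BARs 0..5 */

/* Hypothetical result type; the kernel stores these in cxl_register_map. */
struct demo_regblock {
	uint8_t  bar_index;  /* BIR: which BAR backs the register block */
	uint8_t  block_id;   /* register block identifier */
	uint64_t bar_offset; /* byte offset of the block within that BAR */
};

/*
 * Decode one Register Locator entry from its low and high dwords.
 * Returns 0 on success, or -EINVAL if the BIR is not a standard BAR.
 */
static int demo_decode_regblock(uint32_t reg_lo, uint32_t reg_hi,
				struct demo_regblock *out)
{
	uint32_t bir = reg_lo & 0x7u; /* assumed bits[2:0] */

	if (bir >= DEMO_PCI_STD_NUM_BARS)
		return -EINVAL;

	out->bar_index = (uint8_t)bir;
	out->block_id = (uint8_t)((reg_lo >> 8) & 0xffu); /* assumed bits[15:8] */
	/* Offset: high dword in the upper half, bits[31:16] of the low dword. */
	out->bar_offset = ((uint64_t)reg_hi << 32) | (reg_lo & 0xffff0000u);
	return 0;
}
```

With the BAR index and offset in hand, a caller can map the BAR and add the offset, rather than working backwards from a resolved physical address.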
Vendor GPUs and accelerators can expose CXL.mem (HDM-D or HDM-DB)
without using PCI class code 0x0502. VMMs need a stable way to learn
DPA sizing, firmware commit state, and where the extra VFIO regions live.
Add VFIO_DEVICE_FLAGS_CXL (bit 9) and VFIO_DEVICE_INFO_CAP_CXL (cap ID 6).
The capability struct carries:
  hdm_regs_bar_index      PCI BAR containing the component register block
  hdm_regs_offset         byte offset within that BAR to the CXL.mem area
                          (comp_reg_offset + CXL_CM_OFFSET)
  dpa_region_index        VFIO region index for the DPA window
  comp_regs_region_index  VFIO region index for the emulated COMP_REGS
HDM decoder count and the HDM block offset within COMP_REGS are
intentionally absent; both are derivable from the CXL Capability Array at
COMP_REGS offset 0. Locate cap ID 0x5 (HDM) and read bits[31:20] of its
entry for the byte offset. Then read bits[3:0] of the HDM Decoder Capability
register for the count: count = (field == 0) ? 1 : field * 2.
Two flags accompany the capability:
  VFIO_CXL_CAP_FIRMWARE_COMMITTED
    A decoder covering @dpa_size bytes was programmed and committed by
    platform firmware before device open. The VMM can use the DPA region
    immediately without re-committing.
  VFIO_CXL_CAP_CACHE_CAPABLE
    The device is HDM-DB (CXL.mem + CXL.cache). HDM-DB requires a
    Write-Back Invalidation sequence before FLR to flush dirty cache
    lines; HDM-D (CXL.mem only) does not. QEMU uses this flag to
    schedule WBI and to report Back-Invalidation capability accurately
    in the virtual CXL topology. Mirrors the Cache_Capable bit from
    the CXL DVSEC Capability register.
Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
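The capability-array walk that the VFIO_DEVICE_INFO_CAP_CXL commit message describes could look roughly like the following from the VMM side. This is a sketch only: it assumes the COMP_REGS region has already been read into a buffer of 32-bit words, that the array header keeps its entry count in bits[31:24], and that each entry keeps its capability ID in bits[15:0]. `find_hdm_offset()` and `hdm_decoder_count()` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

#define CXL_CAP_ID_HDM 0x5u  /* HDM Decoder capability ID, per the text above */

/*
 * regs: snapshot of the COMP_REGS region as 32-bit words, with regs[0]
 * being the Capability Array header. Returns the byte offset of the HDM
 * Decoder Capability block, or 0 if no such entry is found.
 */
static uint32_t find_hdm_offset(const uint32_t *regs, size_t nwords)
{
    uint32_t entries, i;

    if (nwords == 0)
        return 0;

    entries = regs[0] >> 24;            /* assumed: header bits[31:24] */
    for (i = 1; i <= entries && i < nwords; i++) {
        if ((regs[i] & 0xffffu) == CXL_CAP_ID_HDM)
            return regs[i] >> 20;       /* entry bits[31:20]: byte offset */
    }
    return 0;
}

/* Decoder count from bits[3:0] of the HDM Decoder Capability register. */
static unsigned int hdm_decoder_count(uint32_t hdm_cap)
{
    uint32_t field = hdm_cap & 0xfu;

    return field == 0 ? 1 : field * 2;
}
```

A caller would index the same buffer at `find_hdm_offset(...) / 4` to fetch the HDM Decoder Capability register and pass it to `hdm_decoder_count()`.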
Add struct vfio_pci_cxl_state and hang a pointer to it off vfio_pci_core_device. vdev->cxl stays NULL for non-CXL devices, so existing vfio-pci-core paths just pay a NULL check. The new struct embeds struct cxl_dev_state by value (CXL core uses container_of() against this field) and stores pointers to the cxl_memdev, root decoder, and endpoint decoder that the CXL core owns. cxl_region is not introduced here; it is added later when region management lands. The series builds the CXL Type-2 passthrough path inside vfio-pci-core rather than in a separate variant driver. Signed-off-by: Manish Honap <mhonap@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/) [jan: Resolve context mismatch in vfio_pci_core.h; add #include <cxl/pci.h> to vfio_cxl_priv.h for cxl_find_regblock/cxl_probe_component_regs declarations] Signed-off-by: Jiandi An <jan@nvidia.com>
Introduce the Kconfig option CONFIG_VFIO_CXL_CORE and the necessary build rules to compile CXL.mem passthrough infrastructure for vendor-specific CXL devices into the vfio-pci-core module. The new option depends on VFIO_PCI_CORE, CXL_BUS and CXL_MEM. Wire up the detection and cleanup entry-point stubs in vfio_pci_core_register_device() and vfio_pci_core_unregister_device() so that subsequent patches can fill in the CXL-specific logic without touching the vfio-pci-core flow again. The vfio_cxl_core.c file added here is an empty skeleton; the actual CXL detection and initialisation code is introduced in the following patch to keep this build-system patch reviewable on its own. Signed-off-by: Manish Honap <mhonap@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/) [jan: Resolve context mismatches in Kconfig, Makefile, and vfio_pci_priv.h due to missing upstream xe/dmabuf support in NV-Kernels base] Signed-off-by: Jiandi An <jan@nvidia.com>
Detect a vendor-specific CXL device at vfio-pci bind time and probe its HDM decoder register block. vfio_cxl_create_device_state() allocates per-device state via devm and reads MEM_CAPABLE and CACHE_CAPABLE from the CXL DVSEC. vfio_cxl_setup_regs() locates the component register block, temporarily maps it, calls cxl_probe_component_regs() to find the HDM block, then releases the mapping. vfio_pci_cxl_detect_and_init() chains these two steps. If either fails, vdev->cxl stays NULL and the device falls back to plain vfio-pci. Signed-off-by: Manish Honap <mhonap@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/) [jan: Use pci_get_dsn() instead of pdev->dev.id for cxlds serial; expand comment explaining why] Signed-off-by: Jiandi An <jan@nvidia.com>
Promote vfio_raw_config_write() and vfio_raw_config_read() to non-static so that the CXL DVSEC write handler in the next patch can call them. Signed-off-by: Manish Honap <mhonap@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>
… framework Add HDM decoder register emulation for CXL devices assigned to a guest. New file vfio_cxl_emu.c allocates comp_reg_virt[] covering the full component register block (CXL_COMPONENT_REG_BLOCK_SIZE), snapshots it from MMIO after probe, and registers a VFIO device region (VFIO_REGION_SUBTYPE_CXL_COMP_REGS) with read/write ops but no mmap, so every access hits the emulated buffer and write dispatchers. vfio_cxl_setup_virt_regs() is called from the tail of vfio_cxl_setup_regs(); vfio_cxl_clean_virt_regs() runs on cleanup. HDM decoder register defines come from include/uapi/cxl/cxl_regs.h. Bits with no hardware equivalent stay in vfio_cxl_priv.h. hdm_decoder_n_ctrl_write() allows the guest to clear the LOCK bit. A firmware-committed decoder arrives with LOCK=1; the guest driver must clear it before reprogramming BASE and SIZE with the VM's GPA. Such a write clears the bit in the shadow while preserving all other fields. Co-developed-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Manish Honap <mhonap@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/) [jan: Resolve Makefile context mismatch due to missing upstream dmabuf support in NV-Kernels base, Add CTRL LOCK enforcement in BASE_LO/SIZE_LO writes, BI bit masking for non-cache-capable devices, pass max_size to vfio_cxl_setup_virt_regs() for bounds check, add vfio_pci_cxl_cleanup() in registration error path] Signed-off-by: Jiandi An <jan@nvidia.com>
After HDM registers are mapped, call cxl_await_range_active() so we only proceed when DVSEC ranges report active without touching the memdev register group Type-2 may lack. Re-snapshot component regs (vfio_cxl_reinit_comp_regs) once MEM_ACTIVE so firmware final SIZE_HIGH etc. land in comp_reg_virt. Read committed decoder size from hardware, set capacity via cxl_set_capacity(), and devm_cxl_add_memdev(). Signed-off-by: Manish Honap <mhonap@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/) [jan: Line offset adjustments only (cascading from 0011 changes)] Signed-off-by: Jiandi An <jan@nvidia.com>
Region Management makes use of APIs provided by CXL_CORE as below:
CREATE_REGION flow:
1. Validate request (size, decoder availability)
2. Allocate HPA via cxl_get_hpa_freespace()
3. Allocate DPA via cxl_request_dpa()
4. Create region via cxl_create_region() - commits HDM decoder
5. Get HPA range via cxl_get_region_range()
DESTROY_REGION flow:
1. Detach decoder via cxl_decoder_detach()
2. Free DPA via cxl_dpa_free()
3. Release root decoder via cxl_put_root_decoder()
Use DEFINE_FREE scope helpers so error paths unwind cleanly.
Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
[jan: Add borrowed-reference comment for precommitted decoders, init region to NULL, don't unregister precommitted regions in teardown]
Signed-off-by: Jiandi An <jan@nvidia.com>
…nd reset zap
Wire the CXL DPA range up as a VFIO demand-paged region so QEMU can mmap guest device memory directly. Faults call vmf_insert_pfn() to insert one PFN at a time rather than mapping the full range upfront.
CXL region lifecycle:
- The CXL memory region is registered with VFIO layer during vfio_pci_open_device
- mmap() establishes the VMA with vm_ops but inserts no PTEs
- Each guest page fault calls vfio_cxl_region_page_fault() which inserts a single PFN under the memory_lock read side
- On device reset, vfio_cxl_zap_region_locked() sets region_active=false and calls unmap_mapping_range() to invalidate all DPA PTEs atomically while holding memory_lock for writing
- Faults racing with reset see region_active==false and return VM_FAULT_SIGBUS
- vfio_cxl_reactivate_region() restores region_active after successful hardware reset
Also integrate the zap/reactivate calls into vfio_pci_ioctl_reset() so that FLR correctly invalidates DPA mappings and restores them on success.
Co-developed-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
[jan: Resolve context mismatches in vfio_pci_core.c and vfio_pci_priv.h due to missing upstream dmabuf support in NV-Kernels base, Add vdev back-pointer in cxl_state, hold memory_lock read-side in fault/rw paths, advance *ppos in region rw, add vfio_direct_config_read export and use it instead of vfio_raw_config_read in DVSEC fallback]
Signed-off-by: Jiandi An <jan@nvidia.com>
CXL devices expose DVSEC registers in PCI configuration space. Several
of them affect device behavior (CXL.io/CXL.mem/CXL.cache enables, lock
state, range bases) and must be virtualised so the guest cannot disturb
host-owned policy.
Add CXL-aware read and write handlers that operate on vdev->vconfig:
- DVSEC reads come back from the vconfig shadow that vfio_config_init()
already populates via vfio_ecap_init().
- DVSEC writes go through per-register handlers (cxl_dvsec_*_write)
which apply the spec-defined reserved-bit and lock-bit masking
before updating the shadow.
- The handlers are wired in via vdev->dvsec_readfn / dvsec_writefn,
which the global ecap_perms[PCI_EXT_CAP_ID_DVSEC] dispatcher routes
to when the device is a CXL device. Non-CXL devices with a DVSEC
capability fall through to direct hardware access.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
[jan: Resolve context mismatches in Makefile and vfio_pci_core.h due to missing upstream dmabuf/p2pdma forward declarations in NV-Kernels base, Carry Disable_Caching into Cache WBI hardware write, use vfio_direct_config_read fallback, add byte-aligned read/write routing for DVSEC registers, handle partial-byte W1C writes for STATUS/STATUS2, add PM_INIT_COMPLETION RW1CS handling]
Signed-off-by: Jiandi An <jan@nvidia.com>
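The reserved-bit and lock-bit masking that the DVSEC write handlers apply to the vconfig shadow can be pictured with a small generic helper. This is a sketch under stated assumptions, not the series' code: `struct dvsec_reg_rules`, `dvsec_shadow_write()`, and the idea of describing each register with three masks are hypothetical, and the real per-register handlers (cxl_dvsec_*_write) may be structured quite differently.

```c
#include <stdint.h>

/* Hypothetical per-register description; real masks come from the spec. */
struct dvsec_reg_rules {
    uint16_t rw_mask;    /* bits the guest may read and write */
    uint16_t w1c_mask;   /* bits cleared by writing 1 */
    uint16_t lock_mask;  /* subset of rw_mask frozen once the lock is set */
};

/* Returns the new shadow value after a guest write of 'val'. */
static uint16_t dvsec_shadow_write(uint16_t shadow, uint16_t val,
                                   const struct dvsec_reg_rules *r,
                                   int locked)
{
    uint16_t rw = r->rw_mask;

    /* Once locked, the lock-protected bits become read-only. */
    if (locked)
        rw &= (uint16_t)~r->lock_mask;

    /* Writable bits take the guest value; reserved bits keep the shadow. */
    shadow = (uint16_t)((shadow & ~rw) | (val & rw));

    /* Write-1-to-clear bits drop only where the guest wrote a 1. */
    shadow &= (uint16_t)~(val & r->w1c_mask);

    return shadow;
}
```

Keeping the policy in a mask table like this makes it easy to audit which bits a guest can ever change, which is the point of virtualising these registers.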
Register the DPA and component register region with VFIO layer. Region indices for both these regions are cached for quick lookup.
vfio_cxl_register_cxl_region()
- memremap(WB) the region HPA (treat CXL.mem as RAM, not MMIO)
- Register VFIO_REGION_SUBTYPE_CXL
- Records dpa_region_idx.
vfio_cxl_register_comp_regs_region()
- Registers VFIO_REGION_SUBTYPE_CXL_COMP_REGS with size hdm_reg_offset + hdm_reg_size
- Records comp_reg_region_idx.
Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
[jan: Check HDM COMMITTED bit before activating DPA region on precommitted decoders, add pm_runtime/memory-enabled gate in fault and rw paths, split vfio_cxl_zap_dpa() from prepare_reset(), add DPA zap in vfio_pci_zap_and_down_write_memory_lock(), add hot-reset CXL prepare/finish passes]
Signed-off-by: Jiandi An <jan@nvidia.com>
…AR to userspace Expose CXL device capability through the VFIO device info ioctl and give userspace mmap access to the GPU/accelerator register windows in the component BAR while keeping the CXL component register block off-limits to user mappings. vfio_cxl_get_info() fills VFIO_DEVICE_INFO_CAP_CXL with the HDM register BAR index and byte offset, commit flags, and VFIO region indices for the DPA and COMP_REGS regions. HDM decoder count and the HDM block offset within COMP_REGS are not populated; both are derivable from the CXL Capability Array in the COMP_REGS region itself. vfio_cxl_get_region_info() handles VFIO_DEVICE_GET_REGION_INFO for the component register BAR. It builds a sparse-mmap capability that advertises only the GPU/accelerator register windows, carving out the CXL component register block. Three physical layouts are handled: Topology A comp block at BAR end: one area [0, comp_reg_offset) Topology B comp block at BAR start: one area [comp_end, bar_len) Topology C comp block in the middle: two areas, one on each side vfio_cxl_mmap_overlaps_comp_regs() checks whether an mmap request overlaps [comp_reg_offset, comp_reg_offset + comp_reg_size). vfio_pci_core_mmap() calls it to reject mmap of the component register block while allowing mmap of the GPU register windows in the sparse capability. This replaces the earlier blanket rejection of any mmap on the component BAR index. vfio_pci_bar_rw() applies the same overlap check, so fd pread()/pwrite() on the component BAR is also rejected when it would touch the component register subrange. All access to those registers goes through the dedicated COMP_REGS region, where the emulated HDM shadow lives. Hook both helpers into vfio_pci_ioctl_get_info() and vfio_pci_ioctl_get_region_info() in vfio_pci_core.c. The component BAR cannot be claimed exclusively since the CXL subsystem holds persistent sub-range iomem claims during HDM decoder setup. 
pci_request_selected_regions() returns EBUSY; pass bars=0 to skip the request and map directly via pci_iomap(). Physical ownership is assured by driver binding. Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Manish Honap <mhonap@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/) [jan: Add BAR bounds check for component block, handle full-BAR component reg case, add bar_mmap_supported gate, block BAR fd read/write and ioeventfd in component reg subrange] Signed-off-by: Jiandi An <jan@nvidia.com>
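The three BAR topologies and the overlap check from the commit message above reduce to simple interval arithmetic. The sketch below is an illustrative model, not the series' implementation: `sparse_areas()`, `overlaps_comp_regs()`, and `struct mmap_area` are hypothetical names, and the full-BAR case (no mmappable area at all) is handled by returning zero areas.

```c
#include <stdint.h>

/* Hypothetical stand-in for one sparse-mmap area. */
struct mmap_area {
    uint64_t offset;
    uint64_t size;
};

/*
 * Fill areas[] with the BAR ranges that lie outside the component
 * register block. Returns the number of areas (0, 1 or 2), or -1 when
 * the block does not fit inside the BAR.
 */
static int sparse_areas(uint64_t bar_len, uint64_t comp_off,
                        uint64_t comp_size, struct mmap_area areas[2])
{
    uint64_t comp_end;
    int n = 0;

    if (comp_off > bar_len || comp_size > bar_len - comp_off)
        return -1;
    comp_end = comp_off + comp_size;

    if (comp_off > 0)           /* topology A or C: window before block */
        areas[n++] = (struct mmap_area){ 0, comp_off };
    if (comp_end < bar_len)     /* topology B or C: window after block */
        areas[n++] = (struct mmap_area){ comp_end, bar_len - comp_end };
    return n;
}

/* Nonzero when [req_off, req_off + req_len) touches the component block. */
static int overlaps_comp_regs(uint64_t req_off, uint64_t req_len,
                              uint64_t comp_off, uint64_t comp_size)
{
    return req_len != 0 &&
           req_off < comp_off + comp_size &&
           comp_off < req_off + req_len;
}
```

The same overlap predicate can serve both the mmap path and the read/write path, which keeps the two rejections consistent.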
This commit provides an opt-out mechanism to disable CXL support in the vfio module. The opt-out is available at both build time and module load time. The build-time option CONFIG_VFIO_CXL_CORE enables or disables CXL support in the vfio-pci module. To disable CXL support at runtime, use the module parameter disable_cxl. This is a per-device opt-out on the core device set by the driver before registration. Signed-off-by: Manish Honap <mhonap@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/) [jan: Resolve context mismatch in vfio_pci.c probe function due to missing upstream pci_ops assignment in NV-Kernels base, Wrap disable_cxl field in #if IS_ENABLED(CONFIG_VFIO_CXL_CORE), update MODULE_PARM_DESC wording] Signed-off-by: Jiandi An <jan@nvidia.com>
…ough Add Documentation/driver-api/vfio-pci-cxl.rst describing the architecture, VFIO interfaces, and operational constraints for CXL Type-2 (cache-coherent accelerator) passthrough via vfio-pci-core, and link it from the driver-api index. The document covers: - VFIO_DEVICE_FLAGS_CXL and VFIO_DEVICE_INFO_CAP_CXL: what the capability struct contains and what the FIRMWARE_COMMITTED and CACHE_CAPABLE flags mean - How to derive hdm_decoder_offset and hdm_count from the COMP_REGS region by traversing the CXL Capability Array to find cap ID 0x5 and reading the HDM Decoder Capability register - Topology-aware sparse mmap on the component BAR (topologies A, B, C covering comp block at end, start, or middle of the BAR) - Two extra VFIO device regions: COMP_REGS for the emulated HDM register state and the DPA memory window - DVSEC config write virtualization: what the guest sees vs. hardware - FLR coordination: DPA PTEs zapped before reset, restored after Signed-off-by: Manish Honap <mhonap@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/) [jan: Rename vfio_cxl_zap_region_locked to vfio_cxl_prepare_reset and vfio_cxl_reactivate_region to vfio_cxl_finish_reset in docs] Signed-off-by: Jiandi An <jan@nvidia.com>
Export two helpers for VFIO: - pci_cxl_reset_capable() - cxl_dev_reset() The change does not alter the reset flow itself, the capability checks, or the sysfs ABI. It only lifts the helper out of the private path so later VFIO patches can call the same code. Signed-off-by: Manish Honap <mhonap@nvidia.com> Signed-off-by: Jiandi An <jan@nvidia.com>
This change adds and renames vfio-cxl helpers to better suit the cxl-reset handling mechanism in later patches.
- Rename the CXL DPA region helpers to prepare_reset() and finish_reset() so call sites read as a matched pair around pci_try_reset_function(). Also call prepare_reset()/finish_reset() around pci_try_reset_function() in both the PCIe BCR FLR path and the Function FLR path, matching the logic already used on the VFIO_DEVICE_RESET ioctl path.
- When pci_try_reset_function() fails, finish_reset() consults the hardware COMMITTED state before re-enabling the DPA mapping, so it is safe on error and avoids leaving the DPA region wedged off after a transient reset failure.
- Add vfio_cxl_reset_capable(), a small wrapper over pci_cxl_reset_capable().
Signed-off-by: Manish Honap <mhonap@nvidia.com>
Signed-off-by: Jiandi An <jan@nvidia.com>
…e post-reset BAR access A reset caller may disable Memory Space to quiesce device DMA before issuing the reset. pci_try_reset_function() saves and restores PCI_COMMAND around the FLR. If the memory space was disabled before FLR, it will be restored in disabled state. vfio_cxl_finish_reset() reads HDM decoder registers through the component register BAR immediately after reset. Accessing a BAR with Memory Space disabled produces an Unsupported Request completion; on platforms that promote UR to a fatal error this triggers DPC. Add vfio_cxl_enable_memory_space() and call it at the start of vfio_cxl_finish_reset() before touching any BAR. Signed-off-by: Manish Honap <mhonap@nvidia.com> Signed-off-by: Jiandi An <jan@nvidia.com>
…ss reset After FLR, reinit_comp_regs() re-reads HDM decoder registers from hardware into comp_reg_virt[]. Hardware is not all-zeros at this point: pci_dev_restore() ran first and re-committed the pre-reset host-physical decoder bases into the registers. reinit_comp_regs() therefore overwrites the emulated guest-physical bases that the device manager programmed with the host-physical bases used by the host CXL core. The kernel provides no notification that BASE was overwritten, so the emulated GPA bases are silently lost. The same issue affects the CTRL LOCK bit: FLR clears it in hardware and pci_dev_restore() does not re-apply it, so a decoder that the guest had locked re-emerges from reset with LOCK clear in shadow. Add vfio_cxl_reinit_hdm_shadow() which snapshots BASE_LOW, BASE_HIGH, and the CTRL LOCK bit from the shadow before calling reinit_comp_regs(), then writes them back after, keeping the emulated decoder consistent with what the guest programmed. Signed-off-by: Manish Honap <mhonap@nvidia.com> Signed-off-by: Jiandi An <jan@nvidia.com>
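The save/re-read/restore sequence that the commit message above describes for vfio_cxl_reinit_hdm_shadow() can be modelled in userspace as follows. This is a sketch only: the decoder register offsets (0x20 stride, BASE_LOW at 0x10, BASE_HIGH at 0x14, CTRL at 0x20) and the LOCK bit position (bit 8) are what I recall from the HDM decoder layout and should be treated as assumptions, the `memcpy()` stands in for reinit_comp_regs(), and `reinit_hdm_shadow()`, `rd32()`, and `wr32()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed HDM decoder register layout, relative to the HDM block. */
#define HDM_BASE_LOW(i)   (0x20u * (i) + 0x10u)
#define HDM_BASE_HIGH(i)  (0x20u * (i) + 0x14u)
#define HDM_CTRL(i)       (0x20u * (i) + 0x20u)
#define HDM_CTRL_LOCK     (1u << 8)
#define MAX_DECODERS      16u

static uint32_t rd32(const uint8_t *buf, size_t off)
{
    uint32_t v;

    memcpy(&v, buf + off, sizeof(v));
    return v;
}

static void wr32(uint8_t *buf, size_t off, uint32_t v)
{
    memcpy(buf + off, &v, sizeof(v));
}

/* shadow: emulated HDM block; hw: post-reset hardware snapshot. */
static void reinit_hdm_shadow(uint8_t *shadow, const uint8_t *hw,
                              size_t len, unsigned int count)
{
    uint32_t lo[MAX_DECODERS], hi[MAX_DECODERS], lock[MAX_DECODERS];
    unsigned int i;

    /* Clamp so every register access below stays inside the block. */
    while (count > 0 &&
           (count > MAX_DECODERS || HDM_CTRL(count - 1) + 4 > len))
        count--;

    /* 1. Save what the guest programmed. */
    for (i = 0; i < count; i++) {
        lo[i]   = rd32(shadow, HDM_BASE_LOW(i));
        hi[i]   = rd32(shadow, HDM_BASE_HIGH(i));
        lock[i] = rd32(shadow, HDM_CTRL(i)) & HDM_CTRL_LOCK;
    }

    /* 2. Re-snapshot from hardware (stand-in for reinit_comp_regs()). */
    memcpy(shadow, hw, len);

    /* 3. Put the guest-visible bases and LOCK state back. */
    for (i = 0; i < count; i++) {
        uint32_t ctrl = rd32(shadow, HDM_CTRL(i)) & ~HDM_CTRL_LOCK;

        wr32(shadow, HDM_BASE_LOW(i), lo[i]);
        wr32(shadow, HDM_BASE_HIGH(i), hi[i]);
        wr32(shadow, HDM_CTRL(i), ctrl | lock[i]);
    }
}
```

Everything other than BASE_LOW, BASE_HIGH, and LOCK is taken from the fresh hardware snapshot, so fields such as SIZE still reflect post-reset hardware state.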
…nfig shadow STATUS2 was read directly from hardware while all other DVSEC registers were served from the vconfig shadow. This created two problems: 1. VOLATILE_HDM_PRES_ERROR (RW1CS, bit 3): guest writes cleared the hardware bit but the shadow was not updated, so subsequent reads still returned the set bit from hardware (which the hardware had cleared). 2. CXL_RESET_COMPLETE and CXL_RESET_ERROR (bits 1-2): these outcome bits will be written by vfio_cxl_reset() into the shadow after a protocol reset. Hardware does not update them on its own; serving reads from hardware would hide the outcome from the guest. Add STATUS2 to the read switch so reads come from the shadow, and update cxl_dvsec_status2_write() to mirror VOLATILE_HDM_PRES_ERROR clears into the shadow after forwarding to hardware. Signed-off-by: Manish Honap <mhonap@nvidia.com> Signed-off-by: Jiandi An <jan@nvidia.com>
Add vfio_cxl_reset() to drive a CXL protocol reset on behalf of a guest. Unlike cxl_do_reset(), this path skips host memory offlining since the DPA region is guest memory. The function takes memory_lock for the full sequence, calls vfio_cxl_prepare_reset() to zap DPA region PTEs, drives the hardware via pci_dev_save_and_disable() + cxl_dev_reset() + pci_dev_restore(), then calls vfio_cxl_finish_reset() to reinitialise emulated state. STATUS2 outcome bits (CXL_RESET_COMPLETE / CXL_RESET_ERROR) are written back to vconfig after the reset so the guest can poll for the result without reading hardware. cxl_save_dvsec() / cxl_restore_dvsec() cover CTRL, CTRL2, range_base_*, and LOCK; STATUS2 is not saved or restored across the reset, so the hardware value is re-read after restore (it will have both outcome bits clear) and the outcome is stamped on top. When the guest writes INIT_CXL_RST into DVSEC CONTROL2, invoke vfio_cxl_reset() to perform a CXL protocol reset. The bit is not forwarded to hardware; cxl_dev_reset() drives the reset sequence directly. Silently drop writes on devices that do not advertise RST_CAPABLE to avoid log noise for the reserved-bit case. Signed-off-by: Manish Honap <mhonap@nvidia.com> Signed-off-by: Jiandi An <jan@nvidia.com>
… passthrough Enable VFIO CXL core support on amd64 and arm64 to allow CXL Type-2 device passthrough via vfio-pci. Signed-off-by: Jiandi An <jan@nvidia.com>
aef7e33 to
74b6b99
The code looks good to me now, so just a few commit-hygiene comments. These commits seem to all be missing the "(backported from..." or "(cherry picked from..." information:
These 6 patches are Manish's VFIO CXL reset series, which has not been posted upstream. He sent it to me as a tarball. I believe it has been posted for internal review via the linux-upstream email alias. There is no source I can specify yet, so I am treating them as out-of-tree NVIDIA-developed patches for now.
I re-reviewed with codex comparing the latest branch with the snapshot from my prior review. No issues or concerns from me; my prior ack still stands. |
Description
This patch series adds VFIO CXL Type-2 device passthrough support to the nvidia-6.17 kernel, enabling CXL-capable accelerator devices to be assigned to virtual machines via VFIO. It includes:
get_region_info refactoring - Upstream series that splits VFIO_DEVICE_GET_REGION_INFO into its own driver op and introduces get_region_info_caps, which is a prerequisite for the CXL VFIO region implementation
Key Features Added:
cxl_dev_reset)
disable_cxl for per-device opt-out
include/uapi/cxl/cxl_regs.h) for CXL register defines
LP: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.17/+bug/2152222
Justification
VFIO CXL passthrough is required for assigning CXL Type-2 accelerator devices (GPUs, SmartNICs) to virtual machines:
Source
Patch Breakdown (51 patches):
torvalds/master (merged) get_region_info series torvalds/master (merged in v6.19)
Notes on upstream prerequisites (item 1):
Three upstream commits cherry-picked:
- 4868d2d52df6 ("crypto: hisilicon - qm updates BAR configuration")
- 2131c1517f30 ("hisi_acc_vfio_pci: adapt to new migration configuration")
- 767b1ed8b980 ("vfio/nvgrace-gpu: fix grammatical error")
The first two resolve a dependency for
e238f147d517 ("vfio/hisi: Convert to the get_region_info op"). The third fixes a pre-existing comment typo in
the nvgrace-gpu driver that would otherwise cause a patch-ID mismatch with
upstream
1b0ecb5baf4a ("vfio/pci: Convert all PCI drivers to get_region_info_caps").
Notes on the VFIO get_region_info series (item 2):
22 upstream commits from Jason Gunthorpe's series, already merged in v6.19:
https://lore.kernel.org/all/0-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com/
These refactor the VFIO region info infrastructure that the CXL VFIO
passthrough series depends on.
Notes on Manish's VFIO CXL series (item 3):
19 out of 20 patches ported from:
https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/
Patch 20/20 (selftests) was skipped as the upstream VFIO selftest
infrastructure (tools/testing/selftests/vfio/) is not present in
the NV-Kernels base.
Conflict resolutions were required for 10 of 19 patches due to the
NV-Kernels base diverging from upstream in two ways:
cxl_find_regblock, cxl_probe_component_regs, cxl_await_range_active, cxl_regblock_get_bar_info) are in include/cxl/pci.h unconditionally (per Srirangan/Alejandro convention from PR #342, "[linux-nvidia-6.17-next] Add CXL Type-2 device support, RAS error handling, reset, state save/restore, and interleaving support"),
rather than in include/cxl/cxl.h with CONFIG_CXL_BUS guards
as Manish's patches expect.
xe driver, dmabuf, and p2pdma support causes
context mismatches in Kconfig, Makefiles, and VFIO headers.
Notes on Manish's CXL reset series (item 4):
6 patches from internal RFC-v2 posting:
Patch 1/6 had a conflict resolution identical to item 3 (declarations
added to include/cxl/pci.h instead of include/cxl/cxl.h).
Lore Links:
Jason Gunthorpe's VFIO get_region_info series (v2, merged in v6.19):
https://lore.kernel.org/all/0-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com/
Manish Honap's VFIO CXL Type-2 passthrough series (v2):
https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/
Upstream Status:
torvalds/master
torvalds/master (v6.19)
Testing
Build Validation:
Config Verification:
CXL VFIO config enabled:
Runtime Testing:
Notes
CONFIG_VFIO_CXL_CORE is a new bool config enabled for both amd64 and arm64. It depends on VFIO_PCI_CORE (module), CXL_BUS (built-in), and CXL_MEM (built-in). As a bool, it compiles into the vfio-pci-core module.
(Alejandro's v23, Srirangan's save/restore and reset series).
include/uapi/cxl/cxl_regs.h is introduced for CXL component and HDM register defines, using UAPI-safe macros (__GENMASK, _BITUL) and raw hex sizes instead of kernel-internal SZ_* macros.
intentionally skipped as the upstream VFIO selftest infrastructure is not present in the NV-Kernels base.