Provide nvidia-imex system service#428

Merged
arnaldo2792 merged 5 commits into bottlerocket-os:develop from arnaldo2792:imex-channels on May 11, 2026

@arnaldo2792
Contributor

Description of changes:

This series adds a new inactive systemd service for nvidia-imex. The service is inactive (not started on boot) because downstreams are expected to manage its lifecycle: nvidia-imex requires a configuration file with the IPs of the nodes that belong to the same cluster, and those IPs are known only to the control plane and the capacity provider. In Kubernetes, nvidia-imex channels are managed by the NVIDIA DRA driver, so this service is intended for use by other orchestrators.
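
As a rough illustration of the lifecycle described above, a downstream orchestrator could write the peer IPs and then start the unit itself. This is a minimal sketch and not part of this PR: the helper name `write_imex_nodes_config` and the example IPs are hypothetical, and the one-IP-per-line layout of `nodes_config.cfg` is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one node IP per line to the IMEX nodes file.
# ASSUMPTION: nodes_config.cfg holds one IP address per line.
# Usage: write_imex_nodes_config <file> <ip>...
write_imex_nodes_config() {
    local file="$1"
    shift
    # Refuse to write an empty node list.
    [ "$#" -gt 0 ] || return 1
    mkdir -p "$(dirname "$file")"
    printf '%s\n' "$@" > "$file"
}

# Example (needs root and the nvidia-imex unit from this PR):
#   write_imex_nodes_config /etc/nvidia-imex/nodes_config.cfg 10.0.0.11 10.0.0.12
#   systemctl start nvidia-imex.service
```

Keeping the start step outside the helper mirrors the intent of the series: the OS ships the unit, and the orchestrator decides when to run it.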

As part of this change, a new modprobe override was added that allows the NVIDIA kmods to create a default IMEX channel. This configuration is opt-in, as enabling it by default in all variants could interfere with the IMEX channel management performed by the NVIDIA DRA driver.
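
For readers unfamiliar with modprobe overrides, the sketch below shows roughly what such a drop-in could look like. It is an illustration only: the description does not name the module parameter, so `NVreg_CreateImexChannel0` is an assumption, and the helper name and the file name `nvidia-imex-channel.conf` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a modprobe drop-in asking the nvidia module
# to create IMEX channel 0 at load time.
# ASSUMPTION: NVreg_CreateImexChannel0 is the relevant module parameter;
# the PR text does not name it, and the drop-in file name is made up.
# Usage: write_imex_modprobe_override <modprobe.d directory>
write_imex_modprobe_override() {
    local dir="$1"
    mkdir -p "$dir"
    printf 'options nvidia NVreg_CreateImexChannel0=1\n' \
        > "$dir/nvidia-imex-channel.conf"
}
```

The option only takes effect the next time the nvidia module is loaded, which is one reason an opt-in at image or boot configuration time fits better than a runtime toggle.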

No API will be provided for the time being, but one might be needed if we decide to extend the support of nvidia-imex beyond what we currently have.

The systemd service is based on the nvidia-imex.service provided by the run archives; I only adapted it slightly.

A default configuration file is provided with sensible defaults:

LOG_LEVEL=3: log info messages
LOG_FILE_NAME: empty to log to STDOUT and get the logs to the journal
STATS_FILE_NAME: same as above
DAEMONIZE=0: run as an actual process (don't fork)
BIND_INTERFACE_IP=: set through nodes_config.cfg, managed by the downstreams
SERVER_PORT=50000: default port, but make it explicit
IMEX_NODE_CONFIG_FILE=/etc/nvidia-imex/nodes_config.cfg: default path, but make it explicit
NETWORK_INTERFACE=: set through nodes_config.cfg, managed by the downstreams
OUTGOING_NETWORK_INTERFACE=
IMEX_WAIT_FOR_QUORUM=RECOVERY: safe default (wait for quorum)
IMEX_CMD_ENABLED=1
IMEX_CMD_UNIX_DOMAIN_PATH=/run/nvidia/nvidia-imex-cmd.sock: socket to send commands to the service
IMEX_NODE_DISCONNECTED_GRACE_TIME=-1: Wait indefinitely
IMEX_GRPC_DSCP_OVERRIDE=0:

The following settings are no-ops, but since I have access neither to the source code nor to instances that can connect through IMEX, I decided to keep them:

LOG_APPEND_TO_LOG=1
LOG_FILE_MAX_SIZE=1024
LOG_MAX_ROTATE_COUNT=3
LOG_USE_SYSLOG=1
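
To make the `KEY=VALUE` layout of these defaults concrete, here is a small sketch that reads a single setting back from such a file. The helper name `imex_cfg_get` is hypothetical, the sample file contains only a few of the values listed above, and treating `#` lines as comments is an assumption about the file format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from a KEY=VALUE config file.
# Blank lines and lines starting with '#' are skipped; the last match wins.
# Returns non-zero when the key is absent.
# Usage: imex_cfg_get <key> <file>
imex_cfg_get() {
    local key="$1" file="$2"
    awk -F= -v k="$key" '
        /^[ \t]*(#|$)/ { next }
        $1 == k { sub(/^[^=]*=/, ""); v = $0; found = 1 }
        END { if (found) print v; else exit 1 }
    ' "$file"
}

# Example: a fragment using a few of the defaults listed above.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
LOG_LEVEL=3
LOG_FILE_NAME=
DAEMONIZE=0
SERVER_PORT=50000
IMEX_NODE_CONFIG_FILE=/etc/nvidia-imex/nodes_config.cfg
EOF
imex_cfg_get SERVER_PORT "$sample"   # prints 50000
rm -f "$sample"
```

An empty value such as `LOG_FILE_NAME=` is reported as an empty string rather than as a missing key, which matches how the description uses empty values to mean "log to STDOUT".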

Testing done:

  • Confirmed the nvidia-imex service doesn't start on boot:
bash-5.2# systemctl status nvidia-imex.service
○ nvidia-imex.service - NVIDIA IMEX service
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/nvidia-imex.service; static)
    Drop-In: /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/service.d
             └─00-aws-config.conf, 10-requires-tmp.conf
     Active: inactive (dead)
  • Confirmed kernel module loaded successfully even with the new config:
bash-5.2# lsmod | grep nvidia
nvidia_uvm           1990656  0
nvidia_modeset       1753088  0
nvidia              13963264  7 nvidia_uvm,nvidia_modeset
video                  81920  1 nvidia_modeset
drm                   794624  1 nvidia
i2c_core              122880  4 nvidia,i2c_smbus,i2c_piix4,drm
backlight              28672  3 video,drm,nvidia_modeset
bash-5.2# nvidia-smi
Wed May  6 22:58:47 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09             Driver Version: 580.126.09     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       On  |   00000000:00:1E.0 Off |                    0 |
| N/A   35C    P8             13W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
  • Confirmed default IMEX channel was created:
bash-5.2# ls /dev/nvidia-caps-imex-channels/
channel0
bash-5.2#
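
The manual `ls` check above could be scripted along these lines. This is only a sketch: the helper name `count_imex_channels` is hypothetical, and the only path it relies on is the one shown in the transcript.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count IMEX channel nodes under a directory.
# Defaults to the device directory shown in the transcript above.
# Usage: count_imex_channels [directory]
count_imex_channels() {
    local dir="${1:-/dev/nvidia-caps-imex-channels}"
    local n=0 entry
    for entry in "$dir"/channel*; do
        # The glob stays literal when nothing matches, so test existence.
        if [ -e "$entry" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

On a host booted with the opt-in override, the function would be expected to report at least one channel; without the override it should report zero until something else creates channels.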

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@arnaldo2792
Contributor Author

Forced push to:

  • Remove skip ci tag
  • Update placeholder for changelog

Comment thread CHANGELOG.md Outdated

## OS Changes
* Backport patch to prevent a race in neighbor resolution for RDMA workloads ([#427])
* Provide innactive nvidia-imex systemd service ([#428])
Contributor

Suggested change
- * Provide innactive nvidia-imex systemd service ([#428])
+ * Provide inactive nvidia-imex systemd service ([#428])

ExecStart=/usr/bin/nvidia-imex -c /etc/nvidia-imex/config.cfg
StandardOutput=journal
StandardError=journal
LimitCORE=infinity
Contributor

Based on the PR description, the missing [Install] section seems intentional (start-only, never enabled). If so, can we add a short comment in the unit explaining that?

Contributor Author

> start-only, never enabled

This sounds confusing; the intention is never started, never enabled (which is what the PR details show). I can add the comment, but I prefer to keep a good commit message stating why the change is what it is.
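
For context on the exchange above: a unit file without an [Install] section cannot be enabled, and systemctl reports it as "static", which matches the `Loaded:` line in the PR description. The sketch below checks a unit file for that section; the helper name `unit_has_install_section` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a unit file contains an [Install] section.
# A unit without one cannot be enabled; systemctl shows it as "static".
# Usage: unit_has_install_section <unit file>
unit_has_install_section() {
    grep -q '^\[Install\]' "$1"
}

# On a host with the unit installed, one could also check:
#   systemctl is-enabled nvidia-imex.service   # expected to print "static"
#   systemctl is-active nvidia-imex.service    # expected to print "inactive"
```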

%{_cross_bindir}/nvidia-imex-ctl
%{_cross_unitdir}/nvidia-imex.service
%{_cross_factorydir}/etc/nvidia-imex/config.cfg
%{_cross_tmpfilesdir}/nvidia-imex-tmpfiles.conf
Contributor

nit: Would it make sense to rename nvidia-imex-tmpfiles.conf to nvidia-imex.conf, since it is already in the tmpfiles directory?

Contributor Author

Yeah, I should have caught that. Nice catch!

@@ -0,0 +1,2 @@
d /etc/nvidia-imex 0755 root root -
C /etc/nvidia-imex/config.cfg
Contributor

Suggested change
- C /etc/nvidia-imex/config.cfg
+ C /etc/nvidia-imex/config.cfg 0644 root root -
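
To spell out what the two tmpfiles.d lines do, the sketch below emulates them with plain shell. It is an illustration, not how systemd-tmpfiles is implemented: the helper name `seed_imex_config` is hypothetical, `<factory_root>` stands in for the factory directory that a `C` line with no explicit source copies from, and ownership changes are skipped so the sketch runs unprivileged.

```shell
#!/usr/bin/env bash
# Rough emulation of the two tmpfiles.d lines, for illustration only.
# ASSUMPTION: <factory_root> plays the role of the factory directory and
# <root> the role of /; ownership (root root) is not applied here.
# Usage: seed_imex_config <factory_root> <root>
seed_imex_config() {
    local factory_root="$1" root="$2"
    local rel="etc/nvidia-imex/config.cfg"
    # "d /etc/nvidia-imex 0755": create the directory if needed.
    install -d -m 0755 "$root/etc/nvidia-imex"
    # "C ... 0644": copy from the factory tree only when the target is absent.
    if [ ! -e "$root/$rel" ]; then
        install -m 0644 "$factory_root/$rel" "$root/$rel"
    fi
}
```

The copy-only-if-absent behavior is the relevant design point: a config file that a downstream has already written is left untouched on later boots.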

arnaldo2792 force-pushed the imex-channels branch 2 times, most recently from 8b77147 to 46f90be on May 11, 2026 21:52
@arnaldo2792
Contributor Author

(Forced push includes rebase to fix merge conflicts)

Provide an inactive nvidia-imex systemd service, which should be
managed by the downstreams.

Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>

Provide a modprobe override for the NVIDIA kernel module to create a
default IMEX channel

Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>

Provide an inactive nvidia-imex systemd service, which should be
managed by the downstreams

Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>

Provide a modprobe override for the NVIDIA kernel module to create a
default IMEX channel

Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>

Add changelog entries for the changes introduced in this series

Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>

@arnaldo2792
Contributor Author

(forced push to fix typo in commit message)

@KCSesh
Contributor

KCSesh commented May 11, 2026

nit: commit: a31a7e05cb15689fad9237d8b0f6fed389d91a46 has a period at the end - none of the others do.
Not a blocker.

LGTM

arnaldo2792 merged commit 11aa113 into bottlerocket-os:develop on May 11, 2026
3 checks passed
5 participants