Merged
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,13 @@
# v5.5.0 (TBD)

## OS Changes
* Backport patch to prevent a race in neighbor resolution for RDMA workloads ([#427])
* Provide inactive nvidia-imex systemd service ([#428])
* Provide NVIDIA modprobe override to create a default IMEX channel ([#428])

[#427]: https://github.com/bottlerocket-os/bottlerocket-kernel-kit/pull/427
[#428]: https://github.com/bottlerocket-os/bottlerocket-kernel-kit/pull/428

# v5.4.2 (2026-05-05)

## OS Changes
27 changes: 27 additions & 0 deletions packages/kmod-6.12-nvidia-r580/kmod-6.12-nvidia-r580.spec
@@ -61,6 +61,10 @@ Source211: grid-license-check.timer
Source212: open-gpu-license-fallback.service
Source213: tesla-license-fallback.service
Source214: grid-license-file-check.conf
Source215: nvidia-imex.service
Source216: nvidia-imex.cfg
Source217: nvidia-imex-tmpfiles.conf
Source218: nvidia-imex-default-channel.conf

# NVIDIA tesla conf files from 300 to 399
Source300: nvidia-tesla-tmpfiles.conf
@@ -112,6 +116,13 @@ Requires: %{name}
%description imex
%{summary}.

%package imex-config
Summary: NVIDIA IMEX modprobe configuration
Requires: %{name}-imex

%description imex-config
%{summary}.

%package open-gpu
Summary: NVIDIA %{tesla_major} Open GPU driver
Version: %{tesla_ver}
@@ -503,6 +514,16 @@ install -p -m 0755 usr/bin/nvidia-imex-ctl %{buildroot}%{_cross_bindir}

popd

# NVIDIA IMEX service, config, and tmpfiles
install -p -m 0644 %{S:215} %{buildroot}%{_cross_unitdir}
install -d %{buildroot}%{_cross_factorydir}%{_cross_sysconfdir}/nvidia-imex
install -p -m 0644 %{S:216} %{buildroot}%{_cross_factorydir}%{_cross_sysconfdir}/nvidia-imex/config.cfg
install -p -m 0644 %{S:217} %{buildroot}%{_cross_tmpfilesdir}/nvidia-imex.conf

# NVIDIA IMEX modprobe config
install -d %{buildroot}%{_cross_libdir}/modprobe.d
install -p -m 0644 %{S:218} %{buildroot}%{_cross_libdir}/modprobe.d/10-nvidia-default-imex-channel.conf

%files
%{_cross_attribution_file}
%dir %{_cross_libexecdir}/nvidia
@@ -786,6 +807,12 @@ popd
%files imex
%{_cross_bindir}/nvidia-imex
%{_cross_bindir}/nvidia-imex-ctl
%{_cross_unitdir}/nvidia-imex.service
%{_cross_factorydir}/etc/nvidia-imex/config.cfg
%{_cross_tmpfilesdir}/nvidia-imex.conf

%files imex-config
%{_cross_libdir}/modprobe.d/10-nvidia-default-imex-channel.conf

%files mps
%{_cross_bindir}/nvidia-cuda-mps-control
1 change: 1 addition & 0 deletions packages/kmod-6.12-nvidia-r580/nvidia-imex-default-channel.conf
@@ -0,0 +1 @@
options nvidia NVreg_CreateImexChannel0=1
2 changes: 2 additions & 0 deletions packages/kmod-6.12-nvidia-r580/nvidia-imex-tmpfiles.conf
@@ -0,0 +1,2 @@
d /etc/nvidia-imex 0755 root root -
C /etc/nvidia-imex/config.cfg 0644 root root -
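
For readers less familiar with tmpfiles.d, the two lines above can be read field by field: `d` creates the directory if it is missing, and `C` with no argument copies the same-named path out of the factory tree when the destination does not yet exist. The following is a minimal Python sketch of that reading, not anything shipped in this PR. It assumes the factory root is `/usr/share/factory` (the systemd default, which `%{_cross_factorydir}` is presumed to resolve to), and `parse_tmpfiles_line` and `copy_source` are hypothetical helper names.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

# Assumption: systemd's default factory root; this PR does not state it.
FACTORY_ROOT = PurePosixPath("/usr/share/factory")


@dataclass
class TmpfilesEntry:
    kind: str       # line type, e.g. "d" or "C"
    path: str
    mode: str
    user: str
    group: str
    age: str
    argument: str


def parse_tmpfiles_line(line: str) -> TmpfilesEntry:
    """Split one tmpfiles.d line into its seven fields ("-" means unset)."""
    fields = line.split(None, 6)
    fields += ["-"] * (7 - len(fields))  # trailing fields may be omitted
    return TmpfilesEntry(*fields)


def copy_source(entry: TmpfilesEntry) -> Optional[PurePosixPath]:
    """For a C line, return the path the file would be copied from."""
    if entry.kind != "C":
        return None
    if entry.argument != "-":
        return PurePosixPath(entry.argument)
    # No explicit argument: same path under the factory root.
    return FACTORY_ROOT / entry.path.lstrip("/")


LINES = [
    "d /etc/nvidia-imex 0755 root root -",
    "C /etc/nvidia-imex/config.cfg 0644 root root -",
]
entries = [parse_tmpfiles_line(line) for line in LINES]
```

Under that assumption, the `C` line resolves to the `config.cfg` that the spec installs beneath `%{_cross_factorydir}%{_cross_sysconfdir}/nvidia-imex`.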
140 changes: 140 additions & 0 deletions packages/kmod-6.12-nvidia-r580/nvidia-imex.cfg
@@ -0,0 +1,140 @@
# NVIDIA IMEX configuration file.
# Note: This configuration file is read during IMEX startup. So, IMEX
# service restart is required for new settings to take effect.

# Description: IMEX logging levels
# Possible Values:
# 0 - All the logging is disabled
# 1 - Set log level to CRITICAL and above
# 2 - Set log level to ERROR and above
# 3 - Set log level to WARNING and above
# 4 - Set log level to INFO and above
# Default Value: 4
LOG_LEVEL=3

# Description: Filename for IMEX logs
# Possible Values:
# Full path/filename string (max length of 256). Logs will be redirected
# to console (stderr) if the specified log file can't be opened or the
# path is empty.
# Default Value: /var/log/nvidia-imex.log
LOG_FILE_NAME=

# Description: Filename for IMEX stats logging
# Possible Values:
# Full path/filename string (max length of 256). Stats will be redirected
# to console (stderr) if the specified stats file can't be opened or the
# path is empty.
# Default Value: /var/log/nvidia-imex-stats.log
# Note: If STATS_FILE_NAME is configured same as LOG_FILE_NAME, then stats will
# be redirected to the path/filename specified by LOG_FILE_NAME.
STATS_FILE_NAME=

# Description: Append to an existing log file or overwrite the logs
# Possible Values:
# 0 - No (Log file will be overwritten)
# 1 - Yes (Append to existing log)
# Default Value: 1
LOG_APPEND_TO_LOG=1

# Description: Max size of log file (in MB)
# Possible Values:
# Any Integer values
# Default Value: 1024
LOG_FILE_MAX_SIZE=1024

# Description: Number of times the IMEX log is rotated once it reaches LOG_FILE_MAX_SIZE
# Possible Values:
# 0 - Log is not rotated. Logging is stopped once the IMEX log file reaches
# the size specified in LOG_FILE_MAX_SIZE
# Non-zero Integer - Log is rotated up to the number of times specified in LOG_MAX_ROTATE_COUNT,
# after the size of the log file reaches the size specified in LOG_FILE_MAX_SIZE.
# Combined IMEX log size is LOG_FILE_MAX_SIZE multiplied by LOG_MAX_ROTATE_COUNT+1.
# Once this threshold is reached, the oldest log file is purged and reused.
LOG_MAX_ROTATE_COUNT=3

# Description: Redirect all the logs to syslog instead of logging to file
# Possible Values:
# 0 - No
# 1 - Yes
# Default Value: 0
LOG_USE_SYSLOG=1

# Description: daemonize IMEX on start-up
# Possible Values:
# 0 - No (Do not daemonize and run IMEX as a normal process)
# 1 - Yes (Run IMEX process as Unix daemon)
# Default Value: 1
DAEMONIZE=0

# Description: Network interface to listen for IMEX peer communication.
# OPTIONAL - empty value will determine the bind IP from the node config file.
# Possible Values:
# A valid IPv4 address
# A valid IPv6 address
# No value - Determine bind IP from the node configuration file.
# Default Value:
BIND_INTERFACE_IP=

# Description: Starting TCP port number for IMEX peer communication
# Possible Values:
# Any value between 0 and 65535
# Default Value: 50000
SERVER_PORT=50000

# Description: Name of file containing IP addresses of nodes
# Possible Values:
# Full path/filename string (max length of 256).
# Default Value: /etc/nvidia-imex/nodes_config.cfg
IMEX_NODE_CONFIG_FILE=/etc/nvidia-imex/nodes_config.cfg

# Description: Name of the network interface used for communication.
# OPTIONAL - If empty, network interface will be determined by matching bind IP to
# node configuration file. Only necessary to configure if the bind IP
# is IPv6 link-local and on multiple network interfaces.
# Possible Values:
# A valid interface name. e.g. eth0, ens32 .. etc
# Default Value:
NETWORK_INTERFACE=

# Description: Name of the network interface used for outgoing connections.
# OPTIONAL - If empty, outgoing network interface will be determined automatically.
# Only necessary if user desires to force all
# outgoing connections to use a particular interface.
# Possible Values:
# A valid interface name. e.g. eth0, ens32 .. etc
# Default Value:
OUTGOING_NETWORK_INTERFACE=

# Description: Controls whether IMEX should complete initialization without establishing quorum
# Possible values:
# NONE: Do not wait for any quorum with other nodes.
# RECOVERY: In case of unsafe IMEX termination, wait until all nodes that had previously imported
# have connected, allowing them time to safely clean up any potentially hanging references
# Default value: RECOVERY
IMEX_WAIT_FOR_QUORUM=RECOVERY

# Description: Enable the command/control service to allow for querying information from the IMEX daemon.
# Must be used with IMEX_CMD_PORT (optionally IMEX_CMD_BIND_INTERFACE_IP) and/or
# IMEX_CMD_UNIX_DOMAIN_PATH
IMEX_CMD_ENABLED=1

# Description: Unix domain socket path to attach to for the command/control service. Ignored if IMEX_CMD_ENABLED=0
IMEX_CMD_UNIX_DOMAIN_PATH=/run/nvidia/nvidia-imex-cmd.sock

# Description: Determines how long to wait after detecting that the IMEX daemon has lost connection to another
# node before triggering clean up imports and exports from that node. If a connection is reestablished
# before the grace period expires, and IMEX is able to identify that it is the same instance previously
# connected, then no clean up is required. If a connection is established and IMEX detects that it is
# a new instance (i.e. someone restarted the IMEX daemon), then clean up will be immediately triggered
# regardless of grace period.
# -1: Default - Wait indefinitely
# 0: Immediately trigger clean up
# >0: Number of seconds to wait before triggering clean up
IMEX_NODE_DISCONNECTED_GRACE_TIME=-1

# Description: Optionally configure the DSCP value for both the listening server socket and the outgoing client
# connections.
# 0: Default
# 1-63: Custom DSCP setting
IMEX_GRPC_DSCP_OVERRIDE=0
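
The config above is a flat list of `KEY=VALUE` lines, and its comments give a formula for the combined log size: `LOG_FILE_MAX_SIZE` multiplied by `LOG_MAX_ROTATE_COUNT+1`. The sketch below, in Python, works through that arithmetic for the values in this file. It is an illustration only: `parse_imex_config` and `max_log_footprint_mb` are hypothetical helpers, not part of IMEX, and the parsing rules (skip blank lines and `#` comments, split on the first `=`) are an assumption about the format based on how the file looks.

```python
def parse_imex_config(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def max_log_footprint_mb(settings: dict) -> int:
    """Upper bound from the file's comment:
    LOG_FILE_MAX_SIZE * (LOG_MAX_ROTATE_COUNT + 1)."""
    size_mb = int(settings["LOG_FILE_MAX_SIZE"])
    rotations = int(settings["LOG_MAX_ROTATE_COUNT"])
    return size_mb * (rotations + 1)


# Excerpt of the settings shown in this diff.
SAMPLE = """\
# Description: Max size of log file (in MB)
LOG_FILE_MAX_SIZE=1024
LOG_MAX_ROTATE_COUNT=3
LOG_USE_SYSLOG=1
LOG_FILE_NAME=
"""

settings = parse_imex_config(SAMPLE)
print(max_log_footprint_mb(settings))  # 1024 * (3 + 1) = 4096
```

Because this file sets `LOG_USE_SYSLOG=1` and leaves `LOG_FILE_NAME` empty, the 4096 MB bound would presumably only matter if file logging were turned back on.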
11 changes: 11 additions & 0 deletions packages/kmod-6.12-nvidia-r580/nvidia-imex.service
@@ -0,0 +1,11 @@
[Unit]
Description=NVIDIA IMEX service
After=network-online.target
Requires=network-online.target

[Service]
Type=simple
ExecStart=/usr/bin/nvidia-imex -c /etc/nvidia-imex/config.cfg
StandardOutput=journal
StandardError=journal
LimitCORE=infinity
Contributor

Based on the PR description, the missing [Install] section seems intentional (start-only, never enabled). If so, can we add a short comment in the unit explaining that?

Contributor Author

> start-only, never enabled

This sounds confusing; the intention is never started, never enabled (which is what the PR details show). I can add the comment, but I prefer to keep a good commit message stating why the change is what it is.
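
As background to this exchange: `systemctl enable` acts on a unit's `[Install]` section, so a unit without one has no enablement targets and systemd reports it as "static". It can still be started explicitly or pulled in by another unit. The Python sketch below checks a unit file for that section. It is only an illustration of the point under discussion: `has_install_section` is a hypothetical helper, and reading the unit with `configparser` is an approximation that happens to work for a simple unit like this one, since systemd's own parser differs in details.

```python
import configparser

# The unit file as added in this PR.
UNIT_TEXT = """\
[Unit]
Description=NVIDIA IMEX service
After=network-online.target
Requires=network-online.target

[Service]
Type=simple
ExecStart=/usr/bin/nvidia-imex -c /etc/nvidia-imex/config.cfg
StandardOutput=journal
StandardError=journal
LimitCORE=infinity
"""


def has_install_section(unit_text: str) -> bool:
    """Return True if the unit text declares an [Install] section."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(unit_text)
    return parser.has_section("Install")


print(has_install_section(UNIT_TEXT))  # False: nothing for "enable" to act on
```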

27 changes: 27 additions & 0 deletions packages/kmod-6.18-nvidia-r580/kmod-6.18-nvidia-r580.spec
@@ -61,6 +61,10 @@ Source211: grid-license-check.timer
Source212: open-gpu-license-fallback.service
Source213: tesla-license-fallback.service
Source214: grid-license-file-check.conf
Source215: nvidia-imex.service
Source216: nvidia-imex.cfg
Source217: nvidia-imex-tmpfiles.conf
Source218: nvidia-imex-default-channel.conf

# NVIDIA tesla conf files from 300 to 399
Source300: nvidia-tesla-tmpfiles.conf
@@ -112,6 +116,13 @@ Requires: %{name}
%description imex
%{summary}.

%package imex-config
Summary: NVIDIA IMEX modprobe configuration
Requires: %{name}-imex

%description imex-config
%{summary}.

%package open-gpu
Summary: NVIDIA %{tesla_major} Open GPU driver
Version: %{tesla_ver}
@@ -503,6 +514,16 @@ install -p -m 0755 usr/bin/nvidia-imex-ctl %{buildroot}%{_cross_bindir}

popd

# NVIDIA IMEX service, config, and tmpfiles
install -p -m 0644 %{S:215} %{buildroot}%{_cross_unitdir}
install -d %{buildroot}%{_cross_factorydir}%{_cross_sysconfdir}/nvidia-imex
install -p -m 0644 %{S:216} %{buildroot}%{_cross_factorydir}%{_cross_sysconfdir}/nvidia-imex/config.cfg
install -p -m 0644 %{S:217} %{buildroot}%{_cross_tmpfilesdir}/nvidia-imex.conf

# NVIDIA IMEX modprobe config
install -d %{buildroot}%{_cross_libdir}/modprobe.d
install -p -m 0644 %{S:218} %{buildroot}%{_cross_libdir}/modprobe.d/10-nvidia-default-imex-channel.conf

%files
%{_cross_attribution_file}
%dir %{_cross_libexecdir}/nvidia
@@ -786,6 +807,12 @@ popd
%files imex
%{_cross_bindir}/nvidia-imex
%{_cross_bindir}/nvidia-imex-ctl
%{_cross_unitdir}/nvidia-imex.service
%{_cross_factorydir}/etc/nvidia-imex/config.cfg
%{_cross_tmpfilesdir}/nvidia-imex.conf

%files imex-config
%{_cross_libdir}/modprobe.d/10-nvidia-default-imex-channel.conf

%files mps
%{_cross_bindir}/nvidia-cuda-mps-control
1 change: 1 addition & 0 deletions packages/kmod-6.18-nvidia-r580/nvidia-imex-default-channel.conf
@@ -0,0 +1 @@
options nvidia NVreg_CreateImexChannel0=1
2 changes: 2 additions & 0 deletions packages/kmod-6.18-nvidia-r580/nvidia-imex-tmpfiles.conf
@@ -0,0 +1,2 @@
d /etc/nvidia-imex 0755 root root -
C /etc/nvidia-imex/config.cfg 0644 root root -