hotplug: validate CPU hotplug using runtime-discovered topology #463

Merged: abbajaj806 merged 2 commits into qualcomm-linux:main from smuppand:Kernel on May 29, 2026
@smuppand
Contributor

Rework the CPU hotplug validation to scale across Qualcomm platforms with different CPU counts, capacities, and cluster layouts.

The previous test used a fixed CPU range and attempted to offline CPUs directly. This was not scalable across SoCs and could produce weak or misleading failures when CPU topology differed from the assumed layout.

This PR updates the test to discover CPU topology dynamically at runtime and validate only CPUs that are actually exposed as hotplug-controllable through sysfs.
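
As a rough illustration of runtime discovery, the sketch below lists the CPUs that expose a writable `online` control file under sysfs. The function name `list_hotpluggable_cpus` and the `CPU_SYSFS_ROOT` override are hypothetical and are not taken from the PR; the override exists only so the function can be pointed at a fake directory tree.

```sh
#!/bin/sh
# Sketch only: names here are illustrative, not the helpers added by this PR.
# CPU_SYSFS_ROOT is a hypothetical override for exercising the code off-target.
CPU_SYSFS_ROOT="${CPU_SYSFS_ROOT:-/sys/devices/system/cpu}"

# Print the numeric IDs of CPUs that have a writable hotplug control file,
# one per line, in ascending order.
list_hotpluggable_cpus() {
    for dir in "$CPU_SYSFS_ROOT"/cpu[0-9]*; do
        [ -d "$dir" ] || continue
        # CPUs that cannot be hotplugged (often cpu0) expose no 'online' file.
        [ -w "$dir/online" ] || continue
        printf '%s\n' "${dir##*/cpu}"
    done | sort -n
}
```

Calling `list_hotpluggable_cpus` on a target prints only the CPU numbers a test could safely try to offline, so no fixed CPU range has to be assumed.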

Please refer to this LAVA job for the results with these patches: https://lava.infra.foundries.io/scheduler/job/236862

smuppand added 2 commits May 29, 2026 12:09
Add reusable CPU hotplug helpers to functestlib.sh for CPU online control
handling, online mask checks, CPU state reads/writes, retry-based offline
handling, dmesg evidence collection, topology logging, and best-effort
cleanup of CPUs that were offlined during a test.
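
An online mask check of the kind described here might look like the following sketch. It parses the kernel's CPU list format (for example `0-3,5,7`) read from the `online` file in the CPU sysfs directory. The names `cpu_in_list`, `cpu_in_online_mask`, and `CPU_SYSFS_ROOT` are hypothetical and are not the helpers actually added to functestlib.sh.

```sh
#!/bin/sh
# Sketch only: illustrative names, not the functestlib.sh API.
CPU_SYSFS_ROOT="${CPU_SYSFS_ROOT:-/sys/devices/system/cpu}"

# Return 0 if CPU number $1 appears in CPU list $2 (e.g. "0-3,5,7").
cpu_in_list() {
    cpu="$1"
    # Entries contain only digits and '-', so plain word splitting is safe.
    for range in $(printf '%s' "$2" | tr ',' ' '); do
        case "$range" in
            *-*) lo="${range%-*}"; hi="${range#*-}" ;;
            *)   lo="$range"; hi="$range" ;;
        esac
        if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
            return 0
        fi
    done
    return 1
}

# Return 0 if CPU number $1 is present in the system-wide online mask.
cpu_in_online_mask() {
    cpu_in_list "$1" "$(cat "$CPU_SYSFS_ROOT/online" 2>/dev/null)"
}
```

A test could call `cpu_in_online_mask 3` after offlining cpu3 and treat a zero return as a stale mask.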

The helpers are intentionally limited to reusable hotplug mechanics. They
reuse existing functestlib.sh infrastructure such as check_dependencies(),
check_kernel_config(), logging helpers, and the generic CPU helper APIs
instead of duplicating dependency, kernel config, or local counting logic.

This allows CPU hotplug and related CPU topology tests to scale across
SoCs with different CPU counts, capacities, clusters, and topology layouts
without hardcoding board-specific assumptions.

Signed-off-by: Srikanth Muppandam <smuppand@qti.qualcomm.com>
Rework the hotplug test to dynamically discover online CPUs at runtime
instead of hardcoding a fixed cpu0-cpu7 range.

The test now:
- discovers online CPUs from sysfs
- logs CPU topology, capacity, cluster, and affinity details
- uses check_dependencies() for required userspace tools
- uses check_kernel_config() for CONFIG_HOTPLUG_CPU validation
- validates only CPUs with writable hotplug control
- checks CPU schedulability before and after hotplug
- retries transient offline failures before declaring failure
- treats persistent EBUSY as a CI failure for hotplug-controllable CPUs
- verifies offline state through cpuX/online, online mask, and taskset
- restores any offlined CPU through cleanup on failure or interruption
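
Two of the steps above, retrying transient offline failures and confirming the result, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `offline_cpu_with_retry`, `cpu_is_schedulable`, the default attempt count, and `CPU_SYSFS_ROOT` are all hypothetical. The schedulability check relies on `taskset -c` failing when the requested CPU is offline.

```sh
#!/bin/sh
# Sketch only: illustrative names and defaults, not the helpers from this PR.
CPU_SYSFS_ROOT="${CPU_SYSFS_ROOT:-/sys/devices/system/cpu}"

# Try to offline a CPU, retrying failed writes (for example EBUSY).
# Usage: offline_cpu_with_retry <cpu> [attempts] [delay_seconds]
offline_cpu_with_retry() {
    cpu="$1"
    attempts="${2:-3}"
    delay="${3:-1}"
    ctl="$CPU_SYSFS_ROOT/cpu$cpu/online"
    try=1
    while [ "$try" -le "$attempts" ]; do
        # The write fails when the kernel refuses the request; the braces
        # keep both redirection and write errors off the console.
        if { echo 0 > "$ctl"; } 2>/dev/null; then
            # Read the state back rather than trusting the write alone.
            [ "$(cat "$ctl" 2>/dev/null)" = "0" ] && return 0
        fi
        try=$((try + 1))
        [ "$try" -le "$attempts" ] && sleep "$delay"
    done
    return 1   # persistent failure: the caller reports it
}

# Return 0 if a trivial command can be pinned to CPU $1.
cpu_is_schedulable() {
    taskset -c "$1" true >/dev/null 2>&1
}
```

After a successful `offline_cpu_with_retry 3`, a test would expect `cpu_is_schedulable 3` to fail; if it succeeds, the CPU is still schedulable despite being reported offline.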

This makes the test scalable across different Qualcomm SoCs and CPU
cluster layouts while still catching real hotplug regressions such as
persistent EBUSY, failed online recovery, stale online masks, or CPUs
remaining schedulable while reported offline.
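
Catching failed online recovery depends on the cleanup path reporting any CPU that did not come back. The sketch below shows one way to track offlined CPUs and restore them on exit or interruption. All names (`remember_offlined_cpu`, `restore_offlined_cpus`, `install_hotplug_cleanup`, `CPU_SYSFS_ROOT`) are hypothetical and are not taken from the PR.

```sh
#!/bin/sh
# Sketch only: illustrative names, not the cleanup helpers from this PR.
CPU_SYSFS_ROOT="${CPU_SYSFS_ROOT:-/sys/devices/system/cpu}"
OFFLINED_CPUS=""

# Record a CPU that the test has taken offline.
remember_offlined_cpu() {
    OFFLINED_CPUS="$OFFLINED_CPUS $1"
}

# Bring every recorded CPU back online; return 1 if any of them fails.
restore_offlined_cpus() {
    rc=0
    for cpu in $OFFLINED_CPUS; do
        ctl="$CPU_SYSFS_ROOT/cpu$cpu/online"
        if { echo 1 > "$ctl"; } 2>/dev/null &&
           [ "$(cat "$ctl" 2>/dev/null)" = "1" ]; then
            continue
        fi
        echo "WARN: cpu$cpu did not come back online" >&2
        rc=1
    done
    OFFLINED_CPUS=""
    return "$rc"
}

# Run the restore on normal exit and on SIGINT/SIGTERM.
install_hotplug_cleanup() {
    trap 'restore_offlined_cpus' EXIT
    trap 'exit 130' INT    # exit runs the EXIT trap
    trap 'exit 143' TERM
}
```

A test would call `install_hotplug_cleanup` once at startup and `remember_offlined_cpu "$cpu"` right after each successful offline.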

Signed-off-by: Srikanth Muppandam <smuppand@qti.qualcomm.com>
Contributor

@abbajaj806 left a comment

LGTM

@abbajaj806 abbajaj806 merged commit d45a7c5 into qualcomm-linux:main May 29, 2026
12 checks passed
Labels

enhancement New feature or request

6 participants