Support CPMS test role in shiftstack-qa automation #9

Open
tusharjadhav3302 wants to merge 4 commits into main from support_cpms_test_role_tj

Conversation

@tusharjadhav3302

Summary

  • Add new cpms_test stage role that runs the upstream cluster-control-plane-machine-set-operator e2e tests (presubmit + periodic) with branch fallback logic for newer OCP versions
  • Add cpms_replace_attrs day2ops procedure ported from the openshift-ir-plugin, which validates CPMS reconciliation by patching failure domains, networks, and security groups on control plane nodes
  • Wire the new stage into ocp_testing.yaml and enable it in osp_verification and 4.17_ovnkubernetes_ipi job definitions

Details

The CPMS operator manages the lifecycle of OCP control plane machines. This PR ports the existing Jenkins/IR-based CPMS testing into shiftstack-qa's Ansible automation framework.

New cpms_test stage role:

  • Clones the upstream operator repo at the correct release branch (with fallback to main for versions without a release branch)
  • Runs make e2e-presubmit and make e2e-periodic with OPENSHIFT_CI=true for JUnit output
  • Post-processes results (XML tag modification, HTML conversion) and copies to the report directory for Polarion/ReportPortal
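
The clone-and-run steps above could look roughly like the following Ansible tasks. This is a sketch under stated assumptions, not the role's actual content: the variable names (`cpms_repo_url`, `cpms_src_dir`, `ocp_version`), the `release-X.Y` branch naming, and the `git ls-remote` probe used for the fallback are all chosen here for illustration.

```yaml
# Hypothetical tasks: variable names and the branch-probing approach are illustrative.
- name: Check whether a release branch exists for this OCP version
  ansible.builtin.command:
    cmd: "git ls-remote --exit-code --heads {{ cpms_repo_url }} release-{{ ocp_version }}"
  register: cpms_branch_probe
  changed_when: false
  # git exits with 2 when --exit-code is set and no matching ref is found.
  failed_when: cpms_branch_probe.rc not in [0, 2]

- name: Clone the operator repo, falling back to main when no release branch exists
  ansible.builtin.git:
    repo: "{{ cpms_repo_url }}"
    dest: "{{ cpms_src_dir }}"
    version: "{{ 'release-' ~ ocp_version if cpms_branch_probe.rc == 0 else 'main' }}"

- name: Run the presubmit and periodic e2e suites
  ansible.builtin.command:
    cmd: "make {{ item }}"
    chdir: "{{ cpms_src_dir }}"
  environment:
    OPENSHIFT_CI: "true"   # per the description, this enables JUnit output
  loop:
    - e2e-presubmit
    - e2e-periodic
```

Probing the remote before cloning keeps the fallback decision in one place, so the clone task itself never has to fail and be retried. `ocp_version` is assumed to be a quoted string such as "4.17", so YAML does not parse it as a float.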

New cpms_replace_attrs day2ops procedure:

  • Creates a test network, subnet, and security group in OpenStack
  • Patches the CPMS to inject these resources and swap failure domain attributes between masters
  • Waits for full CPMS reconciliation (rolling all 3 control plane nodes)
  • Validates that the resulting VMs reflect the expected AZ, volume type, network, and SG changes
  • Restores the original CPMS configuration and cleans up test resources
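
The patch-and-wait part of this procedure might be sketched as below. The tasks are an illustration rather than the procedure's real code: `cpms_patch_file` and the retry budget are hypothetical, and the sketch assumes the ControlPlaneMachineSet is the usual `cluster` object in the `openshift-machine-api` namespace.

```yaml
# Hypothetical sketch: the patch file variable and retry budget are illustrative.
- name: Patch the CPMS with the test network, security group, and swapped failure domains
  ansible.builtin.command:
    cmd: >-
      oc patch controlplanemachineset cluster -n openshift-machine-api
      --type merge --patch-file {{ cpms_patch_file }}

- name: Wait until all 3 control plane replicas are updated and ready
  ansible.builtin.command:
    cmd: >-
      oc get controlplanemachineset cluster -n openshift-machine-api
      -o jsonpath='{.status.updatedReplicas}/{.status.readyReplicas}'
  register: cpms_state
  changed_when: false
  # A failed or empty query leaves stdout != '3/3', so the task simply retries.
  until: cpms_state.stdout == '3/3'
  retries: 120
  delay: 60
```

Comparing a single `updated/ready` string keeps the `until` condition tolerant of brief API outages while masters are being replaced. A real implementation would probably also confirm that the rollout has started before trusting a `3/3` reading, since the status can lag the patch for a moment.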

tusharjadhav3302 added the enhancement label May 20, 2026
Contributor

@imatza-rh left a comment

Good port from the IR-plugin. Nice improvements: OpenStack resource cleanup in always, must-gather on test failure. Nit: the PR description mentions "branch fallback", but prepare_openshift_tests.yml has no fallback. That matches all the other roles and works fine.

Please run ./gate.sh (ansible-lint + pre-commit inside the shiftstack-client container) - no CI checks are configured on GitHub PRs. yamllint passes locally, but ansible-lint needs the container.

- post
- verification
- day2ops
- cpms_test
Contributor

Is cpms_test needed here? This is a Jenkins job definition; the Zuul integration job uses osp_verification.yaml. The e2e-periodic suite rolls all 3 masters (5h ginkgo timeout) and has timed out systematically in Jenkins since 4.18.

Author

You're right, this is a Jenkins job definition and the e2e-periodic 5h timeout would be problematic here. Removed cpms_test from stages, cpms_replace_attrs from day2ops_procedures, and the cpms_replacements vars. The CPMS tests will only run via the Zuul integration job using osp_verification.yaml.

- cpms_replacements.sg_name in item.security_groups | json_query('[*].name')
with_items: "{{ master_after }}"

rescue:
Contributor

The day2ops wrapper run_procedure.yml already runs must-gather and records the failure in its rescue, so this inner rescue duplicates that. See how moving-etcd-to-ephemeral.yml handles it: no inner rescue, it relies on the wrapper. Consider removing this rescue block (keep the always, since the restore logic is correct and needed).

Author

Agreed. run_procedure.yml already handles must-gather and failure recording in its rescue block. Removed the inner rescue to follow the same pattern as moving-etcd-to-ephemeral.yml. The always block with the CPMS restore logic is kept since that's procedure-specific cleanup that the wrapper can't handle.
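
The resulting structure, a block with always but no inner rescue, might look like the sketch below. The included task file names are hypothetical placeholders, not files from this PR.

```yaml
# Hypothetical layout: the included task file names are illustrative.
- name: Replace CPMS attributes and verify reconciliation
  block:
    - name: Patch the CPMS and validate the rolled masters
      ansible.builtin.include_tasks: patch_and_validate.yml
  # No rescue here: a failure propagates to the wrapper (run_procedure.yml),
  # which runs must-gather and records the failure.
  always:
    - name: Restore the original CPMS configuration
      ansible.builtin.include_tasks: restore_cpms.yml
    - name: Remove the test network, subnet, and security group
      ansible.builtin.include_tasks: cleanup_openstack.yml
```

Because Ansible runs the always section whether or not the block fails, and re-raises the failure afterwards when no rescue handles it, the restore logic still executes while error reporting stays in the wrapper.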

tusharjadhav3302 force-pushed the support_cpms_test_role_tj branch from f88cbb5 to cce04e6 May 28, 2026 13:59

Labels

enhancement New feature or request

2 participants