[shiftstack] Use GitHub for shiftstack-qa clone and fix false-positive test results #3922

Draft
imatza-rh wants to merge 1 commit into openstack-k8s-operators:main from shiftstack:shiftstack-qa-github-clone

imatza-rh commented May 10, 2026

Summary

Two fixes for the shiftstack CI role:

1. Clone shiftstack-qa from GitHub instead of GerritHub

shiftstack-qa moved code review to GitHub PRs (shiftstack/shiftstack-qa#2). The role default still clones from GerritHub, which is no longer maintained. GerritHub main is at least 3 commits behind GitHub main: it lacks the switch to the 4-stable OCP channel, the lb_tests disable, and the .gitreview removal.

  • Change cifmw_shiftstack_qa_repo default to https://github.com/shiftstack/shiftstack-qa
  • Rename cifmw_shiftstack_qa_gerrithub_change to cifmw_shiftstack_qa_change_ref, which accepts both GitHub PR refs and GerritHub change refs
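
As a rough illustration of why one variable can cover both kinds of ref: GitHub publishes pull requests under `refs/pull/<number>/head`, Gerrit publishes changes under `refs/changes/<NN>/<change>/<patchset>`, and both are plain Git refs that can be fetched the same way. The helper names `ref_kind` and `checkout_change_ref` and the sample refs below are hypothetical and are not taken from the role.

```shell
#!/bin/sh
# Sketch only: helper names and sample refs are hypothetical, not from the role.

# Classify a change ref by its shape.
ref_kind() {
  case "$1" in
    refs/pull/*/head|refs/pull/*/merge) echo "github-pr" ;;
    refs/changes/*/*/*)                 echo "gerrit-change" ;;
    *)                                  echo "unknown" ;;
  esac
}

# Fetch a ref from the remote and check it out as a detached HEAD.
# The same two commands work for either kind of ref.
checkout_change_ref() {
  repo_dir="$1"; remote_url="$2"; ref="$3"
  git -C "$repo_dir" fetch "$remote_url" "$ref" &&
    git -C "$repo_dir" checkout FETCH_HEAD
}

pr_kind="$(ref_kind refs/pull/123/head)"
gerrit_kind="$(ref_kind refs/changes/45/12345/3)"
echo "$pr_kind $gerrit_kind"
```

`checkout_change_ref` is defined but not called here because it needs network access; it would be invoked with an existing clone directory, the repo URL, and the ref.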

2. Fix false-positive and connection-drop bugs in exec_command_in_pod.yml

oc exec/rsh connections to the shiftstackclient pod drop after ~38 minutes during long-running tests. The previous fix (marker file + polling) didn't solve it because the marker file write was chained to the same bash process that dies when the connection drops.

Root cause verified from tp!2297 build d349fb73 inner test log: the test ran for 1h 37m inside the pod (Manila StorageClass failure at 14:20 UTC), but oc exec returned after only 38m 50s. The marker file was never created - 720 retries over 2h all got No such file or directory.

Fix: write the command to a script file, copy it into the pod, then run it detached via nohup, so the process survives oc exec connection drops. Also kill orphaned background processes left by previous invocations, so that a stale process cannot write a misleading marker file.
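
The detached-run pattern can be sketched locally as follows. This is a minimal illustration under stated assumptions, not the code in exec_command_in_pod.yml: it runs on the local machine rather than in a pod, and the file names (`cmd.sh`, `cmd.rc`, `cmd.log`) and the 30 x 1 s polling budget are made up for the example.

```shell
#!/bin/sh
# Local sketch of the detached-run pattern. All paths are hypothetical, and
# nothing here talks to a pod: in the role, the copy and launch steps would
# go through oc instead of running on the local machine.

workdir="$(mktemp -d)"
script="$workdir/cmd.sh"
marker="$workdir/cmd.rc"
log="$workdir/cmd.log"

# 1. Write the command to a script file. The quoted heredoc delimiter keeps
#    "$" characters literal, so no extra escaping layer is needed.
cat > "$script" <<'EOF'
sleep 1
echo 'cost: $5'
exit 42
EOF

# 2. Launch it detached. The wrapper stores the script's exit code in the
#    marker file once the script ends; nohup makes the wrapper ignore SIGHUP,
#    so it does not depend on the session that started it.
rm -f "$marker"
nohup sh -c 'sh "$1" > "$2" 2>&1; echo "$?" > "$3"' _ "$script" "$log" "$marker" \
  > /dev/null 2>&1 &

# 3. Poll for the marker in short, independent checks (30 tries, 1 s apart).
rc=""
tries=0
while [ "$tries" -lt 30 ]; do
  if [ -s "$marker" ]; then
    rc="$(cat "$marker")"
    break
  fi
  tries=$((tries + 1))
  sleep 1
done
echo "exit code: ${rc:-timeout}"
```

The point of the design is that the marker write no longer shares a process with the connection that launched it: each poll is a fresh, short call, so losing any single call costs one retry rather than the result.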

Verified on the live shiftstackclient pod (serval70) with a 6-case Ansible test suite covering success, exit codes 1 and 42, commands containing dollar signs, orphan kill, retry timeout, and nohup survival after oc exec termination.
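
For the orphan-kill case, one possible shape is a pidfile check before each launch. The sketch below is an assumption about how such cleanup could work, not a description of the role: the pidfile convention and the names `kill_orphan`, `cmd.pid`, and `cmd.rc` are hypothetical. It signals only the recorded PID, so a real wrapper with child processes would need broader cleanup, and PID reuse is not handled.

```shell
#!/bin/sh
# Sketch only: the pidfile convention and all names here are hypothetical.

workdir="$(mktemp -d)"
pidfile="$workdir/cmd.pid"
marker="$workdir/cmd.rc"

# Stop whatever an earlier invocation left running and drop its marker, so a
# late write from that process cannot be mistaken for the new run's result.
kill_orphan() {
  if [ -f "$1" ]; then
    old_pid="$(cat "$1")"
    if kill -0 "$old_pid" 2>/dev/null; then   # is the process still alive?
      kill "$old_pid" 2>/dev/null
    fi
    rm -f "$1"
  fi
  rm -f "$2"
}

# Simulate leftovers from a previous invocation.
sleep 60 > /dev/null 2>&1 &
stale_pid=$!
echo "$stale_pid" > "$pidfile"
echo "0" > "$marker"                    # stale "success" marker

kill_orphan "$pidfile" "$marker"
wait "$stale_pid" 2>/dev/null || true   # reap the child so kill -0 sees it gone

if kill -0 "$stale_pid" 2>/dev/null; then alive=yes; else alive=no; fi
echo "orphan alive: $alive"
```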

Test plan

  • GitHub clone validated in tp!2297 build 70b1f455 (5h 8m) - OCP installed, cluster checks passed
  • 6-case Ansible test suite on live pod (serval70)
  • nohup survival test: 30s command, oc exec killed at 3s, marker appeared at t+30s
  • YAML syntax + ansible-lint clean
  • Testproject re-validation (tp!2297 recheck)
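
A scaled-down local analog of the nohup survival check is sketched below. It does not reproduce an oc exec drop: it sends SIGHUP directly to a 2 s job after 1 s and checks that the marker still gets written. The timings and the `survived.rc` path are chosen for the example only.

```shell
#!/bin/sh
# Scaled-down local analog: a 2 s job that receives SIGHUP after 1 s.
# No pod or oc involved; paths are hypothetical.

workdir="$(mktemp -d)"
marker="$workdir/survived.rc"

# Start the job under nohup, which sets SIGHUP to "ignore" before running it.
nohup sh -c 'sleep 2; echo 0 > "$1"' _ "$marker" > /dev/null 2>&1 &
job_pid=$!

sleep 1                           # give nohup time to start the job
kill -HUP "$job_pid" 2>/dev/null  # one signal a closing session can deliver
wait "$job_pid" || true           # block until the job ends

if [ -s "$marker" ]; then survived=yes; else survived=no; fi
echo "marker written after SIGHUP: $survived"
```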


openshift-ci Bot commented May 10, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

imatza-rh force-pushed the shiftstack-qa-github-clone branch 3 times, most recently from 22a3c0d to 4dd8af5 May 11, 2026 13:00
imatza-rh changed the title from "[shiftstack] Clone shiftstack-qa from GitHub instead of GerritHub" to "[shiftstack] Use GitHub for shiftstack-qa clone and fix false-positive test results" May 11, 2026
imatza-rh changed the base branch from stable to main May 11, 2026 13:24
imatza-rh force-pushed the shiftstack-qa-github-clone branch from 4dd8af5 to b2ab112 May 11, 2026 13:27

openshift-ci Bot commented May 11, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign imatza-rh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…e test results

shiftstack-qa moved code review to GitHub PRs (shiftstack/
shiftstack-qa#2). The role default still clones from
GerritHub, which is no longer maintained.

- Change cifmw_shiftstack_qa_repo default to GitHub
- Rename cifmw_shiftstack_qa_gerrithub_change to
  cifmw_shiftstack_qa_change_ref

Also fix a false-positive bug in exec_command_in_pod.yml
where oc rsh could return rc=0 before the inner command
finished, causing the test to report PASS when the playbook
actually failed. Observed in tp!2297 build d078ccbd - the
shiftstack test ran for only 145s (vs typical 1.5-2h), OCP
was never installed, yet the job reported SUCCESS.

The fix writes the inner command exit code to a marker file
on the pod, then verifies it in a separate oc exec call.
This catches failures even if the first oc exec connection
drops mid-execution. Also switches from oc rsh to oc exec
for non-interactive pod commands.

Related-Issue: #OSPRH-29506

Assisted-By: Claude Code
Signed-off-by: Itay Matza <imatza@redhat.com>
imatza-rh force-pushed the shiftstack-qa-github-clone branch from b2ab112 to 487590c May 12, 2026 23:49