Skip to content

Monitoring API: remove maximum retention days prometheus#2851

Open
marioferh wants to merge 1 commit into
openshift:masterfrom
marioferh:fix_prometheus_retention_days
Open

Monitoring API: remove maximum retention days prometheus#2851
marioferh wants to merge 1 commit into
openshift:masterfrom
marioferh:fix_prometheus_retention_days

Conversation

@marioferh
Copy link
Copy Markdown
Contributor

@marioferh marioferh commented May 20, 2026

Remove Maximum validation and documentation for durationInDays (365)
and sizeInGiB (16384) on Prometheus Retention. We do not have enough
customer or operational data to pick defensible API-level ceilings;
arbitrary caps could block valid large-cluster configurations or
misalign with what CMO/Prometheus can actually support.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 20, 2026

Hello @marioferh! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

📝 Walkthrough

Walkthrough

This PR replaces Prometheus retention's days-based field with an hours-based field: DurationInDays → DurationInHours (int32, minimum=1, maximum=87600) and updates the PrometheusConfig.Retention comment to state the omitted default as 360 hours. The generated ClusterMonitoring CRD schema is updated to use spec.prometheusConfig.retention.durationInHours and the retention.sizeInGiB maximum is increased from 16384 to 2147483647.

Suggested reviewers

  • JoelSpeed
🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title partially describes the changeset but is incomplete and somewhat misleading; it mentions only removing retention days limits but omits the significant change from durationInDays to durationInHours and sizeInGiB maximum expansion. Revise the title to reflect the primary changes: something like 'Monitoring API: replace durationInDays with durationInHours and expand retention limits' would better represent the scope of changes.
✅ Passed checks (14 passed)
Check name Status Explanation
Description check ✅ Passed The description accurately explains the rationale for removing API-level caps on Prometheus retention fields, but it does not mention the fundamental field change from durationInDays to durationInHours or the sizeInGiB limit expansion.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR modifies only API definition files (types_cluster_monitoring.go, CRD YAML), not test code. No Ginkgo tests are added or modified, so the check for deterministic test names is not applicable.
Test Structure And Quality ✅ Passed This PR contains no Ginkgo test files (_test.go). It modifies type definitions, CRD YAML, and auto-generated files. The custom check for Ginkgo test quality does not apply.
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. Only type definitions and CRD YAML manifests for Prometheus retention configuration were modified.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No new Ginkgo e2e tests were added in this PR. The changes are limited to API type definitions and CRD manifests for Prometheus retention configuration, not test code.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only API type definitions and CRD schemas, not deployment manifests, operator code, or controllers. No scheduling constraints are introduced; custom check does not apply.
Ote Binary Stdout Contract ✅ Passed PR modifies only type definitions and CRD manifests with no process-level executable code, init functions, or stdout writes that would violate the OTE Binary Stdout Contract.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed This PR modifies API type definitions and CRD manifests only; it does not add any Ginkgo e2e tests, so the IPv6 compatibility check does not apply.
No-Weak-Crypto ✅ Passed PR contains only Prometheus retention configuration/schema changes; no cryptographic code, weak algorithms, or secret comparisons present.
Container-Privileges ✅ Passed PR modifies only API type definitions and CRD schemas for Prometheus retention settings, not container/pod manifests. No container privilege configurations exist in these files.
No-Sensitive-Data-In-Logs ✅ Passed PR only modifies CRD schema and Go struct type definitions for Prometheus retention. No logging code or sensitive data exposure introduced.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

Error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented
The command is terminated due to an error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 20, 2026
@openshift-ci openshift-ci Bot requested review from JoelSpeed and everettraven May 20, 2026 14:35
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 20, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign everettraven for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@marioferh marioferh force-pushed the fix_prometheus_retention_days branch 2 times, most recently from 80e0c7d to 03a1de9 Compare May 20, 2026 14:57
@marioferh
Copy link
Copy Markdown
Contributor Author

@danielmellado
Copy link
Copy Markdown
Contributor

danielmellado commented May 20, 2026

While this should be ok not having a tombstone as we're just loosening limits, the openapi.json diff includes unrelated changes (NetworkObservabilitySpec, registries descriptions, softirqs collector, KMS default) which are likely from other PRs that merged into master since the branch was created. Could you rebase on latest master so the openapi.json diff only shows the PR changes? This would avoid confusion during review and prevent accidentally shipping stale generated data.

@marioferh marioferh force-pushed the fix_prometheus_retention_days branch from 03a1de9 to 854e0ad Compare May 20, 2026 15:23
@danielmellado
Copy link
Copy Markdown
Contributor

danielmellado commented May 20, 2026

Hey @marioferh, as commented before we go with just removing the max constraints, I think we should also address the granularity issue that Ayoub and Simon raised.

Right now durationInDays: int32 can't express sub-day retention like 15h or 24h (which is the default for UWM), and sizeInGiB: int32 can't express sub-GiB sizes like 500MiB. The CMO ConfigMap and the Prometheus Operator CRD both accept freeform Prometheus duration/size strings, so customers migrating from retention: "15h" would have no equivalent in the new CRD.

I think we should push back on the integer-only convention for these two fields and switch to string fields with the same Prometheus pattern validation that the Prometheus Operator CRD uses:

duration: string with pattern ^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$
size: string with pattern ^(0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$
This way CMO can pass the values straight through to the Prometheus CR without any lossy conversion, and all existing ConfigMap values remain valid. The patterns are well-defined and fully validatable at admission time, so we're not losing any safety vs. int32.

@everettraven I know we discussed avoiding duration strings back in #2653, but this comes from the Prometheus Operator CR itself, which validates retention with ^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$ and size with (^0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$ (https://github.com/openshift/cluster-monitoring-operator/blob/0ce8e4c0bce00381802c25d92ed238775139ee82/assets/monitoring-operator/00_0prometheus-custom-resource-definition.yaml). CMO passes these values straight through to the Prometheus CR, so using durationInDays: int32 and sizeInGiB: int32 introduces a lossy conversion layer that breaks existing configs like retention: "15h" or retentionSize: "500MiB" that customers can set today in the ConfigMap. Would you be open to using string fields with these same Prometheus patterns for this specific case? They're fully validatable at admission time via the regex, and since this is v1alpha1 behind a feature gate we can still change it.

Worst case, as a fallback, we may set hours, but wanted to get back your thoughts on this! Thanks!

Comment on lines 4143 to -4157
@@ -4152,9 +4150,7 @@ spec:
When the limit is reached, Prometheus will delete oldest data first.
When omitted, no size limit is enforced and Prometheus uses available PersistentVolume capacity.
Minimum value is 1 GiB.
Maximum value is 16384 GiB (16 TiB).
format: int32
maximum: 16384
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We strongly recommend having reasonable upper bounds here. If nothing else, at least a maximum of 2,147,483,647 (the int32 maximum).

It is also worth noting that once you set a maximum and it GAs, we won't let you decrease it without strong evidence that it does not break user workflows.

It is usually easier for us to increase maximum constraints as needed than to add one or decrease it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also noting here that 2,147,483,647 GiB is ~ 2047 PiB. That seems pretty high...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Minimum value is 1 day.
Maximum value is 365 days (1 year).
format: int32
maximum: 365
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note, a maximum of 2,147,483,647 here is equivalent to ~58835 centuries. Seems like an unreasonably high number that would never be set.

Without more context, I would think it to be reasonable to increase this to something like 3650 to account for ~10 years of retention. That seems pretty high to me for metrics, but 🤷

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@everettraven
Copy link
Copy Markdown
Contributor

Hey @marioferh, as commented before we go with just removing the max constraints, I think we should also address the granularity issue that Ayoub and Simon raised.

Right now durationInDays: int32 can't express sub-day retention like 15h or 24h (which is the default for UWM), and sizeInGiB: int32 can't express sub-GiB sizes like 500MiB. The CMO ConfigMap and the Prometheus Operator CRD both accept freeform Prometheus duration/size strings, so customers migrating from retention: "15h" would have no equivalent in the new CRD.

I think we should push back on the integer-only convention for these two fields and switch to string fields with the same Prometheus pattern validation that the Prometheus Operator CRD uses:

duration: string with pattern ^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$ size: string with pattern ^(0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$ This way CMO can pass the values straight through to the Prometheus CR without any lossy conversion, and all existing ConfigMap values remain valid. The patterns are well-defined and fully validatable at admission time, so we're not losing any safety vs. int32.

@everettraven I know we discussed avoiding duration strings back in #2653, but this comes from the Prometheus Operator CR itself, which validates retention with ^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$ and size with (^0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$ (https://github.com/openshift/cluster-monitoring-operator/blob/0ce8e4c0bce00381802c25d92ed238775139ee82/assets/monitoring-operator/00_0prometheus-custom-resource-definition.yaml). CMO passes these values straight through to the Prometheus CR, so using durationInDays: int32 and sizeInGiB: int32 introduces a lossy conversion layer that breaks existing configs like retention: "15h" or retentionSize: "500MiB" that customers can set today in the ConfigMap. Would you be open to using string fields with these same Prometheus patterns for this specific case? They're fully validatable at admission time via the regex, and since this is v1alpha1 behind a feature gate we can still change it.

Worst case, as a fallback, we may set hours, but wanted to get back your thoughts on this! Thanks!

We strongly encourage not using durations in favor of a unit-based integer, but if you folks feel strongly that this is the best UX for your end-users I'm not going to prevent you from doing so. I'm not the expert on your user base here, you folks are :).

That being said, if the concern is solely that you cannot reasonably express smaller unit values that are likely to occur, you can change the unit you are using for the integer field to the smallest reasonable unit. For example, durationInSeconds and sizeInBytes.

Even if this is in v1alpha1 we still have to abide by tombstoning rules if we change the serialization of the fields unless we want to move this to a v1alpha2.

Replace durationInDays with durationInHours so sub-day retention (e.g. 15h,
24h) is expressible without Prometheus duration strings. Cap duration at
87600 hours (~3650 days) per API review; keep sizeInGiB bounded by int32
max instead of the previous 16384 GiB operational ceiling.

Co-authored-by: Cursor <cursoragent@cursor.com>
@marioferh marioferh force-pushed the fix_prometheus_retention_days branch from 854e0ad to 71e6477 Compare May 28, 2026 11:23
@openshift-ci openshift-ci Bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 28, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@config/v1alpha1/types_cluster_monitoring.go`:
- Around line 2238-2247: The public JSON name was changed and will break
existing manifests; restore wire compatibility by keeping the original JSON
field name for the struct field DurationInHours (e.g. change the tag to
`json:"durationInDays,omitempty"` so existing persisted objects continue to
round-trip) and add a deprecation/kubebuilder comment on the field explaining it
represents hours (or add a new explicitly named field for hours in a new API
version/conversion path if you intend to support both); update the struct field
DurationInHours and its kubebuilder tags accordingly and ensure any API
docs/comments reflect the restored wire name and units.
- Around line 2243-2255: Remove the hard upper-bound metadata and docs for
retention and storage fields: delete the `+kubebuilder:validation:Maximum=87600`
and the "Maximum value is 87600 hours..." doc line for the DurationInHours field
(symbol: DurationInHours) and delete the
`+kubebuilder:validation:Maximum=2147483647` and the "Maximum value is
2147483647 GiB..." doc line for the sizeInGiB field (symbol: sizeInGiB); keep
the Minimum validations and the rest of the comments intact so the API no longer
advertises or enforces an upper bound.

In
`@payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml`:
- Around line 4136-4147: The CRD change renamed the serialized retention field
in v1alpha1 from spec.prometheusConfig.retention.durationInDays to
durationInHours which will break existing stored objects and clients; revert the
rename in the v1alpha1 CRD so the field remains durationInDays, restore its
original description/format/minimum/maximum, and instead introduce any new
representation by adding a new API version (e.g., v1beta1) with durationInHours
plus proper conversion webhooks or conversion funcs and tombstoning logic to
translate between durationInDays and durationInHours during reads/writes.
- Around line 4143-4145: The CRD still enforces a 10-year ceiling via the schema
entry "maximum: 87600"; remove that maximum constraint so the API no longer
rejects larger retention values at admission time. Locate the clustermonitorings
CRD schema where "format: int32" and "maximum: 87600" appear (the
retention/hours field) and delete the "maximum: 87600" line (and any equivalent
numeric maximums on related retention fields) so the property is unconstrained
by the CRD schema. Leave the type/format (e.g., "type: integer" / "format:
int32") intact if needed, and run schema validation tests to ensure no other
admission-side maximums remain.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 003efd47-3c8c-4d49-aac1-a3b47834cee3

📥 Commits

Reviewing files that changed from the base of the PR and between 03a1de9 and 71e6477.

⛔ Files ignored due to path filters (5)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • config/v1alpha1/zz_generated.swagger_doc_generated.go is excluded by !**/zz_generated*
  • openapi/generated_openapi/zz_generated.openapi.go is excluded by !openapi/**, !**/zz_generated*
  • openapi/openapi.json is excluded by !openapi/**
📒 Files selected for processing (2)
  • config/v1alpha1/types_cluster_monitoring.go
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml

Comment on lines +2238 to +2247
// durationInHours specifies how many hours Prometheus will retain metrics data.
// Prometheus automatically deletes data older than this duration.
// When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
// The default value is 15.
// Minimum value is 1 day.
// Maximum value is 365 days (1 year).
// The default value is 360 (equivalent to 15 days).
// Minimum value is 1 hour.
// Maximum value is 87600 hours (equivalent to 3650 days, approximately 10 years).
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=365
// +kubebuilder:validation:Maximum=87600
// +optional
DurationInDays int32 `json:"durationInDays,omitempty"`
DurationInHours int32 `json:"durationInHours,omitempty"`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid a wire-format break in Retention.

Line 2247 swaps the public JSON field from durationInDays to durationInHours, so existing ClusterMonitoring manifests and persisted objects using the old field will stop round-tripping this setting after upgrade. If the intent is only to relax retention limits, keep the existing wire name or introduce the hours-based variant in a new API version/migration path and tombstone the old name.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/v1alpha1/types_cluster_monitoring.go` around lines 2238 - 2247, The
public JSON name was changed and will break existing manifests; restore wire
compatibility by keeping the original JSON field name for the struct field
DurationInHours (e.g. change the tag to `json:"durationInDays,omitempty"` so
existing persisted objects continue to round-trip) and add a
deprecation/kubebuilder comment on the field explaining it represents hours (or
add a new explicitly named field for hours in a new API version/conversion path
if you intend to support both); update the struct field DurationInHours and its
kubebuilder tags accordingly and ensure any API docs/comments reflect the
restored wire name and units.

Comment on lines +2243 to +2255
// Maximum value is 87600 hours (equivalent to 3650 days, approximately 10 years).
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=365
// +kubebuilder:validation:Maximum=87600
// +optional
DurationInDays int32 `json:"durationInDays,omitempty"`
DurationInHours int32 `json:"durationInHours,omitempty"`
// sizeInGiB specifies the maximum storage size in gibibytes (GiB) that Prometheus
// can use for data blocks and the write-ahead log (WAL).
// When the limit is reached, Prometheus will delete oldest data first.
// When omitted, no size limit is enforced and Prometheus uses available PersistentVolume capacity.
// Minimum value is 1 GiB.
// Maximum value is 16384 GiB (16 TiB).
// Maximum value is 2147483647 GiB (the maximum representable int32 value).
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=16384
// +kubebuilder:validation:Maximum=2147483647
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The schema still carries hard upper bounds.

Line 2245 still enforces a 87600-hour ceiling, and Line 2255 still publishes a sizeInGiB maximum instead of actually removing the upper-bound metadata. If this PR is meant to drop arbitrary retention caps from the API, these Maximum markers and the matching "Maximum value is ..." docs should be removed rather than replaced with larger numbers.

✂️ Proposed schema/doc cleanup
 	// The default value is 360 (equivalent to 15 days).
 	// Minimum value is 1 hour.
-	// Maximum value is 87600 hours (equivalent to 3650 days, approximately 10 years).
 	// +kubebuilder:validation:Minimum=1
-	// +kubebuilder:validation:Maximum=87600
 	// +optional
 	DurationInHours int32 `json:"durationInHours,omitempty"`
 	// sizeInGiB specifies the maximum storage size in gibibytes (GiB) that Prometheus
 	// can use for data blocks and the write-ahead log (WAL).
 	// When the limit is reached, Prometheus will delete oldest data first.
 	// When omitted, no size limit is enforced and Prometheus uses available PersistentVolume capacity.
 	// Minimum value is 1 GiB.
-	// Maximum value is 2147483647 GiB (the maximum representable int32 value).
 	// +kubebuilder:validation:Minimum=1
-	// +kubebuilder:validation:Maximum=2147483647
 	// +optional
 	SizeInGiB int32 `json:"sizeInGiB,omitempty"`
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Maximum value is 87600 hours (equivalent to 3650 days, approximately 10 years).
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=365
// +kubebuilder:validation:Maximum=87600
// +optional
DurationInDays int32 `json:"durationInDays,omitempty"`
DurationInHours int32 `json:"durationInHours,omitempty"`
// sizeInGiB specifies the maximum storage size in gibibytes (GiB) that Prometheus
// can use for data blocks and the write-ahead log (WAL).
// When the limit is reached, Prometheus will delete oldest data first.
// When omitted, no size limit is enforced and Prometheus uses available PersistentVolume capacity.
// Minimum value is 1 GiB.
// Maximum value is 16384 GiB (16 TiB).
// Maximum value is 2147483647 GiB (the maximum representable int32 value).
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=16384
// +kubebuilder:validation:Maximum=2147483647
// +kubebuilder:validation:Minimum=1
// +optional
DurationInHours int32 `json:"durationInHours,omitempty"`
// sizeInGiB specifies the maximum storage size in gibibytes (GiB) that Prometheus
// can use for data blocks and the write-ahead log (WAL).
// When the limit is reached, Prometheus will delete oldest data first.
// When omitted, no size limit is enforced and Prometheus uses available PersistentVolume capacity.
// Minimum value is 1 GiB.
// +kubebuilder:validation:Minimum=1
// +optional
SizeInGiB int32 `json:"sizeInGiB,omitempty"`
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/v1alpha1/types_cluster_monitoring.go` around lines 2243 - 2255, Remove
the hard upper-bound metadata and docs for retention and storage fields: delete
the `+kubebuilder:validation:Maximum=87600` and the "Maximum value is 87600
hours..." doc line for the DurationInHours field (symbol: DurationInHours) and
delete the `+kubebuilder:validation:Maximum=2147483647` and the "Maximum value
is 2147483647 GiB..." doc line for the sizeInGiB field (symbol: sizeInGiB); keep
the Minimum validations and the rest of the comments intact so the API no longer
advertises or enforces an upper bound.

Comment on lines +4136 to 4147
durationInHours:
description: |-
durationInDays specifies how many days Prometheus will retain metrics data.
durationInHours specifies how many hours Prometheus will retain metrics data.
Prometheus automatically deletes data older than this duration.
When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
The default value is 15.
Minimum value is 1 day.
Maximum value is 365 days (1 year).
The default value is 360 (equivalent to 15 days).
Minimum value is 1 hour.
Maximum value is 87600 hours (equivalent to 3650 days, approximately 10 years).
format: int32
maximum: 365
maximum: 87600
minimum: 1
type: integer
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid renaming the serialized retention field in v1alpha1.

Changing durationInDays to durationInHours changes the wire format of the only served/storage version in this CRD. Existing clients and stored objects using spec.prometheusConfig.retention.durationInDays will break once this lands. Keep the old field in v1alpha1, and introduce any new representation via a new version plus conversion/tombstoning.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml`
around lines 4136 - 4147, The CRD change renamed the serialized retention field
in v1alpha1 from spec.prometheusConfig.retention.durationInDays to
durationInHours which will break existing stored objects and clients; revert the
rename in the v1alpha1 CRD so the field remains durationInDays, restore its
original description/format/minimum/maximum, and instead introduce any new
representation by adding a new API version (e.g., v1beta1) with durationInHours
plus proper conversion webhooks or conversion funcs and tombstoning logic to
translate between durationInDays and durationInHours during reads/writes.

Comment on lines +4143 to +4145
Maximum value is 87600 hours (equivalent to 3650 days, approximately 10 years).
format: int32
maximum: 365
maximum: 87600
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This still leaves an API-side retention ceiling.

If the goal here is to remove arbitrary retention caps, maximum: 87600 just replaces the old cap with a 10-year one. That still rejects larger configurations at admission time, so the CRD has not actually removed the ceiling.

Suggested schema change
                       The default value is 360 (equivalent to 15 days).
                       Minimum value is 1 hour.
-                      Maximum value is 87600 hours (equivalent to 3650 days, approximately 10 years).
                     format: int32
-                    maximum: 87600
                     minimum: 1
                     type: integer
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml`
around lines 4143 - 4145, The CRD still enforces a 10-year ceiling via the
schema entry "maximum: 87600"; remove that maximum constraint so the API no
longer rejects larger retention values at admission time. Locate the
clustermonitorings CRD schema where "format: int32" and "maximum: 87600" appear
(the retention/hours field) and delete the "maximum: 87600" line (and any
equivalent numeric maximums on related retention fields) so the property is
unconstrained by the CRD schema. Leave the type/format (e.g., "type: integer" /
"format: int32") intact if needed, and run schema validation tests to ensure no
other admission-side maximums remain.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 28, 2026

@marioferh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/verify-crdify 71e6477 link true /test verify-crdify
ci/prow/verify-crd-schema 71e6477 link true /test verify-crd-schema

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@danielmellado
Copy link
Copy Markdown
Contributor

@everettraven besides the override on the CI, would we need to tombstone this due to the filed removal? (change, technically, but anyway). Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants