RFC: support clearing of metric objects for custom collectors by calestyo · Pull Request #1175 · prometheus/client_python

calestyo · 2026-05-13T21:08:07Z

Hey.

This is based on my second point in #1173 (comment).

Please see there and also the commit message which explains the rationale.

I think the biggest open point right now would be whether to use self.samples.clear() or rather self.samples = [], which would probably be way faster (see commit message for the reasoning).
If .samples was actually be considered non-public, it should however be renamed (and I think the same goes for e.g. add_sample, whose doc-string already says it would be internal only... so just renamed it to _add_sample, or maybe warp around it and add a deprecation warning to the public one.

Cheers,
Chris.

PS: @csmarchbanks and @k1chik might want to have a look. For the latter, if this PR would be considered worthy to add... it would probably make sense to present that way of use as an alternative to the current schema in the docs (which you've so greatly worked on recently)

When not doing “direct instrumentation” but rather collecting metrics and thus using the `*MetricFamily`-classes, the general approach seems to be: • Defining a custom collector (that is: a class which is derived from `prometheus_client.registry.Collector`) and at least a `collect`-method. • Create instances of `*MetricFamily` within `collect`. • Add metrics to these via their `add_metric`-method. • Either `return` or `yield` those, for use by the collector registry. The last two steps will be done for every invocation of `collect`. In many cases however, the `*MetricFamily`-objects themselves (that is: their name, documentation, labels, et cetera – basically all the values set via their respective `__init__`-method) except for any metrics added to them will always stay the same and be needlessly re-defined/created on every call to `collect`. This adds `clear`-method to their base class `prometheus_client.metrics_core.Metric`, which simply clears the instance’s `samples`-attribute, so that the `*MetricFamily`-objects that are static in the above sense, can be defined/created for example in `__init__`, and `collect` would merely need to call `clear` for every metric. That could be done either in the beginning of `clear` or perhaps right after `yield`ing the respective object(s) (which would allow Python to reclaim the memory faster, but might perhaps lead to the clearing not happening if an exception is raised. So perhaps the best practise would to do both, clear all in the beginning of `collect` and – for exporters who have so many metrics that this might be an issue – right after `yield`ing. Another idea was, that the clearing could actually be done by the registry after it has collected the metrics, but this seems to rather just limit the freedom and would possibly also not be backwards compatible. This change of course still allows for the “old” usage (that is: re-defining/ creating the `*MetricFamily`-objects in `collect` on every call. Last but not least, currently, `samples` is cleared via the `list.clear`-method rather than simply assigning a new empty list to it. The former is probably considerably slower (if there are many elements in the list), but chosen for now, as `samples` is (per naming conventions) “public” and thus references to it might be held by users. Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>

calestyo · 2026-05-13T21:12:59Z

btw: I think we could perhaps even tell people that if they would want to use the "new" way (i.e. define metrics in __init__ and merely clear() them in collect(), they could rather easily keep backwards compatibility, by simply doing the self.samples.clear() themselves, if self has no clear-method - or was anything around samples changed recently so that this wouldn't work?

And perhaps one more rationale point:
In my understanding, the memory issue is rather the many metrics (i.e. samples) and not having "many" the *MetricFamily-objects around, or is it?
Cause if correct, then "keeping" them, as proposed here, shouldn't really kill memory, if we still allow to .clear() as soon as the data has been yielded.

calestyo · 2026-05-13T21:14:33Z

Oh and one more point to decide upon:

How should the function be named?

I went for clear now, but we might also choose something like clear_metrics or clear_samples, because the function clears only those and not the whole object (like name, documentation, etc. all stay).

k1chik · 2026-05-14T01:56:03Z

Agree on the memory point, the cost is the samples (the label-value data), not the MetricFamily wrapper objects themselves, so keeping them in __init__ and clearing samples after each yield is sound. self.samples.clear() as a fallback for backwards compatability makes sense too.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: support clearing of metric objects for custom collectors#1175

RFC: support clearing of metric objects for custom collectors#1175
calestyo wants to merge 1 commit into
prometheus:masterfrom
calestyo:clear-custom-collectors

calestyo commented May 13, 2026

Uh oh!

calestyo commented May 13, 2026

Uh oh!

calestyo commented May 13, 2026

Uh oh!

k1chik commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

calestyo commented May 13, 2026

Uh oh!

calestyo commented May 13, 2026

Uh oh!

calestyo commented May 13, 2026

Uh oh!

k1chik commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants