… to allow loose packing for total fluxes that don't fit evenly into the requested number of sub-batches.
…mplex fragmentation.
…lux method, which is renamed to make_photon_subbatches.
force-pushed from 04a30a2 to 02b52ba
This is ready for review, though CI failed earlier today during Conda setup.
rmjarvis left a comment:
I think I'm confused about some of this.
```python
# Calculate current sub-batch flux.
# It would be nicer to store rather than recalculate.
subbatch = subbatches[current_subbatch]
subbatch_flux = sum(sb_obj.phot_flux for sb_obj in subbatch)
```
As you say, it would be nicer to keep this as a running total, rather than O(N^2) recalculating it each time. Is that hard for some reason? I'm thinking the easiest way would be to keep an array with the flux in each batch so far.
subbatch_fluxes = [0. for _ in range(nsubbatch)]
Then update it each time you add something to subbatches.
There are also patterns that only keep the running sum for the current batch, but the advantage of keeping the full array is that you could add a nice debug statement at the end listing the realized flux per batch, which would be handy to be able to easily see when running.
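A minimal sketch of the running-total pattern suggested here (names are illustrative; the real loop lives in make_photon_subbatches):

```python
# Keep a running flux total per sub-batch instead of recalculating it.
nsubbatch = 4
subbatches = [[] for _ in range(nsubbatch)]
subbatch_fluxes = [0. for _ in range(nsubbatch)]  # running flux per sub-batch

def add_to_subbatch(i, obj, flux):
    """Append obj to sub-batch i, keeping its flux total up to date."""
    subbatches[i].append(obj)
    subbatch_fluxes[i] += flux

# At the end, subbatch_fluxes holds the realized flux per sub-batch,
# ready for a single debug log line.
```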
```python
if remaining_flux <= available_flux:
    # This sub-batch can contain the entirety of the remaining
    # object flux. Place it all in here, then continue
    # without advancing in case more can go in.
```
I expected the first test to be if remaining_flux <= loose_photons_per_subbatch - subbatch_flux, in which case put it in entirely. And if remaining_flux >= photons_per_subbatch - subbatch_flux then bump to the next batch.
I think this cleanly captures what you now have both here and at the start of the else branch.
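A sketch of the test ordering suggested above (illustrative only: the names mirror the review comment, not necessarily the PR's code):

```python
# Decide how much of an object's remaining flux goes in the current
# sub-batch, using the loose limit first and the nominal limit second.
def place_step(remaining_flux, subbatch_flux,
               photons_per_subbatch, loose_photons_per_subbatch):
    """Return (flux_placed_here, advance_to_next_subbatch)."""
    if remaining_flux <= loose_photons_per_subbatch - subbatch_flux:
        # Fits entirely within the loose limit: no fragmentation needed.
        return remaining_flux, False
    # Otherwise fill this sub-batch to its nominal level and bump on;
    # the caller carries the remainder into the next sub-batch.
    return photons_per_subbatch - subbatch_flux, True
```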
```python
# In general there are multiple ways to split the batch. Assert that each
# object appears with its original flux across however many sub-batches it
# appears in, that the total flux across all sub-batches equals the total
# batch flux, and that the most full batch contains <= 1.1 * the flux in the
```
1.05?
Oh, actually, with the rhs using the minimum, it won't be this. And in fact, I don't think the code makes any guarantee about that ratio. There can be arbitrarily many batches up to 5% bigger than nominal, so the minimum can be arbitrarily small (even 0 I believe). So your use of 1.1 here is I think just something that happens to work for this particular case.
If we want to avoid that "failure mode" (of having a very small batch), then I think we could lessen the likelihood of it by recalculating the nominal and loose levels for the rest of the batches any time one batch goes over the nominal level. Not sure how important that is, but it's something we could implement if we see that there is often a batch at the end that is very low flux.
```python
dataclasses.replace(obj, phot_flux=available_flux)
)
remaining_flux -= available_flux
current_subbatch = (1 + current_subbatch) % nsubbatch
```
If I understand this algorithm correctly, I don't think the % nsubbatch is ever necessary. Is this a relic from a previous implementation? Or am I missing something here?
```python
]
run_subbatch_test("fragmentation in first sub-batch", batch, 2)
```

```python
# Make sure the sub-batcher can go backwards (i.e. assign to subbatches
```
I didn't think the algorithm ever did go backwards? Again, am I missing something here?
Currently, the sub-batching in photon pooling is performed object-by-object. So, for example, if there are 100 objects in the batch, and 20 sub-batches, then the first five objects will go in the first sub-batch, the next five in the second, and so on. This means that if the objects vary a lot in brightness (and they do) then the flux per sub-batch will also vary a lot. In a typical run and setup provided by Jim, I found that most sub-batches contained a few thousand photons, and a handful contained several tens of millions. This leads to regular memory spikes when those sub-batches are processed in each new photon pool.
This PR smooths those memory spikes out by creating sub-batches with roughly equal fluxes. This is an implementation of the bin packing problem with fragmentation: our goal is to split the total flux in the pool across the sub-batches, and to achieve this we have to be able to fragment those extremely bright objects across however many sub-batches are needed.
At the same time, each object fragmentation means another object lookup, and we had previously determined that with photon pooling these can mount up and become quite costly. For this reason, I'm allowing the flux in each sub-batch to vary slightly, letting objects fill a sub-batch up to 105% of the expected per-sub-batch flux if that prevents a fragmentation.
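A minimal sketch of this loose bin packing with fragmentation (illustrative only; not the PR's make_photon_subbatches implementation):

```python
# Greedily pack per-object fluxes into sub-batches, fragmenting an object
# only when it would push the current sub-batch past the loose limit.
def split_fluxes(fluxes, nsubbatch, slack=1.05):
    """Split per-object fluxes into nsubbatch lists of (obj_index, flux),
    letting each sub-batch run up to slack * the nominal per-batch flux
    before fragmenting an object across a sub-batch boundary."""
    nominal = sum(fluxes) / nsubbatch
    loose = slack * nominal
    subbatches = [[] for _ in range(nsubbatch)]
    totals = [0.0] * nsubbatch
    cur = 0
    for i, remaining in enumerate(fluxes):
        while remaining > 0:
            if remaining <= loose - totals[cur] or cur == nsubbatch - 1:
                # Fits within the loose limit (or we're in the last
                # sub-batch): place the whole remainder, no fragmentation.
                subbatches[cur].append((i, remaining))
                totals[cur] += remaining
                remaining = 0.0
            else:
                # Fragment: fill this sub-batch to its nominal level and
                # carry the rest into the next one.
                take = max(nominal - totals[cur], 0.0)
                if take > 0:
                    subbatches[cur].append((i, take))
                    totals[cur] += take
                    remaining -= take
                cur += 1
    return subbatches
```

One very bright object simply spills across as many consecutive sub-batches as its flux requires, while the faint objects fill in around it.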
This is ready to go, but it needs #511 to be merged first, after which this should be rebased onto it - so for now I'm leaving this as a draft.