fix: clamp content differences to prevent FMA-induced numerical instability by barendgehrels · Pull Request #1455 · boostorg/geometry

barendgehrels · 2026-03-30T15:18:28Z

Might fix #1452

@tinko92 could you verify if this helps the problem?

The fix is by Claude Code (reviewed/directed by me), it argues:

 Analysis

  The core invariant being violated

  When box_exp is expanded from box_orig to include an indexable, we have box_exp ⊇ box_orig, so mathematically content(box_exp) >=
  content(box_orig) and content_diff >= 0. With FMA fusion, this invariant breaks: the multiply inside content() fuses with the
  subsequent subtraction differently for the two calls, yielding negative or spuriously non-zero results.

  The dangerous patterns

  Every call site follows one of two shapes:

  Shape 1 — "enlargement cost" (5 sites): content(expanded) - content(original)
  - insert.hpp:92
  - rstar/choose_next_node.hpp:120, 249
  - quadratic/redistribute_elements.hpp:294-295
  - linear/redistribute_elements.hpp:435-436

  Shape 2 — "free content" (1 site): content(union) - content(a) - content(b)
  - quadratic/redistribute_elements.hpp:77-79

  In all cases the result is mathematically guaranteed >= 0, but FMA can make it negative or spuriously non-zero.

  Why negative matters more than "slightly different positive"

  When content_diff becomes negative for a box that didn't actually expand, that box appears to shrink when receiving the new element —
  making it the "best" choice by a wide margin. This cascades: wrong node selections lead to a suboptimal tree structure (6685 nodes
  instead of 6763 in the reported case), which means far less effective pruning during queries.

  The fix: clamp to the known invariant

  Since we know these differences are >= 0, clamping them is mathematically sound and eliminates the FMA issue without suppressing any  compiler optimizations globally:

  content_type content_diff = (std::max)(content_type(0),
      content - index::detail::content(ch_i.first));

  This fixes both failure modes:
  - Negative values become exactly 0 (correct: the box didn't enlarge)
  - "Should be zero but is epsilon" — while (std::max) doesn't fix this, it doesn't need to. Two boxes that both don't enlarge will both   clamp to exactly 0, so equality comparisons work. Two boxes that both enlarge by a tiny positive amount are just heuristic  tie-breakers where sub-ULP differences are irrelevant.

  Interesting existing precedent

  The code at rstar/choose_next_node.hpp:135 already does epsilon-aware comparison:
  if ( min_content_diff < -std::numeric_limits<content_type>::epsilon()
    || std::numeric_limits<content_type>::epsilon() < min_content_diff )
  This shows the codebase already anticipated near-zero issues, just not systematically.

  What about the == comparisons?

  Lines like content_diff == smallest_content_diff and content1 == content2 are used as tie-breakers. After clamping, the most important
   case — two equal boxes both showing zero enlargement — will compare as exactly 0 == 0. For two genuinely different enlargements,
  sub-ULP differences in tie-breaking are harmless for the heuristic.

  Concrete changes needed:

  │ insert.hpp                          │ 92      │ Clamp content_diff                 │
  │ rstar/choose_next_node.hpp          │ 120     │ Clamp content_diff                 │
  │ rstar/choose_next_node.hpp          │ 249     │ Clamp content_diff                 │
  │ quadratic/redistribute_elements.hpp │ 77-79   │ Clamp free_content                 │
  │ quadratic/redistribute_elements.hpp │ 294-295 │ Clamp both content_increase values │
  │ linear/redistribute_elements.hpp    │ 435-436 │ Clamp both content_increase values │

I think this makes sense, looks good - and locally it did not break any unit tests (the RST always failed for me - on the fly that is fixed now as well, see other PR)

barendgehrels · 2026-03-30T15:19:11Z

-                out_content_increase1 = content_incrase1;
-                out_content_increase2 = content_incrase2;
+                out_content_increase1 = content_increase1;
+                out_content_increase2 = content_increase2;


I also fixed a typo, that's why the diff is a bit larger here

tinko92 · 2026-03-31T15:06:45Z

I tried something similar to what the language model outputs here before I wrote #1453 , but instead of clamping, I tested the values of the content calls directly for equality. I went into a different direction because

it added an extra branch and prevented contraction to FMA (because the compiler needed to compute both intermediate results for comparing anyway), so no advantage over turning of contraction for content directly. This clamping is better though because max is branchless and it still allows applying FMA, so 1. is resolved.
It did not fully resolve the issue.

Consider the following test using the instrumented main.cpp and the data.csv from the issue:

#!/usr/bin/env bash

(cd ../../boost/libs/geometry/ && git reset --hard -q && git checkout $1 -q && sed -i '504i\cross_track_calls++;' include/boost/geometry/strategies/geographic/distance_cross_track.hpp && git branch --show-current)
g++ -g -I../../boost/ -DNDEBUG -O3 -march=$2 main.cpp -std=c++20 -o a.out && ./a.out | grep -E "^2:|query all took|cross_track calls:"

(Edit: Line 504 in distance_cross_track.hpp is inside the private apply method.)

Then we get:

bartels@DESKTOP-BPNV3G3:~/dev/bg_1452/test$ sh test.sh develop x86-64-v2 # Baseline without issue (v2)
develop
2: 6763
query all took :   3460.036 ms
cross_track calls: 3303357

bartels@DESKTOP-BPNV3G3:~/dev/bg_1452/test$ sh test.sh develop x86-64-v3 # Issue (v3)
develop
2: 6685
query all took :   7346.178 ms
cross_track calls: 6117432

bartels@DESKTOP-BPNV3G3:~/dev/bg_1452/test$ sh test.sh fix/index-content-fp-contract-off x86-64-v3 #1453 
fix/index-content-fp-contract-off
2: 6763 # Probably the same tree structure
query all took :   3496.460 ms
cross_track calls: 3306476 # Minor difference in distance calls, might be due to difference in distance itself.

bartels@DESKTOP-BPNV3G3:~/dev/bg_1452/test$ sh test.sh fix/issue-1452-index-content x86-64-v3 # This PR
fix/issue-1452-index-content
2: 6697 # Different tree structure from baseline.
query all took :   4120.981 ms
cross_track calls: 4200675 # Still a severe regression in the number of distance calls.

So there must be call sites of content that were not covered fully by the LLM-generated patch. I still like the change in itself, though, because it seems like a sensibly defensive thing to do that causes no branching and prevents potentially nonsensical values. It may be beneficial to merge both PRs (the original PR also e.g. prevents issues with contraction of content at potential future call sites or existing sites unrelated to this bug if there are any).

tinko92

I approve merging these changes, the one formatting comment is no blocker for me.

tinko92 · 2026-03-31T15:14:18Z

-                        content_type content_increase1 = enlarged_content1 - content1;
-                        content_type content_increase2 = enlarged_content2 - content2;
+                        content_type const content_increase1 = (std::max)(content_type(0), enlarged_content1 - content1);
+                        content_type const content_increase2 = (std::max)(content_type(0), enlarged_content2 - content2);


Minor: These violate our max line length of 100.

…l instability Might fix boostorg#1452

tinko92 · 2026-04-04T14:10:52Z

On further inspection, #1452 is also affected by yet another source of rounding error, relating to antimeridian-crossing boxes in coordinate normalisation for boxes (which also affects x86-64-v2, although in a more minor way). E.g., if we have a box with lower lon 170°, upper long e.g. 190.1° it will first subtract -360°, then add 360°, and change it to something like 190.09999...°, and the content change from expansion becomes negative. I will add a separate PR with a test case to solve that (this is needed in addition to the fixes around content_diff).

The content_diff-related issues can be fully resolved by making your content_diff robust with a filter, that not only clamps negative values but also rounds to 0 for very small positive values for which the actual content diff is 0. This approach makes both #1453 obsolete and is more general than the clamping proposed in this PR (#1455). I will try to write a new PR by end of next week, there are a few options, I want to compare first. Bascially we can have a static threshold which would be around 1.32e-10 for double (may be acceptable?), but for float it would be around 0.071 which may be too coarse. Or we can compute it per content diff call depending on the boxes which would be more accurate and preferable if the performance overhead turns out to be negligible in testing.

EDIT: After a lot of testing, the filtering looked good on ARM but failed to resolve the issue with GCC on x86 (similar results to this PR #1455, roughly ~4M distance calls, rather than ~2.6M). As of now, #1453 is the only solution that works for me across all platforms.

tinko92

Because this was questioned #1453 (comment) , I am still in favour of merging this PR. The clamping is still reasonable and the refactoring is useful for potential future formula changes specific to content_diff to make this maybe more robust in the future.

tinko92 · 2026-04-16T07:06:37Z

-                        content_type content_increase1 = enlarged_content1 - content1;
-                        content_type content_increase2 = enlarged_content2 - content2;
+                        content_type const content_increase1 = (std::max)(content_type(0), enlarged_content1 - content1);
+                        content_type const content_increase2 = (std::max)(content_type(0), enlarged_content2 - content2);


Let's call content_diff here for consistency and in particular to retain consistency if that function is changed for robustness in the future. Recomputing content1 and content2 is not a performance concern here, all major compilers will pull that out of the loop on O2 (tested for GCC, Clang, MSVC, ICX here).

tinko92 · 2026-04-16T07:07:57Z

-            content_type content_incrase1 = (enlarged_content1 - content1);
-            content_type content_incrase2 = (enlarged_content2 - content2);
+            content_type const content_increase1
+                = (std::max)(content_type(0), enlarged_content1 - content1);


I think L296 and L297 should call content_diff, as discussed in comment on redistribute_elements.hpp:437.

tinko92 · 2026-04-16T07:12:01Z

+template <typename Box>
+typename default_content_result<Box>::type content_diff(Box const& expanded, Box const& original)
+{
+    using content_type = typename default_content_result<Box>::type;


I propose an asssertion on original being within expanded here, because that is a precondition for that general clamping being valid and should prevent accidental misuse of this function for general content differences where arg1 is not an expansion of arg2.

barendgehrels requested review from awulkiew, tinko92 and vissarion March 30, 2026 15:18

barendgehrels self-assigned this Mar 30, 2026

barendgehrels commented Mar 30, 2026

View reviewed changes

barendgehrels mentioned this pull request Mar 30, 2026

Prevent FP contractions to FMA in index/.../content.hpp to avoid numerical instability in rtree #1453

Merged

tinko92 approved these changes Mar 31, 2026

View reviewed changes

fix(rtree): clamp content differences to prevent FMA-induced numerica…

2f70d78

…l instability Might fix boostorg#1452

barendgehrels force-pushed the fix/issue-1452-index-content branch from 4aa13e5 to 2f70d78 Compare April 1, 2026 11:32

tinko92 reviewed Apr 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: clamp content differences to prevent FMA-induced numerical instability#1455

fix: clamp content differences to prevent FMA-induced numerical instability#1455
barendgehrels wants to merge 1 commit intoboostorg:developfrom
barendgehrels:fix/issue-1452-index-content

barendgehrels commented Mar 30, 2026

Uh oh!

barendgehrels Mar 30, 2026

Uh oh!

tinko92 commented Mar 31, 2026 •

edited

Loading

Uh oh!

tinko92 left a comment •

edited

Loading

Uh oh!

tinko92 Mar 31, 2026

Uh oh!

barendgehrels Apr 1, 2026

Uh oh!

tinko92 commented Apr 4, 2026 •

edited

Loading

Uh oh!

tinko92 left a comment

Uh oh!

tinko92 Apr 16, 2026 •

edited

Loading

Uh oh!

tinko92 Apr 16, 2026

Uh oh!

tinko92 Apr 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

barendgehrels commented Mar 30, 2026

Uh oh!

barendgehrels Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

tinko92 commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tinko92 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tinko92 Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

barendgehrels Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

tinko92 commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tinko92 left a comment

Choose a reason for hiding this comment

Uh oh!

tinko92 Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tinko92 Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

tinko92 Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tinko92 commented Mar 31, 2026 •

edited

Loading

tinko92 left a comment •

edited

Loading

tinko92 commented Apr 4, 2026 •

edited

Loading

tinko92 Apr 16, 2026 •

edited

Loading