Skip to content

fix: clamp content differences to prevent FMA-induced numerical instability#1455

Open
barendgehrels wants to merge 1 commit intoboostorg:developfrom
barendgehrels:fix/issue-1452-index-content
Open

fix: clamp content differences to prevent FMA-induced numerical instability#1455
barendgehrels wants to merge 1 commit intoboostorg:developfrom
barendgehrels:fix/issue-1452-index-content

Conversation

@barendgehrels
Copy link
Copy Markdown
Collaborator

Might fix #1452

@tinko92 could you verify if this helps the problem?

The fix is by Claude Code (reviewed/directed by me), it argues:

 Analysis

  The core invariant being violated

  When box_exp is expanded from box_orig to include an indexable, we have box_exp ⊇ box_orig, so mathematically content(box_exp) >=
  content(box_orig) and content_diff >= 0. With FMA fusion, this invariant breaks: the multiply inside content() fuses with the
  subsequent subtraction differently for the two calls, yielding negative or spuriously non-zero results.

  The dangerous patterns

  Every call site follows one of two shapes:

  Shape 1 — "enlargement cost" (5 sites): content(expanded) - content(original)
  - insert.hpp:92
  - rstar/choose_next_node.hpp:120, 249
  - quadratic/redistribute_elements.hpp:294-295
  - linear/redistribute_elements.hpp:435-436

  Shape 2 — "free content" (1 site): content(union) - content(a) - content(b)
  - quadratic/redistribute_elements.hpp:77-79

  In all cases the result is mathematically guaranteed >= 0, but FMA can make it negative or spuriously non-zero.

  Why negative matters more than "slightly different positive"

  When content_diff becomes negative for a box that didn't actually expand, that box appears to shrink when receiving the new element —
  making it the "best" choice by a wide margin. This cascades: wrong node selections lead to a suboptimal tree structure (6685 nodes
  instead of 6763 in the reported case), which means far less effective pruning during queries.

  The fix: clamp to the known invariant

  Since we know these differences are >= 0, clamping them is mathematically sound and eliminates the FMA issue without suppressing any  compiler optimizations globally:

  content_type content_diff = (std::max)(content_type(0),
      content - index::detail::content(ch_i.first));

  This fixes both failure modes:
  - Negative values become exactly 0 (correct: the box didn't enlarge)
  - "Should be zero but is epsilon" — while (std::max) doesn't fix this, it doesn't need to. Two boxes that both don't enlarge will both   clamp to exactly 0, so equality comparisons work. Two boxes that both enlarge by a tiny positive amount are just heuristic  tie-breakers where sub-ULP differences are irrelevant.

  Interesting existing precedent

  The code at rstar/choose_next_node.hpp:135 already does epsilon-aware comparison:
  if ( min_content_diff < -std::numeric_limits<content_type>::epsilon()
    || std::numeric_limits<content_type>::epsilon() < min_content_diff )
  This shows the codebase already anticipated near-zero issues, just not systematically.

  What about the == comparisons?

  Lines like content_diff == smallest_content_diff and content1 == content2 are used as tie-breakers. After clamping, the most important
   case — two equal boxes both showing zero enlargement — will compare as exactly 0 == 0. For two genuinely different enlargements,
  sub-ULP differences in tie-breaking are harmless for the heuristic.

  Concrete changes needed:

  │ insert.hpp                          │ 92      │ Clamp content_diff                 │
  │ rstar/choose_next_node.hpp          │ 120     │ Clamp content_diff                 │
  │ rstar/choose_next_node.hpp          │ 249     │ Clamp content_diff                 │
  │ quadratic/redistribute_elements.hpp │ 77-79   │ Clamp free_content                 │
  │ quadratic/redistribute_elements.hpp │ 294-295 │ Clamp both content_increase values │
  │ linear/redistribute_elements.hpp    │ 435-436 │ Clamp both content_increase values │

I think this makes sense, looks good - and locally it did not break any unit tests (the RST always failed for me - on the fly that is fixed now as well, see other PR)

out_content_increase1 = content_incrase1;
out_content_increase2 = content_incrase2;
out_content_increase1 = content_increase1;
out_content_increase2 = content_increase2;
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also fixed a typo, that's why the diff is a bit larger here

@tinko92
Copy link
Copy Markdown
Collaborator

tinko92 commented Mar 31, 2026

I tried something similar to what the language model outputs here before I wrote #1453 , but instead of clamping, I tested the values of the content calls directly for equality. I went into a different direction because

  1. it added an extra branch and prevented contraction to FMA (because the compiler needed to compute both intermediate results for comparing anyway), so no advantage over turning of contraction for content directly. This clamping is better though because max is branchless and it still allows applying FMA, so 1. is resolved.

  2. It did not fully resolve the issue.

Consider the following test using the instrumented main.cpp and the data.csv from the issue:

#!/usr/bin/env bash

(cd ../../boost/libs/geometry/ && git reset --hard -q && git checkout $1 -q && sed -i '504i\cross_track_calls++;' include/boost/geometry/strategies/geographic/distance_cross_track.hpp && git branch --show-current)
g++ -g -I../../boost/ -DNDEBUG -O3 -march=$2 main.cpp -std=c++20 -o a.out && ./a.out | grep -E "^2:|query all took|cross_track calls:"

(Edit: Line 504 in distance_cross_track.hpp is inside the private apply method.)

Then we get:

bartels@DESKTOP-BPNV3G3:~/dev/bg_1452/test$ sh test.sh develop x86-64-v2 # Baseline without issue (v2)
develop
2: 6763
query all took :   3460.036 ms
cross_track calls: 3303357

bartels@DESKTOP-BPNV3G3:~/dev/bg_1452/test$ sh test.sh develop x86-64-v3 # Issue (v3)
develop
2: 6685
query all took :   7346.178 ms
cross_track calls: 6117432

bartels@DESKTOP-BPNV3G3:~/dev/bg_1452/test$ sh test.sh fix/index-content-fp-contract-off x86-64-v3 #1453 
fix/index-content-fp-contract-off
2: 6763 # Probably the same tree structure
query all took :   3496.460 ms
cross_track calls: 3306476 # Minor difference in distance calls, might be due to difference in distance itself.

bartels@DESKTOP-BPNV3G3:~/dev/bg_1452/test$ sh test.sh fix/issue-1452-index-content x86-64-v3 # This PR
fix/issue-1452-index-content
2: 6697 # Different tree structure from baseline.
query all took :   4120.981 ms
cross_track calls: 4200675 # Still a severe regression in the number of distance calls.

So there must be call sites of content that were not covered fully by the LLM-generated patch. I still like the change in itself, though, because it seems like a sensibly defensive thing to do that causes no branching and prevents potentially nonsensical values. It may be beneficial to merge both PRs (the original PR also e.g. prevents issues with contraction of content at potential future call sites or existing sites unrelated to this bug if there are any).

Copy link
Copy Markdown
Collaborator

@tinko92 tinko92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve merging these changes, the one formatting comment is no blocker for me.

content_type content_increase1 = enlarged_content1 - content1;
content_type content_increase2 = enlarged_content2 - content2;
content_type const content_increase1 = (std::max)(content_type(0), enlarged_content1 - content1);
content_type const content_increase2 = (std::max)(content_type(0), enlarged_content2 - content2);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: These violate our max line length of 100.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@barendgehrels barendgehrels force-pushed the fix/issue-1452-index-content branch from 4aa13e5 to 2f70d78 Compare April 1, 2026 11:32
@tinko92
Copy link
Copy Markdown
Collaborator

tinko92 commented Apr 4, 2026

On further inspection, #1452 is also affected by yet another source of rounding error, relating to antimeridian-crossing boxes in coordinate normalisation for boxes (which also affects x86-64-v2, although in a more minor way). E.g., if we have a box with lower lon 170°, upper long e.g. 190.1° it will first subtract -360°, then add 360°, and change it to something like 190.09999...°, and the content change from expansion becomes negative. I will add a separate PR with a test case to solve that (this is needed in addition to the fixes around content_diff).

The content_diff-related issues can be fully resolved by making your content_diff robust with a filter, that not only clamps negative values but also rounds to 0 for very small positive values for which the actual content diff is 0. This approach makes both #1453 obsolete and is more general than the clamping proposed in this PR (#1455). I will try to write a new PR by end of next week, there are a few options, I want to compare first. Bascially we can have a static threshold which would be around 1.32e-10 for double (may be acceptable?), but for float it would be around 0.071 which may be too coarse. Or we can compute it per content diff call depending on the boxes which would be more accurate and preferable if the performance overhead turns out to be negligible in testing.

EDIT: After a lot of testing, the filtering looked good on ARM but failed to resolve the issue with GCC on x86 (similar results to this PR #1455, roughly ~4M distance calls, rather than ~2.6M). As of now, #1453 is the only solution that works for me across all platforms.

Copy link
Copy Markdown
Collaborator

@tinko92 tinko92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this was questioned #1453 (comment) , I am still in favour of merging this PR. The clamping is still reasonable and the refactoring is useful for potential future formula changes specific to content_diff to make this maybe more robust in the future.

content_type content_increase1 = enlarged_content1 - content1;
content_type content_increase2 = enlarged_content2 - content2;
content_type const content_increase1 = (std::max)(content_type(0), enlarged_content1 - content1);
content_type const content_increase2 = (std::max)(content_type(0), enlarged_content2 - content2);
Copy link
Copy Markdown
Collaborator

@tinko92 tinko92 Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call content_diff here for consistency and in particular to retain consistency if that function is changed for robustness in the future. Recomputing content1 and content2 is not a performance concern here, all major compilers will pull that out of the loop on O2 (tested for GCC, Clang, MSVC, ICX here).

content_type content_incrase1 = (enlarged_content1 - content1);
content_type content_incrase2 = (enlarged_content2 - content2);
content_type const content_increase1
= (std::max)(content_type(0), enlarged_content1 - content1);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think L296 and L297 should call content_diff, as discussed in comment on redistribute_elements.hpp:437.

template <typename Box>
typename default_content_result<Box>::type content_diff(Box const& expanded, Box const& original)
{
using content_type = typename default_content_result<Box>::type;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose an asssertion on original being within expanded here, because that is a precondition for that general clamping being valid and should prevent accidental misuse of this function for general content differences where arg1 is not an expansion of arg2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Geometry Index GCC arch x86-64-v3

2 participants