fix: clamp content differences to prevent FMA-induced numerical instability#1455
fix: clamp content differences to prevent FMA-induced numerical instability#1455barendgehrels wants to merge 1 commit intoboostorg:developfrom
Conversation
| out_content_increase1 = content_incrase1; | ||
| out_content_increase2 = content_incrase2; | ||
| out_content_increase1 = content_increase1; | ||
| out_content_increase2 = content_increase2; |
There was a problem hiding this comment.
I also fixed a typo, that's why the diff is a bit larger here
|
I tried something similar to what the language model outputs here before I wrote #1453 , but instead of clamping, I tested the values of the content calls directly for equality. I went into a different direction because
Consider the following test using the instrumented main.cpp and the data.csv from the issue: #!/usr/bin/env bash
(cd ../../boost/libs/geometry/ && git reset --hard -q && git checkout $1 -q && sed -i '504i\cross_track_calls++;' include/boost/geometry/strategies/geographic/distance_cross_track.hpp && git branch --show-current)
g++ -g -I../../boost/ -DNDEBUG -O3 -march=$2 main.cpp -std=c++20 -o a.out && ./a.out | grep -E "^2:|query all took|cross_track calls:"(Edit: Line 504 in distance_cross_track.hpp is inside the private apply method.) Then we get: bartels@DESKTOP-BPNV3G3:~/dev/bg_1452/test$ sh test.sh develop x86-64-v2 # Baseline without issue (v2)
develop
2: 6763
query all took : 3460.036 ms
cross_track calls: 3303357
bartels@DESKTOP-BPNV3G3:~/dev/bg_1452/test$ sh test.sh develop x86-64-v3 # Issue (v3)
develop
2: 6685
query all took : 7346.178 ms
cross_track calls: 6117432
bartels@DESKTOP-BPNV3G3:~/dev/bg_1452/test$ sh test.sh fix/index-content-fp-contract-off x86-64-v3 #1453
fix/index-content-fp-contract-off
2: 6763 # Probably the same tree structure
query all took : 3496.460 ms
cross_track calls: 3306476 # Minor difference in distance calls, might be due to difference in distance itself.
bartels@DESKTOP-BPNV3G3:~/dev/bg_1452/test$ sh test.sh fix/issue-1452-index-content x86-64-v3 # This PR
fix/issue-1452-index-content
2: 6697 # Different tree structure from baseline.
query all took : 4120.981 ms
cross_track calls: 4200675 # Still a severe regression in the number of distance calls.So there must be call sites of content that were not covered fully by the LLM-generated patch. I still like the change in itself, though, because it seems like a sensibly defensive thing to do that causes no branching and prevents potentially nonsensical values. It may be beneficial to merge both PRs (the original PR also e.g. prevents issues with contraction of content at potential future call sites or existing sites unrelated to this bug if there are any). |
| content_type content_increase1 = enlarged_content1 - content1; | ||
| content_type content_increase2 = enlarged_content2 - content2; | ||
| content_type const content_increase1 = (std::max)(content_type(0), enlarged_content1 - content1); | ||
| content_type const content_increase2 = (std::max)(content_type(0), enlarged_content2 - content2); |
There was a problem hiding this comment.
Minor: These violate our max line length of 100.
…l instability Might fix boostorg#1452
4aa13e5 to
2f70d78
Compare
|
On further inspection, #1452 is also affected by yet another source of rounding error, relating to antimeridian-crossing boxes in coordinate normalisation for boxes (which also affects x86-64-v2, although in a more minor way). E.g., if we have a box with lower lon 170°, upper long e.g. 190.1° it will first subtract -360°, then add 360°, and change it to something like 190.09999...°, and the content change from expansion becomes negative. I will add a separate PR with a test case to solve that (this is needed in addition to the fixes around content_diff).
EDIT: After a lot of testing, the filtering looked good on ARM but failed to resolve the issue with GCC on x86 (similar results to this PR #1455, roughly ~4M distance calls, rather than ~2.6M). As of now, #1453 is the only solution that works for me across all platforms. |
tinko92
left a comment
There was a problem hiding this comment.
Because this was questioned #1453 (comment) , I am still in favour of merging this PR. The clamping is still reasonable and the refactoring is useful for potential future formula changes specific to content_diff to make this maybe more robust in the future.
| content_type content_increase1 = enlarged_content1 - content1; | ||
| content_type content_increase2 = enlarged_content2 - content2; | ||
| content_type const content_increase1 = (std::max)(content_type(0), enlarged_content1 - content1); | ||
| content_type const content_increase2 = (std::max)(content_type(0), enlarged_content2 - content2); |
There was a problem hiding this comment.
Let's call content_diff here for consistency and in particular to retain consistency if that function is changed for robustness in the future. Recomputing content1 and content2 is not a performance concern here, all major compilers will pull that out of the loop on O2 (tested for GCC, Clang, MSVC, ICX here).
| content_type content_incrase1 = (enlarged_content1 - content1); | ||
| content_type content_incrase2 = (enlarged_content2 - content2); | ||
| content_type const content_increase1 | ||
| = (std::max)(content_type(0), enlarged_content1 - content1); |
There was a problem hiding this comment.
I think L296 and L297 should call content_diff, as discussed in comment on redistribute_elements.hpp:437.
| template <typename Box> | ||
| typename default_content_result<Box>::type content_diff(Box const& expanded, Box const& original) | ||
| { | ||
| using content_type = typename default_content_result<Box>::type; |
There was a problem hiding this comment.
I propose an asssertion on original being within expanded here, because that is a precondition for that general clamping being valid and should prevent accidental misuse of this function for general content differences where arg1 is not an expansion of arg2.
Might fix #1452
@tinko92 could you verify if this helps the problem?
The fix is by Claude Code (reviewed/directed by me), it argues:
I think this makes sense, looks good - and locally it did not break any unit tests (the RST always failed for me - on the fly that is fixed now as well, see other PR)