converted fractional_part_rounding_thresholds to use an array of u32s#4719
converted fractional_part_rounding_thresholds to use an array of u32s#4719Cazadorro wants to merge 13 commits intofmtlib:mainfrom
Conversation
… u32s instead of a unicode literal character array for compatibliity with CUDA and non CUDA platforms
|
Note that this works with vcpkg as well (tested with overlay port), ran tests locally all appeared to pass (note however I'm on MSVC 2026). I did use the special CUDA compiler arguments for UTF8 in the separate project I used to test VCPKG compatibliity, dealing with that is a whole other can of worms I didn't want to include in this fix, I just wanted to make sure I could use the escape hatch with those compiler arguments as with out this change you can't use a fmt at all no matter what you do with vcpkg and cuda. below are the arguments I used in the separate project |
|
The PR appears clean to me - Please run |
…gnostic_fractional_part_rounding_thresholds
include/fmt/format.h
Outdated
| static constexpr uint32_t utf8_raw_rounding_thresholds[8] = { | ||
| 0x9999999au, 0x828f5c29u, 0x80418938u, 0x80068db9, | ||
| 0x8000a7c6u, 0x800010c7u, 0x800001aeu, 0x8000002bu}; |
There was a problem hiding this comment.
This will be duplicated in every TU which is undesirable. Another workaround is to use a UTF-16 literal and reconstruct a constant from two 16-bit halves.
Also utf8_raw_ prefix is not needed (and incorrect) since these constants have nothing to do with UTF-8 or Unicode in general. We only use Unicode literals to workaround language limitations here.
There was a problem hiding this comment.
Okay, I had to look up rules on C++ on how this works, I didn't realize static constexpr outside of a function basically has the exact opposite effect in terms of linkage that static constexpr inside a constexpr function has. However, I believe if I change this to an inline constexpr variable, it will fix this issue: https://stackoverflow.com/a/57407675 The address will be the same across translation units if I understand this correctly. Inline is implied with constexpr functions, but not with constexpr variables.
There was a problem hiding this comment.
inline won't work because it's C++17.
There was a problem hiding this comment.
For some reason I thought the library required at least c++17, not c++11 I'll see about trying utf16 but if there's any weird platform limitations with that too I won't have the bandwidth to track that down. Also won't there be endian-ness concerns with using utf16 litterals? And at that point why even bother with those if you could just use a char array? Though I guess the previous version must have also had endian-ness issues as well if I read the documentation correctly.
…eclaration of thresholds to inline constexpr, this should mean same address for every translation unit see: https://stackoverflow.com/a/57407675
… on the compiler cannonizing char litteral arrays in order to avoid duplicating the array definition per translation unit
…x variables in fractional_part_rounding_thresholds, and applied clang format to function
include/fmt/format.h
Outdated
|
|
||
| // recombine as uint32, this should eliminate endian issues, as now we are | ||
| // shifting bytes as uint32 which should match platform endian. | ||
| return byte_3 << 24u | byte_2 << 16u | byte_1 << 8u | byte_0; |
There was a problem hiding this comment.
I suggest using 16-bit (u) literals to simplify this.
There was a problem hiding this comment.
I'm really hesitant to do that, won't it actually make things more complicated? I'm still going to have to recombine byte-wise in the final step, and I'm also going to have to extract the bytes of the uint16s in order to do this. There's no way to guarantee that utf16 would match platform endian right? The standard does not appear guarantee utf16 big endian or little endian, nor does it appear to guarantee it matches platform endianness/ endianness of std::uint16_t.
It would look something like:
const uint32_t first_bytes =
static_cast<uint16_t>(u"..."
u"..."[index]);
const uint32_t second_bytes =
static_cast<uint16_t>(u"..."
u"..."[index]);
const uint32_t byte_3 = first_bytes >> 8;
const uint32_t byte_2 = first_bytes & 0xFF;
const uint32_t byte_1 = second_bytes >> 8;
const uint32_t byte_0 = second_bytes & 0xFF;
// recombine as uint32, this should eliminate endian issues, as now we are
// shifting bytes as uint32 which should match platform endian.
return byte_3 << 24u | byte_2 << 16u | byte_1 << 8u | byte_0;
There was a problem hiding this comment.
The endianness doesn't matter, you just need to split the constants into two parts instead of four and recombine as you do now but with fewer operations. You don't nee to extract bytes from the 16-bit constants for the same reason we don't do it now for 32-bit constants.
…ttps://github.com/Cazadorro/fmt into cuda_agnostic_fractional_part_rounding_thresholds
vitaut
left a comment
There was a problem hiding this comment.
Mostly looks good but the body of the function must be a single expression for C++11 compatibility.
…tement for c++11 compatibility
I'm confused, what isn't compatible with C++11 there? Or is it like an optimization that's not guaranteed to happen or something in c++11 unless you make it a single line statement? |
converted fractional_part_rounding_thresholds to use an array of u32s instead of a unicode literal character array for compatibility with CUDA and non CUDA platforms, based on discussions in #4167