Rounding floating point numbers and constexpr
source link: https://vorbrodt.blog/2021/04/04/rounding-floating-point-numbers-and-constexpr/
A good friend asked me a while ago how to round a float or double to N digits of precision: given 0.0123456789 and N = 7, how do you get 0.0123457000 as the result? The only way to round numbers (inside a machine) I could think of was to multiply them by 10 ^ N (ten raised to the power of N), cast the result to some type of int, effectively stripping the fractional part, followed by a cast back to a floating point representation and division by the same 10 ^ N:
Or something to that effect. Translated to C++:
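A minimal runtime sketch of that idea, assuming std::round is used for the rounding step (the reference approach mentioned later in this post). The name runtime_round is hypothetical and the author's actual listing may differ:

```cpp
#include <cmath>

// Hypothetical helper: rounds v to n digits after the decimal point by
// scaling up, rounding to the nearest whole number, and scaling back down.
template<typename T>
T runtime_round(T v, unsigned char n)
{
    const T p = std::pow(T(10), static_cast<T>(n)); // 10 ^ N
    return std::round(v * p) / p;
}
```

Under these assumptions, runtime_round(0.0123456789, 7) yields the double closest to 0.0123457.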
Then he added the dreaded "do it fast"… and just as I was about to tell him to take a long walk off a short pier I remembered something about constexpr and doing arithmetic at compile time.
Looking at the implementation above I decided to start with computing the power of 10 at compile time. My first approach was a recursive template struct power_of with a partial specialization, the recursion stop, for the power of zero case (and a helper power_of_v variable declaration):
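A minimal sketch of such a recursive template, assuming C++17 and the parameter order implied by power_of_v<10, 3, float> (base, exponent, result type). The author's actual listing may differ in detail:

```cpp
#include <cstdint>

// Primary template: B ^ E = B * B ^ (E - 1), evaluated by the compiler.
template<std::uint64_t B, unsigned char E, typename T>
struct power_of
{
    static constexpr T value = T(B) * power_of<B, E - 1, T>::value;
};

// Partial specialization that stops the recursion: B ^ 0 = 1.
template<std::uint64_t B, typename T>
struct power_of<B, 0, T>
{
    static constexpr T value = T(1);
};

// Helper variable template so callers can skip the ::value suffix.
template<std::uint64_t B, unsigned char E, typename T>
inline constexpr T power_of_v = power_of<B, E, T>::value;

static_assert(power_of_v<10, 3, float> == 1000.0f, "10 ^ 3 as float");
```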
This allowed me to write power_of_v<10, 3, float> to compute, at compile time, the 3rd power of 10 stored as type float.
I also created a recursive constexpr function, since those too can be evaluated at compile time:
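A sketch of what such a function might look like, assuming the signature implied by the call power_of_f<float>(10, 3). The body is a guess at the obvious recursion, not the author's exact code:

```cpp
#include <cstdint>

// Recursive constexpr power: b ^ e = b * b ^ (e - 1), with b ^ 0 = 1.
// A single return statement keeps it valid even under C++11 constexpr rules.
template<typename T>
constexpr T power_of_f(std::uint64_t b, unsigned char e)
{
    return e == 0
        ? T(1)
        : T(b) * power_of_f<T>(b, static_cast<unsigned char>(e - 1));
}

static_assert(power_of_f<float>(10, 3) == 1000.0f, "10 ^ 3 as float");
```

Unlike the template struct, the same function can also be called with arguments only known at run time, in which case it is simply evaluated at run time.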
With it I could write power_of_f<float>(10, 3), and it would be the equivalent of using the recursive template struct above.
I decided to think big and used std::uint64_t as the base and unsigned char as the exponent type of the computation. Hopefully overflow was not going to be an issue…
Next I moved on to rounding the floating point number to an integer type (after it had been multiplied by 10 ^ N) and quickly realized a simple type cast was not going to cut it. std::round does much more than that: it considers how close the number is to its neighboring whole numbers (ex.: given 0.49, is it closer to 0 or 1?) and rounds to the closest one. Moreover, it considers the sign: halfway cases are rounded away from zero, so 0.5 becomes 1 while -0.5 becomes -1.
This meant I needed to determine (at compile time) whether the number is positive or negative:
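One possible shape for such a check; the name is_negative is hypothetical:

```cpp
// Hypothetical helper: true when v is strictly below zero.
// Negative zero compares equal to zero, so it is reported as non-negative.
template<typename T>
constexpr bool is_negative(T v)
{
    return v < T(0);
}

static_assert(is_negative(-0.5), "-0.5 is negative");
static_assert(!is_negative(0.0), "zero is not negative");
```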
Compute the absolute value of a given number:
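A hand-written version is needed because std::abs only became constexpr in C++23. A minimal sketch, with the hypothetical name abs_value:

```cpp
// Hypothetical helper: compile-time absolute value.
template<typename T>
constexpr T abs_value(T v)
{
    return v < T(0) ? -v : v;
}

static_assert(abs_value(-1.5) == 1.5, "sign is dropped");
static_assert(abs_value(1.5) == 1.5, "positive values pass through");
```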
And round the number based on the proximity to its integer representation while considering the sign:
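A sketch of that step under a few assumptions: C++14 or later, a std::int64_t result, and an input whose magnitude fits into std::int64_t. The name round_to_nearest is hypothetical. It truncates the magnitude, inspects the leftover fraction, and restores the sign at the end, so halfway cases move away from zero just like std::round:

```cpp
#include <cstdint>

// Hypothetical helper: round to the nearest whole number, halfway cases
// away from zero. Precondition (assumed): |v| fits into std::int64_t.
template<typename T>
constexpr std::int64_t round_to_nearest(T v)
{
    const bool negative = v < T(0);
    const T magnitude = negative ? -v : v;
    const std::int64_t whole = static_cast<std::int64_t>(magnitude); // truncates
    const T fraction = magnitude - static_cast<T>(whole);            // in [0, 1)
    const std::int64_t rounded = fraction >= T(0.5) ? whole + 1 : whole;
    return negative ? -rounded : rounded;
}

static_assert(round_to_nearest(0.49) == 0, "0.49 is closer to 0");
static_assert(round_to_nearest(0.5) == 1, "halfway rounds away from zero");
static_assert(round_to_nearest(-0.5) == -1, "also for negative values");
```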
Putting it all together and adding some sanity checks resulted in this floating point, compile time (as long as all parameters are constexpr), rounding function which produced the same exact results as its runtime equivalent:
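The following is only a sketch of how the pieces could be combined, assuming C++14 or later. The name constexpr_round, the specific guards, and the conservative 2 ^ 62 bound are my assumptions for illustration and are not taken from round.hpp:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical sketch: round v to n digits after the decimal point.
// Returns v unchanged whenever the computation could overflow.
template<typename T>
constexpr T constexpr_round(T v, unsigned char n)
{
    // Guard: 10 ^ n itself must be representable in T.
    if (n > std::numeric_limits<T>::max_exponent10) return v;

    T p = T(1);
    for (unsigned char i = 0; i < n; ++i) p *= T(10); // 10 ^ n

    const bool negative = v < T(0);
    const T magnitude = negative ? -v : v;

    // Guard: the scaled value must fit into std::int64_t. The bound 2 ^ 62
    // is deliberately conservative; the negated test also rejects NaN.
    const T limit = static_cast<T>(std::int64_t{1} << 62);
    if (!(magnitude < limit / p)) return v;

    const T scaled = magnitude * p;
    const std::int64_t whole = static_cast<std::int64_t>(scaled); // truncates
    const T fraction = scaled - static_cast<T>(whole);
    const std::int64_t rounded = fraction >= T(0.5) ? whole + 1 : whole;

    const T result = static_cast<T>(rounded) / p;
    return negative ? -result : result;
}

static_assert(constexpr_round(-2.5, 0) == -3.0, "halfway rounds away from zero");
static_assert(constexpr_round(1e300, 2) == 1e300, "too large: returned unchanged");
```

With this sketch, constexpr double r = constexpr_round(0.0123456789, 7); is computed entirely by the compiler, while the same call with runtime arguments falls back to ordinary execution.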
The if statements are there to guard against number overflows. It is better to not round at all in such cases than to return garbage values, or at least this is my attitude regarding error handling in this case.
Here is a test program I created while testing my implementation; it allowed me to easily compare compile time results against the reference std::round based approach.
1: 0.0000000000
2: 0.0100000000
3: 0.0120000000
4: 0.0123000000
5: 0.0123500000
6: 0.0123460000
7: 0.0123457000
8: 0.0123456800
9: 0.0123456790
10: 0.0123456789
Test program output.
The example program on GitHub: round.cpp. The floating point rounding implementation (sprinkled with comments and decorated with template concepts for good measure): round.hpp.
Given the way float and double are represented on the level of bits it is impossible to round them exactly. You can only do so much and come so close to being exact. Playing with the example program’s value and type of variable v as well as different parameters to setprecision stream modifier will illustrate what I mean…
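A small helper makes this easy to see. It is a sketch with a hypothetical name, to_fixed, and it simply formats a double with a chosen number of digits after the decimal point:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format v in fixed notation with the given number
// of digits after the decimal point.
inline std::string to_fixed(double v, int digits)
{
    std::ostringstream out;
    out << std::fixed << std::setprecision(digits) << v;
    return out.str();
}
```

For example, to_fixed(0.0123457, 10) gives "0.0123457000", which looks exact, but asking for more digits exposes the binary approximation: to_fixed(0.1, 20) gives "0.10000000000000000555" rather than a run of zeros.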