2

Improve Duration::try_from_secs_f32/64 accuracy by directly processing exponent...

 2 years ago
source link: https://github.com/rust-lang/rust/pull/90247
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Conversation

Copy link

Contributor

newpavlov commented on Oct 25, 2021

edited

Closes: #90225

The methods now implement direct processing of exponent and mantissa, which should result in the best possible conversion accuracy (modulo truncation, i.e. float value of 19.995 ns will be represented as 19 ns).

Copy link

Collaborator

rust-highfive commented on Oct 25, 2021

r? @joshtriplett

(rust-highfive has picked a reviewer for you, use r? to override)

This comment has been hidden.

This comment has been hidden.

This comment has been hidden.

This comment has been hidden.

This comment has been hidden.

@@ -777,7 +799,7 @@ impl Duration {

/// use std::time::Duration;

///

/// let dur = Duration::from_secs_f32(2.7);

/// assert_eq!(dur, Duration::new(2, 700_000_000));

/// assert_eq!(dur, Duration::new(2, 700_000_047));

Out of curiosity: are the extra 47ns because 2.7 does not have an exact f32 representation? The example now looks a bit unfortunate.

Copy link

Contributor

Author

@newpavlov newpavlov on Oct 25, 2021

Yes, the closest f32 to 2.7 is 2.700000047683715... I think it's worth to have a clear demonstration of floats being "lossy", especially 32 bit ones.

pitdicker and colepoirier reacted with thumbs up emoji

Copy link

Contributor

Urgau commented on Oct 25, 2021

edited

Note for the author and reviewers that this proposed code, generate worse machine code; not by much but not negliable either.

EDIT: Although the code is theoretically slightly worse, in practice it's way better due to the removal of libm calls. See #90247 (comment) for benchmarks.

Copy link

Contributor

hkratz commented on Oct 25, 2021

edited

Note for the author and reviewers that this proposed code, generate worse machine code; not by much but not negliable either.

Too bad we can't use f32::trunc/f64::trunc: Godbolt. Huh, the total cycles are higher than the hand-rolled trunc?

Copy link

Contributor

Author

newpavlov commented on Oct 25, 2021

@Urgau
If I understand your link correctly, then the new version took less cycles to complete (1723 vs 3993). It's possible to improve performance by using u64 instead of u128 for the nanos part, but it degrades conversion accuracy a bit.

Copy link

Member

nagisa commented on Oct 25, 2021

Given that the original version involves an non-strength-reduced u128 division (cf. call to __udivti3), its going to be very hard to make the new implementation worse. The benchmarks should reflect a significant speed increase in practice (though you may want to carefully pick an input which triggers the worst cases for either implementation).

Copy link

Contributor

Urgau commented on Oct 25, 2021

@Urgau If I understand your link correctly, then the new version took less cycles to complete (1723 vs 3993). It's possible to improve performance by using u64 instead of u128 for the nanos part, but it degrades conversion accuracy a bit.

Yeah, you're right I've just see that the number of instructions was worse and because there was more branches I assume the code will be worse, but because the "Total Cycles" is way better I expect better time.

Given that the original version involves an non-strength-reduced u128 division (cf. call to __udivti3), its going to be very hard to make the new implementation worse. The benchmarks should reflect a significant speed increase in practice (though you may want to carefully pick an input which triggers the worst cases for either implementation).

True. I did see them but didn't realize.

Copy link

Contributor

Urgau commented on Oct 25, 2021

I've done some benchmark with the 3 implementations (before with libm, new and new with reordered match arms).

Here are the results:

Impl\Number -500 0.0 1.0 3.14e5 30.0 8000.0 85623.8745 99995412.2554122554

libm 26 154 231 293 272 254 361 452

frac 16 24 46 46 46 46 46 46

frac_reordered 17 30 46 46 46 46 46 46

Lower is better | Unit: ns/iter | Realized on an armv7 with hard-float

As @nagisa predicted, we can see a clear speedup with the new implementation. @newpavlov I would suggest you to revert your last commit ("reorder match arms") as it seems that it is slightly worse in two benchmark than the version without.

This comment has been hidden.

@@ -708,14 +708,28 @@ impl Duration {

/// as `f64`.

///

/// # Panics

/// This constructor will panic if `secs` is not finite, negative or overflows `Duration`.

/// This constructor will panic if `secs` is negative, overflows `Duration` or not finite.

Note that this change reflects the fact that negativity is now checked before anything else, meaning that -inf and -NaN are now a Negative error instead of a NonFinite error.

(the wording does imply that +inf should instead be an Overflow error instead of a NonFinite one, only leaving NaN to be a NonFinite error)

Copy link

Contributor

Author

@newpavlov newpavlov on Oct 27, 2021

edited

The error kinds are not exposed publicly and only used for generating error descriptions, so I think the order of checks is fairly unimportant. The main reason why I have changed the wording was to fix the ambiguity: the previous variant can be read as both "is not (finite, negative or overflows)" and "is (not finite), negative or overflows".

This comment has been hidden.

This comment has been hidden.

newpavlov

changed the title Process fractional part separately in Duration::try_from_secs_f32/64

Improve Duration::try_from_secs_f32/64 precision by directly processing exponent and mantissa

on Nov 20, 2021

Hi. Sorry for asking here directly but I am not quite sure about rust development workflow... is it possible to tell when this will be merged e.g. in nightly?

Copy link

Contributor

mati865 commented on Nov 22, 2021

Once the review is finished and it gets an approval.

newpavlov

changed the title Improve Duration::try_from_secs_f32/64 precision by directly processing exponent and mantissa

Improve Duration::try_from_secs_f32/64 accuracy by directly processing exponent and mantissa

on Dec 22, 2021

Copy link

Contributor

Author

newpavlov commented on Jan 22

r? @nagisa

This comment has been hidden.

Copy link

Contributor

Author

newpavlov commented 26 days ago

@nagisa
I have updated the code and now it uses the try_from_secs macro as you have proposed. Instead of match I've decided to use if else branches since it makes code more compact, removes duplication of range edges, and makes the exclusive_range_pattern unstable feature unnecessary.

Copy link

Member

nagisa commented 26 days ago

edited

Seems reasonable enough to me. Can you please squash away the fix commits and merges? I'll take another chance to read through the algorithm this evening and r+ then.

(0u64, 0u32)

} else if exp < 0 {

// the input is less than 1 second

let t = <$double_ty>::from(mant) << ($offset + exp);

On the choice of the offset here, would I be right in saying that the $offset could be as low as 30 (and shared between the implementations) without affecting the result of this computation in any way?

Just to clarify: I'm not suggesting to change anything here, and just trying to confirm my understanding. I think I understand how the current offset values have been arrived at:

  • for f32 – offset = $double_ty::BITS - $mant_bits;
  • for f64 – offset = $double_ty::BITS - type_of(NANOS_PER_SEC)::BITS - $mant_bits)

effectively making the current offset values largest possible such that no intermediate operations could overflow the types, correct?

Copy link

Contributor

Author

@newpavlov newpavlov 26 days ago

Offsets are specifically chosen so the final shifts will be equal to 64 and 96 bits respectively, allowing compiler to perform additional optimizations.

Copy link

Member

nagisa commented 26 days ago

@bors r+

Thanks for seeing this through @newpavlov!

Copy link

Contributor

bors commented 26 days ago

pushpin Commit e0bcf77 has been approved by nagisa

Copy link

Contributor

tmandry commented 20 days ago

@rustbot label: relnotes

See #93535

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK