Tell HN: The 10-bit timers are about to overflow on September 17th

source link: https://news.ycombinator.com/item?id=32700184

244 points by modinfo 6 hours ago | 45 comments
Due to the overflow of a 10-bit week counter, some devices will "go back in time" by 1024 weeks (almost 20 years). This will occur on the night of September 17-18. The problem is confirmed to affect Microsemi's (formerly Symmetricom) SyncServer (100, 200, and 300 series), TimeProvider 1000, parts of TimeProvider 5000, and TimeSource TS3x00 (and a few others), which are popular in industrial networks.

Loss of historical data and event logs, logging and security problems, loss of process visualization: these are some of the surprises that can happen when, having received the wrong date, other devices also decide to "time travel."

What to do?

If your network is running one of the aforementioned devices, it's best to disconnect it in advance. Unfortunately, most of them are no longer supported by the manufacturer and no patches are expected. So it looks like they will become quite useless after September 17.

This leaves very little time to replace them with new solutions. If you are not sure whether the problem affects your device, you can unplug it on September 17, and if it shows the correct date the next day, plug it back in. Of course, such a maneuver is possible only in networks that can operate without GPS time synchronization for several hours.

This is not the GPS week counter rollover https://en.m.wikipedia.org/wiki/GPS_week_number_rollover

But instead it is a device-specific rollover. It’s relatively common for GPS receivers to handle the limited week counter by having a baked-in epoch that is more recent than the latest rollover, which gives them a nearly 20-year lifetime after that epoch instead of failing at the GPS rollover.

If I'm following, the thought is: It's currently Sept 17th 2002, our counter will roll over in 2019 (17 years). But since we know 1999-2001 has already passed, we can hard-code logic to add 20 years to any counter that's reporting that range. That updated logic rolls over on Sept 17th 2022, which buys us three extra years. This works for devices that will EOL within 20 years.

I think devices that need a longer life should keep track of the last seen timestamp, then they can detect a rollover. The device would keep a local rollover counter and just add 20 years per count. This should stay correct until that counter rolls over, or if the device is offline for >20 years.
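
Roughly, that bookkeeping might look like the following (a minimal sketch assuming a persisted era counter and last-seen week; not taken from any real device's firmware):

    /* Minimal sketch: extend a 10-bit GPS week number using a locally stored
     * rollover ("era") counter. This fails if the device is powered off for
     * more than one 1024-week era, or if it ever accepts a bogus week number. */
    #include <stdint.h>

    static uint16_t last_week10;  /* last 10-bit week seen, kept in NV storage */
    static uint32_t era;          /* number of rollovers observed so far       */

    uint32_t absolute_week(uint16_t week10)
    {
        if (week10 < last_week10)     /* 1023 -> 0 wrap detected */
            era++;
        last_week10 = week10;
        return era * 1024u + week10;
    }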

> I think devices that need a longer life should keep track of the last seen timestamp, then they can detect a rollover. The device would keep a local rollover counter and just add 20 years per count. This should stay correct until that counter rolls over, or if the device is offline for >20 years.

Better in some ways, worse in others.

Can still result in unexpected behavior if configuration memory is lost.

And if it ever accepts an invalid future time for whatever reason, it gets "stuck" in that failure state.

The epoch is the date & time that the minimum value of that field refers to. For Unix-derived or -inspired machines it’s typically the Unix epoch of 00:00:00 on January 1, 1970. This means that a timestamp of “0” refers to that date and time, and the timestamp increases at whatever granularity the value maps to. If you bake the epoch into the device at the factory as being the date of manufacture, and your timestamp field can represent 20 years' worth of time, then you get a 20-year life from the manufacture date before the timestamp rolls over to 0 again.

If you’re going to use more space to keep a rollover counter, you may as well just add more bits to your timestamp field instead, since every bit doubles the amount of time that field can represent.
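
For illustration, the baked-in-epoch trick amounts to something like this (a hypothetical sketch; pivot_week stands in for whatever full week number gets baked in at the factory):

    /* Hypothetical pivot-based disambiguation: interpret the 10-bit week as
     * falling in the 1024-week window that starts at the baked-in pivot week,
     * which gives roughly 19.6 years of correct dates from that point. */
    #include <stdint.h>

    uint32_t resolve_week(uint16_t week10, uint32_t pivot_week)
    {
        return pivot_week + ((week10 - (pivot_week & 0x3FFu)) & 0x3FFu);
    }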

> If you’re going to use more space to keep a rollover counter, you may as well just add more bits to your timestamp field instead, since every bit doubles the amount of time that field can represent.

The GPS week counter is part of the signal GPS satellites send and is not easily changed by the manufacturer of receivers.

Of course, devices that accept CNAV now have a few extra bits to work with.

Counters roll over, this is not new.

What matters is how rollover is dealt with.

For interest, we (mostly) all use GPS .. and the 'onboard' broadcast 10 bit GPS week counter has already rolled over twice:

* midnight (UTC) August 21 to 22, 1999

* midnight (UTC) April 6 to 7, 2019

The GPS "seconds since midnight last Sunday" timer resets to zero every week.

[1] https://en.wikipedia.org/wiki/GPS_week_number_rollover

[2] Satellite Geodesy Günter Seeber https://www.geokniga.org/bookfiles/geokniga-seeber-g-satelli...

Note that the GPS week number is now 13 bits long since the introduction of the CNAV navigation message format; devices that only accept the original NAV format are now pretty rare.

Sure, the seconds are still elapsed seconds since the (weekly) epoch . . .

(Not to mention the post corrections broadcast upwards from the ground to adjust for relativistic time drag between Earth surface nominal 1G and sat orbit height gravity)

Point being, many timing applications use elapsed-time-since-X counters and are part of systems designed to handle rollover.

The mindset of just adding more bits to a counter or recording more decimal points isn't always appropriate or the 'best' fix.

> to adjust for relativistic time drag between Earth surface nominal 1G and sat orbit height gravity

That's just the General Relativity correction. There's also a Special Relativity correction to account for time dilation caused by the satellite's speed, which works in the opposite direction to the GR correction yet the two don't exactly cancel each other. Plus there are some others (Doppler shift, Sagnac effect, etc.)

https://www.aapt.org/doorway/TGRU/articles/Ashbyarticle.pdf

I'm confused... were 10-bit timers being used in recent history? Also, they're using weeks as their atomic unit?

This seems like something that never should have existed.

The GPS system uses 10 bits for the week number part of the broadcast timestamp. This causes a rollover [1] every 19.6 years, and devices that aren't designed to anticipate it will report the current date/time as being two decades (or multiples of that) in the past.

I guess the ultimate cause is data framing in the transmission protocol. The timestamp contains the number of weeks since 00:00:00 1980-01-06, together with the number of seconds since the start of the current week. The number of seconds in a week won't fit into 16 bits, so my supposition is that the designers had to also use some of the bits that could otherwise have been used for a wider week counter.

I'd like to think that nowadays we'd use self-describing, upgradeable protocols. But GPS was designed in the 70s for the constrained technology of that era. And I'm pretty sure nobody anticipated how widely deployed it would be 45+ years later.

[1] https://en.m.wikipedia.org/wiki/GPS_week_number_rollover
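
For concreteness, the decoding arithmetic looks roughly like this (a hypothetical helper; leap seconds and the exact broadcast bit layout are glossed over):

    /* Rough sketch: convert a disambiguated GPS (week, seconds-of-week) pair
     * into a Unix timestamp. Note that 604800 seconds per week needs 20 bits,
     * not 16, and GPS time is currently 18 leap seconds ahead of UTC (ignored
     * here for simplicity). */
    #include <stdint.h>
    #include <time.h>

    #define GPS_EPOCH_UNIX   315964800  /* 1980-01-06 00:00:00 UTC */
    #define SECONDS_PER_WEEK 604800     /* 7 * 24 * 3600           */

    time_t gps_to_unix(uint32_t full_week, uint32_t seconds_of_week)
    {
        return (time_t)GPS_EPOCH_UNIX
             + (time_t)full_week * SECONDS_PER_WEEK
             + (time_t)seconds_of_week;
    }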

You've missed two key points:

* GPS depends upon accurate timing and positions.

* GPS broadcasts a data packet.

The length of the packet is limited by the broadcast frequency and the time to update .. so the packet was kept tight.

The satellites drift (orbital decay, magnetic torque, solar wind, random particle pressure) and live in a different gravity (aka relativistic time) .. their 'self' data (position | time) is regularly updated from ground stations and their internal epochs only need to operate for some low N number of expected update cycles .. if they lack an accurate sense of 'self' they lack function as waypoints.

Adding clock data widths that count nanoseconds for the lifetime of the sun would be pointless: it makes the data packet longer and it doesn't change the hard requirement for regular time|position updates from the ground.

They were designed with short epoch counters with an understanding of functional constraints.

Still surprised, every time I see it mentioned, to have a cultural construct like a week in the data model. Likely a deliberate break from SI to avoid ambiguity between monotonically increasing counters and calendar time, but surprising nonetheless.

> I'd like to think that nowadays we'd use self-describing, upgradeable protocols.

That would open a whole new can of security worms though. Being able to modify a protocol in-band is something we're starting to move away from. Things are becoming more static as a precaution, like stored procedures in SQL so an attacker can't inject a change.

> And I'm pretty sure nobody anticipated how widely deployed it would be 45+ years later.

Or maybe they thought by now we would have replaced it with something better. Spacefaring was still riding high in the '70s; the brakes were only applied after '89 (sadly one of the many indirect effects of glasnost, RIP Gorbachev).

> This seems like something that never should have existed

The engineers who designed and implemented the GPS system many decades ago were pioneering a fundamental technology, and worked within the constraints at the time to balance efficiency, cost, and complexity deliberately and as best they could.

Now we get to drive around in our luxurious cars, decked out with supercomputers listening to satellite communications, charting our course for us, all so we don't have to read our own maps.

Are there any references for this? What is the "epoch" here?

1024 weeks before Sept 17 is February 1, 2003. What is significant about that date?

References welcome.

I'm flying on 17th September. Looking forward to it now.
Same. I hope the Atlantic is warm this time of year.
People designed new devices with just a 10-bit time counter after Y2K?!
It's something that's inherent in the GPS standard, which was developed considerably before Y2K, and it's something that's very very easy to correct for.

Are you surprised that people are still designing hardware where the time rolls over to 00:00 every 24 hours?

The signal is 10-bit, but it seems silly to make the receiving device 10-bit and suffer this problem when correcting for it is easy. OP says it wasn't always corrected for, even by people with the lesson of Y2K behind them, which is what surprises me.
This is maybe a very dumb question but was it so expensive to add several bits (and extend its lifespan exponentially) to what is basically a counter?

I remember how we once ran into trouble with a large timestamp counter in an FPGA implementation. (Was it just 64 bits, or 112 bits? Probably the full PTP timestamp, including fractional nanoseconds for precise drift corrections.)

The extra bits of storage are cheap. The problem is the addition circuitry. With a small counter, you can do addition in a single clock cycle, very easy to implement. With a large counter, addition of the highest bit has to wait for completion of the previous one, etc. so if this takes longer than one clock cycle you have to implement a different, pipelined addition algorithm. (Or run everything at lower clock frequency.)
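
A toy cycle-by-cycle model of the pipelined idea (illustrative C, not HDL; a real design would also need logic to present a consistent snapshot to readers):

    /* Split a 64-bit counter into two 32-bit halves: the low half increments
     * every cycle, and its carry-out is applied to the high half one cycle
     * later, so no single cycle needs a full 64-bit carry chain. The composite
     * value lags by one cycle right after the low half wraps. */
    #include <stdint.h>

    struct split_counter {
        uint32_t lo, hi;
        uint32_t carry_pending;   /* carry produced last cycle */
    };

    void tick(struct split_counter *c)
    {
        c->hi += c->carry_pending;                 /* apply last cycle's carry  */
        c->carry_pending = (c->lo == UINT32_MAX);  /* low half wraps this cycle */
        c->lo += 1;
    }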

It can be quite surprising how what might seem like minor differences to the programmer can require major changes to the hardware.

I saw an MITx class on edX called "Computation Structures" a few years ago and took it for fun. In the second part of it, students design a 32-bit RISC processor at the logic-gate level, except that we could assume black boxes for the register file and memory.

I considered trying to actually build mine using good old classic 74xx series logic. Mine would have needed 140 quad MUX2 chips, 75 quad AND2 chips, 81 dual MUX4 chips, 55 quad XOR2 chips, and 16 dual D flip-flop chips, plus a handful of others.

It was around 370 chips in total.

My design included a shift unit that can do left or right shifts, arithmetic or logical, of 1 to 31 bits in 1 clock cycle.

If I replaced that with a shift unit that could only do 1 bit logical right shift, and changed the instruction set to have a new instruction for that, made the old shift instructions trap, and then emulated them in the trap handler, a whopping 88 of those 140 quad MUX2 chips would no longer be needed.

That would bring it down to around 280 chips. The fancy shifter was almost a quarter of the parts count!

Naive question. Do processors ever have sub-elements that run at a higher clock? I can imagine trying to hack this sort of thing by putting some sort of subprocessor structure that does addition for a particular set of registers at twice normal speed (double-length registers?? I'm clearly spitballing). I guess it can't because of memory bandwidth constraints?

> Do processors ever have sub-elements that run at a higher clock

Yes, this is called a "clock domain"; there may be quite a lot of them, and they can often be powered off individually.

> I can imagine trying to hack this sort of thing by putting some sort of subprocessor structure that does addition for a particular set of registers at twice normal speed

It's the other way round: a particular arrangement of logic elements, at a particular size and on a particular wafer process, at a particular temperature, will have a worst-case timing. That timing determines how fast you can possibly clock that part of the circuit.

Adders are annoying because of carry: you can't determine the top bit until you've determined the effect of carry on all the other bits. So if it takes, say, 250ps to propagate through your 32-bit adder, you can clock that at 4GHz. If you widen it to 64 bits that takes 500ps, and now you can only clock that bit at 2GHz.

You may know this, but the person you responded to almost certainly doesn't based on their question, so:

Carry look-ahead adders are a thing. The number of logic levels for computing the highest carry bit is logarithmic in the width of the numbers being added, not linear. Doubling the width of the numbers does not cut your clock rate in half, though you do have to pay for the faster cycle time in added area (more logic gates). There are all sorts of different trade-offs that are possible in the constant terms, but the standard adder designs have linear area in the number of bits, and logarithmic propagation time from inputs to outputs.

Normally you'll do the opposite: you'll have sub-parts which run at a lower clock rate. The 'core clock' is generally the fastest clock in the system, at least outside of specific high-speed transceivers. The most common approach is to pipeline the operation, which increases latency of the operation but still gives you the same throughput so long as the output does not need to feed back into the input within the operation time.
> With a large counter, addition of the highest bit has to wait for completion of the previous one, etc. so if this takes longer than one clock cycle you have to implement a different, pipelined addition algorithm. (Or run everything at lower clock frequency.)

Kogge-Stone carry lookahead.

You can calculate the upper bits in parallel using Kogge-Stone (which is the predecessor to what GPU programmers call "prefix-sum" parallelism).
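
A small software model of that prefix-carry idea (illustrative only; hardware evaluates the five shift/combine steps below as parallel gate levels rather than a loop):

    /* Kogge-Stone style parallel-prefix carry computation for a 32-bit add.
     * Carry depth grows with log2(width) instead of linearly, which is why
     * widening an adder does not have to halve the clock rate. */
    #include <stdint.h>

    uint32_t prefix_add32(uint32_t a, uint32_t b)
    {
        uint32_t g = a & b;   /* bit i generates a carry            */
        uint32_t p = a ^ b;   /* bit i propagates an incoming carry */

        for (int d = 1; d < 32; d <<= 1) {   /* 5 steps: d = 1, 2, 4, 8, 16 */
            g |= p & (g << d);
            p &= p << d;
        }
        return (a ^ b) ^ (g << 1);   /* sum bits = propagate XOR carry-in */
    }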

> was it so expensive to add several bits

If this is the GPS week number, there is very limited space for it in the GPS navigation message; every extra bit used for the counter means one less bit for everything else. From what I've read, the navigation message is transmitted at only 50 bits per second, and the time including the week number repeats every 6 seconds, so they had only 300 bits to play with. Given that a 10-bit week number already allows for nearly 20 years, a receiver only needs to know which decade it is today for that value not to be ambiguous, and that should be easy for a receiver with a few bits of local storage unless it's been powered off for more than a decade.

It's probably less about adding a few bits to the counter, and more about replacing the existing counters that are already deployed everywhere in the world. Industrial applications usually go by "if it ain't broke, don't fix it" and stick with decades-old machines as long as they still get the job done.

The question is whether it would have been that expensive to make it more than 10 bits to begin with.

We as a human race have been terrible at anticipating how fast numbers grow. A similar story: with databases, we used to think that 32-bit primary keys were plenty big enough to store all the numbers we'll ever need.

There's a difference between not anticipating that the use of a product will change in scale to the point that database PKs might need to be longer (growth which, at an uneducated guess, has been closer to exponential than linear), and failing to anticipate "this timer will definitely hit a limit in 2022, unless the physics of time as we know it somehow changes before then".

Sure, it's still a relatively easy mistake to make: you might wrongly assume that everything you're creating will be in a landfill by 2022 (if you were making it long enough ago that all the devices would plausibly be dead by now), or unethically plan for that intentionally so customers have to repurchase products this year (seems pretty unlikely, but perhaps), or just forget to think about it, or think about it, decide it's fine for now and can be checked later, and then forget.

But "We as a human race have been terrible at anticipating how fast numbers grow" - we're talking about a number that's growing exactly in sync with time, and with an entirely simple way of checking when the limit will be hit, so I think it's just sloppy/lazy or expecting less longevity of use for their product, more than a flaw of humanity.

100% agreed. When it comes to any kind of incrementing counter, IMO the limit should always be multiple orders of magnitude greater than the expected lifespan of the system.

Taking OP at face value, these devices' counter has a zero point of February 1, 2003, so even if we don't know exactly what happens when they roll over, we do know they must have been designed after that point.

In the world of the 2000s even 16-bit processing is old school, so there's no good reason this couldn't have been a 16-bit counter instead. If that had been the case, the rollover wouldn't be until the year 3259.

IMO this is the correct way to handle situations where you think you don't need any more bits: round up from your reasonable limit to the next major "bit border". If you have a 10-bit counter, round up to 16. If you need 16, then make it 24 or 32. The limit shouldn't just be "above where we expect it to reach in its lifespan" but "if you hit this, the device belongs in a museum".

Designing a device with a 10-bit week counter in the 2000s is bad design, prioritizing either laziness or cost over quality.
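
For what it's worth, a quick back-of-envelope check of those rollover dates (illustrative, assuming the February 2003 zero point mentioned above):

    /* Approximate rollover year for an N-bit week counter zeroed in early 2003. */
    #include <stdio.h>

    int main(void)
    {
        const int widths[] = { 10, 16, 24, 32 };
        for (int i = 0; i < 4; i++) {
            double weeks = (double)(1ull << widths[i]);
            double years = weeks * 7.0 / 365.25;
            printf("%2d-bit counter: ~%.0f years, rolls over around %d\n",
                   widths[i], years, 2003 + (int)years);
        }
        return 0;
    }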

> We as a human race have been terrible at anticipating how fast numbers grow.

But in this case it was easy ... it's a timer!

Numbers grow unpredictably, it's true. People in the 1970s didn't anticipate that we'd get to the time Sep 2022. :) How time flies.

> People in the 1970s didn't anticipate that we'd get to the time Sep 2022.

No, they just didn't anticipate that a GPS receiver would not know which decade it is (knowing the decade is enough to disambiguate a 10-bit week number) even though it knew the correct date a few moments ago. That is, they didn't anticipate counter rollover bugs caused by hard-coding of the starting point of the counter, instead of calculating it based on the last seen date.

No. But which way did incentives align?
I'm wondering how comprehensive the research is that says it's just those 3 vendors and ~7 devices. Given that it's more than one vendor, it feels like a pattern that's a common mistake or design compromise. I wouldn't be surprised if the impact is broader than expected.
What kind of industrial applications? Are we talking about manufacturing or grid/infrastructure?

Basically anything that uses and depends on time synchronization, which is a huge list, from railways to game servers. But again, do they all use these devices, or 10-bit counters more generally? I have no idea.