
Debian/Raspbian rngd with -S0 will bite you after a week

source link: https://rachelbythebay.com/w/2022/04/20/rngd/

I've been using my Raspberry Pis for some data collection purposes, and noticed some of them doing a few things which seemed suboptimal. There's this process called "rngd" which really likes to wake up every so often and announce stuff about what it's been up to. It's usually about every hour, and it tends to push a dozen lines to the syslog, none of which I care about.

So, on one of them, in the interest of not polluting the logs, and not burning the SD card with tons of writes that add zero value to my life, I decided to turn it off with "-S0". Per the manual:

Controls the interval in which statistics are output to syslog. Normally every hour, the maximum being one week (604800 seconds). Set to 0 to disable automatic statistics logging, e.g. to avoid hard disc spin-up in embedded systems (sending a SIGUSR1 signal will still cause statistics to be logged).
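Where that flag lives depends on packaging; on Debian-family systems it typically goes in the daemon's defaults file, something like this (file name and variable are from the classic rng-tools packaging, so adjust for your setup):

        # /etc/default/rng-tools
        RNGDOPTIONS="-S0"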

I did that back in February, and indeed, it stopped logging. I was pretty happy. I left the other ones alone for the time being.

Well, a few hours ago, I was poking around at these things, and noticed that rngd on that one was running about 15% CPU while it wasn't doing anything on the other ones. This kind of thing irked me, so I attached to it with strace, and this was my reward:

[pid 422] 23:43:16.512699 clock_nanosleep_time64(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, 0x7ee28c68) = 0 <0.000103>
[pid 422] 23:43:16.512969 clock_nanosleep_time64(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, 0x7ee28c68) = 0 <0.000101>
[ ... over and over and over like this ... ]

In short, WTF, over. Why are you spinning over and over and over calling sleep... with no delay? Check that out: tv_sec is the number of seconds to sleep. tv_nsec is the number of nanoseconds to sleep. Calling it with a struct where both are set to 0 will make it return right away, unsurprisingly.
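If you want to watch this happen without hunting down a buggy daemon, here's a minimal sketch of my own (not rngd's code) that does the same thing; run it under strace and you'll see the same endless zero-length sleeps:

        /* zero_sleep.c - nanosleep() with a zeroed timespec returns
         * immediately, so looping on it just burns CPU.
         * Build: cc -o zero_sleep zero_sleep.c */
        #include <time.h>

        int main(void)
        {
                struct timespec ts = { .tv_sec = 0, .tv_nsec = 0 };

                for (;;)
                        nanosleep(&ts, NULL);  /* returns right away, every time */
        }

On a 32-bit box with a recent glibc, that shows up in strace as clock_nanosleep_time64, just like the output above.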

Why would you do that? Then why would you keep doing it? And why would you keep doing it forever, such that you had 15% of the system's CPU?

I went to attach to it with gdb to find out exactly where this call to sleep was coming from, but there was a problem. It seems that nobody in Raspbian land cares about having debug symbols for their packages. Normally in the world of Debian, you can stick something extra in your apt sources file, do an update, and you'll have a whole new world of -dbgsym packages available to you. If those exist in Raspbian, I sure can't find them.
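For reference, the stock-Debian recipe is something like this (the release name and the exact -dbgsym package name are guesses for your setup; the latter depends on which rng-tools variant is installed):

        # /etc/apt/sources.list.d/debug.list
        deb http://deb.debian.org/debian-debug/ bullseye-debug main

        sudo apt update
        sudo apt install rng-tools-dbgsym   # package name varies by rng-tools flavor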

So I had no symbols beyond those for glibc itself. I dumped a core anyway, knowing that I'd probably wind up killing the process at some point and would want to come back to it later. I'm glad I did, because it ended up helping solve the mystery later.
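If you've never grabbed a core from a live process: gdb ships a gcore helper that snapshots it without killing it, something like this (PID from above; the binary path is an assumption):

        gcore -o rngd-core 422              # writes rngd-core.422
        gdb /usr/sbin/rngd rngd-core.422    # reload it later at your leisure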

The next hour or so was spent trying to reproduce this in a version I built from source. It wouldn't go into that loop. Instead, it would start up and go into a week-long sleep. The few spots where it called sleep in the source didn't seem to be causing this. I got really annoyed with this and was rather close to calling it a night, but then I decided to start over from first principles.

The original process was long dead so I could no longer grab it with strace or gdb, but honestly, neither of those would have anything new to tell me. I did still have the core file, so that much was still there, and I managed to load it up to see this:

[Current thread is 1 (Thread 0x76f49200 (LWP 422))]
(gdb) bt
#0 0x76e22f8c in __GI___clock_nanosleep_time64 (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7ee28c58, req@entry=0x7ee28c50, rem=0x7ee28c68, rem@entry=0x7ee28c60) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:52
#1 0x76e23080 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ee28c9c, rem=rem@entry=0x7ee28c9c) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:92
#2 0x76e29830 in __GI___nanosleep (requested_time=requested_time@entry=0x7ee28c9c, remaining=remaining@entry=0x7ee28c9c) at nanosleep.c:27
#3 0x76e2971c in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
#4 0x00011470 in ?? ()

Notice that the thread in question is what gdb calls #1, which is LWP 422, aka PID 422. If you look at my strace from before, all of those calls to sleep are coming from pid 422. This told me that it wasn't one of the other threads causing the badness, but rather the main thread.

All of that noodling around with running it from source had taught me that the program's main thread just spins up some worker threads and then goes into this loop where it calls sleep and maybe does some stats stuff. It looks like this:

        sleepinterval = arguments->stats_interval ?
            arguments->stats_interval : STATS_INTERVAL_MAX;
        sleeptime = sleepinterval;
        while (!gotsigterm) {
                sleeptime = sleep(sleeptime);
                if ((arguments->stats_interval && sleeptime == 0) ||
                    gotsigusr1 || gotsigterm) {
                        dump_rng_stats();
                        sleeptime = sleepinterval;
                        gotsigusr1 = 0;
                }
        }

Looking at the code, sleepinterval gets set from a ternary expression: it takes the value of stats_interval if that's nonzero, or falls back to STATS_INTERVAL_MAX, which turns out to be 604800 seconds. That's a week's worth of seconds, and that 604800 was showing up whenever I ran the one I had built from source. No surprise there.

Given that strace and gdb had proven that the looping code came from main, and this is where main ends up, how could this possibly sleep for 0 seconds? After all, it resets it inside that janky-looking "if" down there inside the loop, and it sets it to sleepinterval, and sleepinterval is 604800, so all good, right?

Well, no, actually.

It's around this time that I noticed that it's not a bare call to sleep. It's actually using the return value from sleep. I don't see this very much and missed it at first.

	sleeptime = sleep(sleeptime);

This whole thing. So... for us to ever call sleep(0), sleeptime would have to be 0. But we always reset it higher, don't we? No, we don't.

if ((arguments->stats_interval && sleeptime == 0) ||
    gotsigusr1 || gotsigterm) {

Unrolling that branch goes like this.

Is arguments->stats_interval true? It's set to 0, so no, it isn't. We don't even look at the rest of that first one as a result. gotsigusr1 and gotsigterm are also not set since we didn't get a signal. None of those OR possibilities are true, so we DO NOT take the branch. We don't dump_rng_stats, and we don't reset sleeptime to sleepinterval.

That means sleeptime is going to stay set to whatever it was when it got set coming out of sleep. Could it be set to zero? Absolutely. The return value from sleep is "how long you had left to wait when it exited for whatever reason". This is because sleep can be interrupted, and this way you could restart it. Worse is better, PC losering, all of that stuff.

But what happens if you reach the time you asked it to wait? Well then, it returns zero.
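Here's a little standalone sketch (mine, not rngd's) showing both cases back to back: a child process signals the parent partway through the first sleep, and the second sleep runs out the clock:

        /* sleep_ret.c - what sleep() returns when interrupted vs. completed.
         * Build: cc -o sleep_ret sleep_ret.c */
        #include <signal.h>
        #include <stdio.h>
        #include <unistd.h>

        static void on_usr1(int sig) { (void)sig; }  /* exists only to interrupt sleep() */

        int main(void)
        {
                /* No SA_RESTART, so the signal actually cuts sleep() short. */
                struct sigaction sa = { .sa_handler = on_usr1 };
                sigaction(SIGUSR1, &sa, NULL);

                if (fork() == 0) {              /* child: poke the parent in 2 seconds */
                        sleep(2);
                        kill(getppid(), SIGUSR1);
                        _exit(0);
                }

                printf("interrupted: %u seconds left\n", sleep(10));  /* prints ~8 */
                printf("completed:   %u seconds left\n", sleep(1));   /* prints 0  */
                return 0;
        }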

So... all we have to do is get all the way through that sleep without getting one of those signals, and it'll return 0, and then never get reset, so it'll get called with 0 again, and again, and again, and again, ................. in a very tight loop, burning CPU the whole way.

The trick, and the reason why it's so damn hard to reproduce, is that you have to wait a full week for that first timeout to hit! Yep. Imagine that kind of patience.

I do not have that kind of patience, so to test my hypothesis, I lowered STATS_INTERVAL_MAX to 30 seconds and started up the new build in strace. It sat there for 30 seconds, and then promptly started chewing CPU in a very tight sleep loop.
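That test change is as small as you'd hope, assuming your copy of the source defines the constant as a macro (the exact form is an assumption; the point is the number):

        #define STATS_INTERVAL_MAX 30   /* was 604800 (one week); shrunk for testing */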

So, yeah, if you run Debian's rngd with -S0, it'll be perfectly fine for a week or so, and then it'll go into a tight loop that'll tie up one of your CPUs from then until the process is stopped for some reason. Lovely.
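The fix is about as small as you'd expect. One way to write it, as a sketch against the loop above (not the actual upstream patch):

        while (!gotsigterm) {
                sleeptime = sleep(sleeptime);
                if ((arguments->stats_interval && sleeptime == 0) ||
                    gotsigusr1 || gotsigterm) {
                        dump_rng_stats();
                        sleeptime = sleepinterval;
                        gotsigusr1 = 0;
                } else if (sleeptime == 0) {
                        /* Timer ran out but stats are off (-S0): re-arm
                         * instead of spinning on sleep(0) forever. */
                        sleeptime = sleepinterval;
                }
        }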

If you've tried running it this way on your Raspberry Pi for similar card-preserving reasons, you might want to go look and see if you have a rngd that's eating a core. If your uptime is more than a week, I bet it is.

Sheesh.

Side note: this is the first thing I saw upon diving into the code. I was looking for places that might call sleep, and found a function called random_sleep(). Check this out.

        gettimeofday(&start, NULL);
        while (!gotsigterm && timeout_usec > 0 &&
               poll(&pfd, 1, timeout_usec) < 0 &&
               errno != EINTR) {
                gettimeofday(&now, NULL);
                timeout_usec -= elapsed_time(&start, &now);
                start = now;
        }

Right off the bat, I saw "gettimeofday" and thought "I picked the wrong week to stop something something". They're calculating elapsed time... with the WALL CLOCK. That's what gettimeofday() gives you. If you want a monotonic clock, you need to use clock_gettime() and specify the right one. This is just asking for trouble if the system clock gets changed while it's running.
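The boring, correct version of that bookkeeping looks something like this (a sketch of the idea, not rngd's code); CLOCK_MONOTONIC can't jump when someone steps the system clock:

        /* Elapsed microseconds measured against the monotonic clock. */
        #include <stdint.h>
        #include <time.h>

        static int64_t elapsed_usec(const struct timespec *start)
        {
                struct timespec now;

                clock_gettime(CLOCK_MONOTONIC, &now);
                return (int64_t)(now.tv_sec - start->tv_sec) * 1000000 +
                       (now.tv_nsec - start->tv_nsec) / 1000;
        }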

Put it this way: it's a good thing they explicitly check for timeout_usec being a positive number, since if you pass poll a negative timeout, that means "infinity". Neat, right?

I think I'm going to stop there before I find something else.

