
No, the problem isn't “bad coders”

 5 years ago
source link: https://www.tuicool.com/articles/hit/rMZzmqN

A recent blog article discussed the fact that 70% of all security bugs in Microsoft products are due to memory safety vulnerabilities. A lot of the comments I’ve seen on social media boil down to “The problem isn’t the use of a memory unsafe language, but that the programmers who wrote this code are bad.”

In this article, I’m going to look at a recent bug that was caught by the Rust compiler. I think it shows that this assertion is not only unreasonable, but that the standard it sets is virtually impossible to meet, for reasons I haven’t seen discussed. While the example I’m going to give is about thread safety rather than memory safety, the arguments I’m going to present can be applied to both.

First, let’s talk briefly about the actual bug. The code that I was working on had both a thread pool and a database connection pool. To do its work, the code needed exactly one thread and at least one database connection. Database connections are likely to be the more limited resource, and I wanted to avoid spawning a thread only to have it immediately block waiting for a database connection. So the code would grab a connection from the pool and then spawn off the new thread.

The problem is that the database connection would sometimes use a re-entrant mutex when it was acquired from the pool. A re-entrant mutex is a concurrency primitive that ensures you are only using some resource on a single thread. The re-entrant part means that you can ask for a lock multiple times as long as it’s on the same thread. With a normal mutex we would be fine, since only one lock can exist and it doesn’t matter if we unlock it on a thread other than the one we locked it from. But since a re-entrant mutex remembers which thread it was locked from, we need to keep the resource on the same thread. Fundamentally, we just can’t have a re-entrant mutex be involved and also be able to pull the connection from the pool on a different thread than the one where it is used.
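To make the "remembers which thread it was locked from" part concrete, here is a minimal sketch of a re-entrant lock in Rust. It is a toy model rather than the mutex the code base actually used: `ToyReentrantLock`, `try_lock`, `unlock` and `demo` are all hypothetical names invented for this illustration, and the lock only reports success or failure instead of blocking.

```rust
use std::sync::Mutex;
use std::thread::{self, ThreadId};

/// Toy model of a re-entrant lock (hypothetical, for illustration only).
/// It records which thread owns it and how many times that thread locked it.
pub struct ToyReentrantLock {
    state: Mutex<Option<(ThreadId, usize)>>,
}

impl ToyReentrantLock {
    pub fn new() -> Self {
        ToyReentrantLock { state: Mutex::new(None) }
    }

    /// Returns true if the calling thread holds the lock after this call.
    pub fn try_lock(&self) -> bool {
        let me = thread::current().id();
        let mut state = self.state.lock().unwrap();
        match *state {
            // Nobody holds the lock: the caller becomes the owner.
            None => {
                *state = Some((me, 1));
                true
            }
            // The owner asks again: allowed, just bump the count.
            Some((owner, count)) if owner == me => {
                *state = Some((owner, count + 1));
                true
            }
            // A different thread asks: refused.
            Some(_) => false,
        }
    }

    /// Returns false if the calling thread is not the current owner.
    pub fn unlock(&self) -> bool {
        let me = thread::current().id();
        let mut state = self.state.lock().unwrap();
        match *state {
            Some((owner, count)) if owner == me => {
                *state = if count > 1 { Some((owner, count - 1)) } else { None };
                true
            }
            _ => false,
        }
    }
}

/// Locks twice on the current thread, then tries to lock and unlock
/// from a second thread while the first still owns the lock.
pub fn demo() -> (bool, bool, bool, bool) {
    let lock = ToyReentrantLock::new();
    let first = lock.try_lock();
    let second = lock.try_lock();
    let (other_lock, other_unlock) = thread::scope(|s| {
        s.spawn(|| (lock.try_lock(), lock.unlock())).join().unwrap()
    });
    (first, second, other_lock, other_unlock)
}
```

Calling `demo()` returns `(true, true, false, false)`: the owning thread may lock repeatedly, while a second thread can neither take the lock nor release it. That second `false` is the heart of the bug described above. Once ownership is tied to a thread, moving the locked resource to another thread leaves it somewhere it can never be unlocked from.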

If the compiler hadn’t caught this for me, all scenarios would have been bad. The best case scenario is that it would have resulted in a test failure. Debugging “attempting to acquire a lock on this mutex hangs indefinitely” would have taken me several hours at least.

The worst case scenario here would have been that no tests failed. We would have had a resource that we believed was not thread safe and could only be used on one thread, while another thread could actually “acquire” a lock on it at any time. This is the sort of ticking time bomb that might not cause a bug at the time the code is written but leaves a massive hole for some other reasonable looking code to blow up in the future.

But luckily, that’s not what happened here. The compiler told me that the mutex guard didn’t implement Send, which is Rust’s way of saying “You can’t send this to another thread”, at which point the problem became clear to me. At this point you might be thinking that this should have been obvious to me as soon as I started writing this code. I disagree with that assertion. But even if we assume that is true, I’ve left out one important detail which makes it baseless.
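The same rule can be seen with the standard library's own `std::sync::MutexGuard`, which also does not implement Send. The sketch below is not the code from this project and not necessarily how the bug was fixed. The function `run_query_on_worker` and the `Vec<String>` standing in for a connection are hypothetical, chosen only to show which shape the compiler rejects and which it accepts.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for "use a locked resource on a worker thread".
/// The `Vec<String>` plays the role of the connection; it just records queries.
pub fn run_query_on_worker(pool: Arc<Mutex<Vec<String>>>) -> usize {
    // Rejected by the compiler: the guard would be moved into the new
    // thread, and `MutexGuard` does not implement `Send`.
    //
    // let guard = pool.lock().unwrap();
    // thread::spawn(move || guard.len()).join().unwrap()

    // Accepted: only the `Arc` handle crosses the thread boundary, and the
    // lock is taken and released on the thread that actually uses it.
    thread::spawn(move || {
        let mut queries = pool.lock().unwrap();
        queries.push("SELECT 1".to_string());
        queries.len()
    })
    .join()
    .unwrap()
}
```

With `let pool = Arc::new(Mutex::new(Vec::new()));`, calling `run_query_on_worker(Arc::clone(&pool))` returns `1`. Uncommenting the first variant turns the mistake into a compile error rather than a hang at run time, which is the kind of feedback described above.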

When this code was written, there were no re-entrant mutexes anywhere in the code base.

This wasn’t caught when I finished writing the code. It was caught weeks later, when rebasing against the other changes of the codebase. The invariants of the code I was working with had fundamentally changed out from underneath me between when the code was written and when I was planning to merge it.

Let me be clear: even taken on its own, I disagree with the assertion that programmers can be expected to be perfect. But the assertion that we just need better C programmers goes much further than that. It’s not just a question of whether people can catch problems in code that they write. It’s also expecting people to be capable of re-contextualizing every invariant in any code they interact with (even indirectly). It sets the expectation that none of this changes between the time code is proposed and when it is merged.

These are not reasonable expectations of a human being. We need languages with guard rails to protect against these kinds of errors. Nobody is arguing that if we just had better drivers on the road we wouldn’t need seatbelts. We should not be making that argument about software developers and programming languages either.

