
I don't want to go to Chel-C

source link: https://applied-langua.ge/posts/i-dont-want-to-go-to-chel-c.html

Barbarism begins at $HOME

In a DevGAMM presentation, Jon Blow managed to convince himself and other game developers that high-level languages are going to cause the "collapse of civilisation", due to a loss of some sort of "capability" to do low-level programming, and that abstraction will make people forget how to do things. We shall start off with a description of the alternative by Charles Antony Richard Hoare, and then work our way through some arguments we have heard against that line of thought:

We asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to - they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law.1

We note with fear and horror that even in 2022, language designers and users have not learned this lesson.

While Jon argues that safety features are useless to "professional programmers"2 or "programmers that know what they're doing", and that programmers are "afraid of pointers" or some such, we derive our fear from how even the most talented computer scientists we know make mistakes. Indeed, the CWE team claims that "out-of-bounds writes" and "out-of-bounds reads" are the first and third most dangerous software weaknesses, respectively, and that "use after free" is the seventh most dangerous.3 Jon must have found other company, who manage to never write buggy software, and do it without any techniques for systematically avoiding bugs. He mentions the design of highly reliable software systems, but at the level of reliability he discusses, merely writing perfect software is not enough.

Expanding redundancy

Good physical engineering crucially relies on redundancy: having multiple ways to ensure that something does not go wrong. Such redundancy protects against both expected and unexpected forms of failure. Software engineering also relies on introducing redundancy,4 which may be achieved with static analyses such as static type checking and theorem proving, with runtime checks such as array bounds checks5 and dynamic type checks, and with system designs featuring fallback and replicated subsystems.
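As a minimal illustration of run-time redundancy (our own sketch, not code from the post), here is a tagged value in C whose accessor performs a dynamic type check before every read; the names and structure are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* A tagged value: the tag is redundant information that lets misuse be
   detected at run time instead of silently misreading memory. */
enum tag { TAG_INT, TAG_STRING };

struct value {
    enum tag tag;
    union {
        long integer;
        const char *string;
    } as;
};

/* Checked accessor: a dynamic type check guards the read. */
static long value_as_integer(const struct value *v)
{
    if (v->tag != TAG_INT) {
        fprintf(stderr, "type error: expected an integer\n");
        abort();            /* fail loudly instead of propagating garbage */
    }
    return v->as.integer;
}

int main(void)
{
    struct value v = { .tag = TAG_STRING, .as.string = "oops" };
    printf("%ld\n", value_as_integer(&v));   /* the check catches the mistake */
}
```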

Highly reliable software systems are possible, but they require precisely those forms of redundancy that Jon argues are unnecessary for "real programmers". In particular, such systems require fault isolation, which can only be achieved by pervasive runtime checks, as not checking allows faults to propagate uncontrollably. While Jon insists that defects should never exist, Joe Armstrong insists in his infamous thesis, with some humility, that "from the outset, we know that faults will occur".6 Even if all programmers were perfect, the latter mindset also allows for minimising the effects of hardware faults; while Jon is aware that physics influences what sort of programs can be run fast, ignoring hardware faults is ignoring that physics dictates whether a program runs at all. Programs run on real electronic circuits, which can have faults, lose connections and power, and so on. Even formal reasoning about programs cannot (easily) model random bit-flips in defective memory chips, but fault isolation and redundancy can attempt to work around them. These sorts of issues, while occurring with low probability, do appear in redundant systems which are around long enough to encounter many issues;7 thus highly reliable systems require more than simply not making mistakes while programming.
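One way to picture fault isolation, assuming a POSIX system: run the fallible work in its own process, so that even a memory fault in the worker cannot corrupt the rest of the system. This is our own sketch of the idea, not code from Armstrong's thesis.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* A worker that faults: dereferencing NULL raises SIGSEGV. */
static void faulty_worker(void)
{
    volatile int *p = NULL;
    *p = 42;
}

int main(void)
{
    /* The process boundary isolates the fault from the rest of the system. */
    pid_t pid = fork();
    if (pid == 0) {
        faulty_worker();
        _exit(EXIT_SUCCESS);
    }

    int status;
    waitpid(pid, &status, 0);
    if (WIFSIGNALED(status))
        fprintf(stderr, "worker crashed with signal %d; restart it or degrade gracefully\n",
                WTERMSIG(status));
    else
        puts("worker finished normally");
    return 0;
}
```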

However, there appear to be reasons offered to avoid such forms of redundancy. We have heard that runtime checks are prohibitively slow, but modern processors with branch prediction can easily predict that errors will not occur. Furthermore, in many situations (especially when reliability is such a great concern), the cost of the system running into an unexpected state is drastically worse than the loss of efficiency, and so detecting an error is invaluable in comparison. We have similarly heard that safe languages impair writing some algorithms, but we never heard just which algorithms those were, and we have never encountered such an algorithm ourselves. Furthermore, programs using such algorithms would rarely be valid programs in unsafe languages either, as unsafe code typically invokes some sort of undefined behaviour in the language. Not invoking UB does not necessarily produce a correct or reliable program, but a program invoking UB is never correct, as it has no defined behaviour to consider "correct".
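To make the cost argument concrete, here is a hypothetical checked array read in C; `__builtin_expect` is a GCC/Clang extension which hints that the error branch is almost never taken, and in practice the branch predictor learns the non-error path regardless.

```c
#include <stdio.h>
#include <stdlib.h>

#define LIKELY(x) __builtin_expect(!!(x), 1)   /* GCC/Clang branch hint */

/* Checked read: one compare-and-branch that almost always falls through,
   so a modern branch predictor quickly learns the non-error path. */
static double checked_get(const double *xs, size_t len, size_t i)
{
    if (LIKELY(i < len))
        return xs[i];
    fprintf(stderr, "index %zu out of bounds (length %zu)\n", i, len);
    abort();                 /* detect the error rather than corrupt state */
}

int main(void)
{
    double xs[4] = { 1.0, 2.0, 3.0, 4.0 };
    double sum = 0.0;
    for (size_t i = 0; i < 4; i++)
        sum += checked_get(xs, 4, i);
    printf("%g\n", sum);
}
```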

There is also the argument that these issues only occur with "complex" code, and "simple" code can be shown to not have out-of-bounds accesses. However, the reasoning applied is, on average, incredibly sloppy, and should not be confused with any rigorous formal analysis of the code, which would be substantially more difficult. For example, the specification framework for the C language by Andronick et al8 does not support "references to local variables, goto statements, expressions with uncontrolled side-effects, switch statements using fall-through, unions, floating point arithmetic, or calls to function pointers". If someone does not want to use a language which provides for bounds checks, due to a lack of some sort of "capability", then we cannot imagine that they will want to use any subset of C that can be formally modelled!

A notion of code complexity is also influenced by the choice of language for that code: the C language does not provide bounds or pointer checks, so introducing explicit checks will necessarily result in more code and more control-flow paths, and code without these checks will necessarily appear simpler.9 Thus the idea of "simple" C code not requiring redundancy rests on circular reasoning: the only simple code that can be written in C is code without any redundancy. Though, indeed, the idea was false to start with. The combination of formal reasoning being difficult for even a subset of C, and the burden of explicitly inserting runtime checks falling on the programmer, leads to "simple" C code ironically being very undesirable.
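To make the circularity concrete, compare a hypothetical pair of copy routines: the "simple" one has a single path only because it checks nothing, while the version that supplies the missing redundancy by hand is necessarily longer and has more control-flow paths.

```c
#include <stddef.h>

/* "Simple" C: nothing is checked, so an over-large count silently
   scribbles past the end of dst. One control-flow path, because the
   redundancy has simply been omitted. */
void copy_unchecked(int *dst, const int *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];
}

/* The same routine with the checks the language does not supply: the
   capacities must be threaded through and validated by hand, adding
   code and control-flow paths. */
int copy_checked(int *dst, size_t dst_cap,
                 const int *src, size_t src_len, size_t count)
{
    if (dst == NULL || src == NULL)
        return -1;
    if (count > src_len || count > dst_cap)
        return -1;
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];
    return 0;
}
```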

Abstract reasoning about abstractions

Using abstractions extensively actively encourages study of how to implement them efficiently, contrary to Blow's claim that their use will leave few people who understand how they are implemented. Implementing higher-level languages instead provides a context in which optimising for a given computer can be studied more thoroughly. A programmer could indeed break abstractions and write low-level code themselves, to generate the best-performing code that they can think of; the programmer would then effectively be running the perfect optimising compiler in their head. But if a programmer is instead motivated to stick to higher-level constructs, they have to express their mental model precisely, in the form of code for the compiler to use. The latter requires deeper thought, as the programmer has to formalise and "explain" their knowledge, in a way similar to "learning by teaching". Hence abstractions encourage deeper thought into how to get consistent performance out of high-level code, rather than one-off low-level trickery.

One practical example of optimising such abstractions is the implementation of dynamic dispatch in object-oriented programming languages. The C++ programming language offers dynamic dispatch, but it is almost always implemented with a "virtual table" lookup. If the lookup is too slow, a programmer may avoid dynamic dispatch and instead use regular functions to regain performance. However, this is not an option for implementations of the Smalltalk programming language, where every method call requires dynamic dispatch. Thus high-performance Smalltalk systems make use of inline caching.10 The Self programming language additionally uses method dispatch for "instance" or "local" variables, as they are only exposed via reader and writer methods, demanding the better technique of polymorphic inline caching.11 The general trend is that if an abstraction offers appealing principles, following those principles consistently requires investigating how to make the abstraction convenient, as inconvenience will cause someone to abandon the abstraction; and the performance of an abstraction can be a crucial part of its convenience to a programmer.
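A much-simplified sketch of a monomorphic inline cache, in C for illustration: real Smalltalk and Self implementations patch the machine code at each call site, which we only approximate here with a per-call-site cache structure and invented names.

```c
#include <stdio.h>

/* A toy object model: each object points at its class, and each class
   maps a selector to a method (here just one method for brevity). */
struct object;
typedef void (*method_fn)(struct object *self);

struct class_ {
    const char *name;
    method_fn   print;      /* the method we dispatch on */
};

struct object {
    struct class_ *cls;
};

/* The full lookup stands in for the slow path (hash tables, superclass
   chains, and so on in a real virtual machine). */
static method_fn lookup_print(struct class_ *cls) { return cls->print; }

/* One inline cache per call site: remember the last receiver class and
   the method it resolved to, and only fall back to lookup on a miss. */
struct inline_cache {
    struct class_ *cached_class;
    method_fn      cached_method;
};

static void send_print(struct inline_cache *ic, struct object *receiver)
{
    if (ic->cached_class != receiver->cls) {        /* cache miss */
        ic->cached_class  = receiver->cls;
        ic->cached_method = lookup_print(receiver->cls);
    }
    ic->cached_method(receiver);                    /* cache hit: direct call */
}

static void print_point(struct object *self) { (void)self; puts("a Point"); }

int main(void)
{
    struct class_ point_class = { "Point", print_point };
    struct object p = { &point_class };
    struct inline_cache ic = { 0 };

    send_print(&ic, &p);    /* first send fills the cache */
    send_print(&ic, &p);    /* later sends skip the full lookup */
}
```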

Another property of abstractions is that improving the implementation of an abstraction improves performance for all of its users. Having such abstractions reduces the amount of code that needs to be written, producing simpler code, and allows the implementor to fine-tune the implementation for maximum performance on every machine they target, rather than every user doing so themselves for every program. As Robert Bernecky recalls of his time implementing APL systems:

In the late 1970’s, I was manager of the APL development department at I.P. Sharp Associates Limited. A number of users of our system were concerned about the performance of the ∨.∧ inner product [for transitive closures] on large Boolean arrays in graph computations. I realized that a permuted loop order would permit vectorization of the Boolean calculations, even on a non-vector machine. David Allen implemented the algorithm and obtained a thousand-fold speedup factor on the problem. This made all Boolean matrix products immediately practical in APL, and our user (and many others) went away very happy.

What made things even better was that the work had benefit for all inner products, not just the Boolean ones. The standard +.× [matrix multiplication] now ran 2.5—3 times faster than Fortran. […] So, rather than merely speeding up one library subroutine, we sped up a whole family of hundreds of such routines (even those that had never been used yet!), with no more effort than would have been required for one.12
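We do not have Bernecky and Allen's code, but the kind of trick described can be sketched in C: permuting the loops of the boolean ∨.∧ inner product into i-k-j order lets each row of the result be updated one machine word of bits at a time, even without vector hardware. The representation below (rows bit-packed into 64-bit words) is our own assumption.

```c
#include <stdint.h>
#include <stddef.h>

/* C = A ∨.∧ B over bit-packed boolean matrices: each row is an array of
   64-bit words, `words` words per row, bit j of a row being column j.
   Writing the loops in i-k-j order turns the innermost loop into
   word-wide OR operations, so one instruction handles 64 columns. */
void bool_inner_product(uint64_t *c, const uint64_t *a, const uint64_t *b,
                        size_t n, size_t words)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t *crow = c + i * words;
        for (size_t w = 0; w < words; w++)
            crow[w] = 0;
        for (size_t k = 0; k < n; k++) {
            /* A[i][k]: bit k of row i of A. */
            int a_ik = (a[i * words + k / 64] >> (k % 64)) & 1;
            if (a_ik) {
                const uint64_t *brow = b + k * words;
                for (size_t w = 0; w < words; w++)
                    crow[w] |= brow[w];   /* OR in 64 columns of B's row k */
            }
        }
    }
}
```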

Conclusion

You're quite right. C is memory unsafe, and the large quantity of C code in IoT devices is, as you rightly say, a disaster - only it has already happened; no waiting is involved.13

The supposed collapse of civilisation due to bad software is already here, and Mr. Blow doesn't want you to know why. Even great programmers still use redundancy measures which allow them to make mistakes without causing harm; we would even say that the greatest programmers are great because of how they produce redundancy. The persistent use of redundancy and abstraction encourages research into how to make them efficient, which is quite the opposite of forgetting how any low-level trickery is performed.

As Henry Baker reminded us, there are many contexts "in which bad programming style can kill people".9 We can only hope that people figure out what to blame before the idiotic view of producing high-quality software that Blow et al. promote eventually does kill people.

