Why do (or don't) languages forbid unreachable code?

source link: https://langdev.stackexchange.com/questions/3650/why-do-or-dont-languages-forbid-unreachable-code

7 Answers

Great question. I can tell you a bit about C#, where unreachable code is a warning, not an error. I'll start by briefly describing how reachability detection works, then what it is used for, and then why unreachable code is only a warning.

Unreachable code is detected by a few relatively simple rules. The simplest are ones like the case you describe; we can very easily partition statements into "basic blocks", where control enters at the top of the block and does not leave normally (that is, without an exception) until the end of the block. If no control flow enters the top of a basic block, the whole block is unreachable.

Statements that always "go somewhere else" (goto, return, break, continue, throw) are said to have an "unreachable end point", as does any statement that is itself unreachable.
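A minimal sketch of these basic rules in C#. The class and method names are invented for illustration; the warning number is the one the C# compiler reports for unreachable code.

```csharp
using System;

static class BasicReachability
{
    // `return` always transfers control elsewhere, so its end point is
    // unreachable and the statement after it can never run.
    public static int AfterReturn(int n)
    {
        return n + 1;
        Console.WriteLine("never printed"); // warning CS0162: unreachable code detected
    }

    // `throw` works the same way: nothing that follows it in the same
    // block can be reached.
    public static int AfterThrow(int n)
    {
        if (n < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(n));
            n = 0; // warning CS0162: unreachable code detected
        }
        return n;
    }
}
```

Both methods compile. AfterReturn needs no second return statement, because the unreachable statement at the end of the method itself has an unreachable end point.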

Next, C# analyzes compile-time constant expressions; these are expressions involving only literals or named constants. The else block of an if(0==0) condition is unreachable, but C# considers the else block of if(x * 0 == 0) for an integer variable x to be reachable, since the expression is not a constant expression. (Fun fact: C# 1 and 2 had a bug where the arithmetic optimizer ran too early and some expressions of this form were treated as constants; I fixed it for C# 3.) Similarly, while(true) {} has an unreachable end point.
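The three cases in that paragraph, sketched as compilable C#. The names are hypothetical, and the comments describe the diagnostics one would expect under the rules stated above.

```csharp
static class ConstantConditions
{
    // 0 == 0 is a constant expression, so the else block is unreachable.
    public static string ConstantCondition()
    {
        if (0 == 0)
        {
            return "then";
        }
        else
        {
            return "else"; // warning CS0162: unreachable code detected
        }
    }

    // x * 0 == 0 is always true at run time, but it is not a constant
    // expression, so the else block counts as reachable and draws no warning.
    public static string NonConstantCondition(int x)
    {
        if (x * 0 == 0)
        {
            return "then";
        }
        else
        {
            return "else";
        }
    }

    // The end point of while (true) is unreachable, so no return statement
    // is required after the loop even though the method returns int.
    // (Intended for small positive n; overflow is ignored in this sketch.)
    public static int FirstPowerOfTwoAtLeast(int n)
    {
        int p = 1;
        while (true)
        {
            if (p >= n) return p;
            p *= 2;
        }
    }
}
```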

C# also analyzes the logical or/and expressions to determine reachability, which is where the rules get a little complicated; see the spec for details.

And let me finish this brief overview with the puzzle I always pose when discussing reachability. There is a C# program where there is a reachable goto, but its target label is unreachable. Can you give an example of such a program?


Why does C# care about reachability at all?

  • The end point of the last statement in a method with a non-void return type must be unreachable; otherwise we have a bug: the method does not necessarily return a value.

  • If the end point of the last statement in a method is reachable, then every out parameter must be definitely assigned before that point; otherwise we have a bug: we are possibly returning without assigning the out parameter.

  • In C and C++, the switch statement is bug-prone. In a switch in C, each label starts a section, that section can be empty, and so in order to have two labels with the same consequence, control must fall through from the bottom of one section to the next. C# fixes this logical problem by instead saying that a section is never empty: it always has at least one statement, and a section may have multiple labels. Every section must have an unreachable end point.

  • C# disallows reading from a local variable before it is known to have been assigned, again because this is a common source of bugs. The rules that determine when a variable is known to be assigned and known to be read are based on reachability analysis. If there is a possibly-reachable read where the corresponding write is unreachable or possibly unreachable, that's an error. But a read of a possibly-unassigned local is not an error if the read itself is unreachable.
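The four rules above can be seen in a small sketch. All names are invented for illustration; each comment names the compile-time error one would expect if the marked line were removed.

```csharp
static class WhyReachabilityMatters
{
    // Rule 1: the end point of a non-void method must be unreachable.
    public static int Sign(int n)
    {
        if (n > 0) return 1;
        if (n < 0) return -1;
        return 0; // without this line: error CS0161, not all code paths return a value
    }

    // Rule 2: out parameters must be assigned before any reachable exit.
    public static bool TryHalve(int n, out int half)
    {
        if (n % 2 != 0)
        {
            half = 0; // without this line: error CS0177, out parameter not assigned
            return false;
        }
        half = n / 2;
        return true;
    }

    // Rule 3: a switch section may carry several labels, but its end point
    // must be unreachable. Rule 4: `text` may be read after the switch only
    // because every reachable path to that read assigns it first.
    public static string Describe(int n)
    {
        string text;
        switch (n)
        {
            case 0:
            case 1: // two labels, one section
                text = "small";
                break; // without this line: error CS0163, control cannot fall through
            default:
                text = "large"; // without this line: error CS0165 at the return below
                break;
        }
        return text;
    }
}
```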

What I'm getting at here is: we care about reachability analysis because it enables automatically finding common bugs and preventing them at compile time. But why then is unreachable code not itself treated as a bug that produces a compile-time error? And why is reading from an unassigned local legal in unreachable code?


To facilitate debugging! I'm sure you've encountered this situation:

if(some complicated and hard to reproduce condition)
{
  there's a bug in this code
} else {
  some perfectly normal code
}

If the repro is a pain, the natural thing to do is to just write

if(true /* painful repro condition */)
{
  // put your breakpoint here
  there's a bug in this code
} else {
  some perfectly normal code
}

If that's a compile-time error because the else case is now unreachable, then you're slowing down the debugging process. If it is not even a warning, then you're risking the developer accidentally checking in debug code. Therefore the sensible thing to do is to make it a warning.
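The same debugging shortcut as compilable C#. PainfulReproCondition and Process are hypothetical stand-ins; the point is only that the edited method still builds, with a warning on the else branch.

```csharp
static class DebugShortcut
{
    // Hypothetical stand-in for a condition that is painful to reproduce.
    static bool PainfulReproCondition(int input) => input % 9973 == 4242;

    public static string Process(int input)
    {
        // Temporarily forced to true while debugging the first branch.
        if (true /* PainfulReproCondition(input) */)
        {
            return "buggy path"; // put your breakpoint here
        }
        else
        {
            return "normal path"; // warning CS0162: unreachable code detected
        }
    }
}
```

The build succeeds, so the breakpoint can be hit immediately, and the CS0162 warning remains in the build output as a reminder to restore the real condition before checking in.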

Same with unassigned locals.

int x;
if (Validate(y, out x))
{
  do something with x
} else {
  uh oh the error case has a bug in it
  but it does not read x
}

We quickly debug it by writing

int x;
if (false /*Validate(y, out x)*/)
{
  do something with x
} else {
  // breakpoint here
  uh oh the error case has a bug in it
  but it does not read x
}

In this debugging scenario "do something with x" reads x before it is assigned. Doesn't matter! Reading a variable before it is assigned in unreachable code is not an error. Rather, the unreachable code is a warning so that you don't check it in by mistake.
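A compilable sketch of that scenario. Validate here is a hypothetical helper built on int.TryParse, and Handle is an invented name; neither comes from the original example.

```csharp
static class UnassignedInUnreachable
{
    // Hypothetical validator: parses the text and reports success.
    static bool Validate(string text, out int value) => int.TryParse(text, out value);

    public static string Handle(string y)
    {
        int x;
        // Temporarily forced to false while debugging the error branch,
        // so x is never assigned on any path.
        if (false /* Validate(y, out x) */)
        {
            // Reads x before it is assigned, but this statement is unreachable:
            // the compiler reports warning CS0162 here and no CS0165 error.
            return "parsed " + x;
        }
        else
        {
            // breakpoint here
            return "could not parse '" + y + "'";
        }
    }
}
```

If the read of x sat in reachable code instead, the same method would fail to compile with error CS0165 (use of unassigned local variable).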

There are other scenarios as well; here's one I described in 2012.


The lesson for language designers here is to think about everything that a developer is going to do. Code is not just written and checked in; it's modified in temporary ways during debugging, often under time pressure, and anything we can do to make that process easier makes developers more productive.


If this subject interests you, I've written about it a fair amount. A good start is this now-deleted post on the Coverity blog:

https://web.archive.org/web/20140331160029/http://blog.coverity.com/2013/11/06/c-reachability#.UzmRJ4HP1qY

