C, what the fuck??

 4 years ago
source link: https://bowero.nl/blog/2019/12/15/c-what-the-fuck/

What do you think the value of a will be here?

int a = 0;
// What will be the value of a????/
a++;

You probably know that it won’t be 1, but there is a big chance that you only know that because of the way I asked the question.

a will actually not change, because a++; is never run. The comment above it is to blame: there is something special about that line. Before we jump into that, let’s look at another example:

!didIMakeAMistake() ??!??! CIsWrongHere();

This actually compiles, which is already impressive on its own. The question is, however, what the fuck does this do?

To understand this, I have to admit one thing: I have to pass -trigraphs to a modern version of gcc before this actually works. Trigraphs are special combinations of characters that were invented because of a problem in C: it uses 9 characters that are not in the ISO/IEC 646 invariant character set. This is what that character set looks like:

[Image: table of the ISO/IEC 646 character set, with the national code points grayed out. Source: Wikipedia]

Anyone who uses C on a regular basis should be able to figure out which 9 characters are missing from the invariant set. They are the following:

# \ ^ [ ] | { } ~

The table in the image above is meant to clarify things, but it can be confusing: the missing characters can all be found in it. Note, however, that they are all grayed out. That is because they sit at national code points, which every national variant may fill in differently, so they are not part of the international invariant set.

This could become very interesting. Let’s look at this simple line for example:

{ a[i] = '\n'; }

This would be written like this by a German programmer:

ä aÄiÜ = 'Ön'; ü

This is because their national character set puts different characters at the national code points than the one an American programmer would use, for example.

The ANSI C committee of course recognized this problem, so they decided to introduce trigraphs: nine three-character sequences that stand in for the nine characters missing from the invariant set.

[Image: table of the nine trigraph sequences and the characters they replace. Source: Wikipedia]

Of course, this isn’t a beautiful solution, but it should do the trick.
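To make the replacement rule concrete, here is a minimal sketch of the substitution a compiler performs when trigraphs are enabled. The names trigraph_replacement and replace_trigraphs are made up for illustration and do not come from any real compiler, and the sketch assumes the caller provides an output buffer at least as large as the input.

```c
#include <assert.h>
#include <stddef.h>

/* Map the third character of a candidate trigraph to its replacement.
   Returns 0 when that character does not complete a trigraph. */
static char trigraph_replacement(char third)
{
    switch (third) {
    case '=':  return '#';
    case '/':  return '\\';
    case '\'': return '^';
    case '(':  return '[';
    case ')':  return ']';
    case '!':  return '|';
    case '<':  return '{';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Hypothetical helper: copy src to dst with every trigraph replaced.
   dst needs room for the length of src plus one byte, because the
   output never grows. Returns the length of the result. */
size_t replace_trigraphs(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '?' && src[1] == '?') {
            char r = trigraph_replacement(src[2]);
            if (r != 0) {
                dst[n++] = r;   /* three input characters become one */
                src += 3;
                continue;
            }
        }
        dst[n++] = *src++;      /* anything else is copied unchanged */
    }
    dst[n] = '\0';
    return n;
}
```

When you call this from your own code, write question marks in string literals as "?\?!" rather than typing the trigraph directly. The \? escape keeps the literal free of real trigraphs, so the program behaves the same with and without -trigraphs.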

Now, with this knowledge, let’s look at the lines that we started with. The latter is the easier one:

!didIMakeAMistake() ??!??! CIsWrongHere();

If we look at the table above, we see that ??! should be replaced with |. Therefore, this line actually says:

!didIMakeAMistake() || CIsWrongHere();

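If you understand how short-circuit evaluation works, you can see what this does: the right-hand side of || is only evaluated when the left-hand side is false, that is, when didIMakeAMistake() returns true. So it is equivalent to the following:

if (didIMakeAMistake())
  CIsWrongHere();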
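If short-circuit evaluation is not second nature, the following sketch may help. It counts how often the right-hand operand runs, both in the expression form and in an if statement that calls it only when a mistake was made. The functions did_i_make_a_mistake and c_is_wrong_here are hypothetical stand-ins for the ones in the example, and the operator is written as a plain || so that it compiles without -trigraphs.

```c
#include <assert.h>
#include <stdbool.h>

static int handler_calls = 0;   /* how often the right-hand side ran */

/* Hypothetical stand-ins for the functions in the example. */
static bool did_i_make_a_mistake(bool answer) { return answer; }
static bool c_is_wrong_here(void) { handler_calls++; return true; }

/* The expression form: || only evaluates its right operand
   when the left operand is false. */
int calls_in_expression_form(bool mistake)
{
    handler_calls = 0;
    (void)(!did_i_make_a_mistake(mistake) || c_is_wrong_here());
    return handler_calls;
}

/* The if form: the handler runs only when a mistake was made. */
int calls_in_if_form(bool mistake)
{
    handler_calls = 0;
    if (did_i_make_a_mistake(mistake))
        c_is_wrong_here();
    return handler_calls;
}

/* Both forms call the handler equally often for either input. */
void check_forms_agree(void)
{
    assert(calls_in_expression_form(true) == calls_in_if_form(true));
    assert(calls_in_expression_form(false) == calls_in_if_form(false));
}
```

The handler runs exactly once when a mistake was made and not at all otherwise, in both forms.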

The other example is actually more interesting, and a good reason to be cautious with trigraphs:

int a = 0;
// What will be the value of a????/
a++;

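Earlier, I already explained that a will be 0, because a++; is never executed.

A trigraph is only a trigraph when the ?? is followed by one of the nine specific characters. Reading ????/ from left to right, the first ?? is followed by another ?, which is not one of the nine, so only the final ??/ counts as a trigraph. So in this case, the C preprocessor will replace the code above with the following: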

int a = 0;
// What will be the value of a??\
a++;

This \ sits directly in front of the newline and therefore escapes it: the comment line and the next line are spliced into one, which eventually results in the following:

int a = 0;
// What will be the value of a??a++;

And this is why a++; was never executed.
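The splicing step can be sketched in a few lines as well. This is only an illustration under my own assumptions: splice_lines is a made-up name, the caller provides an output buffer at least as large as the input, and the input is the text as it looks after the trigraphs have already been replaced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: drop every backslash that is directly followed
   by a newline, together with that newline, so the two physical lines
   become one logical line. Returns the length of the result. */
size_t splice_lines(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;           /* skip the backslash and the newline */
            continue;
        }
        dst[n++] = *src++;      /* everything else is copied unchanged */
    }
    dst[n] = '\0';
    return n;
}
```

Feeding it the comment line ending in a backslash followed by the a++; line gives back a single line, so a++; ends up as part of the comment.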

I would like to end with a note from the committee itself:

The Committee makes no claims that a program written using trigraphs looks attractive. As a matter of style, it may be wise to surround trigraphs with white space, so that they stand out better in program text. Some users may wish to define preprocessing macros for some or all of the trigraph sequences.

Rationale for International Standard Programming Languages C (5.2.1.1)
