The ??! Symbol in C

source link: https://blog.gslin.org/archives/2022/10/07/10908/c-%e8%aa%9e%e8%a8%80%e8%a3%a1%e9%9d%a2%e7%9a%84-%e7%ac%a6%e8%99%9f/

I came across this odd piece of trivia on Hacker News Daily: "What does the ??!??! operator do in C? (stackoverflow.com)". The original question is on Stack Overflow: "What does the ??!??! operator do in C?".

This is a trigraph, a feature that has existed since C89. Section 5.2.1.1 of the Rationale for International Standard—Programming Languages—C explains the historical reason for trigraphs:

Trigraph sequences were introduced in C89 as alternate spellings of some characters to allow the implementation of C in character sets which do not provide a sufficient number of non-alphabetic graphics

Implementations are also required to support them:

Implementations are required to support these alternate spellings, even if the character set in use is ASCII, in order to allow transportation of code from systems which must use the trigraphs. AMD1 also added digraphs (see §6.4.6 and §MSE.4).
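To answer the Stack Overflow question directly: ??! is the alternate spelling of |, so ??!??! is simply ||, the logical OR operator. The sketch below is only an illustration with hypothetical names (handler_calls, handle_error, check). It is written with || so that it compiles the same way whether or not the compiler has trigraph replacement enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter used to observe whether the right-hand side ran. */
int handler_calls = 0;

/* Hypothetical handler; it returns true so it can sit in a boolean expression. */
bool handle_error(void) {
    handler_calls++;
    return true;
}

/* With trigraph replacement enabled, `ok ??!??! handle_error()` is read as
 * `ok || handle_error()`. Because || short-circuits, the handler runs only
 * when ok is false. */
bool check(bool ok) {
    assert(handler_calls >= 0);
    return ok || handle_error();
}
```

Calling check(true) leaves handler_calls untouched, while check(false) runs the handler exactly once.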

One of the problems at the time was deciding which charset C could use, which meant dealing with charset compatibility across many different machines:

The C89 Committee faced a serious problem in trying to define a character set for C. Not all of the character sets in general use have the right number of characters, nor do they support the graphical symbols that C users expect to see. For instance, many character sets for languages other than English resemble ASCII except that codes used for graphic characters in ASCII are instead used for alphabetic characters or diacritical marks. C relies upon a richer set of graphic characters than most other programming languages, so the representation of programs in character sets other than ASCII is a greater problem than for most other programming languages.

The committee then settled on the ISO/IEC 646 standard (keep in mind that Unicode 1.0.0 did not appear until 1991):

The solution is an internationally agreed-upon repertoire in terms of which an international representation of C can be defined. ISO has defined such a standard, ISO/IEC 646, which describes an invariant subset of ASCII.

The characters in the ASCII repertoire used by C and absent from the ISO/IEC 646 invariant repertoire are:

# [ ] { } \ | ~ ^

The text that follows goes on to define ?? as the escape digraph.
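The standard defines nine trigraphs, each made of the escape digraph ?? plus one distinguishing character: ??= is #, ??( is [, ??) is ], ??< is {, ??> is }, ??! is |, ??' is ^, ??- is ~, and ??/ is \. A compiler performs this replacement in its first translation phase. The function below is a minimal sketch of that replacement, not how any particular compiler implements it; the names replace_trigraphs, TRIGRAPH_THIRD, and TRIGRAPH_VALUE are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Third character of each of the nine trigraphs, and the character each one
 * stands for. The two strings are index-aligned. */
static const char TRIGRAPH_THIRD[] = "=()<>!'-/";
static const char TRIGRAPH_VALUE[] = "#[]{}|^~\\";

/* Copies src to dst, replacing every trigraph with its single character.
 * dst must have room for strlen(src) + 1 bytes. Returns the result length. */
size_t replace_trigraphs(const char *src, char *dst) {
    assert(src != NULL && dst != NULL);
    size_t out = 0;
    for (size_t i = 0; src[i] != '\0'; ) {
        /* Guard against the terminator, which strchr would otherwise match. */
        if (src[i] == '?' && src[i + 1] == '?' && src[i + 2] != '\0') {
            const char *hit = strchr(TRIGRAPH_THIRD, src[i + 2]);
            if (hit != NULL) {
                dst[out++] = TRIGRAPH_VALUE[hit - TRIGRAPH_THIRD];
                i += 3;
                continue;
            }
        }
        /* Not the start of a trigraph: copy the character unchanged. */
        dst[out++] = src[i++];
    }
    dst[out] = '\0';
    return out;
}
```

For example, replace_trigraphs("a ?\?!?\?! b", buf) fills buf with "a || b". The \? escape in the literal keeps the compiler from performing its own trigraph replacement on the test string.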

It is a historical artifact that is rarely needed today; trigraphs were removed from the language in C++17 and C23.

Author: Gea-Suan Lin. Posted on October 7, 2022. Categories: Computer, Murmuring, Programming. Tags: 646, ascii, c, c89, charset, digraph, escape, iec, iso, language, programming, standard, trigraph