0

提議 OpenJDK 的程式碼 UTF-8 化

 2 years ago
source link: https://blog.gslin.org/archives/2023/03/02/11083/%e6%8f%90%e8%ad%b0-openjdk-%e7%9a%84%e7%a8%8b%e5%bc%8f%e7%a2%bc-utf-8-%e5%8c%96/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

提議 OpenJDK 的程式碼 UTF-8 化

Hacker News 上看到提議把 OpenJDK 的程式碼 UTF-8 化:「Make JDK source code UTF-8 (openjdk.org)」,原始的 jira ticket 在「Make JDK source code UTF-8」這邊。

主要是現在的 source code 沒有統一標準:

Currently, the source code in the JDK is in an ill-defined encoding. There is no official declaration of the encoding used. It is "mostly ASCII", but the relatively few non-ASCII characters used are not well-defined. In many cases, it is latin-1, but I am pretty certain other encodings are used for e.g. Asian translations.

而 mailing list 上看起來還是有想要維持 ASCII 的討論:「Making the source code utf-8」,不過他提出來的理由我覺得不太行,在 console 跑 vi 或是 emacs 我覺得不太是個好理由...

Related

非常經典的 UTF-8...

在 Hacker News 文摘上看到「UTF-8 – “The most elegant hack”」這篇。除了維基百科上的資料以外,Rob Pike 與其他人在 2003 年寫的 mail 也是相當重要的資料。 Ken Thompson 與 Rob Pike 兩位發展出來的 UTF-8 被譽為最優雅的 hack 真的一點都不為過。Unicode 1.0 在 1991 年 10 月公佈。之後就陸陸續續有表示的格式出來... 相容於 ASCII 0-127 的 UTF-1 在 1992 年被提出來,但 parsing performance 並不好。 1992 年 7 月,Dave Prosser 提出 FSS-UTF,很類似後來的 UTF-8…

October 1, 2013

In "Computer"

Branchless UTF-8 解碼器

看到「A Branchless UTF-8 Decoder」這篇,先來回憶一下「非常經典的 UTF-8...」這篇,以及裡面提到的 encoding: 因為當初在設計 UTF-8 時就有考慮到,所以 decoding 很容易用 DFA 解決,也就是寫成一堆 if-then-else 的條件。但現代 CPU 因為 out-of-order execution 以及 pipeline 的設計,遇到 random branch 會有很高的效能損失,所以作者就想要試著寫看看 branchless 的版本。 成效其實還好,尤其是 Clang 上說不定在誤差內: With GCC 6.3.0 on an i7-6700, my decoder is about 20% faster than the DFA decoder in the benchmark. With…

October 9, 2017

In "Computer"

Elasticsearch 1.2.0

由於 Elasticsearch 的想法與實做比起 Solr 吸引人,可以看到愈來愈多團體換過去... 而前幾天 Elasticsearch 的官方放出 1.2.0 與 1.1.2 的消息:「elasticsearch 1.2.0 and 1.1.2 released」。 1.2.0 最大的改變是強制使用 Java 7 了,也就是不能在 Ubuntu 12.04 下安裝 default-jre 了,變成要裝 openjdk-7-jre。(要注意,官方建議的是 Oracle 官方的 JDK,而非 OpenJDK) 如果是 Ubuntu 14.04 就沒這個問題。(因為 default-jre 會裝 Java 7) 另外一個大改變是,之前產生安全問題的 dynamic scripting 預設關掉了,也就是 CVE-2014-3120。 目前我的進度只到看完 mapping,但還沒實際開始塞資料進去玩...

May 25, 2014

In "Computer"

a611ee8db44c8d03a20edf0bf5a71d80?s=49&d=identicon&r=gAuthor Gea-Suan LinPosted on March 2, 2023Categories Computer, Murmuring, ProgrammingTags ascii, code, encoding, jdk, openjdk, source, unicode, utf-8, utf8

Leave a Reply

Your email address will not be published. Required fields are marked *

Comment *

Name *

Email *

Website

Notify me of follow-up comments by email.

Notify me of new posts by email.

To respond on your own website, enter the URL of your response which should contain a link to this post's permalink URL. Your response will then appear (possibly after moderation) on this page. Want to update or remove your response? Update or delete your post and re-enter your post's URL again. (Learn More)

Post navigation


Recommend

  • 48
    • www.tuicool.com 6 years ago
    • Cache

    Working with UTF-8 in the Kernel

    Benefits for LWN subscribers The primary benefit fromsubscribing to LWN is helping to keep us publishing, but, beyond that, subscribers get immediate access to all site content a...

  • 13
    • blog.codingnow.com 5 years ago
    • Cache

    Windows 下 UTF-16 的坑

    最近帮多个活跃的开源项目改了同一个 bug : 完善 Windows 下的 Unicode 支持。 问题源于 Windows 和其它平台不同,它有悠久的历史包袱。和现代大多数操作系统对 Unicode 支持的共识不同,它的 API 不是基于 UTF-8 而是基于...

  • 29
    • www.tuicool.com 5 years ago
    • Cache

    UTF-8 String Indexing Strategies

    When designing or, in some cases, implementing a programming language with built-in support for Unicode strings, an important decision must be made about how to represent or encode those strings in memory. Not all represe...

  • 55
    • www.tuicool.com 5 years ago
    • Cache

    Go 字符串编码,Unicode 和UTF-8

    1.字符串 字符串在Go语言中以原生数据类型出现,使用字符串就像使用其他原生数据类型(int、bool、 float32、foat64等)一样。 字符串的值为双引号中的内容,可以在Go语言的源码中直接添加非ASCⅡ码字符 Go...

  • 59
    • www.tuicool.com 5 years ago
    • Cache

    从摩斯密码到 UTF-8

    在电影《无间道》中,经常会出现摩斯密码的身影。摩斯密码本身的传奇性,为电影增色不少。其实摩斯密码一点也不复杂,反而很简单,透过摩斯密码,我们可以一窥计算机如何表示字符串的奥秘。

  • 56
    • studygolang.com 5 years ago
    • Cache

    中文转换成html中的utf-8

    在HTML中,中文的“好好学习”可以表示为“好好学习” 在项目中,需要对接短信告警,短信告警返回数据要求是utf8的 后来继续沟通,才发现要的是 html-utf8 的; 没有找到合适的golang工具包,涉及语言转码的...

  • 19
    • octobus.net 4 years ago
    • Cache

    Not everything is UTF-8

    Over the past few weeks I've helped a new developer get started with both Mercurial and Rust , exposing them to somewhat niche subject...

  • 26
    • wiki.tcl-lang.org 4 years ago
    • Cache

    UTF-8 bit by bit (2001)

    Richard Suchenwirth 2001-02-28 - From a delightful debugging chat at theTcl chatroom, I was brought to write down what I think on UTF-8 analysis (cf.Unicode and UTF-8, see also). I imagine a UTF-8 string as a ra...

  • 18
    • blog.xizhibei.me 4 years ago
    • Cache

    Golang 中的 Unicode 与 UTF-8

    大多数的我们,真正认识到有字符编码这回事,一般都是因为遇到了乱码,因为我国常用的编码是 GBK 以及 GB2312:用两个 Byte 来表示所有的汉字,这样,我们一共可以表示 2^16 = 65536 个字符,一旦我们的 GBK 以及 GB2312 编码遇到了其他编...

  • 13
    • blog.mathieu-leplatre.info 4 years ago
    • Cache

    Python UTF-8 print fails when redirecting stdout

    Python UTF-8 print fails when redirecting stdoutPython UTF-8 print fails when redirecting stdout Wed 26 January 2011Consider the following piece of code: # -*- coding: utf-8 -*- print u"Վարդանաշեն" Runni...

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK