
Domain hacks with unusual Unicode characters


Unicode contains a range of symbols which don't get much use. For example, there are separate symbols for Trade Mark - ™, Service Mark - ℠, and Prescriptions - ℞.

Nestling among the "Letterlike Symbols" are two curious entries. Both of these are single characters:

  • ℡ - TELEPHONE SIGN (U+2121)
  • № - NUMERO SIGN (U+2116)

What's interesting is that both .tel and .no are Top-Level Domains (TLDs) in the Domain Name System (DNS).

So my contact site - https://edent.tel/ - can be written as - https://edent.℡/

And the Norwegian domain name registry NORID can be accessed at https://www.norid.№/

Copy and paste those links - they work in any browser!
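Browsers do this conversion for you. As a rough stand-in, Python's built-in `idna` codec (which implements the older IDNA 2003 rules rather than the UTS #46 mapping browsers actually use) produces the same ASCII hostnames for these two links. The `ascii_host` helper is just an illustrative name:

```python
from urllib.parse import urlsplit


def ascii_host(url: str) -> str:
    """Extract the host from a URL and map it to ASCII with Python's IDNA 2003 codec.

    Illustrative helper only; browsers use their own UTS #46 implementation.
    """
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host in {url!r}")
    # The codec splits on dots and maps each label separately.
    return host.encode("idna").decode("ascii")


for url in ("https://edent.℡/", "https://www.norid.№/"):
    print(url, "->", ascii_host(url))
```

Each fancy label becomes ordinary ASCII (edent.tel and www.norid.no) before any DNS lookup happens.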

Is this limited to TLDs?

No! This works ANYWHERE in a domain name. Copy and paste these examples:

  • Script https://ℰ𝒳𝒜ℳ𝒫ℒℰ.𝒞𝒪ℳ/
  • Math Bold https:// . /
  • Fraktur https:// . /
  • Math bold italic https:// . /
  • Math bold script https:// . /
  • Double struck https:// . /
  • Monospace https:// . /
  • Super script https://ᵉˣᵃᵐᵖˡᵉ.ᶜᵒᵐ/
  • Sub script https://ₑₓₐₘₚₗₑ.cₒₘ/ NB not all characters supported
  • Math sans bold https:// . /
  • Math sans bold italic https:// . /
  • Math sans italic https:// . /
  • Math Squared https:// . / NB the dot must not be squared
  • Circled https://ⓔⓧⓐⓜⓟⓛⓔ.ⓒⓞⓜ/ NB the dot must not be circled
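A quick way to see why all of these collapse to the same address is to apply Unicode NFKC (compatibility) normalisation and then case-fold each hostname. This is a sketch: NFKC plus `str.casefold()` only approximates the mapping browsers apply, and it uses the three variants above that survive as plain text (superscript, subscript and circled):

```python
import unicodedata

# Three of the styled hostnames from the list above.
VARIANTS = {
    "superscript": "ᵉˣᵃᵐᵖˡᵉ.ᶜᵒᵐ",
    "subscript": "ₑₓₐₘₚₗₑ.cₒₘ",
    "circled": "ⓔⓧⓐⓜⓟⓛⓔ.ⓒⓞⓜ",
}


def fold_host(host: str) -> str:
    """Compatibility-normalise (NFKC), then case-fold, a hostname."""
    return unicodedata.normalize("NFKC", host).casefold()


for style, host in VARIANTS.items():
    print(f"{style:12} {host} -> {fold_host(host)}")
```

Each styled letter carries a compatibility decomposition (tagged `<super>`, `<sub>` or `<circle>` in the Unicode data) pointing back at a plain Latin letter, which is why the style is thrown away.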

There are a whole bunch more miscellaneous characters you can use:

Wait, so one can use any of

㍳ ㏃ ㏇(!) ㏈ ffffifflfifl ㎇㎓㎬㏉ ㏋㍱㎐ ㎄㎅㎑㏍㏎㎸㎾ ㎃㎆㎒㎫㎹㎷㎿㎽ ㎁㎋№㎵㎻ ㍵ ㎀㎩㎊㏗㏙㏚㎴㎺ ₨ ℠ßst㏜ ℡㎔™ ㏝

ÅℬℂℭℰℱℐℑKℒℳℕℙℚℛℜℝℤℨ and more to leet-code URLs? @urlstandard

— Christoph Päper (@Cr1ss0v) October 8, 2018

How does this work?

Magic! Which is to say, I think it's the browser doing the conversion. DNS servers don't successfully answer queries for .℡ domains.

The browser sees the .℡ and then follows the IDNA2008 process listed in RFC5895 to normalise it:

map characters to the "Simple_Lowercase_Mapping" property (the fourteenth column) in <http://www.unicode.org/Public/UNIDATA/UnicodeData.txt>, if any.

The ℡ entry is:

2121;TELEPHONE SIGN;So;0;ON;<compat> 0054 0045 004C;;;;N;T E L SYMBOL;;;;

U+0054 is T, U+0045 is E, U+004C is L. So the compatibility decomposition turns ℡ into "TEL", which lowercases to "tel".
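Splitting that record on semicolons shows which column does the work. This sketch assumes the standard 15-field UnicodeData.txt layout: the fourteenth column (Simple_Lowercase_Mapping) is empty for ℡, so the letters come from the compatibility decomposition in the sixth column, and a separate lowercasing step turns them into tel.

```python
import unicodedata

# The UnicodeData.txt record for U+2121, exactly as quoted above.
RECORD = "2121;TELEPHONE SIGN;So;0;ON;<compat> 0054 0045 004C;;;;N;T E L SYMBOL;;;;"

fields = RECORD.split(";")
decomposition = fields[5]      # 6th column: decomposition mapping
simple_lowercase = fields[13]  # 14th column: Simple_Lowercase_Mapping

# Drop the "<compat>" tag and turn the hex code points into characters.
letters = "".join(
    chr(int(h, 16)) for h in decomposition.split() if not h.startswith("<")
)

print(repr(simple_lowercase))    # '' : no lowercase mapping of its own
print(letters, letters.lower())  # TEL tel
# Cross-check against the copy of the Unicode database bundled with Python.
print(unicodedata.decomposition("\u2121") == decomposition)  # True
```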

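You can test this in Python 3 with its built-in idna codec (which implements the older IDNA 2003 rules, but gives the same answer for ℡):

python3 -c 'import sys; print(sys.argv[1].encode("idna").decode("ascii"))' "℡"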

Does this work?

Yes! I asked people on Twitter whether they could access my website using a .℡ - and it appeared to work on every modern browser and operating system.

Hey gang! I have a little experiment for you

Does this URL resolve in your browser?

https://edent.℡/

(That's https:// edent. ℡ /)

If it does or doesn't, could you let me know which browser and operating system?

THANKS!

— Terence Eden (@edent) October 8, 2018

It even works with command-line tools like wget and curl.

Things used to retrieve web pages rather than web browsers

curl 7.59, Linux - Yes

wget 1.19, Linux - Yes

— Mike (@6byNine) October 8, 2018

It does fail in some circumstances:

Yes, Chrome/Safari/Firefox running on Mac. The TEL however changed from superscript to normal text. If I copied/pasted into Word and then into the browser, the superscript is preserved and it no longer resolves (takes you to the google page with this page being the first hit)

— Ricardo Sueiras (@094459) October 8, 2018

What are the limitations?

Two main ones:

  • Sites like Twitter and Facebook don't recognise it as a valid URL and refuse to auto-link it.
  • Some command-line tools, like dig and host, don't understand it:
dig edent.℡

; <<>> DiG 9.10.6 <<>> edent.℡
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 55282
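One plausible explanation for that NXDOMAIN, assuming dig passes the label through as raw UTF-8 rather than applying the browser-style mapping, is that it simply asks for a different name. DNS encodes a query name as length-prefixed labels ending in a zero byte; the hypothetical `encode_qname` helper below builds that wire format with and without the mapping step, using Python's `idna` codec as a stand-in for the browser:

```python
def encode_qname(host: str, apply_idna: bool) -> bytes:
    """Build a DNS wire-format QNAME: length-prefixed labels plus a zero byte.

    Hypothetical helper for illustration; real resolvers do more validation.
    """
    if apply_idna:
        # Stand-in for the browser's mapping: Python's IDNA 2003 codec.
        host = host.encode("idna").decode("ascii")
    out = bytearray()
    for label in host.rstrip(".").split("."):
        raw = label.encode("utf-8")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"bad label length: {label!r}")
        out.append(len(raw))  # one length byte per label
        out += raw
    out.append(0)  # the empty root label terminates the name
    return bytes(out)


print(encode_qname("edent.℡", apply_idna=False))  # b'\x05edent\x03\xe2\x84\xa1\x00'
print(encode_qname("edent.℡", apply_idna=True))   # b'\x05edent\x03tel\x00'
```

By coincidence ℡ is exactly three bytes in UTF-8, the same length as tel, so the two queries differ only in the contents of the last label.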

Is this useful?

Obviously yes. This may be the most important discovery of the decade. You get cool-looking URLs and get to save a couple of characters on specific domains, at the minor expense of working inconsistently.
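The saving is in characters rather than bytes. ℡ and № are single code points, but each takes three bytes in UTF-8, and after mapping the browser looks up the ordinary ASCII name anyway. A quick count, with `counts` as an illustrative helper:

```python
def counts(s: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for a string."""
    return len(s), len(s.encode("utf-8"))


for fancy, plain in [("edent.℡", "edent.tel"), ("www.norid.№", "www.norid.no")]:
    print(f"{fancy}: {counts(fancy)}  {plain}: {counts(plain)}")
```

So edent.℡ is two characters shorter to type but the same size in bytes, and www.norid.№ is one character shorter but one byte longer than www.norid.no.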

It could also be used for evading URL filters.
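To see why, compare a filter that matches hostnames as plain strings with one that normalises first. This is a minimal sketch with a hypothetical one-entry blocklist, using NFKC plus case-folding as an approximation of the browser's mapping:

```python
import unicodedata

# Hypothetical blocklist, for illustration only.
BLOCKLIST = {"example.com"}


def naive_is_blocked(host: str) -> bool:
    """Compare the hostname string as typed (lowercased) with the blocklist."""
    return host.lower() in BLOCKLIST


def normalised_is_blocked(host: str) -> bool:
    """Approximate the browser's mapping (NFKC + case-fold) before comparing."""
    mapped = unicodedata.normalize("NFKC", host).casefold()
    return mapped in BLOCKLIST


for host in ("example.com", "ᵉˣᵃᵐᵖˡᵉ.ᶜᵒᵐ", "ⓔⓧⓐⓜⓟⓛⓔ.ⓒⓞⓜ"):
    print(f"{host}: naive={naive_is_blocked(host)} normalised={normalised_is_blocked(host)}")
```

The naive check lets both styled hostnames through even though a browser would visit example.com; normalising first closes this particular gap, though a real filter would want to use the same mapping as the browser rather than an approximation.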

Every modern browser supports these "fancy" domain names - but most websites won't automatically link to them. So sharing on Facebook doesn't work.

Where can it be used?

Here are the single characters which can be normalised down to a valid TLD. They're mostly country codes, but there are a few interesting exceptions:

  • - US Military
  • ℡ - .tel registry
  • № - Norway
  • - Australia
  • - Dominica
  • - Panama
  • - Namibia
  • - Morocco
  • - French Polynesia
  • - Norfolk Island
  • - Kyrgyzstan
  • - Mali
  • - Federated States of Micronesia
  • - Finland
  • - Myanmar
  • - Cameroon
  • & - Comoros
  • - Palestine
  • - Montserrat
  • & - Republic of Maldives
  • - Palau
  • & - Malawi
  • - Cocos (Keeling) Islands
  • - Democratic Republic of Congo
  • - Guyana
  • - Philippines
  • - Saint Pierre and Miquelon
  • - Puerto Rico
  • - Suriname
  • - El Salvador
  • - San Marino
  • - Turkmenistan
  • & - São Tomé and Príncipe
  • - Great Britain (Obsolete)
  • ß - South Sudan (Not available)
  • - India and Indiana (subdomain of .us)
  • & - Virgin Islands and Virginia (subdomain of .us)
  • - Florida (subdomain of .us)
  • - New Mexico (subdomain of .us)
  • - Nevada (subdomain of .us)
  • - As part of .ovh

If you can find any more, please stick a comment in the box below.
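If you want to go hunting, one approach is to brute-force every Unicode code point and keep those whose NFKC form, case-folded, spells a TLD. A sketch under stated assumptions: `SAMPLE_TLDS` is a small hypothetical sample rather than the full IANA list, and NFKC plus case-folding only approximates what browsers do, so every hit still needs testing in a real browser:

```python
import sys
import unicodedata

# Hypothetical sample of TLDs to search for; swap in the full IANA list
# to look for more. This is NOT the complete root zone.
SAMPLE_TLDS = {"mil", "tel", "no", "au", "kg", "tm", "sm", "in", "vi"}


def single_char_candidates(tlds):
    """Map each TLD to the single code points whose NFKC + casefold form spells it."""
    hits = {}
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:  # skip lone surrogates
            continue
        ch = chr(cp)
        mapped = unicodedata.normalize("NFKC", ch).casefold()
        if mapped in tlds:
            hits.setdefault(mapped, []).append(ch)
    return hits


hits = single_char_candidates(SAMPLE_TLDS)
for tld, chars in sorted(hits.items()):
    print(f".{tld}: {' '.join(chars)}")
```

The scan takes a second or so, since it normalises all of Unicode once; the results depend on the Unicode version bundled with your Python.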

You can always reach this blog post at:

https:// ₛᵖ .ⓜ / /

