Domain hacks with unusual Unicode characters
source link: https://www.tuicool.com/articles/hit/mqeIzyn
Unicode contains a range of symbols which don't get much use. For example, there are separate symbols for TradeMark - ™, Service Mark - ℠, and Prescriptions - ℞.
Nestling among the "Letterlike Symbols" are two curious entries. Both of these are single characters:
- Telephone symbol - ℡
- Numero Sign - №
What's interesting is both .tel and .no are Top-Level-Domains (TLD) on the Domain Name System (DNS).
So my contact site - https://edent.tel/ - can be written as - https://edent.℡/
And the Norwegian domain name registry NORID can be accessed at https://www.norid.№/
Copy and paste those links - they work in any browser!
Is this limited to TLDs?
No! This works ANYWHERE in a domain name. Copy and paste these examples:
- Script https://ℰ𝓍𝒶ℳ𝓅ℒℰ.𝒸ℴℳ/
- Math Bold https://𝐞𝐱𝐚𝐦𝐩𝐥𝐞.𝐜𝐨𝐦/
- Fraktur https://𝔢𝔵𝔞𝔪𝔭𝔩𝔢.𝔠𝔬𝔪/
- Math bold italic https://𝒆𝒙𝒂𝒎𝒑𝒍𝒆.𝒄𝒐𝒎/
- Math bold script https://𝓮𝔁𝓪𝓶𝓹𝓵𝓮.𝓬𝓸𝓶/
- Double struck https://𝕖𝕩𝕒𝕞𝕡𝕝𝕖.𝕔𝕠𝕞/
- Monospace https://𝚎𝚡𝚊𝚖𝚙𝚕𝚎.𝚌𝚘𝚖/
- Super script https://ᵉˣᵃᵐᵖˡᵉ.ᶜᵒᵐ/
- Sub script https://ₑₓₐₘₚₗₑ.cₒₘ/ NB not all characters supported
- Math sans bold https://𝗲𝘅𝗮𝗺𝗽𝗹𝗲.𝗰𝗼𝗺/
- Math sans bold italic https://𝙚𝙭𝙖𝙢𝙥𝙡𝙚.𝙘𝙤𝙢/
- Math sans italic https://𝘦𝘹𝘢𝘮𝘱𝘭𝘦.𝘤𝘰𝘮/
- Math Squared https://🄴🅇🄰🄼🄿🄻🄴.🄲🄾🄼/ NB the dot must not be squared
- Circled https://ⓔⓧⓐⓜⓟⓛⓔ.ⓒⓞⓜ/ NB the dot must not be circled
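If you want to generate styled domains like the ones in the list rather than copy and paste them, here is a rough Python sketch. It assumes each style is a contiguous run of 26 lowercase letters starting at a known code point, which holds for the three styles chosen here but not for every style above (Script, for example, has gaps). The names STYLE_BASES, stylise and plain are made up for this illustration.

```python
import unicodedata

# Code point of the lowercase "a" in each style (taken from the Unicode charts).
# Only styles with a gap-free a-z run are listed; this table is illustrative.
STYLE_BASES = {
    "circled": 0x24D0,     # CIRCLED LATIN SMALL LETTER A
    "math_bold": 0x1D41A,  # MATHEMATICAL BOLD SMALL A
    "monospace": 0x1D68A,  # MATHEMATICAL MONOSPACE SMALL A
}


def stylise(domain, style):
    """Replace a-z with the styled letters; dots and other characters stay plain."""
    base = STYLE_BASES[style]
    out = []
    for ch in domain:
        if "a" <= ch <= "z":
            out.append(chr(base + ord(ch) - ord("a")))
        else:
            out.append(ch)  # the dot must not be styled
    return "".join(out)


def plain(domain):
    """Undo the styling with compatibility normalisation (NFKC) plus lowercasing."""
    return unicodedata.normalize("NFKC", domain).lower()


for style in STYLE_BASES:
    fancy = stylise("example.com", style)
    print(style, fancy, "->", plain(fancy))
```

The round trip through plain() is only an approximation of what a browser does, but it is enough to check that a styled name collapses back to the ASCII one.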
There are a whole bunch more miscellaneous characters you can use:
Wait, so one can use any of
㍳ ㏃ ㏇(!) ㏈ ﬀﬃﬄﬁﬂ ㎇㎓㎬㏉ ㏋㍱㎐ ㎄㎅㎑㏍㏎㎸㎾ ㎃㎆㎒㎫㎹㎷㎿㎽ ㎁㎋№㎵㎻ ㍵ ㎀㎩㎊㏗㏙㏚㎴㎺ ₨ ℠ßﬆ㏜ ℡㎔™ ㏝
ÅℬℂℭℰℱℐℑKℒℳℕℙℚℛℜℝℤℨ and more to leet-code URLs? @urlstandard
— Christoph Päper (@Cr1ss0v) October 8, 2018
How does this work?
Magic! Which is to say, I think it is the browser doing the conversion. DNS Servers don't successfully reply to queries about .℡ domains.
The browser sees the .℡ and then follows the IDNA2008 process listed in RFC5895 to normalise it:
map characters to the "Simple_Lowercase_Mapping" property (the fourteenth column) in < http://www.unicode.org/Public/UNIDATA/UnicodeData.txt >, if any.
The ℡ entry is:
2121;TELEPHONE SIGN;So;0;ON;<compat> 0054 0045 004C;;;;N;T E L SYMBOL;;;;
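U+0054 is T, U+0045 is E, U+004C is L. So ℡ decomposes to TEL, which is then lowercased to tel. Note that this mapping sits in the decomposition field (the sixth column) of the entry; the lowercase-mapping column itself is empty for ℡.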
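To make that concrete, here is a minimal Python sketch that pulls the decomposition out of the UnicodeData.txt entry quoted above. The function name decomposition_of is invented for this example, and the entry is hard-coded rather than downloaded.

```python
# The entry for U+2121, copied from UnicodeData.txt as quoted above.
ENTRY = "2121;TELEPHONE SIGN;So;0;ON;<compat> 0054 0045 004C;;;;N;T E L SYMBOL;;;;"


def decomposition_of(entry):
    """Return (character, expanded text) for one UnicodeData.txt line."""
    fields = entry.split(";")
    character = chr(int(fields[0], 16))
    # Field 5 holds the decomposition: an optional <tag> followed by hex code points.
    parts = [p for p in fields[5].split() if not p.startswith("<")]
    expanded = "".join(chr(int(p, 16)) for p in parts)
    return character, expanded


character, expanded = decomposition_of(ENTRY)
print(character, "->", expanded, "->", expanded.lower())  # ℡ -> TEL -> tel
```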
You can test this in Python 3 using:
python3 -c 'import sys; print(sys.argv[1].encode("idna").decode("ascii"))' "℡"
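The same check works on whole hostnames. The sketch below uses Python 3's built-in "idna" codec; bear in mind that this codec implements the older IDNA2003 rules rather than IDNA2008, so treat it as an approximation of what a browser does. The helper name to_ascii is just for this example.

```python
def to_ascii(hostname):
    """Convert a hostname to its ASCII form using the standard library codec."""
    return hostname.encode("idna").decode("ascii")


for host in ("edent.℡", "www.norid.№", "ⓔⓧⓐⓜⓟⓛⓔ.ⓒⓞⓜ"):
    print(host, "->", to_ascii(host))
```

Each label is normalised separately, so the fancy character can sit in the TLD, the domain, or a subdomain.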
Does this work?
Yes! I asked people on Twitter whether they could access my website using a .℡ - and it appeared to work on every modern browser and operating system.
Hey gang! I have a little experiment for you
Does this URL resolve in your browser?
https://edent.℡/
(That's https:// edent. ℡ /)
If it does or doesn't, could you let me know which browser and operating system?
THANKS!
— Terence Eden (@edent) October 8, 2018
It even works on command line tools like wget and curl.
Things used to retrieve web pages rather than web browsers
curl 7.59, Linux - Yes
wget 1.19, Linux - Yes
— Mike (@6byNine) October 8, 2018
It does fail in some circumstances:
Yes, Chrome/Safari/Firefox running on Mac. The TEL however changed from superscript to normal text. If I copied/pasted into Word and then into the browser, the superscript is preserved and it no longer resolves (takes you to the google page with this page being the first hit)
— Ricardo Sueiras (@094459) October 8, 2018
What are the limitations?
Two main ones:
- Sites like Twitter and Facebook don't recognise it as a valid URL and refuse to auto-link it.
- Some command line tools like dig and host don't understand it:
dig edent.℡
; <<>> DiG 9.10.6 <<>> edent.℡
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 55282
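One possible workaround for tools that send the name to DNS unchanged is to normalise the hostname yourself first. This is only a sketch: dig_command is a hypothetical helper that builds the argument list without running anything, and it leans on Python's IDNA2003 codec, which may not match a browser in every case.

```python
from urllib.parse import urlsplit


def dig_command(url):
    """Build a dig argument list with the hostname converted to plain ASCII."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no hostname in {url!r}")
    return ["dig", host.encode("idna").decode("ascii")]


print(dig_command("https://edent.℡/"))  # ['dig', 'edent.tel']
```

The list could then be passed to something like subprocess.run if you actually want to perform the lookup.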
Is this useful?
Obviously yes. This may be the most important discovery of the decade. You get cool looking URLs and get to save a couple of characters on specific domains, at the minor expense of working inconsistently.
It could also be used for evading URL filters.
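To illustrate the filter problem, here is a small Python sketch comparing a filter that matches the raw hostname with one that normalises first. The blocklist and both function names are hypothetical, and real filters are more involved than a set lookup.

```python
BLOCKED_HOSTS = {"edent.tel"}  # hypothetical blocklist, for illustration only


def naive_is_blocked(host):
    """Compare the hostname exactly as typed; fancy characters slip through."""
    return host.lower() in BLOCKED_HOSTS


def normalised_is_blocked(host):
    """Convert to the ASCII form first, then compare."""
    try:
        ascii_host = host.encode("idna").decode("ascii")
    except UnicodeError:
        # Policy choice made for this sketch: reject anything we cannot convert.
        return True
    return ascii_host.lower() in BLOCKED_HOSTS


print(naive_is_blocked("edent.℡"))       # False: the filter is evaded
print(normalised_is_blocked("edent.℡"))  # True: caught after normalisation
```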
Every modern browser supports these "fancy" domain names - but most websites won't automatically link to them. So sharing on Facebook doesn't work.
Where can it be used?
Here are the single characters which can be normalised down to a valid TLD. They're mostly country codes, but there are a few interesting exceptions:
- ㏕ - US Military
- ℡ - .tel registry
- № - Norway
- ㍳ - Australia
- ㍷ - Dominica
- ㎀ - Panama
- ㎁ - Namibia
- ㎃ - Morocco
- ㎊ - French Polynesia
- ㎋ - Norfolk Island
- ㎏ - Kyrgyzstan
- ㎖ - Mali
- ㎙ - Federated States of Micronesia
- ﬁ - Finland
- ㎜ - Myanmar
- ㎝ - Cameroon
- ㎞ & ㏎ - Comoros
- ㎰ - Palestine
- ㎳ - Montserrat
- ㎷ & ㎹ - Republic of Maldives
- ㎺ - Palau
- ㎽ & ㎿ - Malawi
- ㏄ - Cocos (Keeling) Islands
- ㏅ - Democratic Republic of Congo
- ㏉ - Guyana
- ㏗ - Philippines
- ㏘ - Saint Pierre and Miquelon
- ㏚ - Puerto Rico
- ㏛ - Suriname
- ㏜ - El Salvador
- ℠ - San Marino
- ™ - Turkmenistan
- ﬆ & ﬅ - São Tomé and Príncipe
- ㎇ - Great Britain (Obsolete)
- ß - South Sudan (Not available)
- ㏌ - India and Indiana (subdomain of .us)
- Ⅵ & ⅵ - Virgin Islands and Virginia (subdomain of .us)
- ﬂ - Florida (subdomain of .us)
- ㎚ - New Mexico (subdomain of .us)
- ㎵ - Nevada (subdomain of .us)
- ㍵ - As part of .ovh
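If you want to hunt for more, a brute-force search is one way to do it. The Python sketch below walks every code point, applies NFKC normalisation plus case folding, and keeps the single characters that land on a known TLD. That combination is an assumption standing in for the browser's real mapping, and SAMPLE_TLDS is a tiny hand-picked set; swap in the full TLD list for a proper search. The function name single_char_labels is made up here.

```python
import unicodedata
from collections import defaultdict

SAMPLE_TLDS = {"tel", "no", "mil", "fi"}  # small sample, not the full TLD list


def single_char_labels(tlds):
    """Map each TLD to the single non-ASCII characters that normalise to it."""
    found = defaultdict(list)
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogates, which are not real characters
        ch = chr(cp)
        folded = unicodedata.normalize("NFKC", ch).casefold()
        if folded in tlds:
            found[folded].append(ch)
    return dict(found)


for tld, chars in sorted(single_char_labels(SAMPLE_TLDS).items()):
    print(tld, " ".join(chars))
```

Anything it finds still needs testing in a real browser, since the approximation can disagree with what browsers accept.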
If you can find any more, please stick a comment in the box below.
You can always reach this blog post at:
https:// ₛᵖ .ⓜ / /