

TLDs -- Putting the 'Fun' in the top of the DNS
source link: https://www.netmeister.org/blog/tlds.html

August 12th, 2021
The Domain Name System or DNS is a never-ending source of amusement and amazement. If you have been dealing with just about anything related to operations on the internet, you know that it's always the DNS in the end, what with its almost 100 different resource records and, uhm, shall we say, "interesting" security threat model.
But today, let's talk about Top-Level Domains, or TLDs. You know, .com, .org, .net, .gov, .vermögensberatung and .香港 - those guys. As you know, the entire domain name space consists of a tree of domain names; the (common) root of the DNS tree is . (dot), and the tree sub-divides into zones consisting of domains and sub-domains.

Ok, so far, so good. With RFC920, we got the initial set of top level domains:
- .gov -- Government, any government related domains meeting the second level requirements.
- .edu -- Education, any education related domains meeting the second level requirements.
- .com -- Commercial, any commercial related domains meeting the second level requirements.
- .mil -- Military, any military related domains meeting the second level requirements.
- .org -- Organization, any other domains meeting the second level requirements.
- .net -- Initially intended for network organizations; not mentioned in RFC920, but created in 1985.
Oh, and:
- .arpa -- Temporary; The current ARPA-Internet hosts.
That's right: .arpa was supposed to be temporary:
"After a short period of initial experimentation, all current ARPA-Internet hosts will select some domain other than ARPA for their future use. The use of ARPA as a top level domain will eventually cease." -- RFC920
Yeah, well, we all know how temporary temporary solutions are. And so today, we continue to use .arpa e.g., for reverse mapping of IP addresses to names via the .in-addr.arpa and .ip6.arpa second-level domains. But .arpa is used for a lot more: as112.arpa (RFC7535, effectively RFC1918 reverse resolution; see also https://www.as112.net/), e164.arpa (RFC6116 / NAPTR records), home.arpa (RFC8375, non-unique use in residential home network), in-addr-servers.arpa and ip6-servers.arpa (RFC5855, name servers for the in-addr.arpa and ip6.arpa domains), ipv4only.arpa (RFC7050, detecting DNS64 and IPv6 Prefixes), iris.arpa (RFC4698, for locating Internet Registry Information Services), as well as uri.arpa and urn.arpa (RFC3405 for resolving Uniform Resource Identifiers / NAPTR).
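To make the reverse-mapping use of .arpa a bit more concrete, here is a minimal sketch using Python's standard ipaddress module. The function name reverse_name is made up for illustration, and the addresses are from the documentation ranges, not from any real host.

```python
import ipaddress


def reverse_name(addr):
    """Return the .arpa name under which a PTR record for addr would live."""
    return ipaddress.ip_address(addr).reverse_pointer


# IPv4 octets are reversed and placed under in-addr.arpa.
print(reverse_name("192.0.2.1"))
# IPv6 addresses are expanded to 32 nibbles, reversed, and placed under ip6.arpa.
print(reverse_name("2001:db8::1"))
```

A resolver would then ask for the PTR record of the resulting name to map the address back to a host name.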
ccTLDs
In addition to these original TLDs, we also got the country code top-level domains, or ccTLDs:
The English two letter code (alpha-2) identifying a country according the the ISO Standard for "Codes for the Representation of Names of Countries". -- RFC920
And this is where the fun begins, because of course you are always operating on Layer 9, and this list is necessarily somewhat fluid, as countries change, are born, divided, or cease to exist:
- .ss, the ccTLD for South Sudan, was allocated in August 2011, but not added to the root zone until February 2019, with general availability of names in that domain only starting in September 2020.
- .ge, the ccTLD for Georgia (the country, not the US state), uses the ISO-3166-1 alpha-2 country code that previously represented the Gilbert and Ellice Islands; the Ellice Islands became Tuvalu, which, per country code designation, got the rather valuable .tv ccTLD. This TLD became so valuable that at some point 10% of the country's revenue came from royalties of .tv domains; Tuvalu used the money from the marketing rights to pay its membership dues for the United Nations when it joined the UN in 2000!
- .eh has been reserved (but not been assigned yet) as the ccTLD for the disputed "Western Sahara" territory; in 2013, on April 1st, CIRA, the Canadian Internet Registration Authority (responsible for .ca), announced that it would offer .eh names, because, you know, Canadians would like that, eh?
- Some ccTLDs represent countries that other countries don't acknowledge as existing. .ps is the ccTLD for Palestine (not a sponsored TLD for the (Turing-complete!) PostScript programming language), recognized by 138 of the 193 UN members; .tw is assigned for the Republic of China, aka Taiwan, which a mere 14 countries recognize.
- Not all ccTLDs represent actual countries: Hong Kong has a ccTLD, .hk, as a special administrative region of China (much like .mo for Macao); .uk represents the entire United Kingdom, while the scarcely used .gb is assigned to Great Britain; England doesn't even get a ccTLD, and neither does Northern Ireland! Similarly, .eu counts as a ccTLD, representing, obviously, not a single country. But due to Brexit, British citizens who had registered .eu domains had their domains suspended on January 1st, 2021, requiring proof of European Economic Area (EEA) citizenship to avoid them being deleted in March 2021.
- When a country ceases to exist, its ccTLD is normally retired: .cs (Czechoslovakia) became .cz (Czech Republic) and .sk (Slovakia); .dd (East Germany) disappeared after the reunification of Germany; .yu (Yugoslavia) became .si (Slovenia), .hr (Croatia), and Serbia and Montenegro, which had had .cs assigned (but never used that, instead continuing to use .yu) before they split into .rs (Serbia) and .me (Montenegro); .zr (Zaire) became .cd (Democratic Republic of the Congo). However, .su, the ccTLD for the former Soviet Union, assigned in 1990, a mere 15 months before that Union was dissolved, still remains in active use.
ccTLD Domain Hacks and Governance
With ccTLDs having appealing two-letter names (gTLDs are a minimum of three characters), they lend themselves to so-called "domain hacks" to create words, to shorten URLs, or as a convenient way to jump on a popular trend, and many people began registering names in other countries' ccTLDs:
- .ag, the ccTLD for Antigua and Barbuda, is often used in German speaking countries, where "AG" is an abbreviation of "Arbeitsgemeinschaft", a generic term for coordinated collaboration.
- .ai, the ccTLD for Anguilla, is used for extra leet effect in artificial intelligence marketing. .ai also is notable in that as a TLD it nevertheless has both an A and MX record, meaning you can have a functional email address like hal@ai. (Email addresses are difficult to validate, it turns out.)
- .am (Armenia) is used by e.g., instagr.am.
- .at (Austria) is used for things like donteat.at.
- .be (Belgium) is used by e.g., Google to shorten youtu.be links.
- .by (Belarus) is frequently used for sites relating to the German state of Bavaria (Bayern).
- .cm (Cameroon) and .co (Colombia) are frequently used by typo-squatters to catch traffic from people fat-fingering ".com".
- .cx was assigned to Christmas Island, and appears currently to be defunct, but it did have the significant glory of once having been the home of goatse.cx (Wikipedia).
- .im (Isle of Man) is used for various instant messaging domain hacks.
- .io, assigned to the British Indian Ocean Territory, is almost exclusively used by annoying startups for content completely unrelated to the islands.
- .la (Laos) is commonly used for Louisiana or Los Angeles related domains as well as random domain hacks, like Mozilla's link shortener mzl.la or Tesla's ts.la.
- .me (Montenegro, which up until 2007 had been using cg.yu) became one of the most popular TLDs and is used for link shorteners like Facebook's fb.me, Google's g.me or GoDaddy's go.me. Yahoo used to use me.me for its "Yahoo! Meme" microblogging site; after it shut that service, it returned the domain to the registry, and it's now, what else, a Meme search engine.
- .ms (Montserrat) is, of course, used by Microsoft, sites in the US state of Mississippi, and by e.g., the New York Times for its nyti.ms link shortener.
- Python nerds on the internet register names in Paraguay's ccTLD (.py), Rust nerds in Serbia's .rs.
- The editor wars have been decided at the TLD level: .vi exists (U.S. Virgin Islands), but .emacs does not (emacs.vi, however, does).
Now one noteworthy aspect here is that since the ccTLDs are administered by the given country, they may be subject to (and enforce) different requirements. Some domains can only be registered by entities residing within the given country; others, like the .cat domain sponsored by the dotCAT foundation to promote the Catalan language, may stipulate the language or content of the domains. Libya, with the ever so popular .ly ccTLD, did in 2010 shut down Violet Blue's vb.ly domain, objecting to the content. In a similar manner, Colombia could choose to break just about all of Twitter (which uses the t.co domain name to wrap every single link on its platform); Greenland could shut down Google's goo.gl links.
Generic TLDs (gTLDs)
In addition to the original TLDs and the ccTLDs, in the late 1980s InterNIC added .nato, but that was later replaced by .nato.int, with the new .int TLD being added in 1988 for intergovernmental organizations.
In 2000, ICANN, who had by then taken over the administration of domain names, added seven more TLDs: .aero, .biz, .coop, .info, .museum, .name, and .pro.
It then began soliciting proposals for "sponsored top-level domains" (sTLDs), but only received a handful of proposals, ultimately adding .asia, .cat, .jobs, .mobi, .post, .tel, .travel, and .xxx.
Sponsored TLDs being somewhat restricted in scope and use, ICANN then went for another round of accepting proposals for new, generic TLDs (gTLDs), this time with a price tag of $185,000 per TLD. In 2012, it processed 1,930 applications: 101 from Google (under the name Charleston Road Registry Inc.), including .lol, .google, .dog, and .foo (of those, .lol is the only one not assigned), 76 from Amazon, 11 from Microsoft and 307 from the "Donuts" domain name registry.
The list of ultimately approved domains included a number of geographic TLDs (geoTLDs), adding domains for certain cities (e.g., .berlin, .london, .nyc, .paris, or .tokyo), countries that previously did not have a ccTLD (e.g., .cymru, .scot, and .wales, although England still doesn't get its own TLD, while e.g., New Zealand (.nz) now got a second: .kiwi), and broader geographic regions (e.g., .africa or .lat).
But of course people went a bit nuts, too: many brands applied for .<brand> and got into various arguments over who should own the given TLD. For example, Amazon applied for (and was given) .amazon over the objection of several nations of, well, the Amazon; and multiple applications for entirely generic terms had to be sorted out.
One of those was the .secure domain, which had been proposed by one Alex Stamos of (then) Artemis Internet as a TLD that would enforce certain minimum security requirements; ultimately, .secure was assigned to Amazon.
Eventually, ICANN added 1239 new TLDs to the DNS, bestowing upon us such important TLDs as e.g., .beer, .cloud, .dot, .duck, .foo, .google, .rocks and .sucks, .travelersinsurance, and .yahoo.
But of course some TLDs then go under again: .wed, for example, was delegated, but the company that had applied for this name apparently didn't pay up, and ICANN terminated the registry agreement. However, the TLD remains in the root; it appears to now be operated by ICANN EBERO and some names remain in use (e.g., get.wed, albeit with an invalid certificate).
Internationalized TLDs
Even before the landrush for the new gTLDs, ICANN approved the introduction of internationalized domain name (IDN) TLDs, and many ccTLDs added TLDs using their respective languages and alphabets (including right-to-left!), represented within the DNS using Punycode.
| DNS name | IDN ccTLD | Country/Region | Language | Other ccTLD |
|---|---|---|---|---|
| xn--lgbbat1ad8j | .الجزائر | Algeria | Arabic | .dz |
| xn--fiqs8s | .中国 | China | Chinese (Simplified) | .cn |
| xn--qxa6a | .ευ | European Union | Greek | .eu |
| xn--4dbrk0ce | .ישראל | Israel | Hebrew | .il |
| xn--o3cw4h | .ไทย | Thailand | Thai | .th |
(See Wikipedia's full table for all IDN ccTLDs.)
But IDNs are not only for ccTLDs: many of the new gTLDs also include various Unicode characters, such as e.g., .сайт ("website"), .大众汽车 ("volkswagen"), .ファッション ("fashion"), ابوظبي. ("Abu Dhabi"), and, of course, .vermögensberatung ("wealth management / advice").
Note that with IDNs, you can mix an IDN second-level with a non-IDN top-level or vice versa. Due to the resulting IDN Homograph Attack vector, browsers stopped rendering the IDNs and now always display them as Punycode.
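The mapping between an IDN label and its "xn--" DNS name can be reproduced with Python's built-in idna codec. This is only a sketch: the helper names are made up, and the codec implements the older IDNA 2003 rules, which I assume (but have not verified for every TLD) agree with what registries use for labels like the ones shown here.

```python
def to_dns_name(label):
    """Convert a Unicode label to the ASCII (Punycode) form used in the DNS."""
    return label.encode("idna").decode("ascii")


def from_dns_name(label):
    """Convert an 'xn--' label back to its Unicode form."""
    return label.encode("ascii").decode("idna")


for tld in ("中国", "vermögensberatung"):
    print(tld, "->", to_dns_name(tld))
```

Plain ASCII labels pass through unchanged, so the same helper can be applied to every label of a name.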
Special Use Domains
In addition to all that, there is also a small number of so-called "special use domains", of which .arpa (already discussed above) is just one. These are:
- .example -- intended for use in documentation, tutorials, and testing; defined, together with example.com, example.net, and example.org in RFC6761.
- .invalid and .test -- for testing and documentation, originally defined in RFC2606.
- .local -- usually used for zero-configuration networking (RFC6762).
- .localhost -- reserved since traditionally .localhost existed in e.g., /etc/hosts for the loopback address (RFC2606). Note: .localdomain is not reserved, and use of localhost.localdomain can lead to unexpected results if your stub resolver expands this.
- .onion -- used by Tor (.onion service address) and defined in RFC7686. Note: this "TLD" is not entered into the DNS, but following CA/B Forum Ballot 144, you can get a valid x509 certificate from public CAs. (For a while, Tor also used to use the .exit pseudo TLD; this is no longer supported.)
- Not all networks may use the standard DNS root (ICANN is not pleased), and so of course all bets are off if you are on a network using an alternative DNS root. Some TLDs in such networks include(d) .bitnet, .csnet, .oz (from ACSnet, now moved into .oz.au), .uucp (if you remember that), and .i2p (the aptly named "Invisible Internet Project").
- Some TLDs are effectively split-horizon, only exposing some parts to the public internet. .kp, the ccTLD assigned for North Korea, serves the North Korea internal-only Kwangmyong network.
- China uses the .chn domain internally for its Internet of Things. This domain relies on the use of an alternate DNS root as well, and is not found in the common root.
TLD Zone files
The DNS is an inherently public system (modulo alternate root shenanigans or split-horizon games). The root zone itself continues to be available for download via FTP or HTTPS and so we can easily extract the full count of all TLDs:
$ curl https://www.internic.net/domain/root.zone | awk '{if ($4 == "NS") { print $1;}}' | sort -u | wc -l
1499
Processing the simple zone file, we find that most TLDs are two- (248) or three- (222) letter TLDs; that there are 154 IDN TLDs; that there are TLDs starting with every letter of the alphabet ('s' being the most popular one); that the longest TLD is vermögensberatung (24 characters in punycode: xn--vermgensberatung-pwb).
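The kind of processing described above could look roughly like the following Python sketch. It is not the author's tooling: the function names and the tiny SAMPLE zone are made up for illustration, and it assumes the whitespace-separated "name ttl class type rdata" layout that the awk command relies on. Unlike the awk pipeline, it drops the root itself ("."), which also carries NS records.

```python
from collections import Counter


def tlds_from_root_zone(zone_text):
    """Collect owner names of NS records, like awk '$4 == "NS" {print $1}'."""
    tlds = set()
    for line in zone_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "NS":
            tlds.add(fields[0].rstrip(".").lower())
    tlds.discard("")  # the root "." has NS records too, but is not a TLD
    return tlds


def summarize(tlds):
    """Return length histogram, first-letter histogram, IDN TLDs, longest TLD."""
    by_length = Counter(len(t) for t in tlds)
    by_first_letter = Counter(t[0] for t in tlds if not t.startswith("xn--"))
    idn = sorted(t for t in tlds if t.startswith("xn--"))
    longest = max(tlds, key=len) if tlds else None
    return by_length, by_first_letter, idn, longest


# Illustrative sample only; a real run would read the downloaded root.zone.
SAMPLE = """\
. 518400 IN NS a.root-servers.net.
com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
de. 172800 IN NS a.nic.de.
xn--fiqs8s. 172800 IN NS a.dns.cn.
a.nic.de. 172800 IN A 192.0.2.1
"""

tlds = tlds_from_root_zone(SAMPLE)
print(len(tlds), summarize(tlds))
```

Lengths here are measured on the ASCII form, so an IDN TLD counts with its full "xn--" name.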
But what about all the individual TLD zone files? Since that data is also public in nature, we should be able to get and process it as well. And for the ICANN assigned new gTLDs, this is indeed the case: ICANN offers the Centralized Zone Data Service, where you can apply to gain access to all gTLD zone files. For some domains the access is granted almost instantly, for others it takes a few days.
Now for the ccTLDs, however, there unfortunately is no equivalent service, although there's a (rather short) list of ccTLD zone sources; some registries let you AXFR the domain (e.g., .ee, .ch and .li, .se and .nu), some provide a list of names (e.g., .sk or .gov), but otherwise it's up to you to contact the registry in question and plead your case. Yes, for each of the over 300 domains -- good luck!
Given how difficult it is to get to all the public data, it's then no surprise that several businesses are making good money by selling you that access or by providing TLD reports.
Some stats
After having requested access to all gTLD zone files and having received most of them (several are still pending), I looked around a bit, seeking entertaining stats. One thing to note is that a large number of zones (230) do not have any names defined (other than, say, a NIC NS record) -- TLDs registered purely as a brand or placeholder, I suspect. Over 360 zones have fewer than 10 records, over 470 fewer than 100.
Zones that are actually used include the expected variety of silly names, including very long domain names:
accountantaccountantaccountantaccountantaccountantaccountant.accountant
artartartartartartartartartartartartartartartartartartartartart.art
yoyoyodogillbestraightwithyouicanttellifthatsatattoooranartisti.art
barbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbar.bar
clickclickclickclickclickclickclickclickclickclickclickclick.click
ahndung-von-verkehrsordnungswidrigkeiten-mit-unfallfolge.cologne.
0-------------------------------------------------------------0.com.
thelongestdomainnameintheworldliterallynobodycangetalongeronexd.community
you-know-you-are-pretty-gosh-darned-cute-do-you-wanna-go-on-a.date.
lololololololololololololololololololololololololololololololol.fun.
gayfriendlyconvenientaffordabletrendyhairsalonsindowntowntoront.mobi
wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww.org
partypartypartypartypartypartypartypartypartypartypartyparty.party
runrunrunrunrunrunrunrunrunrunrunrunrunrunrunrunrunrunrunrunrun.run
thehighestthemostvaluableandthemostexpensivedomainnameofalltime.top
this-crazy-url-is-definitely-one-of-the-longest-adresses-in-the.world.
rindfleischetikettierungsuberwachungsaufgabenubertragungsgesetz.xyz
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.xyz.
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz.zone
...and so on and so on. Per RFC1035, the maximum size of a DNS label is 63 octets (note: octets, not characters, which is why the maximum length of a domain name is 253 characters), which explains why there are no longer second-level domains, although it doesn't explain why people insist on registering over 1700 such names.
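The two limits mentioned above can be checked with a few lines of Python. This is a minimal sketch with made-up names (length_problems and the two constants), and it assumes the name is already in its ASCII form, i.e. any IDN labels have been converted to Punycode first. The 253 figure is the textual form of the 255-octet wire limit, where every label carries a length octet and the root label takes one more.

```python
MAX_LABEL_OCTETS = 63   # per RFC1035
MAX_NAME_CHARS = 253    # textual form, without the trailing dot


def length_problems(name):
    """Return a list of length violations for an ASCII domain name."""
    if name.endswith("."):
        name = name[:-1]  # ignore the root label of a fully-qualified name
    problems = []
    if len(name) > MAX_NAME_CHARS:
        problems.append("name is %d characters long" % len(name))
    for label in name.split("."):
        octets = len(label.encode("ascii"))
        if not 1 <= octets <= MAX_LABEL_OCTETS:
            problems.append("label of %d octets: %r" % (octets, label))
    return problems


print(length_problems("w" * 63 + ".org"))  # within both limits
print(length_problems("w" * 64 + ".org"))  # one label too long
```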
Of the 987 zones I looked at, the top ten zones based on number of domains were:
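| Rank | TLD | # of domains |
|---|---|---|
| 1 | .com | 155,883,253 |
| 2 | .net | 13,291,304 |
| 3 | .org | 10,424,321 |
| 4 | .info | 3,859,083 |
| 5 | .xyz | 3,128,897 |
| 6 | .online | 1,811,807 |
| 7 | .top | 1,200,953 |
| 8 | .site | 1,067,408 |
| 9 | .shop | 907,239 |
| 10 | .app | 722,140 |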
For my own entertainment, I wrote a shabby little perl script to run over a zone file and produce some additional numbers:
$ gzcat net.txt.gz | perl -T zonestats.pl
Total number of records: 34658946
Total number of names: 13291304
Total number of different record types: 7
  ns: 32819414
  rrsig: 759035
  ds: 414744
  nsec3: 379518
  a: 270671
  aaaa: 15543
  soa: 1
Top ten name lengths:
  9: 2839977
  10: 2836099
  8: 2783467
  11: 2648213
  7: 2496883
  12: 2404541
  6: 2205730
  13: 2159827
  14: 1886160
  15: 1585087
Longest name: 000000000000000000000000000000000000000000000000000000000000001.net. (63)
There are 134 names with 63 chars in this domain.
Total number of unique name servers: 689703
The three most popular name servers found in this zone are:
  dns1.registrar-servers.com.: 298617
  dns2.registrar-servers.com.: 298352
  jm2.dns.com.: 239693
The most popular domains in which the nameservers are:
  domaincontrol.com: 6200836
  googledomains.com: 1485364
  dns.com: 908420
This domain contains names including the following dirty words:
  shit: 8732
  fuck: 8057
  tits: 2351
  piss: 844
  cunt: 575
  motherfucker: 86
  cocksucker: 16
$
The "seven dirty words" domains are of course full of mismatches, but it looks like most zones contain more or less the same percentage of dirty domain names: somewhere between 0.006% and 0.008% of the total; .xxx predictably ranks a bit higher here, but not all that much at only 0.1% of all names.
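For readers who would rather not write Perl, a few of the numbers in the report above could be tallied along these lines. This is a hypothetical Python sketch, not a port of zonestats.pl: the function name, the SAMPLE records, and the assumed "name ttl class type rdata" line layout are all my own assumptions, and "names" here simply means owners of NS records.

```python
from collections import Counter


def zone_stats(lines):
    """Tally record types, delegated-name label lengths, and name servers."""
    types = Counter()
    delegated = set()
    nameservers = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or fields[0].startswith(";"):
            continue  # skip blank, short, and comment lines
        owner, rtype = fields[0].lower(), fields[3].lower()
        types[rtype] += 1
        if rtype == "ns":
            delegated.add(owner)
            nameservers[fields[4].lower()] += 1
    # Length of the leftmost label, e.g. 7 for "example.net."
    lengths = Counter(len(name.split(".")[0]) for name in delegated)
    return types, lengths, nameservers


# Made-up records for illustration only.
SAMPLE = [
    "example.net. 172800 in ns ns1.example.org.",
    "example.net. 172800 in ns ns2.example.org.",
    "foo.net. 172800 in ns ns1.example.org.",
    "foo.net. 86400 in ds 12345 8 2 ABCDEF",
    "ns1.foo.net. 172800 in a 192.0.2.53",
]

types, lengths, nameservers = zone_stats(SAMPLE)
print(types.most_common())
print(lengths.most_common(10))
print(nameservers.most_common(3))
```

For a zone the size of .net, one would feed the function a streamed file object (for example from gzip.open in text mode) rather than a list held in memory.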
Public Suffix List
Now all of the above is good fun, but why would you want to know whether a given string is a TLD? Wouldn't it be trivially the right-most label of the fully-qualified domain name (FQDN)?
Strictly speaking: yes. However, consider that many TLDs are not generic in nature, meaning people cannot simply register any name under the given TLD. ccTLDs, being managed by individual registries, each may have unique requirements and regulations, and it is a common practice for these registries to enforce a second-level domain hierarchy, replicating or mirroring to some degree the top-level hierarchy.
For example, and perhaps most widely known, the .uk TLD uses .ac.uk (for academic institutions), .co.uk (for commercial entities), .gov.uk, .net.uk, .org.uk, and so on. How many such second-level domains are reserved depends on each TLD; Brazil (.br), for example, has over 100.
Now within the context of, for example, HTTP cookies or x509 TLS certificates, it's rather important that an entity cannot use a wildcard to match an entire TLD, but how does a browser know whether foo.example is a reserved second-level domain, or simply a normal domain registered by some entity? Should a website be able to set a cookie for foo.example? Should it be able to get a certificate for *.foo.example? There is no programmatic way to determine this.
To solve this problem, the good folks over at Mozilla started putting together a list of these TLDs and "effective TLDs", known as the Public Suffix List. That's right, it's another one of those manually compiled and maintained text files we like to build the internet infrastructure on!
This list consists of over 9,000 suffixes, and is used by all of the popular browsers to restrict cookie scope as well as for various UI features.
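How such a list gets used for cookie scoping can be sketched in a few lines of Python. This is a simplified illustration, not any browser's implementation: the function names are made up, SUFFIXES is a tiny hand-picked subset standing in for the real list, and the wildcard and exception rules of the real Public Suffix List are left out entirely.

```python
# Tiny stand-in for the real list (assumption: illustrative subset only).
SUFFIXES = {"com", "uk", "co.uk", "ac.uk", "br", "com.br"}


def public_suffix(hostname, suffixes):
    """Longest listed suffix of hostname; unlisted TLDs count as a suffix."""
    labels = hostname.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            return candidate
    return labels[-1]


def registrable_domain(hostname, suffixes):
    """The public suffix plus one more label, or None if there is none."""
    labels = hostname.lower().rstrip(".").split(".")
    suffix_len = len(public_suffix(hostname, suffixes).split("."))
    if len(labels) <= suffix_len:
        return None
    return ".".join(labels[-(suffix_len + 1):])


def cookie_domain_allowed(host, cookie_domain, suffixes):
    """Reject cookies scoped to a public suffix or to an unrelated domain."""
    host = host.lower().rstrip(".")
    cookie_domain = cookie_domain.lower().strip(".")
    if public_suffix(cookie_domain, suffixes) == cookie_domain:
        return False
    return host == cookie_domain or host.endswith("." + cookie_domain)


print(registrable_domain("www.example.co.uk", SUFFIXES))
print(cookie_domain_allowed("www.example.co.uk", "co.uk", SUFFIXES))
```

With the real list loaded into the set, the same lookups would answer the foo.example questions raised above, within the limits of the omitted rule types.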
Google uses similar heuristics based on a domain name's TLD to determine whether to offer users different language versions of their content and other geo-targeting. Within that context, Google treats some ccTLDs (such as e.g., .io, .me, .tv etc.) as if they were gTLDs rather than as indicators of geographic location.
Finally, the HSTS Preload list baked into browsers like Chrome and Firefox to enforce HTTP Strict Transport Security includes a number of TLDs and public suffixes:
$ curl -O https://publicsuffix.org/list/public_suffix_list.dat
$ curl -O https://hg.mozilla.org/mozilla-central/raw-file/tip/security/manager/ssl/nsSTSPreloadList.inc
$ grep -v '^/' public_suffix_list.dat | grep . | sed -e 's/$/\./' | sort > psl
$ sed -n -e 's/^\([^, ]*\), .*/\1\./p' nsSTSPreloadList.inc > hsts
$ comm -1 -2 hsts psl | wc -l
73
$
That is, websites registered under any of these 73 suffixes, such as e.g., .app or .dev, will always use HTTPS when using the common, popular browsers that consume this list.
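The same intersection can be computed with set operations in Python. This sketch roughly mirrors the grep/sed/comm pipeline above: the function names and both sample strings are made up, and the "name, flag" line layout of the preload file is inferred from the sed expression rather than taken from the file's documentation.

```python
import re

# First field of lines shaped like "name, flag" (same idea as the sed pattern).
HSTS_LINE = re.compile(r"^([^, ]*), ")


def psl_entries(text):
    """Non-blank lines that do not start with '/', i.e. not '//' comments."""
    return {line.strip() for line in text.splitlines()
            if line.strip() and not line.startswith("/")}


def hsts_entries(text):
    """Names from the preload list, one per matching line."""
    entries = set()
    for line in text.splitlines():
        match = HSTS_LINE.match(line)
        if match:
            entries.add(match.group(1))
    return entries


def preloaded_suffixes(psl_text, hsts_text):
    """Public suffixes that are also HSTS-preloaded."""
    return psl_entries(psl_text) & hsts_entries(hsts_text)


# Made-up miniature inputs for illustration.
PSL_SAMPLE = "// a comment\ncom\napp\ndev\n\nco.uk\n"
HSTS_SAMPLE = "%%\napp, 1\ndev, 1\nexample.com, 0\n%%\n"

print(sorted(preloaded_suffixes(PSL_SAMPLE, HSTS_SAMPLE)))
```

Using sets also sidesteps comm's requirement that both inputs be sorted.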
Summary
Well, there you go. Top-level domains are, it turns out, a lot more complicated than what we commonly think of. The internet being a truly global network of networks with varied jurisdictions being in control of parts of the whole continues to provide for curious challenges and -- as anybody working in tech knows -- you regularly run into weird scenarios that trace back to the DNS.
Sometimes all the way to thetoptoptoptoptoptoptoptoptoptoptoptoptoptoptoptoptoptoptoptop.top
.