
A unique spell on the tube

source link: https://www.tuicool.com/articles/hit/jyqauiF

As I was watching an episode of Only Connect the other day, I learned an (almost) interesting fact about London tube stations:

St. John’s Wood is the only tube station not to share any letters with the word mackerel

Someone, for some reason, some time.

This was such a delightfully pointless fact that my curiosity was instantly piqued. Why this station? Why that word? Is this unique, or merely unusual? Let’s, dear reader, find out for ourselves.

To get to the bottom of this, I first found a few useful sources of data:

The next step was to go through all of these words and find which letters they have in common with each tube station name. After removing duplicates, there are approximately 122 million word/station combinations to consider.
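The core test can be sketched as below. This is a minimal illustration, not the author's script: the helper names `letterSet`, `sharesNoLetters` and `uniqueStation` are hypothetical, and the four-station list is a toy subset. A word counts as a "hit" for a station when that station is the only one sharing no letters with it.

```javascript
// Hypothetical helper: the distinct letters a-z in a string, ignoring case,
// spaces, apostrophes and full stops.
function letterSet(s) {
  return new Set(s.toLowerCase().replace(/[^a-z]/g, ""));
}

// True if the two strings have no letter in common.
function sharesNoLetters(a, b) {
  const setA = letterSet(a);
  for (const ch of letterSet(b)) {
    if (setA.has(ch)) return false;
  }
  return true;
}

// The single station sharing no letters with `word`, or null when
// no station, or more than one, qualifies.
function uniqueStation(word, stations) {
  const disjoint = stations.filter((st) => sharesNoLetters(word, st));
  return disjoint.length === 1 ? disjoint[0] : null;
}

// Toy station list: a tiny subset of the network, for illustration only.
const stations = ["St. John's Wood", "Bank", "Oval", "Debden"];
console.log(uniqueStation("mackerel", stations)); // St. John's Wood
console.log(uniqueStation("pluck", stations));    // null (two stations qualify)
```

With the full station list, the same check would simply be repeated for every word in the word list.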

I was a bit worried that doing this naively in Python would be too slow, and that writing it in a compiled language on my Windows machine would be too painful, so I just did it naively in Node, which ended up running in about a second – a pleasant surprise.

One of the small optimisations that probably helped was pre-computing which letters occur in each tube station name, i.e. creating the matrix $L_{ij}$, which is true if letter $j$ appears in tube station $i$ and false otherwise, e.g.

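// tubesUnique: de-duplicated, lower-case station names (defined elsewhere in the full script)
const letters = tubesUnique.map(() => new Array(26).fill(false));

tubesUnique.forEach((t, i) => {
  for (let tind = 0; tind < t.length; ++tind) {
    const ccode = t.charCodeAt(tind) - 97; // So lowercase 'a' -> 0 etc.
    // Skip spaces, apostrophes and full stops, which fall outside a-z
    if (ccode >= 0 && ccode < 26) {
      letters[i][ccode] = true;
    }
  }
});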
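One possible alternative to the boolean matrix, which is not what the original script does, is to pack each name's letters into a 26-bit integer. "No letters in common" then becomes a single bitwise AND. The name `letterMask` is hypothetical.

```javascript
// Pack the letters of a string into a 26-bit mask: bit 0 = 'a', bit 25 = 'z'.
// Characters outside a-z (spaces, apostrophes, full stops) are ignored.
function letterMask(s) {
  let mask = 0;
  for (const ch of s.toLowerCase()) {
    const code = ch.charCodeAt(0) - 97;
    if (code >= 0 && code < 26) mask |= 1 << code;
  }
  return mask;
}

// Two names share no letters exactly when their masks do not overlap.
const disjoint = (a, b) => (letterMask(a) & letterMask(b)) === 0;

console.log(disjoint("mackerel", "St. John's Wood")); // true
console.log(disjoint("mackerel", "Bank"));            // false
```

If the station masks are computed once up front, each word/station check reduces to one AND and one comparison.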

The full script is on GitHub for those interested.

Results

So – is the combination of mackerel and St. John’s Wood special and unique?

Not at all.

There are 57,614 word/station combinations that fulfil this condition for the large word list, and 931 for the smaller one.

The number of ‘hits’ per station – i.e. the number of different words for which that station is the only one not to share any letters – is plotted below.

[Figure: number of hits per tube station]

It is clear that St. John’s Wood is not special at all; at most it sits somewhere in the top quartile. The most fertile name amongst tube stations is the unassuming Woodford.
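Tallying hits per station, as in the plot described above, could look something like the sketch below. It is not the author's GitHub script; `hitsPerStation` is a hypothetical name and the word and station lists are toy data.

```javascript
// Bitmask of the distinct letters a-z in a string (bit 0 = 'a');
// any other character is ignored.
function letterMask(s) {
  let mask = 0;
  for (const ch of s.toLowerCase()) {
    const code = ch.charCodeAt(0) - 97;
    if (code >= 0 && code < 26) mask |= 1 << code;
  }
  return mask;
}

// Map each station to the words for which it is the only station
// sharing no letters with the word.
function hitsPerStation(words, stations) {
  const stationMasks = stations.map(letterMask);
  const hits = new Map(stations.map((st) => [st, []]));
  for (const word of words) {
    const w = letterMask(word);
    let count = 0; // stations with no letters in common so far
    let only = -1; // index of the last such station
    // Stop early once a second disjoint station rules the word out.
    for (let i = 0; i < stationMasks.length && count < 2; ++i) {
      if ((w & stationMasks[i]) === 0) {
        count += 1;
        only = i;
      }
    }
    if (count === 1) hits.get(stations[only]).push(word);
  }
  return hits;
}

// Toy data, for illustration only.
const stations = ["Woodford", "Bank", "Oval", "Debden"];
const words = ["language", "bleach", "mackerel", "pluck", "developer"];
const hits = hitsPerStation(words, stations);

// Stations ranked by number of hits, most fertile first.
const ranked = [...hits.entries()]
  .map(([station, ws]) => [station, ws.length])
  .sort((a, b) => b[1] - a[1]);
console.log(ranked);
```

Here "language" and "bleach" land on Woodford, "developer" on Bank, "mackerel" on no station, and "pluck" is ruled out because both Woodford and Debden avoid all of its letters.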

Words where Woodford is the only station with no letters in common

  • Most common word – language
  • Least common word – cleanup
  • Shortest word – alenu/ulnae (5) or, more commonly used, bleach (6)
  • Longest word – intellectualistically (21)
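A summary like the one above can be drawn from a station's list of hits with a few lines of code. This sketch assumes, hypothetically, that the word list is ordered from most to least common. The author's actual frequency source is not stated, so the "most/least common" fields here rest entirely on that assumption. `summarise` is a hypothetical name.

```javascript
// Summarise the words that hit one station.
// ASSUMPTION: `words` is ordered from most to least common;
// ties in length keep the earlier (more common) word.
function summarise(words) {
  if (words.length === 0) return null;
  let shortest = words[0];
  let longest = words[0];
  for (const w of words) {
    if (w.length < shortest.length) shortest = w;
    if (w.length > longest.length) longest = w;
  }
  return {
    mostCommon: words[0],
    leastCommon: words[words.length - 1],
    shortest,
    longest,
  };
}

// Toy input: some of the Woodford words quoted above, in an assumed order.
const woodford = summarise(["language", "bleach", "intellectualistically", "cleanup"]);
console.log(woodford);
```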

At the other end of the scale, there are a few stations where only one word in the English language fits:

  • Whitechapel – borborygmus
  • Old Street – pacifying
  • Turnham Green – slipbody
  • St. James’s Park – hobblingly
  • Boston Manor – childlike
  • Moorgate – whilikins
  • South Acton – bewilderedly

These words aren’t exactly all in common use, so looking at the smaller subset of words gives a different set of stations:

  • Maida Vale – thrown
  • Chesham – burlington
  • Surrey Quays – clothing
  • Goodge Street – painful
  • Oval – Edinburgh
  • Highgate – compounds
  • Baker Street – immunology
  • Brixton – valued
  • Cockfosters – individually
  • Kenton – hydraulic

Then, for fun, let’s look at a few other words we can use with St. John’s Wood (there are 1,006 choices):

  • Most common word – player
  • Least common word – fireplace
  • Shortest word – clear (5)
  • Longest word – pluricarpellary (15)

Finally, let’s look at some aggregate statistics. Here’s how word length affects the number of tube stations with letters in common – we are interested in the ‘orange’ words:

[Figure: number of tube stations sharing letters with a word, by word length]

The longest word for which this works is philosophicopsychological, for the station Debden. Slightly shorter pairs include microspectrophotometric for Bank, and counterproductiveness for Balham.

The distribution of word lengths producing a single ‘hit’ is

[Figure: distribution of word lengths producing a single hit]

and normalising for the distribution of word lengths in English,

[Figure: single-hit word lengths, normalised by the distribution of word lengths in English]

So this trick works best for longish words of around 10 letters, where it becomes quite probable – over a fifth of words of that length work.
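The normalisation divides, for each word length, the number of single-hit words by the total number of words of that length. A minimal sketch, with a hypothetical `hitRateByLength` function and toy data rather than the real word lists:

```javascript
// Bitmask of the distinct letters a-z in a string; other characters ignored.
function letterMask(s) {
  let mask = 0;
  for (const ch of s.toLowerCase()) {
    const code = ch.charCodeAt(0) - 97;
    if (code >= 0 && code < 26) mask |= 1 << code;
  }
  return mask;
}

// For each word length n, the fraction of words of length n that are a
// single hit, i.e. exactly one station shares no letters with the word.
function hitRateByLength(words, stations) {
  const stationMasks = stations.map(letterMask);
  const totals = new Map(); // n -> number of words of length n
  const hits = new Map();   // n -> number of single-hit words of length n
  for (const word of words) {
    const n = word.length;
    const w = letterMask(word);
    const disjoint = stationMasks.filter((m) => (w & m) === 0).length;
    totals.set(n, (totals.get(n) || 0) + 1);
    if (disjoint === 1) hits.set(n, (hits.get(n) || 0) + 1);
  }
  // Dividing by the totals removes the effect of some lengths simply
  // being more common than others.
  const rates = new Map();
  for (const [n, total] of [...totals].sort((a, b) => a[0] - b[0])) {
    rates.set(n, (hits.get(n) || 0) / total);
  }
  return rates;
}

// Toy data, for illustration only.
const stations = ["Woodford", "Bank", "Oval", "Debden"];
const words = ["language", "mackerel", "bleach", "pluck", "developer", "thrown"];
const rates = hitRateByLength(words, stations);
console.log(rates);
```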

If someone ever trots out this fact, or perhaps asks it at a pub quiz, you can rest easy, safe in the knowledge that it is not special in any way, and neither is any of this, or anything else we do to while away our Sunday afternoons.

Happy spelling!
