Hacker News Front Page Trends

 4 years ago
source link: https://www.tuicool.com/articles/JnQ3YvR

Search what’s been popular on the HN front page since 2007. Supports words, phrases, domains, and usernames. See the About section below for more info.

About

Searches are performed against a database of Hacker News items dating back to October 2006. The dataset updates nightly with the latest front page items.

Code is available on GitHub.

The intent is to search only items that appeared on the front page of HN. One important caveat: HN provides the exact list of front page items only for dates since November 11, 2014, so anything before then is an estimate. For earlier dates, I used a heuristic: sort each day’s items by score and take the top 115 on weekdays or the top 80 on weekends, subject to a minimum of 3 points. This definitely isn’t perfect. For example:

  • it excludes job posts before 11/11/14 since they always have 1 point
  • items with high scores don’t always get to the front page
  • it’s possible that HN has changed its algorithm over time to promote faster or slower front page turnover

But it should be a decent approximation, and the code could be modified to use other heuristics. It would probably be an improvement to fetch all job posts from before 11/11/14 via the HN API and include them.
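To make the heuristic concrete, here is a minimal Python sketch of the selection rule for a single day. It is not the app's actual code: the function name, the (item_id, score) tuple layout, and the reading of "minimum of 3 points" as "score of at least 3" are all assumptions made for illustration.

```python
from datetime import date

# Cutoffs taken from the description above; all names here are hypothetical.
WEEKDAY_LIMIT = 115
WEEKEND_LIMIT = 80
MIN_SCORE = 3  # assumed to mean "at least 3 points"


def estimate_front_page(items, day):
    """Estimate which of one day's items reached the front page.

    `items` is a list of (item_id, score) tuples for stories posted on `day`.
    """
    # date.weekday() returns 5 for Saturday and 6 for Sunday.
    limit = WEEKEND_LIMIT if day.weekday() >= 5 else WEEKDAY_LIMIT
    eligible = [item for item in items if item[1] >= MIN_SCORE]
    ranked = sorted(eligible, key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

Swapping in a different heuristic would mostly mean changing the constants or the sort key in a function like this one.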

The app allows searching by title, domain (with or without subdomain), and username. For a given search, the y-axis can display the percentage or number of all front page items that match the search term, the cumulative score of matching front page items, or the percentage of total front page score that the matching items represent.
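As an illustration of the four y-axis options, the sketch below computes them for one time bucket. The function name, the (title, score) tuple layout, and the dictionary keys are hypothetical; the app itself presumably does this aggregation in the database.

```python
def summarize(front_page, matches):
    """Compute the four y-axis values for one time bucket.

    `front_page` is a list of (title, score) tuples for the bucket;
    `matches` is a predicate deciding whether a title matches the search.
    """
    matching = [(title, score) for title, score in front_page if matches(title)]
    total_items = len(front_page)
    total_score = sum(score for _, score in front_page)
    matching_score = sum(score for _, score in matching)
    return {
        # Number of front page items that match.
        "count": len(matching),
        # Share of all front page items that match, in percent.
        "pct_of_items": 100.0 * len(matching) / total_items if total_items else 0.0,
        # Cumulative score of the matching items.
        "score": matching_score,
        # Share of total front page score held by matching items, in percent.
        "pct_of_score": 100.0 * matching_score / total_score if total_score else 0.0,
    }
```

The two percentage measures can diverge: a term that matches few items can still account for a large share of total score if those items were highly upvoted.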

Title search styles

When searching by title, there are 3 search styles available:

  1. Web search uses PostgreSQL full text search, specifically the websearch_to_tsquery() function. It supports a few operators: "quoted phrases", OR, and -. The easiest way to explain them is with examples:

    machine learning
    "machine learning"
    machine -learning
    "machine learning" or ML
    

    Titles are converted to tsvector using PostgreSQL’s built-in simple text search configuration. I experimented with the english text search configuration, but found that the stemming and stopwords sometimes interfered with proper nouns that appear in HN titles. The simple configuration does something closer to an exact text match, so remember to use the OR operator to search the singular and plural of a word, e.g. neural network or neural networks.

    Web search is always case insensitive.

  2. Exact match, case insensitive uses a PostgreSQL regular expression to match the contents of each search term within word boundaries:

    title ~* ('\y' || search_term || '\y')

    Note that this makes use of a trigram index, as opposed to full text search, which uses a full text GIN index.

  3. Exact match, case sensitive is the same as #2 above, but uses the ~ operator instead of the ~* operator.
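For readers more familiar with Python than PostgreSQL regular expressions, the two exact match styles behave roughly like the sketch below. It is an analogy rather than the app's implementation: Python's \b stands in for PostgreSQL's \y word boundary, the function name is hypothetical, and the re.escape call is an addition that the SQL expression above does not have.

```python
import re


def title_matches(title, search_term, case_sensitive=False):
    """Return True if `search_term` appears in `title` on word boundaries."""
    # re.escape is an addition: the SQL expression above concatenates the
    # term as-is, so regex metacharacters in it would be interpreted.
    pattern = r"\b" + re.escape(search_term) + r"\b"
    # case_sensitive=False mirrors ~*, case_sensitive=True mirrors ~.
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, title, flags) is not None
```

With this sketch, searching for go matches "Show HN: A Go compiler" in the case insensitive style but not "Google releases", because the word boundary prevents matches inside a longer word.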

