Hacker News Front Page Trends
Search what’s been popular on the HN front page since 2007. Supports words, phrases, domains, and usernames. See the About section below for more info.
About
Searches are performed against a database of Hacker News items dating back to October 2006. The dataset updates nightly with the latest front page items.
Code is available on GitHub.
The intent is to search only items that appeared on the front page of HN, with the important caveat that HN only provides the exact list of front page items for dates since November 11, 2014, so anything before then is an estimate. For earlier dates, I used a heuristic: for each day, sort items by score and take the top 115 on weekdays or the top 80 on weekends, counting only items with at least 3 points. This definitely isn’t perfect, for example:
- it excludes job posts before 11/11/14, since they always have 1 point and so fall below the 3-point minimum
- items with high scores don’t always get to the front page
- it’s possible that HN has changed its algorithm over time to promote faster or slower front page turnover
But it should be a decent approximation, and the code could be modified to use other heuristics. It would probably be an improvement to fetch and include all job posts from before 11/11/14 via the HN API.
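The pre-2014 heuristic can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's code: the Item record, its day field (assumed to be the calendar day an item was posted, in whatever time zone the dataset uses), and the estimate_front_page helper are hypothetical names. Only the 115 / 80 / 3-point numbers come from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass
class Item:
    item_id: int
    day: date    # hypothetical field: calendar day the item was posted
    score: int

def estimate_front_page(items: list[Item], weekday_limit: int = 115,
                        weekend_limit: int = 80, min_score: int = 3) -> list[Item]:
    """Approximate the front page for days before HN published exact lists."""
    by_day: dict[date, list[Item]] = defaultdict(list)
    for item in items:
        # Items below the minimum score are never counted, which is also
        # why 1-point job posts drop out.
        if item.score >= min_score:
            by_day[item.day].append(item)
    front_page: list[Item] = []
    for day, day_items in sorted(by_day.items()):
        # date.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
        limit = weekend_limit if day.weekday() >= 5 else weekday_limit
        day_items.sort(key=lambda it: it.score, reverse=True)
        front_page.extend(day_items[:limit])
    return front_page
```

Ties at the cutoff keep whichever items came first in the input, because Python's sort is stable; a real implementation would need to pick its own tie-break rule.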
The app allows searching by title, domain (with or without subdomain), and username. For a given search, the y-axis can display the percentage or number of all front page items that match the search term, the cumulative score of matching front page items, or the percentage of total front page score that the matching items represent.
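The four y-axis options are simple aggregates over the front page items in one time bucket. A minimal Python sketch follows; the FrontPageItem record, the y_axis_metrics function, and passing the search as a predicate are hypothetical choices for illustration, and the size of each time bucket is not specified here.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class FrontPageItem:
    title: str
    score: int

def y_axis_metrics(items: list[FrontPageItem],
                   matches: Callable[[FrontPageItem], bool]) -> dict[str, float]:
    """Compute the four y-axis options for one time bucket of front page items."""
    matching = [it for it in items if matches(it)]
    total_score = sum(it.score for it in items)
    matching_score = sum(it.score for it in matching)
    return {
        # Number of front page items that match the search.
        "count": len(matching),
        # Share of all front page items that match, in percent.
        "percent_of_items": 100 * len(matching) / len(items) if items else 0.0,
        # Sum of the scores of the matching items.
        "cumulative_score": matching_score,
        # Matching items' share of the total front page score, in percent.
        "percent_of_score": 100 * matching_score / total_score if total_score else 0.0,
    }
```

The two percentage options make periods with different numbers of front page items, or different overall score levels, easier to compare than the raw count and cumulative score.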
Title search styles
When searching by title, there are 3 search styles available:
1. Web search uses PostgreSQL full text search, specifically the websearch_to_tsquery() function. It supports a few operators: "quoted phrases", OR, and -. The easiest way to explain them is with examples:
   - machine learning matches titles that contain both words, in any order
   - "machine learning" matches the exact phrase
   - machine -learning matches titles that contain machine but not learning
   - "machine learning" or ML matches titles that contain either the phrase or ML

   Titles are converted to tsvector using PostgreSQL’s built-in simple text search configuration. I experimented with the english text search configuration, but found that its stemming and stopwords sometimes interfered with proper nouns that appear in HN titles. The simple configuration does something closer to an exact text match, so remember to use the OR operator to search both the singular and plural of a word, e.g. neural network or neural networks. Web search is always case insensitive.

2. Exact match, case insensitive uses a PostgreSQL regular expression to match the contents of each search term within word boundaries:

   title ~* ('\y' || search_term || '\y')

   Note that this makes use of a trigram index, as opposed to full text search, which uses a full text GIN index.

3. Exact match, case sensitive is the same as #2 above, but uses the ~ operator instead of the ~* operator.
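For reasoning about what a web search query will match, here is a rough Python model of the query forms in the examples. It is an approximation, not PostgreSQL’s implementation: the tokenizer only lowercases and splits on anything that is not a letter or digit (the real simple parser recognizes more token types, such as URLs and hyphenated words), and it handles only bare words, quoted phrases, a leading -, and or. All function names are hypothetical.

```python
import re

# A term is (negated, phrase), where phrase is a list of lowercase tokens.
Term = tuple[bool, list[str]]

def simple_tokens(text: str) -> list[str]:
    # Rough stand-in for the "simple" configuration: lowercase and split on
    # anything that is not a letter or digit. No stemming, no stopwords.
    return re.findall(r"[^\W_]+", text.lower())

def parse_web_query(query: str) -> list[list[Term]]:
    # Returns OR-groups; every term inside a group must hold (implicit AND).
    groups: list[list[Term]] = [[]]
    for m in re.finditer(r'(-?)"([^"]*)"|(\S+)', query):
        if m.group(3) is None:              # quoted phrase, optionally negated
            negated, words = m.group(1) == "-", simple_tokens(m.group(2))
        elif m.group(3).lower() == "or":    # "or" starts a new OR-group
            groups.append([])
            continue
        else:                               # bare word, optionally negated
            word = m.group(3)
            negated, words = word.startswith("-"), simple_tokens(word.lstrip("-"))
        if words:
            groups[-1].append((negated, words))
    return [g for g in groups if g]

def contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    # True if the phrase tokens appear consecutively in the title tokens.
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def web_search_match(title: str, query: str) -> bool:
    tokens = simple_tokens(title)
    # A title matches if every term of at least one OR-group is satisfied;
    # a negated term is satisfied when its phrase is absent.
    return any(
        all(contains_phrase(tokens, phrase) != negated for negated, phrase in group)
        for group in parse_web_query(query)
    )
```

Because nothing is stemmed, web_search_match("Neural networks explained", "neural network") is false, which is the reason for the singular-or-plural advice above.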
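The two exact-match styles can be mimicked in Python for experimentation. In this sketch, Python’s \b stands in for PostgreSQL’s \y word boundary (both treat letters, digits, and underscores as word characters), and the term is passed through re.escape so it matches literally. That escaping is a choice of this sketch: the SQL expression above concatenates the term directly, and whether the app escapes regex metacharacters is not stated. The exact_match name is hypothetical.

```python
import re

def exact_match(title: str, search_term: str, case_sensitive: bool) -> bool:
    # \b plays the role of PostgreSQL's \y: the term must start and end on a
    # word boundary, so "rust" does not match inside "Trusting".
    pattern = r"\b" + re.escape(search_term) + r"\b"
    # case_sensitive=False mirrors the ~* operator, True mirrors ~.
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, title, flags) is not None
```

For example, exact_match("Google announces", "Go", case_sensitive=True) is false because "Go" is followed by another word character.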