NYT Tracks Reddit Conversations Around nytimes.com Content

 4 years ago
source link: https://www.tuicool.com/articles/jqQvyir

Last fall, my colleague James Robinson (no relation) sent me a direct message on Slack. “Hey old friend. Are you free for a quick question?” he wrote. “I’m looking to rebuild an old project and would love your perspective.” In a previous newsroom analytics role at The New York Times, James had built a suite of incredibly interesting tools that happened to run on a finely tuned Windows PC under his desk in the newsroom. He wanted to talk about rewriting one of those tools and putting it on a server.

The project was a tool that checked Reddit for conversations around Times content and alerted the newsroom via a Slack channel. Once our writers and editors knew that a story was being discussed, they could get real-time feedback from the Reddit community and even join the conversation. It was a useful tool, and James, now the director of international analytics on the newly created Audience team at The Times, wanted it rebuilt.

Since the process was running locally, it stopped reporting in late 2015 when James changed teams. When I looked at the code, I realized I was not going to be able to reuse anything because it was written in Perl. My plan was to rely on many of Google’s managed services, which don’t support Perl.

James had a nice suite of analytics libraries; they just happen to be written in Perl.

I decided to rebuild this system during The Times’s quarterly Maker Day using all of the new tooling we have available to us at The Times. To prepare, I started to work out how everything would fit together. For infrastructure that required minimal maintenance, I decided to use the Google App Engine standard environment and a basic scheduled cron job to kick off a process every couple of minutes. From there, the rest of the system would look something like this:

Architecture of the Reddit Slack service.
  1. App Engine cron job will hit our service to kick off the process.
  2. The service will grab the top 300 nytimes.com links shared to Reddit via their JSON endpoint.
  3. The service will use Google Cloud Datastore to store comment counts to determine what to alert and what has already been alerted.
  4. The service will use NYT’s internal GraphQL Sangria server to fetch additional metadata about the nytimes.com article.
  5. The service will post an update to a Slack channel with the real headline, the title of the Reddit post, the subreddit the post occurred in and how many comments currently exist.
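
Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration and not the service's actual code: the article does not say what language the service is written in, so Python is used here only for readability. The field names (`id`, `title`, `url`, `subreddit`, `num_comments`) follow the shape of Reddit's public listing JSON. The `MIN_COMMENTS` threshold, the function names, and the plain dictionary standing in for Cloud Datastore are all assumptions made for the example.

```python
import json

# Hypothetical threshold: the article does not say when a post is alert-worthy.
MIN_COMMENTS = 10


def extract_posts(listing_json):
    """Pull the fields an alert needs out of a Reddit listing payload."""
    listing = json.loads(listing_json)
    posts = []
    for child in listing["data"]["children"]:
        data = child["data"]
        posts.append({
            "id": data["id"],
            "title": data["title"],
            "url": data["url"],
            "subreddit": data["subreddit"],
            "num_comments": data["num_comments"],
        })
    return posts


def posts_to_alert(posts, store, min_comments=MIN_COMMENTS):
    """Return posts that reached the threshold and have not been alerted yet.

    `store` is a dict keyed by Reddit post id. It stands in for Cloud
    Datastore here (an assumption for this sketch) and is updated in place
    with the latest comment count and whether an alert has been sent.
    """
    alerts = []
    for post in posts:
        record = store.get(post["id"], {"alerted": False})
        should_alert = (not record["alerted"]
                        and post["num_comments"] >= min_comments)
        store[post["id"]] = {
            "num_comments": post["num_comments"],
            "alerted": record["alerted"] or should_alert,
        }
        if should_alert:
            alerts.append(post)
    return alerts
```

Because the stored record remembers that an alert was sent, running the same listing through `posts_to_alert` on the next cron tick returns nothing new for that post, which is the "what has already been alerted" check described in step 3.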

I wanted to spend most of Maker Day hacking, so I made sure I had everything I needed ahead of time. This included access to James’ Google Cloud project in order to deploy the service, as well as access credentials for our GraphQL service. We use HashiCorp Vault to share secrets, like API credentials and database passwords, so once my request was made, someone from our GraphQL team just needed to hand me a vault unwrap command. With that command, I was able to securely get the secrets and put them into my own project’s Vault store. At start-up, the application would fetch the credentials via gcp-vault.

Once Maker Day arrived, I was ready to hit the ground running. I quickly created a basic endpoint that could scan Reddit for the most commented Times articles. From there, I saved basic information to Cloud Datastore with minimal effort, as no schema or setup was required. To fetch article metadata, I needed to build a GraphQL query. With the help of our GraphQL team and their handy interface for exploring our schema, I was able to build a query to fetch only the information needed to post to Slack. Finally, to make sure the Slack messages were easy to read and aesthetically pleasing, I reused code from an old Times project of mine, Newshound.
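
To make the last two steps more concrete, here is a rough Python sketch of a GraphQL request body and a Slack message payload. It is an illustration under stated assumptions, not the code described above. The Times GraphQL schema is internal, so the `article(url:)` field and `headline` selection are hypothetical placeholders. The function names and the `permalink` parameter are also invented for this example. The `{"query": ..., "variables": ...}` body is the conventional shape for GraphQL over HTTP, and the `text` field with `*bold*` and `<url|label>` markup is the format Slack incoming webhooks accept.

```python
import json

# Hypothetical query: the real schema is internal to The Times, so the
# `article` field and its `headline` selection are placeholders.
ARTICLE_QUERY = """
query ArticleByUrl($url: String!) {
  article(url: $url) {
    headline
  }
}
"""


def build_graphql_request(article_url):
    """Build a JSON-serializable body for a GraphQL-over-HTTP POST."""
    return {"query": ARTICLE_QUERY, "variables": {"url": article_url}}


def build_slack_payload(headline, reddit_title, subreddit, num_comments,
                        permalink):
    """Format the alert as a Slack incoming-webhook payload.

    Includes the four pieces of information the service posts: the real
    headline, the Reddit post title, the subreddit, and the comment count.
    Linking the title to `permalink` is an assumption for this sketch.
    """
    text = (
        f"*{headline}*\n"
        f"Reddit post: <{permalink}|{reddit_title}>\n"
        f"Subreddit: r/{subreddit} | Comments: {num_comments}"
    )
    return {"text": text}
```

Selecting only `headline` mirrors the point made above about fetching only the information needed to post to Slack; a real query would select whatever fields the message template uses.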

Example posts from the new Reddit Slack service

By the end of the day, I was able to hook the service into Drone with our drone-gae plugin, and the entire system was ready to automatically deploy on any commit to the master branch. We have plans to add more, but the system is already in a better state than in its prior days on that Windows PC under James’ desk.

