
Interpreting my 7-Eleven visits with hierarchical clustering, anomaly detection,...

 4 years ago
source link: https://towardsdatascience.com/interpreting-my-7-eleven-visits-with-hierarchical-clustering-anomaly-detection-and-time-series-f178d80c2bfa?gi=225f8fb46ed6

Name a better pair than 7-Eleven and Asia. I’ll wait…

Ok, fair enough. Food, culture, and surreal cities are also valid choices. But my point still stands. 7-Eleven is, in my opinion, a staple of the lifestyle of certain Asian countries. There, you can find (almost) anything you need. Did you just land and need a SIM card? Go to 7–11. Do you need water because security made you empty your bottle? Go to 7–11. Hungry and want a cheap (and tasty) breakfast? You already know.

During my travels across this beautiful continent, I’ve quite often found solace (and snacks) in this convenience store. In fact, I’ve been there so many times that I have enough data to study, and since I don’t like wasting data, well, I analyzed it.

In this article, I’ll show what I discovered after investigating the 7-Eleven check-in data I collected during my time in Asia. My analysis consists of two sections: which and when. In the which section, the goal is to discover which 7-Elevens I visited and how many times. In this part, I’ll summarize the dataset and explore it with a series of visualizations, while trying to answer the question, “which 7–11 have I visited?”

Then comes the when part. In this segment, the objective is to find the trend or pattern in my visits (if there is one!). To find this out, I’ll use hierarchical clustering, anomaly detection, and time series analysis.

The data

This project’s dataset consists of 99 check-ins I logged with Foursquare’s Swarm app between July 7, 2019, and December 15, 2019. I collected the data in the following countries: Singapore, Malaysia, Thailand, Hong Kong, and Japan.

The tools

The experiment uses R and Python code. The primary analysis (visualizations, clustering, and data exploration) is done in R. In Python, I used the foursquare library, Prophet for the time series analysis, and scikit-learn for the anomaly detection.

Let’s begin!

Getting the data

Every data project starts with data — the new electricity (catchy phrase, I know). So, my first step was collecting it. To do this, I wrote a small Python script that retrieves my Swarm check-in data and stores it in a JSON file. The following code shows how:

Here, I’m getting all my check-ins created between two dates. Since each API call retrieves a maximum of 250 check-ins, I had to increase the offset variable by 250 in each iteration to get the next 250.
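The offset-based paging described above can be sketched roughly as follows. This is a minimal illustration, not the original script: `fetch_page` is a hypothetical callable standing in for whatever API client call returns one page of check-ins, and `fetch_all_checkins` and `save_checkins` are names made up for this example. Only the page size of 250 comes from the text.

```python
import json
from typing import Callable, Dict, List

PAGE_SIZE = 250  # maximum check-ins returned per call, as stated in the text


def fetch_all_checkins(fetch_page: Callable[[int, int], List[Dict]]) -> List[Dict]:
    """Collect every check-in by advancing the offset one page at a time.

    `fetch_page(limit, offset)` is a hypothetical wrapper around the real
    API call; it should return a list with at most `limit` check-ins.
    """
    checkins: List[Dict] = []
    offset = 0
    while True:
        page = fetch_page(PAGE_SIZE, offset)
        checkins.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < PAGE_SIZE:
            break
        offset += PAGE_SIZE
    return checkins


def save_checkins(checkins: List[Dict], path: str) -> None:
    """Store the collected check-ins in a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(checkins, fh)
```

Passing the page fetcher in as a parameter keeps the loop independent of any particular client, so it can be tried out with a fake data source before pointing it at the real API.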

Which 7-Eleven?

From July 7 to December 15, 2019 (162 days), I had the pleasure of visiting the store 99 times on 78 different days. That’s 48% of the days, or a visit on almost every other day! However, and here comes the first pitfall of the analysis, I have to keep in mind that not every country I was in has 7-Elevens. For example, Cambodia has none. As a result, I need to remove the days spent in those countries from my count. With that done, the total number of days spent in countries with 7-Elevens drops to 127, raising the percentage to 61%, a bit more than one visit day in every two. Figure 1 below presents the calendar. The days coded in red are those where I didn’t visit the shop, while those in blue are the days where I did.
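The percentages above are simple ratios, and they can be checked with a few lines of Python. The numbers (78 visit days, 127 days in countries with 7-Elevens, and the two dates) come from the text; the variable names are only illustrative.

```python
from datetime import date

start, end = date(2019, 7, 7), date(2019, 12, 15)
total_days = (end - start).days + 1  # inclusive of both endpoints

visit_days = 78        # days with at least one check-in
days_with_store = 127  # days spent in countries that have 7-Elevens

share_all = visit_days / total_days
share_adjusted = visit_days / days_with_store

print(total_days)               # 162
print(f"{share_all:.0%}")       # 48%
print(f"{share_adjusted:.0%}")  # 61%
```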

