My First Data Science Project — Family-Friendly Neighborhoods in London

 4 years ago
source link: https://www.tuicool.com/articles/6feEfyQ

qMjy6f3.jpg!web

There have been a lot of changes in my life in the past two years. I decided to transition into data science from a neuroscience background, and I moved from the USA to London, England. My move to London was the inspiration for this project, as I knew very little about which neighborhoods are considered good and which are considered bad. So when my husband and I were trying to find a place to live with our two young boys, we were pretty clueless. I tried to Google good neighborhoods in London, and quite a few sites listed their picks for the best neighborhoods. But the problem with all these lists was that they seemed like opinions rather than conclusions based on actual data. Coming from a science background, I was left wanting data to support the lists. We ended up making our best guess, and we were lucky to end up in a good area.

In the future, I want to avoid this uncertainty and get a better idea of which London neighborhoods are family-friendly. Ok, I’m done babbling about my inspiration for the project, so let’s get right to it.

* Note: this post will not be code-heavy. If you would like more details on the code used, you can view the notebook on Github here. (Folium maps won’t be visible on Github, but you can paste the Github link into nbviewer if you want to see the maps.)

Data

For this project, we sourced London crime data from the Metropolitan Police Crime Dashboard and venue data from the Foursquare API, using the Latitude and Longitude from the crime dataset.

Crime Data

6jUZVzb.jpg!web
Original Crime Dataset

We first need to clean up the crime data by dropping unnecessary rows and renaming columns.

mmEzuuM.jpg!web
Clean Crime Dataset

Now that we have a clean dataset, we can run each Latitude and Longitude pair through Geopy’s reverse geocoding to find the corresponding postal code, which we store in a new Postal_Code column.

MVbqAv6.jpg!web
Example of the dataset with Postal Codes
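
As a rough sketch of this reverse-geocoding step: the post only says Geopy was used, so the choice of the Nominatim geocoder, the one-second rate limit, and the helper names `make_nominatim_reverse` and `postcode_for` are all assumptions made for illustration. The lookup is passed in as a callable so the parsing logic can be tried without a network connection.

```python
from typing import Callable, Optional


def make_nominatim_reverse(user_agent: str) -> Callable:
    """Return a rate-limited reverse-geocoding callable (assumes Nominatim)."""
    # Imported lazily so the rest of the sketch works without geopy installed.
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Nominatim's public service asks for at most one request per second.
    return RateLimiter(geolocator.reverse, min_delay_seconds=1)


def postcode_for(lat: float, lon: float, reverse: Callable) -> Optional[str]:
    """Look up the postal code for one coordinate pair, or None if missing."""
    location = reverse((lat, lon))
    if location is None:
        return None
    # Nominatim puts the structured address under raw["address"].
    return location.raw.get("address", {}).get("postcode")
```

With a dataframe, this could be applied row by row, for example `df["Postal_Code"] = [postcode_for(lat, lon, reverse) for lat, lon in zip(df["Latitude"], df["Longitude"])]`, where `reverse = make_nominatim_reverse("my-app-name")`.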

Unfortunately, the reverse geocoding did not come out as clean as we would have liked. There were missing data points, and some postal codes contained more than the prefix (for example, the Postal_Code at index 5 reads “SW1 H ”). We manually clean up each of these postal codes and enter the missing ones using the following code:

Snippet of Code to edit Postal_Code Column
36NrEje.jpg!web
Subset of Dataframe after fixing Postal_Code column
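
The cleanup in the post was done by hand. As a hedged alternative, part of it could be automated with a regular expression. The sketch below assumes the "prefix" means the area letters plus the district number (so “SW1 H ” becomes "SW1"); the function name `postcode_prefix` is hypothetical and not from the notebook.

```python
import re
from typing import Optional

# One or two area letters followed by one or two district digits, e.g. SW1, N10.
_PREFIX = re.compile(r"[A-Z]{1,2}\d{1,2}")


def postcode_prefix(raw) -> Optional[str]:
    """Return the leading area+district part of a UK postal code, else None."""
    if not isinstance(raw, str):
        return None  # covers None and NaN from missing geocoder results
    match = _PREFIX.match(raw.strip().upper())
    return match.group(0) if match else None
```

Rows that come back as `None` would still need to be filled in manually, as in the original approach. Note that a full postcode returned without a space (such as "E16AN") is ambiguous for this pattern and would need separate handling.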

We then group the rows by postal code and sum the Crime_Incid values within each group. This gives our final crime dataset, which has a total of 212 rows and 4 columns.

36baEv3.jpg!web
Final Crime Dataset
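
A minimal pandas sketch of this aggregation follows. The rows are made up for illustration, and averaging Latitude and Longitude within each postal code is an assumption, since the post does not say how the coordinates were combined.

```python
import pandas as pd

# Made-up rows for illustration only; these are not values from the real dataset.
crime = pd.DataFrame({
    "Postal_Code": ["SW1", "SW1", "E1", "E1", "N1"],
    "Latitude": [51.50, 51.49, 51.52, 51.51, 51.54],
    "Longitude": [-0.14, -0.13, -0.06, -0.07, -0.10],
    "Crime_Incid": [10, 5, 7, 3, 4],
})

# One row per postal code: sum the incidents, average the coordinates (assumed).
crime_by_postcode = (
    crime.groupby("Postal_Code", as_index=False)
    .agg(
        Latitude=("Latitude", "mean"),
        Longitude=("Longitude", "mean"),
        Crime_Incid=("Crime_Incid", "sum"),
    )
)
```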

Visualization of Crime Data

To determine the distribution of crime in London we will plot the data in a box plot.

viuqUvF.jpg!web

Box plot of Crime in London

We define a “normal” amount of crime as any value that falls within the box (between the first and third quartiles), a low amount of crime as any value below the box, and a high amount of crime as any value above the box.

We now use folium to plot the points on a map of London. We color code the points as follows:

  • High Crime = “red”
  • Normal Crime = “yellow”
  • Low Crime = “green”

Vj2emqE.jpg!web

Crime Map of London
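
The box-based labeling and color coding described above can be sketched with the standard library alone. The function name `crime_levels` is hypothetical, and the "inclusive" quantile method is an assumption chosen because it uses the same linear interpolation as NumPy's default percentile.

```python
from statistics import quantiles

# Color coding taken from the list above.
COLORS = {"high": "red", "normal": "yellow", "low": "green"}


def crime_levels(counts):
    """Label each crime count as low, normal, or high relative to the box."""
    # The box of a box plot spans the first to the third quartile.
    q1, _median, q3 = quantiles(counts, n=4, method="inclusive")
    levels = []
    for count in counts:
        if count < q1:
            levels.append("low")
        elif count > q3:
            levels.append("high")
        else:
            levels.append("normal")
    return levels


counts = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # toy data, not real crime counts
levels = crime_levels(counts)
colors = [COLORS[level] for level in levels]
```

Each point could then be drawn on the map with something like `folium.CircleMarker(location=[lat, lon], color=color)`, passing the color computed for that postal code.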

Great, we now have a good idea of crime in different postal codes (neighborhoods). But crime isn’t the only factor to take into consideration when looking for family-friendly neighborhoods. We also need to consider access to kid-friendly activities. For this, we use the Foursquare API.

Venue Data

For this project, we decided to focus on the following categories:

  • Parks ( Foursquare Category Id: ‘4bf58dd8d48988d163941735’)
  • Libraries (Foursquare Category Id: ‘4bf58dd8d48988d12f941735’)
  • Lidos (Pools) (Foursquare Category Id: ‘4bf58dd8d48988d15e941735’)
  • Playgrounds (Foursquare Category Id: ‘4bf58dd8d48988d1e7941735’)
  • Cinemas (Foursquare Category Id: ‘4bf58dd8d48988d180941735’)

We create a function for identifying the venues and appending the total number of venues for that category to new columns in our current dataframe. Here is the function:

Function for identifying the total number of venues in each category
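
The original function is not reproduced in this post, so here is a hedged sketch of what such a lookup could look like. It assumes the legacy Foursquare v2 `venues/search` endpoint, a 1-mile (1609 m) search radius, and an arbitrary API version date; the helper names and the injected `fetch` callable are hypothetical. The category IDs and column names are the ones listed in this post.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://api.foursquare.com/v2/venues/search"  # legacy v2 API (assumed)
ONE_MILE_M = 1609

CATEGORY_IDS = {
    "Park": "4bf58dd8d48988d163941735",
    "Library": "4bf58dd8d48988d12f941735",
    "Pool": "4bf58dd8d48988d15e941735",
    "Playground": "4bf58dd8d48988d1e7941735",
    "Cinemas": "4bf58dd8d48988d180941735",
}


def build_search_url(lat, lon, category_id, client_id, client_secret,
                     version="20180605", radius=ONE_MILE_M, limit=50):
    """Build the search URL for one category around one coordinate pair."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # YYYYMMDD version date; this value is arbitrary
        "ll": f"{lat},{lon}",
        "radius": radius,      # meters
        "categoryId": category_id,
        "limit": limit,        # counts are capped at this value
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def fetch_json(url):
    """Default fetcher: perform the HTTP request and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def venue_counts(lat, lon, fetch, client_id, client_secret):
    """Return {column name: number of venues found} for every category."""
    counts = {}
    for name, category_id in CATEGORY_IDS.items():
        payload = fetch(build_search_url(lat, lon, category_id,
                                         client_id, client_secret))
        counts[name] = len(payload.get("response", {}).get("venues", []))
    return counts
```

Calling `venue_counts(lat, lon, fetch_json, CLIENT_ID, CLIENT_SECRET)` for each row would give the values for the five new columns. Because each request returns at most `limit` venues, very dense areas would be undercounted.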

And here is the final data set:

3yUR3yI.jpg!web
Final Dataset

K-Means Clustering

Finally, we use the machine learning technique K-Means. This technique was chosen for two main reasons:

  1. We do not have labeled data
  2. We know how many clusters we would like (3: family-friendly, not family-friendly, and cautious)

For K-Means we select the features we want the algorithm to use, which means selecting “Crime_Incid”, “Park”, “Library”, “Pool”, “Playground”, and “Cinemas” columns:

FNf6Rnj.jpg!web

After running K-Means we insert the cluster labels into the original dataset and then plot the data using Folium to visualize the different clusters.

qMjy6f3.jpg!web

Final Map of Family-Friendly Neighborhoods

Immediately, we can see that a lot of the red has changed to yellow and there are now more green points on the map. This is great news because, contrary to what most people think, there is such a thing as a family-friendly neighborhood in a big city.

Just to make sure there are no hidden clusters we run the elbow method to ensure we have the optimal number of clusters.

nu67rqM.jpg!web

Optimal k = 3

We can see that 3 is the optimal number of clusters. Yeah!
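
To make the clustering and the elbow check concrete, here is a small plain-Python sketch of K-Means (Lloyd's algorithm) that also reports the within-cluster sum of squares used by the elbow method. This is not the notebook's code, which most likely relied on a library implementation. The deterministic farthest-point initialization is a simplification chosen so the example is reproducible, and the toy points stand in for the real feature rows.

```python
import math


def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then the farthest remaining."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points,
                       key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points, k, max_iter=100):
    """Return (labels, centroids, inertia) for the given points."""
    points = [tuple(p) for p in points]
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    inertia = sum(math.dist(p, centroids[label]) ** 2
                  for p, label in zip(points, labels))
    return labels, centroids, inertia


def elbow_inertias(points, k_values):
    """Inertia for each candidate k; the 'elbow' is where the drop levels off."""
    return {k: kmeans(points, k)[2] for k in k_values}


# Toy 2-D points forming three obvious groups (not real neighborhood data).
toy = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (20, 1)]
labels, centroids, inertia = kmeans(toy, 3)
inertias = elbow_inertias(toy, range(1, 6))
```

On the toy points the inertia falls sharply up to k = 3 and barely changes afterwards, which is the kind of bend the elbow plot is meant to reveal.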

Results and Discussion

It’s great news to see that there are more family-friendly neighborhoods in London than there are neighborhoods to avoid. In fact, there are 136 neighborhoods to choose from. Here is a simple breakdown:

EfYBJzB.jpg!web

So for any families like my own who are looking for the best family-friendly neighborhoods in London, England, I suggest you start off with these 10:

R3IbIvQ.jpg!web

Top 10 family-Friendly neighborhoods

They don’t have the lowest crime but they still have very low crime & great access to kid-friendly activities.

But if you’re looking for the safest neighborhoods, then start off with these:

zYnMJ3y.jpg!web

Neighborhoods with the lowest crime Incidents

Fair warning, though: some of these postal codes don’t have great access to kid-friendly activities. That gap may be of interest to businesses, city planners, and developers who may want to build kid-friendly venues there.

For our Police Chiefs, Commissioners, and other safety personnel, it would help to know which neighborhoods could use more resources to lower crime. And for families, these are areas you should avoid living in.

QRvIvam.jpg!web

Further Analysis & Improvement Ideas

No data science project is ever truly complete. There is always room to optimize, so I have listed some ideas for improving the results.

1. Add more kid-friendly venues for analysis.

  • Parks, Playgrounds, Libraries, Pools, and Cinemas are not the only kid-friendly activities that could be available.

2. Add Boundaries for each Postal Code

  • We currently use only a 1-mile radius around each postal code’s assigned latitude and longitude to define its boundary. This could cause unwanted overlap between postal code zones when searching for venues. Adding true boundaries would fix this and would also allow choropleth maps to be made.

3. Standardize Crime Incidents based on population within the Postal Code.

  • Many of the postal codes to be avoided are centrally located, where the population density is most likely higher than in postal codes farther from the center. This could make the crime counts look artificially high.

4. Filter the results further by budget

  • This would help cater to families of all budgets. London is an expensive city to live in, especially when it comes to housing. The average cost of renting a 2-bedroom apartment in London is $2338 (£1875), making it the 6th most expensive rental market in the world according to Business Insider. London also ranks as the 8th most expensive place in the world to buy, based on average property price.
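
As a rough illustration of idea 3, crime counts could be converted to a rate per 1,000 residents before clustering. The function name and the numbers below are made up for the example; real population figures per postal code would have to come from a separate source.

```python
def crime_rate_per_1000(incidents, population):
    """Crime incidents per 1,000 residents for one postal code."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * incidents / population


# Hypothetical figures: a dense central area versus a quieter outer area.
central = crime_rate_per_1000(incidents=150, population=30000)
outer = crime_rate_per_1000(incidents=60, population=8000)
```

In this made-up case the central area has more incidents in total (150 versus 60) but a lower rate per resident (5.0 versus 7.5), which is exactly the distortion that standardizing by population is meant to expose.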

Conclusion

It’s good to see that a big city can be kid-friendly. It was also good to see that the neighborhood we ended up choosing was found to be kid-friendly!

Children are our future, and big cities are where opportunities can be found. Therefore, it’s important to make big cities more kid-friendly. Families shouldn’t feel like they have to flee big cities in order to raise their children.

I hope you enjoyed reading about this project as much as I enjoyed working on it. As it is my first project, I would love your feedback, so please feel free to get in contact with me! The full notebook can be found on Github here and the report can be found here.

