
Chicago Crime Mapping: Magic of Data Science and Python

source link: https://www.tuicool.com/articles/hit/zqEnm2f



“When a man is denied the right to live the life he believes in, he has no choice but to become an outlaw.”

Nelson Mandela

Predictions, forecasts and loss scores. They sound mainstream, don’t they?

In this era of growing interest in Machine Learning and its algorithms, we are neglecting one of the important duties of a data scientist: Data Exploration.

We modern data scientists are so naive that we forget the beauty of visualization and the quality it stands for. Today, allow me to present an Exploratory Data Analysis of the Kaggle dataset: Crime in Chicago.

The Crimes in Chicago Dataset

I will be using the code and visualizations from my kernel, which you can find here: Chicago Crime Mapping

Chicago Crime Mapping — At the time of editing

So, before starting off with the analysis, let me brief you about the dataset. According to its official description:

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department’s CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or [email protected].

Essentially, this dataset contains the type of crime, its location, the sub-category of the crime, the type of vicinity, and whether an arrest was made.

Checking whether the data contains null values

The very first step is to check whether the dataset contains any null values, and I used a heatmap to determine just that.

Viridis Heatmap

Looking at our heatmap, we can safely conclude that few values are missing, so we can just go ahead and drop those rows.
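A minimal sketch of this check, using a toy frame in place of the full Kaggle CSV (with the real data you would load it via `pd.read_csv`; the column names below match the dataset):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Toy stand-in for the full dataset; in practice:
# df = pd.read_csv("Chicago_Crimes_2012_to_2017.csv")
df = pd.DataFrame({
    "Primary Type": ["THEFT", "BATTERY", None, "THEFT"],
    "Location Description": ["STREET", None, "APARTMENT", "STREET"],
})

# Count missing values per column, then visualise them as a heatmap:
# each bright cell marks a null entry
null_counts = df.isnull().sum()
print(null_counts)

plt.figure(figsize=(8, 4))
sns.heatmap(df.isnull(), cbar=False, cmap="viridis")
plt.title("Missing values per column")

# Since few rows are affected, dropping them is acceptable here
df = df.dropna()
```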

I was curious to find out how the crimes reported in these 5 years were distributed by location, and what I could see was:

STREET                            325084
RESIDENCE                         223854
APARTMENT                         179444
SIDEWALK                          158478
OTHER                              53474
PARKING LOT/GARAGE(NON.RESID.)     40907
ALLEY                              31239
RESIDENTIAL YARD (FRONT/BACK)      30209
SMALL RETAIL STORE                 28209
SCHOOL, PUBLIC, BUILDING           25474
Name: Location Description, dtype: int64

Pretty high, for a span of 5 years.
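The table above comes straight from `value_counts` on the Location Description column; a sketch with a few toy rows standing in for the full dataset:

```python
import pandas as pd

# Toy stand-in for the full dataset
df = pd.DataFrame({
    "Location Description": ["STREET", "STREET", "RESIDENCE",
                             "APARTMENT", "STREET"],
})

# Ten most common crime locations, most frequent first
# (on the full dataset this reproduces the table above)
top_locations = df["Location Description"].value_counts().head(10)
print(top_locations)
```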

Location Description and its semantics

One may be wondering where the crimes happened most. Is it the dirty streets, notorious residences or unguarded parking lots? We can check for ourselves using this snippet:

plt.figure(figsize=(15, 10))
# Ten most frequent crime locations, most common first
sns.countplot(y='Location Description', data=df,
              order=df['Location Description'].value_counts().iloc[:10].index)
Location Semantics

Apparently the streets are the least safe of all, with residences and apartments following close behind.

Mapping the amount of Crimes

Let’s have a closer look at the unique locations where the crimes have taken place and use Folium to map them. You can use this snippet to recreate my map.
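The snippet below relies on a helper frame `CR_index` holding each unique coordinate pair and its incident count. `CR_index` is not defined in this excerpt of the kernel, but one plausible way to build it from the dataset's Latitude/Longitude columns (toy rows shown here) is:

```python
import pandas as pd

# Toy stand-in for the dataset's coordinate columns
df = pd.DataFrame({
    "Latitude":  [41.89, 41.89, 41.75, 41.75, 41.75],
    "Longitude": [-87.62, -87.62, -87.60, -87.60, -87.60],
})

# Count incidents per unique (lat, long) pair, most frequent first
counts = (
    df.groupby(["Latitude", "Longitude"])
      .size()
      .sort_values(ascending=False)
)
CR_index = pd.DataFrame({
    "LocationCoord": list(counts.index),  # (lat, long) tuples
    "ValueCount": counts.values,
})
print(CR_index)
```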

import folium

chicago_map_crime = folium.Map(location=[41.895140898, -87.624255632],
                               zoom_start=13,
                               tiles="CartoDB dark_matter")

# CR_index holds one row per unique location: its (lat, long) tuple
# and the number of incidents recorded there
for i in range(min(500, len(CR_index))):
    lat = CR_index['LocationCoord'].iloc[i][0]
    long = CR_index['LocationCoord'].iloc[i][1]
    radius = CR_index['ValueCount'].iloc[i] / 45

    # Orange for hotspots with more than 1000 incidents, teal otherwise
    if CR_index['ValueCount'].iloc[i] > 1000:
        color = "#FF4500"
    else:
        color = "#008080"

    popup_text = """Latitude : {}<br>
                Longitude : {}<br>
                Criminal Incidents : {}<br>"""
    popup_text = popup_text.format(lat,
                                   long,
                                   CR_index['ValueCount'].iloc[i])
    folium.CircleMarker(location=[lat, long], popup=popup_text,
                        radius=radius, color=color,
                        fill=True).add_to(chicago_map_crime)
Map of crimes

Here, an orange circle means that more than 1000 crimes took place at that particular location, while the others are self-explanatory. Clicking on a circle shows the coordinates and the number of crimes committed at that particular (latitude, longitude).

An example of details

A closer look at the thefts

I have a special interest in thefts and public peace disruptions, but let’s save the latter for later. For now, let’s focus on the types of theft that took place around Chicago in these 5 years.

Type of thefts in Chicago from 2012- 2017

Well, thefts of $500 and under are pretty dominant. No?
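The chart above can be reproduced by filtering on the Primary Type column and counting the theft sub-categories (column names and category labels as in the Kaggle dataset; a few toy rows stand in for the real data):

```python
import pandas as pd

# Toy stand-in rows mimicking the dataset's columns
df = pd.DataFrame({
    "Primary Type": ["THEFT", "THEFT", "THEFT", "BATTERY"],
    "Description": ["$500 AND UNDER", "$500 AND UNDER",
                    "OVER $500", "SIMPLE"],
})

# Keep only thefts, then count each sub-category
df_theft = df[df["Primary Type"] == "THEFT"]
theft_types = df_theft["Description"].value_counts()
print(theft_types)
```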

If that’s not enough, let’s have a look at how these thefts are distributed across the months. Have a look at this graph, and allow me to explain the sudden plunge in the statistics.

Thefts Per Month

Well, no Superman or Batman arrived in the city in August to protect justice. It was just a quirk in my aggregation code that produced a NaN value for August, which I replaced with 0 (because I am lazy).

Here is the code, if you don’t believe me:

import pandas as pd

# Count thefts per month; any month missing from the data comes out
# as NaN after reindexing, which is why August shows zero above
theft_in_months = pd.DataFrame({
    "thefts": df_theft['Month'].value_counts(),
    "month": df_theft['Month'].value_counts().index,
}, index=range(12))
theft_in_months.fillna(0, inplace=True)
theft_in_months = theft_in_months.sort_values(['month'], ascending=[1])
theft_in_months.head()

Annual Crime Statistics: Using literally all the data at once

You can try re-sampling the dataset by date, and you will find that it covers 1854 days, to be precise. Want to know how many crimes were committed on each individual day? Have a look at this graph.
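A sketch of that re-sampling, assuming the dataset's Date column (a handful of toy timestamps stand in for the real rows):

```python
import pandas as pd

# Toy stand-in: a few timestamped incidents
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2012-01-05 10:00", "2012-01-05 18:30",
        "2012-01-06 02:15", "2012-01-08 21:00",
    ]),
    "Primary Type": ["THEFT", "BATTERY", "THEFT", "ASSAULT"],
})

# Index by timestamp and count incidents per calendar day;
# days with no incidents appear as 0
daily = df.set_index("Date").resample("D").size()
print(daily)
print(len(daily), "days covered")
```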

Thefts on a daily basis

As you may have noticed, the yearly crime statistics follow a general trend.

The noticeable pattern is a rise at the start of each year, peaking around the midpoint (somewhere in June to July), followed by an equally sharp drop back to the level at which the year started!

Public Peace Violations

I promise that this is the last area of research in this article.

Anyway, looking at the types of public peace violations and their numerical distribution, one can easily see that reckless conduct is the leader in this area, and (thankfully) not bomb or arson threats.

Types of Public Peace Violations

While we are at it, let’s have a look at the peace disruption incidents around Chicago. In this map, the orange circles mark locations where peace disruptions exceeded a count of 30 in these 5 years, making them somewhat sensitive spots to tread.

Peace Disruption Locations
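The colour rule behind this map is just a threshold on per-location counts; a sketch of the aggregation (assuming the dataset's Latitude/Longitude columns, with toy data):

```python
import pandas as pd

# Toy per-location coordinates for public peace violations:
# one hotspot with 35 incidents, one quiet spot with 5
df_peace = pd.DataFrame({
    "Latitude":  [41.88] * 35 + [41.70] * 5,
    "Longitude": [-87.63] * 35 + [-87.55] * 5,
})

# Count disruptions per location and flag the hot ones (> 30 incidents)
peace_counts = (
    df_peace.groupby(["Latitude", "Longitude"])
            .size()
            .reset_index(name="count")
)
peace_counts["color"] = peace_counts["count"].apply(
    lambda n: "#FF4500" if n > 30 else "#008080"  # orange vs teal
)
print(peace_counts)
```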

Conclusion

As you may have already judged, this is not a coding tutorial but a potential project starter. You can use this EDA in your own notebooks (keeping the Apache 2.0 license in mind) and build prediction models out of these ideas.

A few ideas from my side:

  1. A season-based predictive model that forecasts how many crimes will happen on a given day.
  2. A prediction model that judges the sensitivity of an area or vicinity (like Lincolnwood) and predicts when the next crime will take place.

Or any other idea which may strike your mind.
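For idea 1, a minimal baseline (pure pandas, no ML library) could predict a day's crime count as the historical average for its (month, weekday) pair; anything fancier can be swapped in later. The daily counts and dates below are toy values:

```python
import pandas as pd

# Toy daily crime counts, standing in for the resampled dataset
daily = pd.Series(
    [10, 12, 8, 14, 9, 11, 13],
    index=pd.date_range("2016-06-01", periods=7, freq="D"),
)

# Historical mean per (month, weekday) as a naive seasonal baseline
features = pd.DataFrame({
    "count": daily.values,
    "month": daily.index.month,
    "weekday": daily.index.weekday,  # Monday=0 ... Sunday=6
})
baseline = features.groupby(["month", "weekday"])["count"].mean()

# "Predict" a future day by looking up its (month, weekday) pair
target = pd.Timestamp("2017-06-07")  # a Wednesday in June
prediction = baseline.loc[(target.month, target.weekday())]
print(prediction)
```

This is only a lookup table, but it captures exactly the June-July seasonality visible in the daily chart above.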

Until next time, peace out.

Uddeshya Singh

