69

GitHub - jamesqo/gun-violence-data: A comprehensive, accessible database that co...

 6 years ago
source link: https://github.com/jamesqo/gun-violence-data
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

README.md

Gun Violence Data

What is this repository?

This repository contains data for all recorded gun violence incidents in the US between January 2013 and March 2018, inclusive.

Why was it created?

There's currently a lack of large and easily-accessible amounts of detailed data on gun violence. This project aims to change that; we make a record of more than 260k gun violence incidents, with detailed information about each incident, available in CSV format. We hope that this will make it easier for data scientists and statisticians to study gun violence and predict future trends.

Where did you get the data?

The data was downloaded from gunviolencearchive.org. From the organization's description:

Gun Violence Archive (GVA) is a not for profit corporation formed in 2013 to provide free online public access to accurate information about gun-related violence in the United States. GVA will collect and check for accuracy, comprehensive information about gun-related violence in the U.S. and then post and disseminate it online.

How did you get the data?

Because GVA limits the number of incidents that are returned from a single query, and because the website's "Export to CSV" functionality was missing crucial fields, it was necessary to obtain this dataset using web scraping techniques.

Stage 1: For each date between 1/1/2013 and 3/31/2018, a Python script queried all incidents that happened at that particular date, then scraped the data and wrote it to a CSV file. Each month got its own CSV file, with the exception of 2013, since not many incidents were recorded from then.

Stage 2: Each entry was augmented with additional data not directly viewable from the query results page, such as participant information, geolocation data, etc.

Stage 3: The entries were sorted in order of increasing date, then merged into a single CSV file.

Click here to download the tarball the data is stored in. You can decompress the tarball using the 7-Zip utility on Windows, or via the tar executable on macOS/Linux.

Data format

The data is stored in a single CSV file sorted by increasing date. It has the following fields:

field type description required? incident_id int gunviolencearchive.org ID for incident yes date str date of occurrence yes state str

yes city_or_county str

yes address str address where incident took place yes n_killed int number of people killed yes n_injured int number of people injured yes incident_url str link to gunviolencearchive.org webpage containing details of incident yes source_url str link to online news story concerning incident no incident_url_fields_missing bool ignore, always False yes congressional_district int

no gun_stolen dict[int, str] key: gun ID, value: 'Unknown' or 'Stolen' no gun_type dict[int, str] key: gun ID, value: description of gun type no incident_characteristics list[str] list of incident characteristics no latitude float

no location_description str description of location where incident took place no longitude float

no n_guns_involved int number of guns involved no notes str additional notes about the incident no participant_age dict[int, int] key: participant ID no participant_age_group dict[int, str] key: participant ID, value: description of age group, e.g. 'Adult 18+' no participant_gender dict[int, str] key: participant ID, value: 'Male' or 'Female' no participant_name dict[int, str] key: participant ID no participant_relationship dict[int, str] key: participant ID, value: relationship of participant to other participants no participant_status dict[int, str] key: participant ID, value: 'Arrested', 'Killed', 'Injured', or 'Unharmed' no participant_type dict[int, str] key: participant ID, value: 'Victim' or 'Subject-Suspect' no sources list[str] links to online news stories concerning incident no state_house_district int

no state_senate_district int

no

Important notes:

  • Each list is encoded as a string with separator ||. For example, "a||b" represents ['a', 'b'].

  • Each dict is encoded as a string with outer separator || and inner separator ::. For example, 0::a, 1::b represents {0: 'a', 1: 'b'}.

  • The "gun ID" and "participant ID" are numbers specific to a given incident that refer to a particular gun/person involved in that incident. For example, this:

    participant_age_group = 0::Teen 12-17||1::Adult 18+
    participant_status = 0::Killed||1::Injured
    participant_type = 0::Victim||1::Victim
    

    corresponds to this:

    Age Group Status Type Participant #0 Teen 12-17 Killed Victim Participant #1 Adult 18+ Injured Victim

Example

The incident described here resulted in the following fields:

incident_id date state city_or_county address n_killed n_injured incident_url source_url incident_url_fields_missing congressional_district gun_stolen gun_type incident_characteristics latitude location_description longitude n_guns_involved notes participant_age participant_age_group participant_gender participant_name participant_relationship participant_status participant_type sources state_house_district state_senate_district 1081561 3/29/2018 Colorado Pueblo 617 W Northern Ave 0 0 http://www.gunviolencearchive.org/incident/1081561 https://www.chieftain.com/news/crime/pueblo-sheriff-seizes-illegal-guns-drugs-cash-in-bessemer-building/article_436d713a-4be6-565f-a919-747ab83e66df.html False 3 0::Stolen||1::Unknown||2::Unknown||3::Unknown||4::Unknown||5::Unknown||6::Unknown||7::Unknown||8::Unknown||9::Unknown||10::Unknown||11::Unknown||12::Unknown||13::Unknown||14::Unknown||15::Unknown||16::Unknown||17::Unknown||18::Unknown||19::Unknown||20::Unknown||21::Unknown||22::Unknown||23::Unknown||24::Unknown 0::Handgun||1::Handgun||2::Unknown||3::Unknown||4::Unknown||5::Unknown||6::Unknown||7::Unknown||8::Unknown||9::Unknown||10::Unknown||11::Unknown||12::Unknown||13::Unknown||14::Unknown||15::Unknown||16::Unknown||17::Unknown||18::Unknown||19::Unknown||20::Unknown||21::Unknown||22::Unknown||23::Unknown||24::Unknown Non-Shooting Incident||Drug involvement||ATF/LE Confiscation/Raid/Arrest||Possession (gun(s) found during commission of other crimes)||Possession of gun by felon or prohibited person||Stolen/Illegally owned gun{s} recovered during arrest/warrant 38.2442 Bessemer -104.618 25 Guns and drugs recovered from residence. 0::43 0::Adult 18+ 0::Male 0::Phillip W. Key

0::Unharmed, Arrested 0::Subject-Suspect https://www.chieftain.com/news/crime/pueblo-sheriff-seizes-illegal-guns-drugs-cash-in-bessemer-building/article_436d713a-4be6-565f-a919-747ab83e66df.html 46 3

Additional notes

  • The list of incidents from 2013 is not exhaustive; only 279 incidents from that year were catalogued.

Contact us

If you're interested in using this dataset for your research, feel free to contact us and ask questions / let us know about your work at [email protected]. Please note that we are not affiliated with gunviolencearchive.org in any way.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK