0% found this document useful (0 votes)
33 views50 pages

Assignment 3.HTML

Uploaded by

gcparchcert
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
33 views50 pages

Assignment 3.HTML

Uploaded by

gcparchcert
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 50

# Assignment 3: Chicago Crimes # MSDS420

**Author:** Atef Bader, PhD **Last Edit:** 11/20/2020

Deliverables:¶
• Submit two files that are labeled: YourLastName_Assignment_3 that have
the following formats:
1. Your HTML that has your Source code and output
2. Your ipynb script that has your Source code and output

Objectives:¶
• Use SQL to execute different queries to retrieve data from Chicago
Crime dataset and Police statins dataset
• Use Geospatial queries to locate police stations and gun related crimes
(with arrest or no arrest) in every district on Choropleth map
• Use Geospatial queries to provide descriptive stat for every district on
Choropleth map
• Use Geospatial queries to locate the Block that is the furthest
(Maximum Distance) from the police station that has gun related crime
resulted in arrest Object 7
1
2
3
4
5
6
Submission Formats :¶
• Complete IPYNB script that has the source code in Python used to
access and analyze the data. The code should be submitted as an IPYNB
script that can be be loaded and run in Jupyter Notebook for Python.
• Make sure to include your name as part of the IPYNB file name.
• From the File menu select Download As -> HTML (.html) to create an
HTML of the IPYNB file.

Formatting Python Code:¶


When programming in Python, refer to Kenneth Reitz’ PEP 8: The Style Guide
for Python Code: http://pep8.org/ (Links to an external site.)Links to an
external site. There is the Google style guide for Python at
https://google.github.io/styleguide/pyguide.html (Links to an external
site.)Links to an external site. Comment often and in detail.

Descriptions and Requirement Specifications¶


Chicago Crimes¶
In his first state of the uniion address , president Trump mentioned Chicago
violance 10 times Trump's State of the Union Address
Chicago has more homicides than New York and Los Angeles
combined¶
Columnist Clarence Page wrote an article , published by the Chicago Tribune
stated that the city of Chicago had more homicides in the past two years
than New York and Los Angeles combined

Chicago Police Department¶


Chicago police department CPD issues and publishes on daily basis on its
website crime alerts, and press releases for the different districts .
The CPD categoizes the crimes into 8 categories as
follows:¶

Chicago Crimes Dataset¶


The CSV file for crimes dataset for the city of Chicago is obtained from the
data portal for the city of Chicago. Here is the link for the city of Chicago
data portal City of Chicago Data Portal
Loading the Dataset CSV file¶
Three set of data are need for this assignment:
1. The Chicago police stations in every district (hosted on NW server)
2. The Crimes dataset (hosted on NW server)
3. The Boundaries.geojson data for district boundries (downloaded in Zip
file)
Complete description of the dataset can be found on Chicago city data
portal.
Based on Trumps State of the Uniion Address and the article written by
columnist Clarence Page and published by the Chicago Tribune, we are
interested to retrieve the data for the past two years and perform different
types of spatial queries.
There are few of these queries that we are interested in to help CPD and city
of Chicago to plot on a Choroplteh map those districts that have highest gun
crimes.
Here are examples of those types of queries:
1. Plot on Choropleth map the districts and their Violent Crimes
2. Plot on Choropleth map the districts and their Gun related crimes
3. Which district is the crime capital of Chicago districts?
4. What the crime density per district?
5. Plot on Choropleth map those gun related crimes that resulted in
arrests
6. Plot on Choropleth map the gun related crime that is in the farthest
Block from the policy stattion for every district
Packages you need to Connect PostgreSQL server to load and retrieve
Crhicago Crime dataset from the database:
1. psycopg2: for PostgreSQL driver
2. area: to calculate the area inside of any GeoJSON geometry
3. Folium: for Choropleth maps </font>
Since we are using PostGIS in our work, please read and bookmark Chapter
4. Using PostGIS: Data Management and Queries
In [1]:
# Install a pip package in the current Jupyter kernel
import sys
!{sys.executable} -m pip install psycopg2
Requirement already satisfied: psycopg2 in
/Users/oscaramezcua/opt/anaconda3/lib/python3.8/site-packages (2.8.6)
In [2]:
# Install a pip package in the current Jupyter kernel
import sys
!{sys.executable} -m pip install area
Requirement already satisfied: area in
/Users/oscaramezcua/opt/anaconda3/lib/python3.8/site-packages (1.1.1)
In [3]:
# Install a pip package in the current Jupyter kernel
import sys
!{sys.executable} -m pip install folium
Requirement already satisfied: folium in
/Users/oscaramezcua/opt/anaconda3/lib/python3.8/site-packages (0.12.1)
Requirement already satisfied: requests in
/Users/oscaramezcua/opt/anaconda3/lib/python3.8/site-packages (from folium) (2.24.0)
Requirement already satisfied: numpy in
/Users/oscaramezcua/opt/anaconda3/lib/python3.8/site-packages (from folium) (1.19.2)
Requirement already satisfied: jinja2>=2.9 in
/Users/oscaramezcua/opt/anaconda3/lib/python3.8/site-packages (from folium) (2.11.2)
Requirement already satisfied: branca>=0.3.0 in
/Users/oscaramezcua/opt/anaconda3/lib/python3.8/site-packages (from folium) (0.4.2)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in
/Users/oscaramezcua/opt/anaconda3/lib/python3.8/site-packages (from requests->folium)
(1.25.11)
Requirement already satisfied: idna<3,>=2.5 in
/Users/oscaramezcua/opt/anaconda3/lib/python3.8/site-packages (from requests->folium)
(2.10)
Requirement already satisfied: chardet<4,>=3.0.2 in
/Users/oscaramezcua/opt/anaconda3/lib/python3.8/site-packages (from requests->folium)
(3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in
/Users/oscaramezcua/opt/anaconda3/lib/python3.8/site-packages (from requests->folium)
(2020.6.20)
Requirement already satisfied: MarkupSafe>=0.23 in
/Users/oscaramezcua/opt/anaconda3/lib/python3.8/site-packages (from jinja2>=2.9-
>folium) (1.1.1)
In [4]:
import folium
from folium import plugins
from folium.plugins import MarkerCluster
import psycopg2
import csv
import pandas as pd
import json
from area import area

from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT


from IPython.display import display
In [5]:
folium.__version__
Out[5]:
'0.12.1'
In [6]:
psycopg2.__version__
Out[6]:
'2.8.6 (dt dec pq3 ext lo64)'

IMPORTANT NOTES:¶
• In this assignment you will be connecting to DSCC PostgreSQL server
that has the database and tables already created on the server.
• You need to use your NetID and password for login and passward to
connect PostgreSQL server hosted on DSCC
• For the psycopg2.connect statements listed below, you must provide
your NetID and password in order to connect to PostgreSQL server
hosted on DSCC
In [7]:
# All data
db_connection = psycopg2.connect(host='129.105.248.26',dbname="chicago_crimes_ut",
user="oaf9342" , password="Endeavor13@")

# ----------------------------------------------------------------
# full data
#db_connection = psycopg2.connect(host='129.105.248.26',dbname="chicago_crimes",
user="YourNetID" , password="YourPassword")
# -----------------------------------------------------------------
cursor = db_connection.cursor()
In [8]:
# Uncomment and run this code when your transactions get "stuck"
#db_connection.rollback()
Chicago Crimes Dataset¶
The Crimes_2001_to_present.csv is downloaded from Chicago data portal and
it has roughly 6.5 million records.
While working on this dataset, It is prudent to make a note of the following:
1. Geospatial queries are very demanding for system resouces like CPU,
Memory, and DISK
2. We are interested in the data set of the past 2 years, and when you
execute Geospatial type queries, please be advised that these queries
slow down your machine.
3. There are two dataset that you will use in this assignment
• chicago_crimes_ut : Use this mini dataset that has data for rougly
a month for coding/debugging and unit testing
• chicago_crimes : Use this dataset that has data for 2 years for
your final submission
Algorithm Performance¶
• Sort algorithms used by the database engines vary in performance
between O($N log N$) and O($ N^{2} $) where $N$ is the size of the
number
• Search algorithms used by the database engines vary in performance
between O($log N$) and O($ N $) where $N$ is the size of the number
What are the column names in the 2 tables that we will use from chicago_crimes
database?¶
In [9]:
# Get the column names for table crimes

cursor.execute("SELECT column_name \
FROM INFORMATION_SCHEMA.COLUMNS \
WHERE table_name = 'crimes';")

rows=cursor.fetchall()

rows
Out[9]:
[('id',),
('caseno',),
('date_of_occurrence',),
('block',),
('iucr',),
('primary_type',),
('description',),
('location_description',),
('arrest',),
('domestic',),
('beat',),
('district',),
('ward',),
('community_area',),
('fbi_cd',),
('xcoordinate',),
('ycoordinate',),
('year',),
('updated_on',),
('latitude',),
('longitude',),
('location',),
('where_is',)]
In [10]:
# Get the column names for table police_stations

cursor.execute("SELECT column_name \
FROM INFORMATION_SCHEMA.COLUMNS \
WHERE table_name = 'police_stations';")

rows=cursor.fetchall()

rows
Out[10]:
[('district',),
('district_name',),
('address',),
('city',),
('state',),
('zip',),
('website',),
('phone',),
('fax',),
('tty',),
('x_coordinate',),
('y_coordinate',),
('latitude',),
('longitude',),
('location',),
('where_is',)]
In [11]:
query = "SELECT * \
FROM police_stations"
cursor.execute(query)
data = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]
# create a data frame with column headers
stations = pd.DataFrame(data,columns = colnames).drop_duplicates()
stations
Out[11]:
district_nam
district address city state zip website
e

1718 S http://home.chicagopolic
0 1 Central Chicago IL 60616
State St .org/community/distri...

5101 S
http://home.chicagopolic
1 2 Wentworth Wentworth Chicago IL 60609
.org/community/distri...
Ave

7040 S
Grand http://home.chicagopolic
2 3 Cottage Chicago IL 60637
Crossing .org/community/distri...
Grove Ave

South 2255 E http://home.chicagopolic


3 4 Chicago IL 60617
Chicago 103rd St .org/community/distri...

727 E http://home.chicagopolic
4 5 Calumet Chicago IL 60628
111th St .org/community/distri...

7808 S http://home.chicagopolic
5 6 Gresham Chicago IL 60620
Halsted St .org/community/distri...

1438 W http://home.chicagopolic
6 7 Englewood Chicago IL 60636
63rd St .org/community/distri...

3420 W http://home.chicagopolic
7 8 Chicago Lawn Chicago IL 60629
63rd St .org/community/distri...

3120 S http://home.chicagopolic
8 9 Deering Chicago IL 60608
Halsted St .org/community/distri...

3315 W http://home.chicagopolic
9 10 Ogden Chicago IL 60623
Ogden Ave .org/community/distri...

3151 W http://home.chicagopolic
10 11 Harrison Chicago IL 60612
Harrison St .org/community/distri...
district_nam
district address city state zip website
e

1412 S
http://home.chicagopolic
11 12 Near West Blue Island Chicago IL 60608
.org/community/distri...
Ave\n

2150 N
http://home.chicagopolic
12 14 Shakespeare California Chicago IL 60647
.org/community/distri...
Ave

5701 W http://home.chicagopolic
13 15 Austin Chicago IL 60644
Madison St .org/community/distri...

5151 N
http://home.chicagopolic
14 16 Jefferson Park Milwaukee Chicago IL 60630
.org/community/distri...
Ave

4650 N http://home.chicagopolic
15 17 Albany Park Chicago IL 60630
Pulaski Rd .org/community/distri...

1160 N
http://home.chicagopolic
16 18 Near North Larrabee Chicago IL 60610
.org/community/distri...
St

850 W http://home.chicagopolic
17 19 Town Hall Chicago IL 60613
Addison St .org/community/distri...

5400 N
http://home.chicagopolic
18 20 Lincoln Lincoln Chicago IL 60625
.org/community/distri...
Ave

1900 W
http://home.chicagopolic
19 22 Morgan Park Monterey Chicago IL 60643
.org/community/distri...
Ave

6464 N http://home.chicagopolic
20 24 Rogers Park Chicago IL 60626
Clark St .org/community/distri...

5555 W http://home.chicagopolic
21 25 Grand Central Chicago IL 60639
Grand Ave .org/community/distri...

22 Headquarters Headquarters 3510 S Chicago IL 60653 http://home.chicagopolic


Michigan
district_nam
district address city state zip website
e

Ave .org

In [12]:
# store districts (excluding Headquarters) for later use
districts = stations.query("district != 'Headquarters'").district
districts
Out[12]:
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 11
11 12
12 14
13 15
14 16
15 17
16 18
17 19
18 20
19 22
20 24
21 25
Name: district, dtype: object
In [13]:
# Get the crimes table
query = "SELECT * \
FROM crimes"
cursor.execute(query)
data = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]
# create a data frame with column headers
chicago = pd.DataFrame(data,columns = colnames)
chicago.columns
Out[13]:
Index(['id', 'caseno', 'date_of_occurrence', 'block', 'iucr', 'primary_type',
'description', 'location_description', 'arrest', 'domestic', 'beat',
'district', 'ward', 'community_area', 'fbi_cd', 'xcoordinate',
'ycoordinate', 'year', 'updated_on', 'latitude', 'longitude',
'location', 'where_is'],
dtype='object')
In [14]:
for i in cursor.description:
print(i[0])
id
caseno
date_of_occurrence
block
iucr
primary_type
description
location_description
arrest
domestic
beat
district
ward
community_area
fbi_cd
xcoordinate
ycoordinate
year
updated_on
latitude
longitude
location
where_is
DATA DICTIONARY (CRIMES) (See https://www.kaggle.com/currie32/crimes-in-
chicago)
id - Unique identifier for the record.
caseno - The Chicago Police Department RD Number (Records Division
Number), which is unique to the incident.
date_of_occurence - Date when the incident occurred. this is sometimes a
best estimate.
**block - The partially redacted address where the incident occurred, placing
it on the same block as the actual address.
iucr - The Illinois Unifrom Crime Reporting code. This is directly linked to the
Primary Type and Description. See the list of IUCR codes at
https://data.cityofchicago.org/d/c7ck-438e.
primary_type - The primary description of the IUCR code.
description - The secondary description of the IUCR code, a subcategory of
the primary description.
location-description - Description of the location where the incident
occurred.
arrest - Indicates whether an arrest was made.
domestic - Indicates whether the incident was domestic-related as defined
by the Illinois Domestic Violence Act.
beat - Indicates the beat where the incident occurred. A beat is the smallest
police geographic area – each beat has a dedicated police beat car. Three to
five beats make up a police sector, and three sectors make up a police
district. The Chicago Police Department has 22 police districts. See the beats
at https://data.cityofchicago.org/d/aerh-rz74.
district - Indicates the police district where the incident occurred. See the
districts at https://data.cityofchicago.org/d/fthy-xz3r.
ward - The ward (City Council district) where the incident occurred. See the
wards at https://data.cityofchicago.org/d/sp34-6z76.
community_area - Indicates the community area where the incident
occurred. Chicago has 77 community areas. See the community areas at
https://data.cityofchicago.org/d/cauq-8yn6.
fbi_cd - Indicates the crime classification as outlined in the FBI's National
Incident-Based Reporting System (NIBRS). See the Chicago Police
Department listing of these classifications at
http://gis.chicagopolice.org/clearmap_crime_sums/crime_types.html.
xcoordinate - The x coordinate of the location where the incident occurred in
State Plane Illinois East NAD 1983 projection. This location is shifted from
the actual location for partial redaction but falls on the same block.
ycoordinate - The y coordinate of the location where the incident occurred in
State Plane Illinois East NAD 1983 projection. This location is shifted from
the actual location for partial redaction but falls on the same block.
year - Year the incident occurred.
updated_on - Date and time the record was last updated.
latitude - The latitude of the location where the incident occurred. This
location is shifted from the actual location for partial redaction but falls on
the same block.
longitude - The longitude of the location where the incident occurred. This
location is shifted from the actual location for partial redaction but falls on
the same block.
location - The location where the incident occurred in a format that allows
for creation of maps and other geographic operations on this data portal.
This location is shifted from the actual location for partial redaction but falls
on the same block.
In [15]:
chicago.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14436 entries, 0 to 14435
Data columns (total 23 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 14436 non-null int64
1 caseno 14436 non-null object
2 date_of_occurrence 14436 non-null datetime64[ns]
3 block 14436 non-null object
4 iucr 14436 non-null object
5 primary_type 14436 non-null object
6 description 14436 non-null object
7 location_description 14436 non-null object
8 arrest 14436 non-null bool
9 domestic 14436 non-null bool
10 beat 14436 non-null int64
11 district 14436 non-null int64
12 ward 14436 non-null float64
13 community_area 14436 non-null object
14 fbi_cd 14436 non-null object
15 xcoordinate 14436 non-null float64
16 ycoordinate 14436 non-null float64
17 year 14436 non-null int64
18 updated_on 14436 non-null datetime64[ns]
19 latitude 14436 non-null float64
20 longitude 14436 non-null float64
21 location 14436 non-null object
22 where_is 14436 non-null object
dtypes: bool(2), datetime64[ns](2), float64(5), int64(4), object(10)
memory usage: 2.3+ MB
In [16]:
chicago.head()
Out[16]:
date_of_oc primary_ location_de
id caseno block iucr description arrest
currence type scription

050XX S FIRST
2019-01-03
0 24368 JC103046 INDIANA 110 HOMICIDE DEGREE AUTO False
14:37:00
AVE MURDER

050XX S FIRST
2019-01-03
1 24369 JC103046 INDIANA 110 HOMICIDE DEGREE AUTO False
14:37:00
AVE MURDER

FIRST
2019-01-05 012XX N
2 24370 JC105864 110 HOMICIDE DEGREE STREET False
16:23:00 ADA ST
MURDER

060XX FIRST
2019-01-06
3 24372 JC107385 W 63RD 110 HOMICIDE DEGREE STREET False
22:23:00
ST MURDER

002XX FIRST
2019-01-07 PARKING
4 24373 JC108358 W 87TH 110 HOMICIDE DEGREE True
15:45:00 LOT
ST MURDER

5 rows × 23 columns
In [17]:
# use value_counts to see the values of the field
chicago['year'].value_counts()
Out[17]:
2019 14436
Name: year, dtype: int64
In [18]:
# what is in the field domestic
chicago['domestic'].value_counts()
Out[18]:
False 11974
True 2462
Name: domestic, dtype: int64
In [19]:
# let's look at how many crimes are domestic by primary type
pd.crosstab(chicago['primary_type'], chicago['domestic'], margins=True)
Out[19]:
domestic False True All

primary_type

ARSON 18 1 19

ASSAULT 774 317 1091

BATTERY 1328 1398 2726

BURGLARY 630 7 637

CONCEALED CARRY LICENSE VIOLATION 7 0 7

CRIM SEXUAL ASSAULT 79 15 94

CRIMINAL DAMAGE 1289 168 1457

CRIMINAL TRESPASS 436 25 461

DECEPTIVE PRACTICE 785 5 790

GAMBLING 3 0 3

HOMICIDE 15 0 15

HUMAN TRAFFICKING 3 0 3

INTERFERENCE WITH PUBLIC OFFICER 81 0 81

INTIMIDATION 12 4 16

KIDNAPPING 4 0 4

LIQUOR LAW VIOLATION 11 0 11

MOTOR VEHICLE THEFT 483 5 488

NARCOTICS 933 0 933

NON-CRIMINAL 1 0 1

OBSCENITY 0 1 1

OFFENSE INVOLVING CHILDREN 50 49 99

OTHER OFFENSE 678 326 1004


domestic False True All

primary_type

PROSTITUTION 58 0 58

PUBLIC PEACE VIOLATION 63 1 64

ROBBERY 513 11 524

SEX OFFENSE 41 3 44

STALKING 7 2 9

THEFT 3300 124 3424

WEAPONS VIOLATION 372 0 372

All 11974 2462 14436

In [20]:
# percentages would be more informative
# ------ lambda expressions section 4.7.6 ('syntactic sugar for a normal function
definition')
# https://docs.python.org/3/tutorial/controlflow.html
pd.crosstab(chicago['primary_type'], chicago['domestic']).apply(lambda x: x/x.sum(),
axis = 1)
Out[20]:
domestic False True

primary_type

ARSON 0.947368 0.052632

ASSAULT 0.709441 0.290559

BATTERY 0.487161 0.512839

BURGLARY 0.989011 0.010989

CONCEALED CARRY LICENSE VIOLATION 1.000000 0.000000

CRIM SEXUAL ASSAULT 0.840426 0.159574

CRIMINAL DAMAGE 0.884695 0.115305


domestic False True

primary_type

CRIMINAL TRESPASS 0.945770 0.054230

DECEPTIVE PRACTICE 0.993671 0.006329

GAMBLING 1.000000 0.000000

HOMICIDE 1.000000 0.000000

HUMAN TRAFFICKING 1.000000 0.000000

INTERFERENCE WITH PUBLIC OFFICER 1.000000 0.000000

INTIMIDATION 0.750000 0.250000

KIDNAPPING 1.000000 0.000000

LIQUOR LAW VIOLATION 1.000000 0.000000

MOTOR VEHICLE THEFT 0.989754 0.010246

NARCOTICS 1.000000 0.000000

NON-CRIMINAL 1.000000 0.000000

OBSCENITY 0.000000 1.000000

OFFENSE INVOLVING CHILDREN 0.505051 0.494949

OTHER OFFENSE 0.675299 0.324701

PROSTITUTION 1.000000 0.000000

PUBLIC PEACE VIOLATION 0.984375 0.015625

ROBBERY 0.979008 0.020992

SEX OFFENSE 0.931818 0.068182

STALKING 0.777778 0.222222

THEFT 0.963785 0.036215

WEAPONS VIOLATION 1.000000 0.000000


domestic False True

primary_type

Lets start executing different Queries¶


Query #1:¶
• Calculate the total number of crimes in every district and plot that on
Choropleth map. In other words, create a Choropleth map where the
districts are shaded in proportion to the number of crimes in that
district.
In [21]:
# Get the total number of crimes per district.
# Since each row corresponds to a crime, we just need to count the number of rows per
district.
cursor.execute("SELECT district, count(district) \
FROM crimes \
GROUP BY district")
rows = cursor.fetchall()
In [22]:
# Create a dataframe from the results of the query.
crimes_per_district = pd.DataFrame(rows, columns=['dist_num','number_of_crimes'])
crimes_per_district['dist_num'] = crimes_per_district['dist_num'].astype(str)

crimes_per_district
Out[22]:
dist_num number_of_crimes

0 24 412

1 11 1113

2 8 886

3 19 608

4 25 726

5 4 790

6 14 508

7 3 695

8 17 386
dist_num number_of_crimes

9 20 254

10 22 442

11 9 585

12 10 779

13 7 745

14 1 854

15 5 635

16 18 869

17 2 602

18 16 473

19 15 530

20 6 844

21 12 700

In [23]:
# For each district, excluding HQ, we get the latitude/longitude of the police station
at the district.
# With a few exceptions, will be inserting markers on the map at these police station
locations
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district
\
FROM police_stations \
WHERE district!='Headquarters'""")
police_stations = cursor.fetchall()

# Create a dataframe with this information.


station_locations =
pd.DataFrame(police_stations,columns=['st_latitude','st_longitude','dist_num']).drop_du
plicates()

station_locations
Out[23]:
st_latitude st_longitude dist_num

0 41.858373 -87.627356 1

1 41.801811 -87.630560 2

2 41.766431 -87.605748 3

3 41.707933 -87.568349 4

4 41.692723 -87.604506 5

5 41.752137 -87.644229 6

6 41.779632 -87.660887 7

7 41.778987 -87.708864 8

8 41.837394 -87.646408 9

9 41.856685 -87.708382 10

10 41.873582 -87.705488 11

11 41.862977 -87.656973 12

12 41.921103 -87.697452 14

13 41.880083 -87.768200 15

14 41.974094 -87.766149 16

15 41.966053 -87.728115 17

16 41.903242 -87.643352 18

17 41.947400 -87.651512 19

18 41.979550 -87.692845 20

19 41.691435 -87.668520 22

20 41.999763 -87.671324 24

21 41.918609 -87.765574 25

In [24]:
# Add columns for the coordinates of each distract. Again, this is going to be the
locations of our markers.
query1_df = pd.merge(crimes_per_district, station_locations,on='dist_num')
query1_df
Out[24]:
dist_num number_of_crimes st_latitude st_longitude

0 24 412 41.999763 -87.671324

1 11 1113 41.873582 -87.705488

2 8 886 41.778987 -87.708864

3 19 608 41.947400 -87.651512

4 25 726 41.918609 -87.765574

5 4 790 41.707933 -87.568349

6 14 508 41.921103 -87.697452

7 3 695 41.766431 -87.605748

8 17 386 41.966053 -87.728115

9 20 254 41.979550 -87.692845

10 22 442 41.691435 -87.668520

11 9 585 41.837394 -87.646408

12 10 779 41.856685 -87.708382

13 7 745 41.779632 -87.660887

14 1 854 41.858373 -87.627356

15 5 635 41.692723 -87.604506

16 18 869 41.903242 -87.643352

17 2 602 41.801811 -87.630560

18 16 473 41.974094 -87.766149

19 15 530 41.880083 -87.768200


dist_num number_of_crimes st_latitude st_longitude

20 6 844 41.752137 -87.644229

21 12 700 41.862977 -87.656973

Creating the folium maps¶


In [25]:
# Create a folium map centered on "downtown Chicago"
total_number_of_crimes_per_district_map = folium.Map(location = (41.8781,
-87.6298),zoom_start = 11)
In [26]:
# Create a folium map centered on "downtown Chicago". The boundaries of each distrct
are stored in "Boundaries.geojson".
# Use this data to draw the district regions on the folium map.
# Then shade each region "orange red" with the intensity of the color corresponding to
the number of crimes.
# Remember the number of crimes per district is stored in the number_of_crimes column
of crimes_per_district.
# We add this "choropleth layer" to our folium map.
folium.Choropleth(geo_data = "Boundaries.geojson",
fill_color = 'OrRd',
fill_opacity =0.5,
line_opacity= 1,
data = crimes_per_district,
key_on = 'feature.properties.dist_num',
columns = ['dist_num', 'number_of_crimes'],
legend_name = "CRIMES PER DISTRICT"
).add_to(total_number_of_crimes_per_district_map)
Out[26]:
<folium.features.Choropleth at 0x7fadeaa7dc70>
In [27]:
# We iterate over the rows in the dataframe creating one marker per row.
# The ("popup") label for each district marker displays the district number and number
of crimes.
# Each distric marker is added to the map at the district station's location.
for index, row in query1_df.iterrows():
# location of police station
police_station_location = (row['st_latitude'],row['st_longitude'])
district = row['dist_num']
num_crimes = row['number_of_crimes']

# Create the marker and add it to the map.


folium.Marker(location = police_station_location,
popup =\
folium.Popup(html=f"District No : {district} has Total Number of Crimes:
{num_crimes}",max_width=450)).\
add_to(total_number_of_crimes_per_district_map)
• Lets plot the Choropleth map and notice the intensity of color on the
different districts
• The Blue POPUP represents the location of police station in the
different districts in the map
In [28]:
# Display the finished map.
total_number_of_crimes_per_district_map
Out[28]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Query #2:¶
• Calculate the total number of violent crimes in every district and plot
that in a table on Choropleth map. In other words, create a Choropleth
map where the districts are shaded in proportion to the number of
violent crimes in that district.
Note: A crime is considered a violent crime if the PRIMARY_TYPE of the
crimes is THEFT ,ASSAULT,ROBBER,KIDNAPPING,CRIM SEXUAL ASSAULT,BATTERY, or
MURDER.
• Then find the total number of crimes in the district for each of these
primary types of violent crime and add a popup marker (located at that
district's police headquarter) that displays a DataFrame containing this
data.

In [29]:
violent_crime_categories = 'THEFT','ASSAULT','ROBBERY','KIDNAPPING','CRIM SEXUAL
ASSAULT','BATTERY','MURDER'
In [30]:
# Get the total number of violent crimes per district.
cursor.execute("SELECT district, count(district)\
FROM crimes \
WHERE PRIMARY_TYPE in %s \
GROUP BY district",[violent_crime_categories])
rows=cursor.fetchall()
violent_crime_data=pd.DataFrame(rows, columns=['dist_num','number_of_violent_crimes'])
violent_crime_data['dist_num'] = violent_crime_data['dist_num'].astype(str)
violent_crime_data
Out[30]:
dist_num number_of_violent_crimes

0 8 488
dist_num number_of_violent_crimes

1 11 407

2 24 216

3 19 368

4 25 362

5 4 400

6 14 309

7 3 387

8 17 220

9 20 147

10 22 217

11 9 312

12 7 367

13 10 362

14 1 624

15 5 293

16 18 605

17 2 349

18 16 252

19 15 291

20 6 457

21 12 430

Now, lets create a dataframe of the different types of violent crimes for
every district
In [31]:
cursor.execute(f"SELECT district, PRIMARY_TYPE, count(PRIMARY_TYPE) \
FROM crimes \
WHERE PRIMARY_TYPE in {violent_crime_categories} \
GROUP BY distrIct,PRIMARY_TYPE")
rows=cursor.fetchall()
violent_crime_type = pd.DataFrame(rows,
columns=['dist_num','Description','number_of_violent_crimes'])
violent_crime_type['dist_num'] = violent_crime_type['dist_num'].astype(str)
violent_crime_type
Out[31]:
dist_num Description number_of_violent_crimes

0 24 ROBBERY 15

1 3 CRIM SEXUAL ASSAULT 9

2 8 BATTERY 174

3 12 ASSAULT 38

4 18 THEFT 417

... ... ... ...

108 5 BATTERY 123

109 10 ASSAULT 63

110 18 ROBBERY 33

111 5 ASSAULT 62

112 22 CRIM SEXUAL ASSAULT 2

113 rows × 3 columns


In [32]:
# This time our dataframe will contain one row for each district and each type of
violent crime.
# For each distrct number and type of violent crime our dataframe will have the total
of violent crimes of that type.
# We once again include the stations location since that is where the markers will
appear on the map.
query2_df = pd.merge(violent_crime_type, station_locations,on='dist_num')
query2_df
Out[32]:
dist_nu number_of_violent_cr
Description st_latitude st_longitude
m imes

0 24 ROBBERY 15 41.999763 -87.671324

1 24 THEFT 96 41.999763 -87.671324

2 24 BATTERY 79 41.999763 -87.671324

3 24 ASSAULT 23 41.999763 -87.671324

CRIM SEXUAL
4 24 3 41.999763 -87.671324
ASSAULT

... ... ... ... ... ...

CRIM SEXUAL
108 14 3 41.921103 -87.697452
ASSAULT

109 14 ASSAULT 31 41.921103 -87.697452

110 14 ROBBERY 16 41.921103 -87.697452

111 14 BATTERY 48 41.921103 -87.697452

112 14 THEFT 211 41.921103 -87.697452

113 rows × 5 columns


In [33]:
districts
Out[33]:
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 11
11 12
12 14
13 15
14 16
15 17
16 18
17 19
18 20
19 22
20 24
21 25
Name: district, dtype: object
Create a folium map centered on "downtown Chicago". The boundaries of
each distrct are stored in "Boundaries.geojson". Use this data to draw the
district regions on the folium map. Then shade each region "orange red" with
the intensity of the color corresponding to the number of crimes. Remember
the number of crimes per district is stored in the number_of_crimes column
of crimes_per_district. We add this "chropleth layer" to our folium map.

Creating the folium maps¶


In [34]:
# Create a folium map centered on "downtown Chicago". The boundaries of each distrct
are stored in "Boundaries.geojson".
# Use this data to draw the district regions on the folium map.
# Then shade each region "orange red" with the intensity of the color corresponding to
the number of crimes.
# Remember the number of crimes per district is stored in the number_of_crimes column
of crimes_per_district.
# We add this "choropleth layer" to our folium map
violent_crimes_per_district_map= folium.Map(location =(41.8781,
-87.6298),zoom_start=11)
folium.Choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
data = violent_crime_data, #data source changed from first example
key_on='feature.properties.dist_num',
columns = ['dist_num', 'number_of_violent_crimes'],
legend_name = "VIOLENT CRIMES PER DISTRICT"
).add_to(violent_crimes_per_district_map)
Out[34]:
<folium.features.Choropleth at 0x7fadeb771250>
In addition, for each district find the block(s) that has the highest number of
gun crimes in that district. Note that there might be a tie for the highest
number of gun crimes. You need to find all such blocks. Add a popup marker
(located at that district's police headquarter) that displays a DataFrame
containing all such block along with the number of gun crimes for that block
(i.e. the highest number of crimes for a district).
In [35]:
# We iterate over district numbers and get all the rows in query2_df with that district
number.
# This new dataframe contains the number of crimes for each of violent crime types in a
particular district.
# We convert this dataframe to "html" which will be displayed in marker label (along
with the district number)
# Each distric marker is added to the map at the district station's location like
before.

#districts = query2_df.dist_num.unique()
for district in districts:
df2d = query2_df[query2_df.dist_num == district] # new dataframe to add to marker.

police_station_location = tuple(df2d[['st_latitude','st_longitude']].values[0])

violent_crimes_per_district_df = df2d[['Description','number_of_violent_crimes']]
# Convert out new data frame to html before inserting it into the label.
header = violent_crimes_per_district_df.to_html(classes=
'table table-striped table-hover table-condensed table-responsive')
# Create the folium marker and add it to the map.
folium.Marker(location=police_station_location, popup=folium.\
Popup(html=f"District Number {district} - Violent Crimes
{header}")).\
add_to(violent_crimes_per_district_map)

# Create the marker and add it to the map.


folium.Marker(location = police_station_location,
popup =\
folium.Popup(html=f"District No : {district} - Violent Crimes
{header}",max_width=450)).\
add_to(violent_crimes_per_district_map)
In [36]:
# Display the finished map.
violent_crimes_per_district_map
Out[36]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Query #3:¶
• Calculate the total number of gun related violent crimes in every
district and plot that in a table on Choropleth map.
Note: A crime is considered a gun related violent crime if the word "gun" is
contained in the DESCRIPTION and the PRIMARY_TYPE of the crimes is
THEFT ,ASSAULT,ROBBER,KIDNAPPING,CRIM SEXUAL ASSAULT,BATTERY, or MURDER.
• Then find the total number of crimes in the district for the different
DESCRIPTIONs containing the word "gun" and add a popup marker
(located at that district's police headquarter) that displays a DataFrame
containing this data.
Lets first create a dataframe of gun crimes per district first to get an idea
about the number of gun crimes per district
In [37]:
# isolate violent gun crime
gun='%GUN%'
cursor.execute(f"SELECT district, count(district)\
FROM crimes\
WHERE PRIMARY_TYPE in {violent_crime_categories}\
AND DESCRIPTION::text LIKE '{gun}' GROUP BY district")
districts_violent_gun_crimes = cursor.fetchall()
districts_violent_gun_crimes_df = pd.DataFrame(districts_violent_gun_crimes,
columns=['dist_num','violent_gun_crimes'])
districts_violent_gun_crimes_df['dist_num'] =
districts_violent_gun_crimes_df['dist_num'].astype(str)
districts_violent_gun_crimes_df.head()
Out[37]:
dist_num violent_gun_crimes

0 24 9

18 38

2 11 36

3 19 9

4 25 25

Now, lets create a dataframe of the different types of violent gun crimes for
every district
In [38]:
cursor.execute(f"SELECT district, DESCRIPTION, count(DESCRIPTION) \
FROM crimes \
WHERE PRIMARY_TYPE in {violent_crime_categories} AND DESCRIPTION::text
LIKE '{gun}' \
GROUP BY district, DESCRIPTION")
rows=cursor.fetchall()
violent_gun_crime_type = pd.DataFrame(rows,
columns=['dist_num','Description','number_of_violent_gun_crimes'])
violent_gun_crime_type['dist_num'] = violent_gun_crime_type['dist_num'].astype(str)
violent_gun_crime_type.head()
Out[38]:
dist_num Description number_of_violent_gun_crimes

08 ARMED: HANDGUN 23

1 11 AGGRAVATED: HANDGUN 15

26 AGGRAVATED: HANDGUN 15

3 20 ATTEMPT: ARMED-HANDGUN 1

4 19 AGGRAVATED: HANDGUN 5

In [39]:
query3_df = pd.merge(violent_gun_crime_type, station_locations, on = 'dist_num')
query3_df.head()
Out[39]:
dist_nu number_of_violent_gu st_latitud st_longitud
Description
m n_crimes e e

08 ARMED: HANDGUN 23 41.778987 -87.708864


dist_nu number_of_violent_gu st_latitud st_longitud
Description
m n_crimes e e

ATTEMPT: ARMED-
18 2 41.778987 -87.708864
HANDGUN

AGGRAVATED:
28 13 41.778987 -87.708864
HANDGUN

AGGRAVATED:
3 11 15 41.873582 -87.705488
HANDGUN

4 11 ARMED: HANDGUN 19 41.873582 -87.705488

Creating the folium maps¶


In [40]:
# Create a folium map centered on "downtown Chicago". The boundaries of each distrct
are stored in "Boundaries.geojson".
# Use this data to draw the district regions on the folium map.
# Then shade each region "orange red" with the intensity of the color corresponding to
the number of crimes.
# Remember the number of crimes per district is stored in the number_of_crimes column
of crimes_per_district.
# We add this "choropleth layer" to our folium map.# Create a folium map centered on
"downtown Chicago"
districts_violent_gun_crimes_map = folium.Map(location =(41.8781,
-87.6298),zoom_start=11)
folium.Choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
data = districts_violent_gun_crimes_df,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'violent_gun_crimes'],
legend_name="VIOLENT GUN CRIME"
).add_to(districts_violent_gun_crimes_map)
Out[40]:
<folium.features.Choropleth at 0x7fadeca1b070>
In [41]:
# districts = query3_df.dist_num.unique()
for district in districts:
df3d = query3_df[query3_df.dist_num == district]

police_station_location = tuple(df3d[['st_latitude','st_longitude']].values[0])

violent_gun_crimes_per_district_df =
df3d[['Description','number_of_violent_gun_crimes']]

header = violent_gun_crimes_per_district_df.to_html(classes=
'table table-striped table-hover table-condensed table-responsive')
folium.Marker(location=police_station_location, popup=folium.\
Popup(html=f"District Number {district} - Violent GUN Crimes
{header}")).\
add_to(districts_violent_gun_crimes_map)

folium.Marker(location = police_station_location,
popup =\
folium.Popup(html=f"District No : {district} - Violent Gun Crimes
{header}",max_width=450)).\
add_to(districts_violent_gun_crimes_map)
In [42]:
districts_violent_gun_crimes_map
Out[42]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Query #4:¶
• Calculate the crime density per district
Boundaries.geojson: { "type": "FeatureCollection", "features":
[ {"type":"Feature","properties":{"dist_num":"17","dist_label":"17TH"},
"geometry": {"type":"MultiPolygon","coordinates": [[[[-
87.71067089391354,41.997365655369435], [-
87.71066884721016,41.99729359357709], [-
87.71066053080999,41.997225765680135], ....
In [43]:
district = []
tarea=[]

with open('Boundaries.geojson') as f:
data = json.load(f)
a = data['features'] # a is a list of district data (dictionaries)
for i in range(len(a)):
obj = a[i]['geometry'] # list of coordinates defining the ith district boundary
n = a[i]['properties'] # dictionary with district number and district label
keyes for ith district
district.append(n['dist_num']) # add district number to the district list
tarea.append(area(obj)/10000) # add the the area (in hectares) to area list

af = pd.DataFrame({'dist_num': district,'district_area_inHectares':tarea})
af['dist_num'] = af['dist_num'].astype(str)
final_data= pd.merge(af, crimes_per_district, on='dist_num', how='inner')
final_data['crime_density'] = round(final_data['number_of_crimes']/
(final_data['district_area_inHectares']/100))
final_data
Out[43]:
dist_num district_area_inHectares number_of_crimes crime_density

0 17 2492.727155 386 15.0

1 20 1132.170216 254 22.0


dist_num district_area_inHectares number_of_crimes crime_density

2 19 2225.035732 608 27.0

3 25 2827.989237 726 26.0

4 14 1555.869965 508 33.0

5 7 1688.670732 745 44.0

6 3 1576.063931 695 44.0

7 4 7068.152865 790 11.0

8 6 2099.682124 844 40.0

9 22 3490.416073 442 13.0

10 5 3318.613379 635 19.0

11 24 1406.081387 412 29.0

12 16 8171.776367 473 6.0

13 8 5992.169760 886 15.0

14 18 1215.520046 869 71.0

15 12 2509.453028 700 28.0

16 11 1582.727274 1113 70.0

17 15 989.631393 530 54.0

18 10 2038.988883 779 38.0

19 1 1214.818895 854 70.0

20 9 3505.216898 585 17.0

21 2 1949.690970 602 31.0

Query #5:¶
• Create Marker Clusters on Choropleth map for those gun related crimes
that resulted in arrest (green icon) and those that didn't (red icon)
In [44]:
# isolate gun crime
gun='%GUN%'
cursor.execute(f"SELECT district, count(district)\
FROM crimes\
WHERE DESCRIPTION::text LIKE '{gun}' GROUP BY district")
districts_gun_crimes = cursor.fetchall()
districts_gun_crimes_df = pd.DataFrame(districts_gun_crimes,
columns=['dist_num','gun_crimes'])
districts_gun_crimes_df['dist_num'] = districts_gun_crimes_df['dist_num'].astype(str)
districts_gun_crimes_df.head()
Out[44]:
dist_num gun_crimes

0 24 9

18 49

2 11 69

3 19 11

4 25 43

In [45]:
# This time our dataframe will contain one row for each district and each block in the
district.
# Each row contains information about a particular gun crime including the block in
which the crime occured.
# We also save the location of the crime incidendent and whether there was an arrest.
# We will use this information when creating the markers.
# In particular, markers will be positioned at the crime location instead of police
station location.
data= []
for district in districts:
cursor.execute("""SELECT DISTINCT ON(caseno) caseno, block, DESCRIPTION,
count(arrest), arrest,latitude,longitude\
FROM crimes\
WHERE district = %s and DESCRIPTION::text LIKE %s \
GROUP BY caseno, block, DESCRIPTION,arrest, latitude, longitude""",
[district,gun])
results = cursor.fetchall()

# insert the district number as the first element of each result list
gun_crimes_per_district = [[district]+list(result) for result in results]
data += (gun_crimes_per_district)

gun_crimes_per_district_df = pd.DataFrame(data, columns=['dist_num','caseno','block',\


'Description','arrest_count', 'arrest', 'latitude', 'longitude'])
gun_crimes_per_district_df['dist_num'] =
gun_crimes_per_district_df['dist_num'].astype(str)
gun_crimes_per_district_df.head()
Out[45]:
dist_ arrest_
caseno block Description arrest latitude longitude
num count

004XX W
AGGRAVATED:
01 JC105034 VAN BUREN 1 False 41.876794 -87.637817
HANDGUN
ST

ATTEMPT:
008XX S
11 JC111601 ARMED- 1 False 41.871096 -87.629570
PARK TER
HANDGUN

ATTEMPT:
0000X W
21 JC111881 ARMED- 1 False 41.878802 -87.627917
QUINCY ST
HANDGUN

022XX S
AGGRAVATED:
31 JC118978 MICHIGAN 1 True 41.852284 -87.623790
HANDGUN
AVE

029XX S UNLAWFUL
41 JC119606 DEARBORN POSS OF 1 True 41.841856 -87.628742
ST HANDGUN

Creating the folium maps¶


In [46]:
# Create a folium map centered on "downtown Chicago". The boundaries of each distrct
are stored in "Boundaries.geojson".
# Use this data to draw the district regions on the folium map.
# Then shade each region "orange red" with the intensity of the color corresponding to
the number of crimes.
# Remember the number of crimes per district is stored in the number_of_crimes column
of crimes_per_district.
# We add this "choropleth layer" to our folium map.
gun_crime_arrests_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
folium.Choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
data = districts_gun_crimes_df, #using data from Query 3
key_on='feature.properties.dist_num',
columns = ['dist_num', 'gun_crimes'],
legend_name="GUN CRIME"
).add_to(gun_crime_arrests_map)
marker_cluster = MarkerCluster().add_to(gun_crime_arrests_map)
In [47]:
# We create a marker for each gun crime--"green" if there was an arrest and "red"
otherwise.

gun='%GUN%'
for district in districts:
gun_crimes_per_district_df2 =
gun_crimes_per_district_df[gun_crimes_per_district_df.dist_num==district]

for index, row in gun_crimes_per_district_df2.iterrows():


description = row['Description']
block = row['block']
loc = (row['latitude'],row['longitude'])
# print("District No: %s <br> Description: %s <br> Block: %s" %
(district,description,block))
if row['arrest']==True:
folium.Marker(location=loc,popup = folium.Popup(html=\
f"District No: {district} <br> Description: {description} <br> Block:
{block}"),\
icon=folium.Icon(color='green', icon='ok-
sign'),).add_to(marker_cluster)
# print(f"District {district} green marker at {loc}")
else:
folium.Marker(location=loc, popup = folium.Popup(html=
f"District No: {district} <br> Description: {description} <br> Block:
{block}"),\
icon=folium.Icon(color='red',icon='remove-
sign'),).add_to(marker_cluster)

In [48]:
gun_crime_arrests_map
Out[48]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Query #6:¶
• Plot on Choropleth map the farthest Block that has a gun crime from
every police station in every district
Locate the farthest gun crime from the police station in every district. Create
a Choropleth map where the districts are shaded in proportion to the number
of gun crimes in that district. For each district, find the gun crime that was
farthest from police station. Add a pop-up on the Choropleth map to display
the district number and the Block where the farthest gun crime occurred.
Also add circle marker (of radius 5) at the location of the farthest gun crime.
In [49]:
# This complex query returns for each block in each district the gun crime which
occured furthest
# from the police station. We also store this distance together with the police station
location.
# Once again we will be positioning the markers at the police station locations.
results= []
for district in districts:
cursor.execute("""SELECT DISTINCT on (A.block) A.district,A.block,
A.where_is,ST_Distance(A.where_is,B.where_is)\
FROM crimes as A, police_stations as B
WHERE A.district=%s and DESCRIPTION::text LIKE %s and B.district= %s
and ST_Distance(A.where_is,B.where_is)
IN
( SELECT max(dist)
FROM
(SELECT ST_Distance(A.where_is,B.where_is) as dist
FROM crimes as A, police_stations as B
WHERE A.district=%s and DESCRIPTION::text LIKE %s and B.district= %s ) as
f)""",
[district,gun,district,district,gun, district])

farthest_block_gun_crime = list(cursor.fetchall()[0])
cursor.execute(f"SELECT ST_X(ST_AsText(%s)), ST_Y(ST_AsText(%s))"
,(farthest_block_gun_crime[2],farthest_block_gun_crime[2]))
farthest_block_gun_crime[2:3] = cursor.fetchall()[0]

results.append(farthest_block_gun_crime)

farthest_gun_crime_df =
pd.DataFrame(results,columns=['dist_num','block','crime_lat','crime_long','dist'])
farthest_gun_crime_df['dist_num']= farthest_gun_crime_df['dist_num'].astype(str)
query6_df = pd.merge(farthest_gun_crime_df, station_locations,on='dist_num')
query6_df.head()

Out[49]:
dist_n crime_lon st_latitud st_longitud
block crime_lat dist
um g e e

004XX W VAN
01 41.876794 -87.637817 1171.461889 41.858373 -87.627356
BUREN ST

055XX S
12 41.794425 -87.582519 5365.880774 41.801811 -87.630560
EVERETT AVE

026XX E
23 41.759100 -87.560145 5093.598187 41.766431 -87.605748
75TH ST

101XX S
34 41.711374 -87.535281 3693.438062 41.707933 -87.568349
EWING AVE

118XX S
45 41.678829 -87.658995 6086.359525 41.692723 -87.604506
LAFLIN ST

Creating the folium maps¶


In [50]:
# Create a folium map centered on "downtown Chicago". The boundaries of each distrct
are stored in "Boundaries.geojson".
# Use this data to draw the district regions on the folium map.
# Then shade each region "orange red" with the intensity of the color corresponding to
the number of crimes.
# Remember the number of crimes per district is stored in the number_of_crimes column
of crimes_per_district.
# We add this "choropleth layer" to our folium map.
farthest_block_gun_crime_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
folium.Choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
data = districts_gun_crimes_df,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'gun_crimes'],
legend_name="GUN CRIME"
).add_to(farthest_block_gun_crime_map)

Out[50]:
<folium.features.Choropleth at 0x7faded14a730>
In [51]:
# This time we place a circle markers at the crime locations in addition to the
"regular" markers at the
# police station locations.

for district in districts:

data = query6_df.loc[query6_df.dist_num==district]
# print(data)
# print(data['block'].values)

block, crime_lat, crime_long, dist, police_lat, police_long =\


data['block'].values[0], data['crime_lat'].values[0],
data['crime_long'].values[0],\
data['dist'].values[0], data['st_latitude'].values[0],
data['st_longitude'].values[0]

folium.CircleMarker((crime_lat,crime_long),radius=5,color='#ff3187',
popup=folium.Popup(html=f"District No.: {district} <br> Block:
{block}")).\
add_to(farthest_block_gun_crime_map)

folium.Marker(location=(police_lat,police_long),
popup=folium.Popup(\
html=f"Police Station <br> District No.: {district} <br> Farthest
Gun_Crime Block:{block}")).\
add_to(farthest_block_gun_crime_map)

farthest_block_gun_crime_map
Out[51]:
Make this Notebook Trusted to load map: File -> Trust Notebook
# Requirements ** The HTML document your are submitting must have the
source code and the output for the following requirements **
Requirement #1: (10 points)¶
• Use SQL to select the following fields from crimes - district, ward,
arrest, primary_type and location_description. Select the records for
those crimes that occurred in a RESIDENCE.
• Iterate over the results to display the primary_type of each crime within
District 1.
• Your output should look like the sample output shown below.
Sample Output: Primary Type: BURGLARY Primary Type: THEFT Primary Type:
BATTERY Primary Type: THEFT Primary Type: BATTERY Primary Type:
DECEPTIVE PRACTICE Primary Type: DECEPTIVE PRACTICE Primary Type:
OTHER OFFENSE Primary Type: BATTERY Primary Type: DECEPTIVE PRACTICE
Primary Type: THEFT Primary Type: DECEPTIVE PRACTICE Primary Type:
DECEPTIVE PRACTICE Primary Type: OTHER OFFENSE Primary Type: THEFT
Primary Type: DECEPTIVE PRACTICE Primary Type: CRIMINAL D
In [52]:
residence='%RESIDENCE%'
cursor.execute(f"SELECT district, ward, arrest, primary_type, location_description \
FROM crimes\
WHERE location_description LIKE '{residence}'")
residence_crimes = cursor.fetchall()
residence_crimes_df = pd.DataFrame(residence_crimes,
columns=['district','ward','arrest','primary_type','location_description'])
residence_crimes_df['district'] = residence_crimes_df['district'].astype(str)
residence_crimes_df.head()

for idx, row in residence_crimes_df.iterrows():


if row['district'] == "1":
print(f"Primary Type: {row['primary_type']}")
Primary Type: BURGLARY
Primary Type: THEFT
Primary Type: BATTERY
Primary Type: THEFT
Primary Type: BATTERY
Primary Type: DECEPTIVE PRACTICE
Primary Type: DECEPTIVE PRACTICE
Primary Type: THEFT
Primary Type: OTHER OFFENSE
Primary Type: BATTERY
Primary Type: DECEPTIVE PRACTICE
Primary Type: THEFT
Primary Type: THEFT
Primary Type: THEFT
Primary Type: DECEPTIVE PRACTICE
Primary Type: DECEPTIVE PRACTICE
Primary Type: NARCOTICS
Primary Type: OTHER OFFENSE
Primary Type: THEFT
Primary Type: CRIMINAL DAMAGE
Primary Type: THEFT
Primary Type: DECEPTIVE PRACTICE
Primary Type: CRIMINAL DAMAGE
Primary Type: THEFT
Primary Type: THEFT
Requirement #2: (10 points)¶
• Use SQL to select all rows of data with the following fields from crimes -
district, ward, arrest, primary_type and location_description.
• Create a dataframe called new_df with the selected fields.
• Make sure you name the columns in the dataframe.
• Show your new_df using info().
In [53]:
cursor.execute(f"SELECT district, ward, arrest, primary_type, location_description \
FROM crimes")
crimes_query = cursor.fetchall()
new_df = pd.DataFrame(crimes_query,
columns=['district','ward','arrest','primary_type','location_description'])
new_df['district'] = residence_crimes_df['district'].astype(str)
new_df.head()
Out[53]:
district ward arrest primary_type location_description

06 3.0 False HOMICIDE AUTO

1 10 3.0 False HOMICIDE AUTO

28 2.0 False HOMICIDE STREET

3 17 13.0 False HOMICIDE STREET

48 21.0 True HOMICIDE PARKING LOT

Requirement #3: (10 points)¶


• Using new_df, show a crosstab of primary_type with the variable called
arrest; Show the results as percentages.
• Comment on what the results for the crime type of Assault tell us
In [54]:
pd.crosstab(new_df['primary_type'], new_df['arrest']).apply(lambda x: x/x.sum(), axis =
1)
Out[54]:
arrest False True

primary_type

ARSON 1.000000 0.000000

ASSAULT 0.806599 0.193401

BATTERY 0.798239 0.201761

BURGLARY 0.990581 0.009419

CONCEALED CARRY LICENSE VIOLATION 0.000000 1.000000


arrest False True

primary_type

CRIM SEXUAL ASSAULT 0.957447 0.042553

CRIMINAL DAMAGE 0.952642 0.047358

CRIMINAL TRESPASS 0.407809 0.592191

DECEPTIVE PRACTICE 0.959494 0.040506

GAMBLING 0.000000 1.000000

HOMICIDE 0.866667 0.133333

HUMAN TRAFFICKING 1.000000 0.000000

INTERFERENCE WITH PUBLIC OFFICER 0.049383 0.950617

INTIMIDATION 1.000000 0.000000

KIDNAPPING 1.000000 0.000000

LIQUOR LAW VIOLATION 0.000000 1.000000

MOTOR VEHICLE THEFT 0.959016 0.040984

NARCOTICS 0.000000 1.000000

NON-CRIMINAL 1.000000 0.000000

OBSCENITY 0.000000 1.000000

OFFENSE INVOLVING CHILDREN 0.979798 0.020202

OTHER OFFENSE 0.767928 0.232072

PROSTITUTION 0.000000 1.000000

PUBLIC PEACE VIOLATION 0.265625 0.734375

ROBBERY 0.954198 0.045802

SEX OFFENSE 0.931818 0.068182


arrest False True

primary_type

STALKING 0.777778 0.222222

THEFT 0.898364 0.101636

WEAPONS VIOLATION 0.322581 0.677419

In [55]:
'''Based on the table data above, we can observe that roughly 80% of crimes involving
assault do not result in
the arrest of the perpetrator. This is a casted observation across all districts in the
City of Chicago and is
not specific to geographic location within the city.'''
Out[55]:
'Based on the table data above, we can observe that roughly 80% of crimes involving
assault do not result in\nthe arrest of the perpetrator. This is a casted observation
across all districts in the City of Chicago and is\nnot specific to geographic location
within the city.'
Requirement #4: (10 points)¶
• Calculate the gun crimes density in every district. (See Query 4.)
In [56]:
cursor.execute("SELECT district, count(district) \
FROM crimes \
WHERE DESCRIPTION::text LIKE '%GUN%' \
GROUP BY district")
rows = cursor.fetchall()
gun_crimes_per_district = pd.DataFrame(rows,
columns=['dist_num','number_of_gun_crimes'])
gun_crimes_per_district['dist_num'] = gun_crimes_per_district['dist_num'].astype(str)

gun_crimes_per_district
Out[56]:
dist_num number_of_gun_crimes

0 24 9

1 8 49

2 11 69

3 19 11

4 25 43
dist_num number_of_gun_crimes

5 4 50

6 14 12

7 3 57

8 17 7

9 20 5

10 22 24

11 10 58

12 9 39

13 7 67

14 1 6

15 5 44

16 18 14

17 2 34

18 16 12

19 15 34

20 6 63

21 12 38

In [57]:
district = []
tarea=[]

with open('Boundaries.geojson') as f:
data = json.load(f)
a = data['features']
for i in range(len(a)):
obj = a[i]['geometry']
n = a[i]['properties']
district.append(n['dist_num'])
tarea.append(area(obj)/10000)
area_frame = pd.DataFrame({'dist_num': district,'district_area_inHectares':tarea})
area_frame['dist_num'] = area_frame['dist_num'].astype(str)
final_data= pd.merge(area_frame, gun_crimes_per_district, on='dist_num', how='inner')
final_data['crime_density'] = round(final_data['number_of_gun_crimes']/
(final_data['district_area_inHectares']/100),5)
final_data
Out[57]:
dist_num district_area_inHectares number_of_gun_crimes crime_density

0 17 2492.727155 7 0.28082

1 20 1132.170216 5 0.44163

2 19 2225.035732 11 0.49437

3 25 2827.989237 43 1.52051

4 14 1555.869965 12 0.77127

5 7 1688.670732 67 3.96762

6 3 1576.063931 57 3.61660

7 4 7068.152865 50 0.70740

8 6 2099.682124 63 3.00045

9 22 3490.416073 24 0.68760

10 5 3318.613379 44 1.32585

11 24 1406.081387 9 0.64008

12 16 8171.776367 12 0.14685

13 8 5992.169760 49 0.81773

14 18 1215.520046 14 1.15177

15 12 2509.453028 38 1.51427

16 11 1582.727274 69 4.35956

17 15 989.631393 34 3.43562

18 10 2038.988883 58 2.84455
dist_num district_area_inHectares number_of_gun_crimes crime_density

19 1 1214.818895 6 0.49390

20 9 3505.216898 39 1.11263

21 2 1949.690970 34 1.74387

For Requirements 5 and 6, follow the steps illustrated in the Queries:


• Create the data frame with the appropriate marker data (10 pts).
• Create the folium base map (3 pts).
• Add the Choropleth layer (10 pts).
• Iterate over the data frame you created to place the markers on the
folium map (10 pts).
• Show the final map (2 pts).
Requirement #5: (35 points)¶
• Locate the Block that has the highest number of gun crimes. The popup
on Choropleth map shall display the Block in every district along with
the total number of gun crimes for that block
In [58]:
# by district per the announcement
cursor.execute("SELECT district, count(district) \
FROM crimes \
WHERE DESCRIPTION::text LIKE '%GUN%' \
GROUP BY district")
rows = cursor.fetchall()

crimes_per_district = pd.DataFrame(rows, columns=['dist_num','number_of_gun_crimes'])


crimes_per_district['dist_num'] = crimes_per_district['dist_num'].astype(str)
crimes_per_district
Out[58]:
dist_num number_of_gun_crimes

0 24 9

1 8 49

2 11 69

3 19 11

4 25 43

5 4 50

6 14 12

7 3 57
dist_num number_of_gun_crimes

8 17 7

9 20 5

10 22 24

11 10 58

12 9 39

13 7 67

14 1 6

15 5 44

16 18 14

17 2 34

18 16 12

19 15 34

20 6 63

21 12 38

In [59]:
# using police stations now since we are no longer using blocks for geo location
plotting
query5_df = pd.merge(crimes_per_district, station_locations,on='dist_num')
query5_df
Out[59]:
dist_num number_of_gun_crimes st_latitude st_longitude

0 24 9 41.999763 -87.671324

1 8 49 41.778987 -87.708864

2 11 69 41.873582 -87.705488

3 19 11 41.947400 -87.651512

4 25 43 41.918609 -87.765574
dist_num number_of_gun_crimes st_latitude st_longitude

5 4 50 41.707933 -87.568349

6 14 12 41.921103 -87.697452

7 3 57 41.766431 -87.605748

8 17 7 41.966053 -87.728115

9 20 5 41.979550 -87.692845

10 22 24 41.691435 -87.668520

11 10 58 41.856685 -87.708382

12 9 39 41.837394 -87.646408

13 7 67 41.779632 -87.660887

14 1 6 41.858373 -87.627356

15 5 44 41.692723 -87.604506

16 18 14 41.903242 -87.643352

17 2 34 41.801811 -87.630560

18 16 12 41.974094 -87.766149

19 15 34 41.880083 -87.768200

20 6 63 41.752137 -87.644229

21 12 38 41.862977 -87.656973

In [60]:
total_number_of_gun_crimes_per_district_map = folium.Map(location = (41.8781,
-87.6298),zoom_start = 11)
In [61]:
folium.Choropleth(geo_data = "Boundaries.geojson",
fill_color = 'OrRd',
fill_opacity =0.5,
line_opacity= 1,
data = crimes_per_district,
key_on = 'feature.properties.dist_num',
columns = ['dist_num', 'number_of_gun_crimes'],
legend_name = "CRIMES PER DISTRICT"
).add_to(total_number_of_gun_crimes_per_district_map)
Out[61]:
<folium.features.Choropleth at 0x7fadedcd3cd0>
In [62]:
for index, row in query5_df.iterrows():

pin_location = (row['st_latitude'],row['st_longitude'])
district = row['dist_num']
num_crimes = row['number_of_gun_crimes']

folium.Marker(location = pin_location,
popup =\
folium.Popup(html=f"District No : {district} has Total Number of Gun Crimes:
{num_crimes}",max_width=450)).\
add_to(total_number_of_gun_crimes_per_district_map)
In [63]:
total_number_of_gun_crimes_per_district_map
Out[63]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Requirement #6: (35 points)¶


• Create Marker Clusters on Choropleth map for those gun related crimes
that have Location Description as RESIDENCE in (green icon) and those
that have Location Description as STREET in (red icon) (See Query 5.)
In [69]:
cursor.execute("SELECT DISTINCT ON(caseno) caseno, location_description \
FROM crimes \
WHERE DESCRIPTION::text LIKE '%GUN%'")
rows = cursor.fetchall()

location_crimes = pd.DataFrame(rows, columns=['caseno','description'])


#location_crimes['dist_num'] = location_crimes['dist_num'].astype(str)
location_crimes
Out[69]:
caseno description

0 JC100002 STREET

1 JC100027 RESIDENTIAL YARD (FRONT/BACK)

2 JC100030 ALLEY

3 JC100038 STREET

4 JC100048 ALLEY

... ... ...


caseno description

740 JC128708 STREET

741 JC128736 STREET

742 JC128740 STREET

743 JC130193 STREET

744 JC131249 RESIDENCE

745 rows × 2 columns


In [71]:
# I'm going to use the object from the example queries as I am having a lot
# of connection issues. Constantly getting errors, keep restarting kernal, works,
# then stops again. File is very hard to work with on a test and incremental basis.

df_query6 = pd.merge(gun_crimes_per_district_df, location_crimes, on='caseno' )


In [72]:
gun_crime_cluster_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
folium.Choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
data = districts_gun_crimes_df,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'gun_crimes'],
legend_name="GUN CRIME"
).add_to(gun_crime_arrests_map)
marker_cluster = MarkerCluster().add_to(gun_crime_cluster_map)
In [74]:
gun='%GUN%'
for district in districts:
gun_crimes_per_district_df2 = df_query6[df_query6.dist_num==district]

for index, row in gun_crimes_per_district_df2.iterrows():


description = row['Description']
location_description = row['description']
block = row['block']
loc = (row['latitude'],row['longitude'])
# print("District No: %s <br> Description: %s <br> Block: %s" %
(district,description,block))
if "RESIDENCE" in location_description:
folium.Marker(location=loc,popup = folium.Popup(html=\
f"District No: {district} <br> Description: {description} <br> Block:
{block}"),\
icon=folium.Icon(color='green', icon='ok-
sign'),).add_to(marker_cluster)
# print(f"District {district} green marker at {loc}")
elif "STREET" in location_description:
folium.Marker(location=loc, popup = folium.Popup(html=
f"District No: {district} <br> Description: {description} <br> Block:
{block}"),\
icon=folium.Icon(color='red',icon='remove-
sign'),).add_to(marker_cluster)
else:
continue
In [75]:
gun_crime_cluster_map
Out[75]:
Make this Notebook Trusted to load map: File -> Trust Notebook

You might also like