Assignment 3.HTML
Assignment 3.HTML
Deliverables:¶
• Submit two files that are labeled: YourLastName_Assignment_3 that have
the following formats:
1. Your HTML that has your Source code and output
2. Your ipynb script that has your Source code and output
Objectives:¶
• Use SQL to execute different queries to retrieve data from Chicago
Crime dataset and Police statins dataset
• Use Geospatial queries to locate police stations and gun related crimes
(with arrest or no arrest) in every district on Choropleth map
• Use Geospatial queries to provide descriptive stat for every district on
Choropleth map
• Use Geospatial queries to locate the Block that is the furthest
(Maximum Distance) from the police station that has gun related crime
resulted in arrest Object 7
1
2
3
4
5
6
Submission Formats :¶
• Complete IPYNB script that has the source code in Python used to
access and analyze the data. The code should be submitted as an IPYNB
script that can be be loaded and run in Jupyter Notebook for Python.
• Make sure to include your name as part of the IPYNB file name.
• From the File menu select Download As -> HTML (.html) to create an
HTML of the IPYNB file.
IMPORTANT NOTES:¶
• In this assignment you will be connecting to DSCC PostgreSQL server
that has the database and tables already created on the server.
• You need to use your NetID and password for login and passward to
connect PostgreSQL server hosted on DSCC
• For the psycopg2.connect statements listed below, you must provide
your NetID and password in order to connect to PostgreSQL server
hosted on DSCC
In [7]:
# All data
db_connection = psycopg2.connect(host='129.105.248.26',dbname="chicago_crimes_ut",
user="oaf9342" , password="Endeavor13@")
# ----------------------------------------------------------------
# full data
#db_connection = psycopg2.connect(host='129.105.248.26',dbname="chicago_crimes",
user="YourNetID" , password="YourPassword")
# -----------------------------------------------------------------
cursor = db_connection.cursor()
In [8]:
# Uncomment and run this code when your transactions get "stuck"
#db_connection.rollback()
Chicago Crimes Dataset¶
The Crimes_2001_to_present.csv is downloaded from Chicago data portal and
it has roughly 6.5 million records.
While working on this dataset, It is prudent to make a note of the following:
1. Geospatial queries are very demanding for system resouces like CPU,
Memory, and DISK
2. We are interested in the data set of the past 2 years, and when you
execute Geospatial type queries, please be advised that these queries
slow down your machine.
3. There are two dataset that you will use in this assignment
• chicago_crimes_ut : Use this mini dataset that has data for rougly
a month for coding/debugging and unit testing
• chicago_crimes : Use this dataset that has data for 2 years for
your final submission
Algorithm Performance¶
• Sort algorithms used by the database engines vary in performance
between O($N log N$) and O($ N^{2} $) where $N$ is the size of the
number
• Search algorithms used by the database engines vary in performance
between O($log N$) and O($ N $) where $N$ is the size of the number
What are the column names in the 2 tables that we will use from chicago_crimes
database?¶
In [9]:
# Get the column names for table crimes
cursor.execute("SELECT column_name \
FROM INFORMATION_SCHEMA.COLUMNS \
WHERE table_name = 'crimes';")
rows=cursor.fetchall()
rows
Out[9]:
[('id',),
('caseno',),
('date_of_occurrence',),
('block',),
('iucr',),
('primary_type',),
('description',),
('location_description',),
('arrest',),
('domestic',),
('beat',),
('district',),
('ward',),
('community_area',),
('fbi_cd',),
('xcoordinate',),
('ycoordinate',),
('year',),
('updated_on',),
('latitude',),
('longitude',),
('location',),
('where_is',)]
In [10]:
# Get the column names for table police_stations
cursor.execute("SELECT column_name \
FROM INFORMATION_SCHEMA.COLUMNS \
WHERE table_name = 'police_stations';")
rows=cursor.fetchall()
rows
Out[10]:
[('district',),
('district_name',),
('address',),
('city',),
('state',),
('zip',),
('website',),
('phone',),
('fax',),
('tty',),
('x_coordinate',),
('y_coordinate',),
('latitude',),
('longitude',),
('location',),
('where_is',)]
In [11]:
query = "SELECT * \
FROM police_stations"
cursor.execute(query)
data = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]
# create a data frame with column headers
stations = pd.DataFrame(data,columns = colnames).drop_duplicates()
stations
Out[11]:
district_nam
district address city state zip website
e
1718 S http://home.chicagopolic
0 1 Central Chicago IL 60616
State St .org/community/distri...
5101 S
http://home.chicagopolic
1 2 Wentworth Wentworth Chicago IL 60609
.org/community/distri...
Ave
7040 S
Grand http://home.chicagopolic
2 3 Cottage Chicago IL 60637
Crossing .org/community/distri...
Grove Ave
727 E http://home.chicagopolic
4 5 Calumet Chicago IL 60628
111th St .org/community/distri...
7808 S http://home.chicagopolic
5 6 Gresham Chicago IL 60620
Halsted St .org/community/distri...
1438 W http://home.chicagopolic
6 7 Englewood Chicago IL 60636
63rd St .org/community/distri...
3420 W http://home.chicagopolic
7 8 Chicago Lawn Chicago IL 60629
63rd St .org/community/distri...
3120 S http://home.chicagopolic
8 9 Deering Chicago IL 60608
Halsted St .org/community/distri...
3315 W http://home.chicagopolic
9 10 Ogden Chicago IL 60623
Ogden Ave .org/community/distri...
3151 W http://home.chicagopolic
10 11 Harrison Chicago IL 60612
Harrison St .org/community/distri...
district_nam
district address city state zip website
e
1412 S
http://home.chicagopolic
11 12 Near West Blue Island Chicago IL 60608
.org/community/distri...
Ave\n
2150 N
http://home.chicagopolic
12 14 Shakespeare California Chicago IL 60647
.org/community/distri...
Ave
5701 W http://home.chicagopolic
13 15 Austin Chicago IL 60644
Madison St .org/community/distri...
5151 N
http://home.chicagopolic
14 16 Jefferson Park Milwaukee Chicago IL 60630
.org/community/distri...
Ave
4650 N http://home.chicagopolic
15 17 Albany Park Chicago IL 60630
Pulaski Rd .org/community/distri...
1160 N
http://home.chicagopolic
16 18 Near North Larrabee Chicago IL 60610
.org/community/distri...
St
850 W http://home.chicagopolic
17 19 Town Hall Chicago IL 60613
Addison St .org/community/distri...
5400 N
http://home.chicagopolic
18 20 Lincoln Lincoln Chicago IL 60625
.org/community/distri...
Ave
1900 W
http://home.chicagopolic
19 22 Morgan Park Monterey Chicago IL 60643
.org/community/distri...
Ave
6464 N http://home.chicagopolic
20 24 Rogers Park Chicago IL 60626
Clark St .org/community/distri...
5555 W http://home.chicagopolic
21 25 Grand Central Chicago IL 60639
Grand Ave .org/community/distri...
Ave .org
In [12]:
# store districts (excluding Headquarters) for later use
districts = stations.query("district != 'Headquarters'").district
districts
Out[12]:
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 11
11 12
12 14
13 15
14 16
15 17
16 18
17 19
18 20
19 22
20 24
21 25
Name: district, dtype: object
In [13]:
# Get the crimes table
query = "SELECT * \
FROM crimes"
cursor.execute(query)
data = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]
# create a data frame with column headers
chicago = pd.DataFrame(data,columns = colnames)
chicago.columns
Out[13]:
Index(['id', 'caseno', 'date_of_occurrence', 'block', 'iucr', 'primary_type',
'description', 'location_description', 'arrest', 'domestic', 'beat',
'district', 'ward', 'community_area', 'fbi_cd', 'xcoordinate',
'ycoordinate', 'year', 'updated_on', 'latitude', 'longitude',
'location', 'where_is'],
dtype='object')
In [14]:
for i in cursor.description:
print(i[0])
id
caseno
date_of_occurrence
block
iucr
primary_type
description
location_description
arrest
domestic
beat
district
ward
community_area
fbi_cd
xcoordinate
ycoordinate
year
updated_on
latitude
longitude
location
where_is
DATA DICTIONARY (CRIMES) (See https://www.kaggle.com/currie32/crimes-in-
chicago)
id - Unique identifier for the record.
caseno - The Chicago Police Department RD Number (Records Division
Number), which is unique to the incident.
date_of_occurence - Date when the incident occurred. this is sometimes a
best estimate.
**block - The partially redacted address where the incident occurred, placing
it on the same block as the actual address.
iucr - The Illinois Unifrom Crime Reporting code. This is directly linked to the
Primary Type and Description. See the list of IUCR codes at
https://data.cityofchicago.org/d/c7ck-438e.
primary_type - The primary description of the IUCR code.
description - The secondary description of the IUCR code, a subcategory of
the primary description.
location-description - Description of the location where the incident
occurred.
arrest - Indicates whether an arrest was made.
domestic - Indicates whether the incident was domestic-related as defined
by the Illinois Domestic Violence Act.
beat - Indicates the beat where the incident occurred. A beat is the smallest
police geographic area – each beat has a dedicated police beat car. Three to
five beats make up a police sector, and three sectors make up a police
district. The Chicago Police Department has 22 police districts. See the beats
at https://data.cityofchicago.org/d/aerh-rz74.
district - Indicates the police district where the incident occurred. See the
districts at https://data.cityofchicago.org/d/fthy-xz3r.
ward - The ward (City Council district) where the incident occurred. See the
wards at https://data.cityofchicago.org/d/sp34-6z76.
community_area - Indicates the community area where the incident
occurred. Chicago has 77 community areas. See the community areas at
https://data.cityofchicago.org/d/cauq-8yn6.
fbi_cd - Indicates the crime classification as outlined in the FBI's National
Incident-Based Reporting System (NIBRS). See the Chicago Police
Department listing of these classifications at
http://gis.chicagopolice.org/clearmap_crime_sums/crime_types.html.
xcoordinate - The x coordinate of the location where the incident occurred in
State Plane Illinois East NAD 1983 projection. This location is shifted from
the actual location for partial redaction but falls on the same block.
ycoordinate - The y coordinate of the location where the incident occurred in
State Plane Illinois East NAD 1983 projection. This location is shifted from
the actual location for partial redaction but falls on the same block.
year - Year the incident occurred.
updated_on - Date and time the record was last updated.
latitude - The latitude of the location where the incident occurred. This
location is shifted from the actual location for partial redaction but falls on
the same block.
longitude - The longitude of the location where the incident occurred. This
location is shifted from the actual location for partial redaction but falls on
the same block.
location - The location where the incident occurred in a format that allows
for creation of maps and other geographic operations on this data portal.
This location is shifted from the actual location for partial redaction but falls
on the same block.
In [15]:
chicago.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14436 entries, 0 to 14435
Data columns (total 23 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 14436 non-null int64
1 caseno 14436 non-null object
2 date_of_occurrence 14436 non-null datetime64[ns]
3 block 14436 non-null object
4 iucr 14436 non-null object
5 primary_type 14436 non-null object
6 description 14436 non-null object
7 location_description 14436 non-null object
8 arrest 14436 non-null bool
9 domestic 14436 non-null bool
10 beat 14436 non-null int64
11 district 14436 non-null int64
12 ward 14436 non-null float64
13 community_area 14436 non-null object
14 fbi_cd 14436 non-null object
15 xcoordinate 14436 non-null float64
16 ycoordinate 14436 non-null float64
17 year 14436 non-null int64
18 updated_on 14436 non-null datetime64[ns]
19 latitude 14436 non-null float64
20 longitude 14436 non-null float64
21 location 14436 non-null object
22 where_is 14436 non-null object
dtypes: bool(2), datetime64[ns](2), float64(5), int64(4), object(10)
memory usage: 2.3+ MB
In [16]:
chicago.head()
Out[16]:
date_of_oc primary_ location_de
id caseno block iucr description arrest
currence type scription
050XX S FIRST
2019-01-03
0 24368 JC103046 INDIANA 110 HOMICIDE DEGREE AUTO False
14:37:00
AVE MURDER
050XX S FIRST
2019-01-03
1 24369 JC103046 INDIANA 110 HOMICIDE DEGREE AUTO False
14:37:00
AVE MURDER
FIRST
2019-01-05 012XX N
2 24370 JC105864 110 HOMICIDE DEGREE STREET False
16:23:00 ADA ST
MURDER
060XX FIRST
2019-01-06
3 24372 JC107385 W 63RD 110 HOMICIDE DEGREE STREET False
22:23:00
ST MURDER
002XX FIRST
2019-01-07 PARKING
4 24373 JC108358 W 87TH 110 HOMICIDE DEGREE True
15:45:00 LOT
ST MURDER
5 rows × 23 columns
In [17]:
# use value_counts to see the values of the field
chicago['year'].value_counts()
Out[17]:
2019 14436
Name: year, dtype: int64
In [18]:
# what is in the field domestic
chicago['domestic'].value_counts()
Out[18]:
False 11974
True 2462
Name: domestic, dtype: int64
In [19]:
# let's look at how many crimes are domestic by primary type
pd.crosstab(chicago['primary_type'], chicago['domestic'], margins=True)
Out[19]:
domestic False True All
primary_type
ARSON 18 1 19
GAMBLING 3 0 3
HOMICIDE 15 0 15
HUMAN TRAFFICKING 3 0 3
INTIMIDATION 12 4 16
KIDNAPPING 4 0 4
NON-CRIMINAL 1 0 1
OBSCENITY 0 1 1
primary_type
PROSTITUTION 58 0 58
SEX OFFENSE 41 3 44
STALKING 7 2 9
In [20]:
# percentages would be more informative
# ------ lambda expressions section 4.7.6 ('syntactic sugar for a normal function
definition')
# https://docs.python.org/3/tutorial/controlflow.html
pd.crosstab(chicago['primary_type'], chicago['domestic']).apply(lambda x: x/x.sum(),
axis = 1)
Out[20]:
domestic False True
primary_type
primary_type
primary_type
crimes_per_district
Out[22]:
dist_num number_of_crimes
0 24 412
1 11 1113
2 8 886
3 19 608
4 25 726
5 4 790
6 14 508
7 3 695
8 17 386
dist_num number_of_crimes
9 20 254
10 22 442
11 9 585
12 10 779
13 7 745
14 1 854
15 5 635
16 18 869
17 2 602
18 16 473
19 15 530
20 6 844
21 12 700
In [23]:
# For each district, excluding HQ, we get the latitude/longitude of the police station
at the district.
# With a few exceptions, will be inserting markers on the map at these police station
locations
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district
\
FROM police_stations \
WHERE district!='Headquarters'""")
police_stations = cursor.fetchall()
station_locations
Out[23]:
st_latitude st_longitude dist_num
0 41.858373 -87.627356 1
1 41.801811 -87.630560 2
2 41.766431 -87.605748 3
3 41.707933 -87.568349 4
4 41.692723 -87.604506 5
5 41.752137 -87.644229 6
6 41.779632 -87.660887 7
7 41.778987 -87.708864 8
8 41.837394 -87.646408 9
9 41.856685 -87.708382 10
10 41.873582 -87.705488 11
11 41.862977 -87.656973 12
12 41.921103 -87.697452 14
13 41.880083 -87.768200 15
14 41.974094 -87.766149 16
15 41.966053 -87.728115 17
16 41.903242 -87.643352 18
17 41.947400 -87.651512 19
18 41.979550 -87.692845 20
19 41.691435 -87.668520 22
20 41.999763 -87.671324 24
21 41.918609 -87.765574 25
In [24]:
# Add columns for the coordinates of each distract. Again, this is going to be the
locations of our markers.
query1_df = pd.merge(crimes_per_district, station_locations,on='dist_num')
query1_df
Out[24]:
dist_num number_of_crimes st_latitude st_longitude
Query #2:¶
• Calculate the total number of violent crimes in every district and plot
that in a table on Choropleth map. In other words, create a Choropleth
map where the districts are shaded in proportion to the number of
violent crimes in that district.
Note: A crime is considered a violent crime if the PRIMARY_TYPE of the
crimes is THEFT ,ASSAULT,ROBBER,KIDNAPPING,CRIM SEXUAL ASSAULT,BATTERY, or
MURDER.
• Then find the total number of crimes in the district for each of these
primary types of violent crime and add a popup marker (located at that
district's police headquarter) that displays a DataFrame containing this
data.
In [29]:
violent_crime_categories = 'THEFT','ASSAULT','ROBBERY','KIDNAPPING','CRIM SEXUAL
ASSAULT','BATTERY','MURDER'
In [30]:
# Get the total number of violent crimes per district.
cursor.execute("SELECT district, count(district)\
FROM crimes \
WHERE PRIMARY_TYPE in %s \
GROUP BY district",[violent_crime_categories])
rows=cursor.fetchall()
violent_crime_data=pd.DataFrame(rows, columns=['dist_num','number_of_violent_crimes'])
violent_crime_data['dist_num'] = violent_crime_data['dist_num'].astype(str)
violent_crime_data
Out[30]:
dist_num number_of_violent_crimes
0 8 488
dist_num number_of_violent_crimes
1 11 407
2 24 216
3 19 368
4 25 362
5 4 400
6 14 309
7 3 387
8 17 220
9 20 147
10 22 217
11 9 312
12 7 367
13 10 362
14 1 624
15 5 293
16 18 605
17 2 349
18 16 252
19 15 291
20 6 457
21 12 430
Now, lets create a dataframe of the different types of violent crimes for
every district
In [31]:
cursor.execute(f"SELECT district, PRIMARY_TYPE, count(PRIMARY_TYPE) \
FROM crimes \
WHERE PRIMARY_TYPE in {violent_crime_categories} \
GROUP BY distrIct,PRIMARY_TYPE")
rows=cursor.fetchall()
violent_crime_type = pd.DataFrame(rows,
columns=['dist_num','Description','number_of_violent_crimes'])
violent_crime_type['dist_num'] = violent_crime_type['dist_num'].astype(str)
violent_crime_type
Out[31]:
dist_num Description number_of_violent_crimes
0 24 ROBBERY 15
2 8 BATTERY 174
3 12 ASSAULT 38
4 18 THEFT 417
109 10 ASSAULT 63
110 18 ROBBERY 33
111 5 ASSAULT 62
CRIM SEXUAL
4 24 3 41.999763 -87.671324
ASSAULT
CRIM SEXUAL
108 14 3 41.921103 -87.697452
ASSAULT
#districts = query2_df.dist_num.unique()
for district in districts:
df2d = query2_df[query2_df.dist_num == district] # new dataframe to add to marker.
police_station_location = tuple(df2d[['st_latitude','st_longitude']].values[0])
violent_crimes_per_district_df = df2d[['Description','number_of_violent_crimes']]
# Convert out new data frame to html before inserting it into the label.
header = violent_crimes_per_district_df.to_html(classes=
'table table-striped table-hover table-condensed table-responsive')
# Create the folium marker and add it to the map.
folium.Marker(location=police_station_location, popup=folium.\
Popup(html=f"District Number {district} - Violent Crimes
{header}")).\
add_to(violent_crimes_per_district_map)
Query #3:¶
• Calculate the total number of gun related violent crimes in every
district and plot that in a table on Choropleth map.
Note: A crime is considered a gun related violent crime if the word "gun" is
contained in the DESCRIPTION and the PRIMARY_TYPE of the crimes is
THEFT ,ASSAULT,ROBBER,KIDNAPPING,CRIM SEXUAL ASSAULT,BATTERY, or MURDER.
• Then find the total number of crimes in the district for the different
DESCRIPTIONs containing the word "gun" and add a popup marker
(located at that district's police headquarter) that displays a DataFrame
containing this data.
Lets first create a dataframe of gun crimes per district first to get an idea
about the number of gun crimes per district
In [37]:
# isolate violent gun crime
gun='%GUN%'
cursor.execute(f"SELECT district, count(district)\
FROM crimes\
WHERE PRIMARY_TYPE in {violent_crime_categories}\
AND DESCRIPTION::text LIKE '{gun}' GROUP BY district")
districts_violent_gun_crimes = cursor.fetchall()
districts_violent_gun_crimes_df = pd.DataFrame(districts_violent_gun_crimes,
columns=['dist_num','violent_gun_crimes'])
districts_violent_gun_crimes_df['dist_num'] =
districts_violent_gun_crimes_df['dist_num'].astype(str)
districts_violent_gun_crimes_df.head()
Out[37]:
dist_num violent_gun_crimes
0 24 9
18 38
2 11 36
3 19 9
4 25 25
Now, lets create a dataframe of the different types of violent gun crimes for
every district
In [38]:
cursor.execute(f"SELECT district, DESCRIPTION, count(DESCRIPTION) \
FROM crimes \
WHERE PRIMARY_TYPE in {violent_crime_categories} AND DESCRIPTION::text
LIKE '{gun}' \
GROUP BY district, DESCRIPTION")
rows=cursor.fetchall()
violent_gun_crime_type = pd.DataFrame(rows,
columns=['dist_num','Description','number_of_violent_gun_crimes'])
violent_gun_crime_type['dist_num'] = violent_gun_crime_type['dist_num'].astype(str)
violent_gun_crime_type.head()
Out[38]:
dist_num Description number_of_violent_gun_crimes
08 ARMED: HANDGUN 23
1 11 AGGRAVATED: HANDGUN 15
26 AGGRAVATED: HANDGUN 15
3 20 ATTEMPT: ARMED-HANDGUN 1
4 19 AGGRAVATED: HANDGUN 5
In [39]:
query3_df = pd.merge(violent_gun_crime_type, station_locations, on = 'dist_num')
query3_df.head()
Out[39]:
dist_nu number_of_violent_gu st_latitud st_longitud
Description
m n_crimes e e
ATTEMPT: ARMED-
18 2 41.778987 -87.708864
HANDGUN
AGGRAVATED:
28 13 41.778987 -87.708864
HANDGUN
AGGRAVATED:
3 11 15 41.873582 -87.705488
HANDGUN
police_station_location = tuple(df3d[['st_latitude','st_longitude']].values[0])
violent_gun_crimes_per_district_df =
df3d[['Description','number_of_violent_gun_crimes']]
header = violent_gun_crimes_per_district_df.to_html(classes=
'table table-striped table-hover table-condensed table-responsive')
folium.Marker(location=police_station_location, popup=folium.\
Popup(html=f"District Number {district} - Violent GUN Crimes
{header}")).\
add_to(districts_violent_gun_crimes_map)
folium.Marker(location = police_station_location,
popup =\
folium.Popup(html=f"District No : {district} - Violent Gun Crimes
{header}",max_width=450)).\
add_to(districts_violent_gun_crimes_map)
In [42]:
districts_violent_gun_crimes_map
Out[42]:
Make this Notebook Trusted to load map: File -> Trust Notebook
Query #4:¶
• Calculate the crime density per district
Boundaries.geojson: { "type": "FeatureCollection", "features":
[ {"type":"Feature","properties":{"dist_num":"17","dist_label":"17TH"},
"geometry": {"type":"MultiPolygon","coordinates": [[[[-
87.71067089391354,41.997365655369435], [-
87.71066884721016,41.99729359357709], [-
87.71066053080999,41.997225765680135], ....
In [43]:
district = []
tarea=[]
with open('Boundaries.geojson') as f:
data = json.load(f)
a = data['features'] # a is a list of district data (dictionaries)
for i in range(len(a)):
obj = a[i]['geometry'] # list of coordinates defining the ith district boundary
n = a[i]['properties'] # dictionary with district number and district label
keyes for ith district
district.append(n['dist_num']) # add district number to the district list
tarea.append(area(obj)/10000) # add the the area (in hectares) to area list
af = pd.DataFrame({'dist_num': district,'district_area_inHectares':tarea})
af['dist_num'] = af['dist_num'].astype(str)
final_data= pd.merge(af, crimes_per_district, on='dist_num', how='inner')
final_data['crime_density'] = round(final_data['number_of_crimes']/
(final_data['district_area_inHectares']/100))
final_data
Out[43]:
dist_num district_area_inHectares number_of_crimes crime_density
Query #5:¶
• Create Marker Clusters on Choropleth map for those gun related crimes
that resulted in arrest (green icon) and those that didn't (red icon)
In [44]:
# isolate gun crime
gun='%GUN%'
cursor.execute(f"SELECT district, count(district)\
FROM crimes\
WHERE DESCRIPTION::text LIKE '{gun}' GROUP BY district")
districts_gun_crimes = cursor.fetchall()
districts_gun_crimes_df = pd.DataFrame(districts_gun_crimes,
columns=['dist_num','gun_crimes'])
districts_gun_crimes_df['dist_num'] = districts_gun_crimes_df['dist_num'].astype(str)
districts_gun_crimes_df.head()
Out[44]:
dist_num gun_crimes
0 24 9
18 49
2 11 69
3 19 11
4 25 43
In [45]:
# This time our dataframe will contain one row for each district and each block in the
district.
# Each row contains information about a particular gun crime including the block in
which the crime occured.
# We also save the location of the crime incidendent and whether there was an arrest.
# We will use this information when creating the markers.
# In particular, markers will be positioned at the crime location instead of police
station location.
data= []
for district in districts:
cursor.execute("""SELECT DISTINCT ON(caseno) caseno, block, DESCRIPTION,
count(arrest), arrest,latitude,longitude\
FROM crimes\
WHERE district = %s and DESCRIPTION::text LIKE %s \
GROUP BY caseno, block, DESCRIPTION,arrest, latitude, longitude""",
[district,gun])
results = cursor.fetchall()
# insert the district number as the first element of each result list
gun_crimes_per_district = [[district]+list(result) for result in results]
data += (gun_crimes_per_district)
004XX W
AGGRAVATED:
01 JC105034 VAN BUREN 1 False 41.876794 -87.637817
HANDGUN
ST
ATTEMPT:
008XX S
11 JC111601 ARMED- 1 False 41.871096 -87.629570
PARK TER
HANDGUN
ATTEMPT:
0000X W
21 JC111881 ARMED- 1 False 41.878802 -87.627917
QUINCY ST
HANDGUN
022XX S
AGGRAVATED:
31 JC118978 MICHIGAN 1 True 41.852284 -87.623790
HANDGUN
AVE
029XX S UNLAWFUL
41 JC119606 DEARBORN POSS OF 1 True 41.841856 -87.628742
ST HANDGUN
gun='%GUN%'
for district in districts:
gun_crimes_per_district_df2 =
gun_crimes_per_district_df[gun_crimes_per_district_df.dist_num==district]
In [48]:
gun_crime_arrests_map
Out[48]:
Make this Notebook Trusted to load map: File -> Trust Notebook
Query #6:¶
• Plot on Choropleth map the farthest Block that has a gun crime from
every police station in every district
Locate the farthest gun crime from the police station in every district. Create
a Choropleth map where the districts are shaded in proportion to the number
of gun crimes in that district. For each district, find the gun crime that was
farthest from police station. Add a pop-up on the Choropleth map to display
the district number and the Block where the farthest gun crime occurred.
Also add circle marker (of radius 5) at the location of the farthest gun crime.
In [49]:
# This complex query returns for each block in each district the gun crime which
occured furthest
# from the police station. We also store this distance together with the police station
location.
# Once again we will be positioning the markers at the police station locations.
results= []
for district in districts:
cursor.execute("""SELECT DISTINCT on (A.block) A.district,A.block,
A.where_is,ST_Distance(A.where_is,B.where_is)\
FROM crimes as A, police_stations as B
WHERE A.district=%s and DESCRIPTION::text LIKE %s and B.district= %s
and ST_Distance(A.where_is,B.where_is)
IN
( SELECT max(dist)
FROM
(SELECT ST_Distance(A.where_is,B.where_is) as dist
FROM crimes as A, police_stations as B
WHERE A.district=%s and DESCRIPTION::text LIKE %s and B.district= %s ) as
f)""",
[district,gun,district,district,gun, district])
farthest_block_gun_crime = list(cursor.fetchall()[0])
cursor.execute(f"SELECT ST_X(ST_AsText(%s)), ST_Y(ST_AsText(%s))"
,(farthest_block_gun_crime[2],farthest_block_gun_crime[2]))
farthest_block_gun_crime[2:3] = cursor.fetchall()[0]
results.append(farthest_block_gun_crime)
farthest_gun_crime_df =
pd.DataFrame(results,columns=['dist_num','block','crime_lat','crime_long','dist'])
farthest_gun_crime_df['dist_num']= farthest_gun_crime_df['dist_num'].astype(str)
query6_df = pd.merge(farthest_gun_crime_df, station_locations,on='dist_num')
query6_df.head()
Out[49]:
dist_n crime_lon st_latitud st_longitud
block crime_lat dist
um g e e
004XX W VAN
01 41.876794 -87.637817 1171.461889 41.858373 -87.627356
BUREN ST
055XX S
12 41.794425 -87.582519 5365.880774 41.801811 -87.630560
EVERETT AVE
026XX E
23 41.759100 -87.560145 5093.598187 41.766431 -87.605748
75TH ST
101XX S
34 41.711374 -87.535281 3693.438062 41.707933 -87.568349
EWING AVE
118XX S
45 41.678829 -87.658995 6086.359525 41.692723 -87.604506
LAFLIN ST
Out[50]:
<folium.features.Choropleth at 0x7faded14a730>
In [51]:
# This time we place a circle markers at the crime locations in addition to the
"regular" markers at the
# police station locations.
data = query6_df.loc[query6_df.dist_num==district]
# print(data)
# print(data['block'].values)
folium.CircleMarker((crime_lat,crime_long),radius=5,color='#ff3187',
popup=folium.Popup(html=f"District No.: {district} <br> Block:
{block}")).\
add_to(farthest_block_gun_crime_map)
folium.Marker(location=(police_lat,police_long),
popup=folium.Popup(\
html=f"Police Station <br> District No.: {district} <br> Farthest
Gun_Crime Block:{block}")).\
add_to(farthest_block_gun_crime_map)
farthest_block_gun_crime_map
Out[51]:
Make this Notebook Trusted to load map: File -> Trust Notebook
# Requirements ** The HTML document your are submitting must have the
source code and the output for the following requirements **
Requirement #1: (10 points)¶
• Use SQL to select the following fields from crimes - district, ward,
arrest, primary_type and location_description. Select the records for
those crimes that occurred in a RESIDENCE.
• Iterate over the results to display the primary_type of each crime within
District 1.
• Your output should look like the sample output shown below.
Sample Output: Primary Type: BURGLARY Primary Type: THEFT Primary Type:
BATTERY Primary Type: THEFT Primary Type: BATTERY Primary Type:
DECEPTIVE PRACTICE Primary Type: DECEPTIVE PRACTICE Primary Type:
OTHER OFFENSE Primary Type: BATTERY Primary Type: DECEPTIVE PRACTICE
Primary Type: THEFT Primary Type: DECEPTIVE PRACTICE Primary Type:
DECEPTIVE PRACTICE Primary Type: OTHER OFFENSE Primary Type: THEFT
Primary Type: DECEPTIVE PRACTICE Primary Type: CRIMINAL D
In [52]:
residence='%RESIDENCE%'
cursor.execute(f"SELECT district, ward, arrest, primary_type, location_description \
FROM crimes\
WHERE location_description LIKE '{residence}'")
residence_crimes = cursor.fetchall()
residence_crimes_df = pd.DataFrame(residence_crimes,
columns=['district','ward','arrest','primary_type','location_description'])
residence_crimes_df['district'] = residence_crimes_df['district'].astype(str)
residence_crimes_df.head()
primary_type
primary_type
primary_type
In [55]:
'''Based on the table data above, we can observe that roughly 80% of crimes involving
assault do not result in
the arrest of the perpetrator. This is a casted observation across all districts in the
City of Chicago and is
not specific to geographic location within the city.'''
Out[55]:
'Based on the table data above, we can observe that roughly 80% of crimes involving
assault do not result in\nthe arrest of the perpetrator. This is a casted observation
across all districts in the City of Chicago and is\nnot specific to geographic location
within the city.'
Requirement #4: (10 points)¶
• Calculate the gun crimes density in every district. (See Query 4.)
In [56]:
cursor.execute("SELECT district, count(district) \
FROM crimes \
WHERE DESCRIPTION::text LIKE '%GUN%' \
GROUP BY district")
rows = cursor.fetchall()
gun_crimes_per_district = pd.DataFrame(rows,
columns=['dist_num','number_of_gun_crimes'])
gun_crimes_per_district['dist_num'] = gun_crimes_per_district['dist_num'].astype(str)
gun_crimes_per_district
Out[56]:
dist_num number_of_gun_crimes
0 24 9
1 8 49
2 11 69
3 19 11
4 25 43
dist_num number_of_gun_crimes
5 4 50
6 14 12
7 3 57
8 17 7
9 20 5
10 22 24
11 10 58
12 9 39
13 7 67
14 1 6
15 5 44
16 18 14
17 2 34
18 16 12
19 15 34
20 6 63
21 12 38
In [57]:
district = []
tarea=[]
with open('Boundaries.geojson') as f:
data = json.load(f)
a = data['features']
for i in range(len(a)):
obj = a[i]['geometry']
n = a[i]['properties']
district.append(n['dist_num'])
tarea.append(area(obj)/10000)
area_frame = pd.DataFrame({'dist_num': district,'district_area_inHectares':tarea})
area_frame['dist_num'] = area_frame['dist_num'].astype(str)
final_data= pd.merge(area_frame, gun_crimes_per_district, on='dist_num', how='inner')
final_data['crime_density'] = round(final_data['number_of_gun_crimes']/
(final_data['district_area_inHectares']/100),5)
final_data
Out[57]:
dist_num district_area_inHectares number_of_gun_crimes crime_density
0 17 2492.727155 7 0.28082
1 20 1132.170216 5 0.44163
2 19 2225.035732 11 0.49437
3 25 2827.989237 43 1.52051
4 14 1555.869965 12 0.77127
5 7 1688.670732 67 3.96762
6 3 1576.063931 57 3.61660
7 4 7068.152865 50 0.70740
8 6 2099.682124 63 3.00045
9 22 3490.416073 24 0.68760
10 5 3318.613379 44 1.32585
11 24 1406.081387 9 0.64008
12 16 8171.776367 12 0.14685
13 8 5992.169760 49 0.81773
14 18 1215.520046 14 1.15177
15 12 2509.453028 38 1.51427
16 11 1582.727274 69 4.35956
17 15 989.631393 34 3.43562
18 10 2038.988883 58 2.84455
dist_num district_area_inHectares number_of_gun_crimes crime_density
19 1 1214.818895 6 0.49390
20 9 3505.216898 39 1.11263
21 2 1949.690970 34 1.74387
0 24 9
1 8 49
2 11 69
3 19 11
4 25 43
5 4 50
6 14 12
7 3 57
dist_num number_of_gun_crimes
8 17 7
9 20 5
10 22 24
11 10 58
12 9 39
13 7 67
14 1 6
15 5 44
16 18 14
17 2 34
18 16 12
19 15 34
20 6 63
21 12 38
In [59]:
# using police stations now since we are no longer using blocks for geo location
plotting
query5_df = pd.merge(crimes_per_district, station_locations,on='dist_num')
query5_df
Out[59]:
dist_num number_of_gun_crimes st_latitude st_longitude
0 24 9 41.999763 -87.671324
1 8 49 41.778987 -87.708864
2 11 69 41.873582 -87.705488
3 19 11 41.947400 -87.651512
4 25 43 41.918609 -87.765574
dist_num number_of_gun_crimes st_latitude st_longitude
5 4 50 41.707933 -87.568349
6 14 12 41.921103 -87.697452
7 3 57 41.766431 -87.605748
8 17 7 41.966053 -87.728115
9 20 5 41.979550 -87.692845
10 22 24 41.691435 -87.668520
11 10 58 41.856685 -87.708382
12 9 39 41.837394 -87.646408
13 7 67 41.779632 -87.660887
14 1 6 41.858373 -87.627356
15 5 44 41.692723 -87.604506
16 18 14 41.903242 -87.643352
17 2 34 41.801811 -87.630560
18 16 12 41.974094 -87.766149
19 15 34 41.880083 -87.768200
20 6 63 41.752137 -87.644229
21 12 38 41.862977 -87.656973
In [60]:
total_number_of_gun_crimes_per_district_map = folium.Map(location = (41.8781,
-87.6298),zoom_start = 11)
In [61]:
folium.Choropleth(geo_data = "Boundaries.geojson",
fill_color = 'OrRd',
fill_opacity =0.5,
line_opacity= 1,
data = crimes_per_district,
key_on = 'feature.properties.dist_num',
columns = ['dist_num', 'number_of_gun_crimes'],
legend_name = "CRIMES PER DISTRICT"
).add_to(total_number_of_gun_crimes_per_district_map)
Out[61]:
<folium.features.Choropleth at 0x7fadedcd3cd0>
In [62]:
for index, row in query5_df.iterrows():
pin_location = (row['st_latitude'],row['st_longitude'])
district = row['dist_num']
num_crimes = row['number_of_gun_crimes']
folium.Marker(location = pin_location,
popup =\
folium.Popup(html=f"District No : {district} has Total Number of Gun Crimes:
{num_crimes}",max_width=450)).\
add_to(total_number_of_gun_crimes_per_district_map)
In [63]:
total_number_of_gun_crimes_per_district_map
Out[63]:
Make this Notebook Trusted to load map: File -> Trust Notebook
0 JC100002 STREET
2 JC100030 ALLEY
3 JC100038 STREET
4 JC100048 ALLEY