week_four - 911 Calls Capstone Project.

This is an analysis of 911 call data from Kaggle. I took it on as a progress milestone to consolidate the first batch of learnings.

Data and Setup

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Getting started - Data Access

df = pd.read_csv('911.csv')

df.head(3)

	lat	lng	desc	zip	title	timeStamp	twp	addr	e
0	40.297876	-75.581294	REINDEER CT & DEAD END; NEW HANOVER; Station ...	19525.0	EMS: BACK PAINS/INJURY	2015-12-10 17:40:00	NEW HANOVER	REINDEER CT & DEAD END	1
1	40.258061	-75.264680	BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...	19446.0	EMS: DIABETIC EMERGENCY	2015-12-10 17:40:00	HATFIELD TOWNSHIP	BRIAR PATH & WHITEMARSH LN	1
2	40.121182	-75.351975	HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...	19401.0	Fire: GAS-ODOR/LEAK	2015-12-10 17:40:00	NORRISTOWN	HAWS AVE	1

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99492 entries, 0 to 99491
Data columns (total 9 columns):
lat          99492 non-null float64
lng          99492 non-null float64
desc         99492 non-null object
zip          86637 non-null float64
title        99492 non-null object
timeStamp    99492 non-null object
twp          99449 non-null object
addr         98973 non-null object
e            99492 non-null int64
dtypes: float64(3), int64(1), object(5)
memory usage: 6.8+ MB
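The info() output above shows that zip, twp and addr have fewer non-null entries than the 99492 rows (12855, 43 and 519 fewer, respectively). Here is a minimal sketch of counting those gaps per column. The toy frame and its rows are made up for illustration and are not taken from 911.csv.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data; these rows are invented for illustration.
toy = pd.DataFrame({
    'zip': [19525.0, np.nan, 19401.0, np.nan],
    'twp': ['NEW HANOVER', 'HATFIELD TOWNSHIP', None, 'NORRISTOWN'],
    'e': [1, 1, 1, 1],
})

# isnull() marks missing cells; sum() then counts them column by column.
missing = toy.isnull().sum()
print(missing)
```

On the real DataFrame the same call would be df.isnull().sum().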

Top 5 zipcodes for 911 calls

df['zip'].value_counts().head(5)

0    6979
0    6643
0    4854
0    4748
0    3174
Name: zip, dtype: int64

Top 5 townships (twp) for 911 calls

df['twp'].value_counts().head(5)

LOWER MERION    8443
ABINGTON        5977
NORRISTOWN      5890
UPPER MERION    5227
CHELTENHAM      4575
Name: twp, dtype: int64

How many unique ‘title’ codes are there?

len(df['title'].value_counts())
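As a side note, value_counts() skips missing values, so its length is the number of distinct non-null titles. Series.nunique() should give the same number directly. A small sketch on a made-up Series (the variable names are mine, not from the notebook):

```python
import pandas as pd

# Made-up titles; one value repeats so the distinct count is smaller than the length.
titles = pd.Series([
    'EMS: BACK PAINS/INJURY',
    'EMS: DIABETIC EMERGENCY',
    'Fire: GAS-ODOR/LEAK',
    'EMS: BACK PAINS/INJURY',
])

n_via_counts = len(titles.value_counts())  # one entry per distinct value
n_direct = titles.nunique()                # distinct non-null values, counted directly
print(n_via_counts, n_direct)
```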

Creating new features

In the title column, a “Reason/Department” is specified before the title code.

These are EMS, Fire, and Traffic.

Use .apply() with a custom lambda expression to create a new column called “Reason” that contains this string value.

For example, if the title column value is EMS: BACK PAINS/INJURY, the Reason column value would be EMS.

df['Reason'] = df['title'].apply(lambda title: title.split(':')[0])

df['Reason'].head()

0     EMS
1     EMS
2    Fire
3     EMS
4     EMS
Name: Reason, dtype: object
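The same split can also be done without a lambda, using the vectorized .str accessor. A minimal sketch on a few sample titles; the Traffic title is made up for illustration, and the variable names are not from the notebook.

```python
import pandas as pd

# Sample titles; the Traffic entry is invented for illustration.
titles = pd.Series([
    'EMS: BACK PAINS/INJURY',
    'Fire: GAS-ODOR/LEAK',
    'Traffic: VEHICLE ACCIDENT',
])

# The approach used above: a Python-level lambda applied to each element.
reason_apply = titles.apply(lambda title: title.split(':')[0])

# Vectorized alternative: split every string, then take the first piece.
reason_str = titles.str.split(':').str[0]
print(reason_str.tolist())
```

Both give the same column; the .str version also passes missing values through as NaN instead of raising an error.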

The most common Reason for a 911 call, based on this new column

df['Reason'].value_counts()

EMS        48877
Traffic    35695
Fire       14920
Name: Reason, dtype: int64

Use Seaborn to create a countplot of 911 calls by Reason.

sns.countplot(x='Reason', data=df, palette='coolwarm')


Focus on time information.

Data type of the objects in the timeStamp column

type(df['timeStamp'][0])

str

Convert the ‘timeStamp’ column from strings to DateTime objects

df['timeStamp'] = pd.to_datetime(df['timeStamp'])

type(df['timeStamp'][0])

pandas._libs.tslib.Timestamp

df['timeStamp'][5]

Timestamp('2015-12-10 17:40:01')
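To see the conversion in isolation, here is a small sketch that checks the element type before and after pd.to_datetime. The explicit format string is my assumption, based on how the timestamps appear in the head() output; pandas can usually infer it without help.

```python
import pandas as pd

# Two timestamp strings in the same layout as the timeStamp column.
raw = pd.Series(['2015-12-10 17:40:00', '2015-12-10 17:40:01'])
before = type(raw[0])   # plain Python str

# Passing format= is optional; it makes the expected layout explicit.
parsed = pd.to_datetime(raw, format='%Y-%m-%d %H:%M:%S')
after = type(parsed[0])  # pandas Timestamp

print(before, after)
```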

Extract time components, e.g. month and hour

print(df['timeStamp'][5].month)
print(df['timeStamp'][5].hour)

12
17

Create 3 new columns called Hour, Month, and Day_of_week

df['Month'] = df['timeStamp'].apply(lambda timestamp: timestamp.month)

df['Hour'] = df['timeStamp'].apply(lambda timestamp: timestamp.hour)

df['Day_of_week'] = df['timeStamp'].apply(lambda timestamp: timestamp.weekday())

Use .map() with this dictionary to convert the integer weekday to its string name:

dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}

df['Day_of_week'] = df['timeStamp'].apply(lambda timestamp: timestamp.weekday()).map(dmap)
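Once the column holds datetimes, the .dt accessor offers the same components without a lambda. A minimal sketch on two sample timestamps; the second one is made up, and the variable names are mine.

```python
import pandas as pd

# Two sample timestamps; the second is invented for illustration.
ts = pd.to_datetime(pd.Series(['2015-12-10 17:40:00', '2015-12-13 08:05:00']))
dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

month = ts.dt.month                      # 1..12
hour = ts.dt.hour                        # 0..23
day_of_week = ts.dt.dayofweek.map(dmap)  # Monday=0 ... Sunday=6, mapped to names

print(month.tolist(), hour.tolist(), day_of_week.tolist())
```

2015-12-10 was a Thursday and 2015-12-13 a Sunday, which matches the mapped names.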

With Seaborn create:

Countplot of the Day of Week with the hue based on the Reason

sns.countplot(data=df, x='Day_of_week',hue='Reason')
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))


Now Countplot for Month:

sns.countplot(data=df, x='Month',hue='Reason')
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))


The dataset is missing some months.

One way to fill in this information is a simple line plot, which draws straight through the missing months.

Now create a groupby object called byMonth, where you group the DataFrame by the Month column and use the count() method for aggregation.

byMonth = df.groupby('Month').count()
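One thing worth knowing here: count() counts non-null values per column, so the number you read off depends on which column you pick. The twp column has a few missing entries, so it can slightly undercount calls. size() counts rows regardless. A small sketch on a made-up frame:

```python
import pandas as pd

# Made-up rows; two of them have no township recorded.
toy = pd.DataFrame({
    'Month': [1, 1, 2, 2, 2],
    'twp': ['ABINGTON', None, 'NORRISTOWN', 'CHELTENHAM', None],
})

by_count = toy.groupby('Month').count()['twp']  # non-null twp values per month
by_size = toy.groupby('Month').size()           # all rows per month

print(by_count.tolist(), by_size.tolist())
```

With only 43 of 99492 rows missing a township, the difference is small for this dataset.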

A simple plot of the DataFrame indicating the count of calls per month.

byMonth['twp'].plot()


Create a linear fit on the number of calls per month.

Keep in mind you may need to reset the index to a column.

sns.lmplot(data=byMonth.reset_index(), x='Month',y='twp')
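lmplot draws a least-squares regression line but does not report its coefficients. If you want the slope and intercept as numbers, numpy.polyfit with degree 1 is one way to get them. The monthly counts below are made up and exactly linear so the result is easy to check; they are not the real call counts.

```python
import numpy as np

# Made-up monthly counts that rise by exactly 10 calls per month.
months = np.array([1, 2, 3, 4, 5])
calls = np.array([100, 110, 120, 130, 140])

# A degree-1 polynomial fit returns the slope first, then the intercept.
slope, intercept = np.polyfit(months, calls, deg=1)
print(round(slope, 3), round(intercept, 3))
```

With the real data, the inputs would be byMonth.index and byMonth['twp'].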


Create a new column called ‘Date’ that contains the date from the timeStamp column.

You’ll need to use apply along with the .date() method.

df['Date'] = df['timeStamp'].apply(lambda timestamp: timestamp.date())
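For comparison, the .dt accessor exposes the same date objects through .dt.date, so apply is not strictly required. A short sketch on two sample timestamps (the second is made up):

```python
import pandas as pd

# Two sample timestamps; the second is invented for illustration.
ts = pd.to_datetime(pd.Series(['2015-12-10 17:40:00', '2015-12-11 09:15:00']))

date_apply = ts.apply(lambda timestamp: timestamp.date())  # approach used above
date_dt = ts.dt.date                                       # accessor-based equivalent

print(date_dt.tolist())
```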

Group by this Date column with the count() aggregate and create a plot of counts of 911 calls.

df.groupby('Date').count()['twp'].plot()
plt.tight_layout()


Create a separate plot for each Reason for the 911 call

Plot representing EMS calls

df[df['Reason']=='EMS'].groupby('Date').count()['twp'].plot()
plt.tight_layout()
plt.title('EMS')


Plot representing Fire calls

df[df['Reason']=='Fire'].groupby('Date').count()['twp'].plot()
plt.tight_layout()
plt.title('Fire')


Plot representing Traffic calls

df[df['Reason']=='Traffic'].groupby('Date').count()['twp'].plot()
plt.tight_layout()
plt.title('Traffic')
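The three filters above repeat the same pattern. One alternative is to group by Date and Reason together and unstack, which gives one column of daily counts per Reason in a single table. A sketch on made-up rows (not the real data):

```python
from datetime import date

import pandas as pd

# Made-up calls spread over two days.
toy = pd.DataFrame({
    'Date': [date(2015, 12, 10), date(2015, 12, 10), date(2015, 12, 10),
             date(2015, 12, 11), date(2015, 12, 11)],
    'Reason': ['EMS', 'EMS', 'Fire', 'Traffic', 'EMS'],
})

# Rows are dates, columns are Reasons; days with no calls for a Reason become 0.
per_reason = toy.groupby(['Date', 'Reason']).size().unstack(fill_value=0)
print(per_reason)
```

Calling per_reason.plot(subplots=True) on a table like this should draw one panel per Reason.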


Creating Heatmaps with Seaborn

Restructure the DataFrame so that the columns become the Hours and the index becomes the Day of the Week.

There are lots of ways to do this; try combining groupby with an unstack method.

dayHour = df.groupby(by=['Day_of_week','Hour']).count()['Reason'].unstack()
dayHour.head()

Hour	0	1	2	3	4	5	6	7	8	9	...	14	15	16	17	18	19	20	21	22	23
Day_of_week
Fri	275	235	191	175	201	194	372	598	742	752	...	932	980	1039	980	820	696	667	559	514	474
Mon	282	221	201	194	204	267	397	653	819	786	...	869	913	989	997	885	746	613	497	472	325
Sat	375	301	263	260	224	231	257	391	459	640	...	789	796	848	757	778	696	628	572	506	467
Sun	383	306	286	268	242	240	300	402	483	620	...	684	691	663	714	670	655	537	461	415	330
Thu	278	202	233	159	182	203	362	570	777	828	...	876	969	935	1013	810	698	617	553	424	354

5 rows × 24 columns
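Notice that the rows come out in alphabetical order (Fri, Mon, Sat, ...), because groupby sorts its keys. If calendar order reads better on a heatmap, reindex can reorder the rows. A sketch on a made-up frame; day_order and the other names are mine, not from the notebook.

```python
import pandas as pd

# Made-up calls covering only three weekdays.
toy = pd.DataFrame({
    'Day_of_week': ['Mon', 'Fri', 'Mon', 'Sun', 'Fri', 'Mon'],
    'Hour': [0, 0, 1, 1, 0, 0],
    'Reason': ['EMS', 'Fire', 'EMS', 'Traffic', 'EMS', 'Fire'],
})

day_hour = toy.groupby(by=['Day_of_week', 'Hour']).count()['Reason'].unstack()

# Calendar order, restricted to the days that actually occur in the toy data.
day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
present = [day for day in day_order if day in day_hour.index]
day_hour_ordered = day_hour.reindex(present)

print(day_hour_ordered)
```

On the full dataset all seven days are present, so dayHour.reindex(day_order) would be enough.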

Create a HeatMap using this new DataFrame

plt.figure(figsize=(12,6))
sns.heatmap(dayHour, cmap='coolwarm')


Create a Clustermap using this DataFrame.

sns.clustermap(dayHour,cmap='viridis')


Repeat these same plots and operations for a DataFrame that shows the Month as the column.

dayMonth = df.groupby(by=['Day_of_week','Month']).count()['Reason'].unstack()
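Another route to the same kind of table is pd.crosstab, which counts rows for each pair of values and puts 0 (rather than NaN) where a combination never occurs. A sketch on made-up data:

```python
import pandas as pd

# Made-up day/month pairs.
toy = pd.DataFrame({
    'Day_of_week': ['Mon', 'Mon', 'Tue', 'Tue', 'Tue'],
    'Month': [1, 2, 1, 1, 2],
})

# Rows are days, columns are months, cells are row counts.
day_month = pd.crosstab(toy['Day_of_week'], toy['Month'])
print(day_month)
```

Because crosstab counts rows, it behaves like size() rather than count() on a particular column.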

Create a HeatMap by Month column

plt.figure(figsize=(12,6))
sns.heatmap(dayMonth, cmap='coolwarm')


Create a Clustermap by Month column

sns.clustermap(dayMonth,cmap='viridis')


So that was the fourth week.. 🔏

https://raw.githubusercontent.com/4bic/4bic.github.io/master/notebooks/911_Project/output_15_1.png