This is an analysis of 911 call data from Kaggle, done as a progress milestone to consolidate the first batch of learnings.

Data and Setup

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Getting started - Data Access

df = pd.read_csv('911.csv')

df.head(3)
   lat        lng         desc                                               zip      title                    timeStamp            twp                addr                        e
0  40.297876  -75.581294  REINDEER CT & DEAD END; NEW HANOVER; Station ...   19525.0  EMS: BACK PAINS/INJURY   2015-12-10 17:40:00  NEW HANOVER        REINDEER CT & DEAD END      1
1  40.258061  -75.264680  BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...   19446.0  EMS: DIABETIC EMERGENCY  2015-12-10 17:40:00  HATFIELD TOWNSHIP  BRIAR PATH & WHITEMARSH LN  1
2  40.121182  -75.351975  HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...  19401.0  Fire: GAS-ODOR/LEAK      2015-12-10 17:40:00  NORRISTOWN         HAWS AVE                    1
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99492 entries, 0 to 99491
Data columns (total 9 columns):
lat          99492 non-null float64
lng          99492 non-null float64
desc         99492 non-null object
zip          86637 non-null float64
title        99492 non-null object
timeStamp    99492 non-null object
twp          99449 non-null object
addr         98973 non-null object
e            99492 non-null int64
dtypes: float64(3), int64(1), object(5)
memory usage: 6.8+ MB
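
The df.info() output shows fewer non-null values for zip, twp and addr than there are rows, so those columns have gaps. A minimal sketch of counting the gaps per column, run on a made-up miniature frame (the name sample and its values are assumptions, not part of the Kaggle data):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the 911 data; the values are invented for illustration.
sample = pd.DataFrame({
    'zip': [19525.0, np.nan, 19401.0, np.nan],
    'twp': ['NEW HANOVER', 'HATFIELD TOWNSHIP', None, 'NORRISTOWN'],
    'e': [1, 1, 1, 1],
})

# isna() flags missing cells; summing the booleans counts them per column.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

On the real data the same call would be df.isna().sum().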

Top 5 zipcodes for 911 calls

df['zip'].value_counts().head(5)
19401.0    6979
19464.0    6643
19403.0    4854
19446.0    4748
19406.0    3174
Name: zip, dtype: int64
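
The zip codes print as 19401.0 because the column contains missing values, which makes pandas store it as float64. One possible tidy-up, sketched on a small made-up Series (zips is a hypothetical name), is pandas' nullable integer dtype:

```python
import numpy as np
import pandas as pd

# Made-up zip values with one gap, mirroring the float64 'zip' column.
zips = pd.Series([19401.0, 19464.0, np.nan, 19401.0], name='zip')

# 'Int64' (capital I) is the nullable integer dtype: NaN becomes <NA>
# and the remaining values lose the trailing '.0'.
zip_int = zips.astype('Int64')

# value_counts() skips missing values by default.
zip_counts = zip_int.value_counts()
print(zip_counts)
```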

Top 5 townships (twp) for 911 calls

df['twp'].value_counts().head(5)
LOWER MERION    8443
ABINGTON        5977
NORRISTOWN      5890
UPPER MERION    5227
CHELTENHAM      4575
Name: twp, dtype: int64

How many unique ‘title’ codes are there?

df['title'].nunique()
110

Creating new features

In the title column, a “Reason/Department” is specified before the title code.

These are EMS, Fire, and Traffic.

Use .apply() with a custom lambda expression to create a new column called “Reason” that contains this string value.

For example, if the title column value is EMS: BACK PAINS/INJURY , the Reason column value would be EMS.

df['Reason'] = df['title'].apply(lambda title: title.split(':')[0])

df['Reason'].head()
0     EMS
1     EMS
2    Fire
3     EMS
4     EMS
Name: Reason, dtype: object
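
The same split can also be written with the vectorized .str accessor instead of .apply(). A small sketch on hand-typed titles (the third title is made up for illustration):

```python
import pandas as pd

# First two titles appear in the data above; the third is a made-up example.
titles = pd.Series([
    'EMS: BACK PAINS/INJURY',
    'Fire: GAS-ODOR/LEAK',
    'Traffic: EXAMPLE TITLE',
])

# Approach used in this post: a Python lambda applied per element.
reason_apply = titles.apply(lambda title: title.split(':')[0])

# Alternative: split every string on ':' and take the first piece.
reason_str = titles.str.split(':').str[0]

print(reason_str.tolist())
```

Both give the same column; the .str version also passes missing values through instead of raising.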

The most common Reason for a 911 call based off of this new column

df['Reason'].value_counts()
EMS        48877
Traffic    35695
Fire       14920
Name: Reason, dtype: int64

Use Seaborn to create a countplot of 911 calls by Reason.

sns.countplot(x='Reason', data=df, palette='coolwarm')


Focus on time information.

Data type of the objects in the timeStamp column

type(df['timeStamp'][0])
str

Convert the ‘timeStamp’ column from strings to DateTime objects

df['timeStamp'] = pd.to_datetime(df['timeStamp'])

type(df['timeStamp'][0])
pandas._libs.tslib.Timestamp
df['timeStamp'][5]
Timestamp('2015-12-10 17:40:01')
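
To see the conversion in isolation, here is a small sketch on two hand-typed timestamp strings (raw and parsed are hypothetical names):

```python
import pandas as pd

# Two timestamp strings in the same format as the timeStamp column.
raw = pd.Series(['2015-12-10 17:40:00', '2015-12-10 17:40:01'])

# Before conversion each element is a plain Python string.
before = type(raw[0])

# pd.to_datetime parses the strings into pandas Timestamp objects.
parsed = pd.to_datetime(raw)
after = type(parsed[0])

print(before, after)
print(parsed[1].month, parsed[1].hour)
```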

Extract time components, e.g. month and hour

print(df['timeStamp'][5].month)
print(df['timeStamp'][5].hour)
12
17

Create 3 new columns called Hour, Month, and Day_of_week

df['Month'] = df['timeStamp'].apply(lambda timestamp: timestamp.month)

df['Hour'] = df['timeStamp'].apply(lambda timestamp: timestamp.hour)

df['Day_of_week'] = df['timeStamp'].apply(lambda timestamp: timestamp.weekday())

Use .map() with this dictionary to turn the integer weekday into its string name:

dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}

df['Day_of_week'] = df['timeStamp'].apply(lambda timestamp: timestamp.weekday()).map(dmap)
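
Once the column holds datetimes, the .dt accessor is an alternative to .apply() with a lambda. A minimal sketch on two made-up timestamps (frame is a hypothetical stand-in for df):

```python
import pandas as pd

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

# Two made-up calls; 2015-12-10 was a Thursday, 2015-12-12 a Saturday.
frame = pd.DataFrame({
    'timeStamp': pd.to_datetime(['2015-12-10 17:40:00', '2015-12-12 08:05:00'])
})

# .dt exposes the datetime components for the whole column at once.
frame['Hour'] = frame['timeStamp'].dt.hour
frame['Month'] = frame['timeStamp'].dt.month
frame['Day_of_week'] = frame['timeStamp'].dt.dayofweek.map(dmap)

print(frame)
```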

With Seaborn create:

Countplot of the Day of Week with the hue based off of the Reason

sns.countplot(data=df, x='Day_of_week',hue='Reason')
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
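
By default the bars appear in whatever order the day names turn up in the data. If a Mon to Sun order is wanted, one option is to fix the order explicitly. The sketch below shows the idea on a made-up Series of day names (days and day_order are hypothetical names) without drawing anything:

```python
import pandas as pd

day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Made-up day names standing in for df['Day_of_week'].
days = pd.Series(['Sat', 'Mon', 'Thu', 'Mon', 'Sun'])

# Count each day, then reindex so the rows follow the calendar order;
# days that never occur get a count of 0.
counts = days.value_counts().reindex(day_order, fill_value=0)
print(counts)

# The same list can be handed to seaborn through countplot's order parameter:
# sns.countplot(data=df, x='Day_of_week', hue='Reason', order=day_order)
```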


Now a countplot for Month:

sns.countplot(data=df, x='Month',hue='Reason')
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))


The dataset is missing some months.

One way to bridge the gap is a simple line plot, which draws straight across the missing months.
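
To check which months are absent rather than reading it off the chart, a set difference is enough. A sketch on made-up month numbers (months_present is a hypothetical stand-in for df['Month'], and the gap shown here is invented):

```python
import pandas as pd

# Made-up month numbers; the months left out here are for illustration only.
months_present = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 12, 12, 1])

# Every month of the year minus the months that occur in the data.
missing_months = sorted(set(range(1, 13)) - set(months_present.unique().tolist()))
print(missing_months)
```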

Now create a groupby object called byMonth, where you group the DataFrame by the Month column and use the count() method for aggregation.

byMonth = df.groupby('Month').count()

A simple plot of the DataFrame indicating the count of calls per month.

byMonth['twp'].plot()
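
One detail worth knowing: count() only counts non-null cells, and df.info() above shows that twp has a few missing values, so byMonth['twp'] slightly undercounts the calls. A sketch of the difference between count() and size() on a made-up frame (calls is a hypothetical name):

```python
import pandas as pd

# Made-up calls; two of them have no township recorded.
calls = pd.DataFrame({
    'Month': [1, 1, 2, 2, 2],
    'twp': ['A', None, 'B', 'C', None],
})

# count() skips missing values in each column.
by_count = calls.groupby('Month').count()['twp']

# size() counts rows per group regardless of missing values.
by_size = calls.groupby('Month').size()

print(by_count.tolist(), by_size.tolist())
```

A column without gaps (lat or e in this dataset) gives the same numbers with either method.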


Create a linear fit on the number of calls per month.

Keep in mind you may need to reset the index to a column.

sns.lmplot(data=byMonth.reset_index(), x='Month',y='twp')
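
lmplot draws the fitted line but does not report its coefficients. If the slope and intercept are needed as numbers, a least-squares line can be fitted with NumPy. A sketch on made-up monthly counts (months and call_counts are hypothetical):

```python
import numpy as np

# Made-up monthly call counts lying exactly on a straight line.
months = np.array([1, 2, 3, 4])
call_counts = np.array([10.0, 12.0, 14.0, 16.0])

# A degree-1 polynomial fit returns [slope, intercept].
slope, intercept = np.polyfit(months, call_counts, deg=1)
print(slope, intercept)
```

With the real data the inputs would be byMonth.index and byMonth['twp'].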


Create a new column called ‘Date’ that contains the date from the timeStamp column.

You’ll need to use apply along with the .date() method.

df['Date'] = df['timeStamp'].apply(lambda timestamp: timestamp.date())

Group by this Date column with the count() aggregate

and create a plot of counts of 911 calls.

df.groupby('Date').count()['twp'].plot()
plt.tight_layout()


Create a separate plot for each Reason for the 911 call

Plot representing EMS calls

df[df['Reason']=='EMS'].groupby('Date').count()['twp'].plot()
plt.title('EMS')
plt.tight_layout()  # call after the title so the layout makes room for it


Plot representing Fire calls

df[df['Reason']=='Fire'].groupby('Date').count()['twp'].plot()
plt.title('Fire')
plt.tight_layout()  # call after the title so the layout makes room for it


Plot representing Traffic calls

df[df['Reason']=='Traffic'].groupby('Date').count()['twp'].plot()
plt.title('Traffic')
plt.tight_layout()  # call after the title so the layout makes room for it
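
The three plots repeat the same filter, group and plot steps. A possible shortcut is to build one table with a column per Reason. A sketch on a made-up frame (calls and daily are hypothetical names):

```python
import pandas as pd

# Made-up calls spread over two dates.
calls = pd.DataFrame({
    'Date': ['2015-12-10', '2015-12-10', '2015-12-10', '2015-12-11', '2015-12-11'],
    'Reason': ['EMS', 'EMS', 'Fire', 'Traffic', 'EMS'],
})

# Count calls per (Date, Reason) pair, then pivot Reason into columns.
# fill_value=0 covers days on which a Reason did not occur.
daily = calls.groupby(['Date', 'Reason']).size().unstack(fill_value=0)
print(daily)

# With matplotlib available, daily.plot(subplots=True) would draw one panel per Reason.
```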


Creating Heatmaps with Seaborn

Restructure the dataframe so that the columns become the Hours, Index becomes the Day of the Week.

There are lots of ways to do this; try combining groupby with an unstack method.

dayHour = df.groupby(by=['Day_of_week','Hour']).count()['Reason'].unstack()
dayHour.head()
Hour            0     1     2     3     4     5     6     7     8     9   ...    14    15    16    17    18    19    20    21    22    23
Day_of_week
Fri           275   235   191   175   201   194   372   598   742   752   ...   932   980  1039   980   820   696   667   559   514   474
Mon           282   221   201   194   204   267   397   653   819   786   ...   869   913   989   997   885   746   613   497   472   325
Sat           375   301   263   260   224   231   257   391   459   640   ...   789   796   848   757   778   696   628   572   506   467
Sun           383   306   286   268   242   240   300   402   483   620   ...   684   691   663   714   670   655   537   461   415   330
Thu           278   202   233   159   182   203   362   570   777   828   ...   876   969   935  1013   810   698   617   553   424   354

5 rows × 24 columns
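
Note that the rows come out in alphabetical order (Fri, Mon, Sat, ...), and the heatmap will follow that order. If calendar order is preferred, the table can be reindexed first. A sketch on a made-up frame (calls, day_hour and day_order are hypothetical names):

```python
import pandas as pd

day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Made-up calls with a weekday, an hour and a reason.
calls = pd.DataFrame({
    'Day_of_week': ['Mon', 'Mon', 'Sun', 'Fri', 'Sun'],
    'Hour': [0, 1, 0, 1, 0],
    'Reason': ['EMS', 'Fire', 'EMS', 'Traffic', 'EMS'],
})

# Same groupby + unstack as above; fill_value=0 replaces empty day/hour cells.
day_hour = calls.groupby(['Day_of_week', 'Hour']).count()['Reason'].unstack(fill_value=0)

# Put the rows into Mon..Sun order; days with no calls become rows of zeros.
day_hour = day_hour.reindex(day_order, fill_value=0)
print(day_hour)
```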

Create a HeatMap using this new DataFrame

plt.figure(figsize=(12,6))
sns.heatmap(dayHour, cmap='coolwarm')


Create a Clustermap using this DataFrame.

sns.clustermap(dayHour,cmap='viridis')


Repeat these same plots and operations for a DataFrame that shows the Month as the column.

dayMonth = df.groupby(by=['Day_of_week','Month']).count()['Reason'].unstack()
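
As mentioned, there is more than one way to build this table. pd.crosstab is another option that counts rows per pair of values directly. A sketch comparing the two on a made-up frame (calls is a hypothetical name):

```python
import pandas as pd

# Made-up calls with a weekday and a month.
calls = pd.DataFrame({
    'Day_of_week': ['Mon', 'Mon', 'Sun', 'Fri', 'Sun'],
    'Month': [1, 2, 1, 2, 1],
})

# groupby + size + unstack, analogous to the approach used above.
via_groupby = calls.groupby(['Day_of_week', 'Month']).size().unstack(fill_value=0)

# crosstab builds the same day-by-month count table in one call.
via_crosstab = pd.crosstab(calls['Day_of_week'], calls['Month'])

print(via_crosstab)
```

One difference from the count()['Reason'] version: crosstab and size() count rows, so they do not depend on which column is picked after the aggregation.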

Create a HeatMap by Month column

plt.figure(figsize=(12,6))
sns.heatmap(dayMonth, cmap='coolwarm')


Create a Clustermap by Month column

sns.clustermap(dayMonth,cmap='viridis')


So that was the fourth week. 🔏

https://raw.githubusercontent.com/4bic/4bic.github.io/master/notebooks/911_Project/output_15_1.png