Simple and practical applications of plotting and data representation in Python.
I use Jupyter Notebook a lot for the convenience and interactivity it allows.
To get started, load the Python data and plotting libraries we need.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
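# render interactive matplotlib figures inside the notebook
%matplotlib notebook
# use a colorblind-friendly color cycle
# (matplotlib 3.6+ names this style 'seaborn-v0_8-colorblind'; older versions use 'seaborn-colorblind')
plt.style.use('seaborn-v0_8-colorblind')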
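Because the style name depends on the installed matplotlib version, a small fallback can keep the notebook running on either. This is only a sketch: the `candidates` list and the choice of `'default'` as a last resort are my own assumptions, not part of the original setup.

```python
import matplotlib.pyplot as plt

# Try the newer style name first, then the older one (both are assumptions
# about what the local matplotlib install may provide).
candidates = ['seaborn-v0_8-colorblind', 'seaborn-colorblind']

# plt.style.available lists the style names this install knows about.
style = next((name for name in candidates if name in plt.style.available), 'default')
plt.style.use(style)
```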
Generate plot data
As part of the preparations, using numpy, generate sample time-series data for the plots.
np.random.seed(123)  # fix the seed so the random numbers are reproducible
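To see what seeding buys us, here is a minimal sketch. It uses two separate `np.random.RandomState` objects (my choice, so the global seed set above is left untouched) and checks that the same seed yields the same draws.

```python
import numpy as np

# Two independent generators created with the same seed.
rng_a = np.random.RandomState(123)
rng_b = np.random.RandomState(123)

first = rng_a.randn(5)
second = rng_b.randn(5)

# Same seed, same sequence of "random" numbers.
same = np.array_equal(first, second)
print(same)
```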
Then create a DataFrame using pandas with three columns (A, B and C), indexed by date.
df = pd.DataFrame({'A': np.random.randn(365).cumsum(0),
                   'B': np.random.randn(365).cumsum(0) + 20,
                   'C': np.random.randn(365).cumsum(0) - 20},
                  index=pd.date_range('1/1/2017', periods=365))
len(df)  # 365 rows, one per day
df.head()  # show the first five rows of the DataFrame
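The columns look like time series because `cumsum` turns independent random steps into a running total, that is, a random walk. A tiny sketch with hand-picked step values (hypothetical, chosen only to make the arithmetic easy to follow):

```python
import numpy as np

# Hand-picked steps instead of random ones, so the result is easy to check.
steps = np.array([1.0, -2.0, 3.0, 0.5])

# Each entry is the sum of all steps up to and including that position.
walk = steps.cumsum()
print(walk)  # [ 1.  -1.   2.   2.5]
```

Adding or subtracting 20, as columns B and C do, simply shifts the whole walk up or down.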
and here we go . . .
Simple line plot
df.plot();  # the trailing semicolon suppresses the text output of the returned object
Scatter plot
df.plot('A', 'B', kind = 'scatter');
Scatter color map
With the values of B as the color range
# c='B' colors each point by its B value; s=df['B'] scales the marker size by B as well
df.plot.scatter('A', 'C', c='B',
                s=df['B'], colormap='viridis');
Scatter color map with Adjusted Aspect Ratio
ax = df.plot.scatter('A', 'C', c='B',
s = df['B'], colormap='viridis');
ax.set_aspect('equal')
Box Plots
df.plot.box();
Histogram
df.plot.hist(alpha=0.7);
KDE Chart
df.plot.kde();
Kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable from a sample.
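To make the idea concrete, here is a hand-rolled sketch of a Gaussian KDE in plain numpy. It is not what pandas runs internally (pandas relies on SciPy and picks a bandwidth automatically); the function name `gaussian_kde_sketch`, the fixed `bandwidth=0.4`, and the sample data are all assumptions for illustration.

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average a Gaussian bump centred on each sample, evaluated on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance from every grid point to every sample:
    # shape (len(grid), len(samples)).
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Mean over samples, rescaled so the result integrates to one.
    return kernels.mean(axis=1) / bandwidth

rng = np.random.RandomState(0)          # local generator, global seed untouched
samples = rng.randn(200)                # hypothetical sample data
grid = np.linspace(-6, 6, 601)
density = gaussian_kde_sketch(samples, grid, bandwidth=0.4)

# A density estimate should be non-negative and integrate to roughly 1.
area = float(np.sum(density) * (grid[1] - grid[0]))
```

A smaller bandwidth gives a bumpier curve that follows individual samples; a larger one gives a smoother but blurrier estimate.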
Pandas plotting tools
For the demonstrations, I’ll use iris.csv, a dataset with measurements for three flower species.
These measurements include:
- SepalLength
- SepalWidth
- PetalLength
- PetalWidth
Using pandas, we can read the file contents into a DataFrame:
iris = pd.read_csv('iris.csv')
iris.head()  # show the first five rows of the DataFrame
1. Scatter Matrix
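# pd.tools.plotting was removed from pandas; the function now lives in pd.plotting
# select_dtypes keeps only the numeric measurement columns
pd.plotting.scatter_matrix(iris.select_dtypes('number'));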
2. Parallel Coordinates
plt.figure()
pd.plotting.parallel_coordinates(iris, 'Name');  # 'Name' is the species column used to color the lines
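If iris.csv is not at hand, the same call can be tried on a small stand-in table. The sketch below assumes the column layout listed earlier plus a `Name` column; the rows are hypothetical values, not taken from the real dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for iris.csv: same column names, invented values.
toy = pd.DataFrame({
    'SepalLength': [5.0, 4.8, 6.0, 6.2, 6.5, 7.0],
    'SepalWidth':  [3.5, 3.2, 2.8, 2.9, 3.0, 3.1],
    'PetalLength': [1.5, 1.4, 4.5, 4.3, 5.5, 6.0],
    'PetalWidth':  [0.2, 0.2, 1.4, 1.3, 2.0, 2.2],
    'Name': ['setosa', 'setosa', 'versicolor',
             'versicolor', 'virginica', 'virginica'],
})

plt.figure()
# One line per row, one vertical axis per numeric column, colored by 'Name'.
ax = pd.plotting.parallel_coordinates(toy, 'Name')
```

Each row becomes one polyline across the four measurement axes, so species that separate well show up as distinct bundles of lines.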