Simple and practical applications of plotting and data representation in Python.

I use Jupyter Notebook a lot for the convenience and interactivity it offers.

To get started, import the Python data and plotting libraries we need.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    # display matplotlib figures interactively inside the notebook
    %matplotlib notebook

    # use a colorblind-friendly style sheet
    # (on Matplotlib 3.6+ this style is named 'seaborn-v0_8-colorblind')
    plt.style.use('seaborn-colorblind')

Generate plot data

As part of the preparation, use NumPy to generate sample time-series data for the plots.


    np.random.seed(123)  # seed the generator so the random numbers are reproducible
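To see what seeding buys us, here is a minimal sketch. It uses two separate `np.random.RandomState` objects (my choice, so the global generator used in this post is left untouched) and checks that the same seed yields the same draws. The variable names are just for illustration.

```python
import numpy as np

# Two independent generators created with the same seed
rng_a = np.random.RandomState(123)
rng_b = np.random.RandomState(123)

first = rng_a.randn(5)
second = rng_b.randn(5)

# Same seed, same sequence of "random" numbers
print(np.array_equal(first, second))  # True
```

This is why the plots in this post should look the same every time the notebook is rerun from the top.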

Then create a pandas DataFrame with three columns (A, B and C), indexed by date.

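    # each column is a cumulative sum of random steps, i.e. a random walk;
    # B and C are shifted by +20 and -20 so the three series do not overlap at the start
    df = pd.DataFrame({'A': np.random.randn(365).cumsum(0),
                       'B': np.random.randn(365).cumsum(0) + 20,
                       'C': np.random.randn(365).cumsum(0) - 20},
                      index=pd.date_range('1/1/2017', periods=365))
    len(df)    # 365, one row per day
    df.head()  # preview the first five rows of the dataframe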
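If `cumsum` is new to you, this tiny sketch shows what it does to a series of steps. The four step values are made up for illustration and are not taken from the dataframe above.

```python
import numpy as np

# Hypothetical steps, chosen by hand for illustration
steps = np.array([1.0, -2.0, 0.5, 3.0])

# Running total of the steps: 1.0, -1.0, -0.5, 2.5
walk = steps.cumsum()
print(walk)

# np.diff undoes the cumulative sum (apart from the first element)
recovered = np.diff(walk)
print(recovered)
```

Applied to 365 random steps, the same running total produces the wandering, time-series-like curves plotted in the rest of this post.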

And here we go...

Simple line plot


    df.plot();  # the trailing semicolon suppresses the text output of the returned Axes object


Scatter plot

    df.plot(x='A', y='B', kind='scatter');


Scatter color map

Plot A against C, using the values of B for both the color and the size of each marker.

    df.plot.scatter('A', 'C', c='B',
                    s=df['B'], colormap='viridis');


Scatter color map with adjusted aspect ratio

    ax = df.plot.scatter('A', 'C', c='B',
                         s=df['B'], colormap='viridis')

    # make one unit on the x-axis as long on screen as one unit on the y-axis
    ax.set_aspect('equal')


Box plot

    df.plot.box();
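To read a box plot, it helps to compute the numbers behind it. The sketch below does that for a single stand-in column (`demo`, a hypothetical random walk built like column A, not the actual dataframe above). It assumes Matplotlib's default whisker setting of 1.5 times the interquartile range.

```python
import numpy as np
import pandas as pd

# Stand-in data: a random walk similar to column A (hypothetical, for illustration)
rng = np.random.RandomState(123)
demo = pd.DataFrame({'A': rng.randn(365).cumsum()})

# The box spans the first to third quartile; the line inside is the median
q1, median, q3 = demo['A'].quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# With the default setting, points beyond these fences are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = demo['A'][(demo['A'] < lower_fence) | (demo['A'] > upper_fence)]

print(q1, median, q3, len(outliers))
```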


Histogram

    df.plot.hist(alpha=0.7);
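Under the hood a histogram is just a count of values per bin. Here is a small sketch with `np.histogram` on a stand-in series (`values` is a hypothetical random walk, not a column of the dataframe above), using 10 bins to match the default of `plot.hist`.

```python
import numpy as np

# Stand-in data: a random walk of 365 steps (hypothetical, for illustration)
rng = np.random.RandomState(123)
values = rng.randn(365).cumsum()

# 10 equal-width bins between the minimum and maximum value
counts, edges = np.histogram(values, bins=10)

print(counts)        # number of values falling in each bin
print(edges)         # 11 edges delimit the 10 bins
print(counts.sum())  # every value lands in exactly one bin: 365
```

Passing a different `bins` value to `df.plot.hist` changes the bar width in the same way.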


KDE plot

Kernel density estimation (KDE) is a way to estimate the probability density function of a random variable. Note that pandas relies on SciPy for this plot.

    df.plot.kde();
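To make the idea concrete, here is a minimal NumPy sketch of a Gaussian KDE: place a small bell curve on every data point and average them. The data (`sample`) is a hypothetical random walk, and the bandwidth uses Scott's rule of thumb, which I am assuming here for illustration; this is not a copy of what pandas or SciPy do internally.

```python
import numpy as np

# Stand-in data: a random walk of 365 steps (hypothetical, for illustration)
rng = np.random.RandomState(123)
sample = rng.randn(365).cumsum()

n = sample.size
# Bandwidth from Scott's rule of thumb (an assumption of this sketch)
h = sample.std(ddof=1) * n ** (-1 / 5)

# Evaluate the estimate on a grid that extends a little past the data
grid = np.linspace(sample.min() - 3 * h, sample.max() + 3 * h, 500)

# One Gaussian kernel per data point, averaged over all points
z = (grid[:, None] - sample[None, :]) / h
density = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

# A density should integrate to roughly 1 (trapezoidal rule on the grid)
area = np.sum((density[:-1] + density[1:]) / 2 * np.diff(grid))
print(area)
```

A larger bandwidth `h` gives a smoother curve, while a smaller one follows the individual data points more closely.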


Pandas plotting tools

For these demonstrations, I’ll use iris.csv, a dataset with measurements for three flower species.

These measurements include:

    - SepalLength
    - SepalWidth
    - PetalLength
    - PetalWidth

Using pandas, we can read the file and load its contents into a dataframe.

    iris = pd.read_csv('iris.csv')

    iris.head()  # preview the first five rows of the dataframe

1. Scatter Matrix

    # in current pandas releases the plotting helpers live in pd.plotting (formerly pd.tools.plotting)
    pd.plotting.scatter_matrix(iris);
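If you do not have iris.csv at hand, the sketch below tries the same call on a stand-in dataframe. `demo` is hypothetical: it holds random numbers under the iris column names, not real measurements. It uses `pd.plotting.scatter_matrix`, the location of this function in current pandas releases, and shows that only the numeric columns end up in the grid.

```python
import numpy as np
import pandas as pd

# Stand-in data: random values under the iris column names (not real measurements)
rng = np.random.RandomState(0)
demo = pd.DataFrame(rng.rand(30, 4),
                    columns=['SepalLength', 'SepalWidth',
                             'PetalLength', 'PetalWidth'])
demo['Name'] = np.repeat(['a', 'b', 'c'], 10)  # hypothetical class labels

# Returns a 2-D array of Axes: one row and one column per numeric variable
axes = pd.plotting.scatter_matrix(demo, figsize=(8, 8))

print(axes.shape)  # (4, 4): the text column 'Name' is left out
```

The diagonal shows a histogram of each variable by default, and each off-diagonal cell is a scatter plot of one variable against another.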


2. Parallel Coordinates

    plt.figure()
    # 'Name' is the column that holds the species, used to color the lines
    pd.plotting.parallel_coordinates(iris, 'Name');
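A parallel coordinates plot draws one line per row, with one vertical axis per column, colored by class. The sketch below is a variation on a stand-in dataframe (`demo` is hypothetical random data, not the iris measurements) that restricts the plot to two columns with `cols` and sets one color per class with `color`.

```python
import numpy as np
import pandas as pd

# Stand-in data: random values under the iris column names (not real measurements)
rng = np.random.RandomState(0)
demo = pd.DataFrame(rng.rand(30, 4),
                    columns=['SepalLength', 'SepalWidth',
                             'PetalLength', 'PetalWidth'])
demo['Name'] = np.repeat(['a', 'b', 'c'], 10)  # hypothetical class labels

# Plot only two of the columns, one color per class
ax = pd.plotting.parallel_coordinates(
    demo, 'Name',
    cols=['SepalLength', 'PetalLength'],
    color=['tab:blue', 'tab:orange', 'tab:green'],
)

# The legend holds one entry per class
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

Because all columns share one y-axis, this plot works best when they are on comparable scales.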
