This notebook uses fake data about purchases made through Amazon.
Please excuse anything in the DataFrame that doesn't make real-world sense; all of the data is made up.
The data is imported from the Ecommerce Purchases CSV file and stored in a DataFrame called ecom.
import pandas as pd
ecom = pd.read_csv('Ecommerce Purchases')
Check the head of the DataFrame.
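No code accompanies this step, so here is a minimal sketch of it. It uses a small made-up stand-in frame (`toy`, a hypothetical name with invented values) so that it runs without the CSV file; on the real data the equivalent call would presumably be `ecom.head()`.

```python
import pandas as pd

# Hypothetical stand-in for ecom; the rows are invented for illustration only.
toy = pd.DataFrame({
    'Job': ['Lawyer', 'Engineer', 'Lawyer', 'Teacher', 'Nurse', 'Pilot'],
    'Purchase Price': [10.5, 99.0, 45.2, 3.1, 77.7, 60.0],
})

# head() returns the first 5 rows by default; pass n to change that.
first_rows = toy.head()
print(first_rows)
```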
How many rows and columns are there?
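One way to answer this is the `shape` attribute, which holds a `(rows, columns)` tuple. The sketch below uses a small made-up frame (`toy`, hypothetical) so that it runs on its own; on the real data this would likely be `ecom.shape`, and `ecom.info()` would additionally list column types and non-null counts.

```python
import pandas as pd

# Hypothetical stand-in for ecom; the rows are invented for illustration only.
toy = pd.DataFrame({
    'Job': ['Lawyer', 'Engineer', 'Lawyer', 'Teacher', 'Nurse', 'Pilot'],
    'Purchase Price': [10.5, 99.0, 45.2, 3.1, 77.7, 60.0],
})

# shape is an attribute, not a method: no parentheses.
n_rows, n_cols = toy.shape
print(f"{n_rows} rows, {n_cols} columns")
```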
What is the average Purchase Price?
ecom['Purchase Price'].mean()
What were the highest and lowest purchase prices?
ecom['Purchase Price'].max()
ecom['Purchase Price'].min()
How many people have English 'en' as their Language of choice on the website?
ecom[ecom['Language']== 'en'].count()
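Note that calling `count()` on a filtered DataFrame returns one count per column and skips missing values, so the numbers can differ between columns. If a single number is wanted, summing the boolean mask is one alternative. The sketch below illustrates the difference on a small made-up frame (`toy`, hypothetical, with one deliberately missing email).

```python
import pandas as pd

# Hypothetical stand-in for ecom; the rows are invented for illustration only.
toy = pd.DataFrame({
    'Language': ['en', 'fr', 'en', 'de', 'en'],
    'Email': ['a@example.com', 'b@example.com', None,
              'c@example.com', 'd@example.com'],
})

mask = toy['Language'] == 'en'

# Per-column non-null counts among the matching rows.
per_column = toy[mask].count()

# A single scalar: True counts as 1, False as 0.
n_en = int(mask.sum())

print(per_column)
print(n_en)
```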
How many people have the job title of "Lawyer"?
(ecom['Job'] == 'Lawyer').sum()
How many people made their purchase during the AM, and how many during the PM?
(Hint: check out value_counts().)
ecom['AM or PM'].value_counts()
What are the 5 most common Job Titles?
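No code accompanies this question. A minimal sketch of one approach is to count each job title with `value_counts()`, which sorts in descending order, and keep the first five. It runs on a small made-up frame (`toy`, hypothetical); on the real data this would presumably be `ecom['Job'].value_counts().head(5)`.

```python
import pandas as pd

# Hypothetical stand-in for ecom; the job titles are invented for illustration only.
toy = pd.DataFrame({
    'Job': ['Lawyer', 'Lawyer', 'Lawyer', 'Engineer', 'Engineer',
            'Teacher', 'Nurse', 'Pilot', 'Chef'],
})

# value_counts() sorts by frequency, most common first; head(5) keeps the top five.
top_jobs = toy['Job'].value_counts().head(5)
print(top_jobs)
```

Titles with the same count may appear in any order relative to each other, so only the positions of distinct counts should be relied on.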
Someone made a purchase that came from Lot "90 WT". What was the Purchase Price for this transaction?
ecom[ecom['Lot']== "90 WT"]['Purchase Price']
What is the email of the person with the following Credit Card Number: 4926535242672853?
ecom[ecom['Credit Card']==4926535242672853]['Email']
How many people have American Express as their Credit Card Provider and made a purchase above $95?
ecom[(ecom['CC Provider']=="American Express") & (ecom['Purchase Price'] > 95)].count()
Hard: How many people have a credit card that expires in 2025?
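# Assumes 'CC Exp Date' is a string in MM/YY form, so exp[3:] is the two-digit year.
ecom[ecom['CC Exp Date'].apply(lambda exp: exp[3:] == '25')].count()
Hard: What are the top 5 most popular email providers/hosts (the part of each address after the @)?
# Split each address at '@', keep the host part, then count the hosts.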
ecom['Email'].apply(lambda email: email.split('@')[1]).value_counts().head(5)