2/24/2020 4.
3_Merge_append - Jupyter Notebook
Merge DataFrames
“Merging” two datasets is the process of bringing two datasets together into one, and aligning the rows from
each based on common attributes or columns.
In [1]:
import numpy as np
import pandas as pd
In [2]:
user_usage = pd.read_csv('data/user_usage.csv')
user_usage.head(3)
Out[2]:
outgoing_mins_per_month outgoing_sms_per_month monthly_mb use_id
0 21.97 4.82 1557.33 22787
1 1710.08 136.88 7267.55 22788
2 1710.08 136.88 7267.55 22789
In [6]:
user_usage.shape
Out[6]:
(240, 4)
In [3]:
user_device = pd.read_csv('data/user_device.csv')
user_device.head(3)
Out[3]:
use_id user_id platform platform_version device use_type_id
0 22782 26980 ios 10.2 iPhone7,2 2
1 22783 29628 android 6.0 Nexus 5 3
2 22784 28473 android 5.1 SM-G903F 1
In [7]:
user_device.shape
Out[7]:
(272, 6)
localhost:8888/notebooks/Machine Learning/Data Analytics/4.3_Merge_append.ipynb 1/5
2/24/2020 4.3_Merge_append - Jupyter Notebook
In [4]:
android_devices = pd.read_csv('data/android_devices.csv')
android_devices.head(3)
Out[4]:
Unnamed: Unnamed: U
use_id user_id platform platform_version device use_type_id
6 7
0 22782 26980 ios 10.2 iPhone7,2 2 NaN NaN
1 22783 29628 android 6.0 Nexus 5 3 NaN NaN
SM-
2 22784 28473 android 5.1 1 NaN NaN
G903F
Sample problem
We would like to determine if the usage patterns for users differ between different devices. For example, do
users using Samsung devices use more call minutes than those using LG devices?
We want to form a single dataframe with columns for user usage figures (calls per month, sms per month etc)
and also columns with device information (model, manufacturer, etc). We will need to “merge” (or “join”) our
sample datasets together into one single dataset for analysis.
Merging user_usage with user_devices
Lets see how we can correctly add the “device” and “platform” columns to the user_usage dataframe using the
Pandas Merge command.
In [5]:
result = pd.merge(user_usage,
user_device[['use_id', 'platform', 'device']],
on='use_id')
result.head()
Out[5]:
outgoing_mins_per_month outgoing_sms_per_month monthly_mb use_id platform device
GT-
0 21.97 4.82 1557.33 22787 android
I9505
SM-
1 1710.08 136.88 7267.55 22788 android
G930F
SM-
2 1710.08 136.88 7267.55 22789 android
G930F
3 94.46 35.17 519.12 22790 android D2303
SM-
4 71.59 79.26 1557.33 22792 android
G361F
localhost:8888/notebooks/Machine Learning/Data Analytics/4.3_Merge_append.ipynb 2/5
2/24/2020 4.3_Merge_append - Jupyter Notebook
In [8]:
result.shape
Out[8]:
(159, 6)
localhost:8888/notebooks/Machine Learning/Data Analytics/4.3_Merge_append.ipynb 3/5
2/24/2020 4.3_Merge_append - Jupyter Notebook
In [21]:
agg_device = result.groupby("device").count()['platform']
agg_device
Out[21]:
device
A0001 2
C6603 1
D2303 2
D5503 2
D5803 1
D6603 2
E6653 1
EVA-L09 1
F3111 4
GT-I8190N 1
GT-I9195 3
GT-I9300 2
GT-I9505 11
GT-I9506 1
GT-I9515 3
GT-N7100 2
HTC Desire 510 5
HTC Desire 530 1
HTC Desire 620 1
HTC Desire 626 2
HTC Desire 825 3
HTC One M9 1
HTC One S 2
HTC One mini 2 3
HTC One_M8 1
HUAWEI CUN-L01 1
HUAWEI VNS-L31 1
LG-H815 1
Lenovo K51c78 1
Moto G (4) 4
MotoE2(4G-LTE) 1
Nexus 5X 1
ONE A2003 2
ONEPLUS A3003 7
SM-A300FU 4
SM-A310F 2
SM-A500FU 1
SM-G360F 2
SM-G361F 5
SM-G531F 1
SM-G800F 1
SM-G900F 30
SM-G903F 2
SM-G920F 5
SM-G925F 4
SM-G930F 3
SM-G935F 5
SM-J320FN 5
SM-N9005 1
SM-N910F 5
VF-795 1
Vodafone Smart ultra 6 1
X11 2
localhost:8888/notebooks/Machine Learning/Data Analytics/4.3_Merge_append.ipynb 4/5
2/24/2020 4.3_Merge_append - Jupyter Notebook
iPhone6,2 1
iPhone7,2 1
Name: platform, dtype: int64
Append Dataframes
Join df1 and df2 into a single DataFrame called df
In [23]:
#df = df1.append(df2)
In [ ]:
localhost:8888/notebooks/Machine Learning/Data Analytics/4.3_Merge_append.ipynb 5/5