0% found this document useful (0 votes)

23 views11 pages

Group - by Python Code

The document discusses the 'group by' operation in data analysis using pandas, highlighting its importance for conditional aggregation of datasets. It provides examples of grouping data by specific keys, performing aggregations, and analyzing datasets such as NYC flight data and phone call records. The document also includes code snippets demonstrating various group by operations and their results.

Uploaded by

paulrajarshi7

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

23 views11 pages

Group - by Python Code

Uploaded by

paulrajarshi7

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

2/24/2020 3.

3_group_by - Jupyter Notebook

Group By: Split, Apply, Combine

Simple aggregations can give you a flavor of your dataset, but often we would prefer to aggregate conditionally
on some label or index: this is implemented in the so-called groupby operation. The name "group by" comes
from a command in the SQL database language, but it is perhaps more illuminative to think of it in the terms
first coined by Hadley Wickham of Rstats fame: split, apply, combine.

In [1]:

import numpy as np
import pandas as pd

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 1/11

2/24/2020 3.3_group_by - Jupyter Notebook

In [8]:

df = [Link]({'key': ['A', 'B', 'C', 'A', 'B', 'C','B'],

'data': range(7)}, columns=['key', 'data'])
df

Out[8]:

key data

0 A 0

1 B 1

2 C 2

3 A 3

4 B 4

5 C 5

6 B 6

In [4]:

[Link]('key')

Out[4]:

<[Link] object at 0x0000010CC3FF41C8>

Notice that what is returned is not a set of DataFrames, but a DataFrameGroupBy object. This object is where
the magic is: you can think of it as a special view of the DataFrame, which is poised to dig into the groups but
does no actual computation until the aggregation is applied.

In [9]:

[Link]('key').sum()

Out[9]:

data

key

A 3

B 11

C 7

In [13]:

for (key, group) in [Link]('key'):

print("{0:20s} shape={1}".format(key, [Link]))

A shape=(2, 2)
B shape=(3, 2)
C shape=(2, 2)

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 2/11

2/24/2020 3.3_group_by - Jupyter Notebook

In [15]:

nyc_flight = pd.read_csv('data/nyc_flights_2013.csv')

In [27]:

nyc_flight.head(3)

Out[27]:

year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin

0 2013 1 1 517.0 2.0 830.0 11.0 UA N14228 1545 EWR

1 2013 1 1 533.0 4.0 850.0 20.0 UA N24211 1714 LGA

2 2013 1 1 542.0 2.0 923.0 33.0 AA N619AA 1141 JFK

SELECT carrier, count(*)

FROM df
GROUP BY carrier

In [26]:

n_by_flight = nyc_flight.groupby("carrier")["carrier"].count()
n_by_flight

Out[26]:

carrier
9E 2982
AA 5094
AS 115
B6 8144
DL 7160
EV 8298
F9 107
FL 527
HA 50
MQ 4137
OO 1
UA 8917
US 3152
VX 716
WN 1915
YV 103
Name: carrier, dtype: int64

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 3/11

2/24/2020 3.3_group_by - Jupyter Notebook

In [3]:

data = pd.read_csv('data/phone_data.csv')
[Link](3)

Out[3]:

index date duration item month network network_type

0 0 15/10/14 06:58 34.429 data 2014-11 data data

1 1 15/10/14 06:58 13.000 call 2014-11 Vodafone mobile

2 2 15/10/14 14:46 23.000 call 2014-11 Meteor mobile

In [30]:

[Link]()

<class '[Link]'>
RangeIndex: 830 entries, 0 to 829
Data columns (total 7 columns):
index 830 non-null int64
date 830 non-null object
duration 830 non-null float64
item 830 non-null object
month 830 non-null object
network 830 non-null object
network_type 830 non-null object
dtypes: float64(1), int64(1), object(5)
memory usage: 45.5+ KB

In [4]:

import dateutil

# Convert date from string to date times

#data['date'] = data['date'].apply([Link], dayfirst=True)
data['date'] = pd.to_datetime(data['date'])

In [33]:

[Link]()

<class '[Link]'>
RangeIndex: 830 entries, 0 to 829
Data columns (total 7 columns):
index 830 non-null int64
date 830 non-null datetime64[ns]
duration 830 non-null float64
item 830 non-null object
month 830 non-null object
network 830 non-null object
network_type 830 non-null object
dtypes: datetime64[ns](1), float64(1), int64(1), object(4)
memory usage: 45.5+ KB

1. How many rows the dataset has?

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 4/11

2/24/2020 3.3_group_by - Jupyter Notebook

In [35]:

data['item'].count()

Out[35]:

830

2. What was the longest phone call / data entry?

In [36]:

data['duration'].max()

Out[36]:

10528.0

3. How many seconds of phone calls are recorded in total?

In [7]:

data[data['item']=='call']['duration'].sum()

Out[7]:

92321.0

4. How many entries are there for each month?

In [38]:

data['month'].value_counts()

Out[38]:

2014-11 230
2015-01 205
2014-12 157
2015-02 137
2015-03 101
Name: month, dtype: int64

5. Number of non-null unique network entries

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 5/11

2/24/2020 3.3_group_by - Jupyter Notebook

In [39]:

data['network'].nunique()

Out[39]:

6. Get the first entry for each month

In [40]:

[Link]('month').first()

Out[40]:

index date duration item network network_type

month

2014-11 0 2014-10-15 [Link] 34.429 data data data

2014-12 228 2014-11-13 [Link] 34.429 data data data

2015-01 381 2014-12-13 [Link] 34.429 data data data

2015-02 577 2015-01-13 [Link] 34.429 data data data

2015-03 729 2015-02-12 [Link] 69.000 call landline landline

7. Get the sum of the durations per month

In [41]:

[Link]('month')['duration'].sum()

Out[41]:

month
2014-11 26639.441
2014-12 14641.870
2015-01 18223.299
2015-02 15522.299
2015-03 22750.441
Name: duration, dtype: float64

8. Get the number of dates / entries in each month

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 6/11

2/24/2020 3.3_group_by - Jupyter Notebook

In [42]:

[Link]('month')['date'].count()

Out[42]:

month
2014-11 230
2014-12 157
2015-01 205
2015-02 137
2015-03 101
Name: date, dtype: int64

9. What is the sum of durations, for calls only, to each network

In [43]:

data[data['item'] == 'call'].groupby('network')['duration'].sum()

Out[43]:

network
Meteor 7200.0
Tesco 13828.0
Three 36464.0
Vodafone 14621.0
landline 18433.0
voicemail 1775.0
Name: duration, dtype: float64

10. How many calls, sms, and data entries are in each month?

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 7/11

2/24/2020 3.3_group_by - Jupyter Notebook

In [44]:

[Link](['month', 'item'])['date'].count()

Out[44]:

month item
2014-11 call 107
data 29
sms 94
2014-12 call 79
data 30
sms 48
2015-01 call 88
data 31
sms 86
2015-02 call 67
data 31
sms 39
2015-03 call 47
data 29
sms 25
Name: date, dtype: int64

11. How many calls, texts, and data are sent per month, split by network_type?

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 8/11

2/24/2020 3.3_group_by - Jupyter Notebook

In [45]:

[Link](['month', 'network_type'])['date'].count()

Out[45]:

month network_type
2014-11 data 29
landline 5
mobile 189
special 1
voicemail 6
2014-12 data 30
landline 7
mobile 108
voicemail 8
world 4
2015-01 data 31
landline 11
mobile 160
voicemail 3
2015-02 data 31
landline 8
mobile 90
special 2
voicemail 6
2015-03 data 29
landline 11
mobile 54
voicemail 4
world 3
Name: date, dtype: int64

In [48]:

# produces Pandas Series

#[Link]('month')['duration'].sum()
# Produces Pandas DataFrame
[Link]('month')[['duration']].sum()

Out[48]:

duration

month

2014-11 26639.441

2014-12 14641.870

2015-01 18223.299

2015-02 15522.299

2015-03 22750.441

In [1]:

import pandas as pd

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 9/11

2/24/2020 3.3_group_by - Jupyter Notebook

In [6]:

name = [Link](['ram','shyam','kiran','rishi'])

In [7]:

name

Out[7]:

0 ram
1 shyam
2 kiran
3 rishi
dtype: object

In [9]:

mark1 = [Link]([45, 67, 32, 65])

In [10]:

mark2 = [Link]([77, 34, 72, 55])

In [13]:

df=[Link]({name, mark1, mark2})

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-13-f00c768bf164> in <module>
----> 1 df=[Link]({name, mark1, mark2})

~\Anaconda3\lib\site-packages\pandas\core\[Link] in __hash__(self)
1884 raise TypeError(
1885 "{0!r} objects are mutable, thus they cannot be"
-> 1886 " hashed".format(self.__class__.__name__)
1887 )
1888

TypeError: 'Series' objects are mutable, thus they cannot be hashed

In [12]:

Out[12]:

0 1 2 3

0 ram shyam kiran rishi

1 45 67 32 65

2 77 34 72 55

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 10/11

2/24/2020 3.3_group_by - Jupyter Notebook

In [ ]:

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 11/11

Unit Iv
No ratings yet
Unit Iv
63 pages
Python For Data Science - Unit 6 - Week 3
No ratings yet
Python For Data Science - Unit 6 - Week 3
5 pages
Mastering Pandas for Data Science Interviews
No ratings yet
Mastering Pandas for Data Science Interviews
11 pages
L80a AokUS tNHUp7zlShDF4aovfO1
No ratings yet
L80a AokUS tNHUp7zlShDF4aovfO1
15 pages
Data Aggregation and Group Operations
No ratings yet
Data Aggregation and Group Operations
34 pages
Cheat Sheet: Python For Data Science
No ratings yet
Cheat Sheet: Python For Data Science
4 pages
Cheat Sheet: Python For Data Science
No ratings yet
Cheat Sheet: Python For Data Science
4 pages
Exp - 1 - Introduction To Data Analytics and Python Fundamentals - SDK - Ok
No ratings yet
Exp - 1 - Introduction To Data Analytics and Python Fundamentals - SDK - Ok
9 pages
Python For Data Science - Unit 6 - Week 3
No ratings yet
Python For Data Science - Unit 6 - Week 3
5 pages
Data Analysis and Visualization Exam Guide
No ratings yet
Data Analysis and Visualization Exam Guide
12 pages
Pandas & PyNumS Essentials
No ratings yet
Pandas & PyNumS Essentials
10 pages
Understanding GroupBy in Pandas
No ratings yet
Understanding GroupBy in Pandas
32 pages
Python Data Structures and Libraries Guide
No ratings yet
Python Data Structures and Libraries Guide
7 pages
Pandas GroupBy for Data Aggregation
No ratings yet
Pandas GroupBy for Data Aggregation
49 pages
1
No ratings yet
1
83 pages
Informatic Practices HHW
No ratings yet
Informatic Practices HHW
21 pages
10 Minutes To Pandas - Pandas 2.1.1 Documentation
No ratings yet
10 Minutes To Pandas - Pandas 2.1.1 Documentation
24 pages
Pandas & Numpy
No ratings yet
Pandas & Numpy
32 pages
Lec 05-DSFa23
No ratings yet
Lec 05-DSFa23
65 pages
NumPy and Pandas: Essential Python Libraries
No ratings yet
NumPy and Pandas: Essential Python Libraries
72 pages
Data Aggregation and Group Operations
No ratings yet
Data Aggregation and Group Operations
70 pages
Python For Data Analysis Jan 28
No ratings yet
Python For Data Analysis Jan 28
105 pages
Q-Step WS 06112019 Data Analysis and Visualisation With Python
No ratings yet
Q-Step WS 06112019 Data Analysis and Visualisation With Python
76 pages
Pandas Guide for Beginners
No ratings yet
Pandas Guide for Beginners
18 pages
Numpy Data Analysis and Visualisation With Python
No ratings yet
Numpy Data Analysis and Visualisation With Python
75 pages
Data Wrangling With Python and Pandas
No ratings yet
Data Wrangling With Python and Pandas
7 pages
Chai Time Data Science Lab 7 Guide
No ratings yet
Chai Time Data Science Lab 7 Guide
5 pages
Pandas DataFrame Cheat Sheet Guide
No ratings yet
Pandas DataFrame Cheat Sheet Guide
10 pages
Python Data Analysis Guide
No ratings yet
Python Data Analysis Guide
75 pages
Python CSBS Bhavya Lab Manual
No ratings yet
Python CSBS Bhavya Lab Manual
14 pages
Python Data Libraries Guide
No ratings yet
Python Data Libraries Guide
53 pages
100 Pandas Puzzles
No ratings yet
100 Pandas Puzzles
20 pages
Python Cheat Sheet: Pandas - Numpy - Sklearn Matplotlib - Seaborn BS4 - Selenium - Scrapy
100% (4)
Python Cheat Sheet: Pandas - Numpy - Sklearn Matplotlib - Seaborn BS4 - Selenium - Scrapy
11 pages
Python Basics for Calculations
No ratings yet
Python Basics for Calculations
80 pages
FDS Lab Manual-1
No ratings yet
FDS Lab Manual-1
51 pages
Informatic Practices HHW
No ratings yet
Informatic Practices HHW
59 pages
Data Structures For Statistical Computing in Python
No ratings yet
Data Structures For Statistical Computing in Python
6 pages
Pandas DataFrame Notes
No ratings yet
Pandas DataFrame Notes
10 pages
Python Data Analysis Tutorial
No ratings yet
Python Data Analysis Tutorial
47 pages
Pandas DataFrame Notes
No ratings yet
Pandas DataFrame Notes
13 pages
Ip Practice Test (14in)
No ratings yet
Ip Practice Test (14in)
9 pages
Lab Session 07: Perform Following Operations Using Pandas
No ratings yet
Lab Session 07: Perform Following Operations Using Pandas
4 pages
Pandas Guide for Data Analysts
No ratings yet
Pandas Guide for Data Analysts
9 pages
Pandas DataFrame Cheat Sheet
No ratings yet
Pandas DataFrame Cheat Sheet
4 pages
Pandas DataFrame Cheat Sheet
100% (1)
Pandas DataFrame Cheat Sheet
10 pages
Even Students
No ratings yet
Even Students
36 pages
Important Questions With Solutions IP
No ratings yet
Important Questions With Solutions IP
5 pages
AI & Data Science Lab Record
No ratings yet
AI & Data Science Lab Record
28 pages
Python Cheat Sheet 2.0
100% (2)
Python Cheat Sheet 2.0
10 pages
Cheat Sheet: The Pandas Dataframe Object: Preliminaries Get Your Data Into A Dataframe
100% (1)
Cheat Sheet: The Pandas Dataframe Object: Preliminaries Get Your Data Into A Dataframe
12 pages
Unit 1 Python Programming-Ii
No ratings yet
Unit 1 Python Programming-Ii
15 pages
Data Analysis Tools
No ratings yet
Data Analysis Tools
26 pages
Pandas For Machine Learning: Acadview
No ratings yet
Pandas For Machine Learning: Acadview
18 pages
Pandas DataFrame Notes
100% (1)
Pandas DataFrame Notes
10 pages
4th ETE
No ratings yet
4th ETE
4 pages
4th ME
No ratings yet
4th ME
4 pages
Mobile Communication Systems: Part II-Part II
No ratings yet
Mobile Communication Systems: Part II-Part II
79 pages
Chi-Square Test
No ratings yet
Chi-Square Test
6 pages
Large Language Models For Propaganda Detection: University of Zurich University of Zurich University of Zurich
No ratings yet
Large Language Models For Propaganda Detection: University of Zurich University of Zurich University of Zurich
7 pages
Pollution and Congestion in Urban Areas
No ratings yet
Pollution and Congestion in Urban Areas
19 pages
Python Filtering
No ratings yet
Python Filtering
7 pages
Merge Append Python Code
No ratings yet
Merge Append Python Code
5 pages
Hands On Practical Examples On Sequential Feature Selection in Python
No ratings yet
Hands On Practical Examples On Sequential Feature Selection in Python
26 pages
Covariance Matrix
No ratings yet
Covariance Matrix
6 pages
Introduction to Patanjali's Yoga Sutra
No ratings yet
Introduction to Patanjali's Yoga Sutra
4 pages
Educ 535 Unit Plan Condensed
No ratings yet
Educ 535 Unit Plan Condensed
27 pages
Chapter 1
No ratings yet
Chapter 1
9 pages
"Vailati Masterclass: 3-Step Dental Technique"
No ratings yet
"Vailati Masterclass: 3-Step Dental Technique"
13 pages
PSY 154 History of Experimental Psychology
No ratings yet
PSY 154 History of Experimental Psychology
45 pages
Origen Crouzel Henri
100% (2)
Origen Crouzel Henri
304 pages
Qumica General Orgnica y Biolgica Full Download
No ratings yet
Qumica General Orgnica y Biolgica Full Download
407 pages
Revised Resume
No ratings yet
Revised Resume
1 page
Percubaan Sem 2 2024 Terengganu
No ratings yet
Percubaan Sem 2 2024 Terengganu
27 pages
Assignment-1 Reading Skills
No ratings yet
Assignment-1 Reading Skills
2 pages
Haircut and Dresscode Policy
No ratings yet
Haircut and Dresscode Policy
1 page
Edci 311 Notes Recent (1) - 1
0% (1)
Edci 311 Notes Recent (1) - 1
31 pages
(2025-26) Syllabus For Term - 1 (Grade 5)
No ratings yet
(2025-26) Syllabus For Term - 1 (Grade 5)
2 pages
Sem 2 Applied Nutrition and Dietetics
No ratings yet
Sem 2 Applied Nutrition and Dietetics
5 pages
Civil Aviation School and College, Tejgaon: Sl. Student Name User Id & Birth Reg. No. Father & Mother Mobile Quotarank
No ratings yet
Civil Aviation School and College, Tejgaon: Sl. Student Name User Id & Birth Reg. No. Father & Mother Mobile Quotarank
59 pages
Case Study On Aussie Pooch Mobile (APM)
No ratings yet
Case Study On Aussie Pooch Mobile (APM)
13 pages
BÀI TẬP VIẾT LẠI CÂU Because -Although
No ratings yet
BÀI TẬP VIẾT LẠI CÂU Because -Although
3 pages
Early Childhood Educators' Well-Being, Work Environments and Quality': Possibilities For Changing Policy and Practice
No ratings yet
Early Childhood Educators' Well-Being, Work Environments and Quality': Possibilities For Changing Policy and Practice
16 pages
Bai Tap Chuyen Sau Anh 8 Global Unit 3 TEENAGERS
No ratings yet
Bai Tap Chuyen Sau Anh 8 Global Unit 3 TEENAGERS
14 pages
Culture and Psychology 7th Edition PDF
0% (1)
Culture and Psychology 7th Edition PDF
25 pages
NSTP CWTS - Lesson 1
0% (1)
NSTP CWTS - Lesson 1
4 pages
4.engg Graphics I
No ratings yet
4.engg Graphics I
13 pages
A Framing Analysis and Model of Barack Obama in Political Cartoons (Dissertation)
No ratings yet
A Framing Analysis and Model of Barack Obama in Political Cartoons (Dissertation)
213 pages
Teachers' Questioning Style and Learning Behavior of Grade 12 Senior High School Students During Online Class in STI College Tagum
No ratings yet
Teachers' Questioning Style and Learning Behavior of Grade 12 Senior High School Students During Online Class in STI College Tagum
27 pages
Understanding the Scientific Method
No ratings yet
Understanding the Scientific Method
26 pages
ML Unit 2
No ratings yet
ML Unit 2
16 pages
2026 Junior Babel Academy Brochure - Compressed
No ratings yet
2026 Junior Babel Academy Brochure - Compressed
16 pages
Defining Ethics: Expected Outputs
No ratings yet
Defining Ethics: Expected Outputs
16 pages
Sources of Ethical Norms
90% (10)
Sources of Ethical Norms
2 pages
SEAM 2 (Revised)
67% (3)
SEAM 2 (Revised)
11 pages

Group - by Python Code

Uploaded by

Group - by Python Code

Uploaded by

2/24/2020 3.

3_group_by - Jupyter Notebook

Group By: Split, Apply, Combine

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 1/11

df = [Link]({'key': ['A', 'B', 'C', 'A', 'B', 'C','B'],

<[Link] object at 0x0000010CC3FF41C8>

for (key, group) in [Link]('key'):

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 2/11

0 2013 1 1 517.0 2.0 830.0 11.0 UA N14228 1545 EWR

1 2013 1 1 533.0 4.0 850.0 20.0 UA N24211 1714 LGA

2 2013 1 1 542.0 2.0 923.0 33.0 AA N619AA 1141 JFK

SELECT carrier, count(*)

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 3/11

index date duration item month network network_type

0 0 15/10/14 06:58 34.429 data 2014-11 data data

1 1 15/10/14 06:58 13.000 call 2014-11 Vodafone mobile

2 2 15/10/14 14:46 23.000 call 2014-11 Meteor mobile

# Convert date from string to date times

1. How many rows the dataset has?

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 4/11

2. What was the longest phone call / data entry?

3. How many seconds of phone calls are recorded in total?

4. How many entries are there for each month?

5. Number of non-null unique network entries

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 5/11

6. Get the first entry for each month

index date duration item network network_type

2014-11 0 2014-10-15 [Link] 34.429 data data data

2014-12 228 2014-11-13 [Link] 34.429 data data data

2015-01 381 2014-12-13 [Link] 34.429 data data data

2015-02 577 2015-01-13 [Link] 34.429 data data data

2015-03 729 2015-02-12 [Link] 69.000 call landline landline

7. Get the sum of the durations per month

8. Get the number of dates / entries in each month

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 6/11

9. What is the sum of durations, for calls only, to each network

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 7/11

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 8/11

# produces Pandas Series

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 9/11

mark1 = [Link]([45, 67, 32, 65])

mark2 = [Link]([77, 34, 72, 55])

df=[Link]({name, mark1, mark2})

TypeError: 'Series' objects are mutable, thus they cannot be hashed

0 ram shyam kiran rishi

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 10/11

localhost:8888/notebooks/Machine Learning/Python/3.3_group_by.ipynb 11/11

You might also like