
Data Engineering: Statistical Analysis

This document discusses techniques for exploring and analyzing grouped data in Python with Pandas. Key topics include common statistical analysis methods such as describe(), mean(), corr(), and count(); grouping data; iterating through groups; and applying aggregations, transformations, and filters to grouped data to extract useful insights.

Uploaded by Sabrina Sibarani


IF2106 – Data Engineering

Data Exploration and Analysis (2)

• Statistical Analysis
• Data Grouping

Undergraduate

Computer Science

Overview

• Learn how to statistically analyze grouped data, iterate through groups, and apply aggregation, transformation, and filtration techniques

Objectives

Upon completion of this Unit, you are expected to be able to:

• Properly perform and practice data exploration and analysis techniques

Contents

a. Statistical Analysis

b. Data Grouping

Statistical Analysis

Data Analysis

• Pandas provides numerous methods for data analysis
• You can also define your own methods for specific statistical analyses
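As a minimal sketch of a user-defined statistic, the snippet below assumes pandas and a small hypothetical DataFrame (the `height`/`weight` columns and the `value_range` helper are illustrative, not from the slides). `DataFrame.apply` passes each column to the function as a Series, so the custom statistic yields one value per column.

```python
import pandas as pd

# Hypothetical example data (not from the original slides)
df = pd.DataFrame({
    "height": [150, 160, 170, 180],
    "weight": [52, 60, 65, 80],
})

def value_range(col: pd.Series) -> float:
    """Hypothetical custom statistic: largest value minus smallest value."""
    return col.max() - col.min()

# apply() with the default axis=0 calls value_range once per column
spread = df.apply(value_range)
print(spread)
```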

Statistical Analysis

• describe(): Returns summary statistics for the numerical columns
• mean(): Returns the mean of each numerical column
• corr(): Returns the pairwise correlation between columns in a data frame
• count(): Returns the number of non-null values in each data frame column
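The four methods above can be tried on a small hypothetical DataFrame; this is a sketch assuming pandas, with one missing value added on purpose so the effect of `NaN` on `mean()` and `count()` is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the NaN shows how missing values are handled
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, 180.0],
    "weight": [52.0, 60.0, np.nan, 80.0],
})

stats = df.describe()     # count, mean, std, min, quartiles, max per numeric column
means = df.mean()         # NaN is skipped by default (skipna=True)
corr_matrix = df.corr()   # pairwise Pearson correlation, ignoring rows with NaN per pair
non_null = df.count()     # number of non-null values in each column

print(stats, means, corr_matrix, non_null, sep="\n\n")
```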

Statistical Analysis (Cont.)

• The correlation coefficient is a measure of the degree to which the movements of two variables are associated
• The most common correlation coefficient, the Pearson correlation coefficient, measures the linear relationship between two variables
• However, when the relationship is nonlinear, the Pearson coefficient may not be a suitable measure of dependence
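To make "linear relationship" concrete, the sketch below computes the Pearson coefficient by hand, r = sum((x - mean_x) * (y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2)), and compares it with pandas' built-in `Series.corr`, which uses Pearson by default. The data values and the `pearson_r` helper are hypothetical.

```python
import numpy as np
import pandas as pd

def pearson_r(x: pd.Series, y: pd.Series) -> float:
    """Hypothetical helper: covariance of x and y divided by the product of their spreads."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 4.1, 5.9, 8.2, 9.9])  # hypothetical, roughly linear in x

manual = pearson_r(x, y)
builtin = x.corr(y)  # method="pearson" is the default
print(manual, builtin)
```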

Statistical Analysis (Cont.)

• The correlation coefficient ranges from -1.0 to 1.0
• In other words, its value cannot exceed 1.0 or fall below -1.0; a correlation of -1.0 indicates a perfect negative correlation, and a correlation of 1.0 indicates a perfect positive correlation
• The correlation coefficient is denoted as r
• If its value is greater than zero, the relationship is positive; if it is less than zero, the relationship is negative
• A value of zero indicates that there is no linear relationship between the two variables (they may still be related nonlinearly)
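The sketch below (assuming pandas; the values are hypothetical) shows the extremes of the range and the caveat about zero: an exact increasing line gives r = 1, an exact decreasing line gives r = -1, and y = x squared on symmetric x gives r = 0 even though y is completely determined by x.

```python
import pandas as pd

x = pd.Series([-2.0, -1.0, 0.0, 1.0, 2.0])

r_pos = x.corr(2 * x + 1)   # exact increasing linear relationship
r_neg = x.corr(-3 * x)      # exact decreasing linear relationship
r_quad = x.corr(x ** 2)     # y depends fully on x, but not linearly

print(round(r_pos, 6), round(r_neg, 6), round(r_quad, 6))
```

Floating-point rounding can make the perfect cases print as values like 0.9999999999999998, which is why the output is rounded.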

Statistical Analysis (Cont.)

• max(): Returns the highest value in each column
• min(): Returns the lowest value in each column
• median(): Returns the median of each column
• std(): Returns the standard deviation of each column
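A minimal sketch of these four methods on hypothetical data, assuming pandas. Note that `std()` computes the sample standard deviation (`ddof=1`) by default; passing `ddof=0` gives the population version.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, 180.0, 190.0],
    "weight": [52.0, 60.0, 65.0, 80.0, 93.0],
})

col_max = df.max()
col_min = df.min()
col_median = df.median()
col_std = df.std()             # sample standard deviation (divides by n - 1)
col_std_pop = df.std(ddof=0)   # population standard deviation (divides by n)

print(col_max, col_min, col_median, col_std, col_std_pop, sep="\n\n")
```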

Data Grouping

• You can split data into groups to perform more specific analysis on the data set
• Once the data is grouped, you can compute summary statistics (aggregation), perform group-specific operations (transformation), and discard data that does not meet certain conditions (filtration)
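As a sketch of the grouping step itself, assuming pandas and a hypothetical `team`/`year`/`points` table: `groupby()` returns a GroupBy object that records which rows belong to which group, and nothing is aggregated until a method is called on it.

```python
import pandas as pd

# Hypothetical match data (not from the original slides)
df = pd.DataFrame({
    "team":   ["A", "B", "A", "C", "B", "A"],
    "year":   [2020, 2020, 2021, 2021, 2021, 2022],
    "points": [10, 7, 12, 5, 9, 11],
})

grouped = df.groupby("team")   # split step: rows are assigned to groups by "team"
print(grouped.ngroups)         # number of distinct groups
print(grouped.size())          # number of rows in each group
print(grouped.groups)          # group label -> index labels of its rows
```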

Iterating Through Groups

• You can iterate through the groups of a grouped object, one group at a time
• You can also select a specific group using the get_group() method
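A minimal sketch, assuming pandas and the same kind of hypothetical team data: iterating over a GroupBy object yields `(label, sub-DataFrame)` pairs, and `get_group()` returns the rows of one group by its label.

```python
import pandas as pd

# Hypothetical match data
df = pd.DataFrame({
    "team":   ["A", "B", "A", "C", "B", "A"],
    "year":   [2020, 2020, 2021, 2021, 2021, 2022],
    "points": [10, 7, 12, 5, 9, 11],
})
grouped = df.groupby("team")

# Each iteration yields the group label and a DataFrame with that group's rows
totals = {}
for name, group in grouped:
    totals[name] = group["points"].sum()
    print(name, len(group))

# Select one group directly by its label
team_a = grouped.get_group("A")
print(team_a)
```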

Aggregations

• Aggregation functions return a single aggregated value for each group
• Once the groupby object is created, you can apply various functions to the grouped data
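A sketch of aggregation under the same hypothetical data, assuming pandas: `agg()` accepts a list of function names to compute several statistics per group, or a dict to choose a different function per column. Each group collapses to a single row.

```python
import pandas as pd

# Hypothetical match data
df = pd.DataFrame({
    "team":   ["A", "B", "A", "C", "B", "A"],
    "year":   [2020, 2020, 2021, 2021, 2021, 2022],
    "points": [10, 7, 12, 5, 9, 11],
})
grouped = df.groupby("team")

# Several statistics for one column: one row per group, one column per function
summary = grouped["points"].agg(["sum", "mean", "max"])

# A different aggregation for each column
per_col = grouped.agg({"points": "sum", "year": "min"})

print(summary, per_col, sep="\n\n")
```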

Transformations

• A transformation on a group or a column returns an object that is indexed the same way, and has the same size, as the object being grouped
• Thus, the function passed to transform should return a result that is the same size as the group chunk it receives (a scalar result is broadcast across the whole group)
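A minimal sketch of `transform()`, assuming pandas and hypothetical data. Unlike `agg()`, the result keeps one value per original row: a scalar such as the group sum is repeated for every row of that group, and a same-sized result such as "points minus the group mean" is aligned back to the original index.

```python
import pandas as pd

# Hypothetical match data
df = pd.DataFrame({
    "team":   ["A", "B", "A", "C", "B", "A"],
    "points": [10, 7, 12, 5, 9, 11],
})
grouped = df.groupby("team")

# Scalar per group, broadcast back to every row of that group
team_total = grouped["points"].transform("sum")

# Same-sized result per group: each row's distance from its team's mean
centered = grouped["points"].transform(lambda s: s - s.mean())

result = df.assign(team_total=team_total, centered=centered)
print(result)
```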
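
Filtration

• Pandas provides direct filtering of grouped data: the filter() method keeps or discards entire groups depending on whether a condition evaluated on each group is True or False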
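A sketch of group-level filtering, assuming pandas and hypothetical data: `filter()` calls the function once per group and keeps all rows of the groups for which it returns `True`, preserving the original row order. For comparison, ordinary boolean indexing filters individual rows instead of whole groups.

```python
import pandas as pd

# Hypothetical match data
df = pd.DataFrame({
    "team":   ["A", "B", "A", "C", "B", "A"],
    "points": [10, 7, 12, 5, 9, 11],
})

# Group-level: keep only teams that appear at least twice
frequent = df.groupby("team").filter(lambda g: len(g) >= 2)

# Row-level, for contrast: keep individual rows with more than 8 points
high_scores = df[df["points"] > 8]

print(frequent, high_scores, sep="\n\n")
```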

Summary

This Unit covered how to explore and analyze data in different collection structures. Here's a recap of what was covered in this Unit:

• How to apply statistical analysis to data, and how to group data in Python, iterate through groups, and apply aggregation, transformation, and filtration techniques
Discussion
