@PowerBI - Ir - SQL Vs Python Data Analysis

Uploaded by

mhpmbok

SQL DATA ANALYSIS

VS
PYTHON DATA ANALYSIS

What is SQL?
SQL (Structured Query Language) is a standardized language used to
store, retrieve, and manipulate data in relational databases like
MySQL, PostgreSQL, SQL Server, and SQLite.
Key Uses in Data Analysis:
• Extracting specific data using SELECT, WHERE, and JOIN
• Summarizing data with GROUP BY, COUNT, AVG, etc.
• Filtering and sorting records from large datasets

What is Python?
Python is a general-purpose, high-level programming language
known for its simplicity and powerful libraries used in data science,
machine learning, and automation.
Key Uses in Data Analysis:
• Reading and cleaning datasets (using Pandas)
• Performing statistical calculations (using NumPy)
• Creating charts and graphs (using Matplotlib, Seaborn)
• Building machine learning models (using Scikit-learn,
TensorFlow)
SQL vs Python for Data Analysis:

| Feature/Aspect | SQL (Structured Query Language) | Python (Programming Language) |
|---|---|---|
| 1. Primary Use | Querying and managing data in relational databases | Data manipulation, analysis, visualization, automation |
| 2. Data Storage | Works directly with databases (MySQL, PostgreSQL, SQLite) | Imports data from files or databases for in-memory analysis |
| 3. Type of Language | Declarative (describe what you want) | Procedural/Imperative (describe how to do it) |
| 4. Output Type | Tabular data (rows and columns) | Tables, charts, statistical outputs, models |
| 5. Key Libraries | None needed; the query language is built into the database engine | Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn |
| 6. Best At | Data extraction, filtering, joining, summarization | Data cleaning, transformation, advanced analysis, ML |
| 7. Loops & Logic | Limited (CASE expressions; loops only through procedural extensions such as PL/pgSQL or T-SQL) | Full support for conditions, loops, and functions |
| 8. Visualization | Not supported; query results must be charted in another tool | Strong support (Matplotlib, Seaborn, Plotly, etc.) |
| 9. Machine Learning | Not part of standard SQL (some database products add ML extensions) | Fully supported via libraries (e.g., Scikit-learn, TensorFlow) |
| 10. Learning Curve | Easier for beginners (simple queries) | Steeper learning curve, but more flexible |
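
The two tools can also be combined: SQL does the extraction and summarization inside the database, and pandas receives the result as a DataFrame. A minimal sketch, assuming an in-memory SQLite database and a few hypothetical rows in a dataset_1 table (the values are made up for illustration, not taken from the real dataset), using pandas.read_sql_query:

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows; the real dataset_1 table is much larger.
rows = [
    ("Sunny", 80, "Home"),
    ("Sunny", 55, "Work"),
    ("Rainy", 55, "Home"),
    ("Snowy", 30, "Work"),
]

# An in-memory SQLite database stands in for MySQL, PostgreSQL or SQL Server here.
con = sqlite3.connect(":memory:")
con.execute(
    "create table dataset_1 (weather text, temperature integer, destination text)"
)
con.executemany("insert into dataset_1 values (?, ?, ?)", rows)

# SQL aggregates inside the database ...
query = """
    select weather, avg(temperature) as avg_temp
    from dataset_1
    group by weather
    order by weather
"""
result = pd.read_sql_query(query, con)
con.close()

# ... and pandas receives the result as a DataFrame for further analysis.
print(result)
```

With a server database the connection object would come from that database's driver or from SQLAlchemy instead of sqlite3.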
SQL Concepts Used:
• SELECT (choose columns)
• FROM (choose table)
• WHERE (filter rows)
• LIMIT (restrict number of rows)
• DISTINCT (unique values)
• ORDER BY (sorting)
• AS (aliasing columns or tables)
• GROUP BY (grouping rows)
• Aggregate functions: AVG(), COUNT(), SUM(), MIN(), MAX()
• HAVING (filter groups after aggregation)
• LIKE (pattern matching)
• BETWEEN (range filtering)
• IN (multiple value filtering)

Python Concepts Used:

• DataFrame size and shape: len(), .shape
• Column selection: df[['col1', 'col2']], .unique()
• Row selection / filtering: Boolean indexing with .str.startswith(),
&, .isin()
• Value assignment: df['col'] = value
• Sorting: .sort_values()
• Renaming: .rename()
• Grouping and aggregation: .groupby(), .mean(), .sum(), .min(),
.max(), .size(), .nunique()
• Filtering grouped data: .filter()
• DataFrame/Series transformation: .to_frame(), .reset_index()

COUPON RECOMMENDATION ANALYSIS

-:SQL:-
➢ “Select all columns and all rows from the table named
dataset_1.”
select * from dataset_1;

-: PYTHON :-
➢ Load the dataset from a CSV file into a pandas DataFrame so the
same analysis can be done in Python; displaying the whole DataFrame
is the counterpart of select *.
import pandas as pd
sql = pd.read_csv(r"C:\ …….\data.csv")
sql
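
To experiment without the original file, read_csv also accepts any file-like object. A minimal sketch, assuming a few hypothetical rows that reuse the column names from this document (including passanger as spelled throughout); it also shows .shape and len(), the size checks listed under Python Concepts Used:

```python
from io import StringIO

import pandas as pd

# Hypothetical rows standing in for data.csv; only the column names follow this document.
csv_text = """weather,temperature,passanger,destination,coupon,occupation
Sunny,55,Alone,No Urgent Place,Coffee House,Student
Sunny,80,Friend(s),No Urgent Place,Bar,Management
Rainy,55,Alone,Home,Restaurant(<20),Student
Snowy,30,Kid(s),Work,Carry out & Take away,Sales & Related
"""

df = pd.read_csv(StringIO(csv_text))

print(df.shape)   # (4, 6): rows, columns
print(len(df))    # 4: number of rows, like SELECT COUNT(*)
print(df.dtypes)  # temperature is parsed as an integer column
```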

-:SQL:-
➢ "Fetch the columns weather and temperature from all the rows in the
table named dataset_1."
select weather, temperature from dataset_1;

-: PYTHON :-
➢ Select the columns weather and temperature from the
DataFrame df.
df=sql
df[['weather','temperature']]

-:SQL:-
➢ "Select all columns from the first 10 rows of the table dataset_1."
select * from dataset_1 limit 10;

-: PYTHON :-
➢ "Display the first 10 rows of the DataFrame df."
df.head(10)
-:SQL:-
➢ "Return all unique (non-repeating) values from the passanger column
in the dataset_1 table."
select distinct passanger from dataset_1;

-: PYTHON :-
➢ "Return an array of all unique (non-repeating) values in the
passanger column of the DataFrame df."
df.passanger.unique()
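
.unique() returns a NumPy array, while SELECT DISTINCT returns a table. A minimal sketch on a hypothetical column with one missing value, contrasting .unique(), .drop_duplicates() (which keeps the DataFrame shape) and .nunique() (which skips NaN by default):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing passenger value.
df = pd.DataFrame({"passanger": ["Alone", "Friend(s)", "Alone", np.nan, "Kid(s)"]})

# Array of distinct values in order of first appearance; NaN is kept.
values = df["passanger"].unique()

# DataFrame of distinct rows, closer to the shape of a SELECT DISTINCT result.
distinct_df = df[["passanger"]].drop_duplicates()

# Number of distinct non-missing values.
n_distinct = df["passanger"].nunique()

print(values)
print(distinct_df)
print(n_distinct)
```

Like SELECT DISTINCT, which returns a single NULL row when NULLs are present, .unique() and .drop_duplicates() keep one NaN entry.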

-:SQL:-
➢ "Retrieve all rows from the table dataset_1 where the value in
the destination column is exactly 'Home'."
select * from dataset_1 where destination='Home';

-: PYTHON :-
➢ Show only the rows where destination is exactly 'Home', without
changing the data.
df[df['destination'] == 'Home']
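
In pandas a single = assigns (df.destination = "Home" would overwrite every value in the column), while == compares and produces a boolean mask. A minimal sketch on hypothetical rows, showing the mask filter, the equivalent DataFrame.query call, and an AND condition built with &:

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "destination": ["Home", "Work", "No Urgent Place", "Home"],
    "temperature": [55, 80, 30, 80],
})

# WHERE destination = 'Home': build a boolean mask with ==, then index with it.
home_rows = df[df["destination"] == "Home"]

# The same filter written with DataFrame.query, which reads closer to SQL.
home_rows_q = df.query("destination == 'Home'")

# AND / OR become & / |; each condition needs its own parentheses.
hot_home = df[(df["destination"] == "Home") & (df["temperature"] > 60)]

print(home_rows)
print(hot_home)
```

Filtering returns a new DataFrame; df itself keeps all four rows.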
-:SQL:-
➢ "Select all rows and columns from dataset_1 and sort the result
by the coupon column in ascending order."
select * from dataset_1 order by coupon;

-: PYTHON :-
➢ "Sort the entire DataFrame df by the values in the coupon
column in ascending order."
df.sort_values("coupon")
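
ORDER BY ... DESC and multi-column sorts map onto the ascending argument. A minimal sketch on hypothetical rows with one missing coupon value. pandas puts missing values last by default, whereas SQL engines differ (SQLite and MySQL place NULLs first in an ascending sort, PostgreSQL places them last), so na_position may be needed to match a given database:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing coupon value.
df = pd.DataFrame({
    "coupon": ["Bar", np.nan, "Coffee House", "Bar"],
    "temperature": [55, 80, 30, 80],
})

# ORDER BY coupon DESC
desc = df.sort_values("coupon", ascending=False)

# ORDER BY coupon ASC, temperature DESC: one ascending flag per sort key.
multi = df.sort_values(["coupon", "temperature"], ascending=[True, False])

# Move missing values to the front instead of the default end position.
nan_first = df.sort_values("coupon", na_position="first")

print(desc)
print(multi)
print(nan_first)
```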
-:SQL:-
➢ "Select the destination column from the table dataset_1, and
rename (alias) it as Destination."
select destination as Destination from dataset_1;

-: PYTHON :-
➢ Rename the column destination → Destination (with a capital D).
➢ Unlike the SQL alias, which only renames the column in the query
result, this change is permanent in the DataFrame df because
inplace=True is used.
df.rename(columns={'destination':'Destination'},inplace=True)
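
To mirror SQL's AS more closely, rename can be applied to a selection without inplace=True; it then returns a new DataFrame and leaves the original untouched. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({"destination": ["Home", "Work"], "weather": ["Sunny", "Rainy"]})

# SELECT destination AS Destination: select the column, rename it in the result only.
result = df[["destination"]].rename(columns={"destination": "Destination"})

print(result.columns.tolist())  # ['Destination']
print(df.columns.tolist())      # ['destination', 'weather'] (df is unchanged)
```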
-:SQL:-
➢ "Return the unique values of the occupation column from the
dataset_1 table."
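select occupation from dataset_1 group by occupation;

-: PYTHON :-
➢ "Group the DataFrame by the occupation column, count the number
of rows in each group, and return a DataFrame with occupation and
the corresponding counts in a column named Count."
➢ This goes one step further than the SQL query above, which returns
only the distinct occupations; the matching SQL query would add
COUNT(*) as Count to the select list.
df.groupby('occupation').size().to_frame('Count').reset_index()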
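
value_counts() is a common shortcut for the same per-group row counts. A minimal sketch on hypothetical rows; note that groupby sorts the result by the group key, while value_counts sorts by frequency, largest first:

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "occupation": ["Student", "Management", "Student", "Sales & Related", "Student"],
})

# GROUP BY occupation with COUNT(*) AS Count, ordered by occupation.
by_group = df.groupby("occupation").size().to_frame("Count").reset_index()

# Same counts, ordered by frequency instead of by key.
by_freq = df["occupation"].value_counts()

print(by_group)
print(by_freq)
```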

-:SQL:-
➢ "For each unique weather type in the table dataset_1, calculate the
average (AVG) temperature, and return the weather type along with its
average temperature, aliased as avg_temp."
select weather, AVG(temperature) as avg_temp from dataset_1
group by weather;

-: PYTHON :-
➢ "Group the DataFrame df by the weather column, calculate the
average (mean) temperature for each weather group, then convert the
result to a DataFrame with a column named avg_temp, and finally
reset the index so that weather becomes a regular column."
df.groupby('weather')['temperature'].mean().to_frame('avg_temp').reset_index()
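
The .to_frame(...).reset_index() pattern can also be replaced by grouping with as_index=False, which keeps weather as a regular column; the aggregated column is then renamed to the alias. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "weather": ["Sunny", "Sunny", "Rainy", "Snowy"],
    "temperature": [80, 55, 55, 30],
})

# as_index=False returns a DataFrame with weather as a column; rename plays the role of AS.
avg_temp = (
    df.groupby("weather", as_index=False)["temperature"]
    .mean()
    .rename(columns={"temperature": "avg_temp"})
)

print(avg_temp)
```

Like SQL's AVG, which ignores NULLs, .mean() skips NaN values by default.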

-:SQL:-
➢ "For each unique weather condition in dataset_1, count the number of
temperature entries and label that count as count_temp."
select weather, COUNT(temperature) as count_temp from dataset_1
group by weather;

-: PYTHON :-
➢ "Group the DataFrame df by the weather column, count the non-missing
temperature values in each group (like COUNT(temperature), .count()
skips missing values), convert the result to a DataFrame with the
column name count_temp, and reset the index so weather becomes a
column."
df.groupby('weather')['temperature'].count().to_frame('count_temp').reset_index()
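
The difference between .count() and .size() only shows up when the column has missing values. A minimal sketch on hypothetical rows with one NaN temperature: .count() corresponds to SQL COUNT(temperature), .size() to COUNT(*):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing temperature.
df = pd.DataFrame({
    "weather": ["Sunny", "Sunny", "Rainy", "Rainy"],
    "temperature": [80, np.nan, 55, 55],
})

grouped = df.groupby("weather")["temperature"]

# Skips NaN, like COUNT(temperature): Rainy=2, Sunny=1.
non_null = grouped.count()

# Counts every row, like COUNT(*): Rainy=2, Sunny=2.
all_rows = grouped.size()

print(non_null)
print(all_rows)
```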
-:SQL:-
➢ "For each unique weather condition in dataset_1, count the number of
distinct (unique) temperature values and name that count as
count_distinct_temp."
select weather, COUNT(DISTINCT temperature) as count_distinct_temp
from dataset_1 group by weather;

-: PYTHON :-
➢ "Group the DataFrame by weather, count the number of unique values
in the temperature column for each weather group, convert the result
to a DataFrame with the column name count_distinct_temp, and reset
the index to make weather a column."
df.groupby('weather')['temperature'].nunique().to_frame('count_distinct_temp').reset_index()
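
By default .nunique() ignores NaN, which matches COUNT(DISTINCT temperature) ignoring NULLs; passing dropna=False counts a missing value as one extra distinct value. A minimal sketch on hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: Sunny has a duplicated value and one missing value.
df = pd.DataFrame({
    "weather": ["Sunny", "Sunny", "Sunny", "Rainy"],
    "temperature": [80, 80, np.nan, 55],
})

g = df.groupby("weather")["temperature"]

# Like COUNT(DISTINCT temperature): Sunny=1, Rainy=1.
skip_nan = g.nunique()

# NaN counted as a distinct value: Sunny=2, Rainy=1.
with_nan = g.nunique(dropna=False)

print(skip_nan)
print(with_nan)
```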

-:SQL:-
➢ "For each unique weather condition in dataset_1, calculate the total
(sum) of the temperature values, and label this total as sum_temp."
select weather, SUM(temperature) as sum_temp from dataset_1 group by
weather;

-: PYTHON :-
➢ "Group the DataFrame df by the weather column, calculate the sum of
temperature values for each weather group, convert the result into a
DataFrame with the column name sum_temp, and reset the index so
weather becomes a column."
df.groupby('weather')['temperature'].sum().to_frame('sum_temp').reset_index()

-:SQL:-
➢ "For each unique weather condition in dataset_1, find the minimum
temperature and label it as min_temp."
select weather, MIN(temperature) as min_temp from dataset_1 group by
weather;

-: PYTHON :-
➢ "Group the DataFrame df by the weather column, find the minimum
temperature for each weather group, convert the result to a DataFrame
with column name min_temp, and reset the index to make weather a
regular column."
df.groupby('weather')['temperature'].min().to_frame('min_temp').reset_index()

-:SQL:-
➢ "For each unique weather condition in dataset_1, find the maximum
temperature and label it as max_temp."
select weather, MAX(temperature) as max_temp from dataset_1 group
by weather;

-: PYTHON :-
➢ "Group the DataFrame df by the weather column, find the maximum
temperature for each weather group, convert the result into a
DataFrame with the column name max_temp, and reset the index so
weather becomes a column."
df.groupby('weather')['temperature'].max().to_frame('max_temp').reset_index()
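
The separate aggregate queries above can also be written as one statement: select weather, AVG(temperature) as avg_temp, COUNT(temperature) as count_temp, SUM(temperature) as sum_temp, MIN(temperature) as min_temp, MAX(temperature) as max_temp from dataset_1 group by weather; A minimal pandas sketch of the same thing, assuming hypothetical rows and a pandas version with named aggregation (0.25 or later), where each keyword plays the role of an AS alias:

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "weather": ["Sunny", "Sunny", "Rainy", "Snowy"],
    "temperature": [80, 55, 55, 30],
})

# Named aggregation: output_column=(input_column, aggregate_function).
summary = (
    df.groupby("weather")
    .agg(
        avg_temp=("temperature", "mean"),
        count_temp=("temperature", "count"),
        sum_temp=("temperature", "sum"),
        min_temp=("temperature", "min"),
        max_temp=("temperature", "max"),
    )
    .reset_index()
)

print(summary)
```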

-:SQL:-
➢ "Group the data by occupation and return only the group where
the occupation is 'Student'."
select occupation from dataset_1 group by occupation having
occupation='Student';

-: PYTHON :-
➢ "Group the DataFrame df by occupation, then filter to keep only
the group where the occupation is 'Student'. Finally, count the
number of rows in that group."
df.groupby('occupation').filter(lambda x: x['occupation'].iloc[0] == 'Student').groupby('occupation').size()
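
HAVING is most often used with an aggregate condition, such as select occupation, COUNT(*) as n from dataset_1 group by occupation having COUNT(*) >= 2; A minimal sketch on hypothetical rows: filtering the aggregated table corresponds to HAVING, while groupby().filter() keeps the original rows of every group that passes the test:

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "occupation": ["Student", "Management", "Student",
                   "Sales & Related", "Student", "Management"],
})

# GROUP BY occupation with COUNT(*) AS n ...
counts = df.groupby("occupation").size().to_frame("n").reset_index()

# ... HAVING COUNT(*) >= 2: filter the aggregated table, after grouping.
having = counts[counts["n"] >= 2]

# groupby().filter() returns the original rows of each qualifying group instead.
rows_in_big_groups = df.groupby("occupation").filter(lambda g: len(g) >= 2)

print(having)
print(rows_in_big_groups)
```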

-:SQL:-
➢ "Select all rows from dataset_1 where the weather column starts
with 'Sun'."
select * from dataset_1 where weather like 'Sun%';
-: PYTHON :-
➢ "Filter the DataFrame df to return only the rows where the
weather column starts with 'Sun'."
df[df['weather'].str.startswith('Sun')]
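
Two details can make LIKE 'Sun%' and .str.startswith('Sun') disagree. Case: SQLite's LIKE is case-insensitive for ASCII letters by default, MySQL's depends on the column collation (the default collations are case-insensitive), and PostgreSQL's LIKE is case-sensitive, whereas startswith is always case-sensitive. Missing values: without na=False the mask can contain NaN, which boolean indexing rejects. A minimal sketch on hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical values, including a lower-case variant and a missing value.
df = pd.DataFrame({"weather": ["Sunny", "sunny", "Rainy", np.nan]})

# Case-sensitive prefix match; na=False treats missing values as non-matches.
exact = df[df["weather"].str.startswith("Sun", na=False)]

# Case-insensitive variant: lower-case the column, then match a lower-case prefix.
any_case = df[df["weather"].str.lower().str.startswith("sun", na=False)]

print(exact)
print(any_case)
```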

-:SQL:-
➢ "Select all unique (DISTINCT) temperature values from
dataset_1 where the temperature is between 29 and 75
(inclusive)."
select distinct temperature from dataset_1 where temperature
between 29 and 75;

-: PYTHON :-
➢ "From the DataFrame df, filter rows where temperature is
between 29 and 75 (inclusive), then return only the unique
temperature values."
df[(df['temperature'] >= 29) & (df['temperature'] <= 75)]['temperature'].unique()
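
Series.between expresses the same inclusive range in one call, and ~ negates the mask for NOT BETWEEN. A minimal sketch on hypothetical temperatures:

```python
import pandas as pd

# Hypothetical sample values.
df = pd.DataFrame({"temperature": [30, 55, 80, 29, 75, 55]})

# BETWEEN 29 AND 75 is inclusive on both ends; Series.between is inclusive by default too.
in_range = df.loc[df["temperature"].between(29, 75), "temperature"].unique()

# NOT BETWEEN 29 AND 75: negate the boolean mask with ~.
outside = df[~df["temperature"].between(29, 75)]

print(in_range)
print(outside)
```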

-:SQL:-
➢ "Select the occupation column from dataset_1 where the
occupation is either 'Sales & Related' or 'Management'. "
select occupation from dataset_1 where occupation in ('Sales & Related', 'Management');

-: PYTHON :-
➢ "Filter the DataFrame df to return only the rows where the
occupation is either 'Sales & Related' or 'Management', and
show only the occupation column."
df[df['occupation'].isin(['Sales & Related', 'Management'])][['occupation']]
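
NOT IN is written by negating the .isin() mask with ~, but missing values behave differently. In SQL a NULL occupation fails both IN and NOT IN, because NULL NOT IN (...) evaluates to unknown; in pandas, ~isin() treats a missing value as "not in the list" and keeps it. A minimal sketch on hypothetical rows, with an extra .notna() condition to reproduce the SQL result:

```python
import pandas as pd

# Hypothetical sample rows, including one missing occupation.
df = pd.DataFrame({"occupation": ["Student", "Management", "Sales & Related", None]})
wanted = ["Sales & Related", "Management"]

# IN
in_rows = df[df["occupation"].isin(wanted)][["occupation"]]

# NOT IN with ~: the missing occupation is kept.
not_in_rows = df[~df["occupation"].isin(wanted)][["occupation"]]

# NOT IN as SQL evaluates it: also drop missing values.
sql_not_in = df[(~df["occupation"].isin(wanted)) & (df["occupation"].notna())][["occupation"]]

print(in_rows)
print(not_in_rows)
print(sql_not_in)
```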
