0% found this document useful (0 votes)
27 views20 pages

@PowerBI - Ir - SQL Vs Python Data Analysis

Uploaded by

mhpmbok
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
27 views20 pages

@PowerBI - Ir - SQL Vs Python Data Analysis

Uploaded by

mhpmbok
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 20

SQL DATA ANALYSIS

VS
PYTHON DATA ANALYSIS

What is SQL?
SQL (Structured Query Language) is a standardized language used to
store, retrieve, and manipulate data in relational databases like
MySQL, PostgreSQL, SQL Server, and SQLite.
Key Uses in Data Analysis:
• Extracting specific data using SELECT, WHERE, and JOIN
• Summarizing data with GROUP BY, COUNT, AVG, etc.
• Filtering and sorting records from large datasets.

What is Python?
Python is a general-purpose, high-level programming language
known for its simplicity and powerful libraries used in data science,
machine learning, and automation.
Key Uses in Data Analysis:
• Reading and cleaning datasets (using Pandas)
• Performing statistical calculations (using NumPy)
• Creating charts and graphs (using Matplotlib, Seaborn)
• Building machine learning models (using Scikit-learn,
TensorFlow)
SQL vs Python for Data Analysis :

SQL (Structured Python (Programming


Feature/Aspect Query Language) Language)

1. Primary Use Querying and Data manipulation,


managing data in analysis, visualization,
relational databases automation
2. Data Storage Works directly with Imports data from files or
databases (MySQL, databases for in-memory
PostgreSQL, analysis
SQLite)
3. Type of Declarative Procedural/Imperative
Language (describe what you (describe how to do it)
want)
4. Output Type Tabular data (rows Tables, charts, statistical
and columns) outputs, models
5. Key Libraries Native SQL Pandas, NumPy,
language Matplotlib, Seaborn,
Scikit-learn
6. Best At Data extraction, Data cleaning,
filtering, joining, transformation, advanced
summarization analysis, ML
7. Loops & Logic Limited support Full support for conditions,
loops, and functions
8. Visualization Not supported Strong support (Matplotlib,
Seaborn, Plotly, etc.)
9. Machine Not possible Fully supported via
Learning directly in SQL libraries (e.g., Scikit-learn,
TensorFlow)
10. Learning Easier for beginners Steeper learning curve but
Curve (simple queries) more flexible
SQL Concepts Used :
• SELECT (choose columns)
• FROM (choose table)
• WHERE (filter rows)
• LIMIT (restrict number of rows)
• DISTINCT (unique values)
• ORDER BY (sorting)
• AS (aliasing columns or tables)
• GROUP BY (grouping rows)
• Aggregate functions: AVG(), COUNT(), SUM(), MIN(), MAX()
• HAVING (filter groups after aggregation)
• LIKE (pattern matching)
• BETWEEN (range filtering)
• IN (multiple value filtering)

Python Concepts Used :


• DataFrame size and shape: len(), .shape
• Column selection: df[['col1', 'col2']], .unique()
• Row selection / filtering: Boolean indexing with .str.startswith(),
&, .isin()
• Value assignment: df['col'] = value
• Sorting: .sort_values()
• Renaming: .rename()
• Grouping and aggregation: .groupby(), .mean(), .sum(), .min(),
.max(), .size(), .nunique()
• Filtering grouped data: .filter()
• DataFrame/Series transformation: .to_frame(), .reset_index()

COUPON RECOMMENDATION ANALYSIS


-:SQL:-
➢ “Select all columns and all rows from the table named
dataset_1.”
select * from dataset_1;

-: PYTHON :-
➢ Importing a dataset from a CSV file into Python (as a
DataFrame) so you can perform data analysis using pandas,
similar to how you would in SQL.
import pandas as pd
sql=pd.read_csv(r"C:\ …….\data.csv”)
sql

-:SQL:-
➢ "Fetch the columns weather and temperature from all the rows in the
table named dataset_1."
select weather, temperature from dataset_1;

-: PYTHON :-
➢ Select the columns weather and temperature from the
DataFrame df.
df=sql
df[['weather','temperature']]

-:SQL:-
➢ "Select all columns from the first 10 rows of the table dataset_1."
select * from dataset_1 limit 10;

-: PYTHON :-
➢ "Display the first 10 rows of the DataFrame df."
df.head(10)
-:SQL:-
➢ "Return all unique (non-repeating) values from the passanger column
in the dataset_1 table."
select distinct passanger from dataset_1;

-: PYTHON :-
➢ "Return an array of all unique (non-repeating) values in the
passanger column of the DataFrame df."
df.passanger.unique()

-:SQL:-
➢ "Retrieve all rows from the table dataset_1 where the value in
the destination column is exactly 'Home'."
select * from dataset_1 where destination='Home';

-: PYTHON :-
➢ This shows only the rows where the destination is already
'Home', without changing the data.
df.destination="Home"
df
-:SQL:-
➢ "Select all rows and columns from dataset_1 and sort the result
by the coupon column in ascending order."
select * from dataset_1 order by coupon;

-: PYTHON :-
➢ "Sort the entire DataFrame df by the values in the coupon
column in ascending order."
df.sort_values("coupon")
-:SQL:-
➢ "Select the destination column from the table dataset_1, and
rename (alias) it as Destination."
select destination as Destination from dataset_1 d ;

-: PYTHON :-
➢ Renames the column destination → Destination (with a capital
D).
➢ The change is permanent in the DataFrame df because you used
inplace=True.
df.rename(columns={'destination':'Destination'},inplace=True)
-:SQL:-
➢ "Return the unique values of the occupation column from the
dataset_1 table."
select occupation from dataset_1 group by occupation;
-: PYTHON :-
➢ "Group the DataFrame by the occupation column, count the number
of rows in each group, and return a DataFrame with occupation and
the corresponding counts in a column named Count."
df.groupby('occupation').size().to_frame('Count').reset_index()

-:SQL:-
➢ "For each unique weather type in the table dataset_1, calculate the
average (AVG) temperature, and return the weather type along with its
average temperature, aliased as avg_temp."

select weather ,AVG(temperature) as avg_temp from dataset_1


group by weather;
-: PYTHON :-
➢ "Group the DataFrame df by the weather column, calculate the
average (mean) temperature for each weather group, then convert the
result to a DataFrame with a column named avg_temp, and finally
reset the index so that weather becomes a regular column."
df.groupby('weather')['temperature'].mean().to_frame('avg_temp').reset_i
ndex()

-:SQL:-
➢ "For each unique weather condition in dataset_1, count the number of
temperature entries and label that count as count_temp."
select weather, COUNT(temperature) as count_temp from dataset_1
group by weather;

-: PYTHON :-
➢ "Group the DataFrame df by the weather column, count the number of
rows in each group (including those with NaN values), convert the
result to a DataFrame with the column name Count_temp, and reset
the index so weather becomes a column."
df.groupby('weather')['temperature'].size().to_frame('Count_temp').reset_
index()
-:SQL:-
➢ "For each unique weather condition in dataset_1, count the number of
distinct (unique) temperature values and name that count as
count_distinct_temp."
select weather ,COUNT(DISTINCT temperature) as count_distinct_temp
from dataset_1 group by weather ;

-: PYTHON :-
➢ "Group the DataFrame by weather, count the number of unique values
in the temperature column for each weather group, convert the result
to a DataFrame with the column name count_distinct_temp, and reset
the index to make weather a column."
df.groupby('weather')['temperature'].nunique().to_frame('count_distinct_t
emp').reset_index()

-:SQL:-
➢ "For each unique weather condition in dataset_1, calculate the total
(sum) of the temperature values, and label this total as sum_temp."
select weather ,SUM(temperature) as sum_temp from dataset_1 group by
weather;
-: PYTHON :-
➢ "Group the DataFrame df by the weather column, calculate the sum of
temperature values for each weather group, convert the result into a
DataFrame with the column name sum_temp, and reset the index so
weather becomes a column."
df.groupby('weather')['temperature'].sum().to_frame('sum_temp').reset_i
ndex()

-:SQL:-
➢ "For each unique weather condition in dataset_1, find the minimum
temperature and label it as min_temp."
select weather ,MIN(temperature) as min_temp from dataset_1 group by
weather;

-: PYTHON :-
➢ "Group the DataFrame df by the weather column, find the minimum
temperature for each weather group, convert the result to a DataFrame
with column name min_temp, and reset the index to make weather a
regular column."
df.groupby('weather')['temperature'].min().to_frame('min_temp').reset_in
dex()

-:SQL:-
➢ "For each unique weather condition in dataset_1, find the maximum
temperature and label it as max_temp."
select weather ,MAX(temperature) as max_temp from dataset_1 group
by weather;

-: PYTHON :-
➢ "Group the DataFrame df by the weather column, find the maximum
temperature for each weather group, convert the result into a
DataFrame with the column name max_temp, and reset the index so
weather becomes a column."
df.groupby('weather')['temperature'].max().to_frame('max_temp').reset_i
ndex()

-:SQL:-
➢ "Group the data by occupation and return only the group where
the occupation is 'Student'."
select occupation from dataset_1 group by occupation having
occupation='Student';

-: PYTHON :-
➢ "Group the DataFrame df by occupation, then filter to keep only
the group where the occupation is 'Student'. Finally, count the
number of rows in that group."
df.groupby('occupation').filter(lambda x: x['occupation'].iloc[0] ==
'Student').groupby('occupation').size()

-:SQL:-
➢ "Select all rows from dataset_1 where the weather column starts
with 'Sun'."
select * from dataset_1 where weather like 'Sun%';
-: PYTHON :-
➢ "Filter the DataFrame df to return only the rows where the
weather column starts with 'Sun'."
df[df['weather'].str.startswith('Sun')]

-:SQL:-
➢ "Select all unique (DISTINCT) temperature values from
dataset_1 where the temperature is between 29 and 75
(inclusive)."
select distinct temperature from dataset_1 where temperature
between 29 and 75;

-: PYTHON :-
➢ "From the DataFrame df, filter rows where temperature is
between 29 and 75 (inclusive), then return only the unique
temperature values."
df[(df['temperature'] >= 29) & (df['temperature']
<=75)]['temperature'].unique()

-:SQL:-
➢ "Select the occupation column from dataset_1 where the
occupation is either 'Sales & Related' or 'Management'. "
select occupation from dataset_1 where occupation in('Sales &
Related','Management');

-: PYTHON :-
➢ "Filter the DataFrame df to return only the rows where the
occupation is either 'Sales & Related' or 'Management', and
show only the occupation column."
df[df['occupation'].isin(['Sales &
Related','Management'])][['occupation']]

You might also like