SQL FOR DATA ANALYST
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION 1
DATABASE
A database is an organized collection of data.
For Example:
A university database organizes
the data about students, faculty, admin staff, etc. DATA
DATABASE
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
DBMS(Data Base Management System)
The software which is used to manage databases
is called Database Management System. DBMS
allows users to create ,read , update and delete
data in database.
For example:
MySQL, Oracle, Microsoft SQL server, PostgreSQL
are the popular commercial DBMS used in
different applications.
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
WHY DBMS?
• Data Storage
• Data abstraction
• Controls data redundancy
• Multi - user access and views
• Security
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
DATA MODELS
• Data models define how the logical structure of Hierarchical Model
a database is modelled.
• Data models define how data is connected to
Network Model
each other ,how they are processed and stored Entity Relationship Model
inside the system.
Relational Model
No SQL
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION 5
RELATIONAL MODEL
In this model , the data is organized in the form of two
dimensional table. That means data is stored in the
form of rows and columns.
Stu_Id Name Branch
201 John CSE
For example:
The table here shows “STUDENT DETAILS” with 202 Mary ECE
attributes such as Stu_Id, Name and Branch which has 5 203 Britto IS
records.
204 Harry IT
205 Steve ME
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
RDBMS
RDBMS is a program which is used to
manage Relational databases in which
data is stored in the form of rows and
columns which can be easily retrieved
,managed and updated.
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
MySQL
• MySQL is an open source relational
database that uses structured query
language to interact with databases.
• It stores data in the form of table and can
be modified using SQL.
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
WHY MySQL ?
• Easy to use
• Cost effective
• Secured
• Platform-Friendly
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
SQL
• SQL stands for Structured Query Language.
• SQL is a database programming language designed for the retrieval and management of
data in a relational database.
• All relational database management systems like MySQL ,Oracle , MS Access and SQL
server uses SQL as their standard database language.
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
WHAT SQL DOES?
• SQL is used to create new databases and tables.
• SQL execute queries against a database.
• SQL is used to retrieve, update and insert records into a database.
• SQL used to create stored procedures and views in a database.
• SQL can also set permissions on tables, procedures and views.
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
SQL COMMANDS
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
MySQL WORKBENCH
• MySQL workbench is a Graphical tool developed by Oracle ,
which is used to work with MySQL server and databases.
• MySQL provides data modelling , SQL development and
various administration tools for configuration.
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
CREATE DATABASE
CREATE DATABASE <database name>:
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
DROP DATABASE
DROP DATABASE <database name>:
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
SHOW DATABASE
SHOW DATABASES:
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
TABLE
SQL table is a collection of data which is
organized in rows and columns. Each row
represents unique record and each column
represents field in the record.
For example:
Here is a table which contains employee data in
which row represents each employee and
column represents employee information such
as employee number, name, designation and
age.
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
DATA TYPES
The data type of a column defines what sort of data that an object can store such as integer,
character, money, date and time, binary, and so on.
⮚ NUMERIC
⮚ CHARACTER
⮚ DATE AND TIME
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
NUMERIC
Data Type Range
bigint -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
int/integer -2,147,483,647 to 2,147,483,647
small int -32,768 to 32767
tinyint 0 - 255
decimal(p , s) /numeric(p,s) -10 ^38+1 to 10^38 -1
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
Integer-small int,bigint,tinyint :
• It used to store whole numbers.
• It rounds of the decimal number
• Eg: (123.5 .124)
Float(m,d):
• It rounds value to the given value.
• M is significant value and d is exponent value.
• It is used to store approximate numeric values
• Eg:float(7,4)878.00009878.0001.
Numeric(p,s):
• numeric(3,2)123123.00
• p is precision that represents total numbers of digits allowed in column.(1 to 38 but default 18)
• S is scale represents the number of decimal places.
• Precision is fixed.
Decimal(p,s):
• decimal(3,4)123.34123.3400
• Precision is not fixed
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
DATE AND TIME
Data Type Format
Date YYYY-MM-DD
Time HH:MM:SS
Year YYYY
YYYY-MM-DD
Timestamp HH:MM:SS
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
CHARACTER
Data Type Range
Char 0 - 255 characters
Varchar(s) 0 – 65535 characters
Text 65535 characters
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
Char(size)
• A fixed length string contains letters , numbers and special characters.
• Size determine the length of characters.
• The default size value is 1 and range is 0-255.
Varchar (size)
• A variable length string containing letters, numbers and special characters.
• The size option sets maximum character length for a column.
• Range is 0-6553 bytes
Binary(size)
• To store binary values.
• The default size is 1.
• Range is 0-255 bytes.
Text(size)
• It stores a string.
• Range is 65,535 characters.
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
SQL CONSTRAINTS
A DBMS keys are an attributes which help you
uniquely identify a record or row of data in a
NOT
relation. NULL
default unique
keys
Primary
check
key
Foreign
key
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
SQL CONSTRAINTS
Not null It ensures that column cannot have null values.
Unique It ensures that all values in a column are unique.
Primary key It’s a combination of not null and unique keys.
Foreign key It prevents actions that would destroy link between tables.
Default Sets a default value for a column if value is specified.
Check It ensures that the values in a column satisfies the
specified condition.
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
CREATE TABLE
CREATE TABLE < table name > (
< column1> <data type> <constraint> ,
< column1> <data type> <constraint> ,
< column1> <data type> ,<constraint> ,
…
);
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
INSERT INTO
Insert into is used to insert new records in the table.
INSERT INTO < table name > (column1 ,column2,column3…)
VALUES (<val1>, <val2> ,<val3>…);
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
DROP TABLE
DROP TABLE < table name > ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
UPDATE
Used to modify the existing records in a table using where condition.
UPDATE < table name >
SET <column1>=<value1> ,<column2>=<value1> …
WHERE <condition> ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
DELETE
Used to delete the existing records in a table using where condition.
DELETE FROM < table name >
WHERE <condition> ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
TRUNCATE
Truncate used to clear the records inside table.
TRUNCATE TABLE< table name > ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
ALTER
ADD
It is used to add new column to the table.
ALTER TABLE < table name > ADD < column name > < data type > ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
DROP
Used to drop existing column from the table.
ALTER TABLE < table name > DROP COLUMN < column name > ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
MODIFY
Used to modify data type of column.
ALTER TABLE < table name > MODIFY < column name > < data type > ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
RENAME
Used to rename the existing column name.
ALTER TABLE < table name > RENAME COLUMN < old_column_name >
to <new_column_name> ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
RENAME
Used to rename table.
ALTER TABLE < table name > RENAME TO <new_table_name> ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
IMPORT TABLE
� Select database
� Right click
� Select Table data import wizard
� Browse to select the file
� Check on create new table
� Click next
� Import
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
IMPORT DATABASE
� Click on Server
� Select data import
� Self contained file
� Open Import progress
� Start import
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
EXPORT TABLE
� Select Table
� Right click
� Select Table data export wizard
� Browse to select destination
� Export
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
IMPORT DATABASE
� Click on Server
� Select data export
� Select the database
� Open export progress
� Start export
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
SQL COMMANDS
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
SELECT
Select statement is used to fetch all the records from the table or to fetch subset of columns.
SELECT <column1>, < column2 > FROM < table name > ;
SELECT < * > FROM < table name > ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
SELECT DISTINCT
Used to fetch unique records from the table.
SELECT <distinct(column1)> FROM < table name > ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
ALIASES
SQL Aliases are used to give temporary name for column and table .
SELECT <column_name> AS <alias_name> FROM < table name > ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
WHERE CLAUSE
Used to filter records based on condition.
Conditions are of two types:
Comparison : = , > ,< >= ,<= ,!=
Logical : and , or ,not
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
WHERE CLAUSE
Used to filter records based on condition.
SELECT * FROM < table name > where <condition> ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
AND
Return records which satisfies both the conditions.
SELECT * FROM < table name > where <condition 1> AND <condition 2> ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
OR
Return records which satisfies either of the conditions.
SELECT * FROM < table name > where <condition 1> OR <condition 2> ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
NOT
Return records if condition is true.
SELECT * FROM < table name > where not <condition > :
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
ORDER BY
Order by keyword is used to sort the records in Ascending or descending order.
SELECT * FROM < table name > where <condition > order by <column> asc/desc :
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
IS NULL/IS NOT NULL
Is null / Is not null are used to check whether null values exists in the table or not.
SELECT * FROM < table name > where <column1 > is null / not null :
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
LIMIT
It is used to return the specified number of records.
SELECT * FROM < table name > WHERE <condition> LIMIT <value> :
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
IN
It is used to return records by specifying multiple conditions.
SELECT * FROM < table name > WHERE <column1> IN (val1 ,val2,val3) :
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
BETWEEN
It is used to return records within a certain range. It can be values ,dates or text.
SELECT * FROM < table name > WHERE <column1> BETWEEN <val1> AND <val2A> ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
LIKE
It is used to return records based on the pattern.
SELECT * FROM < table name > WHERE <column1> LIKE <PATTERN> ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
LIKE
% 🡪 It represents zero ,one or multiple characters.
_ 🡪 It is used represent one or single character.
'a%’ 🡪 It returns values that start with a .
'%a’ 🡪 It returns values that end with a .
% or % 🡪 It returns values that have or in any position .
a %b 🡪 It returns values that start with a and end with b .
'a_’ 🡪 It returns values that start with a and are at least 2 characters in length.
‘_s’ 🡪 It returns values that has s in the second position.
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
AGGREGATE FUNCTIONS
SQL aggregate functions are used to perform the calculations on multiple rows of a single column
and returns a single value.
TYPES OF AGGREGATE FUNCTIONS ARE:
• Count()
• Sum()
• Avg()
• Min()
• Max()
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
COUNT
Used to return number of rows in a table.
SELECT COUNT(*) FROM table_name WHERE <condition>;
SELECT COUNT(column1) FROM table_name WHERE <condition>;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
AVG()
Used to find average value of numeric column .
SELECT AVG(column_name) FROM table_name WHERE<condition>;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
SUM()
Used to find total sum of column .
SELECT SUM(column_name) FROM <table_name> WHERE <condition>;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
MIN()
Used to find smallest value in selected column .
SELECT MIN(column_name) FROM table_name WHERE <condition>;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
MAX()
Used to find largest value in selected column
SELECT MAX(column_name) FROM table_name WHERE <condition>;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
GROUP BY
The group by statement in SQL is used to arrange identical data into groups with the help of
aggregate functions.
For eg:
1) What if I want to find total sales in different categories?
2) What if I want to find highest sales for each segment?
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
GROUP BY SINGLE COLUMN
SELECT column1 , function(column2) FROM <table_name> GROUP BY <column1> ORDER BY
<column1>;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
HAVING CLAUSE
Having clause is used to filter the result obtained by the group by clause based on specific
condition.
SELECT column1 ,function(column2) FROM <table_name> GROUP BY <column1 > HAVING
<condition> ORDER BY <column2>;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
RULES TO USE GROUP BY
• The group by clause is used with select statement.
• Group by is placed before having clause.
• Group by clause is placed before order by.
• Having clause is placed before order by.
• Conditions are specified in having clause.
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
SUBQUERY
SQL subquery is a query within another SQL query. Outer query is main query and inner query is
subquery.
•• Subquery must be enclosed within parenthesis.
••
Order by cannot be used in a subquery.
Between operator cannot be used.
A subquery can have only single column in subquery.
Select columns from table
Where column_name <expression operator>
(select column_name from table_name where <condition>;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
Sub query
• Write a sql query to find the employee who is taking maximum salary?
• Write a SQL query to find the employee who is taking minimum salary?
• Write a query to display second highest salary ?
• Write a query to display the employee name who is taking second highest salary?
• Write a query to display highest salary department wise and name of employee who is taking
that salary?
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
JOINS
SQL Joins are used combine two tables based on
common column, and selects records that have matching
values in these columns.
Types of Joins in SQL
• INNER JOINS
• LEFT JOIN
• Right join
• Full outer join
• Cross join
• Natural join
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
INNER JOIN
Inner join return those records which a have
matching values in both tables.
SELECT <column_names> FROM Table1
INNER JOIN Table2
ON Table1.matching_column=Table2.matching_column ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
empid ename Salary depid depid dname
E1 John 45000 D1 D1 IT
E2 Mary 60000 D2 D2 HR
E3 Steve 73000 D3 D3 Admin
E4 Helen 86000 D4 D4 Finance
E5 Joe 35000 D7 D5 Sales
empid ename Salary depid dname
E1 John 45000 D1 IT
E2 Mary 60000 D2 HR
E3 Steve 73000 D3 Admin
E4 Helen 86000 D4 Financ
e
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
LEFT JOIN
Left join returns all the records from left table and
also matching records from right table.
SELECT <column_names> FROM Table1
LEFT JOIN Table2
ON Table1.matching_column=Table2.matching_column ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
empid ename Salary depid depid dname
E1 John 45000 D1 D1 IT
E2 Mary 60000 D2 D2 HR
E3 Steve 73000 D3 D3 Admin
E4 Helen 86000 D4 D4 Finance
E5 Joe 35000 D7 D5 Sales
empi ename Salary depid dname
E1 John 45000 D1 IT
E2 Mary 60000 D2 HR
E3 Steve 73000 D3 Admin
E4 Helen 86000 D4 Finance
E5 Joe 35000 D7 NULL
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
RIGHT JOIN
Right join returns all the records from right table and also
matching records from left table.
SELECT <column_names> FROM Table1
RIGHT JOIN Table2
ON Table1.matching_column=Table2.matching_column ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
empid ename Salary depid depid dname
E1 John 45000 D1 D1 IT
E2 Mary 60000 D2 D2 HR
E3 Steve 73000 D3 D3 Admin
E4 Helen 86000 D4 D4 Finance
E5 Joe 35000 D7 D5 Sales
depid dname empid ename Salary
D1 IT E1 John 45000
D2 HR E2 Mary 60000
D3 Admin E3 Steve 73000
D4 Finance E4 Helen 86000
D5 Sales NULL NULL NULL
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
FULL OUTER JOIN
Full outer join returns all the records from both the
tables.
SELECT <column_names> FROM Table1
LEFT JOIN Table2
ON Table1.matching_column=Table2.matching_column
UNION
SELECT <column_names> FROM Table1
LEFT JOIN Table2
ON Table1.matching_column=Table2.matching_column ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
empid ename Salary depid depid dname
E1 John 45000 D1 D1 IT
E2 Mary 60000 D2 D2 HR
D3 Admin
E3 Steve 73000 D3
D4 Finance
E4 Helen 86000 D4
D5 Sales
E5 Joe 35000 D7
empid ename Salary depid dname
E1 John 45000 D1 IT
E2 Mary 60000 D2 HR
E3 Steve 73000 D3 Admin
E4 Helen 86000 D4 Finance
E5 Joe 35000 D7 Null
NULL NULL NULL D5 Sales
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
SELF JOIN
A self join is a regular join which helps you to join a table to itself
SELECT column_name(s) FROM table1 T1, table1 T2 WHERE
<condition> ;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
CROSS JOIN
The cross join returns all the records from both the tables.
SELECT column_name(s) FROM <Table 1> CROSS JOIN <Table 2>:
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION
NO SQL
• No SQL stands for “Not only SQL”.
• No SQL database is non-relational database
management system.
• No SQL database stores information in JSON
documents instead of rows and columns.
• No SQL databases are flexible ,scalable and capable
of rapidly responding to the data management
demands of modern businesses.
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION 80
POPULAR NO SQL DATABASES
• Document databases
• Key-value stores
• Wide-column databases
• Graph databases
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION 81
APPLICATIONS OF NO SQL
• Support large number of concurrent users.
• Deliver highly responsive experience to a globally distributed base of users.
• It can store structured , semi-structure ,unstructured and polymorphic data.
• No SQL is used for Big data and real-time web apps.
• Companies like Twitter , Facebook and Google collect terabytes of user data every single
day.
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION 82
WINDOW FUNCTIONS IN SQL
An advanced SQL tool used to perform calculation for every row in the dataset using other
records.
Types of Window functions
• Aggregate functions: avg, sum, count, min, max
• Ranking functions: row_number, rank, dense_rank
• Analytic functions: first_value , last value
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION 83
WINDOW FUNCTIONS IN SQL
OVER CLAUSE
Over clause defines the partitioning and ordering of rows for the given functions.
Over clause accepts three arguments
• Partition by: divides the query result into sets. The window function is applied over each
partition.
• Order by : Define logical order of the rows
• Rows or range : Further limits the rows within the partition by specifying start and end
points within the partitions
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION 84
WINDOW FUNCTIONS IN SQL
Syntax
Select * , aggregate_function(column) over() as alias from table_name;
Select * ,aggregate_function(column) over(partition by column order by column) as alias from
table_name;
© DataMites®. All Rights Reserved | www.datamites.com DATA SCIENCE FOUNDATION 85