Data Analytics

Uploaded by smritiku1904
Copyright © All Rights Reserved

DATA ANALYTICS

Data analytics is the science of analyzing raw data to find trends and answer questions.

FUTURE SCOPE

Currently, Data Analytics professionals make up 45% of in-demand jobs.

With the increasing amount of data being generated every day, data analysts are expected to
continue to be in demand in the future.

Python
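Python is an interpreted, object-oriented, high-level programming language with dynamic semantics.
(The standard interpreter first compiles source code to bytecode and then interprets that bytecode, which is why Python is sometimes described as both compiled and interpreted.)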

Creator of Python (Guido van Rossum)

Python was created by the Dutch programmer Guido van Rossum and first released in 1991.

Guido van Rossum named Python after the British sketch-comedy series Monty Python’s Flying
Circus, of which he was a big fan.

Python 2.0 was released in 2000.

Python 3.0 was released in 2008.

Features of Python

Object Oriented

Easy to Learn & Use

Dynamically Typed

Expressive Language

GUI Programming Support

Interpreted Language

Extensible

Cross Platform Language

Large Standard Library

Free and Open Source
Applications of Python

Data Science

Web Development

Data Engineering

Machine Learning

Artificial Intelligence

Data Analytics

Python Installation on Windows

1. Visit the website python.org.
2. Select the version of Python you want to install.
3. Download the Python executable installer.
4. Run the executable installer.
5. Verify the installation.

Installing PyCharm

Print Statement
To call the print function in Python, write print followed by parentheses () containing the value to display.
Text values are written inside quotation marks "".

Input -
print("Hello World")

Output -
Hello World

Using multiple lines in Print Statement


There are two methods to print a statement on multiple lines:
 Triple quotation marks (""" or ''') let a string span multiple lines.
 The escape sequence \n (a backslash followed by n) inserts a line break.
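A minimal sketch comparing the two methods; the variable names text1 and text2 are only illustrative.

```python
# Method 1: triple quotes keep the line breaks typed inside them
text1 = """Line one
Line two"""

# Method 2: \n inserts a line break inside an ordinary string
text2 = "Line one\nLine two"

print(text1)
print(text2)
print(text1 == text2)  # True: both strings contain the same characters
```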

Comment in Python
Single Line comments
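To add a single-line comment, use the hash symbol #.
Python ignores everything written after # on that line.

Multiline comments
To add multiline comments in Python, triple quotation marks are commonly used. Strictly speaking, this creates a string literal that is not assigned to anything, so Python evaluates and discards it; starting each line with # is the other option.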
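A short sketch of both comment styles; price is a hypothetical variable.

```python
# A single-line comment: Python ignores this whole line
price = 100  # a comment can also follow code on the same line

"""
A triple-quoted block used as a multiline comment.
Python creates this string and discards it because it is not assigned.
"""

print(price)  # prints 100
```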
Variables
Variables are placeholders that store values.

1) In simple words, a variable is a container that holds data inside it as a value.

Input –
a = "hello world"
print(a)

Output –
hello world

2) Do not use spaces when naming a variable.


Use an underscore (_) to separate words in a variable name, e.g. first_name.

3) A variable name must not start with a number or a special symbol (a leading underscore is allowed). Variable names are also case-sensitive, so A and a are different variables.
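The naming rules above can be checked with the built-in str.isidentifier() method; the names and values below are made up for illustration.

```python
first_name = "John"   # underscore separates words
_count = 3            # a leading underscore is allowed
age2 = 21             # digits are fine after the first character
A = 1
a = 2                 # names are case-sensitive: A and a are different variables

# str.isidentifier() reports whether a string is a valid variable name
print("first_name".isidentifier())  # True
print("2age".isidentifier())        # False: starts with a digit
print("first name".isidentifier())  # False: contains a space
print(A, a)                         # 1 2
```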

Datatypes and User-Input


Datatypes:
Text Type: String (str), written inside quotation marks ""
Numeric Types: integer (int), floating point (float), complex
Sequence Types: list, tuple and range
Mapping Types: Dictionaries (dict)
Set Type: set, frozenset
Boolean Type: bool
Binary Types: bytes, bytearray, memoryview
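A quick way to see each datatype is the built-in type() function; the sample values are arbitrary.

```python
examples = [
    "hello",              # str
    10,                   # int
    3.5,                  # float
    2 + 3j,               # complex
    [1, 2],               # list
    (1, 2),               # tuple
    range(3),             # range
    {"a": 1},             # dict
    {1, 2},               # set
    frozenset({1, 2}),    # frozenset
    True,                 # bool
    b"hi",                # bytes
    bytearray(b"hi"),     # bytearray
    memoryview(b"hi"),    # memoryview
]
for value in examples:
    print(type(value).__name__)  # type() reports the datatype of each value
```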

User-inputs
The input() function asks the user for input. The value it returns is always a string (str), so it must be converted when another datatype is needed.

Input:
name = input("Enter your name here: ")
print(name)

Output:
(the name entered by the user)

Input:
age = int(input("Enter your age here: "))
print(age)

Output:
(the age entered by the user, as an integer)
TypeCasting and Subtypes
Conversion of one datatype to another is called type casting.

There are two types of type casting:

Implicit Type Conversion: Python itself converts one datatype to another, e.g. int to float in 5 + 2.5.
Explicit Type Conversion: the programmer converts a value to another datatype using functions such as int(), float() and str().
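A small sketch of both kinds of conversion; the string "21" stands in for a value that input() might return.

```python
# Implicit conversion: Python converts int to float automatically
result = 5 + 2.5
print(result, type(result))  # 7.5 <class 'float'>

# Explicit conversion: the programmer calls int(), float() or str()
age_text = "21"              # e.g. a value returned by input()
age = int(age_text)
print(age + 1)               # 22
print(int(9.8))              # 9: int() drops the decimal part, it does not round
print(float(7))              # 7.0
print(str(100) + "%")        # 100%
```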

Problem Solving
Write a program to display a person’s name, age and address in three different lines.

Write a program to swap two variables.

Write a program to convert a float into integer.

Write a program to take details from a student for an ID card and then print them on different lines.

Write a program to take a user input as an integer and then convert it into a float.
Operators and Operands
Operators indicate what operation is to be performed, while operands indicate the values on which

the operation is performed.

z = x + y

In the given expression, x, y and z are operands, and + and = are operators.

Types of Operators:

Operators can be divided into 7 categories:

1. Arithmetic Operators
2. Comparison Operators
3. Logical Operators
4. Assignment Operators
5. Identity Operators
6. Membership Operators
7. Bitwise Operators

Python Arithmetic Operators


Addition (+) 5 + 4 = 9

Subtraction (-) 5 - 4 = 1

Multiplication (*) 5 * 4 = 20

Modulus (%) gives the remainder: 8 % 3 = 2

Division (/) 4 / 2 = 2.0 (always returns a float)

Exponentiation (**) 2 ** 10 = 1024 (2 raised to the power 10; in Python ^ is XOR, not a power)

Floor Division (//) gives the quotient rounded down: 8 // 3 = 2
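A minimal sketch of the arithmetic operators on two sample numbers, including one case where floor division differs from simply dropping the decimals.

```python
a, b = 8, 3
print(a + b)    # 11
print(a - b)    # 5
print(a * b)    # 24
print(a / b)    # about 2.667: / always returns a float
print(a % b)    # 2: remainder
print(a // b)   # 2: quotient rounded down
print(2 ** 10)  # 1024
print(4 / 2)    # 2.0
print(-8 // 3)  # -3: // rounds down (toward negative infinity), not toward zero
```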

Comparison Operators
< Less than

<= Less than or Equal to

!= Not Equal to

== Equal to

>= Greater than or Equal to


> Greater than

Logical Operators
Operator   Meaning   Example

and   True if both operands are true   x and y, e.g. 3>2 and 3>7 is False

or   True if either of the operands is true   x or y, e.g. 3>2 or 3>7 is True

not   True if the operand is false (complements the operand)   not x, e.g. not (3>2 and 3>7) is True,

not (3>2 or 3>7) is False
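The comparison operators and the logical examples from the table can be checked directly; x and y are sample values.

```python
x, y = 3, 7
print(x < y, x <= y, x != y, x == y, x >= y, x > y)  # True True True False False False

print(3 > 2 and 3 > 7)        # False: the second comparison is False
print(3 > 2 or 3 > 7)         # True: the first comparison is True
print(not (3 > 2 and 3 > 7))  # True
print(not (3 > 2 or 3 > 7))   # False
```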

Assignment Operators
Assignment Operators are used in Python to assign value to variables.

In a = 6, the simple assignment operator = assigns the value 6 on the right to the variable a on the left.

Operator Example Equivalent to

= x=6 x=6

+= x+=6 x=x+6

-= x-=6 x=x-6

*= x*=6 x = x*6
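A tiny sketch tracing a variable through the compound assignment operators in the table.

```python
x = 6     # x is 6
x += 6    # same as x = x + 6, so x is 12
x -= 2    # same as x = x - 2, so x is 10
x *= 3    # same as x = x * 3, so x is 30
print(x)  # 30
```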

Identity Operators
Identity operators check whether two names refer to the same object (the same memory address), not merely to equal values.

Types:

1. is
2. is not
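A short sketch contrasting identity (is) with equality (==); the lists are sample data.

```python
a = [1, 2, 3]
b = a          # b refers to the same list object as a
c = [1, 2, 3]  # c is a new list with equal contents

print(a is b)          # True: same object
print(a is c)          # False: different objects
print(a == c)          # True: == compares values, not identity
print(a is not c)      # True
print(id(a) == id(b))  # True: id() gives the object's identity
```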

Bitwise Operators
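Bitwise operators work on the binary (bit-level) representation of integers.
Types:

1. AND (&) Operator


2. OR (|) Operator
3. XOR (^) Operator
4. << left shift (shifts the bits to the left, filling with 0 on the right)
5. >> right shift (shifts the bits to the right, dropping bits on the right)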

1 = on

0 = off

AND Operator:

Implementation of And Operator on Binary Digits

Operation Result

0&0 0

1&0 0

0&1 0

1&1 1

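Converting a decimal number to binary

Divide the number by 2 repeatedly and note each remainder; reading the remainders from bottom to top gives the binary number.

10

Divisor   Dividend   Remainder
2   10   0
2   5   1
2   2   0
    1   1

10 = 1010 (remainders read bottom-up)

In the same way, 8 = 1000, so 10 & 8 = 1010 & 1000 = 1000 = 8.

15

Divisor   Dividend   Remainder
2   15   1
2   7   1
2   3   1
    1   1

15 = 1111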
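The repeated-division method above can be written as a small function; to_binary is a hypothetical helper name, and its result is compared with Python's built-in bin().

```python
def to_binary(n):
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        remainders.append(str(n % 2))  # the remainder is the next bit (least significant first)
        n //= 2                        # continue with the quotient
    return "".join(reversed(remainders))  # read the remainders bottom-up

for number in (10, 8, 15):
    print(number, to_binary(number), bin(number))  # e.g. 10 1010 0b1010
```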
OR Operator:

Implementation of Or Operator on Binary Digits

Operation Result

0|0 0

1|0 1

0|1 1

1|1 1

Example: 10 | 8 = 1010 | 1000 = 1010 = 10

XOR Operator:

Implementation of Xor Operator on Binary Digits

Operation Result

0^0 0

1^0 1

0^1 1

1^1 0

Example: 10 ^ 8 = 1010 ^ 1000 = 0010 = 2

Right shift (>>)

Drops the given number of bits from the right end (for non-negative numbers, 0s fill in from the left)

10>>2

10 = 1010 = 0010 = 2

10>>1

10 = 1010 = 0101 = 5

Left shift (<<)

Adds the given number of 0s at the right end

10<<1

10 = 1010 = 10100 = 20

10<<2

10 = 1010 = 101000 = 40
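All the bitwise results worked out above can be checked in a few lines; format(value, "04b") shows a number as a 4-digit binary string.

```python
a, b = 10, 8
print(bin(a), bin(b))      # 0b1010 0b1000
print(a & b)               # 8  (1010 & 1000 = 1000)
print(a | b)               # 10 (1010 | 1000 = 1010)
print(a ^ b)               # 2  (1010 ^ 1000 = 0010)
print(format(a ^ b, "04b"))  # 0010
print(a >> 2, a >> 1)      # 2 5
print(a << 1, a << 2)      # 20 40
```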
Membership Operators
Membership operators check whether a value is present in a sequence or collection, such as a string, list, tuple, set or the keys of a dictionary.

Types:

1. in
2. not in
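A short sketch of in and not in on a list, a string and a dictionary; the data is made up.

```python
fruits = ["apple", "banana", "mango"]
print("apple" in fruits)       # True
print("grape" not in fruits)   # True
print("ell" in "hello")        # True: for strings, in checks for a substring

student = {"name": "David", "age": 13}
print("age" in student)        # True: for dictionaries, in checks the keys
print(13 in student)           # False: values are not checked
```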

Conditional Statements
A conditional statement lets the program execute a block of code only if a given condition is true.

Types of Conditional Statements:

1. If Statement
2. If-else Statement
3. If-elif-else Statement
4. Nested If Statement
5. Short hand If Statement
6. Short hand If-else Statement

If Statement
The if statement is the most fundamental decision-making statement.

The if statement in Python has the following syntax:

if expression:
    statement

mark = 87

if (condition):
    body of if

print("thank you")
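A minimal sketch of the mark example, assuming a hypothetical pass mark of 40 as the condition.

```python
mark = 87
passed = False
if mark >= 40:          # hypothetical pass mark of 40
    passed = True
    print("You passed")
print("thank you")      # not indented, so it runs whether or not the condition is true
```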

If - Else Statement
The if-else statement is used when the program must choose between two blocks of code.

If the condition is true, the if block runs; if it is false, the else block runs instead.

if condition:

    # Executes this block if the condition is true

else:

    # Executes this block if the condition is false

a = 10

if (condition):

    (body of if)

else:

    (body of else)
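With a = 10, one possible condition is an even/odd check (a number is even when dividing it by 2 leaves no remainder).

```python
a = 10
if a % 2 == 0:
    result = "even"
else:
    result = "odd"
print(a, "is", result)  # 10 is even
```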

If-elif-else Statement
In this case, the if condition is evaluated first. If it is false, the elif condition is checked; if that is also false, the else block is executed.

For multiple conditions, more elif statements are added.

if (condition):

    (body of if)

elif (condition):

    (body of elif)

else:

    (body of else)
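A sketch of an if-elif-else chain as a grading function; the grade boundaries are hypothetical.

```python
def grade(mark):
    if mark >= 90:
        return "A"
    elif mark >= 75:   # checked only when the first condition is False
        return "B"
    elif mark >= 40:
        return "C"
    else:              # runs when every condition above is False
        return "Fail"

print(grade(95), grade(87), grade(50), grade(20))  # A B C Fail
```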

Nested IF Statement
A nested if statement is an if statement placed inside another if statement. It is used when a second condition should be checked only after the first condition is true. The nested if statement in Python has the following syntax:

if (condition1):
    # Executes if condition 1 is true
    if (condition2):
        # Executes if condition 2 is true
    # Condition 2 ends here
# Condition 1 ends here

if (condition1):
    body of if
    if (condition2):
        body of if
    else:
        body of else
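A small sketch where the inner if runs only for positive numbers; describe is a hypothetical function name.

```python
def describe(number):
    if number > 0:
        if number % 2 == 0:   # checked only when number > 0
            return "positive and even"
        else:
            return "positive and odd"
    else:
        return "zero or negative"

print(describe(8))   # positive and even
print(describe(7))   # positive and odd
print(describe(-3))  # zero or negative
```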

Short Hand if statement


The short hand if statement is used when only one statement needs to be executed inside the if block.
That statement can be written on the same line as the if statement.

The short hand if statement in Python has the following syntax:

if condition: statement

if (condition): Body of if

Short Hand if-else statement


It is used to write an if-else statement on one line when there is only one statement to execute in
both the if and else blocks. In simple words, if you have one statement for if and
one for else, you can put them all on the same line.

(Body of if) if (condition) else (body of else)
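Both short hand forms in one sketch; the one-line if-else is Python's conditional expression, which produces a value.

```python
a = 10
if a > 5: print("a is greater than 5")   # short hand if

label = "even" if a % 2 == 0 else "odd"  # short hand if-else
print(label)  # even
```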

Problem Solving
1. Write a program to check if a number is positive.

2. Write a program to check whether a number is odd or even.

3. Write a program to create an area calculator.

4. Write a program to check whether the passed letter is a vowel or not.

5. Write a program to check whether a number is a single-digit number, a 2-digit number and so on, up to 5
digits.
Introduction to Loops
A loop repeats a block of code multiple times.

Types of loops are:

1. For loop
2. While loop
3. While True
4. Nested loop

For Loop
 A for loop repeats a block of code for each value in a given range (or sequence).
 A range has a starting point, an ending point and a step (gap).
 The ending point is excluded, so add 1 to the last value you want included: range(1, 6) gives 1 to 5.

for i in range(1, 6, 2):
    print("Hello world")

1, 6 = start and end of the range (i takes the values 1, 3, 5)

2 = step (gap)

n = 7

for i in range(1, 11):
    print(n, "x", i, "=", n * i)
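Converting a range to a list shows exactly which values a for loop would receive.

```python
print(list(range(1, 6)))       # [1, 2, 3, 4, 5]: the end value 6 is excluded
print(list(range(1, 6, 2)))    # [1, 3, 5]: step of 2
print(list(range(5)))          # [0, 1, 2, 3, 4]: start defaults to 0
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]: a negative step counts down
```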

While loop
 A while loop executes as long as the given condition is true.
 In a while loop, the increment is done inside the loop.

while (condition):

    (body of while)

    (increment)
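A minimal while loop that adds the numbers 1 to 5, with the increment inside the loop body.

```python
i = 1          # starting value
total = 0
while i <= 5:  # the loop runs while the condition is true
    total += i
    i += 1     # increment inside the loop; without it the loop never ends
print(total)   # 15
```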

While True
 It is an infinite loop.
 To exit a while True loop, the break statement is used.

while True:
    print("hello")
# this creates an infinite loop

while True:
    num1 = int(input("Enter a number here: "))
    num2 = int(input("Enter another number here: "))

    print(num1 + num2)
    repeat = input("Do you want to stop the program: ")
    if repeat == "yes":
        break

The loop stops only when the user enters yes; without the break statement it would never stop.

Nested loop
 A loop inside another loop is called a nested loop.
 Nested loops are also used to solve pattern problems.

for (variable) in (outer sequence):
    for (variable) in (inner sequence):
        (body of inner for loop)
    (body of outer for loop)
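A small nested-loop sketch: the outer loop builds one row per pass and the inner loop fills that row.

```python
rows = []
for i in range(1, 4):          # outer loop: one pass per row
    row = ""
    for j in range(1, i + 1):  # inner loop: runs i times for row i
        row += str(j)
    rows.append(row)
    print(row)
# prints 1, 12 and 123 on separate lines
```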

For Loop with Conditional Statements


Using if-else statements inside a for loop lets the loop handle special cases: a particular value can be
treated differently from the rest.
for i in range(1, 11):
    if i == 3:
        print("add this song to the favs")
    else:
        print(i)

Break and Continue Statement


Continue Statement:

The continue statement skips the rest of the current iteration when a particular condition is met and moves on to the next iteration.

Break Statement:

The break statement is used to exit a loop completely when a certain condition is met.
for i in range(1, 11):
    if i == 5:
        continue
    else:
        print(i)
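The same loop written once with continue and once with break shows the difference; the values are collected in lists instead of printed.

```python
skipped = []
for i in range(1, 11):
    if i == 5:
        continue      # skip 5 but keep looping
    skipped.append(i)

stopped = []
for i in range(1, 11):
    if i == 5:
        break         # leave the loop as soon as i reaches 5
    stopped.append(i)

print(skipped)  # [1, 2, 3, 4, 6, 7, 8, 9, 10]
print(stopped)  # [1, 2, 3, 4]
```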

Problem Solving

Write a program to find the sum of all the even numbers up to 50.

Write a program to print the first 20 numbers and their squares.

Write a program to find the sum of the first 10 odd numbers using a while loop.

Write a program to check which numbers up to 100 are divisible by both 8 and 12.

Write a program to create a billing system for a supermarket.

A ="Why fit in, When you are Born to Stand Out!"

Write a program to find the length of the following string.


Write a program to check how many times the letter o occurs.

Write a program to convert the whole string into lowercase and uppercase.
Write a program to convert the following string into title case.
Write a program to find the index number of "fit in".

1
12
123
1234
12345
Write a program to display this pattern

11111
2222
333
44
5
Write a program to display this pattern
*
**
***
****
*****
Write a program to display this pattern

1
21
321
4321
54321
Write a program to display this pattern

*
**
***
****
*****
****
***
**
*
Write a program to display this pattern

1
2 4
3 6 9
4 8 12 16
5 10 15 20 25
6 12 18 24 30 36
7 14 21 28 35 42 49
8 16 24 32 40 48 56 64
Write a program to display the following pattern

String Manipulation/Functions
Strings are sequences of characters (letters, numbers and symbols) enclosed inside single or double quotation marks.

A string value can be assigned to a variable:

a = "hello world"

print(a)

a = "hello world"

1) Length
print(len(a))  # 11
2) Count
print(a.count("o"))  # 2
3) Upper
print(a.upper())  # HELLO WORLD
4) Lower
print(a.lower())  # hello world
5) Index
print(a.index("o"))  # 4
6) Capitalize
print(a.capitalize())  # Hello world
7) Casefold
print(a.casefold())  # hello world (an aggressive lowercase used for caseless comparison)
8) Find
print(a.find("l"))  # 2
9) Format
name = "John"  # example value
print("Hello {}".format(name))  # Hello John: {} is replaced by the argument
10) Center
print(name.center(10))
print(name.center(10, "*"))  # ***John***

isalnum() - Returns True if all characters in the string are alphanumeric

isalpha() - Returns True if all characters in the string are letters of the alphabet

isdecimal() - Returns True if all characters in the string are decimals

isdigit() - Returns True if all characters in the string are digits

isnumeric() - Returns True if all characters in the string are numeric

islower() - Returns True if all cased characters in the string are lowercase


isupper() - Returns True if all cased characters in the string are uppercase

isspace() - Returns True if all characters in the string are whitespace

istitle() - Returns True if the string follows the rules of a title

endswith() - Returns True if the string ends with the specified value

startswith() - Returns True if the string starts with the specified value

swapcase() - Swaps cases: lowercase becomes uppercase and vice versa

strip() - Returns a trimmed version of the string (leading and trailing whitespace removed)

split() - Splits the string at the specified separator and returns a list

ljust() - Returns a left-justified version of the string

rjust() - Returns a right-justified version of the string

replace() - Returns a string where a specified value is replaced with another specified value

rindex() - Searches the string for a specified value and returns the last position where it
was found (raises ValueError if it is not found)
rfind() - Searches the string for a specified value and returns the last position where it
was found (returns -1 if it is not found)
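A sketch running several of the methods listed above on sample strings; all of them return new values and leave the original string unchanged.

```python
a = "hello world"
print(a.title())                       # Hello World
print(a.find("z"))                     # -1: find() returns -1 when the value is missing
print(a.rfind("o"))                    # 7: position of the last "o"
print(a.replace("world", "Python"))    # hello Python
print(a.split())                       # ['hello', 'world']
print("  padded  ".strip())            # padded
print("abc123".isalnum(), "abc".isalpha(), "123".isdigit())  # True True True
print(a.startswith("hello"), a.endswith("x"))                # True False
print("Hello World".swapcase())        # hELLO wORLD
print("hi".ljust(5, "-"), "hi".rjust(5, "-"))                # hi--- ---hi
print(a.center(15, "*"))               # **hello world**
```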

Slicing in Strings
The slicing syntax is string[start:stop:step]. The first colon separates the start and stop positions (the stop position is excluded); the second colon introduces the step, i.e. the gap between printed characters.

A negative number before the first colon counts backwards from the end; a negative step prints in reverse, while a positive step sets the gap.

[start:stop] = selects which part is printed

[::step] = sets the gap between printed characters

[::-1] = prints backwards (reverse)

[-4:] = prints the last 4 characters (not reversed)

[::2] = prints every second character
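A short sketch applying each slicing form to a sample string.

```python
a = "hello world"
print(a[0:5])    # hello: positions 0 to 4, stop position 5 excluded
print(a[6:])     # world: from position 6 to the end
print(a[::-1])   # dlrow olleh: negative step, reversed
print(a[-4:])    # orld: last 4 characters, not reversed
print(a[::2])    # hlowrd: every second character
print(a[-5:-1])  # worl: negative positions count from the end
```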

Problem Solving
Q. Write a program to get Fibonacci series up to 10 numbers.

Fibonacci series: 0, 1, 1 (0+1), 2 (1+1), 3 (1+2), 5 (2+3), ... up to 10 numbers


Q. Write a program to check if a number is prime or not.

Q. Write a program to check whether an integer is a palindrome.

A palindrome integer reads the same from the start or from the end, e.g. 131, 111, 1221. Numbers such as 123 and 134 are not
palindrome integers.

Q. Write a program to create an area calculator.

A = "OOTD.YOLO.ASAP.BRB.GTG.OTW"

Write a program to separate the following string into comma (,) separated values.

Write a program to sort strings alphabetically in Python.

Write a program to remove a given character from a string.

Z = "F.R.I.E.N.D.S."

Write a program to remove dot(.) from the following string.

Write a program to check the number of occurrences of a substring.

Take an input from a user as a string then, reverse it.

Write a program to check if a string contains only digits.

Write a program to check if a string is palindrome.

Write a program to find number of vowels in a string.

Write a program to check if every word in a string begins with a capital letter.
Introduction to lists

List:
Lists are ordered and mutable collections of data.

 Lists are written inside square brackets [].
 The values inside a list are separated by commas (,).
 Mutable means a list can be changed after it is created.
 Multiple datatypes can be stored in one list.

Slicing List:
Lists are sliced with the same list[start:stop:step] syntax as strings. The first colon separates the start and stop positions (the stop position is excluded); the second colon introduces the step, i.e. the gap between printed items.

A negative number before the first colon counts backwards from the end; a negative step prints in reverse, while a positive step sets the gap.

[start:stop] = selects which part is printed

[::step] = sets the gap between printed items

[::-1] = prints backwards (reverse)

[-4:] = prints the last 4 items (not reversed)

[::2] = prints every second item

List Iteration:
Iteration using For Loop

Iteration using For Loop with Range and Length function

Iteration using While Loop

Using Short-Hand For Loop
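A sketch of the four iteration styles on a sample list; the short-hand form is shown as a list comprehension that builds a new list.

```python
names = ["Ross", "Rachel", "Monica"]

for name in names:              # 1. for loop over the items
    print(name)

for i in range(len(names)):     # 2. for loop with range() and len(): gives the indexes
    print(i, names[i])

i = 0
while i < len(names):           # 3. while loop with a manual index
    print(names[i])
    i += 1

upper_names = [name.upper() for name in names]  # 4. short-hand for loop
print(upper_names)              # ['ROSS', 'RACHEL', 'MONICA']
```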

List Functions:
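len(list) - to find the length of a list
count() - to count the occurrences of a particular element
append() - to add to the end of the list
insert() - to add at a specific location
remove() - to remove a given value from a list
pop() - to remove from a certain location (index)
copy() - to create a copy of a list
list[index] - to access an element
extend() - to extend the list with another list
reverse() - to reverse the list
sort() - to sort the list
clear() - to clear all the data from the list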
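A step-by-step sketch of the list functions above; the comments show the list after each call.

```python
l = [3, 1, 4, 1]
print(len(l))        # 4
print(l.count(1))    # 2
l.append(5)          # [3, 1, 4, 1, 5]
l.insert(1, 9)       # [3, 9, 1, 4, 1, 5]
l.remove(1)          # removes the first 1: [3, 9, 4, 1, 5]
last = l.pop()       # removes and returns the last item (5): [3, 9, 4, 1]
second = l.pop(1)    # removes and returns the item at index 1 (9): [3, 4, 1]
backup = l.copy()    # independent copy: [3, 4, 1]
print(l[0])          # 3
l.extend([7, 2])     # [3, 4, 1, 7, 2]
l.reverse()          # [2, 7, 1, 4, 3]
l.sort()             # [1, 2, 3, 4, 7]
print(l, backup)     # [1, 2, 3, 4, 7] [3, 4, 1]
l.clear()            # []
```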

List Comprehension:
l1 = [30, 40, 50, 60]
l2 = []

for i in l1:
    if i > 45:
        l2.append(i)
print(l1, "\n", l2)
l3 = [i for i in l1 if i > 45]  # the list comprehension builds the same list as the loop above
print(l3)

Problem Solving
A = ["Ross", "Rachel","Monica","Joe"]

Write a program to swap the first and fourth elements.

Write a program to add a new value at the second position.

Write a program to delete a value from the 3rd position.

B = [13,7,12,10]

Write a program to multiply all the numbers in the list.

Write a program to get the largest number from the list.

Write a program to get the smallest number from the list.


Introduction to Tuples
Tuples are ordered and immutable collections of data.

 Brackets are not mandatory for tuples; parentheses are optional, because the commas create the tuple.
 The values inside a tuple are separated by commas (,).
 Once created, tuples cannot be changed.
 Multiple datatypes can be stored in one tuple.
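A sketch of tuple creation with and without parentheses, and of what happens when a tuple is changed.

```python
t1 = (1, "two", 3.0)   # parentheses are optional...
t2 = 1, "two", 3.0     # ...the commas create the tuple
single = (5,)          # a one-item tuple needs a trailing comma; (5) is just the int 5
print(t1 == t2)        # True
print(type(single), type((5)))  # <class 'tuple'> <class 'int'>

try:
    t1[0] = 10         # tuples are immutable, so item assignment fails
except TypeError as error:
    print("TypeError:", error)
```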

Slicing and Iteration in Tuples


Tuples are sliced with the same tuple[start:stop:step] syntax as strings and lists. The first colon separates the start and stop positions (the stop position is excluded); the second colon introduces the step, i.e. the gap between printed items.

A negative number before the first colon counts backwards from the end; a negative step prints in reverse, while a positive step sets the gap.

[start:stop] = selects which part is printed

[::step] = sets the gap between printed items

[::-1] = prints backwards (reverse)

[-4:] = prints the last 4 items (not reversed)

[::2] = prints every second item

Iteration over a tuple works:
 with a for loop
 along with range() and len() in a for loop
 along with a while loop

Conversion of Tuples and Tuples Functions


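Count or Index

a.count("--")  # number of times the value "--" occurs

a.index("--")  # position of the first "--"

For adding a name to a tuple (tuples cannot be changed directly):

Convert the tuple into a list: a = list(a)

Use a.append("**")

Then convert the list back into a tuple: a = tuple(a)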
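A minimal sketch of count(), index() and the list round-trip used to add an item; the names are sample data.

```python
a = ("Ross", "Rachel", "Ross")
print(a.count("Ross"))    # 2
print(a.index("Rachel"))  # 1

a = list(a)               # tuple -> list, which can be changed
a.append("Monica")
a = tuple(a)              # list -> tuple again
print(a)                  # ('Ross', 'Rachel', 'Ross', 'Monica')
```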


Problem Solving
Student_data = {"name":"David","age":13,"marks":87}

 Convert the following dictionary into JSON format.


 Access the value of age from the given data.
 Pretty print the following JSON data.
 Sort the following JSON keys and write them into a file.
 Access the nested key marks from the following nested data.
Introduction to Dictionary
A dictionary lets the user store data in the form of keys and values.

 Dictionaries are enclosed inside curly brackets {}.


 Keys and values are separated by a colon (:).
 Every key-value pair is separated by a comma (,).

Iteration in Dictionary
Printing all the key names one by one

Printing all the values one by one

Using the values() function

Using the items() function
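A sketch of the four ways to iterate, using the Student_data values that appear earlier in these notes.

```python
student = {"name": "David", "age": 13, "marks": 87}

for key in student:                 # key names one by one
    print(key)

for key in student:                 # values one by one, looked up by key
    print(student[key])

for value in student.values():      # values() gives the values directly
    print(value)

for key, value in student.items():  # items() gives (key, value) pairs
    print(key, "=", value)
```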

Dictionary Functions
get items

keys values

copy setdefault

update pop

popitem clear
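A step-by-step sketch of the dictionary functions listed above, on sample data.

```python
student = {"name": "David", "age": 13}
print(student.get("age"))          # 13
print(student.get("city", "N/A"))  # N/A: get() returns a default instead of raising KeyError
print(list(student.keys()), list(student.values()))
print(list(student.items()))       # [('name', 'David'), ('age', 13)]
backup = student.copy()
student.setdefault("marks", 0)     # adds "marks" only because it is missing
student.update({"age": 14})        # changes an existing value
removed = student.pop("name")      # removes a key and returns its value: David
last = student.popitem()           # removes the last inserted pair: ('marks', 0)
print(student, backup)             # {'age': 14} {'name': 'David', 'age': 13}
student.clear()                    # {}
```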

Nested Dictionary
Employee = {1: {"Name": "John", "age": 24},

            2: {"Name": "Lily", "Gender": "Female"}}

Problem Solving
Write a Python program to sort a dictionary by value.

Write a Python script to print a dictionary where the keys are numbers between 1 and 15 and the
values are the squares of the keys.

Write a program to multiply all the items in a dictionary.

Write a Python program to sort a dictionary by key.


Sets
Sets are unordered collections of data. Every element inside a set is unique, and the elements themselves must be immutable (for example numbers, strings or tuples).

 Sets are written inside curly brackets {}.


 The values inside a set are separated by commas (,).
 A set itself is mutable: once created, elements can be added or removed.

Sets Functions
add isdisjoint

pop issubset

remove issuperset

discard update

copy clear

Union

Difference

Difference update

Intersection

Intersection Update

Symmetric Difference

Symmetric Difference Update
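A sketch of the set functions above on two sample sets; set contents are shown in comments, though sets have no guaranteed order.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(a.union(b))                 # {1, 2, 3, 4, 5}
print(a.intersection(b))          # {3, 4}
print(a.difference(b))            # {1, 2}
print(a.symmetric_difference(b))  # {1, 2, 5}: in one set but not both
print({1, 2}.issubset(a))         # True
print(a.issuperset({1, 2}))       # True
print(a.isdisjoint({8, 9}))       # True: no common elements

a.add(6)                  # {1, 2, 3, 4, 6}
a.discard(10)             # no error even though 10 is missing
a.remove(6)               # remove() raises KeyError if the item is missing
c = a.copy()              # {1, 2, 3, 4}
a.intersection_update(b)  # a keeps only the common items: {3, 4}
print(a, c)
```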

Problem Solving
Write a program to find max and min in a set.

Write a program to find common elements in three lists using sets.

Write a program to find difference between two sets.

Write a Python program to remove an item from a set if it is present in the set.

Write a Python program to check if a set is a subset of another set.


Introduction to Functions
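A function is a named block of code that, once defined, can be used throughout the program.

Functions help break a program into smaller parts and make it more organized and
manageable.

Functions

1. Define the function: write def, the function name, parentheses and a colon, followed by the indented body.

2. Call the function: write the function name followed by parentheses, e.g. name().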

Parameters and Arguments


Parameters are the variables written inside the parentheses with the name of function.

Arguments are the values passed to the parameters while calling the function.
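A sketch of parameters and arguments; greet is a hypothetical function, and greeting shows a parameter with a default value.

```python
def greet(name, greeting="Hello"):   # name and greeting are parameters
    """Build a greeting message; greeting has a default value."""
    return greeting + ", " + name + "!"

print(greet("John"))                      # "John" is the argument: Hello, John!
print(greet("Lily", "Hi"))                # two arguments: Hi, Lily!
print(greet(greeting="Hey", name="Sam"))  # keyword arguments: Hey, Sam!
```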

Return Statement and Recursion in Python


The return keyword exits a function and sends a value back to the place where the function was called.

Recursion is a commonly used mathematical and programming concept.

In simple words, recursion means a function calls itself, which lets it loop through data
to reach a result. Every recursive function needs a base case that stops the calls.
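A classic recursion sketch: factorial, with a base case that stops the calls and a return value passed back at each level.

```python
def factorial(n):
    """Return n! for a non-negative integer n using recursion."""
    if n <= 1:                     # base case: stops the recursion
        return 1
    return n * factorial(n - 1)    # the function calls itself with a smaller n

print(factorial(5))  # 120 (5 * 4 * 3 * 2 * 1)
```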

Advantages and Disadvantages of Recursion


Adv.:

 They make the code look clean and organized.


 By the use of recursive function, a complex task can be broken down into small sub-parts.
 Sequence generation becomes easier.

Disadv.:

 Recursive functions can take up a lot of memory, because every call adds a new frame to the call stack.


 Sometimes the logic becomes hard to follow.
 Debugging is difficult.

Lambda Function in Python


A lambda function is used when a small anonymous function is needed for a short period of time.

It can take any number of arguments.

It can only have one expression.
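A sketch of lambda functions. The style guide PEP 8 prefers def for named functions, so the assignments to square and add are only for illustration; the typical use is passing a lambda directly, as with sorted().

```python
square = lambda x: x * x     # one argument, one expression
add = lambda x, y: x + y     # several arguments are allowed
print(square(4), add(2, 3))  # 16 5

names = ["Rachel", "Joe", "Monica"]
print(sorted(names, key=lambda name: len(name)))  # ['Joe', 'Rachel', 'Monica']
```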

Local and Global Variables


Local variables are created inside a function (block of code) and can be used only inside that
function.

Global variables are created outside all functions and can be used throughout the program. To change a global variable inside a function, declare it with the global keyword.
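A sketch of a global and a local variable; counter, step and increase are hypothetical names.

```python
counter = 0          # global variable

def increase():
    global counter   # needed to change the global variable inside the function
    step = 1         # local variable: exists only inside increase()
    counter += step

increase()
increase()
print(counter)       # 2
# print(step) here would raise NameError: step is local to increase()
```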

Problem Solving
Write a function to find the maximum of three numbers in Python.

Write a Python function to create and print a list where the values are square of numbers between 1
and 30.

Write a Python function that takes a number as a parameter and check if the number is prime or not.

Write a Python function to sum all the numbers in a list.

Write a Python program to solve the Fibonacci Sequence using Recursion.


Introduction to Modules
Modules are .py files that contain a set of functions you want to include in your program.

In-Built Modules in Python


Datetime

%A = full weekday name, %a = abbreviated weekday name

%B = full month name

%Y = year

%p = AM/PM

%M = minutes

%S = seconds

%f = microseconds
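A sketch formatting a fixed example date with the codes above. %I (12-hour clock hour) and %d (day of the month) are standard codes not in the list, and %A, %a, %B and %p follow the current locale (English names and AM/PM by default).

```python
from datetime import datetime

moment = datetime(2024, 3, 5, 14, 7, 9, 250)  # an example date and time
print(moment.strftime("%A %a"))         # Tuesday Tue
print(moment.strftime("%B %Y"))         # March 2024
print(moment.strftime("%I:%M:%S %p"))   # 02:07:09 PM
print(moment.strftime("%f"))            # 000250
print(datetime.now().strftime("%d %B %Y"))  # today's date
```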

Random

Math
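A few commonly used functions from the built-in math and random modules; the random results change on every run.

```python
import math
import random

print(math.sqrt(16))                     # 4.0
print(math.pi)                           # 3.141592653589793
print(math.factorial(5))                 # 120
print(math.ceil(4.2), math.floor(4.8))   # 5 4

print(random.randint(1, 6))              # a random integer from 1 to 6 (both included)
print(random.choice(["a", "b", "c"]))    # a random item from the list
print(random.random())                   # a random float from 0.0 up to (not including) 1.0
```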

Creation of Modules
To create a module, create a .py file in the same folder as your Python program. Inside
that file, add the functions you need your program to perform.

To use the module inside your program, write the import keyword followed by the
name of your .py file (without the .py extension).

demo.py

def add(x, y):
    return x + y

main.py

import demo

a = demo.add(2, 3)

print(a)

HOTEL CANCELLATION PROJECT ON EXCEL


Pivot table with Dashboard
Jupyter Notebook / Pycharm
Introduction to NumPy
NumPy is the short form of Numerical Python.

In 2005, Travis Oliphant created NumPy package.

NumPy is a package that defines a multidimensional array object and associated fast math functions
that operate on it.

It also has functions for working in the domain of linear algebra, Fourier transforms and matrices.

In simple words, it is the fundamental package for scientific computing in Python.

Arrays:
An array is defined as a collection of items that are stored at contiguous memory locations.

It is a container which can hold a fixed number of items, and these items should be of the same type.

Using arrays saves a lot of time, and arrays can reduce the overall size of the code.

Advantages of using Arrays:


NumPy uses much less memory to store data.

NumPy makes it extremely easy to perform mathematical operations on arrays.

Used for the creation of n-dimensional arrays.

Finding elements in NumPy array is easy.

Arrays v/s Lists:


A list cannot directly handle Mathematical Operations, while Array can.

An array consumes less memory than a List.

Using an Array is faster than List.

A list can store different datatypes, while you can’t do that in an Array.

A list is easier to modify: since a list stores each element individually, it is easier to add and
delete an element in a list than in an array.

In Lists one can have nested data with different size, while you can’t do the same in Array.
Creation, Indexing and Slicing of NumPy Arrays:
NumPy - Creating Arrays, Slicing and Attributes

Inspecting an Array
Function:

 a.shape – Array dimensions


 len(a) – Length of array
 b.ndim – Number of array dimension
 e.size – Number of array element
 b.dtype – Data type of array elements
 b.astype(int) – Convert an array to a different type
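A sketch covering array creation, indexing, slicing and the inspection attributes listed above, on sample data. The exact integer dtype name (for example int64) depends on the platform.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])    # 1-D array from a list
b = np.array([[1, 2, 3], [4, 5, 6]])  # 2-D array (2 rows, 3 columns)

print(a[0], a[-1])      # 10 50: indexing
print(a[1:4])           # [20 30 40]: slicing, end excluded
print(b[1, 2])          # 6: row 1, column 2
print(b[:, 0])          # [1 4]: every row, column 0

print(b.shape)          # (2, 3)
print(len(b))           # 2: length of the first dimension
print(b.ndim)           # 2
print(b.size)           # 6
print(b.dtype)          # an integer dtype such as int64
print(b.astype(float))  # the same values as floats
```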

Mathematical Operations and Functions on Arrays


Function:

 g = a - b
 np.subtract(a, b)
 b + a
 np.add(b, a)
 a / b
 np.divide(a, b)
 a * b
 np.multiply(a, b)
 np.exp(b)
 np.sqrt(b)
 np.power(a, 2)
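A sketch showing that the operators and the matching NumPy functions give the same element-wise results; the arrays are sample data.

```python
import numpy as np

a = np.array([4, 9, 16])
b = np.array([1, 2, 4])

print(a - b, np.subtract(a, b))  # element-wise: 3, 7, 12
print(a + b, np.add(a, b))       # element-wise: 5, 11, 20
print(a / b, np.divide(a, b))    # element-wise: 4.0, 4.5, 4.0
print(a * b, np.multiply(a, b))  # element-wise: 4, 18, 64
print(np.sqrt(a))                # 2.0, 3.0, 4.0
print(np.power(b, 2))            # 1, 4, 16
print(np.exp(np.array([0.0])))   # 1.0
```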

Combining and Splitting Arrays


 np.concatenate()
 np.vstack (vertical concatenation)
 np.hstack (horizontal concatenation)

Adding and Removing Element in the Arrays


 np.append(h,g) – Append items to an array
 np.insert(a,1,5) – Insert items in an Array
 np.delete(a,[1]) – Delete items from an Array
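A sketch of the combining, appending, inserting and deleting functions above; each returns a new array and leaves the original unchanged.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.concatenate((a, b)))  # [1 2 3 4 5 6]
print(np.vstack((a, b)))       # 2 rows: [1 2 3] and [4 5 6]
print(np.hstack((a, b)))       # [1 2 3 4 5 6] (same as concatenate for 1-D arrays)

print(np.append(a, [7, 8]))    # [1 2 3 7 8]
print(np.insert(a, 1, 5))      # [1 5 2 3]: insert 5 before index 1
print(np.delete(a, [1]))       # [1 3]: delete the item at index 1
print(a)                       # [1 2 3]: the original array is unchanged
```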
Search, Sort and Filter Arrays
NumPy – Sort, Filter and Search

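 np.sort(arr) - returns a sorted copy of the array
 np.where(condition) - returns the indexes where the condition is true
 np.searchsorted(arr, value) - returns the index where value should be inserted to keep a sorted array sorted
 fa = condition (a boolean filter array, e.g. fa = arr > 5)
new = arr[fa]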
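A sketch of sorting, searching and boolean filtering on a sample array.

```python
import numpy as np

arr = np.array([7, 2, 9, 4, 6])

print(np.sort(arr))                      # [2 4 6 7 9]: a sorted copy
print(np.where(arr > 5))                 # indexes where the condition is true: 0, 2, 4
print(np.searchsorted(np.sort(arr), 5))  # 2: position where 5 fits in the sorted array
fa = arr > 5                             # boolean filter array
new = arr[fa]
print(new)                               # [7 9 6]
```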

Aggregating Functions in Arrays


 print(np.sum(a))
 print(np.min(a))
 print(np.max(a))
 print(np.size(a))
 print(np.mean(a))
 print(np.cumsum(a))
 print(np.cumprod(a))

Statistical Functions in Arrays


 print(np.mean(a))
 print(np.median(a))
 print(stats.mode(a))  # stats comes from SciPy: from scipy import stats
 print(np.std(a))
 print(np.var(a))
 print(np.corrcoef(a, b))
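A sketch of the aggregating and statistical functions on sample data; np.std and np.var use the population formulas (ddof=0) by default.

```python
import numpy as np

a = np.array([2, 4, 4, 4, 5, 5, 7, 9])
b = np.array([1, 2, 3, 4, 5, 6, 7, 8])

print(np.sum(a), np.min(a), np.max(a), np.size(a))  # 40 2 9 8
print(np.mean(a), np.median(a))  # 5.0 4.5
print(np.std(a), np.var(a))      # 2.0 4.0
print(np.cumsum(a))              # running totals: 2 6 10 14 19 24 31 40
print(np.cumprod(np.array([1, 2, 3, 4])))  # running products: 1 2 6 24
print(np.corrcoef(a, b))         # 2x2 matrix; the off-diagonal value is the correlation of a and b
```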
Introduction to Pandas
Pandas is a Python package providing fast, flexible and expressive data structure designed to make
working with “relational” or “labelled” data both easy and intuitive.

Here are just a few of the things that pandas does well:

It has functions for analyzing, cleaning, exploring and manipulating data.

The name “Pandas” refers to both “Panel Data” and “Python Data Analysis”; pandas was created
by Wes McKinney in 2008.

Pandas Applications
Easy handling of missing data.

Size Mutability: Columns can be inserted and deleted from DataFrame and higher dimensional
objects.

Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user
can simply ignore the labels and let Series, DataFrame, etc. automatically align the data in computations.

Powerful, flexible group by functionality.

Intelligent label-based slicing, fancy indexing, and subsetting of large data sets.

Intuitive merging and joining data sets.

Flexible reshaping and pivoting of data sets.

Data Structures in Pandas


The best way to think about the pandas data structures is as flexible containers for lower-dimensional
data. For example, a DataFrame is a container for Series, and a Series is a container for scalars. We
would like to be able to insert and remove objects from these containers in a dictionary-like fashion.

Series in Pandas
A pandas Series is a one-dimensional labelled array capable of holding data of any type (integer, string,
float, Python objects, etc.). The axis labels are collectively called the index. A Series is like a single
column in an Excel sheet.

The object supports both integer and label-based indexing and provides a host of methods for
performing operations involving the index.
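A sketch of a Series with a custom label index, using made-up marks.

```python
import pandas as pd

marks = pd.Series([87, 92, 78], index=["David", "Lisa", "John"])
print(marks)
print(marks["Lisa"])         # 92: label-based indexing
print(marks.iloc[0])         # 87: integer position-based indexing
print(marks.index.tolist())  # ['David', 'Lisa', 'John']
print(marks.mean())          # about 85.67
```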
DataFrames
A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data
structure with labelled axes (rows and columns). In other words, data is aligned in a tabular fashion
in rows and columns. A DataFrame consists of three principal components: the data, the rows and the columns.

Creation of DataFrames in Pandas


 data = {"Name":["John","Peter","Lisa"],
"Age":[25,28,31],
"Salary":[30000,45000,25000]}
df = pd.DataFrame(data)
 data = pd.read_csv("D:/Smriti/hotel_booking.csv")
 data = pd.read_excel("D:/Smriti/Data.xlsx")

Exploring Data in Pandas


 print(data.head(num))
 print(data.tail(num))
 print(data.info())
 data.describe()
 print(data.isnull())
 print(data.isnull().sum())

Handling Duplicate Values in Pandas


 print(data)
 print(data.duplicated())
 print(data["EEID"].duplicated())
 print(data["EEID"].duplicated().sum())
 print(data.drop_duplicates("Unique column"))  # replace with the name of the column that should be unique
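A sketch of the duplicate checks on a small hypothetical DataFrame, with EEID as the column that should be unique.

```python
import pandas as pd

data = pd.DataFrame({"EEID": [101, 102, 102, 103],
                     "Name": ["John", "Peter", "Peter", "Lisa"]})
print(data.duplicated())                # False, False, True, False: row 2 repeats row 1
print(data["EEID"].duplicated().sum())  # 1
print(data.drop_duplicates("EEID"))     # keeps the first row for each EEID
```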

Working with Missing Data in Pandas


 print(data.isnull())
 print(data.isnull().sum())
 print(data.dropna())
#removes every row that contains a null value, so use it carefully
 print(data["Salary"].mean())
data["Salary"] = data["Salary"].replace(np.nan, 57500)
print(data)
#fills the null values in the Salary column with the mean printed above (57500 here);
#data["Salary"].fillna(data["Salary"].mean()) does the same without typing the number
 print(data.bfill()) #fill each null with the next valid value below it (backward fill)
 print(data.ffill()) #fill each null with the previous valid value above it (forward fill)
 print(data.fillna("hi")) #fill every null cell with any given value
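A sketch of counting, dropping and filling nulls on a small hypothetical DataFrame.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"Name": ["John", "Peter", "Lisa", "Anna"],
                     "Salary": [30000, np.nan, 25000, np.nan]})
print(data.isnull().sum())                 # Salary has 2 nulls
print(data.dropna())                       # only the John and Lisa rows remain
mean_salary = data["Salary"].mean()        # 27500.0: mean() ignores NaN
print(data["Salary"].fillna(mean_salary))  # both NaN values become 27500.0
print(data["Salary"].ffill())              # Peter gets 30000 (value above); Anna gets 25000
print(data["Salary"].bfill())              # Peter gets 25000 (value below); Anna stays NaN
```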

Column Transformation in Pandas


 df = pd.read_excel("D:/Smriti/Employee_Data.xlsx")
print(df)
df.loc[(df["Bonus %"]==0),"GetBonus"] = "no bonus"
df.loc[(df["Bonus %"] > 0),"GetBonus"] = "bonus"
print(df.head(10))

#create a new column of GetBonus according to condition given

 data["Full Name"] = data["Name"]+" " + data["Last Name"].str.capitalize()


print(data)

 data["Bonus"] = (data["Salary"]/100)*20
print(data)

 data = {"Months":["January","February","March","April"]}
a = pd.DataFrame(data)


print(a)
def extract(value):
    return value[0:3]  # keep the first three letters of the month name
a["Short Months"] = a["Months"].map(extract)
print(a)

Group By in Pandas
 gp = data.groupby("Job Title").agg({"EEID":"count"})
 gp = data.groupby("Department").agg({"EEID":"count"})
 gp = data.groupby(["Department","Gender"]).agg({"EEID":"count"})
 gp1 = data.groupby("Country").agg({"Age":"mean"})
 gp2 = data.groupby("Country").agg({"Annual Salary":"max"})
 gp3 = data.groupby(["Country","Gender"]).agg({"Annual Salary":"max"})
 gp4 = data.groupby(["Country","Gender"]).agg({"Annual Salary":"max","Age":"min"})
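A sketch of groupby with agg on a small hypothetical employee table, mirroring the count and multi-column examples above.

```python
import pandas as pd

data = pd.DataFrame({"EEID": [1, 2, 3, 4],
                     "Department": ["IT", "IT", "HR", "HR"],
                     "Gender": ["F", "M", "F", "F"],
                     "Annual Salary": [50000, 60000, 40000, 45000],
                     "Age": [25, 32, 41, 29]})

gp = data.groupby("Department").agg({"EEID": "count"})
print(gp)    # HR 2, IT 2
gp4 = data.groupby(["Department", "Gender"]).agg({"Annual Salary": "max", "Age": "min"})
print(gp4)   # one row per (Department, Gender) pair
```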

Merge, Join and Concatenate in Pandas


 print(pd.merge(df1,df2,on = "Emp. ID"))
 print(pd.merge(df1,df2,on = "Emp. ID",how = "left")) #left data
 print(pd.merge(df1,df2,on = "Emp. ID",how = "right")) #Right data
 print(pd.concat([df1, df2]))  #stacks the rows of both DataFrames
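A sketch of inner, left and right merges plus concat on two small hypothetical tables that share an "Emp. ID" column.

```python
import pandas as pd

df1 = pd.DataFrame({"Emp. ID": [1, 2, 3], "Name": ["John", "Peter", "Lisa"]})
df2 = pd.DataFrame({"Emp. ID": [2, 3, 4], "Salary": [45000, 25000, 38000]})

inner = pd.merge(df1, df2, on="Emp. ID")               # only IDs in both: 2, 3
left = pd.merge(df1, df2, on="Emp. ID", how="left")    # all rows of df1; ID 1 gets a NaN salary
right = pd.merge(df1, df2, on="Emp. ID", how="right")  # all rows of df2; ID 4 gets a NaN name
stacked = pd.concat([df1, df1], ignore_index=True)     # rows of both frames, one below the other
print(inner, left, right, stacked, sep="\n\n")
```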

Compare DataFrames Pandas


 print(df.compare(df2))
 print(df.compare(df2,keep_shape=True))
 print(df.compare(df2,keep_equal=True))
 print(df.compare(df2,align_axis=0))

Pandas – Pivoting and Melting DataFrames


Pivot

 print(df.pivot(index="keys",columns="Names",values="Houses"))
 print(df.pivot(index="keys",columns="Names",values=["Houses","Grades"]))

Melt

 print(pd.melt(df,id_vars=["Names"],value_vars=["Houses"]))
 print(pd.melt(df,id_vars=["Names"],value_vars=["Houses","Grades"]))
 print(pd.melt(df,id_vars=["Names"],value_vars=["Houses","Grades"],var_name="Houses&Grades",value_name="values"))
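A sketch of pivot (long to wide) and melt (wide to long) on a small hypothetical DataFrame that uses the same column names as the examples above.

```python
import pandas as pd

df = pd.DataFrame({"keys": ["K1", "K1", "K2", "K2"],
                   "Names": ["Harry", "Ron", "Harry", "Ron"],
                   "Houses": ["Gryffindor"] * 4,
                   "Grades": [85, 70, 90, 65]})

wide = df.pivot(index="keys", columns="Names", values="Grades")  # one row per key, one column per name
print(wide)

long = pd.melt(df, id_vars=["Names"], value_vars=["Houses", "Grades"],
               var_name="Houses&Grades", value_name="values")
print(long)   # 8 rows: each original row appears once per melted column
```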
Introduction to Matplotlib
Data Visualization

Data Visualization is the graphical representation of information and data.

In the world of Big Data, data visualization tools and technologies are essential to analyze massive
amounts of information and make data-driven decisions.

Example: Apples Eaten This Month

Name     Apples
Jill     13
Jack     11
Susan    7
Adrian   10
Sam      6
Seth     14
Maria    10
Jamal    9

[Figure: bar chart "Apples Eaten This Month", one bar per name (Jill to Jamal), y-axis from 0 to 16]
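The apples table above can be drawn as a bar chart with Matplotlib; this sketch takes the values straight from the table.

```python
import matplotlib.pyplot as plt

# Values from the "Apples Eaten This Month" table
names = ["Jill", "Jack", "Susan", "Adrian", "Sam", "Seth", "Maria", "Jamal"]
apples = [13, 11, 7, 10, 6, 14, 10, 9]

fig, ax = plt.subplots()
bars = ax.bar(names, apples)          # one bar per person
ax.set_title("Apples Eaten This Month")
ax.set_xlabel("Name")
ax.set_ylabel("Apples")
plt.show()
```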

Matplotlib:

Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.

Matplotlib was created by John D. Hunter.

Matplotlib is open-source and we can use it freely.

Matplotlib Chart:
Bar Plot Matplotlib
import matplotlib.pyplot as plt

y = [98,67,88,95,88]

x = ["Part1","Part2","Part3","Part4","Part5"]

color = ["Red","Green","Blue","Yellow","Orange"]

plt.bar(x,y,color = color)

or

plt.bar(x,y,color = "red")

or

plt.bar(x,y,color = color,edgecolor = "black")

plt.xlabel("Parts of Harry Potter",fontsize = 17)

plt.ylabel("Popularity",fontsize = 17)

plt.title("Popularity of Different Parts Of Harry Potter",fontsize = 20)

plt.show()

# In Jupyter, pressing Shift+Tab inside plt.bar( ) shows all of its parameters

import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_excel("D:/Smriti/Expenses.xlsx")
df = pd.DataFrame(data)
print(df)
grouped_by = df.groupby("Payment Mode")["Amount"].sum()
print(grouped_by)
plt.bar(grouped_by.index,grouped_by.values)
plt.show()

Line Plot-Matplotlib
import pandas as pd

import matplotlib.pyplot as plt

data = pd.read_excel("D:/Smriti/Expenses.xlsx")

df = pd.DataFrame(data)
#print(df)

grouped_by = df.groupby("Category")["Amount"].sum()

print(grouped_by)

# plt.plot(df["Data"], df["Amount"]) is not effective here because it does not show all the data clearly, so plot the grouped totals instead

plt.plot(grouped_by.index,grouped_by.values)

plt.show()

# plt.plot(x,y,marker ="o")

#plt.plot(x,y,marker ="*")

#plt.plot(x,y,marker ="D")

# plt.plot(x,y,marker ="^",ls = ":")

plt.plot(x,y,marker ="o",ls = "--",color = "red",label = "week1")

plt.plot(x,y1,marker ="*",ls = "--",color = "green",label = "week2",alpha = 0.5)

plt.legend()

plt.show()
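The marker and line-style snippets above use x, y and y1 without defining them. This sketch fills them with hypothetical values so the two-line chart runs on its own.

```python
import matplotlib.pyplot as plt

# Hypothetical data: x, y and y1 are not defined in the notes above
x = ["Mon", "Tue", "Wed", "Thu", "Fri"]
y = [20, 35, 30, 45, 40]    # week 1
y1 = [25, 30, 38, 35, 50]   # week 2

# marker = point symbol, ls = line style, alpha = transparency (0 to 1)
(line1,) = plt.plot(x, y, marker="o", ls="--", color="red", label="week1")
(line2,) = plt.plot(x, y1, marker="*", ls=":", color="green", label="week2", alpha=0.5)
plt.legend()   # builds the legend from each line's label
plt.show()
```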

Scatter Plot-Matplotlib

import matplotlib.pyplot as plt

import numpy as np

x = np.random.randint(1,10,50)

y = np.random.randint(10,100,50)

color = np.random.randint(10,100,50)

size = np.random.randint(10,100,50)

print(x,y)

plt.scatter(x, y, marker="*", cmap="hot", c=color, s=size)  # c gives each point a value, cmap maps those values to colours, s sets the point sizes

plt.colorbar()

plt.show()

data = pd.read_excel("D:/Smriti/Employee_Data.xlsx")

#print(data,type(data))

df = pd.DataFrame(data)
size = df["Age"]

plt.scatter(df["Annual Salary"],df["EEID"],s = size)

plt.show()

or

plt.scatter(df["Age"],df["EEID"],s = 10)

plt.show()

Pie Chart Matplotlib


brands =["Oneplus","Apple","Samsung","Nokia","Redmi"]

x = [22,35,30,3,10]

c = ["red","purple","blue","Magenta","orange"]

ex =[0,0.1,0,0,0]

plt.pie(x, labels=brands, colors=c, explode=ex, shadow=True, autopct="%.2f", startangle=90)  # autopct="%.2f" shows the percentages with 2 decimal places

plt.show()

data = pd.read_excel("D:/Smriti/Expenses.xlsx")

df = pd.DataFrame(data)

group_by = df.groupby("Payment Mode")["Amount"].sum()

print(group_by)

ex = [0,0,0.1]

plt.pie(group_by.values,labels = group_by.index,autopct = "%.2f",explode = ex)

plt.show()

Box Plot Matplotlib


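Box plot

A box plot is built from these values (n = number of values; positions are counted in the sorted data):

Max (Upper Fence) = Q3 + 1.5(IQR), where IQR = Q3 - Q1

Q3 (75%) = value at position 3(n+1)/4

Median = value at position (n+1)/2

Q1 (25%) = value at position (n+1)/4

Min (Lower Fence) = Q1 - 1.5(IQR)

Values outside the fences are outliers; the whiskers stop at the most extreme values that are still inside the fences.

Data: [1, 3, 4, 7, 12, 2, 8, 9, 24]

Sorted: [1, 2, 3, 4, 7, 8, 9, 12, 24], n = 9

Q1 position = (9+1)/4 = 2.5th → halfway between 2 and 3 = 2.5

Q3 position = 3(9+1)/4 = 7.5th → halfway between 9 and 12 = 10.5

IQR = Q3 - Q1 = 10.5 - 2.5 = 8

UF = 10.5 + 1.5(8) = 22.5 → upper whisker ends at 12

LF = 2.5 - 1.5(8) = -9.5 → lower whisker ends at 1

Median position = (9+1)/2 = 5th → value = 7

24 > 22.5, so 24 is an outlier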
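The hand calculation above can be checked with a short pure-Python sketch. The helper value_at is hypothetical; it interpolates between neighbouring values when a position such as 2.5 falls between two data points.

```python
def value_at(sorted_vals, pos):
    """Value at a 1-based, possibly fractional position (hypothetical helper)."""
    i = int(pos)        # whole part of the position
    frac = pos - i      # fractional part
    if i < 1:
        return sorted_vals[0]
    if i >= len(sorted_vals):
        return sorted_vals[-1]
    lower, upper = sorted_vals[i - 1], sorted_vals[i]
    return lower + frac * (upper - lower)

data = [1, 3, 4, 7, 12, 2, 8, 9, 24]
s = sorted(data)
n = len(s)

q1 = value_at(s, (n + 1) / 4)
median = value_at(s, (n + 1) / 2)
q3 = value_at(s, 3 * (n + 1) / 4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in s if v < lower_fence or v > upper_fence]
print(q1, median, q3, iqr, lower_fence, upper_fence, outliers)
```

plt.boxplot(data) should flag the same outlier, but it computes the quartiles with NumPy's default percentile method (3 and 9 for this data), so its box edges can differ slightly from this hand method.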

Histogram Matplotlib
data = pd.read_excel("D:/Smriti/Employee_Data.xlsx")

df = pd.DataFrame(data)

plt.hist(df["Age"],bins = 15, edgecolor = "Black")

plt.show()

Violin Plot-Matplotlib
plt.violinplot(df["Annual Salary"],showmedians = True)

plt.show()

Stem Plot Matplotlib


plt.stem(df1["Age"])

plt.plot(df1["Age"])

plt.show()

Stack Plot Matplotlib


plt.stackplot(days, NOP1, NOP2, NOP3, colors=["red", "orange", "yellow"], labels=["Week 1", "Week 2", "Week 3"])

# baseline = "zero" (default), "sym", "wiggle" or "weighted_wiggle"


plt.legend()

plt.show()

grouped = df.groupby("Category")[["Calories","Protein","Fat"]].agg("mean")

print(grouped)

plt.stackplot(grouped.index, grouped["Calories"], grouped["Protein"], grouped["Fat"])

# grouped.index is safer than df["Category"].unique(): unique() keeps the order of first appearance, while groupby sorts the categories, so the x labels could be matched to the wrong values

plt.show()

Step Plot Matplotlib


group = df.groupby("Category").agg({"Amount":"sum"})

print(group)

plt.step(group.index,group["Amount"],where = "mid",marker ="o")

plt.show()

Legend Matplotlib
plt.figure(figsize= [5,3])

plt.plot(x,y,label = "Male")

plt.plot(x,y1,label = "Female")

plt.legend(bbox_to_anchor=(0.8, 1.2), ncols=2, labelspacing=1)

# plt.legend(loc=0): loc sets the legend position with a code from 0 to 10 (0 = "best")

# plt.legend(["a1", "a2"], ncols=2): "a1", "a2", ... are the label names; ncols=2 arranges the legend in 2 columns

plt.show()

Subplot Matplotlib
import matplotlib.pyplot as plt

x = [1,2,3,4,5]
y = [45,34,56,23,45]
plt.subplot(2,2,1) #rows, columns, chart number
plt.plot(x,y)

x = [5,6,7,8,9]
y = [67,50,66,56,82]
plt.subplot(2,2,2)
plt.bar(x,y)

x = [2,4,6,8,10]
y = [57,50,60,55,60]
plt.subplot(2,2,3)
plt.scatter(x,y)

x = [1,3,5,7,9]
y = [67,51,62,50,23]
plt.subplot(2,2,4)
plt.step(x,y)

plt.show()

or

import matplotlib.pyplot as plt

x = [1,2,3,4,5]
y = [45,34,56,23,45]
plt.subplot(1,2,1)
plt.title("Age")
plt.bar(x,y)

x = [5,6,7,8,9]
y = [67,50,66,56,82]
plt.subplot(1,2,2)
plt.plot(x,y)
plt.title("Weight")

plt.suptitle("Employee Data")
plt.show()

Save a Chart Using Matplotlib


plt.savefig("bar.png",facecolor = "black")

plt.savefig("pie.png",facecolor = "green",pad_inches = 0.3,bbox_inches = "tight")
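A self-contained sketch of saving a chart; the output folder comes from tempfile and the data is hypothetical. Call plt.savefig before plt.show(): in an interactive window the figure is closed when you close the window, so saving afterwards can produce an empty image.

```python
import os
import tempfile

import matplotlib.pyplot as plt

plt.bar(["A", "B", "C"], [3, 5, 2], color="orange")   # hypothetical data

out_dir = tempfile.mkdtemp()               # hypothetical output folder
path = os.path.join(out_dir, "bar.png")

# Save first, then show; dpi controls the image resolution
plt.savefig(path, facecolor="black", bbox_inches="tight", pad_inches=0.3, dpi=150)
plt.show()
print(path)
```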


Introduction to Seaborn
Data Visualization

Data Visualization is the graphical representation of information and data.

In the world of Big Data, data visualization tools and technologies are essential to analyze massive
amounts of information and make data-driven decisions.

Seaborn
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface
for drawing attractive and informative statistical graphics.

Seaborn Chart

Line Plot Seaborn


import seaborn as sns

import pandas as pd

import matplotlib.pyplot as plt

data = {"Days":[1,2,3,4,5],

"NOP":[50,40,60,30,44]}

df = pd.DataFrame(data)

sns.lineplot(data = df, x = "Days", y = "NOP")


plt.show()

data = pd.read_excel("D:/Smriti/Employee_Data.xlsx")

#print(data)

color = sns.color_palette("GnBu")

sns.lineplot(data = data, x = "Business Unit", y = "Age", hue = "Ethnicity", palette = color)

# palette sets the colours of the hue groups, so use it together with hue

#sns.lineplot(data = data, x = "Business Unit", y = "Age", hue ="Gender", style = "Ethnicity")

#sns.lineplot(data = data, x = "Business Unit", y = "Age", hue ="Ethnicity", style = "Gender")

plt.show()

Bar Plot Seaborn


Seaborn sample datasets: https://github.com/mwaskom/seaborn-data lists the datasets available to sns.load_dataset() (the file names are the dataset names).

data = sns.load_dataset("tips")

print(data)

sns.barplot(data = data, x = "day", y = "tip", hue = "sex", palette = "spring", order = ["Sun","Sat","Fri","Thur"], errorbar = ("ci",0))

# estimator sets how each bar height is computed: "mean" (default), "median", "sum"

plt.show()
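sns.barplot draws one bar per (x, hue) group, and the bar height is the estimator applied to that group's y values (the mean by default). This pandas sketch, on a few hypothetical rows shaped like the tips dataset, computes the heights that estimator="mean" and estimator="sum" would give.

```python
import pandas as pd

# Hypothetical rows with the same columns as the "tips" dataset
tips = pd.DataFrame({
    "day": ["Sun", "Sun", "Sat", "Sat", "Sat"],
    "sex": ["Male", "Female", "Male", "Male", "Female"],
    "tip": [3.0, 2.0, 4.0, 2.0, 5.0],
})

# Bar heights for x="day", hue="sex", y="tip"
mean_heights = tips.groupby(["day", "sex"])["tip"].mean()   # estimator="mean"
sum_heights = tips.groupby(["day", "sex"])["tip"].sum()     # estimator="sum"
print(mean_heights)
print(sum_heights)
```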

Histogram Plot Seaborn


data = sns.load_dataset("tips")

sns.histplot(data,x = "day", hue = "sex", kde = True)

plt.show()

data = sns.load_dataset("titanic")

print(data)

sns.histplot(data = data,x = "age", hue ="who",kde = True, bins = 30)

#discrete = True

plt.show()
Scatter Plot Seaborn
data = sns.load_dataset("tips")

#print(data)

sns.scatterplot(data = data, x = "total_bill", y = "tip", hue = "day", size = "size", marker = "o", palette = "viridis")

plt.show()

data = pd.read_excel("D:/Smriti/Employee_Data.xlsx")

#print(data)

sns.scatterplot(data = data, x = "Age", y = "Annual Salary",size = "Bonus %", hue = "Department")

plt.legend(bbox_to_anchor = (0.2,0,1.2,1)) # x, y, width, Height

plt.show()

Heatmap Seaborn
data = sns.load_dataset("tips")

gp = data.groupby("day").agg({"tip":"mean"})

sns.heatmap(gp)

data = pd.read_excel("D:/Smriti/Employee_Data.xlsx")

gp = data.groupby(["Job Title"]).agg({"Annual Salary":"mean"})

sns.heatmap(gp)

Count Plot Seaborn


data = sns.load_dataset("tips")

sns.countplot(data = data,x = "day")

plt.show()

df = pd.read_excel("D:/Smriti/Employee_Data.xlsx")

sns.countplot(data= df, x = "Ethnicity",hue = "Gender",palette = "viridis")

plt.show()
Violin Plot Seaborn
data = sns.load_dataset("tips")

sns.violinplot(data = data,x ="tip")

plt.show()

df = pd.read_excel("D:/Smriti/Employee_Data.xlsx")

sns.violinplot(data = df, x = "Bonus %")

plt.show()

sns.violinplot(data = df, x = "Annual Salary")

plt.show()

Pair Plot Seaborn


data = sns.load_dataset("tips")

sns.pairplot(data,hue = "day")

#diag_kind = "auto", "hist", "kde" or None (sets the plots on the diagonal)

plt.show()

data = sns.load_dataset("iris")

sns.pairplot(data,hue = "species")

plt.show()

Strip Plot Seaborn


data = sns.load_dataset("tips")

sns.stripplot(data= data, x = "day", y = "total_bill",hue = "sex",dodge = True, jitter = 0.5)

# stripplot draws every observation as a dot within its category (x can be text values)

# dodge=True draws the hue levels side by side; jitter sets the random horizontal spread of the dots

plt.show()

data = sns.load_dataset("tips")

#sns.scatterplot(data= data, x = "day", y = "total_bill")

# scatterplot places every point at its exact (x, y) position; use it when both x and y are numeric

sns.scatterplot(data= data, x = "tip", y = "total_bill")


plt.show()

Box Plot Seaborn


data = sns.load_dataset("tips")

sns.boxplot(data = data, y = "tip", x = "day", orient = "v", fliersize = 3)

plt.show()

Cat Plot Seaborn


data = sns.load_dataset("tips")

# catplot stands for categorical plot; kind chooses the type of plot

#sns.catplot(data = data,x = "day",y = "tip",hue = "sex", kind = "violin")

sns.catplot(data = data,x = "day",hue = "sex", kind = "count")

# kind = "strip" or "bar" or "violin"

plt.show()

Style and color in Plots Seaborn


data = sns.load_dataset("exercise")

sns.set_style(style = "darkgrid")

# style = "darkgrid" or "whitegrid" or "ticks"

sns.barplot(data = data, x = "time", y = "pulse")

plt.show()

sns.palplot(sns.color_palette())

# e.g. sns.color_palette("viridis", 3) or sns.color_palette("spring")

plt.show()

Multiple Plots Seaborn


data = sns.load_dataset("tips")

a =sns.FacetGrid(data, col= "time", height = 2, hue = "sex")

a.map(sns.barplot,"day", "tip")
plt.show()

Relational Plot Seaborn


data = sns.load_dataset("tips")

#sns.relplot(data,x = "tip", y = "total_bill",hue = "sex",kind = "line")

sns.relplot(data,x = "tip", y = "total_bill",hue = "sex",col = "day", size = "smoker")

plt.show()

Swarm Plot Seaborn


data = sns.load_dataset("tips")

# unlike stripplot, swarmplot arranges the points so they do not overlap, which makes the data clearer

sns.swarmplot(data, x= "day",y ="total_bill",hue = "sex", dodge = True)

plt.show()

Compare with stripplot:

data = sns.load_dataset("tips")

sns.stripplot(data, x= "day",y ="total_bill",jitter = 0.3,hue = "sex")

plt.show()

KDE Plot Seaborn


data = sns.load_dataset("tips")

sns.kdeplot(data,x= "total_bill",hue = "day")

plt.show()

sns.kdeplot(data,x= "total_bill",hue = "day",multiple = "fill")

plt.show()

#multiple = "layer" or "stack" or "fill"

sns.histplot(data,x= "total_bill")

plt.show()
Introduction to MySQL for Data Analytics

MySQL is a relational database management system (RDBMS) developed by Oracle that is based on structured query language (SQL).

The software used to store, manage, query, and retrieve data stored in a relational database is
called a relational database management system (RDBMS).

The RDBMS provides an interface between users and applications and the database, as well as
administrative functions for managing data storage, access and performance.

MySQL supports the following popular development languages and drivers:


PHP Python Java/JDBC

Node.js Perl Ruby

Go Rust C

C++ C#/.NET ODBC

Benefits of Using MySQL


Ease of use

Reliability

Scalability

High Availability

Security

Flexibility

Installation of MySQL
Download MySQL (the 2nd file on the download page), then install it by following the video.

Import CSV File and Sample Databases in MySQL


Open the downloaded sample database files and add them by following the process in the video.
Open Demo

Create New Schema >>> write Name (Demo) >>> Apply >>> Apply >>> Finish

Import CSV file >>> Convert the file into CSV format >>> Schemas >>> Demo >>> click on Tables >>> Table Data Import >>> Select File >>> Next >>> Drop Table (Right) >>> Next >>> Next >>> Execute >>> Next >>> Finish

Type select * from demo. and choose the file (ESD) >>> select * from demo.esd >>> press Ctrl + Enter

Select Query in MySQL


Ctrl + Shift + Enter (execute all statements, or only the selected ones)

Ctrl + Enter (execute only the current statement, i.e. the one under the cursor)

User: root

Password: 12346789

To select all columns:

select * from demo.ESD;

To select particular columns:

Select EEID, Department from demo.ESD;

Where Clause in MySQL


select * from demo.esd where City = "Seattle";

select FullName, AnnualSalary from demo.esd where JobTitle = "Sr. Manger";

And, Or and Not Operators in MySQL


select JobTitle, FullName, AnnualSalary, City from demo.esd where JobTitle = "Sr. Manger" and City =
"Seattle";

select Gender, FullName, AnnualSalary, Country from demo.esd where Gender = "Female" or
Country = "China";
select * from demo.esd where not Gender = "Female";

Like Operators in MySQL


select * from demo.esd where FullName like "%mar%"; # "mar" anywhere in the name (start, middle or end)

select * from demo.esd where FullName like "mar%"; # names starting with "mar"

select * from demo.esd where FullName like "%mar"; # names ending with "mar"
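The three LIKE patterns can be tried without a MySQL server using Python's built-in sqlite3 module (SQLite, not MySQL); the table and names are hypothetical. As with MySQL's default case-insensitive collations, SQLite's LIKE ignores case for ASCII letters, so "mar%" also matches "Maria".

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE esd (FullName TEXT)")
con.executemany("INSERT INTO esd VALUES (?)", [
    ("Maria Lopez",), ("Omar Khan",), ("Ravi Kumar",), ("Tamara Singh",), ("John Smith",),
])

def names(pattern):
    # % matches any sequence of characters (including none)
    rows = con.execute(
        "SELECT FullName FROM esd WHERE FullName LIKE ? ORDER BY FullName", (pattern,))
    return [r[0] for r in rows]

anywhere = names("%mar%")
starts = names("mar%")
ends = names("%mar")
print(anywhere, starts, ends)
```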

Order By Keyword in MySQL


select * from demo.esd order by Department asc;

select * from demo.esd order by Age desc;

select * from demo.esd order by Department, Age asc; # both columns sorted in ascending order (asc is also the default)

select * from demo.esd order by Department asc, Age desc;

select * from sakila.payment order by amount desc;

Limit Clause in MySQL


select * from demo.gapminder order by pop desc limit 5;

select * from demo.gapminder limit 3;

select * from demo.gapminder limit 2,3; # skip the first 2 rows, then show the next 3

select * from demo.esd order by Department, Age asc limit 3;
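A sqlite3 sketch (hypothetical rows) of LIMIT with an offset. SQLite accepts the same "LIMIT offset, count" form as MySQL, and both also accept the clearer "LIMIT count OFFSET offset". Adding ORDER BY makes the skipped rows predictable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE gapminder (country TEXT, pop INTEGER)")
con.executemany("INSERT INTO gapminder VALUES (?, ?)",
                [("A", 50), ("B", 40), ("C", 30), ("D", 20), ("E", 10), ("F", 5)])

top2 = con.execute("SELECT country FROM gapminder ORDER BY pop DESC LIMIT 2").fetchall()

# LIMIT offset, count: skip 2 rows, return the next 3
page = con.execute("SELECT country FROM gapminder ORDER BY pop DESC LIMIT 2, 3").fetchall()

# Same result with the explicit OFFSET keyword
page2 = con.execute("SELECT country FROM gapminder ORDER BY pop DESC LIMIT 3 OFFSET 2").fetchall()
print(top2, page, page2)
```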

Between Operators in MySQL


select * from demo.gapminder where lifeExp between 30 and 35;

select * from demo.esd where AnnualSalary between "140000" and "150000";

In and Not In Operators in MySQL


select * from demo.esd where Department in ("IT","Accounting");
select * from demo.esd where City not in ("Seattle","Miami");

select * from demo.esd where Department in ("IT","Accounting") and City in ("Seattle","Miami");

String Function in MySQL


select concat(FullName," - ", JobTitle," - ", Department) as Designation from demo.esd;

select concat_ws(" - ",FullName, JobTitle, Department) as Emp_detail from demo.esd;

select length(AnnualSalary) as DigitCount from demo.esd;

select upper(FullName) as Names from demo.esd;

select lower(FullName) as Names from demo.esd;

select left(FullName,4) as Username from demo.esd;

select right(FullName,4) as Username from demo.esd;

select mid(FullName,2,5) as Username from demo.esd;

Data Aggregation Numeric Function in MySQL


select sum(amount) as total_amount from classicmodels.payments;

select count(customerNumber) as Total_Customers from classicmodels.customers;

select avg(quantityOrdered) as Avg_Quantity from classicmodels.orderdetails;

select max(amount) as Max_Value from classicmodels.payments;

select min(amount) as Min_Value from classicmodels.payments;

select truncate(amount,0) as Amount from classicmodels.payments; # for eliminating decimal values

select ceil(amount) as Higher_amount from classicmodels.payments;

select floor(amount) as Lower_amount from classicmodels.payments;

Date Function in MySQL


select date(payment_date) as dates from sakila.payment;

select time(payment_date) as time from sakila.payment;

select day(payment_date) as day from sakila.payment;

select dayname(payment_date) as day_name from sakila.payment;

select monthname(payment_date) as Month_Name from sakila.payment;

select hour(payment_date) as hour from sakila.payment;


select minute(payment_date) as minute from sakila.payment;

select datediff(shippedDate, orderDate) as dates from classicmodels.orders;

select year(orderDate) as year from classicmodels.orders;

select yearweek(orderDate) as year_week from classicmodels.orders;

Case Operators in MySQL


select productName, quantityInStock,

case

when quantityInStock < 1000 then "Urgent need of more stock"

else "No requirement as of now"

end as Production_Details

from classicmodels.products;

select orderNumber, quantityOrdered,

case

when quantityOrdered <= 30 then "Low Order"

when quantityOrdered >= 40 then "High Order"

else "Average Order"

end as Order_Type

from classicmodels.orderdetails;
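The order-type CASE expression above, run on hypothetical rows with Python's sqlite3 module. Conditions are checked top to bottom and the first true branch wins, so quantities from 31 to 39 fall through to ELSE. The string literals use single quotes, which work in both SQLite and MySQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orderdetails (orderNumber INTEGER, quantityOrdered INTEGER)")
con.executemany("INSERT INTO orderdetails VALUES (?, ?)",
                [(1, 20), (2, 30), (3, 35), (4, 40), (5, 50)])

rows = con.execute("""
    SELECT orderNumber, quantityOrdered,
           CASE
               WHEN quantityOrdered <= 30 THEN 'Low Order'
               WHEN quantityOrdered >= 40 THEN 'High Order'
               ELSE 'Average Order'
           END AS Order_Type
    FROM orderdetails
    ORDER BY orderNumber
""").fetchall()
print(rows)
```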

Group By in MySQL
select Department, count(EEID) as Total_Employes from demo.esd group by Department;

select Gender, count(EEID) as Total_Employes from demo.esd group by Gender;

select productLine, count(productCode) as Total_Products from classicmodels.products group by


productLine order by count(productCode) asc;

Having Clause in MySQL


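# WHERE filters rows before they are grouped, so it cannot use an aggregate such as count(EEID); doing so gives an error.

That's why we use the HAVING clause to filter the groups after aggregation.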

select Department, count(EEID) as Total_Employes from demo.esd group by Department having


count(EEID)>150;

select productLine, sum(quantityInStock) as Total_Stock from classicmodels.products group by


productLine having sum(quantityInStock)<50000;
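A sqlite3 sketch (hypothetical rows) that uses WHERE and HAVING together: WHERE removes rows before they are grouped, and HAVING removes groups after the aggregate is computed. It also checks that an aggregate inside WHERE is rejected.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE esd (EEID TEXT, Department TEXT, Age INTEGER)")
con.executemany("INSERT INTO esd VALUES (?, ?, ?)", [
    ("E1", "IT", 30), ("E2", "IT", 45), ("E3", "IT", 50),
    ("E4", "Sales", 28), ("E5", "Sales", 41), ("E6", "HR", 39),
])

# WHERE keeps employees aged 40+, then HAVING keeps departments with 2+ of them
rows = con.execute("""
    SELECT Department, COUNT(EEID) AS Total_Employees
    FROM esd
    WHERE Age >= 40
    GROUP BY Department
    HAVING COUNT(EEID) >= 2
""").fetchall()
print(rows)

# An aggregate inside WHERE is an error, because WHERE runs before grouping
try:
    con.execute("SELECT Department FROM esd WHERE COUNT(EEID) > 1 GROUP BY Department")
    aggregate_in_where_failed = False
except sqlite3.Error:
    aggregate_in_where_failed = True
print(aggregate_in_where_failed)
```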

Joins in MySQL (Inner, Left, Right, Cross)


Table 1: Order                Table 2: Shipping

Custom Key   Order            Custom Key   Shipping

003          Mobile           003          Shipped

420          Tablet           613          Shipped

613          Laptop

411          Tablet

An inner join returns only the rows whose key matches in both tables (here 003 and 613).

select classicmodels.products.productName, classicmodels.orderdetails.quantityOrdered from


classicmodels.products

inner join classicmodels.orderdetails

on classicmodels.products.productCode = classicmodels.orderdetails.productCode;

select classicmodels.products.productName, sum(classicmodels.orderdetails.quantityOrdered) as


Quantity_Ordered from classicmodels.products

inner join classicmodels.orderdetails

on classicmodels.products.productCode = classicmodels.orderdetails.productCode

group by classicmodels.products.productName;

A left join returns all rows of the 1st (left) table plus the matching data from the 2nd table; where there is no match, the 2nd table's columns are NULL.

select classicmodels.products.productName,classicmodels.orderdetails.quantityOrdered
from classicmodels.products left join classicmodels.orderdetails

on classicmodels.products.productCode = classicmodels.orderdetails.productCode;

A right join returns all rows of the 2nd (right) table plus the matching data from the 1st table; where there is no match, the 1st table's columns are NULL.

select classicmodels.products.productName,classicmodels.orderdetails.quantityOrdered

from classicmodels.products right join classicmodels.orderdetails

on classicmodels.products.productCode = classicmodels.orderdetails.productCode;

A cross join pairs every row of the 1st table with every row of the 2nd table (all combinations), so the result has (rows in table 1) × (rows in table 2) rows.

select * from classicmodels.products cross join classicmodels.orderdetails;

# In MySQL, CROSS JOIN with an ON condition behaves like INNER JOIN, so this query returns only the matching rows:
select classicmodels.products.productName,
classicmodels.products.quantityInStock,classicmodels.orderdetails.quantityOrdered from
classicmodels.products cross join classicmodels.orderdetails

on classicmodels.products.productCode = classicmodels.orderdetails.productCode;
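The Order and Shipping tables above can be rebuilt in Python's sqlite3 module to see each join type; the table and column names are hypothetical. RIGHT JOIN exists only in recent SQLite versions, so the sketch skips it: a right join is a left join with the two tables swapped.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (CustomKey TEXT, item TEXT)")
con.execute("CREATE TABLE shipping (CustomKey TEXT, status TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("003", "Mobile"), ("420", "Tablet"), ("613", "Laptop"), ("411", "Tablet")])
con.executemany("INSERT INTO shipping VALUES (?, ?)",
                [("003", "Shipped"), ("613", "Shipped")])

inner = con.execute("""
    SELECT o.CustomKey, o.item, s.status FROM orders o
    INNER JOIN shipping s ON o.CustomKey = s.CustomKey
    ORDER BY o.CustomKey""").fetchall()

left = con.execute("""
    SELECT o.CustomKey, o.item, s.status FROM orders o
    LEFT JOIN shipping s ON o.CustomKey = s.CustomKey
    ORDER BY o.CustomKey""").fetchall()   # unmatched orders get None (NULL)

cross = con.execute("SELECT COUNT(*) FROM orders CROSS JOIN shipping").fetchone()[0]
print(inner, left, cross)
```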

Set Operators in MySQL


UNION combines the results of both queries and removes duplicate rows, so rows common to both tables appear only once.

select FirstName, Department from employeeid.employee2

union

select FirstName, Department from employeeid.employee1;

UNION ALL combines the results of both queries and keeps duplicates, so rows common to both tables appear twice.

select FirstName, Department from employeeid.employee2

union all

select FirstName, Department from employeeid.employee1;

INTERSECT returns only the rows that appear in both results.

select FirstName, Department from employeeid.employee2

intersect

select FirstName, Department from employeeid.employee1;


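# INTERSECT is supported only from MySQL 8.0.31; on older versions the query above gives an error, so use an IN subquery instead (this version matches on FirstName only):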

select FirstName, Department from employeeid.employee2

where FirstName in (select FirstName from employeeid.employee1);

EXCEPT returns the rows of the 1st query that do not appear in the 2nd query.

select FirstName, Department from employeeid.employee2

except

select FirstName, Department from employeeid.employee1;

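# EXCEPT is supported only from MySQL 8.0.31; on older versions the query above gives an error, so use a NOT IN subquery instead (this version matches on FirstName only):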

select FirstName, Department from employeeid.employee2

where FirstName not in (select FirstName from employeeid.employee1);
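A sqlite3 sketch of the four set operators on two hypothetical employee tables (SQLite supports INTERSECT and EXCEPT directly). Both SELECTs must return the same number of columns.

```python
import sqlite3

con = sqlite3.connect(":memory:")
for table in ("employee1", "employee2"):
    con.execute(f"CREATE TABLE {table} (FirstName TEXT, Department TEXT)")
con.executemany("INSERT INTO employee1 VALUES (?, ?)",
                [("Asha", "IT"), ("Ben", "HR"), ("Chen", "IT")])
con.executemany("INSERT INTO employee2 VALUES (?, ?)",
                [("Ben", "HR"), ("Dev", "Sales")])

def run(op):
    # employee2 <op> employee1, sorted for a stable result
    sql = f"""SELECT FirstName, Department FROM employee2
              {op}
              SELECT FirstName, Department FROM employee1
              ORDER BY FirstName"""
    return con.execute(sql).fetchall()

union = run("UNION")
union_all = run("UNION ALL")
intersect = run("INTERSECT")
except_rows = run("EXCEPT")
print(union, union_all, intersect, except_rows, sep="\n")
```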

Subqueries in MySQL
select avg(creditLimit) from classicmodels.customers;

select * from classicmodels.customers where creditLimit > 67659;

The inner query (in brackets) is executed first, and its result is then used by the outer query.

select * from classicmodels.customers where creditLimit >

(select avg(creditLimit) from classicmodels.customers);

select employee1.FirstName, employee1.Department from employeeid.employee1

where FirstName in (select employee2.FirstName from employeeid.employee2);

select employee1.FirstName, employee1.Department from employeeid.employee1

where FirstName not in (select employee2.FirstName from employeeid.employee2);
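A sqlite3 sketch of the same idea with hypothetical customers: the subquery computes the average once, so you do not have to copy a number like 67659 into the outer query by hand.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (customerName TEXT, creditLimit REAL)")
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [("A", 10000), ("B", 50000), ("C", 90000), ("D", 130000)])

avg_limit = con.execute("SELECT AVG(creditLimit) FROM customers").fetchone()[0]

# The inner query runs first; the outer query compares each row with its result
above_avg = con.execute("""
    SELECT customerName FROM customers
    WHERE creditLimit > (SELECT AVG(creditLimit) FROM customers)
    ORDER BY customerName
""").fetchall()
print(avg_limit, above_avg)
```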

Views in MySQL
A view is a virtual table defined by a saved query.

create view count_of_customer as

select country, count(customerNumber) as Customer from classicmodels.customers

group by country;

create view france_data as

select * from customers where country = "France";

A view can then be queried like an ordinary table, which makes analysis easier:

select sum(creditLimit) from classicmodels.france_data;
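A sqlite3 sketch of the france_data view with hypothetical rows. A view stores the query, not a copy of the data, so rows added to the base table later show up in the view automatically.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (customerNumber INTEGER, country TEXT, creditLimit REAL)")
con.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "France", 20000), (2, "France", 30000), (3, "USA", 50000)])

con.execute("CREATE VIEW france_data AS SELECT * FROM customers WHERE country = 'France'")
total = con.execute("SELECT SUM(creditLimit) FROM france_data").fetchone()[0]

# The view re-runs its query, so this new French customer is included
con.execute("INSERT INTO customers VALUES (4, 'France', 10000)")
total_after = con.execute("SELECT SUM(creditLimit) FROM france_data").fetchone()[0]
print(total, total_after)
```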

Stored Procedure in MySQL


Delimiter &&

create procedure get_data()

begin

select * from classicmodels.customers ;

end &&

Delimiter ;

call classicmodels.get_data();

Call the procedure whenever you want to get this data.

Delimiter &&

create procedure get_limit(in var int)

begin

select * from classicmodels.customers limit var ;

end &&

Delimiter ;

call classicmodels.get_limit(3);

An IN parameter passes a value into the procedure: here, the number of rows you want.

Delimiter &&
create procedure get_credit(out var int)

begin

select max(creditLimit) into var from classicmodels.customers ;

end &&

Delimiter ;

call classicmodels.get_credit(@abc);

select @abc;

An OUT parameter lets you extract data from the procedure: here, the maximum credit limit is returned in @abc.

Delimiter &&

create procedure get_name(inout var varchar(50))

begin

select customerName into var from classicmodels.customers where customerNumber = var;

end &&

Delimiter ;

set @m = 125;

call classicmodels.get_name(@m);

select @m;

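An INOUT parameter passes data in and out: @m goes in as a customer number (125) and, if that customer exists, comes back holding the customer's name.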

Window Functions in MySQL


select FirstName, Occupation, EducationLevel,sum(TotalChildren)

from expense.customer

group by FirstName,Occupation,EducationLevel;

select FirstName, Occupation, EducationLevel, TotalChildren, sum(TotalChildren)

over(partition by Occupation order by EducationLevel) from expense.customer;

select *, row_number()

over(partition by product_detail.Price) from expense.product_detail;


# numbers the rows 1, 2, 3, ... within each Price group

select *, rank()

over(partition by product_detail.Product_Name

order by product_detail.Kids) from expense.product_detail;

select *, dense_rank()

over(partition by product_detail.Product_Name

order by product_detail.Kids) from expense.product_detail;

select FirstName, Occupation, EducationLevel, TotalChildren, rank()

over(partition by Occupation order by EducationLevel) from expense.customer;

# rank(): if rows 1-5 are Bachelors and rows 6-12 are Graduate, all Bachelors get rank 1 and all Graduates get rank 6 (ranks are skipped after ties)

select FirstName, Occupation, EducationLevel, TotalChildren, dense_rank()

over(partition by Occupation order by EducationLevel) from expense.customer;

# dense_rank(): in the same case all Bachelors get rank 1 and all Graduates get rank 2 (no gaps after ties)
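A sqlite3 sketch (window functions need SQLite 3.25 or newer; the rows are hypothetical) that puts row_number(), rank() and dense_rank() side by side over the same ordering, reproducing the tie behaviour described in the comments above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (FirstName TEXT, Occupation TEXT, EducationLevel TEXT)")
con.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("A", "Clerical", "Bachelors"),
    ("B", "Clerical", "Bachelors"),
    ("C", "Clerical", "Graduate Degree"),
    ("D", "Clerical", "High School"),
])

rows = con.execute("""
    SELECT FirstName, EducationLevel,
           ROW_NUMBER() OVER (PARTITION BY Occupation ORDER BY EducationLevel, FirstName) AS rn,
           RANK()       OVER (PARTITION BY Occupation ORDER BY EducationLevel) AS rnk,
           DENSE_RANK() OVER (PARTITION BY Occupation ORDER BY EducationLevel) AS drnk
    FROM customer
    ORDER BY FirstName
""").fetchall()
for row in rows:
    print(row)
```

The two Bachelors tie, so rank() jumps from 1 to 3 while dense_rank() continues with 2.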
Introduction to MS Excel Beginner’s Guide
Ribbon: contains the different tools

Formula bar

Spreadsheet grid

Status bar

Ctrl + Shift + "+" >>> R >>> Enter (inserts an entire row above the selected cell)

Ctrl+T (for create table)

Basic Function
Auto fill: double-click or drag the fill handle to create a series (Sheet1: 1, 2, 3, … series)

Flash fill: type the pattern in the first 2 rows and Excel suggests (flashes) it for the entire column >>> Enter (Sheet 2: First Name, Last Name)

Data >>> text to column >>> Delimited >>> Next >>> Basis(Comma,etc) >>> Next >>> Finish(sheet 2
Country , Abbreviations)

Data Validation
Select cells >>> Data >>> Data Validation >>> Allow: Date >>> enter start and end date >>> if you type a date outside this range, Excel shows an error. (Sheet 1, Date 2: 01-01-2023 to 31-01-2023)

Data >>> Data Validation >>> Allow: Text length >>> enter min and max >>> text shorter or longer than this shows an error. (Sheet 1, Sub Category 2: 2-20 letters)

Data >>> Data Validation >>> Allow: Whole number >>> enter min and max >>> a number outside this range shows an error. (Sheet 1, Amount2: 10 to 50000 max amount)

Data >>> Data Validation >>> Allow: List >>> enter the list items >>> a value that is not in the list shows an error. (Sheet 1, Payment Mode2: UPI, Cash, Card)

Select cells >>> Data >>> Data Validation >>> Error Alert >>> write a Title >>> write an Error Message >>> invalid entries now show this message. (Sheet 1, Date 2: Title: Invalid Date, Error Msg: Date should be between 01-01-2023 to 31-01-2023)
Data Connectors
BY LINK

Data >>> From Web >>> paste the link (copied from Chrome) of the website whose data you want to import >>> select the table you want and then Load >>> right-click the table's query >>> Load To >>> select Table and the cell where you want to show it >>> OK. {Table 1}

BY OTHER SOURCE

Get Data >>> Other Sources >>> From ODBC >>> select MySQL >>> enter user and password (root 12346789) >>> OK >>> select the table you want from the data >>> Load {customers}

For ODBC >>> Download the MySQL ODBC driver (64-bit, first link) >>> open it and accept the terms and conditions >>> choose the normal (1st) option >>> OK

Then search for the app ODBC Data Sources (64-bit) >>> go to the Drivers tab and check for MySQL ODBC 9.0 Unicode Driver >>> OK >>> go to User DSN >>> Add >>> MySQL ODBC 9.0 Unicode Driver >>> enter name, username, password >>> OK (after this process you can connect to MySQL from MS Excel)

Conditional Formatting
Firstly select cells >>> Home >>> Conditional Formatting >>> Highlight Cells Rules >>> A Date Occurring >>> choose the period to highlight (Sheet 3: {Next Month}; the period is measured from the current date, so it can also check last month, yesterday, etc.)

Conditional Formatting >>> Highlight Cells Rules >>> Text that Contains >>> enter the text to highlight (Sheet 3: {Food}; use it to check for any particular value)

For clear rules >>> Conditional Formatting >>> Clear Rules

Highlight Cells Rules >>> Greater than, Less than, Between, Equal to, A Date Occurring, Text that
Contains, Duplicate Value

Top/Bottom Rules >>> Top/Bottom 10 Items, Top/Bottom 10%, Above Average, Below Average.

Firstly Select Cells >>> Home >>> Conditional Formatting >>> Data Bars (Price)

Conditional Formatting >>> Icon Sets

Conditional Formatting >>> Color Scales

Basic Formatting
Home >>> increase font size, change font style, Bold (Ctrl + B), Italic (Ctrl + I), Underline (Ctrl + U), Text Color, Font Color, different types of borders, different types of alignment, Merge & Center, Wrap Text, different number formats such as date, %, currency, time, etc., and increase or decrease the decimal places of the data or of any particular part of it.

Sorting Formatting
Select table >>> Home >>> Editing >>> Sort & Filter >>> Smallest to largest or Largest to Smallest

Home >>> Editing >>> Sort & Filter >>> Custom Sort >>> Add level (Column, Sort On, Order)

Filtering Data
Home >>> Editing >>> Sort & Filter >>> Filter (or press Ctrl + Shift + L) >>> filter icons are added to the headers >>> Text, Number or Date Filters let you filter by color, greater or less than a number, begins with, ends with or contains some letters, this month or next month, etc.

Dealing with Null Value


Select cells >>> F5 (Go To) >>> Special >>> Blanks >>> OK {or}

Select cells >>> Home >>> Find & Select >>> Go To Special >>> Blanks >>> OK

Then >>> Ctrl + - >>> Delete >>> Shift cells up >>> OK (Sheet 6)

Select table >>> F5 (Go To) >>> Special >>> Blanks >>> OK >>> Ctrl + - >>> Delete >>> Entire row (Data)

Fill the value from below into the blank cells above it

Select cells >>> F5 (Go To) >>> Special >>> Blanks >>> OK >>> type = and click the cell below >>> Ctrl + Enter (Sheet 6 B-c)

Fill any value in blank cells

Select cells >>> F5 (Go To) >>> Special >>> Blanks >>> OK >>> type anything (NY) >>> Ctrl + Enter (Sheet 6 C-c)

Select cells >>> F5 (Go To) >>> Special >>> Blanks >>> OK >>> calculate the average of the data >>> type it >>> Ctrl + Enter (Sheet 6 D-c)

Dealing with Duplicate Values


Data >>> Data Tools >>> Remove Duplicates >>> select the columns to check (use EEID or another unique number, because two different people may have the same name) (Sheet 7 & esd)

Dealing with White Spaces


Sometimes spaces are added by mistake at the beginning or end of a value.

Use =TRIM(text)

but TRIM returns text, so numbers cleaned this way are converted into text format.

Fixing Column Formats


Conversion to Number Format (esd (2))
Data >>> Data Tools >>> Text to column >>> Fixed width >>> Create a break line >>> Finish

Data >>> Data Tools >>> Text to Columns >>> Delimited >>> Comma, Other, or any character where you need to separate the values

Text Functions
Concatenate:
Joins the text of different columns into one column.
=CONCATENATE(text1, text2, text3, …)

Lower:
Shows text in small (lowercase) letters.
=LOWER(text)

Upper:
Shows text in capital (uppercase) letters.
=UPPER(text)

Proper:
Shows text in proper case: the 1st letter of each word is capital and the rest are small.
=PROPER(text)

Length:
Counts the number of characters in the text.
=LEN(text)

Left:
Returns the given number of characters from the left (start) of the text.
=LEFT(text, num_chars)

Right:
Returns the given number of characters from the right (end) of the text.
=RIGHT(text, num_chars)

Mid:
Returns characters from the middle of the text, starting at start_num.
=MID(text, start_num, num_chars)

Find:
Finds the position of text (or a letter) within another text. It is case-sensitive, so be careful with small and capital letters.
=FIND(find_text, within_text, [start_num])

Search:
Searches for the position of text (or a letter) within another text. It is not case-sensitive, so small or capital letters both work.
=SEARCH(find_text, within_text, [start_num])

Replace:
Replaces part of a text, chosen by position, with new text.
=REPLACE(old_text, start_num, num_chars, new_text)
Substitute:
Substitutes old text with new text wherever it occurs. Use instance_num to replace only a particular occurrence.
=SUBSTITUTE(text, old_text, new_text, [instance_num])
In customer (2) sheet

If, And and Or Functions


Condition:
If the given condition is true, it shows TRUE, otherwise FALSE.
=[@Age]>50

If:
Checks a condition: if it is true, shows 'Value if True', otherwise shows 'Value if False'.
=IF(logical_test, [value_if_true], [value_if_false])

And:
AND is used inside IF to give multiple conditions. If all conditions are satisfied, it shows 'Value if True', otherwise 'Value if False'.
=IF(AND(logical1, logical2, …), [value_if_true], [value_if_false])

Or:
OR is used inside IF to give multiple conditions. If any one condition is satisfied, it shows 'Value if True', otherwise 'Value if False'.
=IF(OR(logical1, logical2, …), [value_if_true], [value_if_false])
Esd (4) sheet

Date and Time Functions


Today: =TODAY() → 21-07-2024
Now: =NOW() → 21-07-2024 12:55
Day: =DAY(serial_number), where serial_number is NOW() or any column that holds a date
=DAY(NOW()) → 21

Month: =MONTH(serial_number) → 7


Year: =YEAR(serial_number) → 2024
Date: =DATE(year, month, day) → 21-07-2024
Hour: =HOUR(serial_number) → 12
Min: =MINUTE(serial_number) → 55
Secs: =SECOND(serial_number) → 59
Date + 3 Days: =Date + 3 → DATE(E4,D4,C4)+3 or F4+3 → 24-07-2024
Date + 3 Months: =EDATE(start_date, months), where months = how many months to add → EDATE(F4,3) → 21-10-2024
Date + 3 Years: =EDATE(start_date, months) → EDATE(F4,12*3) → 21-07-2027
Day in name format: =NOW() with custom format DDDD → Sunday
Sheet 9
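The same date arithmetic can be sketched in pandas, using the date 21-07-2024 from the examples above (the time 12:55:59 is assembled from the Hour/Min/Secs results). pd.DateOffset(months=...) plays the role of EDATE.

```python
import pandas as pd

start = pd.Timestamp("2024-07-21 12:55:59")

plus_3_days = start + pd.Timedelta(days=3)           # like =F4+3
plus_3_months = start + pd.DateOffset(months=3)      # like =EDATE(F4,3)
plus_3_years = start + pd.DateOffset(months=12 * 3)  # like =EDATE(F4,12*3)

print(start.day, start.month, start.year)            # like DAY, MONTH, YEAR
print(start.hour, start.minute, start.second)        # like HOUR, MINUTE, SECOND
print(plus_3_days.date(), plus_3_months.date(), plus_3_years.date())
print(start.day_name())                              # like custom format DDDD
```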

Countif, Countifs, Sumif, Sumifs Functions


Sum: adds up all the numbers in a range.

=SUM(number1, [number2], …)

Sumif: adds up the values that meet one criterion.

=SUMIF(range, criteria, [sum_range])

Sumifs: adds up the values that meet multiple criteria.

=SUMIFS(sum_range, criteria_range1, criteria1, [criteria_range2, criteria2], …)

Count: counts the cells that contain numbers.

=COUNT(value1, [value2], …)
Countif: counts the cells that meet one criterion.

=COUNTIF(range, criteria)

Countifs: counts the cells that meet multiple criteria.

=COUNTIFS(criteria_range1, criteria1, [criteria_range2, criteria2], …)

Sheet 9
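A pandas sketch of the same five functions on hypothetical expense rows: a boolean condition plays the role of the criteria, & combines several criteria, and .sum() adds the matching amounts or, applied to the condition itself, counts the matching rows.

```python
import pandas as pd

# Hypothetical expense rows
df = pd.DataFrame({
    "Category": ["Food", "Food", "Travel", "Food", "Travel"],
    "Payment Mode": ["UPI", "Cash", "UPI", "UPI", "Card"],
    "Amount": [200, 150, 1200, 300, 800],
})

is_food = df["Category"] == "Food"
is_upi = df["Payment Mode"] == "UPI"

total = df["Amount"].sum()                           # =SUM
food_total = df.loc[is_food, "Amount"].sum()         # =SUMIF
food_upi_total = df.loc[is_food & is_upi, "Amount"].sum()  # =SUMIFS
food_count = is_food.sum()                           # =COUNTIF (True counts as 1)
food_upi_count = (is_food & is_upi).sum()            # =COUNTIFS
print(total, food_total, food_upi_total, food_count, food_upi_count)
```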

XLookup
XLOOKUP searches for a value in one range (the lookup array) and returns the related value from the same position in another range (the return array).

=XLOOKUP(lookup_value, lookup_array, return_array)

Ex. To find the Salary or Job Title of an employee, the Emp ID is the lookup value, the Emp ID column is the lookup array and the Salary or Job Title column is the return array.
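A minimal Python sketch of what XLOOKUP does with its first three arguments (exact match, first hit wins). The `xlookup` helper and the employee columns are hypothetical; the optional `if_not_found` mirrors XLOOKUP's optional fourth argument.

```python
def xlookup(lookup_value, lookup_array, return_array, if_not_found=None):
    """Return the return_array item at the position of the first exact match."""
    for key, value in zip(lookup_array, return_array):
        if key == lookup_value:
            return value
    # Excel shows #N/A here unless an if_not_found argument is supplied
    return if_not_found


# Hypothetical employee columns
emp_ids = ["E001", "E002", "E003"]
job_titles = ["Analyst", "Manager", "Engineer"]
salaries = [50000, 82000, 61000]

print(xlookup("E002", emp_ids, salaries))               # 82000
print(xlookup("E003", emp_ids, job_titles))             # Engineer
print(xlookup("E999", emp_ids, salaries, "Not found"))  # Not found
```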

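HLOOKUP is used for horizontal lookups, where the data is arranged in rows. It always searches the first row of the table array and returns a value from a row below it, so unlike XLOOKUP it cannot return a value from a row above the lookup row.

=HLOOKUP(lookup_value, table_array, row_index_num, [range_lookup])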
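The restriction is easy to see in a sketch: the search always happens in the first row, and `row_index_num` counts downward from it. This hypothetical `hlookup` only covers exact match (range_lookup = FALSE).

```python
def hlookup(lookup_value, table_array, row_index_num):
    """Exact-match HLOOKUP: search the first row, return from row row_index_num (1-based)."""
    header = table_array[0]
    col = header.index(lookup_value)  # raises ValueError if missing (Excel: #N/A)
    return table_array[row_index_num - 1][col]


# Hypothetical table with months across the first row
table = [
    ["Jan", "Feb", "Mar"],
    [100, 120, 90],  # row 2, e.g. sales
    [40, 35, 50],    # row 3, e.g. returns
]

print(hlookup("Feb", table, 2))  # 120
print(hlookup("Mar", table, 3))  # 50
```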

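Data validation makes the lookup more convenient:

Create a drop-down list of the lookup values (Data >>> Data Validation >>> Allow: List).

For the return array, nest a second XLOOKUP that picks the column by its header, and create a drop-down list of the column headers for that lookup value too.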

=XLOOKUP(F27,Table20[EEID],XLOOKUP(F28,Table20[[#Headers],[FullName]:[Annual Salary]],Table20[[FullName]:[Annual Salary]]))

Here F27 holds the selected EEID and F28 the selected column header.
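The nested formula is a two-way lookup: the inner XLOOKUP picks a whole column by its header, and the outer XLOOKUP picks the EEID's position in that column. A Python sketch with a small hypothetical table (the names, titles and salaries are invented):

```python
# Hypothetical table mirroring Table20: an EEID column plus FullName..Annual Salary columns
eeids = ["E001", "E002"]
headers = ["FullName", "Job Title", "Annual Salary"]
rows = [
    ["Asha Rao", "Analyst", 50000],
    ["Ravi Kumar", "Manager", 82000],
]


def two_way_lookup(eeid, header):
    # Inner XLOOKUP: match the header and return that whole column
    col = headers.index(header)
    column_values = [row[col] for row in rows]
    # Outer XLOOKUP: match the EEID and return the value at the same position
    return column_values[eeids.index(eeid)]


print(two_way_lookup("E002", "Job Title"))      # Manager
print(two_way_lookup("E001", "Annual Salary"))  # 50000
```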

Sheet 10
Power Query
Data >>> Get Data >>> From File (or another source) >>> select a file >>> Transform Data >>> Close & Load (from the Power Query editor)

Power Query records every transformation as a step in the Applied Steps pane.

Sheet Employee_Sample_Data, Employee1

Cleaning and Transformation in Power Query


Query >>> Edit (opens Power Query) >>> Transform >>> click the data-type icon (123) in the column header >>> Whole Number >>> Fill >>> Down or Up

(Example: salesRepEmpNo. >>> Fill >>> Down)
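Fill Down copies the last non-null value above into each null cell; Fill Up does the same from the bottom. A plain-Python sketch, with `None` playing the part of null and hypothetical column values:

```python
def fill_down(values):
    """Replace each None with the nearest non-None value above it."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # a null at the very top stays null: nothing is above it
    return filled


def fill_up(values):
    """Fill Up is Fill Down applied from the bottom of the column."""
    return fill_down(values[::-1])[::-1]


column = [1370, None, None, 1166, None]  # hypothetical salesRepEmpNo. values
print(fill_down(column))  # [1370, 1370, 1370, 1166, 1166]
print(fill_up(column))    # [1370, 1166, 1166, 1166, None]
```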

Query >>> Edit (opens Power Query) >>> Transform >>> click the data-type icon (123) in the column header >>> Whole Number >>> Replace Values

(Example: salesRepEmpNo. >>> Value To Find = null, Replace With = 0)

Query >>> Edit (opens Power Query) >>> Transform >>> click the data-type icon (123) in the column header >>> Text >>> Replace Values

(Example: salesRepEmpNo. >>> Value To Find = null, Replace With = Data Unavailable)
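Replace Values on null amounts to swapping every missing value for a fixed replacement. A sketch with a hypothetical helper and column; Python lists can mix numbers and text, which is why the notes change the column type to Text before replacing null with a text value.

```python
def replace_nulls(values, replacement):
    """Replace Values with Value To Find = null: swap every None for the replacement."""
    return [replacement if v is None else v for v in values]


column = [1370, None, 1166, None]  # hypothetical salesRepEmpNo. values
print(replace_nulls(column, 0))                   # [1370, 0, 1166, 0]
print(replace_nulls(column, "Data Unavailable"))  # [1370, 'Data Unavailable', 1166, 'Data Unavailable']
```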

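Query >>> Edit (opens Power Query) >>> Transform >>> click the data-type icon (123) in the column header >>> Whole Number >>> Home >>> Remove Rows >>> Remove Errors

(Example: Postal Code. Values that cannot be converted to a whole number become errors, and those rows are removed.)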
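The same idea in Python: try the type conversion, and drop the whole row when it fails. The customer rows and postal codes below are hypothetical.

```python
# Hypothetical rows; "N/A" and "EC2 5NT" cannot become whole numbers
rows = [
    {"customer": 103, "postal_code": "44000"},
    {"customer": 112, "postal_code": "N/A"},
    {"customer": 114, "postal_code": "10022"},
    {"customer": 119, "postal_code": "EC2 5NT"},
]

clean_rows = []
for row in rows:
    try:
        # Change type to Whole Number; a failed conversion is the "error" cell
        converted = dict(row, postal_code=int(row["postal_code"]))
    except ValueError:
        continue  # Remove Rows >>> Remove Errors: drop the whole row
    clean_rows.append(converted)

print([r["customer"] for r in clean_rows])  # [103, 114]
```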

Remove a column: right-click the column header >>> Remove

Sheet customer (3)

Text Tools in Power Query


First remove extra white space: Transform >>> Text Column >>> Format >>> Trim

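Query >>> Edit (opens Power Query) >>> Add Column >>> Custom Column >>> enter the column name >>> enter the formula

(Example: name = Full Name, formula = [first name] & " " & [last name], inserting the customer's first-name and last-name columns from the Available columns list)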

Power Query >>> Add Column >>> From Text >>> Extract >>> First Characters >>> 3
(Commands under Transform change the existing column; commands under Add Column create a new column.)
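The three text steps (Trim, a Full Name custom column and First Characters) sketched in Python on hypothetical customer records. Trimming overwrites the existing fields, like Transform, while `full_name` and `first_3` are new fields, like Add Column. Taking the first 3 characters of the first name is an assumption, since the notes do not say which column was used.

```python
# Hypothetical customer records with stray spaces
customers = [
    {"first_name": "  Asha ", "last_name": "Rao  "},
    {"first_name": "Ravi", "last_name": " Kumar"},
]

for c in customers:
    # Transform >>> Format >>> Trim: change the existing columns in place
    c["first_name"] = c["first_name"].strip()
    c["last_name"] = c["last_name"].strip()
    # Add Column >>> Custom Column: Full Name = first name & " " & last name
    c["full_name"] = c["first_name"] + " " + c["last_name"]
    # Add Column >>> Extract >>> First Characters (3): a new column
    c["first_3"] = c["first_name"][:3]

print([c["full_name"] for c in customers])  # ['Asha Rao', 'Ravi Kumar']
print([c["first_3"] for c in customers])    # ['Ash', 'Rav']
```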

Sheet customer (3)

Number Tools in Power Query


Product >>> change buy price to Whole Number >>> Remove Errors >>> Transform >>> Statistics >>> Sum, or Rounding >>> Round Down
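In Python, Statistics >>> Sum corresponds to `sum()` and Round Down to `math.floor`. The buy prices below are hypothetical, chosen so the sum prints exactly.

```python
import math

buy_prices = [48.5, 98.25, 69.75]  # hypothetical buy price values

total = sum(buy_prices)                             # Statistics >>> Sum
rounded_down = [math.floor(p) for p in buy_prices]  # Rounding >>> Round Down

print(total)         # 216.5
print(rounded_down)  # [48, 98, 69]
```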

Sheet product

Date Tools in Power Query


Power Query >>> Add Column >>> Date >>> Day >>> Name of Day

Power Query >>> Add Column >>> Custom Column >>> (name = Day Diff., formula = [shipped date] - [order date]) >>> Duration >>> Days
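Both date steps have direct Python equivalents: `strftime("%A")` gives the day name, and subtracting two dates gives a duration whose `.days` is the Day Diff. The order rows are hypothetical.

```python
from datetime import date

# Hypothetical orders
orders = [
    {"order_date": date(2024, 7, 18), "shipped_date": date(2024, 7, 21)},
    {"order_date": date(2024, 7, 1), "shipped_date": date(2024, 7, 8)},
]

for o in orders:
    # Add Column >>> Date >>> Day >>> Name of Day
    o["day_name"] = o["order_date"].strftime("%A")
    # Custom column shipped date - order date, then Duration >>> Days
    o["day_diff"] = (o["shipped_date"] - o["order_date"]).days

print([(o["day_name"], o["day_diff"]) for o in orders])  # [('Thursday', 3), ('Monday', 7)]
```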

Sheet order

Conditional Columns in Power Query


Power Query >>> Add Column >>> Conditional Column >>> enter the new column name; select Column Name, Operator and Value >>> enter the Output >>> enter the Else output

For multiple conditions, click Add Clause (each extra clause is an Else If).

(Examples: Surprise Gift in customer, Quantity Type in orderdetails)
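A conditional column is an if / else if / else chain evaluated for every row. A sketch of a Quantity Type column; the thresholds (50 and 20) and labels are hypothetical.

```python
def quantity_type(quantity):
    """Conditional column with one Else If clause; thresholds are hypothetical."""
    if quantity >= 50:      # first condition: Column Name = quantity, Operator >=, Value 50
        return "Bulk"
    elif quantity >= 20:    # Add Clause: an Else If condition
        return "Medium"
    else:                   # Else output
        return "Small"


quantities = [60, 35, 12]  # hypothetical quantity values
print([quantity_type(q) for q in quantities])  # ['Bulk', 'Medium', 'Small']
```

Clauses are checked top to bottom and the first match wins, so 60 is "Bulk" even though it also satisfies the second condition.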

Sheet customer (3), orderdetails

Combine Multiple files in Power Query


Data >>> Get Data >>> From File >>> From Folder >>> select a folder (all files must have the same column names) >>> Transform Data >>> click the Combine Files icon in the header of the first column >>> press OK in the dialog that opens >>> the files are combined

To add more data later, put another Excel file into the same folder >>> Data >>> Refresh All.
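The folder-combine idea can be sketched with the standard library. This assumes CSV files rather than Excel workbooks, since Python's standard library cannot read .xlsx files; `combine_folder` is a hypothetical helper, and its `Source.Name` field is similar to the file-name column Power Query adds when combining. Running it again after adding a file plays the role of Refresh All.

```python
import csv
import tempfile
from pathlib import Path


def combine_folder(folder):
    """Stack every CSV in a folder into one list of rows."""
    combined, header = [], None
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as f:
            reader = csv.DictReader(f)
            if header is None:
                header = reader.fieldnames
            elif reader.fieldnames != header:
                # Same requirement as in Excel: every file needs the same column names
                raise ValueError(f"{path.name} has different columns: {reader.fieldnames}")
            for row in reader:
                row["Source.Name"] = path.name  # which file the row came from
                combined.append(row)
    return combined


# Demo with hypothetical files in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "jan.csv").write_text("EEID,Name\nE001,Asha\n")
    Path(folder, "feb.csv").write_text("EEID,Name\nE002,Ravi\n")
    rows = combine_folder(folder)
    # "Refresh All": add a file and simply run the function again
    Path(folder, "mar.csv").write_text("EEID,Name\nE003,Meera\n")
    refreshed = combine_folder(folder)

print(len(rows), len(refreshed))  # 2 3
```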

Sheet Employee Data


Data Modelling and its Importance
Data modelling means building relationships between different tables, for example with Power Pivot.

After importing the data, Diagram View shows the relationship lines, their cardinality and their filter direction.

Importance

Build relationships between one table and another.

See tables from different sheets together in one PivotTable or chart.

Importing Data in Power Pivot


Enable the Developer tab

File >>> Options >>> Customize Ribbon >>> tick Developer >>> OK

Enable Power Pivot

File >>> Options >>> Add-ins >>> Manage: COM Add-ins >>> Go >>> tick Microsoft Power Pivot for Excel >>> OK

Building the Data Model

Power Pivot >>> Manage (opens the Power Pivot window) >>> From Other Sources >>> Text File (for a CSV) or Excel File >>> Browse to the file and tick Use first row as column headers >>> Next (shows a preview) >>> OK >>> Finish >>> View >>> Diagram View >>> all the imported tables are shown.

Alternatively, create one Excel file whose sheets share linking columns >>> Data >>> Get Data >>> From File >>> From Excel Workbook >>> Import >>> in the Power Query Navigator tick Select multiple items >>> select all sheets >>> Transform Data >>> Close & Load To (tick Add this data to the Data Model)

Data >>> Manage Data Model >>> Diagram View >>> arrange the tables >>> create relationships between them: drag from a column name to the matching column in the other table, and a line joins them. Employee No. to Employee No. gives a one-to-one relationship, while sales rep employee no. to Employee No. gives a one-to-many relationship, because in the customer data the same employee number appears many times in the sales rep column: one employee represents many customers.

One table can be related to several other tables. Each relationship links one column in one table to one column in another, and the same two tables can have more than one relationship on different columns (only one of them is active at a time). Building this network of relationships is data modelling.

Cardinality and Filter Direction


Cardinality describes how rows in one table match rows in another: one-to-one, one-to-many or many-to-many.
As in the employee example above, Employee No. to Employee No. is one-to-one and sales rep employee no. to Employee No. is one-to-many, because one employee represents many customers. In a many-to-many relationship both columns contain repeated values, so Power Pivot cannot create it as a direct relationship.
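Cardinality comes down to which key columns hold unique values. A hypothetical checker, with invented employee numbers, as a sketch:

```python
def cardinality(left_values, right_values):
    """Classify a relationship by checking which key column has unique values."""
    left_unique = len(set(left_values)) == len(left_values)
    right_unique = len(set(right_values)) == len(right_values)
    if left_unique and right_unique:
        return "one-to-one"
    if left_unique or right_unique:
        return "one-to-many"
    return "many-to-many"  # both sides repeat: Power Pivot will not create this directly


employee_numbers = [1002, 1056, 1165, 1370]          # employees table: unique
employee_detail_numbers = [1002, 1056, 1165, 1370]   # another table keyed by employee no.
sales_rep_numbers = [1370, 1165, 1165, 1370, 1002]   # customers table: repeats

print(cardinality(employee_numbers, employee_detail_numbers))  # one-to-one
print(cardinality(employee_numbers, sales_rep_numbers))        # one-to-many
print(cardinality(["A", "A", "B"], ["A", "B", "B"]))           # many-to-many
```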

The filter direction of a relationship is shown by the arrow in the centre of its line.

Benefit

Customer sheet >>> Insert >>> PivotTable >>> From Data Model >>> New Worksheet >>> add Employee ID to Rows and customer number to Values as Count
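What the PivotTable computes can be sketched in Python: count customers per sales rep, then use the relationship to bring in a field from the employees table. All numbers and names below are hypothetical.

```python
from collections import Counter

# Hypothetical tables linked by employee number (employees is the "one" side)
employees = {1002: "Employee A", 1165: "Employee B", 1370: "Employee C"}
customers = [
    {"customerNumber": 103, "salesRepEmployeeNumber": 1370},
    {"customerNumber": 112, "salesRepEmployeeNumber": 1165},
    {"customerNumber": 114, "salesRepEmployeeNumber": 1165},
    {"customerNumber": 119, "salesRepEmployeeNumber": 1370},
    {"customerNumber": 121, "salesRepEmployeeNumber": 1370},
]

# Count of customerNumber per sales rep (Rows = employee, Values = Count)
counts = Counter(c["salesRepEmployeeNumber"] for c in customers)

# The relationship lets the pivot show fields from the employees table, e.g. the name
pivot = {employees[emp_no]: n for emp_no, n in counts.items()}
print(pivot)  # {'Employee C': 3, 'Employee B': 2}
```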
