SKILL ORIENTED COURSE-2
On
PYTHON WITH DATA ANALYSIS
NAME: ROLL NO:
• Bommali Thanmayee 22A51A0513
• Bontu Tanoj Kumar 22A51A0514
• Chowdari Sai Harsha Vardhan 22A51A0515
• Devala Sameera 22A51A0516
• Lankalapalli Srikalyani 22A51A0517
• Gunda Chakri 22A51A0518
List of contents:
• What is data analysis?
• Steps of data analysis
• Packages in python
• Regular expressions in python
• Matplotlib
• Stacking
• Filtering on datasets
What is Data Analysis?
• Data analysis is the process of
inspecting, cleansing,
transforming, and modeling
data with the goal of discovering
useful information, informing
conclusions, and supporting
decision-making.
Example:
➢ Data analysis is the process of
systematically applying statistical and/or
logical techniques to describe and
illustrate, condense and recap, and
evaluate data.
➢ For example, descriptive statistical
analysis could show the distribution of
sales across a group of employees and
the average sales figure per employee.
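The descriptive statistics mentioned above can be sketched in a few lines. This is a minimal illustration using Python's built-in statistics module; the employee names and sales figures are hypothetical values made up for the example.

```python
from statistics import mean, median

# Hypothetical sales per employee (illustrative values only).
sales = {"Asha": 120, "Ravi": 95, "Meena": 150, "Kiran": 95}

values = list(sales.values())
average_sales = mean(values)      # average sales figure per employee
median_sales = median(values)     # middle of the distribution
lowest, highest = min(values), max(values)  # spread of the distribution

print(f"average={average_sales}, median={median_sales}, range={lowest}-{highest}")
```

The average summarises the group in one number, while the median and range give a rough picture of how the sales are distributed.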
Process of data analysis:
• Data analysis is the process of cleaning, transforming,
and processing raw data to extract
actionable, relevant information that helps
a business make decisions.
Steps of data analysis:
✓ Defining the question
✓ Collecting the data
✓ Cleaning the data
✓ Analyzing the data
✓ Sharing your results
✓ Embracing failure
✓ Summary
Stages of data analysis:
➢ Data collection: decide what type of data you need to find the
answer.
➢ Cleaning the data: data cleaning converts raw data
into data that is suitable for analysis.
➢ Analysis of data: you now have a wealth of data that is as
organized as it will ever be. Let's explore the
steps in a standard data analysis.
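The cleaning stage can be sketched as follows. This is only an illustration in plain Python: the raw records, the clean() helper, and its rules (strip spaces, drop rows with missing values, remove duplicates) are assumptions chosen for the example, not a fixed recipe.

```python
# Hypothetical raw records: stray spaces, a missing value, and a duplicate.
raw_rows = [
    {"name": " Asha ", "sales": "120"},
    {"name": "Ravi", "sales": ""},
    {"name": "Meena", "sales": "150"},
    {"name": " Asha ", "sales": "120"},
]

def clean(rows):
    """Strip whitespace, drop rows with missing sales, remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        if not row["sales"]:          # missing value: skip the row
            continue
        record = (name, int(row["sales"]))
        if record in seen:            # duplicate: skip the row
            continue
        seen.add(record)
        cleaned.append({"name": name, "sales": record[1]})
    return cleaned

cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```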
Data analysis steps and techniques:
1. Descriptive analysis
2. Diagnostic analysis
3. Exploratory analysis
4. Predictive analysis
5. Prescriptive analysis
Packages in Python:
• A Python module may contain several classes, functions,
variables, etc., whereas a Python package can contain several
modules. In simpler terms, a package is a folder that contains
various modules as files.
• Creating a package: let's create a package named mypckg that
will contain two modules, mod1 and mod2. To create this
package, follow the steps below:
• Create a folder named mypckg.
• Inside this folder, create an empty Python file named __init__.py.
• Then create two modules, mod1.py and mod2.py, in this folder.
The hierarchy of our package looks
like this:
mypckg
|
|---__init__.py
|
|---mod1.py
|
|---mod2.py
Import Modules from a Package
➢ We can import these modules using the import statement or the
from…import statement, together with the dot (.) operator.
Syntax: import package_name.module_name
or: from package_name import module_name
➢ __init__.py helps the Python interpreter recognise the
folder as a package. It can also specify which resources are
made available when the package is imported. If __init__.py is
empty, nothing is imported automatically, and each module
must be imported explicitly (for example, from mypckg import mod1).
We can also import selected functions from each
module inside __init__.py to make them available directly
from the package.
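Example:
• Import modules from the package
from mypckg import mod1
from mypckg import mod2
mod1.gfg()
res = mod2.sum(1, 2)
print(res)
Output (assuming mod1.gfg() prints a welcome message and mod2.sum() returns the sum of its arguments):
Welcome to GFG
3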
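The whole package workflow can be tried end to end with the sketch below. It builds mypckg in a temporary folder so that it is self-contained. The bodies of mod1.gfg() and mod2.sum() are assumptions, since the slides do not show the contents of the two modules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build the package in a temporary folder so the sketch is self-contained.
root = Path(tempfile.mkdtemp())
pkg = root / "mypckg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")  # empty file marks the folder as a package

# Assumed module contents (not given in the slides).
(pkg / "mod1.py").write_text('def gfg():\n    print("Welcome to GFG")\n')
(pkg / "mod2.py").write_text("def sum(a, b):\n    return a + b\n")

sys.path.insert(0, str(root))   # make the temporary folder importable
importlib.invalidate_caches()

from mypckg import mod1, mod2

mod1.gfg()
res = mod2.sum(1, 2)
print(res)
```

In a real project the folder would sit next to the script (or be installed), so the sys.path line would not be needed.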
Regular Expressions:
❖ A regular expression (RegEx) is a special sequence of
characters that defines a search pattern for finding a string or
set of strings. It can detect the presence or absence of
text by matching it against a particular pattern, and it
can also split a pattern into one or more sub-patterns. Python
provides the re module for working with regular expressions.
Its primary function is search(), which
takes a regular expression and a string and
returns either the first match or None.
Meta characters:
• To understand regular expressions, metacharacters are
useful and important, and they are used in the functions of
the re module. Below is the list of metacharacters.
• \ Drops the special meaning of the character
following it
• [] Represents a character class
• ^ Matches the beginning
• $ Matches the end
• . Matches any character except newline
Meta characters:
➢ | Means OR (matches any of the alternatives
separated by it)
➢ ? Matches zero or one occurrence
➢ * Matches any number of occurrences (including 0
occurrences)
➢ + Matches one or more occurrences
➢ {} Indicates the number of occurrences of the preceding
regex to match
➢ () Encloses a group of regex
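The metacharacters listed above can be tried out with the re module. The sample strings below are made up for illustration; each line exercises one metacharacter.

```python
import re

text = "cat bat rat\ncart 3.14"   # hypothetical sample text

starts_with_cat = bool(re.search(r"^cat", text))   # ^ anchors at the beginning
ends_with_digits = bool(re.search(r"\d+$", text))  # $ anchors at the end
char_class = re.findall(r"[cb]at", text)           # [] character class: c or b
any_char = re.findall(r"c.t", text)                # . any character except newline
optional_r = re.findall(r"car?t", text)            # ? zero or one 'r'
alternation = re.findall(r"bat|rat", text)         # | means OR
escaped_dot = re.findall(r"\d\.\d+", text)         # \. is a literal dot
two_digits = re.findall(r"\d{2}", text)            # {} exact number of repeats
groups = re.findall(r"(\d)\.(\d+)", text)          # () capture groups

star = re.findall(r"ca*t", "ct cat caat")          # * zero or more 'a'
plus = re.findall(r"ca+t", "ct cat caat")          # + one or more 'a'

print(char_class, optional_r, alternation, escaped_dot, groups, star, plus)
```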
Functions:
• Function - Description
• findall - Returns a list containing all matches
• search - Returns a Match object if there is a match
anywhere in the string
• split - Returns a list where the string has been
split at each match
• sub - Replaces one or many matches with a string
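A short sketch of the four functions in use. The sample sentence is hypothetical; the function names and argument order are those of Python's re module.

```python
import re

text = "Order 12 shipped on 2024-05-01, order 7 pending"  # hypothetical sample

numbers = re.findall(r"\d+", text)     # list of every run of digits
first = re.search(r"\d+", text)        # Match object for the first run of digits
missing = re.search(r"refund", text)   # no match anywhere, so this is None
parts = re.split(r",\s*", text)        # split at each comma (and trailing spaces)
masked = re.sub(r"\d", "#", text)      # replace every digit with '#'

print(numbers)
print(first.group(), first.start())
print(parts)
print(masked)
```

Because search() returns None when nothing matches, it is usual to check the result before calling group() on it.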
MATPLOTLIB
• Matplotlib is a visualization library in Python for
2D plots of arrays.
• Matplotlib is a multi-platform data visualization library
built on NumPy arrays and designed to work with the
broader SciPy stack. It was introduced by John Hunter in
the year 2002.
• Matplotlib supports several plot types, such as line, bar,
scatter, and histogram.
• Importing the matplotlib library (either form works):
✓ from matplotlib import pyplot as plt
✓ import matplotlib.pyplot as plt
Line plot:
# import matplotlib module
from matplotlib import pyplot as plt
# x-axis values
x = [1, 3, 5, 7, 9]
# y-axis values
y = [1, 2, 4, 5, 8]
# function to plot
plt.plot(x, y)
# function to show the plot
plt.show()
Bar plot:
# importing matplotlib module
from matplotlib import pyplot as plt
# x-axis values
x = [10, 20, 30, 40, 50]
# y-axis values
y = [22, 48, 59, 62, 100]
# function to plot the bars (red, 3 units wide)
plt.bar(x, y, color='r', width=3)
# function to show the plot
plt.show()
Histogram:
# importing matplotlib module
from matplotlib import pyplot as plt
# data values to be grouped into bins
y = [1, 3, 5, 7, 9]
# function to plot the histogram
plt.hist(y)
# function to show the plot
plt.show()
Scatter plot:
# importing matplotlib module
from matplotlib import pyplot as plt
# x-axis values
x = [5, 2, 9, 4, 7, 6]
# y-axis values
y = [10, 5, 8, 4, 2, 6.5]
# function to plot scatter
plt.scatter(x, y)
# function to show the plot
plt.show()
Legends:
• Legends allow us to distinguish between plots. With legends,
you can use label texts to identify or differentiate one plot from
another. For example, say we have a figure with a plot like the one
below:
from matplotlib import pyplot as plt
x_axis = [34, 67, 89, 90]
y_axis = [45, 37, 80, 95]
# the label is the text shown in the legend
plt.plot(x_axis, y_axis, label="Graph")
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("Linear Graph")
plt.legend(loc="upper left")
plt.show()
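A legend becomes more useful once a figure holds more than one plot. The sketch below draws two labelled lines on the same axes; the data values are made up, and the "Agg" backend is selected only as an assumption so that the code also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to open a window
from matplotlib import pyplot as plt

x = [1, 2, 3, 4]                                 # illustrative values
plt.figure()
plt.plot(x, [1, 4, 9, 16], label="squares")      # first plot and its label
plt.plot(x, [1, 8, 27, 64], label="cubes")       # second plot and its label
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("Two labelled plots")
legend = plt.legend(loc="upper left")            # one legend entry per label
legend_labels = [t.get_text() for t in legend.get_texts()]
print(legend_labels)
# plt.show() would display the figure in an interactive session
```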
Functional Approach:
Others:
➢ Matplotlib lets us easily create multiple plots on the
same figure using the plt.subplot() function. This function
takes three parameters, namely:
➢ nrows: the number of rows the figure should have.
➢ ncols: the number of columns the figure should have.
➢ plot_number: refers to a specific plot in the
figure, counted from 1.
Using plt.subplot() we can create two plots on
the same canvas.
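A minimal sketch of two plots on one canvas, reusing the line-plot data from earlier. Matplotlib's documentation calls the third argument index rather than plot_number, and the "Agg" backend is an assumption made only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to open a window
from matplotlib import pyplot as plt

x = [1, 3, 5, 7, 9]
y = [1, 2, 4, 5, 8]

plt.figure()
plt.subplot(1, 2, 1)   # 1 row, 2 columns, select plot number 1 (left)
plt.plot(x, y)
plt.title("Line")
plt.subplot(1, 2, 2)   # same grid, select plot number 2 (right)
plt.bar(x, y)
plt.title("Bar")

axes = plt.gcf().get_axes()   # the two sub-plots of the current figure
print(len(axes))
# plt.show() would display the figure in an interactive session
```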
• Reading and displaying an image:
image = plt.imread("jupyter.png")
plt.imshow(image)
Stacking:
• What is stacking in NumPy?
✓ Stacking is the concept of joining arrays in NumPy. Arrays
with compatible shapes can be stacked, either along an existing
axis or along a new one, to build larger arrays
from smaller ones.
NumPy provides functions for stacking. We can perform
stacking along three directions:
• vstack() – stacks arrays vertically (row-wise).
• hstack() – stacks arrays horizontally
(column-wise).
• dstack() – stacks arrays depth-wise, along a third axis.
vstack():
We use the vstack() function to stack arrays sequentially in
vertical order, i.e. row-wise. It is most useful for arrays with up
to 3 dimensions. The vsplit() function does the reverse: it splits an
array vertically into a list of sub-arrays.
import numpy as np
a = np.array([1, 2, 3])
b = np.array([5, 6, 7])
np.vstack((a, b))
Output
array([[1, 2, 3],
       [5, 6, 7]])
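The vsplit() function mentioned above can be sketched as the reverse of vstack(). The arrays are the same ones used in the example; splitting into 2 parts is a choice made for the illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([5, 6, 7])

stacked = np.vstack((a, b))           # 2 rows, 3 columns
top, bottom = np.vsplit(stacked, 2)   # split back into 2 arrays of 1 row each

print(stacked.shape)
print(top, bottom)
```

Note that each piece returned by vsplit() is still two-dimensional (one row, three columns), not the original one-dimensional array.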
hstack():
• We use this function to stack arrays sequentially in
horizontal order, i.e. column-wise.
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
np.hstack((a, b))
Output
array([[1, 2, 5, 6],
       [3, 4, 7, 8]])
dstack():
• This function stacks arrays along depth. The
resulting array extends along a new third dimension.
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
np.dstack((a, b))
Output
array([[[1, 5],
        [2, 6]],
       [[3, 7],
        [4, 8]]])
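Comparing the shapes of the results is a quick way to see what each stacking function does. The sketch below uses the same two 2×2 arrays. np.stack() is not covered in the slides and is included only for comparison, as the NumPy function that always joins along a new axis.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

v = np.vstack((a, b))   # rows of b are placed under the rows of a
h = np.hstack((a, b))   # columns of b are placed beside the columns of a
d = np.dstack((a, b))   # a and b become layers along a third axis
s = np.stack((a, b))    # a and b are joined along a new leading axis

for name, arr in [("vstack", v), ("hstack", h), ("dstack", d), ("stack", s)]:
    print(name, arr.shape)
```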
Filtering on datasets:
• Filtering is an essential task when working with
datasets. It allows us to extract specific information
based on certain conditions. NumPy and Pandas are two
popular Python libraries that provide powerful tools for
filtering data.
• NumPy is a library for numerical computing in Python.
It provides efficient array operations, mathematical
functions, and linear algebra routines. Pandas is built on
top of NumPy and provides high-performance data
manipulation and analysis tools.
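In NumPy, filtering is usually done with a boolean mask: a condition on the array produces an array of True/False values, which is then used to select elements. A minimal sketch with hypothetical scores:

```python
import numpy as np

scores = np.array([35, 72, 88, 41, 95, 60])   # hypothetical values

mask = scores >= 60        # boolean array, True where the condition holds
passed = scores[mask]      # keep only the elements where mask is True

# Conditions are combined with & (and) or | (or), each wrapped in parentheses.
between = scores[(scores >= 40) & (scores < 90)]

print(mask)
print(passed)
print(between)
```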
Filtering:
• Filtering is a fundamental task in data analysis
and manipulation. NumPy and Pandas provide
powerful tools for filtering data efficiently. By
using vectorized operations (instead of Python loops)
and appropriate data types, we can improve
performance and handle large datasets with
ease.
• With the increasing demand for data-driven
insights in various industries, mastering
filtering techniques using NumPy and Pandas
can be a valuable skill for any data scientist or
analyst.
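The same idea carries over to Pandas, where a condition on a column selects rows of a DataFrame. The sketch below uses a small hypothetical table; the column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical dataset.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "Kiran"],
    "dept": ["CSE", "ECE", "CSE", "EEE"],
    "marks": [82, 58, 91, 67],
})

high = df[df["marks"] > 60]                                  # single condition
cse_high = df[(df["dept"] == "CSE") & (df["marks"] > 85)]    # two conditions
selected = df[df["dept"].isin(["ECE", "EEE"])]               # membership test
by_query = df.query("marks > 60 and dept == 'CSE'")          # query-string form

print(high)
print(cse_high)
print(selected)
print(by_query)
```

All four forms are vectorized: the condition is evaluated on the whole column at once rather than row by row in a Python loop.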
THANK YOU