Module 4

The document provides an introduction to Python libraries, emphasizing their importance in data analysis and visualization. It covers key libraries like Pandas, NumPy, Matplotlib, and their functionalities, including data manipulation and visualization techniques. Additionally, it outlines installation steps and basic usage examples for these libraries.

Uploaded by kiransam1709
Copyright © All Rights Reserved

Introduction to Python Libraries for Data Analysis and Visualization
Introduction to Python Libraries

• A Python library is a collection of related modules.
• It contains bundles of code that can be reused in different programs.
• It makes Python programming simpler and more convenient, because programmers don't need to write the same code again and again for different programs.
• Python libraries play a vital role in fields such as Machine Learning, Data Science, and Data Visualization.
Introduction to Python Libraries

• The Python Standard Library is distributed with Python itself. (The exact syntax and semantics of the language are defined separately, in the Python Language Reference.)
• It contains built-in modules that provide access to basic system functionality such as file I/O, as well as other core modules.
• Its built-in modules are written in the C programming language; many of its other modules are written in Python.
• The Python Standard Library consists of more than 200 core modules, which together provide ready-made solutions for many everyday programming tasks.
• The Python Standard Library plays a very important role: without it, programmers would have no ready-made access to system functionality such as file I/O and would have to write these common tools themselves.
Introduction to Python Libraries

Some of the commonly used libraries are:

1. TensorFlow
2. Matplotlib
3. Pandas
4. NumPy
5. Scikit-learn
6. math (part of the Python Standard Library)
and many more.
Introduction to Python Libraries

Example:
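A minimal version of such an example, using the standard-library math module (the value 16 is arbitrary):

```python
import math

number = 16
# sqrt() comes ready-made from the math library; we don't implement it ourselves
root = math.sqrt(number)
print(root)  # 4.0
```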

In the code above, we import the math library and use one of its functions, sqrt() (square root), without writing our own code to calculate the square root of a number. That is how a library makes the programmer's job easier.
How to install python libraries

Step 1:
How to install python libraries

Step 2:

Press Shift + Right Click


How to install python libraries

Step 3:

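In the command line, type: pip install library-name

In this case the library name is python-math. (The built-in math module used in the example above is part of the Python Standard Library and is installed together with Python, so it does not need pip.)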
Python Libraries for Data Analysis and Visualization

Data Analysis:
• Pandas: The cornerstone of data manipulation and analysis. It provides powerful data structures like
DataFrames, enabling you to efficiently clean, transform, and analyze data.
• NumPy: The foundation for numerical computing in Python. It offers high-performance multi-
dimensional arrays and mathematical functions, crucial for handling large datasets and performing
complex calculations.
• SciPy: Built on top of NumPy, SciPy provides advanced scientific and technical computing capabilities,
including statistical analysis, optimization, linear algebra, and more.
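A short sketch of what NumPy's multi-dimensional arrays look like in practice; the numbers are illustrative only. Arithmetic and statistics apply to the whole array at once, without explicit Python loops:

```python
import numpy as np

# A 2-D array: 2 rows, 3 columns
data = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(data.shape)        # (2, 3)
print(data * 10)         # element-wise multiplication, no loop needed
print(data.mean())       # 3.5, the mean of all six values
print(data.sum(axis=0))  # column sums
```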
Python Libraries for Data Analysis and Visualization

Data Visualization:
• Matplotlib: The granddaddy of Python visualization libraries. It offers a comprehensive set of plotting
functions for creating a wide variety of static, animated, and interactive visualizations.
• Seaborn: Built on top of Matplotlib, Seaborn simplifies the creation of visually appealing statistical
graphics. It provides a high-level interface for common statistical plots and integrates seamlessly with
Pandas DataFrames.
• Plotly: A powerful library for creating interactive and web-based visualizations. It supports a wide
range of chart types, including 3D plots, and allows you to easily embed visualizations in web
applications.
PANDAS and MATPLOTLIB

• Pandas is a Python library used for working with data sets.

• It has functions for analyzing, cleaning, exploring, and manipulating data.

• The name "Pandas" refers to both "panel data" and "Python data analysis". Pandas was created by Wes McKinney in 2008.

• Pandas allows us to analyze big data and draw conclusions based on statistical theories.

• Pandas can clean messy data sets and make them readable and relevant.
PANDAS and MATPLOTLIB

Installation of PANDAS: pip install pandas

Checking PANDAS Version:
import pandas
print(pandas.__version__)

Importing PANDAS: import pandas

Importing PANDAS as ALIAS: import pandas as pd


PANDAS: Series and Dataframes

Series:
A Pandas Series is like a column in a table.
It is a one-dimensional array holding data of any type.
PANDAS: Series and Dataframes

Create a simple Pandas Series from a list


If nothing else is specified, the values are labeled with their index number. First value has
index 0, second value has index 1 etc.
This label can be used to access a specified value.
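A minimal sketch of a Series built from a plain list (the values 1, 7, 2 are illustrative). Because no labels are given, pandas uses the default labels 0, 1, 2:

```python
import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a)  # default labels: 0, 1, 2
print(myvar)

# Access a value through its label
print(myvar[0])  # 1
```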
PANDAS: Series and Dataframes

With the index argument, you can name your own labels.
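The same data with custom labels passed through the index argument; the labels "x", "y", "z" are illustrative:

```python
import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a, index=["x", "y", "z"])
print(myvar)

# Values are now accessed by the custom label
print(myvar["y"])  # 7
```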
PANDAS: Series and Dataframes

Dataframe:
A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.
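A minimal sketch of building a DataFrame from a dictionary, where each key becomes a column; the column names and numbers are hypothetical:

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data)
print(df)
print(df.shape)  # (3, 2): 3 rows, 2 columns
```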
PANDAS: Series and Dataframes

Loc and iloc in Dataframe:

• loc is used to access a group of rows and columns by labels.
• iloc is used to access a group of rows and columns by integer position.

In both cases the row selector comes first and the column selector second: df.loc[row, column] and df.iloc[row, column].
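A small sketch contrasting the two indexers on a DataFrame with hypothetical row labels "day1" to "day3". One difference worth noticing: a label slice in loc includes its end label, while a position slice in iloc excludes its end position:

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

# loc: by label (row "day2", column "calories")
print(df.loc["day2", "calories"])  # 380

# iloc: by position (second row, first column)
print(df.iloc[1, 0])  # 380

# Slicing: the loc slice includes "day3"; the iloc slice stops before position 2
print(df.loc["day1":"day3", ["duration"]])  # 3 rows
print(df.iloc[0:2, 1])                      # 2 rows
```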
PANDAS: Importing and Exporting dataset

Reading and Writing Data to/from CSV Files


CSV (Comma-Separated Values) is a lightweight, easy-to-read format that is widely used for
storing data. Pandas provides robust functions for reading and writing CSV files.
Reading CSV Files
To load data from a CSV file into a Pandas DataFrame, use the pd.read_csv() function.
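A minimal sketch of pd.read_csv(). To keep it self-contained, an in-memory string (wrapped in io.StringIO) stands in for a real file; with a file on disk you would pass its path instead, e.g. pd.read_csv("data.csv"), where "data.csv" is a hypothetical name:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file
csv_text = """name,age,city
Alice,30,Paris
Bob,25,Berlin
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # "age" is parsed as an integer column
```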
PANDAS: Importing and Exporting dataset

Writing Data to CSV Files


To export a DataFrame back to a CSV file, use the DataFrame's to_csv() method, e.g. df.to_csv().
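A short round-trip sketch: write a DataFrame with to_csv() and read it back. The file name "people.csv" is hypothetical and lives in a temporary directory; index=False keeps the row labels out of the file:

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.csv")
    # to_csv() is called on the DataFrame itself
    df.to_csv(path, index=False)
    # Read the file back to check what was written
    restored = pd.read_csv(path)

print(restored)
```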
PANDAS: Dataset Data Manipulation

Pandas data manipulation is the process of cleaning, transforming, and aggregating data using the
Pandas library. Pandas provides a variety of functions for performing these tasks, making it a
powerful and versatile tool for data analysis.

Here are some of the most common Pandas data manipulation tasks:
• Data selection: Pandas provides a variety of tools for selecting data, such as the head() and tail() methods and the iloc[] and loc[] indexers. These allow you to select specific rows, columns, or subsets of data from a DataFrame.
• Data filtering: Pandas provides a variety of functions for filtering data, such as query(), drop(), and dropna(). These functions allow you to filter data based on specific criteria, such as values, labels, or missing values.
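A sketch of the selection and filtering tools listed above, on a small hypothetical product table that includes one missing price (NaN):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product": ["pen", "book", "lamp", "mug", "desk"],
    "price": [1.5, 12.0, 25.0, np.nan, 150.0],
    "qty": [10, 3, 2, 5, 1],
})

# Selection
print(df.head(2))                       # first two rows
print(df.tail(2))                       # last two rows
print(df.loc[:, ["product", "price"]])  # columns chosen by label
print(df.iloc[1:3])                     # rows at positions 1 and 2

# Filtering
cheap = df.query("price < 20")          # rows matching a condition (NaN never matches)
no_missing = df.dropna()                # rows containing NaN removed
no_qty = df.drop(columns=["qty"])       # a column removed by label
```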
PANDAS: Dataset Data Manipulation

• Data aggregation: Pandas provides a variety of functions for aggregating data, such as
mean(), median(), sum(), mode(), and count(). These functions allow you to calculate
summary statistics for groups of data. For example, you can calculate the total sales for each
product category or the average order value for each customer region.
• Data transformation: Pandas provides a variety of functions for transforming data, such as
map(), apply(), and replace(). These functions allow you to create new columns, modify
existing columns, and perform other transformations on data.
• Data Sorting: Pandas can sort data by any column with sort_values() or by the index with sort_index(). For example, you can sort a DataFrame by the customer's name or by the order date.
• Data Grouping: Pandas can group data by one or more columns or index levels with groupby(). For example, you can group a DataFrame by product category or by customer region.
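A sketch combining aggregation, grouping, transformation, and sorting on a small hypothetical sales table (the categories, regions, and amounts are made up for illustration):

```python
import pandas as pd

sales = pd.DataFrame({
    "category": ["toys", "books", "toys", "books", "games"],
    "region": ["north", "south", "south", "north", "north"],
    "amount": [20, 15, 30, 25, 40],
})

# Aggregation: a summary statistic over one column
print(sales["amount"].mean())  # 26.0

# Grouping + aggregation: total sales for each product category
totals = sales.groupby("category")["amount"].sum()
print(totals)

# Transformation: map() translates values, apply() runs a function on each value
sales["region_code"] = sales["region"].map({"north": "N", "south": "S"})
sales["order_size"] = sales["amount"].apply(lambda a: "large" if a >= 30 else "small")
sales["category"] = sales["category"].replace("games", "toys")

# Sorting: largest sale first
by_amount = sales.sort_values("amount", ascending=False)
print(by_amount)
```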
PANDAS: Dataset Data Manipulation

• Merging: Pandas can merge two or more DataFrames with pd.merge() (or the DataFrame.merge() method), matching rows on one or more common key columns. For example, you can merge a DataFrame of customer data with a DataFrame of order data to create a single DataFrame that contains all of the information for each customer.
• Joining: The DataFrame.join() method also combines DataFrames on a common key; by default it aligns them on their index, and with the on parameter it matches a column of the calling DataFrame against the other DataFrame's index. For example, you can join a DataFrame of customer data with a DataFrame of product data to create a single DataFrame that contains all of the information for each customer and the products they have ordered.
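A sketch of both approaches on hypothetical customer and order tables that share a customer_id key. merge() matches on the key column directly; join() aligns on the index, so the key is made the index first:

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Asha", "Ben", "Chen"],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 3],
    "product": ["pen", "book", "lamp"],
})

# merge(): keep only customers that have at least one order ("inner")
merged = pd.merge(customers, orders, on="customer_id", how="inner")
print(merged)

# join(): keep every customer ("left"); Ben (id 2) gets NaN for product
joined = customers.set_index("customer_id").join(
    orders.set_index("customer_id"), how="left"
)
print(joined)
```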
PANDAS: Dataset Data Manipulation - Extras

• Use the head() and tail() functions to preview the data before you start manipulating it. This will help you to
identify any errors or inconsistencies in the data.
• Use the info() function to get information about the DataFrame, such as the data types of the columns and the
number of rows and columns in the DataFrame. This information can be helpful when choosing the
appropriate functions to use for data manipulation.
• Use the describe() function to calculate summary statistics for the data. This can help you to understand the
distribution of the data and to identify any outliers.
• Use the groupby() function to group the data by one or more columns. This can be useful for performing
aggregate operations on the data, such as calculating summary statistics or finding the most common values
in a column.
• Use the apply() function to apply a function to each row or column of the DataFrame. This can be useful for
performing transformations on the data, such as creating new columns or modifying existing columns.
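A short sketch of the inspection helpers above, plus apply() run on each column; the data is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "price": [1.5, 12.0, 25.0, 8.0],
    "qty": [10, 3, 2, 5],
})

print(df.head(2))      # preview the first rows
df.info()              # prints dtypes, non-null counts, rows and columns
stats = df.describe()  # count, mean, std, min, quartiles, max per numeric column
print(stats)

# apply() runs the function once per column (the default axis=0)
ranges = df.apply(lambda col: col.max() - col.min())
print(ranges)
```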
MATPLOTLIB: Use of Data Visualization

Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.
Matplotlib was created by John D. Hunter.
Matplotlib is open source and we can use it freely.
Matplotlib is mostly written in Python; a few segments are written in C, Objective-C, and JavaScript for platform compatibility.

Installation: pip install matplotlib

Import: import matplotlib


MATPLOTLIB: Use of Data Visualization

Most of the Matplotlib utilities lie under the pyplot submodule and are usually imported under the plt alias:
import matplotlib.pyplot as plt
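A minimal first plot with pyplot, drawing a line between two hypothetical points. The matplotlib.use("Agg") line is only there so the sketch also runs without a display; in an interactive session, leave it out and call plt.show() to open the window:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this and call plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([0, 6])
ypoints = np.array([0, 250])

# Draws a straight line from (0, 0) to (6, 250)
lines = plt.plot(xpoints, ypoints)
```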
MATPLOTLIB

Markers: You can use the keyword argument marker to emphasize each point with a specified marker:
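A minimal sketch of the marker argument, marking every point with a circle ("o"); the y-values are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() interactively instead
import matplotlib.pyplot as plt

ypoints = [3, 8, 1, 10]

# With only y-values, x defaults to 0, 1, 2, 3
line, = plt.plot(ypoints, marker="o")
print(line.get_marker())  # o
```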
MATPLOTLIB

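Format Strings: You can also use the shortcut string notation parameter, called fmt, to specify the marker, line style, and color in one string, written with the syntax marker|line|color (for example 'o:r' for circle markers, a dotted line, and red).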
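A sketch of the 'o:r' format string, which sets marker, line style, and color in a single argument:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() interactively instead
import matplotlib.pyplot as plt

ypoints = [3, 8, 1, 10]

# "o" = circle marker, ":" = dotted line, "r" = red
line, = plt.plot(ypoints, "o:r")
```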


MATPLOTLIB

Marker Size: You can use the keyword argument markersize or the shorter version, ms to set the
size of the markers:
MATPLOTLIB

Marker Color: You can use the keyword argument markeredgecolor, or the shorter mec, to set the color of the edge of the markers, and markerfacecolor, or the shorter mfc, to set the color inside the edge. Use both the mec and mfc arguments to color the entire marker.
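A sketch of the marker size and color shortcuts together: ms enlarges the markers, and setting both mec and mfc to red colors the whole marker:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() interactively instead
import matplotlib.pyplot as plt

ypoints = [3, 8, 1, 10]

line, = plt.plot(ypoints, marker="o", ms=20, mec="r", mfc="r")
```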
MATPLOTLIB

Linestyle: You can use the keyword argument linestyle, or shorter ls, to change the style of the
plotted line:
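A sketch of linestyle and its short form ls; "dotted" and "--" (dashed) are two of the accepted values:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() interactively instead
import matplotlib.pyplot as plt

# Long form with a style name
dotted, = plt.plot([3, 8, 1, 10], linestyle="dotted")
# Short form with the equivalent symbol for "dashed"
dashed, = plt.plot([10, 1, 8, 3], ls="--")
```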
MATPLOTLIB

Create Labels for a Plot: With Pyplot, you can use the xlabel() and ylabel() functions to set a
label for the x- and y-axis.

Create a Title for a Plot: With Pyplot, you can use the title() function to set a title for the plot.

Set Font Properties for Title and Labels: You can use the fontdict parameter in xlabel(), ylabel(),
and title() to set font properties for the title and labels.

Position the Title: You can use the loc parameter in title() to position the title. Legal values are:
'left', 'right', and 'center'. Default value is 'center'.
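A sketch that uses xlabel(), ylabel(), title(), fontdict, and loc together. The data and label text are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() interactively instead
import matplotlib.pyplot as plt

x = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
y = [240, 250, 260, 270, 280, 290, 300, 310, 320, 330]

font1 = {"family": "serif", "color": "blue", "size": 20}
font2 = {"family": "serif", "color": "darkred", "size": 15}

plt.figure()
plt.plot(x, y)
plt.title("Sports Watch Data", fontdict=font1, loc="left")  # title on the left
plt.xlabel("Average Pulse", fontdict=font2)
plt.ylabel("Calorie Burnage", fontdict=font2)
ax = plt.gca()
```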
MATPLOTLIB

Display Multiple Plots: With the subplot() function you can draw multiple plots in one figure:
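A sketch of subplot(rows, columns, index), placing two plots side by side in one figure; the data is illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() interactively instead
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]

fig = plt.figure()

plt.subplot(1, 2, 1)  # 1 row, 2 columns, first plot
plt.plot(x, [3, 8, 1, 10])
plt.title("Plot 1")

plt.subplot(1, 2, 2)  # 1 row, 2 columns, second plot
plt.plot(x, [10, 20, 30, 40])
plt.title("Plot 2")

plt.suptitle("Two plots in one figure")  # title for the whole figure
```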
MATPLOTLIB: Basic Plots and Customizing for Effective Visualization

Creating Scatter Plots: With Pyplot, you can use the scatter() function to draw a scatter plot.
The scatter() function plots one dot for each observation. It needs two arrays of the same length,
one for the values of the x-axis, and one for values on the y-axis:
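A minimal scatter plot with two illustrative arrays of equal length (13 values each), giving one dot per (x, y) pair:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() interactively instead
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

plt.figure()
points = plt.scatter(x, y)
```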
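MATPLOTLIB: Basic Plots and Customizing for Effective Visualization

Providing Colors in Scatter Plots: You can set one color for all dots with the color argument, or one color per dot by passing an array of colors to the c argument. The array of colors must have the same length as the arrays for the x- and y-axis.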
MATPLOTLIB: Basic Plots and Customizing for Effective Visualization
Size: You can change the size of the dots with the s argument. Just like colors, make sure the array
for sizes has the same length as the arrays for the x- and y-axis:
MATPLOTLIB: Basic Plots and Customizing for Effective Visualization

Alpha: You can adjust the transparency of the dots with the alpha argument, a value between 0 (fully transparent) and 1 (fully opaque).
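A sketch combining the three scatter options: c with one color per dot, s with one size per dot, and a single alpha for every dot. All values are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() interactively instead
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 2]
y = [99, 86, 87, 88, 111]
colors = ["red", "green", "blue", "purple", "orange"]  # same length as x and y
sizes = [20, 50, 100, 200, 500]                         # same length as x and y

plt.figure()
points = plt.scatter(x, y, c=colors, s=sizes, alpha=0.5)
```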
MATPLOTLIB: Basic Plots and Customizing for Effective Visualization

Creating Bars: With Pyplot, you can use the bar() function to draw bar graphs:
MATPLOTLIB: Basic Plots and Customizing for Effective Visualization
Horizontal Bars: If you want the bars to be displayed horizontally instead of vertically, use
the barh() function:
MATPLOTLIB: Basic Plots and Customizing for Effective Visualization
Bar Width: The bar() function takes the keyword argument width to set the width of the bars. The default width is 0.8.
The barh() function takes the keyword argument height to set the height of the bars. The default height is also 0.8.
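A sketch showing bar() and barh() side by side, each with bars narrower than the 0.8 default; the categories and values are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() interactively instead
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]
values = [3, 8, 1, 10]

plt.figure()

plt.subplot(1, 2, 1)
bars = plt.bar(categories, values, width=0.4)      # vertical bars use width

plt.subplot(1, 2, 2)
hbars = plt.barh(categories, values, height=0.4)   # horizontal bars use height
```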
MATPLOTLIB: Basic Plots and Customizing for Effective Visualization
Create Histogram: In Matplotlib, we use the hist() function to create histograms.
The hist() function takes an array of numbers as an argument and creates a histogram from it by counting how many values fall into each bin.
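A sketch of hist() on 250 randomly generated values (normally distributed around 170, seeded so the run is repeatable). hist() returns the count in each bin, the bin edges, and the drawn bars; with no bins argument it uses 10 bins:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() interactively instead
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
data = rng.normal(170, 10, 250)  # 250 values, mean 170, standard deviation 10

plt.figure()
counts, bin_edges, patches = plt.hist(data)
```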
MATPLOTLIB: Basic Plots and Customizing for Effective Visualization

Creating Pie Charts: With Pyplot, you can use the pie() function to draw pie charts:
MATPLOTLIB: Basic Plots and Customizing for Effective Visualization

Explode: Maybe you want one of the wedges to stand out? The explode parameter allows you to do that. The explode parameter, if specified and not None, must be an array with one value for each wedge. Each value represents how far from the center the wedge is displayed, as a fraction of the radius.
MATPLOTLIB: Basic Plots and Customizing for Effective Visualization
Legend: To add a list of explanations for each wedge, use the legend() function.
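A sketch that brings the pie chart options together: pie() with labels, explode pulling the first wedge out by 0.2 of the radius, and legend() listing the wedge labels. The fruit names and shares are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() interactively instead
import matplotlib.pyplot as plt

y = [35, 25, 25, 15]
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
myexplode = [0.2, 0, 0, 0]  # one value per wedge; only "Apples" stands out

plt.figure()
wedges, texts = plt.pie(y, labels=mylabels, explode=myexplode)
legend = plt.legend(title="Four Fruits:")
```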
Thank You!
