Introduction to Python Libraries for Data Analysis and Visualization
Introduction to Python Libraries
• A Python library is a collection of related modules.
• It contains bundles of code that can be used repeatedly in different programs.
• It makes Python programming simpler and more convenient, because we don’t need to write
the same code again and again for different programs.
• Python libraries play a vital role in fields such as Machine Learning, Data Science, and
Data Visualization.
• The Python Standard Library is distributed with Python itself. The exact syntax, semantics,
and tokens of the language are described separately, in the Python Language Reference.
• It contains built-in modules that provide access to basic system functionality like I/O,
along with other core modules.
• Many standard library modules are written in Python; the built-in, performance-critical
modules are written in the C programming language.
• The Python Standard Library consists of more than 200 core modules. Together they cover
many everyday programming tasks.
• The Python Standard Library plays a very important role. Without it, programmers would
have to write common functionality, such as file I/O, themselves.
Some of the commonly used libraries are:
1. TensorFlow
2. Matplotlib
3. Pandas
4. NumPy
5. Scikit-learn
6. math (part of the Python Standard Library)
And many more.
Example:
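A minimal sketch of such an example. The input value 25 is only an illustration and is not taken from the original slide.

```python
import math

# sqrt() is provided by the math library, so we do not write the algorithm ourselves
root = math.sqrt(25)
print(root)
```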
In the code above, we imported the math library and used one of its functions, sqrt
(square root), without writing the actual code to calculate the square root of a
number. That’s how a library makes the programmer’s job easier.
How to install Python libraries
Step 1:
Step 2:
Press Shift + Right Click
Step 3:
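In the command line, type: pip install library-name
Replace library-name with the name of the package you want, for example pandas. The math module
used in the earlier example is part of the Python Standard Library, so it needs no installation.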
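After installing, you may want to confirm that Python can find the library. One possible check is sketched below; the helper name is_installed is hypothetical and chosen only for this illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# math ships with Python, so this check succeeds without any pip install
print(is_installed("math"))
```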
Python Libraries for Data Analysis and Visualization
Data Analysis:
• Pandas: The cornerstone of data manipulation and analysis. It provides powerful data structures like
DataFrames, enabling you to efficiently clean, transform, and analyze data.
• NumPy: The foundation for numerical computing in Python. It offers high-performance multi-
dimensional arrays and mathematical functions, crucial for handling large datasets and performing
complex calculations.
• SciPy: Built on top of NumPy, SciPy provides advanced scientific and technical computing capabilities,
including statistical analysis, optimization, linear algebra, and more.
Data Visualization:
• Matplotlib: The granddaddy of Python visualization libraries. It offers a comprehensive set of plotting
functions for creating a wide variety of static, animated, and interactive visualizations.
• Seaborn: Built on top of Matplotlib, Seaborn simplifies the creation of visually appealing statistical
graphics. It provides a high-level interface for common statistical plots and integrates seamlessly with
Pandas DataFrames.
• Plotly: A powerful library for creating interactive and web-based visualizations. It supports a wide
range of chart types, including 3D plots, and allows you to easily embed visualizations in web
applications.
PANDAS and MATPLOTLIB
• Pandas is a Python library used for working with data sets.
• It has functions for analyzing, cleaning, exploring, and manipulating data.
• The name "Pandas" is derived from "panel data", an econometrics term for multidimensional
data sets. The library was created by Wes McKinney in 2008.
• Pandas allows us to analyze large data sets and draw conclusions based on statistical
theories.
• Pandas can clean messy data sets and make them readable and relevant.
Installation of PANDAS: pip install pandas
Checking PANDAS Version:
    import pandas
    print(pandas.__version__)
Importing PANDAS: import pandas
Importing PANDAS as ALIAS: import pandas as pd
PANDAS: Series and Dataframes
Series:
A Pandas Series is like a column in a table.
It is a one-dimensional array holding data of any type.
Create a simple Pandas Series from a list
If nothing else is specified, the values are labeled with their index number. First value has
index 0, second value has index 1 etc.
This label can be used to access a specified value.
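A minimal sketch of creating a Series from a list and reading a value by its default label. The list values are illustrative.

```python
import pandas as pd

a = [1, 7, 2]  # illustrative values

# With no index given, the labels are 0, 1, 2
myvar = pd.Series(a)
print(myvar)

# Access the first value through its label
print(myvar[0])
```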
With the index argument, you can name your own labels.
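A short sketch of custom labels. The labels "x", "y", and "z" are assumptions made for this illustration.

```python
import pandas as pd

# The index argument replaces the default 0, 1, 2 labels
myvar = pd.Series([1, 7, 2], index=["x", "y", "z"])
print(myvar)

# A value is now accessed through its custom label
print(myvar["y"])
```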
DataFrame:
A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a
table with rows and columns.
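A minimal sketch of building a DataFrame from a dictionary, where each key becomes a column. The column names and values are illustrative.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}

# Each dictionary key becomes a column; each list position becomes a row
df = pd.DataFrame(data)
print(df)
```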
loc and iloc in DataFrame:
loc is used to access a group of rows and columns by labels; iloc does the same by integer position.
Both take the row selector first and the column selector second: df.loc[row, column] and
df.iloc[row, column].
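A small sketch contrasting the two indexers. The row labels "day1" to "day3" and the data are assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

# loc: select by row label and column label
by_label = df.loc["day2", "calories"]

# iloc: select by row position and column position (both start at 0)
by_position = df.iloc[0, 1]

# Slices work with both; iloc excludes the end position
first_two_rows = df.iloc[0:2]

print(by_label, by_position)
print(first_two_rows)
```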
PANDAS: Importing and Exporting dataset
Reading and Writing Data to/from CSV Files
CSV (Comma-Separated Values) is a lightweight, easy-to-read format that is widely used for
storing data. Pandas provides robust functions for reading and writing CSV files.
Reading CSV Files
To load data from a CSV file into a Pandas DataFrame, use the pd.read_csv() function
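A minimal sketch of pd.read_csv(). To keep it self-contained, the CSV content is held in a string and wrapped in io.StringIO; with a real file you would pass its path instead. The column names and values are illustrative.

```python
import io

import pandas as pd

csv_text = "name,age\nAsha,31\nBen,27\n"  # illustrative CSV content

# read_csv accepts a file path or a file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```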
Writing Data to CSV Files
To export a DataFrame back to a CSV file, use the DataFrame's to_csv() method, for example df.to_csv()
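A short sketch of exporting. When no path is given, to_csv() returns the CSV text as a string, which keeps this example self-contained; the file name in the comment is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ben"], "age": [31, 27]})  # illustrative data

# index=False leaves the row labels out of the output
csv_text = df.to_csv(index=False)
print(csv_text)

# To write a file instead, pass a path (hypothetical name):
# df.to_csv("people.csv", index=False)
```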
PANDAS: Dataset Data Manipulation
Pandas data manipulation is the process of cleaning, transforming, and aggregating data using the
Pandas library. Pandas provides a variety of functions for performing these tasks, making it a
powerful and versatile tool for data analysis.
Here are some of the most common Pandas data manipulation tasks:
• Data selection: Pandas provides a variety of tools for selecting data, such as the head() and tail()
methods and the iloc[] and loc[] indexers. These allow you to select specific rows, columns, or
subsets of data from a DataFrame.
• Data filtering: Pandas provides a variety of functions for filtering data, such as query(),
drop(), and dropna(). These functions allow you to filter data based on specific criteria, such as
values, data types, or missing values.
• Data aggregation: Pandas provides a variety of functions for aggregating data, such as
mean(), median(), sum(), mode(), and count(). These functions allow you to calculate
summary statistics for groups of data. For example, you can calculate the total sales for each
product category or the average order value for each customer region.
• Data transformation: Pandas provides a variety of functions for transforming data, such as
map(), apply(), and replace(). These functions allow you to create new columns, modify
existing columns, and perform other transformations on data.
• Data Sorting: Pandas can be used to sort data by any column or index. For example, you can
sort a DataFrame by the customer's name or by the order date.
• Data Grouping: Pandas can be used to group data by any column or index. For example, you
can group a DataFrame by product category or by customer region.
• Merging: The merge() function combines two DataFrames on one or more common columns. For
example, you can merge a DataFrame of customer data with a DataFrame of order data to create a
single DataFrame that contains all of the information for each customer.
• Joining: The join() method combines DataFrames on their index by default, or on a key column
given through its on parameter. For example, you can join a DataFrame of customer data with a
DataFrame of product data to create a single DataFrame that contains all of the information for
each customer and the products they have ordered.
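The tasks listed above can be sketched on a small example. The two DataFrames, their column names, and their values are assumptions made for this illustration.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "customer_id": [10, 11, 10, 12, 11],
    "category": ["toys", "books", "toys", "games", "books"],
    "amount": [20.0, 15.0, None, 30.0, 25.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["north", "south", "north"],
})

# Selection: loc slices include the end label, iloc slices exclude the end position
amounts_by_label = orders.loc[0:2, "amount"]
amounts_by_pos = orders.iloc[0:2, 3]

# Filtering: drop rows with a missing amount, then keep amounts above 18
complete = orders.dropna(subset=["amount"])
large = complete.query("amount > 18")

# Transformation: derive a new column with apply()
complete = complete.assign(
    amount_with_tax=complete["amount"].apply(lambda a: round(a * 1.1, 2))
)

# Sorting: largest amount first
by_amount = complete.sort_values("amount", ascending=False)

# Grouping and aggregation: total amount per category
totals = complete.groupby("category")["amount"].sum()

# Merging on a common column
merged = complete.merge(customers, on="customer_id", how="left")

# Joining: match the customer_id column against the other frame's index
joined = complete.join(customers.set_index("customer_id"), on="customer_id")

print(totals)
print(merged)
```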
PANDAS: Dataset Data Manipulation - Extras
• Use the head() and tail() functions to preview the data before you start manipulating it. This will help you to
identify any errors or inconsistencies in the data.
• Use the info() function to get information about the DataFrame, such as the data types of the columns and the
number of rows and columns in the DataFrame. This information can be helpful when choosing the
appropriate functions to use for data manipulation.
• Use the describe() function to calculate summary statistics for the data. This can help you to understand the
distribution of the data and to identify any outliers.
• Use the groupby() function to group the data by one or more columns. This can be useful for performing
aggregate operations on the data, such as calculating summary statistics or finding the most common values
in a column.
• Use the apply() function to apply a function to each row or column of the DataFrame. This can be useful for
performing transformations on the data, such as creating new columns or modifying existing columns.
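A brief sketch of these helper functions in use. The DataFrame and its column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "south", "north", "south", "north"],
    "sales": [100, 150, 120, 130, 500],
})

print(df.head(3))  # first three rows
print(df.tail(2))  # last two rows
df.info()          # column data types and row count

# Summary statistics for the numeric columns
summary = df.describe()

# Group by a column, then aggregate
mean_by_region = df.groupby("region")["sales"].mean()

# apply() with axis=1 calls the function once per row
df["label"] = df.apply(lambda row: f"{row['region']}:{row['sales']}", axis=1)

print(summary)
print(mean_by_region)
```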
MATPLOTLIB: Use of Data Visualization
Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.
Matplotlib was created by John D. Hunter.
Matplotlib is open source and we can use it freely.
Matplotlib is mostly written in Python; a few segments are written in C, Objective-C, and
JavaScript for platform compatibility.
Installation: pip install matplotlib
Import: import matplotlib
Most of the Matplotlib utilities lie under the pyplot submodule, and are usually imported under
the plt alias:
import matplotlib.pyplot as plt
MATPLOTLIB
Markers: You can use the keyword argument marker to emphasize each point with a specified marker:
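A minimal sketch of the marker argument. The data values are illustrative, and the Agg backend is selected only so the code runs without a display; in an interactive session you would call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

ypoints = [3, 8, 1, 10]  # illustrative values

# marker="o" draws a circle at every data point
(line,) = plt.plot(ypoints, marker="o")
```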
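Format Strings: You can use the shortcut string notation parameter fmt to specify the marker, line
style, and color in one string, written as marker|line|color (for example 'o:r'):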
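A short sketch of a format string, assuming illustrative data: "o" selects circle markers, ":" a dotted line, and "r" the color red.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

ypoints = [3, 8, 1, 10]  # illustrative values

# Format string: marker "o", line style ":", color "r"
(line,) = plt.plot(ypoints, "o:r")
```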
Marker Size: You can use the keyword argument markersize or the shorter version, ms to set the
size of the markers:
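A minimal sketch of the marker size argument; the size 20 and the data are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

ypoints = [3, 8, 1, 10]  # illustrative values

# ms is the short form of markersize
(line,) = plt.plot(ypoints, marker="o", ms=20)
```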
Marker Color: You can use the keyword argument markeredgecolor or the shorter mec to set the
color of the edge of the markers, and markerfacecolor or the shorter mfc to set the color inside
the edge. Use both the mec and mfc arguments to color the entire marker.
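A short sketch that sets both marker colors. The colors and data are illustrative; mfc is the short form of markerfacecolor.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

ypoints = [3, 8, 1, 10]  # illustrative values

# mec colors the marker edge, mfc colors the marker interior
(line,) = plt.plot(ypoints, marker="o", ms=20, mec="r", mfc="r")
```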
Linestyle: You can use the keyword argument linestyle, or shorter ls, to change the style of the
plotted line:
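A minimal sketch of the line style argument, assuming illustrative data and a dotted line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

ypoints = [3, 8, 1, 10]  # illustrative values

# linestyle="dotted" can also be written as ls=":"
(line,) = plt.plot(ypoints, linestyle="dotted")
```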
Create Labels for a Plot: With Pyplot, you can use the xlabel() and ylabel() functions to set a
label for the x- and y-axis.
Create a Title for a Plot: With Pyplot, you can use the title() function to set a title for the plot.
Set Font Properties for Title and Labels: You can use the fontdict parameter in xlabel(), ylabel(),
and title() to set font properties for the title and labels.
Position the Title: You can use the loc parameter in title() to position the title. Legal values are:
'left', 'right', and 'center'. Default value is 'center'.
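The four options above can be sketched together. The data, label texts, and font settings are assumptions made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [80, 85, 90, 95, 100]       # illustrative values
y = [240, 250, 260, 270, 280]

font_title = {"family": "serif", "color": "blue", "size": 20}
font_label = {"family": "serif", "color": "darkred", "size": 15}

plt.plot(x, y)

# loc="left" moves the title from the default center position
plt.title("Sports Watch Data", fontdict=font_title, loc="left")
plt.xlabel("Average Pulse", fontdict=font_label)
plt.ylabel("Calorie Burnage", fontdict=font_label)
```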
Display Multiple Plots: With the subplot() function you can draw multiple plots in one figure:
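A minimal sketch of two plots side by side. subplot() takes the number of rows, the number of columns, and the index of the current plot; the data values are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure()
x = [0, 1, 2, 3]  # illustrative values

# 1 row, 2 columns, first plot
plt.subplot(1, 2, 1)
plt.plot(x, [3, 8, 1, 10])

# 1 row, 2 columns, second plot
plt.subplot(1, 2, 2)
plt.plot(x, [10, 20, 30, 40])
```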
MATPLOTLIB: Basic Plots and Customizing for
effective visualization
Creating Scatter Plots: With Pyplot, you can use the scatter() function to draw a scatter plot.
The scatter() function plots one dot for each observation. It needs two arrays of the same length,
one for the values of the x-axis, and one for values on the y-axis:
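A minimal sketch of a scatter plot with two illustrative arrays of equal length.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 2, 17]          # illustrative values
y = [99, 86, 87, 88, 111, 86]

# One dot is drawn for each (x, y) pair
points = plt.scatter(x, y)
```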
Providing Colors in Scatter Plots: You can set a color for each dot by passing an array of colors
as the c argument:
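A short sketch of per-dot colors using the c argument of scatter(). The data and the color names are illustrative; the color array has one entry per dot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 2, 17]          # illustrative values
y = [99, 86, 87, 88, 111, 86]
colors = ["red", "green", "blue", "orange", "purple", "gray"]

# c takes one color per dot, in the same order as x and y
points = plt.scatter(x, y, c=colors)
```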
Size: You can change the size of the dots with the s argument. Just like colors, make sure the array
for sizes has the same length as the arrays for the x- and y-axis:
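A minimal sketch of per-dot sizes; the data and size values are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 2, 17]          # illustrative values
y = [99, 86, 87, 88, 111, 86]
sizes = [20, 50, 100, 200, 500, 1000]

# s takes one size per dot, in the same order as x and y
points = plt.scatter(x, y, s=sizes)
```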
Alpha: You can adjust the transparency of the dots with the alpha argument. It usually takes a single
value between 0 (fully transparent) and 1 (fully opaque) that applies to every dot:
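A minimal sketch of transparency, assuming illustrative data and a single alpha value of 0.5 for all dots.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 2, 17]          # illustrative values
y = [99, 86, 87, 88, 111, 86]

# alpha=0.5 makes every dot half transparent
points = plt.scatter(x, y, alpha=0.5)
```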
Creating Bars: With Pyplot, you can use the bar() function to draw bar graphs:
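A minimal sketch of a bar graph; the category names and values are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]  # illustrative values
values = [3, 8, 1, 10]

# The first argument gives the categories, the second the bar heights
bars = plt.bar(categories, values)
```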
Horizontal Bars: If you want the bars to be displayed horizontally instead of vertically, use
the barh() function:
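The same illustrative data drawn horizontally, as a minimal sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]  # illustrative values
values = [3, 8, 1, 10]

# With barh(), each value becomes the length of a horizontal bar
bars = plt.barh(categories, values)
```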
Bar Width: The bar() function takes the keyword argument width to set the width of the bars. The
default width value is 0.8.
The barh() function takes the keyword argument height to set the height of the bars. The default
height value is also 0.8.
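A short sketch of thin bars in both orientations, drawn in two separate figures. The value 0.1 and the data are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]  # illustrative values
values = [3, 8, 1, 10]

# Vertical bars: width controls how wide each bar is
thin_vertical = plt.bar(categories, values, width=0.1)

# Horizontal bars in a new figure: height controls how thick each bar is
plt.figure()
thin_horizontal = plt.barh(categories, values, height=0.1)
```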
Create Histogram: In Matplotlib, we use the hist() function to create histograms.
The hist() function takes an array of numbers as an argument and groups the values into bins to
create the histogram.
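A minimal sketch of a histogram. The 250 values are randomly generated for illustration (a normal distribution around 170) and do not come from the original slides.

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

random.seed(0)  # fixed seed so the sketch is repeatable
heights = [random.gauss(170, 10) for _ in range(250)]

# hist() returns the count per bin, the bin edges, and the drawn bars
counts, bin_edges, patches = plt.hist(heights)
```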
Creating Pie Charts: With Pyplot, you can use the pie() function to draw pie charts:
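A minimal sketch of a pie chart; the shares and labels are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

shares = [35, 25, 25, 15]  # illustrative values
labels = ["Apples", "Bananas", "Cherries", "Dates"]

# Each value becomes one wedge, sized in proportion to the total
plt.pie(shares, labels=labels)
```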
Explode: Maybe you want one of the wedges to stand out? The explode parameter allows you to do
that. The explode parameter, if specified, and not None, must be an array with one value for each
wedge. Each value represents how far from the center each wedge is displayed.
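A short sketch that pulls the first wedge away from the center. The offset 0.2 and the data are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

shares = [35, 25, 25, 15]  # illustrative values
labels = ["Apples", "Bananas", "Cherries", "Dates"]

# One explode value per wedge; 0 keeps a wedge in place
offsets = [0.2, 0, 0, 0]

plt.pie(shares, labels=labels, explode=offsets)
```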
Legend: To add a list of explanations for each wedge, use the legend() function:
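A minimal sketch of a legend on a pie chart. The data, labels, and the legend title "Fruits" are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

shares = [35, 25, 25, 15]  # illustrative values
labels = ["Apples", "Bananas", "Cherries", "Dates"]

plt.pie(shares, labels=labels)

# legend() picks up the wedge labels; title adds a heading to the legend box
legend = plt.legend(title="Fruits")
```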
Thank You!