Data Analytics Notes

What is Data Analysis?

Data analysis is an essential aspect of modern decision-making


processes across various sectors, including business, healthcare, finance,
and academia. As organizations generate massive amounts of data daily,
understanding how to extract meaningful insights from this data becomes
crucial. In this article, we will explore the fundamental concepts of
data analysis, its types, significance, methods, and the tools used
for effective analysis. We will also address common queries related
to data analysis, providing clarity on its definition and applications
in various fields.

What Do You Mean by Data Analysis?


In today’s data-driven world, organizations rely on data analysis to
uncover patterns, trends, and relationships within their data. Whether it’s
for optimizing operations, improving customer satisfaction, or forecasting
future trends, effective data analysis helps stakeholders make informed
decisions. The term data analysis refers to the systematic application of
statistical and logical techniques to describe, summarize, and evaluate
data. This process can involve transforming raw data into a more
understandable format, identifying significant patterns, and drawing
conclusions based on the findings.
When we ask, “What do you mean by data analysis?” it essentially
refers to the practice of examining datasets to draw conclusions about the
information they contain. The process can be broken down into several
steps, including:
1. Data Collection: Gathering relevant data from various sources, which
could be databases, surveys, sensors, or web scraping.
2. Data Cleaning: Identifying and correcting inaccuracies or
inconsistencies in the data to ensure its quality and reliability.
3. Data Transformation: Modifying data into a suitable format for
analysis, which may involve normalization, aggregation, or creating
new variables.
4. Data Analysis: Applying statistical methods and algorithms to explore
the data, identify trends, and extract meaningful insights.
5. Data Interpretation: Translating the findings into actionable
recommendations or conclusions that inform decision-making.
By employing these steps, organizations can transform raw data into a
valuable asset that guides strategic planning and enhances operational
efficiency.
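The five steps above can be sketched in a few lines of Python using only the standard library. The sales records below are invented for illustration, not taken from any real dataset.

```python
# A minimal sketch of the data-analysis steps; the records are made-up data.
from statistics import mean

# 1. Data Collection: raw records (some with missing values)
raw = [
    {"region": "north", "sales": 120.0},
    {"region": "north", "sales": None},     # missing value
    {"region": "south", "sales": 95.5},
    {"region": "south", "sales": 110.0},
]

# 2. Data Cleaning: drop records with missing sales figures
clean = [r for r in raw if r["sales"] is not None]

# 3. Data Transformation: group sales figures by region
by_region = {}
for r in clean:
    by_region.setdefault(r["region"], []).append(r["sales"])

# 4. Data Analysis: compute the average sales per region
avg = {region: mean(values) for region, values in by_region.items()}

# 5. Data Interpretation: report the best-performing region
best = max(avg, key=avg.get)
print(best, avg[best])
```

In a real project each step would be far richer (databases instead of a list, imputation instead of dropping rows), but the shape of the pipeline is the same.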
To solidify our understanding, let’s define data analysis with an
example. Imagine a retail company looking to improve its sales
performance. The company collects data on customer purchases,
demographics, and seasonal trends.
By conducting a data analysis, the company may discover that:
 Customers aged 18-25 are more likely to purchase specific products
during holiday seasons.
 There is a significant increase in sales when promotional discounts are
offered.
Based on these insights, the company can tailor its marketing strategies
to target younger customers with specific promotions during peak
seasons, ultimately leading to increased sales and customer satisfaction.
Data Analysis Definition
To further clarify the concept, let’s define data analysis in a more
structured manner. Data analysis can be defined as:
“The process of inspecting, cleaning, transforming, and modeling data to
discover useful information, draw conclusions, and support decision-
making.”
This definition emphasizes the systematic approach taken in analyzing
data, highlighting the importance of not only obtaining insights but also
ensuring the integrity and quality of the data used.
Data Analysis in Data Science
The field of data science relies heavily on data analysis to derive
insights from large datasets. Data analysis in data science refers to the
methods and processes used to manipulate data, identify trends, and
generate predictive models that aid in decision-making.
Data scientists employ various analytical techniques, such as:
 Statistical Analysis: Applying statistical tests to validate hypotheses
or understand relationships between variables.
 Machine Learning: Using algorithms to enable systems to learn from
data patterns and make predictions.
 Data Visualization: Creating graphical representations of data to
facilitate understanding and communication of insights.
These techniques play a vital role in enabling organizations to leverage
their data effectively, ensuring they remain competitive and responsive to
market changes.
Data Analysis in DBMS
Another area where data analysis plays a crucial role is within Database
Management Systems (DBMS). Data analysis in DBMS involves
querying and manipulating data stored in databases to extract meaningful
insights. Analysts utilize SQL (Structured Query Language) to perform
operations such as:
 Data Retrieval: Extracting specific data from large datasets using
queries.
 Aggregation: Summarizing data to provide insights at a higher level.
 Filtering: Narrowing down data to focus on specific criteria.
Understanding how to perform effective data analysis in DBMS is
essential for professionals who work with databases regularly, as it allows
them to derive insights that can influence business strategies.
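The three SQL operations above can be tried out with Python's built-in sqlite3 module. The orders table and its rows here are hypothetical example data.

```python
# A small sketch of data analysis inside a DBMS using an in-memory
# SQLite database; the table and rows are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 40.0), ("bob", 25.0), ("alice", 60.0), ("carol", 15.0)],
)

# Data Retrieval: extract specific data with a query
rows = conn.execute("SELECT customer, amount FROM orders").fetchall()

# Aggregation: summarize total spending per customer
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

# Filtering: narrow down to orders above a threshold
large = conn.execute(
    "SELECT customer FROM orders WHERE amount > 30 ORDER BY amount"
).fetchall()

print(totals)
print(large)
```

The same retrieval, aggregation, and filtering patterns carry over to production databases such as PostgreSQL or MySQL; only the connection line changes.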
Why is Data Analysis Important?
Data analysis is crucial for informed decision-making, revealing patterns,
trends, and insights within datasets. It enhances strategic planning,
identifies opportunities and challenges, improves efficiency, and fosters a
deeper understanding of complex phenomena across various industries
and fields.
1. Informed Decision-Making: Analysis of data provides a basis for
informed decision-making by offering insights into past performance,
current trends, and potential future outcomes.
2. Business Intelligence: Analyzed data helps organizations gain a
competitive edge by identifying market trends, customer preferences,
and areas for improvement.
3. Problem Solving: It aids in identifying and solving problems within a
system or process by revealing patterns or anomalies that require
attention.
4. Performance Evaluation: Analysis of data enables the assessment of
performance metrics, allowing organizations to measure success,
identify areas for improvement, and set realistic goals.
5. Risk Management: Understanding patterns in data helps in predicting
and managing risks, allowing organizations to mitigate potential
challenges.
6. Optimizing Processes: Data analysis identifies inefficiencies in
processes, allowing for optimization and cost reduction.
The Process of Data Analysis
Data analysis can transform raw data into meaningful insights for your business and your decision-making. While there are several different ways of collecting and interpreting data, most data-analysis processes follow the same six general steps.
1. Define Objectives and Questions: Clearly define the goals of the
analysis and the specific questions you aim to answer. Establish a
clear understanding of what insights or decisions the analyzed data
should inform.
2. Data Collection: Gather relevant data from various sources. Ensure
data integrity, quality, and completeness. Organize the data in a format
suitable for analysis. There are two types of data: qualitative and
quantitative data.
3. Data Cleaning and Preprocessing: Address missing values, handle
outliers, and transform the data into a usable format. Cleaning and
preprocessing steps are crucial for ensuring the accuracy and reliability
of the analysis.
4. Exploratory Data Analysis (EDA) : Conduct exploratory analysis to
understand the characteristics of the data. Visualize distributions,
identify patterns, and calculate summary statistics. EDA helps in
formulating hypotheses and refining the analysis approach.
5. Statistical Analysis or Modeling: Apply appropriate statistical
methods or modeling techniques to answer the defined questions. This
step involves testing hypotheses, building predictive models, or
performing any analysis required to derive meaningful insights from the
data.
6. Interpretation and Communication: Interpret the results in the
context of the original objectives. Communicate findings through
reports, visualizations, or presentations. Clearly articulate insights,
conclusions, and recommendations based on the analysis to support
informed decision-making.
Analyzing Data: Techniques and Methods
When discussing analyzing data, several methods can be employed
depending on the nature of the data and the questions being addressed.
These methods can be broadly categorized into three types:
There are various data analysis methods, each tailored to specific goals
and types of data. The major Data Analysis methods are:
1. Descriptive Analysis
A Descriptive Analysis is foundational as it provides the necessary
insights into past performance. Understanding what has happened is
crucial for making informed decisions in data analysis. For instance, data
analysis in data science often begins with descriptive techniques to
summarize and visualize data trends.
2. Diagnostic Analysis
Diagnostic analysis works hand in hand with descriptive analysis. Where
descriptive analysis finds out what happened in the past, diagnostic
analysis finds out why it happened, what measures were taken at the time,
or how frequently it has happened. By analyzing data thoroughly,
businesses can assess what factors contributed to specific outcomes,
providing a clearer picture of their operational efficiency and
effectiveness.
3. Predictive Analysis
By forecasting future trends based on historical data, predictive
analysis enables organizations to prepare for upcoming opportunities and
challenges. It leverages data trends to predict future behaviors, a
capability that is vital for strategic planning and risk management in
business operations.
4. Prescriptive Analysis
Prescriptive Analysis is an advanced method that takes Predictive
Analysis insights and offers actionable recommendations, guiding
decision-makers toward the best course of action. It extends beyond
merely analyzing data to suggesting optimal solutions based on potential
future scenarios, thus addressing the need for a structured approach to
decision-making.
5. Statistical Analysis
Statistical Analysis is essential for summarizing data, helping in
identifying key characteristics and understanding relationships within
datasets. This analysis can reveal significant patterns that inform broader
strategies and policies, thereby allowing analysts to provide a
robust review of data analytics practices within an organization.
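As a small illustration of statistical summarization, Python's standard statistics module can compute the key characteristics mentioned above. The monthly sales figures below are made up for the example.

```python
# A brief sketch of descriptive statistics with the standard library;
# the sales figures are invented example data.
import statistics

sales = [200, 220, 240, 260, 280]

mean_sales = statistics.mean(sales)      # central tendency
median_sales = statistics.median(sales)  # robust middle value
spread = statistics.stdev(sales)         # sample standard deviation

print(mean_sales, median_sales, round(spread, 2))
```

Even these three numbers already answer useful questions: what is typical, whether the data is skewed (mean vs. median), and how much variation to expect.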
6. Regression Analysis
Regression analysis is a statistical method extensively used in data
analysis to model the relationship between a dependent variable and one
or more independent variables. It is particularly useful for establishing
relationships between variables, making it vital for forecasting and
strategic planning; analysts often illustrate data analysis with examples
that use regression techniques.
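For a single independent variable, the least-squares line can be computed directly from closed-form formulas. The advertising-spend data points below are invented so the fit comes out exact.

```python
# A minimal sketch of simple linear regression using the closed-form
# least-squares formulas; the (x, y) data is made up for illustration.
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# x: hypothetical advertising spend, y: observed sales
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # lies exactly on y = 2x + 1

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept   # forecast for x = 5
print(slope, intercept, prediction)
```

With more variables or noisy data, libraries such as NumPy or scikit-learn do the same job, but the underlying idea is this covariance-over-variance ratio.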
7. Cohort Analysis
By examining specific groups over time, cohort analysis aids in
understanding customer behavior and improving retention strategies. This
approach allows businesses to tailor their services to different segments,
thereby effectively utilizing data storage and analysis in big data to
enhance customer engagement and satisfaction.
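A toy version of cohort analysis can be written with only the standard library: group customers by signup month, then measure how many were still active the following month. All customer records below are hypothetical.

```python
# A toy sketch of cohort retention; all records are invented.
from collections import defaultdict

# (customer, signup_month, list of months the customer was active)
activity = [
    ("a", "2024-01", ["2024-01", "2024-02"]),
    ("b", "2024-01", ["2024-01"]),
    ("c", "2024-02", ["2024-02", "2024-03"]),
    ("d", "2024-02", ["2024-02", "2024-03"]),
]

# Group members into cohorts by signup month
cohorts = defaultdict(list)
for customer, signup, months in activity:
    cohorts[signup].append(months)

def next_month(m):
    """Return the month string following 'YYYY-MM'."""
    year, mon = map(int, m.split("-"))
    year, mon = (year + 1, 1) if mon == 12 else (year, mon + 1)
    return f"{year:04d}-{mon:02d}"

# Retention: share of each cohort still active one month after signup
retention = {}
for signup, members in cohorts.items():
    target = next_month(signup)
    active = sum(1 for months in members if target in months)
    retention[signup] = active / len(members)

print(retention)
```

Comparing cohorts this way shows whether newer customer groups are retained better or worse than older ones, which is exactly the behavior retention strategies target.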
8. Time Series Analysis
Time series analysis is crucial in any domain where data points are
collected over time, allowing for trend identification and forecasting.
Businesses can use this method to analyze seasonal trends and predict
future sales from temporal data.
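One of the simplest time-series techniques is a moving average, which smooths short-term fluctuations so the underlying trend is visible. The monthly demand series below is invented for illustration.

```python
# A short sketch of time-series smoothing with a simple moving average;
# the demand series is made-up example data.
def moving_average(series, window):
    """Average each consecutive run of `window` points."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

demand = [10, 12, 14, 13, 15, 17, 16]
smooth = moving_average(demand, window=3)

# A naive forecast: assume the next value continues the smoothed trend
forecast = smooth[-1]
print(smooth, forecast)
```

Real forecasting methods (exponential smoothing, ARIMA) build on the same idea of separating trend and seasonality from noise.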
9. Factor Analysis
Factor analysis is a statistical method that explores underlying
relationships among a set of observed variables. It identifies latent factors
that contribute to observed patterns, simplifying complex data structures.
This technique is invaluable in reducing dimensionality, revealing hidden
patterns, and aiding in the interpretation of large datasets.
10. Text Analysis
Text analysis involves extracting valuable information from unstructured
textual data. Using natural language processing and machine learning
techniques, it enables the extraction of sentiments, key themes, and
patterns from large volumes of text. Organizations apply it to customer
feedback, social media sentiment, and more, showcasing the practical
applications of analyzing data in real-world scenarios.
Tools for Data Analysis
Several tools are available to facilitate effective data analysis. These
tools can range from simple spreadsheet applications to complex
statistical software. Some popular tools include:
 SAS: SAS is a statistical software suite developed by the SAS
Institute for advanced analytics, multivariate analysis, business
intelligence, data management, and predictive analytics. Because it was
built for specific uses and new tools are added less frequently than in
open-source ecosystems, it can be less flexible for certain applications.
 Microsoft Excel: It is a widely used spreadsheet application useful
for recording data, charting, performing lookups and simple
manipulations, and generating pivot tables that summarize large datasets
into the desired reports.
 R: It is one of the leading programming languages for performing
complex statistical computations and graphics. It is a free and open-
source language that can be run on various UNIX platforms, Windows,
and macOS. It also has an easy-to-use command-line interface. However,
it can be tough to learn, especially for people without prior
programming knowledge.
 Python: It is a powerful high-level programming language used for
general-purpose programming. Python supports both structured and
functional programming methods. Its extensive collection of libraries
makes it very useful in data analysis. Knowledge of TensorFlow, Theano,
Keras, Matplotlib, and Scikit-learn can get you a lot closer to your
dream of becoming a machine learning engineer.
 Tableau Public: Tableau Public is free software developed by the
public company Tableau Software that allows users to connect to any
spreadsheet or file and create interactive data visualizations. It can
also be used to create maps and dashboards with real-time updates for
easy presentation on the web. The results can be shared through social
media sites or directly with the client, making it very convenient to
use.
 KNIME: KNIME, the Konstanz Information Miner, is a free and open-
source data analytics software. It is also used as a reporting and
integration platform. It integrates various components for machine
learning and data mining through modular data pipelining. It is written
in Java and developed by KNIME AG, and it runs on various operating
systems such as Linux, OS X, and Windows.
 Power BI: A business analytics service that provides interactive
visualizations and business intelligence capabilities with a simple
interface.
What is the definition of data analysis in data science?
Data analysis in data science refers to the methodology of collecting,
processing, and analyzing data to generate insights and support
data-driven decisions within the field of data science.
What is an example of data analysis?
To define data analysis with an example, consider a retail company
analyzing sales data to identify trends in customer purchasing behavior.
This can involve descriptive analysis to summarize past sales and
predictive analysis to forecast future trends based on historical data.
How to do data analysis in Excel?
Import data into Excel, use functions for summarizing and visualizing
data. Utilize PivotTables, charts, and Excel’s built-in analysis tools for
insights and trends.
How does data storage and analysis work in big data?
Data storage and analysis in big data involves utilizing technologies
that manage and analyze vast amounts of structured and unstructured
data. This enables organizations to derive meaningful insights from large
datasets, driving strategic decision-making.
What is computer data analysis?
Computer data analysis refers to the use of computer software and
algorithms to perform data analysis. This method streamlines the process,
allowing for efficient handling of large datasets and complex analyses.
Where can I find a review of data analytics?
A review of data analytics can be found on various platforms, including
academic journals, industry reports, and websites like Geeks for Geeks
that provide comprehensive insights into data analytics practices and
technologies.
What are the benefits of data analysis?
The benefits of data analysis include improved decision-making,
enhanced operational efficiency, better customer insights, and the ability
to identify market trends. Organizations that leverage data analysis gain a
competitive advantage by making informed choices.
Data Analytics and its Types
Data analytics is an important field that involves the process of
collecting, processing, and interpreting data to uncover insights and help
in making decisions. Data analytics is the practice of examining raw data
to identify trends, draw conclusions, and extract meaningful information.
This involves various techniques and tools to process and transform data
into valuable insights that can be used for decision-making.
In this article, we will learn about data analytics and how it can help
businesses and individuals enhance their operations and solve complex
problems, covering the types of data analytics, its techniques, tools,
and the importance of data analytics.

What is Data Analytics?


In this new digital world, data is being generated in enormous amounts,
which opens new paradigms. With high computing power and large amounts
of data at hand, we can use this data for data-driven decision making.
The main benefit of data-driven decisions is that they are based on
observed past trends that have produced beneficial results.
In short, we can say that data analytics is the process of manipulating
data to extract useful trends and hidden patterns that can help us derive
valuable insights to make business predictions.
Understanding Data Analytics
Data analytics encompasses a wide array of techniques for analyzing data
to gain valuable insights that can enhance various aspects of operations.
By scrutinizing information, businesses can uncover patterns and metrics
that might otherwise go unnoticed, enabling them to optimize processes
and improve overall efficiency.
For instance, in manufacturing, companies collect data on machine
runtime, downtime, and work queues to analyze and improve workload
planning, ensuring machines operate at optimal levels.
Beyond production optimization, data analytics is utilized in diverse
sectors. Gaming firms utilize it to design reward systems that engage
players effectively, while content providers leverage analytics to optimize
content placement and presentation, ultimately driving user engagement.
Types of Data Analytics
There are four major types of data analytics:
1. Predictive (forecasting)
2. Descriptive (business intelligence and data mining)
3. Prescriptive (optimization and simulation)
4. Diagnostic analytics
Data Analytics and its Types

Predictive Analytics
Predictive analytics turns data into valuable, actionable information. It
uses data to determine the probable outcome of an event or the likelihood
of a situation occurring. Predictive analytics draws on a variety of
statistical techniques from modeling, machine learning, data mining, and
game theory that analyze current and historical facts to make predictions
about future events. Techniques used for predictive analytics include:
 Linear Regression
 Time Series Analysis and Forecasting
 Data Mining
Basic Cornerstones of Predictive Analytics
 Predictive modeling
 Decision Analysis and optimization
 Transaction profiling
Descriptive Analytics
Descriptive analytics looks at data and analyzes past events for insight
into how to approach future events. It examines past performance by
mining historical data to understand the causes of success or failure.
Almost all management reporting, such as sales, marketing, operations,
and finance, uses this type of analysis.
The descriptive model quantifies relationships in data in a way that is
often used to classify customers or prospects into groups. Unlike a
predictive model, which focuses on predicting the behavior of a single
customer, descriptive analytics identifies many different relationships
between customers and products.
Common examples of Descriptive analytics are company reports that
provide historic reviews like:
 Data Queries
 Reports
 Descriptive Statistics
 Data dashboard
Prescriptive Analytics
Prescriptive analytics automatically synthesizes big data, mathematical
science, business rules, and machine learning to make a prediction and
then suggests a decision option that takes advantage of the prediction.
It goes beyond predicting future outcomes by also suggesting actions that
benefit from the predictions and showing the decision maker the
implications of each option. Prescriptive analytics not only anticipates
what will happen and when it will happen but also why it will happen.
Further, it can suggest decision options for taking advantage of a future
opportunity or mitigating a future risk, and illustrate the implications
of each option.
For example, Prescriptive Analytics can benefit healthcare strategic
planning by using analytics to leverage operational and usage data
combined with data of external factors such as economic data, population
demography, etc.
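One way to picture the prescriptive step is as an expected-value calculation over decision options: combine a predicted probability with the payoff of each option and recommend the best one. The probabilities and payoffs below are invented, not from any real model.

```python
# A toy sketch of prescriptive reasoning; all numbers are hypothetical.

# Predicted probability of high demand next quarter
# (imagined output of a predictive model)
p_high_demand = 0.7

# Payoff of each decision option under high vs. low demand
options = {
    "expand capacity": {"high": 120.0, "low": -40.0},
    "hold steady":     {"high": 50.0,  "low": 10.0},
}

# Expected value of each option, weighted by the predicted probability
expected = {
    name: p_high_demand * payoff["high"] + (1 - p_high_demand) * payoff["low"]
    for name, payoff in options.items()
}

# Recommend the option with the highest expected payoff
best_action = max(expected, key=expected.get)
print(expected, best_action)
```

Real prescriptive systems add constraints and optimization on top, but the core move is the same: attach a consequence to each option and rank them.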
Diagnostic Analytics
In this analysis, we generally use historical data to answer a question
or solve a problem. We try to find dependencies and patterns in the
historical data of the particular problem. Companies favor this analysis
because it gives great insight into a problem, provided they keep
detailed information at their disposal; otherwise, data may have to be
collected afresh for every problem, which is very time-consuming. Common
techniques used for diagnostic analytics are:
 Data discovery
 Data mining
 Correlations
The Role of Data Analytics
Data analytics plays a pivotal role in enhancing operations, efficiency, and
performance across various industries by uncovering valuable patterns
and insights. Implementing data analytics techniques can provide
companies with a competitive advantage. The process typically involves
four fundamental steps:
 Data Mining : This step involves gathering data and information from
diverse sources and transforming them into a standardized format for
subsequent analysis. Data mining can be a time-intensive process
compared to other steps but is crucial for obtaining a comprehensive
dataset.
 Data Management : Once collected, data needs to be stored,
managed, and made accessible. Creating a database is essential for
managing the vast amounts of information collected during the mining
process. SQL (Structured Query Language) remains a widely used tool
for database management, facilitating efficient querying and analysis of
relational databases.
 Statistical Analysis : In this step, the gathered data is subjected to
statistical analysis to identify trends and patterns. Statistical modeling
is used to interpret the data and make predictions about future trends.
Open-source programming languages like Python, as well as
specialized tools like R, are commonly used for statistical analysis and
graphical modeling.
 Data Presentation : The insights derived from data analytics need to
be effectively communicated to stakeholders. This final step involves
formatting the results in a manner that is accessible and
understandable to various stakeholders, including decision-makers,
analysts, and shareholders. Clear and concise data presentation is
essential for driving informed decision-making and supporting business
growth.
Steps in Data Analysis
 Define Data Requirements : This involves determining how the data
will be grouped or categorized. Data can be segmented based on
various factors such as age, demographic, income, or gender, and can
consist of numerical values or categorical data.
 Data Collection : Data is gathered from different sources, including
computers, online platforms, cameras, environmental sensors, or
through human personnel.
 Data Organization : Once collected, the data needs to be organized in
a structured format to facilitate analysis. This could involve using
spreadsheets or specialized software designed for managing and
analyzing statistical data.
 Data Cleaning : Before analysis, the data undergoes a cleaning
process to ensure accuracy and reliability. This involves identifying and
removing any duplicate or erroneous entries, as well as addressing any
missing or incomplete data. Cleaning the data helps to mitigate
potential biases and errors that could affect the analysis results.
Usage of Data Analytics
There are some key domains and strategic planning techniques in which
Data Analytics has played a vital role:
 Improved Decision-Making – If we have data supporting a decision, we
can implement it with a higher probability of success. For example, if a
certain decision or plan has led to better outcomes before, there will
be little doubt about implementing it again.
 Better Customer Service – Churn modeling is the best example of this:
we try to predict or identify what leads to customer churn and change
those factors accordingly, so that customer attrition stays as low as
possible, which is a critical concern for any organization.
 Efficient Operations – Data analytics can help us understand the
demands of a situation and what should be done to get better results,
allowing us to streamline our processes, which in turn leads to
efficient operations.
 Effective Marketing – Market segmentation techniques are used to
identify the marketing approaches that will increase sales and generate
leads, resulting in effective marketing strategies.
Future Scope of Data Analytics
 Retail : To study sales patterns, consumer behavior, and inventory
management, data analytics can be applied in the retail sector. Data
analytics can be used by retailers to make data-driven decisions
regarding what products to stock, how to price them, and how to best
organize their stores.
 Healthcare : Data analytics can be used to evaluate patient data, spot
trends in patient health, and create individualized treatment regimens.
Data analytics can be used by healthcare companies to enhance
patient outcomes and lower healthcare expenditures.
 Finance : In the field of finance, data analytics can be used to evaluate
investment data, spot trends in the financial markets, and make wise
investment decisions. Data analytics can be used by financial
institutions to lower risk and boost the performance of investment
portfolios.
 Marketing : By analyzing customer data, spotting trends in consumer
behavior, and creating customized marketing strategies, data analytics
can be used in marketing. Data analytics can be used by marketers to
boost the efficiency of their campaigns and their overall impact.
 Manufacturing : Data analytics can be used to examine production
data, spot trends in production methods, and boost production
efficiency in the manufacturing sector. Data analytics can be used by
manufacturers to cut costs and enhance product quality.
 Transportation : To evaluate logistics data, spot trends in
transportation routes, and improve transportation routes, the
transportation sector can employ data analytics. Data analytics can
help transportation businesses cut expenses and speed up delivery
times.

How to Install Numpy on Windows?


Python NumPy is a general-purpose array processing package that
provides tools for handling n-dimensional arrays. It provides various
computing tools such as comprehensive mathematical functions, and
linear algebra routines. NumPy provides both the flexibility of Python and
the speed of well-optimized compiled C code. Its easy-to-use syntax
makes it highly accessible and productive for programmers from any
background. In this article, we will see how to install NumPy as well as
how to import Numpy in Python.
Pre-requisites:
 Python
 PIP or Conda (depending upon user preference)
Installing Numpy on Windows
Below are the ways by which we can install NumPy on Windows and later
on import Numpy in Python:
 Using Conda
 Using PIP
Install Numpy Using Conda
If you want the installation to be done through conda, you can use the
below command:
conda install -c anaconda numpy
You will get a similar message once the installation is complete

Make sure you follow the best practices for installation using conda:
 Use an environment for installation rather than the base environment,
using the below command:
conda create -n my-env
conda activate my-env
Note: If your preferred method of installation is conda-forge, use the
below command:
conda config --env --add channels conda-forge
Installing Numpy For PIP Users
Users who prefer to use pip can use the below command to install
NumPy:
pip install numpy
You will get a similar message once the installation is complete:

Now that we have installed Numpy successfully in our system, let’s take a
look at a few simple examples.
Example of Numpy
In this example, a 2D NumPy array named arr is created, and its
characteristics are demonstrated: the array type, number of dimensions
(2), shape (2 rows, 3 columns), size (6 elements), and the data type of its
elements (int64).
# Python program to demonstrate
# basic array characteristics
import numpy as np

# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])

# Printing type of arr object
print("Array is of type: ", type(arr))

# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)

# Printing shape of array
print("Shape of array: ", arr.shape)

# Printing size (total number of elements) of array
print("Size of array: ", arr.size)

# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)
Output:
Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int64
How to Install Numpy on Windows? – FAQ
How do I install NumPy?
You can install NumPy by using the pip package installer. Open your
command prompt or terminal and run the following command: pip install
numpy. This will download and install the latest version of NumPy from
PyPI.
Do I need to install any dependencies for NumPy?
NumPy has a few dependencies, such as the Python development
headers and a C compiler. However, when you install NumPy using pip, it
automatically handles the dependencies for you.
Can I install a specific version of NumPy?
Yes, you can install a specific version of NumPy by specifying the version
number in the pip install command. For example, to install version 1.19.5,
you would run: pip install numpy==1.19.5.
I encountered an error related to building or compiling NumPy.
What should I do?
Building NumPy from source requires certain development tools. On
Windows, you might need to install Microsoft Visual C++ Build Tools. On
macOS, you may need to install the Xcode Command Line Tools. On
Linux, you may need to install the build-essential package. Refer to the
NumPy documentation for detailed instructions based on your operating
system.
How to Install Pandas in Python?
Pandas in Python is a package written for data analysis and
manipulation. Pandas offers various operations and data structures to
perform numerical data manipulation and time series analysis. Pandas is an
open-source library built on top of the NumPy library, and it is
known for its high productivity and high performance. Pandas is popular
because it makes importing and analyzing data much easier. Pandas
programs can be written in any plain text editor, such as Notepad or
Notepad++, and saved with a .py extension.
To begin with Install Pandas in Python, write Pandas Codes, and perform
various intriguing and useful operations, one must have Python installed
on their System.
Check if Python is Already Present
To check whether your device already has Python installed, open the
command line (search for cmd, or press Windows key + R and enter cmd).
Now run the following command:
python --version
If Python is already installed, it will generate a message with the Python
version available else install Python, for installing please visit: How to
Install Python on Windows or Linux and PIP.

Python version

Import Pandas in Python
Now, that we have installed pandas on the system. Let’s see how we can
import it to make use of it.
For this, go to a Jupyter Notebook or open a Python file, and write the
following code:
import pandas as pd
Here, pd is an alias for Pandas, which helps keep the code short.
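With the alias in place, every pandas name is reached through pd. A minimal sketch (the sample values are illustrative):

```python
import pandas as pd

# the alias pd now exposes the whole pandas API
s = pd.Series([10, 20, 30])

print(pd.__version__)  # installed pandas version
print(s.sum())         # sum of the Series values
```

If this prints a version string and the sum, pandas is imported correctly.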
How to Install or Download Python Pandas
Pandas can be installed in multiple ways on Windows, Linux and MacOS.
Various different ways are listed below:
Install Pandas on Windows
Python Pandas can be installed on Windows in two ways:
 Using pip
 Using Anaconda
Install Pandas using pip
PIP is a package management system used to install and manage
software packages/libraries written in Python. These files are stored in a
large “online repository” termed as Python Package Index (PyPI).
Step 1 : Launch Command Prompt
To open the Start menu, press the Windows key or click the Start button.
To access the Command Prompt, type “cmd” in the search bar, click the
displayed app, or use Windows key + r, enter “cmd,” and press Enter.

Command Prompt

Step 2 : Run the Command


Pandas can be installed using PIP by use of the following command in
Command Prompt.
pip install pandas
Installed Pandas

Install Pandas using Anaconda


Anaconda is open-source software that bundles Jupyter, Spyder, and other
tools used for large-scale data processing, data analytics, and heavy
scientific computing. If your system does not already have Anaconda
Navigator, you can learn how to install Anaconda Navigator
on Windows or Linux.
Install and Run Pandas from Anaconda Navigator
Step 1: Search for Anaconda Navigator in Start Menu and open it.
Step 2: Click on the Environment tab and then click on the Create button
to create a new Pandas Environment.
Creating Environment

Step 3: Give a name to your Environment, e.g. Pandas, and then choose
a Python and its version to run in the environment. Now click on
the Create button to create Pandas Environment.
Naming the environment and selecting version

Step 4: Now click on the Pandas Environment created to activate it.


Activate the environment

Step 5: In the list above package names, select All to filter all the
packages.
Getting all the packages

Step 6: Now in the Search Bar, look for ‘Pandas‘. Select the Pandas
package for Installation.
Selecting the package to install

Step 7: Now Right Click on the checkbox given before the name of the
package and then go to ‘Mark for specific version installation‘. Now
select the version that you want to install.
Selecting the version for installation

Step 8: Click on the Apply button to install the Pandas Package.


Step 9: Finish the Installation process by clicking on the Apply button.
Step 10: Now to open the Pandas Environment, click on the Green
Arrow on the right of the package name and select the Console with
which you want to begin your Pandas programming.
Pandas Terminal Window:
Pandas Terminal

Install Pandas on Linux


To install Pandas on Linux, just type the following command in the
Terminal window and press Enter. Linux will automatically download and
install the packages and files required to run a Pandas environment in
Python:
pip3 install pandas
Install Pandas on MacOS
To install Pandas on macOS, type the following command in the Terminal,
making sure Python is already installed on your system.
pip install pandas
How To Use Jupyter Notebook – An Ultimate
Guide
The Jupyter Notebook is an open-source web application that allows you
to create and share documents that contain live code, equations,
visualizations, and narrative text. Uses include data cleaning and
transformation, numerical simulation, statistical modeling, data
visualization, machine learning, and much more. Jupyter has support for
over 40 different programming languages and Python is one of them.
Python is a requirement (Python 3.3 or greater, or Python 2.7) for
installing the Jupyter Notebook itself.
Table Of Content
 Installation
 Starting Jupyter Notebook
 Creating a Notebook
 Hello World in Jupyter Notebook
 Cells in Jupyter Notebook
 Kernel
 Naming the notebook
 Notebook Extensions
Installation
Install Python and Jupyter using the Anaconda Distribution, which
includes Python, the Jupyter Notebook, and other commonly used
packages for scientific computing and data science. You can download
Anaconda’s latest Python3 version. Now, install the downloaded version
of Anaconda. Installing Jupyter Notebook using pip:
python3 -m pip install --upgrade pip
python3 -m pip install jupyter
Starting Jupyter Notebook
To start the jupyter notebook, type the below command in the terminal.
jupyter notebook
This will print some information about the notebook server in your
terminal, including the URL of the web application (by default,
http://localhost:8888) and then open your default web browser to this
URL.
After the notebook is opened, you’ll see the Notebook Dashboard, which
will show a list of the notebooks, files, and subdirectories in the directory
where the notebook server was started. Most of the time, you will wish to
start a notebook server in the highest level directory containing
notebooks. Often this will be your home directory.

Creating a Notebook
To create a new notebook, click on the new button at the top right corner.
Click it to open a drop-down list and then if you’ll click on Python3, it will
open a new notebook. The web page
should look like this:

Hello World in Jupyter Notebook


After successfully installing and creating a notebook in Jupyter Notebook,
let’s see how to write code in it. Jupyter notebook provides a cell for
writing code in it. The type of code depends on the type of notebook you
created. For example, if you created a Python3 notebook then you can
write Python3 code in the cell. Now, let’s add the following code –

 Python3

print("Hello World")

To run a cell either click the run button or press shift ⇧ + enter ⏎ after
selecting the cell you want to execute. After writing the above code in the
jupyter notebook, the output was:
Note: When a cell has executed, the label on the left, i.e. In [ ], changes to
In [1]. If the cell is still under execution the label remains In [*].
Cells in Jupyter Notebook
Cells can be considered as the body of the Jupyter. In the above
screenshot, the box with the green outline is a cell. There are 3 types of
cell:
 Code
 Markdown
 Raw NBConverter
Code
This is where the code is typed and when executed the code will display
the output below the cell. The type of code depends on the type of the
notebook you have created. For example, if the notebook of Python3 is
created then the code of Python3 can be added. Consider the below
example, where a simple code of the Fibonacci series is created and this
code also takes input from the user. Example:
The text bar in the above code prompts the user for input.
The output of the above code is as follows: Output:

Markdown
Markdown is a popular lightweight markup language, and raw HTML can be embedded directly in it.
Jupyter Notebook also supports markdown. The cell type can be changed
to markdown using the cell menu.
Adding Headers: Headings
can be added by prefixing a line with one or more '#' characters followed
by a space. Example:

Output:
Adding List: Adding List is really simple in Jupyter Notebook. The list can
be added by using ‘*’ sign. And the Nested list can be created by using
indentation. Example:

Output:
Adding Latex Equations: LaTeX expressions can be added by
surrounding the LaTeX code with '$' for inline math, or with '$$' to
display the expression centered on its own line. Example:

Output:
Adding Table: A table can be added by writing the content in the
following format.

Output:
Note: The text can be made bold or italic by enclosing the text in ‘**’ and
‘*’ respectively.
Raw NBConverter
Raw cells are provided to write output directly. These cells are not
evaluated by the Jupyter notebook. After passing through nbconvert, raw
cells arrive in the destination folder without any modification. For
example, one can write full Python into a raw cell that can only be
rendered by Python after conversion by nbconvert.
Kernel
A kernel runs behind every notebook. Whenever a cell is executed, the
code inside the cell is executed within the kernel and the output is
returned to the cell to be displayed. The kernel persists for the
document as a whole, not for individual cells. For example, if a
module is imported in one cell then, that module will be available for the
whole document. See the below example for better
understanding. Example:
Note: The order of execution of each cell is stated to the left of the cell. In
the above example, the cell with In[1] is executed first then the cell with
In[2] is executed. Options for kernels: Jupyter Notebook provides
various options for kernels. This can be useful if you want to reset things.
The options are:
 Restart: This will restart the kernels i.e. clearing all the variables that
were defined, clearing the modules that were imported, etc.
 Restart and Clear Output: This will do the same as above but will also
clear all the output that was displayed below the cell.
 Restart and Run All: This is also the same as above but will also run
all the cells in the top-down order.
 Interrupt: This option will interrupt the kernel execution. It can be
useful in the case where the programs continue for execution or the
kernel is stuck over some computation.
Naming the notebook
When the notebook is created, Jupyter Notebook names the notebook as
Untitled as default. However, the notebook can be renamed. To rename
the notebook just click on the word Untitled. This will prompt a dialogue
box titled Rename Notebook. Enter the valid name for your notebook in
the text bar, then click ok.

Notebook Extensions
New functionality can be added to Jupyter through extensions. Extensions
are JavaScript modules. You can even write your own extension that can
access the page's DOM and the Jupyter JavaScript API. Jupyter supports
four types of extensions:
 Kernel
 IPython kernel
 Notebook
 Notebook server
Installing Extensions
Most extensions can be installed using Python's pip tool. If an
extension cannot be installed using pip, install it using
the below command.
jupyter nbextension install extension_name
The above only installs the extension but does not enable it. To enable it,
type the below command in the terminal.
Creating a Pandas DataFrame
In the real world, a Pandas DataFrame is usually created by loading a
dataset from existing storage, such as a SQL database, a CSV file, or an
Excel file. A Pandas DataFrame can also be created from lists, a
dictionary, a list of dictionaries, etc.

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a
tabular fashion in rows and columns. We can store any number of datasets
in a DataFrame and perform many operations on them, such as arithmetic
operations and column/row selection and addition.
Pandas DataFrame can be created in multiple ways. Let's discuss
different ways to create a DataFrame one by one.
Creating an empty dataframe :
The most basic DataFrame that can be created is an empty DataFrame, made
just by calling the DataFrame constructor.

 Python3

# import pandas as pd
import pandas as pd

# Calling DataFrame constructor

df = pd.DataFrame()

print(df)

Output :

Empty DataFrame
Columns: []
Index: []
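An empty DataFrame is not a dead end: columns can be attached to it afterwards. A minimal sketch (the column names and values are illustrative):

```python
import pandas as pd

# start from an empty DataFrame
df = pd.DataFrame()

# columns can be assigned one at a time after creation
df['Name'] = ['Geeks', 'For', 'Geeks']
df['Id'] = [1, 2, 3]

print(df.shape)  # rows and columns now present
```

This pattern is handy when columns are computed incrementally.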

Creating a dataframe using List :


DataFrame can be created using a single list or a list of lists.

 Python3

# import pandas as pd
import pandas as pd

# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is',
       'portal', 'for', 'Geeks']

# Calling DataFrame constructor on list
df = pd.DataFrame(lst)

print(df)

Output:

        0
0   Geeks
1     For
2   Geeks
3      is
4  portal
5     for
6   Geeks
Creating DataFrame from dict of ndarray/lists :

To create a DataFrame from a dict of ndarrays/lists, all the arrays must be
of the same length. If an index is passed, the length of the index should
equal the length of the arrays. If no index is passed, then by default the
index will be range(n), where n is the array length.

 Python3

# Python code to demonstrate creating
# DataFrame from dict of ndarrays / lists
import pandas as pd

# initialise data of lists
data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Age': [20, 21, 19, 18]}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output
print(df)

Output:

    Name  Age
0    Tom   20
1   nick   21
2  krish   19
3   jack   18
Create pandas dataframe from lists using dictionary :

Creating a pandas DataFrame from lists using a dictionary can be achieved
in different ways. We can pass a dictionary of lists to
pandas.DataFrame; with this method Pandas transforms the dictionary of
lists into a DataFrame.

 Python3

# importing pandas as pd
import pandas as pd

# dictionary of lists
dict = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

df = pd.DataFrame(dict)

print(df)

Output:

     name  degree  score
0  aparna     MBA     90
1  pankaj     BCA     40
2  sudhir  M.Tech     80
3   Geeku     MBA     98
Multiple ways of creating dataframe :

 Different ways to create Pandas Dataframe


 Create pandas dataframe from lists using zip
 Create a Pandas DataFrame from List of Dicts
 Create a Pandas Dataframe from a dict of equal length lists
 Creating a dataframe using List
 Create pandas dataframe from lists using dictionary
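The first approach in the list above, creating a DataFrame from lists using zip, can be sketched like this (the sample names and column labels are illustrative):

```python
import pandas as pd

names = ['Tom', 'nick', 'krish', 'jack']
ages = [20, 21, 19, 18]

# zip pairs the two lists element by element into row tuples
df = pd.DataFrame(list(zip(names, ages)), columns=['Name', 'Age'])

print(df)
```

zip is convenient when each column already lives in its own list.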
Python Pandas Series
Pandas Series is a one-dimensional labeled array capable of holding data
of any type (integer, string, float, python objects, etc.).
Pandas Series Examples

# import pandas as pd
import pandas as pd

# simple array
data = [1, 2, 3, 4]

ser = pd.Series(data)

print(ser)

Output
0 1
1 2
2 3
3 4
dtype: int64
The axis labels are collectively called the index. A Pandas Series is
essentially a single column of an Excel sheet.
Labels need not be unique but must be a hashable type. The object
supports both integer and label-based indexing and provides a host of
methods for performing operations involving the index.
Python Pandas Series
We will get a brief insight on all these basic operations which can be
performed on Pandas Series :
 Creating a Series
 Accessing element of Series
 Indexing and Selecting Data in Series
 Binary operation on Series
 Conversion Operation on Series
Creating a Pandas Series
In the real world, a Pandas Series is usually created by loading a dataset
from existing storage, such as a SQL database, a CSV file, or an Excel
file. A Pandas Series can also be created from a list, a dictionary, a
scalar value, etc. A Series can be created in different ways; here are
some of them:
Creating a series from array: To create a series from an array, we
have to import the numpy module and use its array() function.

# import pandas as pd
import pandas as pd

# import numpy as np
import numpy as np

# simple array
data = np.array(['g', 'e', 'e', 'k', 's'])

ser = pd.Series(data)

print(ser)

Output
0 g
1 e
2 e
3 k
4 s
dtype: object
Creating a series from Lists :
To create a series from a list, we first create the list and then pass it
to the Series constructor.

import pandas as pd

# a simple list
list = ['g', 'e', 'e', 'k', 's']

# create series from a list
ser = pd.Series(list)

print(ser)

Output
0 g
1 e
2 e
3 k
4 s
dtype: object
For more details refer to Creating a Pandas Series.
Accessing element of Series
There are two ways through which we can access element of series, they
are :
 Accessing Element from Series with Position
 Accessing Element Using Label (index)
Accessing Element from Series with Position : To access a series
element, refer to its index number. Use the index operator [ ] to
access an element in a series; the index must be an integer. To
access multiple elements from a series, we use the slice operation.
Accessing the first 5 elements of a Series:

# import pandas and numpy
import pandas as pd
import numpy as np

# creating simple array
data = np.array(['g', 'e', 'e', 'k', 's', 'f', 'o', 'r', 'g', 'e', 'e', 'k', 's'])
ser = pd.Series(data)

# retrieve the first five elements
print(ser[:5])

Output
0 g
1 e
2 e
3 k
4 s
dtype: object
Accessing Element Using Label (index) :
To access an element of a series, we can look it up by its index
label. A Series is like a fixed-size dictionary in that you can get and set
values by index label.
Accessing a single element using an index label:

# import pandas and numpy
import pandas as pd
import numpy as np

# creating simple array
data = np.array(['g', 'e', 'e', 'k', 's', 'f', 'o', 'r', 'g', 'e', 'e', 'k', 's'])

ser = pd.Series(data, index=[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22])

# accessing an element using an index label
print(ser[16])

Output
o
For more details refer to Accessing element of Series
Indexing and Selecting Data in Series
Indexing in pandas means simply selecting particular data from a Series.
Indexing could mean selecting all the data or some of the data from
particular columns. Indexing is also known as subset selection.
Indexing a Series using indexing operator [] :
The indexing operator refers to the square brackets following an
object. The .loc and .iloc indexers also use the indexing operator to
make selections. Here we use the indexing operator directly, as in df[ ].

# importing pandas module
import pandas as pd

# making data frame
df = pd.read_csv("nba.csv")

ser = pd.Series(df['Name'])

data = ser.head(10)
data
Now we access the element of series using index operator [ ].

# using indexing operator
data[3:6]

Indexing a Series using .loc[ ] :

This function selects data by referring to the explicit index.
The df.loc indexer selects data in a different way than the plain indexing
operator: it can select subsets of data by label.

# importing pandas module
import pandas as pd

# making data frame
df = pd.read_csv("nba.csv")

ser = pd.Series(df['Name'])

data = ser.head(10)
data

Now we access the element of series using .loc[] function.

# using .loc[] function
data.loc[3:6]
Output :

Indexing a Series using .iloc[ ] :


This function allows us to retrieve data by position. In order to do that,
we’ll need to specify the positions of the data that we want.
The df.iloc indexer is very similar to df.loc but only uses integer
locations to make its selections.

# importing pandas module
import pandas as pd

# making data frame
df = pd.read_csv("nba.csv")

ser = pd.Series(df['Name'])

data = ser.head(10)
data
Output:

Now we access the element of Series using .iloc[] function.

# using .iloc[] function
data.iloc[3:6]
Output :
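Because the examples above depend on the external nba.csv file, here is a self-contained sketch contrasting .loc (label-based, endpoint-inclusive) with .iloc (position-based, endpoint-exclusive); the labels and values are illustrative:

```python
import pandas as pd

# a Series with integer labels that differ from positions
ser = pd.Series(['g', 'e', 'e', 'k', 's'], index=[10, 11, 12, 13, 14])

# .loc slices by label and includes both endpoints
print(ser.loc[11:13].tolist())

# .iloc slices by position and excludes the stop position
print(ser.iloc[1:3].tolist())
```

Note that the .loc slice returns three elements while the .iloc slice returns two, even though both use 1-ish to 3-ish bounds.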

Binary Operation on Series


We can perform binary operations on series, such as addition, subtraction,
and many others. To perform a binary operation on series, we
use functions like .add(), .sub(), etc.
Code #1:
# importing pandas module
import pandas as pd

# creating a series
data = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])

# creating a series
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])

print(data, "\n\n", data1)

Output
a 5
b 2
c 3
d 7
dtype: int64

a 1
b 6
d 4
e 9
dtype: int64
Now we add two series using the .add() function.

# adding two series using .add()
data.add(data1, fill_value=0)
Output :

a     6.0
b     8.0
c     3.0
d    11.0
e     9.0
dtype: float64
Code #2:

# importing pandas module
import pandas as pd

# creating a series
data = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])

# creating a series
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])

print(data, "\n\n", data1)

Output
a 5
b 2
c 3
d 7
dtype: int64

a 1
b 6
d 4
e 9
dtype: int64
Now we subtract two series using the .sub() function.

# subtracting two series using .sub()
data.sub(data1, fill_value=0)
Output :

a    4.0
b   -4.0
c    3.0
d    3.0
e   -9.0
dtype: float64
For more details refer to Binary operation methods on series

Conversion Operation on Series


In conversion operations we perform tasks such as changing the datatype
of a series or converting a series to a list. To perform conversion
operations we have various functions, such as .astype() and .tolist().
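Since the examples below rely on the external nba.csv file, here is a self-contained sketch of both conversion functions (the values are illustrative):

```python
import pandas as pd

# a float Series
ser = pd.Series([90.0, 40.0, 80.0])

# astype() changes the element dtype (here, float to integer)
ints = ser.astype(int)

# tolist() converts the Series into a plain Python list
print(ints.tolist())
print(ser.tolist())
```

astype() returns a new Series; the original is left unchanged, which is why ser still holds floats afterwards.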
Code #1:

# Python program using astype
# to convert a datatype of series

# importing pandas module
import pandas as pd

# reading csv file from url
data = pd.read_csv("nba.csv")

# dropping null value columns to avoid errors
data.dropna(inplace = True)

# storing dtype before converting
before = data.dtypes

# converting dtypes using astype
data["Salary"] = data["Salary"].astype(int)
data["Number"] = data["Number"].astype(str)

# storing dtype after converting
after = data.dtypes

# printing to compare
print("BEFORE CONVERSION\n", before, "\n")
print("AFTER CONVERSION\n", after, "\n")


Output :

Code #2:

# Python program converting
# a series into list

# importing pandas module
import pandas as pd

# importing regex module
import re

# making data frame
data = pd.read_csv("nba.csv")

# removing null values to avoid errors
data.dropna(inplace = True)

# storing dtype before operation
dtype_before = type(data["Salary"])

# converting to list
salary_list = data["Salary"].tolist()

# storing dtype after operation
dtype_after = type(salary_list)

# printing dtype
print("Data type before converting = {}\nData type after converting = {}"
      .format(dtype_before, dtype_after))

# displaying list
salary_list
Output :

Binary operation methods on series:

FUNCTION: DESCRIPTION
add(): Adds series or list-like objects of the same length to the caller series
sub(): Subtracts series or list-like objects of the same length from the caller series
mul(): Multiplies series or list-like objects of the same length with the caller series
div(): Divides the caller series by series or list-like objects of the same length
sum(): Returns the sum of the values for the requested axis
prod(): Returns the product of the values for the requested axis
mean(): Returns the mean of the values for the requested axis
pow(): Raises each element of the caller series to the power of the corresponding element of the passed series and returns the results
abs(): Returns the absolute numeric value of each element in the Series/DataFrame
cov(): Finds the covariance of two series

Pandas Series methods:

FUNCTION: DESCRIPTION
Series(): Constructor method used to create a pandas Series; it accepts a variety of inputs
combine_first(): Combines two series into one
count(): Returns the number of non-NA/null observations in the Series
size(): Returns the number of elements in the underlying data
name(): Gives a name to a Series object, i.e. to the column
is_unique(): Returns True if the values in the object are unique
idxmax(): Extracts the index positions of the highest values in a Series
idxmin(): Extracts the index positions of the lowest values in a Series
sort_values(): Sorts the values of a Series in ascending or descending order
sort_index(): Sorts a Series by its index instead of its values
head(): Returns a specified number of rows from the beginning of a Series; the method returns a brand new Series
tail(): Returns a specified number of rows from the end of a Series; the method returns a brand new Series
le(): Compares every element of the caller series with the passed series; returns True for every element that is less than or equal to the element in the passed series
ne(): Compares every element of the caller series with the passed series; returns True for every element that is not equal to the element in the passed series
ge(): Compares every element of the caller series with the passed series; returns True for every element that is greater than or equal to the element in the passed series
eq(): Compares every element of the caller series with the passed series; returns True for every element that is equal to the element in the passed series
gt(): Compares two series and returns a Boolean value for every respective element
lt(): Compares two series and returns a Boolean value for every respective element
clip(): Clips values below and above the passed least and max values
clip_lower(): Clips values below a passed least value
clip_upper(): Clips values above a passed maximum value
astype(): Changes the data type of a series
tolist(): Converts a series to a list
get(): Extracts values from a Series; an alternative to the traditional bracket syntax
unique(): Returns the unique values in a particular column
nunique(): Returns a count of unique values
value_counts(): Counts the number of times each unique value occurs in a Series
factorize(): Gets the numeric representation of an array by identifying distinct values
map(): Ties together the values from one object to another
between(): Checks which values in a series lie between the first and second argument
apply(): Takes a Python function as an argument and applies it to every Series value; helpful for executing custom operations that are not included in pandas or numpy
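The apply() entry above can be sketched with a plain Python function (the scores and grading rule are illustrative):

```python
import pandas as pd

scores = pd.Series([90, 40, 80, 98])

def grade(x):
    # custom rule not built into pandas
    return 'pass' if x >= 50 else 'fail'

# apply() runs the function on every value of the Series
print(scores.apply(grade).tolist())
```

apply() is the escape hatch for logic that has no built-in vectorized equivalent, at the cost of running a Python function per element.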
Creating a Pandas Series
Pandas Series is a one-dimensional labeled array capable of holding data
of any type (integer, string, float, python objects, etc.). The axis labels are
collectively called index. Labels need not be unique but must be a
hashable type. The object supports both integer and label-based indexing
and provides a host of methods for performing operations involving the
index.

To create Series with any of the methods make sure to import pandas
library.
Creating an empty Series: The Series() constructor of Pandas is used to
create a series. The most basic series that can be created is an empty Series.
 Python3

# import pandas as pd

import pandas as pd

# Creating empty series

ser = pd.Series()

print(ser)

Output :
Series([], dtype: float64)
By default, the data type of Series is float.
Creating a series from array: In order to create a series from NumPy
array, we have to import numpy module and have to use array() function.
 Python3

# import pandas as pd
import pandas as pd

# import numpy as np

import numpy as np

# simple array

data = np.array(['g', 'e', 'e', 'k', 's'])

ser = pd.Series(data)

print(ser)

Output:

0    g
1    e
2    e
3    k
4    s
dtype: object
By default, the index of the series runs from 0 to the length of the series minus 1.
Creating a series from array with an index: To create a series
with an explicitly provided index instead of the default, we pass a list
of elements to the index parameter with the same number of elements as
the array.
 Python3

# import pandas as pd

import pandas as pd
# import numpy as np

import numpy as np

# simple array

data = np.array(['g', 'e', 'e', 'k', 's'])

# providing an index

ser = pd.Series(data, index=[10, 11, 12, 13, 14])

print(ser)

Output:

10    g
11    e
12    e
13    k
14    s
dtype: object
Creating a series from Lists : To create a series from a list, we
first create the list and then pass it to the Series constructor.
 Python3

import pandas as pd

# a simple list

list = ['g', 'e', 'e', 'k', 's']


# create series form a list

ser = pd.Series(list)

print(ser)

Output :

0    g
1    e
2    e
3    k
4    s
dtype: object
Creating a series from Dictionary : To create a series from a
dictionary, we first create the dictionary and then build a
series from it. The dictionary keys are used to construct the index of the
Series.
 Python3

import pandas as pd

# a simple dictionary

dict = {'Geeks': 10,

'for': 20,

'geeks': 30}

# create series from dictionary

ser = pd.Series(dict)

print(ser)
Output:

Geeks    10
for      20
geeks    30
dtype: int64
Creating a series from Scalar value: To create a series from a
scalar value, an index must be provided. The scalar value will be repeated
to match the length of the index.
 Python3

import pandas as pd

import numpy as np

# giving a scalar value with index

ser = pd.Series(10, index=[0, 1, 2, 3, 4, 5])

print(ser)

Output:

0    10
1    10
2    10
3    10
4    10
dtype: int64
Creating a series using NumPy functions : To create a series
using a numpy function, we can use different numpy functions,
such as numpy.linspace() and numpy.random.randn().
 Python3

# import pandas and numpy
import pandas as pd
import numpy as np

# series with numpy linspace()
ser1 = pd.Series(np.linspace(3, 33, 3))
print(ser1)

# series with numpy linspace()
ser2 = pd.Series(np.linspace(1, 100, 10))
print("\n", ser2)

Output:

Creating a Series using range function:

 Python3

# code
import pandas as pd

ser=pd.Series(range(10))

print(ser)

Output:
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
dtype: int64
Creating a Series using for loop and list comprehension:

 Python3

import pandas as pd

ser=pd.Series(range(1,20,3), index=[x for x in 'abcdefg'])

print(ser)

Output:
a 1
b 4
c 7
d 10
e 13
f 16
g 19
dtype: int64
Creating a Series using mathematical expressions:

 Python3

import pandas as pd

import numpy as np

ser=np.arange(10,15)

serobj=pd.Series(data=ser*5,index=ser)

print(serobj)

Output:
10 50
11 55
12 60
13 65
14 70
dtype: int32
Python | Pandas Dataframe/Series.head()
method
Python is a great language for doing data analysis, primarily because of
the fantastic ecosystem of data-centric Python packages. Pandas is one
of those packages and makes importing and analyzing data much easier.
Pandas head() method is used to return top n (5 by default) rows of a data
frame or series.
Syntax: Dataframe.head(n=5)
Parameters:
n: integer value, number of rows to be returned
Return type: Dataframe with top n rows
To download the data set used in the following examples, click here.
In the following examples, the data frame used contains data of some
NBA players. The image of data frame before any operations is attached
below.

Example #1:
In this example, top 5 rows of data frame are returned and stored in a new
variable. No parameter is passed to .head() method since by default it is
5.

# importing pandas module
import pandas as pd

# making data frame
data = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")

# calling head() method
# storing in new variable
data_top = data.head()

# display
data_top

Output:
As shown in the output image, it can be seen that the index of returned
rows is ranging from 0 to 4. Hence, top 5 rows were returned.

Example #2: Calling on Series with n parameter

In this example, the .head() method is called on a series with a custom
input of the n parameter to return the top 9 rows of the series.

# importing pandas module
import pandas as pd

# making data frame
data = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")

# number of rows to return
n = 9

# creating series
series = data["Name"]

# returning top n rows
top = series.head(n = n)

# display
top

Output:
As shown in the output image, top 9 rows ranging from 0 to 8th index
position were returned.
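The same behaviour can also be tried without downloading the CSV; the sketch below uses a small made-up frame (names and numbers are invented for illustration, not taken from the NBA data set):

```python
import pandas as pd

# hypothetical mini roster standing in for the NBA data set
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", "G"],
    "Number": [0, 99, 30, 28, 8, 12, 7],
})

top5 = df.head()    # default n=5
top2 = df.head(2)   # explicit n

print(top5)
print(top2)
```

Note that head() does not modify the frame; it returns a new object, so df itself still holds all seven rows.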
Python | Pandas Dataframe/Series.tail()
method
Python is a great language for doing data analysis, primarily because of
the fantastic ecosystem of data-centric Python packages. Pandas is one
of those packages and makes importing and analyzing data much easier.
Pandas tail() method is used to return bottom n (5 by default) rows of a
data frame or series.
Syntax: Dataframe.tail(n=5)
Parameters:
n: integer value, number of rows to be returned
Return type: Dataframe with bottom n rows
To download the data set used in the following examples, click here.
In the following examples, the data frame used contains data of some
NBA players. The image of data frame before any operations is attached
below.

Example #1:
In this example, bottom 5 rows of data frame are returned and stored in a
new variable. No parameter is passed to .tail() method since by default
it is 5.

# importing pandas module
import pandas as pd

# making data frame
data = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")

# calling tail() method
# storing in new variable
data_bottom = data.tail()

# display
data_bottom

Output:
As shown in the output image, it can be seen that the index of returned
rows is ranging from 453 to 457. Hence, last 5 rows were returned.

Example #2: Calling on Series with n parameter

In this example, the .tail() method is called on a series with a custom
input of the n parameter to return the bottom 12 rows of the series.

# importing pandas module
import pandas as pd

# making data frame
data = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")

# number of rows to return
n = 12

# creating series
series = data["Salary"]

# returning bottom n rows
bottom = series.tail(n = n)

# display
bottom

Output:
As shown in the output image, the bottom 12 rows, ranging from the 446th
to the 457th index position of the Salary column, were returned.
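A self-contained sketch of tail() on made-up salary data (no CSV required; the numbers are invented):

```python
import pandas as pd

# hypothetical salary series standing in for the NBA Salary column
salaries = pd.Series([50000, 60000, 45000, 70000, 55000, 80000, 65000])

bottom5 = salaries.tail()    # default n=5
bottom3 = salaries.tail(3)   # explicit n

print(bottom5)
print(bottom3)
```

As with the 446-457 labels above, the original index labels (4, 5, 6 here) are preserved in the result.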
Pandas DataFrame describe() Method
describe() method in Pandas is used to generate descriptive statistics of
DataFrame columns. It gives a quick summary of key statistical metrics
like mean, standard deviation, percentiles, and more. By
default, describe() works with numeric data but can also handle
categorical data, offering tailored insights based on data type.
Syntax: DataFrame.describe(percentiles=None, include=None,
exclude=None)
Parameters:
 percentiles: A list of numbers between 0 and 1, specifying which
percentiles to return. The default is None, which returns the 25th, 50th,
and 75th percentiles.
 include: A list of data types to include in the summary. You can
specify data types such as int, float, object (for strings), etc. The
default is None, meaning all numeric types are included.
 exclude: A list of data types to exclude from the summary. This
parameter is also None by default, meaning no types are excluded.
The describe() method returns a statistical summary of the data frame or
series.
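A minimal self-contained sketch first (the frame below is made up), showing the default numeric summary and the include parameter:

```python
import pandas as pd

# small invented frame: one numeric column, one object column
df = pd.DataFrame({
    "Age": [25, 30, 22, 35, 28],
    "Team": ["X", "Y", "X", "Z", "X"],
})

num = df.describe()                   # numeric columns only, by default
obj = df.describe(include="object")   # object (string) columns instead

print(num)
print(obj)
```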
Using describe() method on a DataFrame
Let’s walk through an example using an NBA dataset and then use
the describe() method to generate a statistical summary.
Dataset Link: nba.csv

import pandas as pd

# Reading the CSV file
data = pd.read_csv('nba.csv')

# Displaying the first few rows of the dataset
print("NBA Dataset:")
display(data.head())

print("\n Summary Table Generated by .describe() Method:")
display(data.describe())
Output

Summary generated by .describe() method

Descriptive Statistics for Numerical Columns generated using .describe() Method
 count: Total number of non-null values in the column.
 mean: Average value of the column.
 std: Standard deviation, showing how spread out the values are.
 min: Minimum value in the column.
 25%: 25th percentile (Q1).
 50%: Median value (50th percentile).
 75%: 75th percentile (Q3).
 max: Maximum value in the column.
Customizing describe() Method with Percentiles
You can customize the describe() method to include specific percentiles
by passing a list to the percentiles parameter. Here’s an example:

percentiles = [.20, .40, .60, .80]
include = ['object', 'float', 'int']

desc = data.describe(percentiles=percentiles, include=include)

print(desc)
Output
In this output, you can see that the percentiles have been applied,
providing additional insights.
Describing Series of Strings (Object Data Type)
If you want to describe a column with string data (i.e., an object data
type), the output will be different. Here’s an example using the “Name”
column from the dataset:

desc = data["Name"].describe()

print(desc)
Output
count 457
unique 457
top Avery Bradley
freq 1
Name: Name, dtype: object
For string data, the describe() method provides:
 count: Total number of non-null values.
 unique: The number of unique values.
 top: The most frequent value.
 freq: The frequency of the most common value.
The describe() method in Pandas is a powerful tool for quickly obtaining
an overview of a DataFrame’s numeric and object columns.
Dealing with Rows and Columns in Pandas
DataFrame
A Data frame is a two-dimensional data structure, i.e., data is aligned in a
tabular fashion in rows and columns. We can perform basic operations on
rows/columns like selecting, deleting, adding, and renaming. In this article,
we are using nba.csv file.

Dealing with Columns

In order to deal with columns, we perform basic operations on columns
like selecting, deleting, adding and renaming.

Column Selection:
In order to select a column in a Pandas DataFrame, we can access the
columns by calling them by their column names.
 Python3

# Import pandas package


import pandas as pd

# Define a dictionary containing employee data

data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],

'Age':[27, 24, 22, 32],

'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],

'Qualification':['Msc', 'MA', 'MCA', 'Phd']}

# Convert the dictionary into DataFrame

df = pd.DataFrame(data)

# select two columns

print(df[['Name', 'Qualification']])

Output:

For more examples refer to How to select multiple columns in a pandas
dataframe.

Column Addition: In order to add a column in a Pandas DataFrame, we can
declare a new list as a column and add it to an existing DataFrame.
 Python3

# Import pandas package


import pandas as pd

# Define a dictionary containing Students data


data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

# Convert the dictionary into DataFrame


df = pd.DataFrame(data)

# Declare a list that is to be converted into a column


address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']

# Using 'Address' as the column name


# and equating it to the list
df['Address'] = address

# Observe the result


print(df)

Output:

For more examples refer to Adding new column to existing DataFrame in
Pandas.

Column Deletion: In order to delete a column in a Pandas DataFrame, we
can use the drop() method. Columns are deleted by dropping columns with
column names.
 Python3

# importing pandas module


import pandas as pd

# making data frame from csv file

data = pd.read_csv("nba.csv", index_col ="Name" )

# dropping passed columns

data.drop(["Team", "Weight"], axis = 1, inplace = True)

# display

print(data)

Output:
As shown in the output images, the new output doesn’t have the passed
columns. Those values were dropped since axis was set equal to 1 and
the changes were made in the original data frame since inplace was True.
Data Frame before Dropping Columns-
Data Frame after Dropping Columns-
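If the nba.csv file is not at hand, the same drop pattern can be tried on a small made-up frame (the column names and values below are invented):

```python
import pandas as pd

# invented stand-in for the NBA frame
df = pd.DataFrame({
    "Team": ["X", "Y"],
    "Weight": [180, 200],
    "Salary": [100, 200],
})

# axis=1 targets columns; inplace=True mutates df directly
df.drop(["Team", "Weight"], axis=1, inplace=True)
print(df.columns.tolist())
```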

For more examples refer to Delete columns from DataFrame using
Pandas.drop().

Dealing with Rows:
In order to deal with rows, we can perform basic operations on rows like
selecting, deleting, adding and renaming.
Row Selection:
Pandas provides a unique method to retrieve rows from a Data frame.
The DataFrame.loc[] method is used to retrieve rows from a Pandas
DataFrame. Rows can also be selected by passing an integer location to
the iloc[] function.
 Python

# importing pandas package

import pandas as pd

# making data frame from csv file

data = pd.read_csv("nba.csv", index_col ="Name")


# retrieving row by loc method

first = data.loc["Avery Bradley"]

second = data.loc["R.J. Hunter"]

print(first, "\n\n\n", second)

Output:
As shown in the output image, two series were returned since there was
only one parameter both of the times.

For more examples refer to Pandas Extracting rows using .loc[].

Row Addition:
In order to add a row in a Pandas DataFrame, we can concatenate the old
dataframe with the new one.
 Python3
# importing pandas module
import pandas as pd

# making data frame
df = pd.read_csv("nba.csv", index_col ="Name")
df.head(10)

new_row = pd.DataFrame({'Name': 'Geeks', 'Team': 'Boston',
                        'Number': 3, 'Position': 'PG', 'Age': 33,
                        'Height': '6-2', 'Weight': 189,
                        'College': 'MIT', 'Salary': 99999},
                       index=[0])

# simply concatenate both dataframes
df = pd.concat([new_row, df]).reset_index(drop=True)
df.head(5)

Output:Data Frame before Adding Row-

Data Frame after Adding Row-


For more examples refer to Add a row at top in pandas DataFrame.

Row Deletion:
In order to delete a row in a Pandas DataFrame, we can use the drop()
method. Rows are deleted by dropping rows by index label.
 Python3

# importing pandas module

import pandas as pd

# making data frame from csv file

data = pd.read_csv("nba.csv", index_col ="Name" )

# dropping passed values

data.drop(["Avery Bradley", "John Holland", "R.J. Hunter"],
          inplace = True)

# display

data

Output:
As shown in the output images, the new output doesn’t have the passed
values. Those values were dropped and the changes were made in the
original data frame since inplace was True.
Data Frame before Dropping values-

Data Frame after Dropping values-
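The same row-deletion pattern on a tiny made-up frame, for readers without the CSV (names and values are invented):

```python
import pandas as pd

# invented frame indexed by name
df = pd.DataFrame({"Team": ["X", "Y", "Z"]},
                  index=["Avery", "John", "Ray"])

# default axis=0 drops rows by index label; inplace=True mutates df
df.drop(["Avery", "Ray"], inplace=True)
print(df.index.tolist())
```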

For more examples refer to Delete rows from DataFrame using
Pandas.drop().
Problem related to Columns:
 How to get column names in Pandas dataframe
 How to rename columns in Pandas DataFrame
 How to drop one or multiple columns in Pandas Dataframe
 Get unique values from a column in Pandas DataFrame
 How to lowercase column names in Pandas dataframe
 Apply uppercase to a column in Pandas dataframe
 Capitalize first letter of a column in Pandas dataframe
 Get n-largest values from a particular column in Pandas DataFrame
 Get n-smallest values from a particular column in Pandas DataFrame
 Convert a column to row name/index in Pandas
Problem related to Rows:
 Apply function to every row in a Pandas DataFrame
 How to get rows names in Pandas dataframe
Python | Pandas Extracting rows using .loc[]
Python is a great language for doing data analysis, primarily because of
the fantastic ecosystem of data-centric Python packages. Pandas is one
of those packages and makes importing and analyzing data much easier.
Pandas provide a unique method to retrieve rows from a Data
frame. DataFrame.loc[] method is a method that takes only index labels
and returns row or dataframe if the index label exists in the caller data
frame.
Syntax: pandas.DataFrame.loc[]
Parameters:
Index label: String or list of string of index label of rows
Return type: Data frame or Series depending on parameters
To download the CSV used in code, click here.
Example #1: Extracting single Row
In this example, Name column is made as the index column and then two
single rows are extracted one by one in the form of series using index
label of rows.

# importing pandas package

import pandas as pd

# making data frame from csv file

data = pd.read_csv("nba.csv", index_col ="Name")

# retrieving row by loc method

first = data.loc["Avery Bradley"]

second = data.loc["R.J. Hunter"]

print(first, "\n\n\n", second)


Output:
As shown in the output image, two series were returned since there was
only one parameter both of the times.

Example #2: Multiple parameters


In this example, Name column is made as the index column and then two
single rows are extracted at the same time by passing a list as parameter.

# importing pandas package

import pandas as pd

# making data frame from csv file

data = pd.read_csv("nba.csv", index_col ="Name")

# retrieving rows by loc method

rows = data.loc[["Avery Bradley", "R.J. Hunter"]]


# checking data type of rows

print(type(rows))

# display

rows

Output:
As shown in the output image, this time the data type of returned value is
a data frame. Both of the rows were extracted and displayed like a new
data frame.

Example #3: Extracting multiple rows with same index


In this example, Team name is made as the index column and one team
name is passed to .loc method to check if all values with same team name
have been returned or not.

# importing pandas package

import pandas as pd

# making data frame from csv file

data = pd.read_csv("nba.csv", index_col ="Team")

# retrieving rows by loc method


rows = data.loc["Utah Jazz"]

# checking data type of rows

print(type(rows))

# display

rows

Output:
As shown in the output image, All rows with team name “Utah Jazz” were
returned in the form of a data frame.
Example #4: Extracting rows between two index labels
In this example, two index label of rows are passed and all the rows that
fall between those two index label have been returned (Both index labels
Inclusive).

# importing pandas package

import pandas as pd

# making data frame from csv file

data = pd.read_csv("nba.csv", index_col ="Name")

# retrieving rows by loc method

rows = data.loc["Avery Bradley":"Isaiah Thomas"]

# checking data type of rows

print(type(rows))

# display

rows

Output:
As shown in the output image, all the rows that fall between passed two
index labels are returned in the form of a data frame.
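The .loc[] patterns above can be exercised together on a tiny made-up frame (the labels here are invented, not from nba.csv):

```python
import pandas as pd

df = pd.DataFrame(
    {"Team": ["X", "Y", "Z"], "Number": [1, 2, 3]},
    index=["Avery", "Hunter", "Isaiah"],
)

single = df.loc["Avery"]                # one label -> Series
several = df.loc[["Avery", "Isaiah"]]   # list of labels -> DataFrame
span = df.loc["Avery":"Hunter"]         # label slice, both ends inclusive

print(single, several, span, sep="\n\n")
```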
Extracting rows using Pandas .iloc[] in
Python
Python is a great language for doing data analysis, primarily because of
the fantastic ecosystem of data-centric Python packages. Pandas is one
of those packages that makes importing and analyzing data much
easier. Here we learn how to extract rows using Pandas .iloc[]
in Python.
Pandas .iloc[] Syntax
Syntax: pandas.DataFrame.iloc[]
Parameters: Index position of rows in integer or list of integer.
Return type: Data frame or Series depending on parameters
What is Pandas .iloc[] in Python?
In the Python Pandas library, .iloc[] is an indexer used for integer-
location-based indexing of data in a DataFrame. It allows users to select
specific rows and columns by providing integer indices, making it a
valuable tool for data manipulation and extraction based on numerical
positions within the DataFrame. This indexer is particularly useful when
you want to access or manipulate data using integer-based positional
indexing rather than labels.
Dataset Used: To download the CSV used in the code, click here.
Extracting Rows using Pandas .iloc[] in Python
The Pandas library provides a unique method to retrieve rows from a
DataFrame. Dataframe.iloc[] method is used when the index label of a
data frame is something other than numeric series of 0, 1, 2, 3….n or in
case the user doesn’t know the index label. Rows can be extracted using
an imaginary index position that isn’t visible in the Dataframe.
There are various methods to extract rows using Pandas .iloc[] in
Python; here we use some commonly used methods, which are the
following:
 Selecting rows using Pandas .iloc and loc
 Selecting Multiple Rows using Pandas .iloc[] in Python
 Select Rows by Name or Index using Pandas .iloc[] in Python
Selecting rows using Pandas .iloc and loc
In this example, the same index number row is extracted by both .iloc[]
and.loc[] methods and compared. Since the index column by default is
numeric, hence the index label will also be integers.
# importing pandas package
import pandas as pd

# making data frame from csv file


data = pd.read_csv('nba.csv')

# retrieving rows by loc method


row1 = data.loc[3]

# retrieving rows by iloc method


row2 = data.iloc[3]

# checking if values are equal


row1 == row2
Output:
Name True
Team True
Number True
Position True
Age True
Height True
Weight True
College True
Salary True
Name: 3, dtype: bool

As shown in the output image, the results returned by both methods are
the same.
Selecting Multiple Rows using Pandas .iloc[] in Python
In this example, multiple rows are extracted, first by passing a list and
then by passing integers to extract rows between that range. After that,
both values are compared.
# importing pandas package
import pandas as pd

# making data frame from csv file


data = pd.read_csv('nba.csv')

# retrieving rows by iloc method (list of positions)
row1 = data.iloc[[4, 5, 6, 7]]

# retrieving rows by iloc method (slice of positions)
row2 = data.iloc[4:8]

# comparing values
row1 == row2
Output:
Name Team Number Position Age Height Weight
College Salary
4 True True True True True True True
False True
5 True True True True True True True
False True
6 True True True True True True True
True True
7 True True True True True True True
True True
As shown in the output image, the results returned by both methods are
the same. All values are True except values in the college column since
those were NaN values.
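Those False entries follow from NaN comparison semantics: NaN never compares equal, even to itself. A quick made-up check:

```python
import pandas as pd
import numpy as np

s = pd.Series([1.0, np.nan])
eq = s == s   # element-wise comparison of the series with itself

print(eq)     # the NaN position comes back False
```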

Select Rows by Name or Index using Pandas .iloc[] in Python


This code uses Pandas to create a DataFrame with information about
individuals (Geek1 to Geek5) regarding their age and salary. It sets the
‘Name’ column as the index for clarity. The original DataFrame is
displayed, and then it demonstrates the extraction of a single row (Geek1)
and multiple rows (Geek2 to Geek3) using Pandas .iloc[] for integer-
location based indexing. The extracted rows are printed for verification.

import pandas as pd

# Creating a sample DataFrame
data = pd.DataFrame({
    'Name': ['Geek1', 'Geek2', 'Geek3', 'Geek4', 'Geek5'],
    'Age': [25, 30, 22, 35, 28],
    'Salary': [50000, 60000, 45000, 70000, 55000]
})

# Setting 'Name' column as the index for clarity
data.set_index('Name', inplace=True)

# Displaying the original DataFrame
print("Original DataFrame:")
print(data)

# Extracting a single row by index
row_geek1 = data.iloc[0, :]
print("\nExtracted Row (Geek1):")
print(row_geek1)

# Extracting multiple rows using a slice
rows_geek2_to_geek3 = data.iloc[1:3, :]
print("\nExtracted Rows (Geek2 to Geek3):")
print(rows_geek2_to_geek3)
Output :
Original DataFrame:
Age Salary
Name
Geek1 25 50000
Geek2 30 60000
Geek3 22 45000
Geek4 35 70000
Geek5 28 55000
Extracted Row (Geek1):
Age 25
Salary 50000
Name: Geek1, dtype: int64
Extracted Rows (Geek2 to Geek3):
Age Salary
Name
Geek2 30 60000
Geek3 22 45000

Conclusion
In Conclusion, Pandas .iloc[] in Python is a powerful tool for extracting
rows based on integer-location indexing. Its value shines in datasets
where numerical positions matter more than labels. This feature allows
selective retrieval of individual rows or slices, making it essential for
efficient data manipulation and analysis. The versatility
of .iloc[] enhances flexibility in data extraction, enabling seamless
access to specific portions of datasets. As a fundamental component of
Pandas, .iloc[] significantly contributes to the efficiency and clarity of
data-related tasks for developers and data scientists.
Extracting rows using Pandas .iloc[] in Python – FAQs
How to Drop Rows Using iloc in Pandas?
While iloc is generally used for indexing and selecting data in Pandas, it
is not directly used to drop rows. Instead, to drop rows using index
positions, you can use a combination of iloc and drop methods or use
slicing to create a new DataFrame that excludes the rows you want to
drop.
Example using slicing to exclude rows:
import pandas as pd

# Create a sample DataFrame


df = pd.DataFrame({
'A': range(10),
'B': range(10, 20)
})

# Drop the first 5 rows


df = df.iloc[5:] # Keeps rows from index 5 onwards
print(df)
What is the Difference Between iloc and [] in Pandas?
 iloc: This is an integer-location based indexing method used to access
data in specific positions in the DataFrame. It is strictly integer-based,
from 0 to the length-1 of the axis. It is used to retrieve rows and
columns by integer positions.
df.iloc[0] # Retrieves the first row of the DataFrame
 []:This indexing operator is more versatile and can be used to select
columns by column names or rows based on boolean arrays.
df['A'] # Retrieves the column named 'A'
df[df['A'] > 5] # Retrieves rows where the value in column
'A' is greater than 5
What Does iloc[:0] Do?
The expression iloc[:0] in Pandas is used to select rows up to but not
including index 0, effectively returning an empty DataFrame with the same
column headers.
Example:
df.iloc[:0]
This will return an empty DataFrame with the same structure (columns
and types) as df but no rows.
How to Drop the First 5 Rows in Pandas?
To drop the first 5 rows in a Pandas DataFrame, you can use the drop
method with the row indices you want to remove, or you can simply slice
the DataFrame to skip the first 5 rows.
Example using slicing:
df = df.iloc[5:] # Keeps rows starting from index 5,
dropping the first 5 rows
What is the Difference Between [] and {} in Python?
 [] are used to define lists in Python. Lists are ordered, mutable
collections of items that can be of mixed types.
my_list = [1, 'apple', 3.14]
 {} are used to define dictionaries or sets in Python.
o As a dictionary, it contains key-value pairs where each key is
unique.
o When used with distinct elements without key-value pairs, it
defines a set, which is an unordered collection of unique
elements.
my_dict = {'name': 'Alice', 'age': 25}
my_set = {1, 2, 3}
Both are fundamental data structures in Python, used extensively in
various types of applications.
Indexing and Selecting Data with Pandas
Indexing in Pandas refers to selecting specific rows and columns from a
DataFrame. It allows you to subset data in various ways, such as
selecting all rows with specific columns, some rows with all columns, or a
subset of both rows and columns. This technique is also known as Subset
Selection.
Let’s learn how to use different techniques for indexing and selecting data
with Pandas.
Indexing Data using the [] Operator
The most straightforward way to index data in Pandas is by using the []
operator. This method can be used to select individual columns or
multiple columns.
Selecting a Single Column
To select a single column, you simply reference the column name inside
square brackets:

import pandas as pd

# Load the data
data = pd.read_csv("nba.csv", index_col="Name")

print("Dataset")
display(data.head(5))

# Select a single column
first = data["Age"]
print("\nSingle Column selected from Dataset")
display(first.head(5))
Output:
Selecting Multiple Columns
To select multiple columns, pass a list of column names:

first = data[["Age", "College", "Salary"]]

print("\nMultiple Columns selected from Dataset")
display(first.head(5))
Output:
Pandas offers several indexing methods to efficiently extract elements,
rows, and columns from a DataFrame. These methods, while similar, have
distinct behaviors. The three main types of indexing in Pandas are:
1. DataFrame[]: Known as the indexing operator, used for basic
selection.
2. DataFrame.loc[]: Label-based indexing for selecting data by
row/column labels.
3. DataFrame.iloc[]: Position-based indexing for selecting data by
row/column integer positions.
Together, these indexing methods, also called "indexers," are the most
common ways to access data in a Pandas DataFrame.
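A minimal side-by-side of the three indexers on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 30, 22]}, index=["a", "b", "c"])

col = df["Age"]                # []: select a column by name
by_label = df.loc["b", "Age"]  # .loc: row label + column label
by_position = df.iloc[1, 0]    # .iloc: row position + column position

print(col)
print(by_label, by_position)
```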
Indexing a DataFrame using .loc[ ]
The .loc[] function in Pandas is used for selecting data by row and
column labels. Unlike the indexing operator, .loc[] can select subsets of
rows and columns simultaneously, offering flexibility in data retrieval.
Selecting a single row
To select a single row, provide the row label inside the .loc[] function:
# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")

# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]

print(first, "\n\n\n", second)


Output:

As shown in the output image, two series were returned since there was
only one parameter both of the times.
Selecting multiple rows
For multiple rows, pass a list of row labels to .loc[]:

# Select multiple rows
first = data.loc[["Avery Bradley", "R.J. Hunter"]]
display(first)
Output:

Selecting Specific Rows and Columns


To select specific rows and columns, provide both row labels and column
names as lists:
Dataframe.loc[["row1", "row2"], ["column1", "column2",
"column3"]]

# Select two rows and three columns
first = data.loc[["Avery Bradley", "R.J. Hunter"], ["Team", "Number", "Position"]]
print(first)
Output:
Selecting all of the rows and some columns
To select all rows and specific columns, use a colon [:] to indicate all rows,
followed by the list of column names:
Dataframe.loc[:, ["column1", "column2", "column3"]]

# Select all rows and specific columns
first = data.loc[:, ["Team", "Number", "Position"]]
print(first)
Output:

The .loc[] method in Pandas offers a powerful way to index and filter data
using labels, making it a core tool for data selection.
Indexing a DataFrame using .iloc[ ]
The .iloc[] function in Pandas allows data selection based on integer
positions (rather than labels). It is similar to .loc[], but only accepts
integer-based indices to specify rows and columns.
Selecting a Single Row
To select a single row using .iloc[], provide the integer position of the row:

import pandas as pd

data = pd.read_csv("nba.csv", index_col="Name")

# Select a single row by position
row2 = data.iloc[3]
print(row2)
Output:

Selecting Multiple Rows


To select multiple rows, pass a list of integer positions:

# Select multiple rows by position
row2 = data.iloc[[3, 5, 7]]
display(row2)
Output:

Selecting Specific Rows and Columns


To select specific rows and columns, provide integer positions for both
rows and columns:

# Select two rows and two columns by position
row2 = data.iloc[[3, 4], [1, 2]]
print(row2)
Output:

Selecting All Rows and Some Columns


To select all rows and specific columns, use a colon [:] for all rows and a
list of column positions:

# Select all rows and specific columns
row2 = data.iloc[:, [1, 2]]
print(row2)
Output:
The .iloc[] method is a powerful way to select data by position, offering
flexibility in accessing subsets of a DataFrame based on integer indexing.
Other methods for indexing in a Pandas DataFrame (some of them
deprecated in recent pandas versions) include:

Function: Description
DataFrame.head(): Return top n rows of a DataFrame.
DataFrame.tail(): Return bottom n rows of a DataFrame.
DataFrame.at[]: Access a single value for a row/column label pair.
DataFrame.iat[]: Access a single value for a row/column pair by integer
position.
DataFrame.lookup(): Label-based "fancy indexing" function for DataFrame
(deprecated).
DataFrame.pop(): Return item and drop from DataFrame.
DataFrame.xs(): Return a cross-section (row(s) or column(s)) from the
DataFrame.
DataFrame.get(): Get item from object for given key (e.g., DataFrame
column).
DataFrame.isin(): Return a boolean DataFrame showing whether each
element is contained in values.
DataFrame.where(): Return an object of the same shape with entries from
self where cond is True, otherwise from other.
DataFrame.mask(): Return an object of the same shape with entries from
self where cond is False, otherwise from other.
DataFrame.query(): Query the columns of a DataFrame with a boolean
expression.
DataFrame.insert(): Insert a column into DataFrame at a specified
location.
Indexing and Selecting Data with Pandas – FAQs
What is indexing and selecting data with Pandas in Python?
Indexing and selecting data in Pandas refer to the methods used to
access or modify specific rows and columns in a DataFrame or Series.
This can be done using label indexing, integer indexing, or condition-
based filtering.
How to select data based on index in Pandas?
You can select data by index using the loc and iloc attributes:
 loc: For label-based indexing (selects data by row labels and column
names).
 iloc: For integer-based indexing (selects data by row and column
positions).
What are the methods of indexing in pandas?
Pandas supports several methods of indexing:
 Label-based indexing (loc): Selects data based on data index value
labels.
 Integer-based indexing ( iloc): Selects data based on the integer
position of rows and columns.
 Boolean indexing: Uses a boolean vector to filter data.
 Conditional indexing: Uses conditions to filter rows or columns.
 MultiIndex (hierarchical): Advanced indexing on multiple levels of
index rows or columns.
What are types of indexing?
In the broader context beyond pandas, types of indexing include:
 Single-level indexing: Regular index with a single label for each
entry.
 Multi-level indexing (Hierarchical): Multiple index levels, allowing for
more complex data arrangements.
 Datetime indexing: Specific to time series data, allowing date and
time-based indexing.
 Interval indexing: For data indexed by ranges of values.
 Categorical indexing: For data categorized based on specific criteria.
What is indexing method?
An indexing method defines how data is organized and accessed within a
structure like a DataFrame or Series. It determines the efficiency of data
retrieval and manipulation.
Boolean Indexing in Pandas
In boolean indexing, we will select subsets of data based on the actual
values of the data in the DataFrame and not on their row/column labels or
integer locations. In boolean indexing, we use a boolean vector to filter the
data.
Boolean indexing is a type of indexing that uses actual values of the data
in the DataFrame. In boolean indexing, we can filter a data in four ways:
 Accessing a DataFrame with a boolean index
 Applying a boolean mask to a dataframe
 Masking data based on column value
 Masking data based on an index value
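As a preview of the last two bullets, masking on a column value boils down to indexing with a boolean vector; the frame below is made up:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["aparna", "pankaj", "sudhir", "Geeku"],
    "score": [90, 40, 80, 98],
})

# boolean vector built from a column value, used to filter rows
high = df[df["score"] > 50]

print(high)
```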
Accessing a DataFrame with a boolean index:
In order to access a dataframe with a boolean index, we have to create a
dataframe in which the index of dataframe contains a boolean value that
is “True” or “False”.
Example
 Python3

# importing pandas as pd
import pandas as pd

# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe with a boolean index
df = pd.DataFrame(data, index=[True, False, True, False])
print(df)
Output:

Now that we have created a dataframe with a boolean index, it can be
accessed through three indexers: .loc[], .iloc[], and .ix[] (the last of
which exists only in pandas versions before 1.0).
Accessing a Dataframe with a boolean index using .loc[]
In order to access a dataframe with a boolean index using .loc[], we
simply pass a boolean value (True or False) in a .loc[] function.

 Python3

# importing pandas as pd
import pandas as pd

# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe with a boolean index
df = pd.DataFrame(data, index=[True, False, True, False])

# accessing rows whose index label is True
print(df.loc[True])

Output:

Accessing a Dataframe with a boolean index using .iloc[]
The iloc[] function accepts only integers as arguments, so passing a
boolean value (True or False) raises a TypeError; with .iloc[] the
DataFrame can only be accessed by passing an integer position.
Code #1:
 Python3

# importing pandas as pd
import pandas as pd

# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe with a boolean index
df = pd.DataFrame(data, index=[True, False, True, False])

# passing a boolean to .iloc[] raises a TypeError
print(df.iloc[True])

Output:
TypeError
Code #2:
 Python3

# importing pandas as pd
import pandas as pd

# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe with a boolean index
df = pd.DataFrame(data, index=[True, False, True, False])

# accessing the row at integer position 1
print(df.iloc[1])

Output:
Accessing a Dataframe with a boolean index using .ix[]
With .ix[] we can pass either a boolean value (True or False) or an
integer, because .ix[] is a hybrid of the .loc[] and .iloc[] functions.
(Note: .ix[] was deprecated in pandas 0.20 and removed in pandas 1.0, so
the examples below run only on older versions.)
Code #1:
 Python3

# importing pandas as pd
import pandas as pd

# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe with a boolean index
df = pd.DataFrame(data, index=[True, False, True, False])

# accessing by label (requires pandas < 1.0)
print(df.ix[True])

Output:
Code #2:
 Python

# importing pandas as pd
import pandas as pd

# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe with a boolean index
df = pd.DataFrame(data, index=[True, False, True, False])

# accessing by integer position (requires pandas < 1.0)
print(df.ix[1])

Output:
Applying a boolean mask to a dataframe:
A boolean mask can be applied to a DataFrame through the [] accessor
(i.e. __getitem__). We pass a list of True/False values of the same
length as the DataFrame; only the rows for which the mask is True are
returned.
Code #1:
 Python3

# importing pandas as pd
import pandas as pd

# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

df = pd.DataFrame(data, index=[0, 1, 2, 3])

# keep only the rows where the mask is True
print(df[[True, False, True, False]])

Output:
Code #2:

 Python3

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba1.1.csv")
df = pd.DataFrame(data, index=[0, 1, 2, 3, 4, 5, 6,
                               7, 8, 9, 10, 11, 12])

# keep only the rows where the mask is True
print(df[[True, False, True, False, True,
          False, True, False, True, False,
          True, False, True]])

Output:
Masking data based on column value:
In a DataFrame we can filter data based on a column's values. To do so,
we apply conditions to the DataFrame using comparison operators such as
==, >, <, <=, and >=; applying one of these operators to a column
produces a Series of True and False values.
Code #1:
 Python

# importing pandas as pd
import pandas as pd

# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["BCA", "BCA", "M.Tech", "BCA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe
df = pd.DataFrame(data)

# using a comparison operator for filtering of data
print(df['degree'] == 'BCA')

Output:

Code #2:
 Python

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")

# using greater than operator for filtering of data
print(data['Age'] > 25)

Output:
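The comparison operators above only produce the boolean Series; passing that Series back through the [] accessor returns the matching rows. A small self-contained sketch:

```python
import pandas as pd

df = pd.DataFrame({'name': ["aparna", "pankaj", "sudhir", "Geeku"],
                   'degree': ["BCA", "BCA", "M.Tech", "BCA"],
                   'score': [90, 40, 80, 98]})

mask = df['degree'] == 'BCA'   # boolean Series: True, True, False, True
filtered = df[mask]            # keeps only the True rows
assert list(filtered['name']) == ["aparna", "pankaj", "Geeku"]
```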
Masking data based on index value:
In a DataFrame we can also filter data based on the index values. To do
so, we build a mask over the index using comparison operators such as
==, >, and <.
Code #1:
 Python3

# importing pandas as pd
import pandas as pd

# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["BCA", "BCA", "M.Tech", "BCA"],
        'score': [90, 40, 80, 98]}

df = pd.DataFrame(data, index=[0, 1, 2, 3])

# mask selecting the row whose index value is 0
mask = df.index == 0
print(df[mask])

Output:

Code #2:
 Python3

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba1.1.csv")

# giving an index to the dataframe
df = pd.DataFrame(data, index=[0, 1, 2, 3, 4, 5, 6,
                               7, 8, 9, 10, 11, 12])

# filtering data on index value
mask = df.index > 7
print(df[mask])

Output:
Python | Pandas DataFrame.ix[ ]
Python is a great language for doing data analysis, primarily because of
the fantastic ecosystem of data-centric Python packages. Pandas is one
of those packages and makes importing and analyzing data much easier.
Pandas DataFrame.ix[] is both a label- and integer-based slicing
technique. Besides pure label-based and integer-based selection, pandas
provided this hybrid method for selecting and subsetting objects: ix[]
was the most general indexer and supported any of the inputs accepted by
loc[] and iloc[]. It was deprecated in pandas 0.20 and removed in pandas
1.0, so the examples below require an older pandas version.
Syntax: DataFrame.ix[ ]
Parameters:
Index Position: Index position of rows in integer or list of integer.
Index label: String or list of string of index label of rows
Returns: Data frame or Series depending on parameters
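Since .ix[] no longer exists in pandas 1.0 and later, the same mixed label/position selections are written today with loc[] and iloc[], converting positions to labels (or vice versa) explicitly. A minimal sketch of the modern equivalents (the DataFrame here is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                  index=["x", "y", "z"])

# old: df.ix[:2, 'B']  (integer rows, label column)
# -> translate the label to a position with get_loc, then use iloc
col = df.columns.get_loc("B")
sub = df.iloc[:2, col]
assert list(sub) == [4, 5]

# old: df.ix['y', 1]  (label row, integer column)
# -> translate the position to a label, then use loc
val = df.loc["y", df.columns[1]]
assert val == 5
```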
Code #1:

# importing pandas package
import pandas as geek

# making data frame from csv file
data = geek.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")

# Integer slicing
print("Slicing only rows(till index 4):")
x1 = data.ix[:4, ]
print(x1, "\n")

print("Slicing rows and columns(rows=4, col 1-4, excluding 4):")
x2 = data.ix[:4, 1:4]
print(x2)

Output :

Code #2:

# importing pandas package
import pandas as geek

# making data frame from csv file
data = geek.read_csv("nba.csv")

# Index slicing on Height column
print("After index slicing:")
x1 = data.ix[10:20, 'Height']
print(x1, "\n")

# Index slicing on Salary column
x2 = data.ix[10:20, 'Salary']
print(x2)
Output:

Code #3:

# importing pandas and numpy
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  columns=['A', 'B', 'C', 'D'])
print("Original DataFrame: \n", df)

# Integer slicing
print("\n Slicing only rows:")
print("--------------------------")
x1 = df.ix[:4, ]
print(x1)

print("\n Slicing rows and columns:")
print("----------------------------")
x2 = df.ix[:4, 1:3]
print(x2)
Output :

Code #4:

# importing pandas and numpy
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  columns=['A', 'B', 'C', 'D'])
print("Original DataFrame: \n", df)

# Label slicing (printing all the rows of column 'A')
print("\n After index slicing (On 'A'):")
print("--------------------------")
x = df.ix[:, 'A']
print(x)
Output :
Python | Pandas Series.str.slice()
Python is a great language for doing data analysis, primarily because of
the fantastic ecosystem of data-centric python packages. Pandas is one
of those packages and makes importing and analyzing data much easier.
Pandas str.slice() method is used to slice substrings from a string
present in a Pandas series object. It is very similar to Python's basic
principle of slicing objects, which works on [start:stop:step] and thus
takes three parameters: where to start, where to stop, and how many
characters to step over.
Since this is a pandas string method, .str has to be prefixed every time
before calling this method; otherwise, it raises an AttributeError.
Syntax: Series.str.slice(start=None, stop=None, step=None)
Parameters:
start: int value, where to start slicing
stop: int value, where to stop slicing (exclusive)
step: int value, the step size between characters while slicing
Return type: Series with sliced substrings
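A quick self-contained illustration of the three parameters (the strings are arbitrary):

```python
import pandas as pd

s = pd.Series(["koala", "dog", "chameleon"])

# characters 0..2 (the stop position is exclusive)
assert list(s.str.slice(0, 3)) == ["koa", "dog", "cha"]

# every second character, like s[::2] on a plain Python string
assert list(s.str.slice(step=2)) == ["kaa", "dg", "caeen"]
```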
In the following examples, the data frame used contains data of some
NBA players. The image of data frame before any operations is attached
below.

Example #1:
In this example, the Salary column is sliced to get the values before the
decimal point. Suppose we want to do some mathematical operations for
which we need integer data; the Salary column is therefore sliced up to
the 2nd-last character (position -2).
Since the Salary column is imported as the float64 data type, it is first
converted to string using the .astype() method.

# importing pandas module
import pandas as pd

# making data frame
data = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")

# removing null values to avoid errors
data.dropna(inplace=True)

# start, stop and step variables
start, stop, step = 0, -2, 1

# converting to string data type
data["Salary"] = data["Salary"].astype(str)

# slicing till 2nd last element
data["Salary (int)"] = data["Salary"].str.slice(start, stop, step)

# display
data.head(10)
Output:
As shown in the output image, the string has been sliced and the part
before the decimal point is stored in a new column.

Note: this method doesn't have any parameters for handling null values,
hence they were removed beforehand using the .dropna() method.

Example #2:
In this example, the name column is sliced and step parameter is kept 2.
Hence it will be stepping two characters during slicing.

# importing pandas module
import pandas as pd

# making data frame
data = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")

# removing null values to avoid errors
data.dropna(inplace=True)

# start, stop and step variables
start, stop, step = 0, -2, 2

# slicing till 2nd last element, stepping by 2
data["Name"] = data["Name"].str.slice(start, stop, step)

# display
data.head(10)

Output:
As can be seen in the output image, the Name column was sliced with a
step of 2, skipping every other character.
How to take column-slices of DataFrame in
Pandas?
In this article, we will learn how to slice a DataFrame column-wise
in Python. A DataFrame is a two-dimensional tabular data structure with
labeled axes (rows and columns).
Creating Dataframe to slice columns
# importing pandas
import pandas as pd

# Using DataFrame() method from pandas module
df1 = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7],
                    "b": [2, 3, 4, 2, 3, 4, 5],
                    "c": [3, 4, 5, 2, 3, 4, 5],
                    "d": [4, 5, 6, 2, 3, 4, 5],
                    "e": [5, 6, 7, 2, 3, 4, 5]})
display(df1)
Output:

Method 1: Slice Columns in pandas using reindex


Slicing column from ‘c’ to ‘b’.
df2 = df1.reindex(columns = ['c','b'])
print(df2)
Output:
Method 2: Slice Columns in pandas using loc[]
df.loc[] is part of the pandas package and can be used to slice a
DataFrame by label-based indexing. The DataFrame.loc attribute accesses
a group of rows and columns by label(s) or a boolean array in the given
DataFrame.
Syntax: [ : , first : last : step]
Example 1:
Slicing column from ‘b’ to ‘d’ with step 2.
df2 = df1.loc[:, "b":"d":2]
print(df2)
Output:

Example 2:
Slicing column from ‘c’ to ‘e’ with step 1.
df2 = df1.loc[:, "c":"e":1]
print(df2)
Output:

Method 3: Slice Columns in pandas using iloc[]
iloc is also part of the pandas package and can be used to slice a
DataFrame by integer position. df.iloc[] is useful when the index labels
of a DataFrame are something other than the numeric sequence 0, 1, 2, …,
n, or when the user does not know the index labels: rows and columns are
selected by an integer position that is not itself visible in the
DataFrame.
Syntax: [ start : stop : step]
Example 1:
Slicing columns at positions 1 to 2 (stop position 3 is excluded) with step 1.
df2 = df1.iloc[:, 1:3:1]
print(df2)
Output:

Example 2:
Slicing columns at positions 0 and 2 (stop position 3 is excluded) with step 2.
df2 = df1.iloc[:, 0:3:2]
print(df2)
Output:

Python | Pandas.apply()
Pandas.apply allow the users to pass a function and apply it on every
single value of the Pandas series. It comes as a huge improvement for the
pandas library as this function helps to segregate data according to the
conditions required due to which it is efficiently used in data science and
machine learning.
Installation:
Install the Pandas module with the following command in the terminal:
pip install pandas

To read the CSV file and squeeze it into a pandas Series, the following
commands are used (note: the squeeze parameter was deprecated in pandas
1.4 and removed in 2.0; on newer versions call .squeeze("columns") on the
result instead):
import pandas as pd
s = pd.read_csv("stock.csv", squeeze=True)

Syntax:
s.apply(func, convert_dtype=True, args=())
Parameters:
func: the function applied to all values of the pandas Series.
convert_dtype: convert the dtype as per the function's operation.
args=(): additional positional arguments to pass to the function.
Return type: Pandas Series after the applied function/operation.
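Because the example below depends on an external stock.csv file, here is a self-contained sketch of the same pattern on an inline Series (the name and threshold values are illustrative, mirroring the article's fun()):

```python
import pandas as pd

s = pd.Series([150, 250, 450], name="Price")

def label(num):
    # same three-way split as the article's fun()
    if num < 200:
        return "Low"
    elif num < 400:
        return "Normal"
    return "High"

# apply() calls label() on every element and returns a new Series
labels = s.apply(label)
assert list(labels) == ["Low", "Normal", "High"]
```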
Example #1:
The following example passes a function and checks the value of each
element in series and returns low, normal or High accordingly.
import pandas as pd

# reading csv
s = pd.read_csv("stock.csv", squeeze=True)

# defining function to check price
def fun(num):
    if num < 200:
        return "Low"
    elif num >= 200 and num < 400:
        return "Normal"
    else:
        return "High"

# passing function to apply and storing returned series in new
new = s.apply(fun)

# printing first 3 elements
print(new.head(3))

# printing elements somewhere near the middle of series
print(new[1400], new[1500], new[1600])

# printing last 3 elements
print(new.tail(3))
Output:

Example #2:
In the following example, a temporary anonymous function is made
in .apply itself using lambda. It adds 5 to each value in series and returns
a new series.
import pandas as pd

s = pd.read_csv("stock.csv", squeeze=True)

# adding 5 to each value
new = s.apply(lambda num: num + 5)

# printing first 5 elements of old and new series
print(s.head(), '\n', new.head())

# printing last 5 elements of old and new series
print('\n\n', s.tail(), '\n', new.tail())
Output:
0 50.12
1 54.10
2 54.65
3 52.38
4 52.95
Name: Stock Price, dtype: float64

0 55.12
1 59.10
2 59.65
3 57.38
4 57.95
Name: Stock Price, dtype: float64

3007 772.88
3008 771.07
3009 773.18
3010 771.61
3011 782.22
Name: Stock Price, dtype: float64
3007 777.88
3008 776.07
3009 778.18
3010 776.61
3011 787.22
Name: Stock Price, dtype: float64

As observed, New values = old values + 5

Python | Pandas.apply() – FAQs


What are () Called in Python?
In Python, parentheses () are called several things based on their
context:
1. Tuple: When used to enclose a series of comma-separated values,
they define a tuple, a type of immutable sequence in Python.
2. Function Call: When used after a function name, they execute a
function call.
3. Order of Operations: Parentheses are used to group expressions and
override the natural precedence of operators to control the flow and
outcome of operations.
What is () Data Type in Python?
When () is used with values separated by commas, it defines a tuple,
which is an immutable and ordered sequence type in Python. An empty
tuple can be defined by (), while a tuple with one item must use a trailing
comma, like (item,).
Example:
# Empty tuple
empty_tuple = ()
# Tuple with one element
single_tuple = (1,)
# Tuple with multiple elements
multi_tuple = (1, 2, 3)
What is the Function of items() in Python?
The items() method is used with dictionaries in Python. It returns a view
object that displays a list of a dictionary’s key-value tuple pairs. This
method is particularly useful when you need to iterate over the keys and
values of a dictionary simultaneously.
Example:
my_dict = {'a': 1, 'b': 2, 'c': 3}
for key, value in my_dict.items():
print(f"{key}: {value}")
# Output:
# a: 1
# b: 2
# c: 3
What are () Called in Programming?
In the broader context of programming, not just Python:
1. Parentheses: Generally called parentheses, they are used in many
programming languages to initiate function calls, define order in
operations, specify tuples, and in expressions to group operators and
operands.
2. Arguments: In the context of functions, anything inside parentheses is
typically an argument passed to the function.
What is the Function of items() in Python? (Repeated Question)
The items() method, as previously described, allows access to items in a
dictionary, providing a way to iterate over key-value pairs, which can be
critical for looping through dictionaries and accessing both keys and
values efficiently. Here’s another practical usage in a different context:
Example:
# Counting the frequency of items
counts = {'apple': 1, 'banana': 2}
for fruit, count in counts.items():
print(f"There are {count} {fruit}(s).")
# Output:
# There are 1 apple(s).
# There are 2 banana(s).
This method is especially useful in data handling where dictionaries play a
central role in storing and managing data.
Apply function to every row in a Pandas
DataFrame
Python is a great language for performing data analysis tasks. It provides
a huge amount of Classes and functions which help in analyzing and
manipulating data more easily. In this article, we will see how we can
apply a function to every row in a Pandas Dataframe.
Apply Function to Every Row in a Pandas DataFrame
There are various ways to perform element-wise operations across the
rows of a DataFrame. Here we discuss the following approaches:
 Applying User-Defined Function to Every Row of Pandas DataFrame
 Apply Lambda to Every Row of DataFrame
 Apply NumPy.sum() to Every Row
 Normalizing DataFrame Column Values Using Custom Function in
Pandas
 Applying Range Generation Function to DataFrame Rows in Pandas
One can use apply() function to apply a function to every row in a given
data frame. Let’s see the ways we can do this task.
Applying User-Defined Function to Every Row of Pandas
DataFrame
In this example, we defines a function add_values(row) that calculates the
sum of values in the ‘A’, ‘B’, and ‘C’ columns for each row. In
the main() function, a DataFrame is created from a dictionary, and the
function is applied to every row using the apply() method, resulting in a
new column ‘add’ containing the sum values. The original and modified
DataFrames are then printed.
 Python3

import pandas as pd

# Function to add
def add_values(row):
    return row['A'] + row['B'] + row['C']

def main():
    # Create a dictionary with three fields each
    data = {
        'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9]}

    # Convert the dictionary into DataFrame
    df = pd.DataFrame(data)
    print("Original DataFrame:\n", df)

    # Apply the user-defined function to every row
    df['add'] = df.apply(add_values, axis=1)
    print('\nAfter Applying Function: ')

    # Print the new DataFrame
    print(df)

if __name__ == '__main__':
    main()

Output
Original DataFrame:
A B C
0 1 4 7
1 2 5 8
2 3 6 9
After Applying Function:
A B C add
0 1 4 7 12
1 2 5 8 15
2 3 6 9 18

Apply Lambda to Every Row of DataFrame


In this example, we defines a function add(a, b, c) that returns the sum
of its three arguments. In the main() function, a DataFrame is created
from a dictionary, and a new column ‘add’ is added to the DataFrame
using the apply() method with a lambda function. The lambda function
applies the add function element-wise to the ‘A’, ‘B’, and ‘C’ columns for
every row, and the resulting DataFrame is printed before and after the
function is applied. The output demonstrates applying a user-defined
function to every row of the DataFrame.
 Python3

# Import pandas package
import pandas as pd

# Function to add
def add(a, b, c):
    return a + b + c

def main():
    # create a dictionary with three fields each
    data = {
        'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9]}

    # Convert the dictionary into DataFrame
    df = pd.DataFrame(data)
    print("Original DataFrame:\n", df)

    df['add'] = df.apply(lambda row: add(row['A'],
                                         row['B'], row['C']), axis=1)
    print('\nAfter Applying Function: ')

    # printing the new dataframe
    print(df)

if __name__ == '__main__':
    main()

Output
Original DataFrame:
A B C
0 1 4 7
1 2 5 8
2 3 6 9
After Applying Function:
A B C add
0 1 4 7 12
1 2 5 8 15
2 3 6 9 18
Apply NumPy.sum() to Every Row
You can also pass a NumPy function to apply(). In this example, we create
a DataFrame from a dictionary and then apply the NumPy sum function to
each row using the apply() method with axis=1, resulting in a new column
'add' containing the sum of values in each row. The original and modified
DataFrames are then printed to demonstrate the application of the
function.
 Python3

import pandas as pd
import numpy as np

def main():
    # create a dictionary with three fields each
    data = {
        'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9]}

    # Convert the dictionary into DataFrame
    df = pd.DataFrame(data)
    print("Original DataFrame:\n", df)

    # applying function to each row in the dataframe
    # and storing result in a new column
    df['add'] = df.apply(np.sum, axis=1)
    print('\nAfter Applying Function: ')

    # printing the new dataframe
    print(df)

if __name__ == '__main__':
    main()

Output
Original DataFrame:
A B C
0 1 4 7
1 2 5 8
2 3 6 9
After Applying Function:
A B C add
0 1 4 7 12
1 2 5 8 15
2 3 6 9 18

Normalizing DataFrame Column Values Using Custom Function in Pandas
Here, we defines a normalize function that takes two arguments and
calculates a normalized value based on their mean and range. In
the main() function, a DataFrame is created from a dictionary, and
the normalize function is applied to each row using the apply() method
with a lambda function. The resulting DataFrame contains the normalized
values in column ‘X’, and both the original and modified DataFrames are
printed.
 Python3
# Import pandas and numpy packages
import pandas as pd
import numpy as np

def normalize(x, y):
    x_new = ((x - np.mean([x, y])) /
             (max(x, y) - min(x, y)))
    return x_new

def main():
    # create a dictionary with two fields each
    data = {
        'X': [1, 2, 3],
        'Y': [45, 65, 89]}

    # Convert the dictionary into DataFrame
    df = pd.DataFrame(data)
    print("Original DataFrame:\n", df)

    df['X'] = df.apply(lambda row: normalize(row['X'],
                                             row['Y']), axis=1)
    print('\nNormalized:')
    print(df)

if __name__ == '__main__':
    main()

Output
Original DataFrame:
X Y
0 1 45
1 2 65
2 3 89
Normalized:
X Y
0 -0.5 45
1 -0.5 65
2 -0.5 89

Applying Range Generation Function to DataFrame Rows in Pandas
In this example, we are creating a generate_range function to create a
range based on the given integer input, and a replace function that
applies the generate_range function element-wise to each row of a
DataFrame. In the main() function, a DataFrame is created from a
dictionary, and the replace function is applied to each row using
the apply() method with a lambda function, resulting in a new DataFrame
with values replaced by corresponding ranges. The original and modified
DataFrames are then printed.
 Python3

import pandas as pd
import numpy as np

pd.options.mode.chained_assignment = None

# Function to generate range
def generate_range(n):
    # building the range string, e.g.:
    # input is 67, output is 60-70
    n = int(n)
    lower_limit = n // 10 * 10
    upper_limit = lower_limit + 10
    return str(lower_limit) + '-' + str(upper_limit)

def replace(row):
    for i, item in enumerate(row):
        # updating the value of the row
        row[i] = generate_range(item)
    return row

def main():
    # create a dictionary with three fields each
    data = {
        'A': [0, 2, 3],
        'B': [4, 15, 6],
        'C': [47, 8, 19]}

    # Convert the dictionary into DataFrame
    df = pd.DataFrame(data)
    print('Before applying function: ')
    print(df)

    # applying the function to each column in the
    # dataframe and storing the result back
    df = df.apply(lambda row: replace(row))
    print('After Applying Function: ')

    # printing the new dataframe
    print(df)

if __name__ == '__main__':
    main()

Output
Before applying function:
A B C
0 0 4 47
1 2 15 8
2 3 6 19
After Applying Function:
A B C
0 0-10 0-10 40-50
1 0-10 10-20 0-10
2 0-10 0-10 10-20
Python | Pandas Series.apply()
Pandas series is a One-dimensional ndarray with axis labels. The labels
need not be unique but must be a hashable type. The object supports
both integer- and label-based indexing and provides a host of methods for
performing operations involving the index.
Pandas Series.apply() function invoke the passed function on each
element of the given series object.
Syntax: Series.apply(func, convert_dtype=True, args=(), **kwds)
Parameter :
func : Python function or NumPy ufunc to apply.
convert_dtype : Try to find better dtype for elementwise function results.
args : Positional arguments passed to func after the series value.
**kwds : Additional keyword arguments passed to func.
Returns : Series
Example #1: Use Series.apply() function to change the city name to
‘Montreal’ if the city is ‘Rio’.

# importing pandas as pd
import pandas as pd

# Creating the Series
sr = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon', 'Rio'])

# Create the Index
index_ = ['City 1', 'City 2', 'City 3', 'City 4', 'City 5']

# set the index
sr.index = index_

# Print the series
print(sr)

Output :
City 1 New York
City 2 Chicago
City 3 Toronto
City 4 Lisbon
City 5 Rio
dtype: object
Now we will use Series.apply() function to change the city name to
‘Montreal’ if the city is ‘Rio’.

# change 'Rio' to 'Montreal'
# using a lambda function
result = sr.apply(lambda x: 'Montreal' if x == 'Rio' else x)

# Print the result
print(result)

Output :
City 1 New York
City 2 Chicago
City 3 Toronto
City 4 Lisbon
City 5 Montreal
dtype: object
As we can see in the output, the Series.apply() function has successfully
changed the name of the city to ‘Montreal’.

Example #2: Use Series.apply() function to return True if the value in the
given series object is greater than 30, else return False.
# importing pandas as pd
import pandas as pd

# Creating the Series
sr = pd.Series([11, 21, 8, 18, 65, 18, 32, 10, 5, 32, None])

# Create the Index with yearly frequency
index_ = pd.date_range('2010-10-09 08:45', periods=11, freq='Y')

# set the index
sr.index = index_

# Print the series
print(sr)

Output :
2010-12-31 08:45:00 11.0
2011-12-31 08:45:00 21.0
2012-12-31 08:45:00 8.0
2013-12-31 08:45:00 18.0
2014-12-31 08:45:00 65.0
2015-12-31 08:45:00 18.0
2016-12-31 08:45:00 32.0
2017-12-31 08:45:00 10.0
2018-12-31 08:45:00 5.0
2019-12-31 08:45:00 32.0
2020-12-31 08:45:00 NaN
Freq: A-DEC, dtype: float64
Now we will use Series.apply() function to return True if a value in the
given series object is greater than 30 else return False.

# return True if greater than 30
# else return False
result = sr.apply(lambda x: True if x > 30 else False)

# Print the result
print(result)

Output :
2010-12-31 08:45:00 False
2011-12-31 08:45:00 False
2012-12-31 08:45:00 False
2013-12-31 08:45:00 False
2014-12-31 08:45:00 True
2015-12-31 08:45:00 False
2016-12-31 08:45:00 True
2017-12-31 08:45:00 False
2018-12-31 08:45:00 False
2019-12-31 08:45:00 True
2020-12-31 08:45:00 False
Freq: A-DEC, dtype: bool
As we can see in the output, the Series.apply() function has successfully
returned a boolean series marking the values greater than 30.
Python | Pandas dataframe.aggregate()
Python is a great language for doing data analysis, primarily
because of the fantastic ecosystem of data-centric Python
packages. Pandas is one of those packages and makes importing
and analyzing data much easier.
Dataframe.aggregate() function is used to apply some
aggregation across one or more columns. Aggregate using
callable, string, dict, or list of string/callables. The most frequently
used aggregations are:
 sum: Return the sum of the values for the requested axis
 min: Return the minimum of the values for the requested axis
 max: Return the maximum of the values for the requested axis
Pandas dataframe.aggregate() Syntax in Python
Syntax: DataFrame.aggregate(func, axis=0, *args, **kwargs)
Parameters:
 func : callable, string, dictionary, or list of string/callables.
Function to use for aggregating the data. If a function, must
either work when passed a DataFrame or when passed to
DataFrame.apply. For a DataFrame, can pass a dict, if the keys
are DataFrame column names.
 axis : (default 0) {0 or ‘index’, 1 or ‘columns’} 0 or ‘index’:
apply function to each column. 1 or ‘columns’: apply function
to each row.
Returns: Aggregated DataFrame
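As a quick self-contained illustration of the signature above (the column names are made up for the example):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# one list of functions applied across all columns
out = df.aggregate(['sum', 'min'])
assert out.loc['sum', 'A'] == 6
assert out.loc['min', 'B'] == 4

# per-column functions via a dict (keys are column names)
out2 = df.aggregate({"A": ['sum'], "B": ['max']})
assert out2.loc['sum', 'A'] == 6
assert out2.loc['max', 'B'] == 6
```

Note that with the dict form, cells for function/column pairs that were not requested come back as NaN, as the nba.csv example below also shows.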
Python dataframe.aggregate() Example
Below, we walk step by step through aggregating the columns of a CSV
file in Python using Pandas.
Step 1: Importing Pandas and Reading CSV File
We aggregate with the 'sum' and 'min' functions across all the columns
in the data frame.
 Python3
# importing pandas package
import pandas as pd

# making data frame from csv file
df = pd.read_csv("nba.csv")

# printing the first 10 rows of the dataframe
df[:10]
Output :
Step 2: Aggregating Data Across All Columns
Aggregation works with only numeric type columns.
 Python3
# Applying aggregation across all the columns
df.aggregate(['sum', 'min'])
Output:
For each column with numeric values, the minimum and the sum of all values have been found. For the Pandas DataFrame df, we have four such columns: Number, Age, Weight, Salary.

Step 3: Aggregating Specific Columns


In Pandas, we can also apply different aggregation functions across different columns. For that, we pass a dictionary whose keys are the column names and whose values are lists of aggregation functions for that column.
 Python3
# importing pandas package
import pandas as pd

# making data frame from csv file
df = pd.read_csv("nba.csv")

# We are going to find aggregation for these columns
df.aggregate({"Number": ['sum', 'min'],
              "Age": ['max', 'min'],
              "Weight": ['min', 'sum'],
              "Salary": ['sum']})
Output:
A separate aggregation has been applied to each column; if a particular aggregation is not applied to a column, the corresponding cell contains NaN.
Pandas DataFrame mean() Method
Python is a great language for doing data analysis, primarily because of
the fantastic ecosystem of data-centric Python packages. Pandas is one
of those packages and makes importing and analyzing data much easier.
Pandas DataFrame mean()
Pandas dataframe.mean() function returns the mean of the values for the requested axis. If the method is applied to a Pandas Series object, it returns a scalar value: the mean of all the observations in the Series. If the method is applied to a Pandas DataFrame object, it returns a Pandas Series object containing the mean of the values over the specified axis.
Syntax: DataFrame.mean(axis=0, skipna=True, level=None,
numeric_only=False, **kwargs)
Parameters :
 axis : {index (0), columns (1)}
 skipna : Exclude NA/null values when computing the result
 level : If the axis is a MultiIndex (hierarchical), count along a particular
level, collapsing into a Series
 numeric_only : Include only float, int, boolean columns. If None, will
attempt to use everything, then use only numeric data. Not
implemented for Series.
Returns : mean : Series or DataFrame (if level specified)
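The scalar-versus-Series return behaviour described above can be checked directly; a minimal sketch using a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

col_means = df.mean()    # DataFrame -> Series, one mean per column
scalar = df["A"].mean()  # Series -> plain scalar

print(type(col_means).__name__, col_means.tolist())
print(scalar)
```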
Pandas DataFrame.mean() Examples
Example 1:
Use mean() function to find the mean of all the observations over the
index axis.
# importing pandas as pd
import pandas as pd

# Creating the dataframe
df = pd.DataFrame({"A": [12, 4, 5, 44, 1],
                   "B": [5, 2, 54, 3, 2],
                   "C": [20, 16, 7, 3, 8],
                   "D": [14, 3, 17, 2, 6]})

# Print the dataframe
df
Let’s use the Dataframe.mean() function to find the mean over the index
axis.
# Even if we do not specify axis = 0,
# the method will return the mean over
# the index axis by default
df.mean(axis = 0)
Output:

Example 2:
Use mean() function on a Dataframe that has None values. Also, find the
mean over the column axis.
# importing pandas as pd
import pandas as pd

# Creating the dataframe
df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [7, 2, 54, 3, None],
                   "C": [20, 16, 11, 3, 8],
                   "D": [14, 3, None, 2, 6]})

# skip the Na values while finding the mean
df.mean(axis = 1, skipna = True)
Output:
Python | Pandas Series.mean()
Pandas series is a One-dimensional ndarray with axis labels. The labels
need not be unique but must be a hashable type. The object supports
both integer- and label-based indexing and provides a host of methods for
performing operations involving the index.
Pandas Series.mean() function returns the mean of the underlying data in the given Series object.
Syntax: Series.mean(axis=None, skipna=None, level=None,
numeric_only=None, **kwargs)
Parameter :
axis : Axis for the function to be applied on.
skipna : Exclude NA/null values when computing the result.
level : If the axis is a MultiIndex (hierarchical), count along a particular
level, collapsing into a scalar.
numeric_only : Include only float, int, boolean columns.
**kwargs : Additional keyword arguments to be passed to the function.
Returns : mean : scalar or Series (if level specified)
Example #1: Use Series.mean() function to find the mean of the underlying
data in the given series object.

# importing pandas as pd
import pandas as pd

# Creating the Series
sr = pd.Series([10, 25, 3, 25, 24, 6])

# Create the Index
index_ = ['Coca Cola', 'Sprite', 'Coke', 'Fanta', 'Dew', 'ThumbsUp']

# set the index
sr.index = index_

# Print the series
print(sr)

Output :

Now we will use Series.mean() function to find the mean of the given series
object.

# return the mean
result = sr.mean()

# Print the result
print(result)

Output :

As we can see in the output, the Series.mean() function has successfully returned the mean of the given series object.

Example #2: Use Series.mean() function to find the mean of the underlying
data in the given series object. The given series object also contains
some missing values.

# importing pandas as pd
import pandas as pd

# Creating the Series
sr = pd.Series([19.5, 16.8, None, 22.78, 16.8, 20.124, None, 18.1002, 19.5])

# Print the series
print(sr)

Output :

Now we will use the Series.mean() function to find the mean of the given series object. We are going to skip all the missing values while calculating the mean.

# return the mean
# skip all the missing values
result = sr.mean(skipna = True)

# Print the result
print(result)

Output :
Python | Pandas dataframe.mad()
Pandas dataframe.mad() function returns the mean absolute deviation of the values for the requested axis. The mean absolute deviation of a dataset is the average distance between each data point and the mean; it gives us an idea of the variability in a dataset.
Syntax: DataFrame.mad(axis=None, skipna=None, level=None)
Parameters :
 axis : {index (0), columns (1)}
 skipna : Exclude NA/null values when computing the result
 level : If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
Returns : mad : Series or DataFrame (if level specified)
Note: This method was deprecated in pandas 1.5.0 and removed in pandas 2.0.0; the examples below require an older pandas release.
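Since mad() is gone from current pandas releases, an equivalent can be computed from mean() and abs(); a minimal sketch using a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, 44, 1],
                   "B": [5, 2, 54, 3, 2]})

# mean absolute deviation per column:
# average of |value - column mean|
mad = (df - df.mean()).abs().mean()
print(mad)
```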
Example #1: Use mad() function to find the mean absolute deviation of
the values over the index axis.
# importing pandas as pd
import pandas as pd

# Creating the dataframe
df = pd.DataFrame({"A": [12, 4, 5, 44, 1],
                   "B": [5, 2, 54, 3, 2],
                   "C": [20, 16, 7, 3, 8],
                   "D": [14, 3, 17, 2, 6]})

# Print the dataframe
df

Let’s use the dataframe.mad() function to find the mean absolute deviation.
# find the mean absolute deviation
# over the index axis
df.mad(axis = 0)
Output :
Example #2: Use mad() function to find the mean absolute deviation of
values over the column axis which is having some Na values in it.
# importing pandas as pd
import pandas as pd

# Creating the dataframe
df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [7, 2, 54, 3, None],
                   "C": [20, 16, 11, 3, 8],
                   "D": [14, 3, None, 2, 6]})

# To find the mean absolute deviation
# skip the Na values when finding the mad value
df.mad(axis = 1, skipna = True)
Output :
Python | Pandas Series.mad() to calculate
Mean Absolute Deviation of a Series
Pandas provides a method that makes the calculation of MAD (Mean Absolute Deviation) very easy. MAD is defined as the average distance between each value and the mean.
The formula used to calculate MAD is:
MAD = (1/n) * Σ |xᵢ − mean(x)|
Syntax: Series.mad(axis=None, skipna=None, level=None)
Parameters:
axis: 0 or ‘index’ for row-wise operation and 1 or ‘columns’ for column-wise operation.
skipna: If False, NaN values are included and the result will be NaN if even a single null value is present.
level: Defines the level name or number in case of a multilevel series.
Return Type: Float value
Example #1:
In this example, a Series is created from a Python List using
Pandas .Series() method. The .mad() method is called on series with all
default parameters.

# importing pandas module
import pandas as pd

# creating a list (avoid shadowing the built-in name `list`)
lst = [5, 12, 1, 0, 4, 22, 15, 3, 9]

# creating series
series = pd.Series(lst)

# calling .mad() method
result = series.mad()

# display
result

Output:
5.876543209876543
Explanation:
Mean of the series: mean = (5+12+1+0+4+22+15+3+9) / 9 = 7.8888...
MAD = ( |5−7.89| + |12−7.89| + |1−7.89| + |0−7.89| + |4−7.89| + |22−7.89| + |15−7.89| + |3−7.89| + |9−7.89| ) / 9
MAD = (2.89 + 4.11 + 6.89 + 7.89 + 3.89 + 14.11 + 7.11 + 4.89 + 1.11) / 9
MAD ≈ 5.8765 (more precisely, 5.876543209876543)
Python | Pandas dataframe.sem()
Pandas dataframe.sem() function returns the unbiased standard error of the mean over the requested axis. The standard error (SE) of a statistic (usually an estimate of a parameter) is the standard deviation of its sampling distribution, or an estimate of that standard deviation. When the statistic is the mean, it is called the standard error of the mean (SEM).
Syntax : DataFrame.sem(axis=None, skipna=None, level=None, ddof=1,
numeric_only=None, **kwargs)
Parameters :
axis : {index (0), columns (1)}
skipna : Exclude NA/null values. If an entire row/column is NA, the result
will be NA
level : If the axis is a MultiIndex (hierarchical), count along a particular
level, collapsing into a Series
ddof : Delta Degrees of Freedom. The divisor used in calculations is N –
ddof, where N represents the number of elements.
numeric_only : Include only float, int, boolean columns. If None, will
attempt to use everything, then use only numeric data. Not implemented
for Series
Return : sem : Series or DataFrame (if level specified)
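The definition above (sample standard deviation divided by the square root of N, with ddof defaulting to 1) can be verified on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})

sem = df.sem(axis=0)                      # pandas' standard error of the mean
manual = df.std(ddof=1) / len(df) ** 0.5  # std / sqrt(N), same definition

print(sem)
```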
For link to the CSV file used in the code, click here
Example #1: Use sem() function to find the standard error of the mean of
the given dataframe over the index axis.

# importing pandas as pd
import pandas as pd

# Creating the dataframe
df = pd.read_csv("nba.csv")

# Print the dataframe
df
Let’s use the dataframe.sem() function to find the standard error of the
mean over the index axis.

# find standard error of the mean of all the columns
df.sem(axis = 0)

Output :

Notice that all the non-numeric columns were automatically excluded from the calculation; we did not have to select the numeric columns ourselves when computing the standard error of the mean. (In recent pandas versions you may need to pass numeric_only=True to get this behaviour.)

Example #2: Use sem() function to find the standard error of the mean over the column axis. This time, do not skip the NaN values in the calculation.
# importing pandas as pd
import pandas as pd

# Creating the dataframe
df = pd.read_csv("nba.csv")

# Calculate the standard error of
# the mean of all the rows in dataframe
df.sem(axis = 1, skipna = False)

Output :

When we include the NaN values, any row or column containing a NaN produces NaN as its result.
Python | Pandas Series.value_counts()
Pandas Series.value_counts() function returns a Series containing counts of unique values. The resulting object is in descending order, so the first element is the most frequently occurring element. NA values are excluded by default.
Syntax: Series.value_counts(normalize=False, sort=True,
ascending=False, bins=None, dropna=True)
Parameter :
normalize : If True then the object returned will contain the relative
frequencies of the unique values.
sort : Sort by values.
ascending : Sort in ascending order.
bins : Rather than count values, group them into half-open bins, a
convenience for pd.cut, only works with numeric data.
dropna : Don’t include counts of NaN.
Returns : counts : Series
Example #1: Use Series.value_counts() function to find the unique value
counts of each element in the given Series object.

# importing pandas as pd
import pandas as pd

# Creating the Series
sr = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon', 'Rio',
                'Chicago', 'Lisbon'])

# Print the series
print(sr)
Output :

Now we will use the Series.value_counts() function to find the value counts of each unique value in the given Series object.

# find the value counts
sr.value_counts()

Output :

As we can see in the output, the Series.value_counts() function has returned the value counts of each unique value in the given Series object.
Example #2: Use Series.value_counts() function to find the unique value
counts of each element in the given Series object.

# importing pandas as pd
import pandas as pd

# Creating the Series
sr = pd.Series([100, 214, 325, 88, None, 325, None, 325, 100])

# Print the series
print(sr)
Output :

Now we will use the Series.value_counts() function to find the value counts of each unique value in the given Series object.

# find the value counts
sr.value_counts()

Output :

As we can see in the output, the Series.value_counts() function has returned the value counts of each unique value in the given Series object.
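The normalize and bins parameters described above can be illustrated on a small made-up Series:

```python
import pandas as pd

sr = pd.Series([1, 1, 2, 2, 2, 5])

# relative frequencies instead of raw counts
freq = sr.value_counts(normalize=True)
print(freq)

# group numeric values into 2 half-open bins instead of
# counting each exact value
binned = sr.value_counts(bins=2)
print(binned)
```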
Python | Pandas Index.value_counts()
Pandas Index.value_counts() function returns an object containing counts of unique values. The resulting object is in descending order, so the first element is the most frequently occurring element. NA values are excluded by default.

Syntax: Index.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
Parameters :
normalize : If True then the object returned will contain the relative
frequencies of the unique values.
sort : Sort by values
ascending : Sort in ascending order
bins : Rather than count values, group them into half-open bins, a
convenience for pd.cut, only works with numeric data
dropna : Don’t include counts of NaN.
Returns : counts : Series

Example #1: Use Index.value_counts() function to count the number of unique values in the given Index.

 Python3

# importing pandas as pd
import pandas as pd

# Creating the index
idx = pd.Index(['Harry', 'Mike', 'Arther', 'Nick',
                'Harry', 'Arther'], name ='Student')

# Print the Index
print(idx)

Output :
Index(['Harry', 'Mike', 'Arther', 'Nick', 'Harry', 'Arther'],
dtype='object', name='Student')
Let’s find the count of all unique values in the index.

 Python3

# find the count of unique values in the index
idx.value_counts()

Output :
Harry 2
Arther 2
Nick 1
Mike 1
Name: Student, dtype: int64
The function has returned the count of all unique values in the given
index. Notice the object returned by the function contains the occurrence
of the values in descending order.

Example #2: Use Index.value_counts() function to find the count of all unique values in the given index.
 Python3

# importing pandas as pd
import pandas as pd

# Creating the index
idx = pd.Index([21, 10, 30, 40, 50, 10, 50])

# Print the Index
print(idx)

Output :
Int64Index([21, 10, 30, 40, 50, 10, 50], dtype='int64')
Let’s count the occurrence of all the unique values in the Index.

 Python3

# for finding the count of all
# unique values in the index.
idx.value_counts()

Output :
10 2
50 2
30 1
21 1
40 1
dtype: int64
The function has returned the count of all unique values in the index.
