Day64 - Pandas Interview Questions

Pandas is a popular Python library for data manipulation and analysis. It provides two primary data structures: Series for one-dimensional labeled data and DataFrame for two-dimensional labeled data similar to a spreadsheet. DataFrame allows for flexible and powerful data analysis through its tabular structure with labeled rows and columns, ability to handle heterogeneous data types, and built-in functions for cleaning, transforming, exploring and visualizing data. Loc and iloc are two methods for indexing and selecting data from a DataFrame based on labels and integer positions respectively. Loc uses actual labels while iloc uses integer indices.

Uploaded by

tikar69314


11/24/23, 11:51 AM Day64 - Pandas Interview Questions

Pandas Interview Questions

1. What is Pandas, and why is it popular in data
analysis?
Pandas is a popular open-source data manipulation and analysis library for the Python
programming language. It provides data structures for efficiently storing and manipulating
large datasets and tools for reading and writing data in various formats. The two primary
data structures in Pandas are:

1. Series: A one-dimensional labeled array capable of holding any data type.

2. DataFrame: A two-dimensional labeled data structure with columns that can be of
different types.
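As a minimal sketch of the two structures just described (the names `ages` and `people` and the sample values are made up for illustration):

```python
import pandas as pd

# A Series: one-dimensional values, each with a label in the index
ages = pd.Series([28, 24, 22], index=["John", "Jane", "Bob"], name="Age")

# A DataFrame: several columns of different types sharing one row index
people = pd.DataFrame(
    {"Age": [28, 24, 22], "City": ["New York", "San Francisco", "Los Angeles"]},
    index=["John", "Jane", "Bob"],
)

print(ages["Jane"])         # look up a value by its label
print(people.dtypes)        # each column keeps its own dtype
print(type(people["Age"]))  # a single DataFrame column is itself a Series
```

Selecting one column of a DataFrame returns a Series, which is why the two structures are usually learned together.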

Pandas is widely used in the field of data analysis for several reasons:

1. Ease of Use: Pandas provides a simple and intuitive syntax for data manipulation. Its
data structures are designed to be easy to use and interact with.

2. Data Cleaning and Transformation: Pandas makes it easy to clean and transform data.
It provides functions for handling missing data, reshaping data, merging and joining
datasets, and performing various data transformations.

3. Data Exploration: Pandas allows data analysts to explore and understand their datasets
quickly. Descriptive statistics, data summarization, and various methods for slicing and
dicing data are readily available.

4. Data Input/Output: Pandas supports reading and writing data in various formats,
including CSV, Excel, SQL databases, and more. This makes it easy to work with data
from different sources.

5. Integration with Other Libraries: Pandas integrates well with other popular data
science and machine learning libraries in Python, such as NumPy, Matplotlib, and Scikit-
learn. This allows for a seamless workflow when performing more complex analyses.

6. Time Series Analysis: Pandas provides excellent support for time series data, including
tools for date range generation, frequency conversion, and resampling.

7. Community and Documentation: Pandas has a large and active community, which
means there is extensive documentation and a wealth of online resources, tutorials, and
forums available for users to seek help and guidance.

8. Open Source: Being an open-source project, Pandas allows users to contribute to its
development and improvement. This collaborative nature has helped Pandas evolve and
stay relevant in the rapidly changing landscape of data analysis and data science.

In summary, Pandas is popular in data analysis because it simplifies the process of working
with structured data, provides powerful tools for data manipulation, and has become a
standard tool in the Python ecosystem for data analysis tasks.
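A few of the points above (input/output, data exploration, and time series support) can be sketched together. The CSV text, column names, and values below are hypothetical; in practice `read_csv` would usually be given a file path.

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory instead of a file on disk
csv_text = """date,store,sales
2023-01-01,A,100
2023-01-02,A,120
2023-01-08,A,90
2023-01-09,A,110
"""

# Data input: parse the CSV and convert the 'date' column to datetimes
sales = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Data exploration: summary statistics for the numeric column
print(sales["sales"].describe())

# Time series: resample the daily rows into weekly totals
weekly = sales.set_index("date")["sales"].resample("W").sum()
print(weekly)
```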

2. What is DataFrame in Pandas?

In Pandas, a DataFrame is a two-dimensional, tabular data structure with labeled axes (rows
and columns). It is similar to a spreadsheet or SQL table, where data can be stored in rows
and columns. The key features of a DataFrame include:

1. Tabular Structure: A DataFrame is a two-dimensional table with rows and columns.
Each column can have a different data type, such as integer, float, string, or even
custom types.

2. Labeled Axes: Both rows and columns of a DataFrame are labeled. Each row has a
label in the index and each column has a name, allowing for easy access and
manipulation of data. Labels are usually unique, although Pandas does not require it.

3. Flexible Size: DataFrames can grow and shrink in size. You can add or remove rows and
columns as needed.

4. Heterogeneous Data Types: Different columns in a DataFrame can have different data
types. For example, one column might contain integers, while another column contains
strings.

5. Data Alignment: When performing operations on DataFrames, Pandas automatically
aligns the data based on labels, making it easy to work with data even if it is not
perfectly clean or aligned.

6. Missing Data Handling: DataFrames can handle missing data gracefully. Pandas
provides methods for detecting, removing, or filling missing values.

7. Powerful Operations: DataFrames support a wide range of operations, including
arithmetic operations, aggregation, filtering, merging, and reshaping. This makes it a
powerful tool for data analysis and manipulation.

In [ ]: import pandas as pd

# Creating a DataFrame from a dictionary
data = {'Name': ['John', 'Jane', 'Bob'],
        'Age': [28, 24, 22],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)

# Displaying the DataFrame
print(df)

   Name  Age           City
0  John   28       New York
1  Jane   24  San Francisco
2   Bob   22    Los Angeles

In this example, each column represents a different attribute (Name, Age, City), and each
row represents a different individual. The DataFrame provides a convenient way to work with
this tabular data in a structured and labeled format.
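Data alignment, listed among the key features above, is easiest to see on two small Series whose labels only partly overlap. This is a sketch with made-up names and values:

```python
import pandas as pd

q1 = pd.Series({"John": 10, "Jane": 20, "Bob": 30})
q2 = pd.Series({"Bob": 1, "John": 2, "Alice": 3})

# Values are matched by label, not by position
total = q1 + q2
print(total)

# Labels present in only one Series give NaN; fill_value treats them as 0 instead
total_filled = q1.add(q2, fill_value=0)
print(total_filled)
```

Here "Bob" and "John" are added correctly even though they appear in a different order in each Series, while "Jane" and "Alice" become missing values unless `fill_value` is supplied.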

3. What is the difference between loc and iloc in
pandas?

In Pandas, loc and iloc are two indexers used for selecting data from a DataFrame.
They are used with square brackets rather than called like ordinary methods, and they
provide label-based and integer-location-based indexing, respectively. Here's the key
difference between loc and iloc:

1. loc (Label-based Indexing):

The loc indexer is used for selection by label.
It allows you to access a group of rows and columns by labels or a boolean array.
The syntax is df.loc[row_label, column_label] or df.loc[row_label] for
selecting entire rows.
The labels used with loc are the actual labels of the index or column names, not
the integer position.
Inclusive slicing is supported with loc, meaning both the start and stop labels are
included in the selection.

In [ ]: import pandas as pd

# Assuming 'df' is our DataFrame

selected_data = df.loc[2:4, 'Name':'City']
selected_data

Out[ ]: Name Age City

2 Bob 22 Los Angeles

2. iloc (Integer-location based Indexing):

The iloc indexer is used for selection by position.
It allows you to access a group of rows and columns by integer positions.
The syntax is df.iloc[row_index, column_index] or df.iloc[row_index]
for selecting entire rows.
The indices used with iloc are integer-based, meaning you specify the position
of the rows and columns based on their numerical order (0-based indexing).
Exclusive slicing is used with iloc, meaning the stop index is not included in the
selection.

In [ ]: import pandas as pd

# Assuming 'df' is our DataFrame

selected_data = df.iloc[2:5, 0:3]
selected_data

Out[ ]: Name Age City

2 Bob 22 Los Angeles

In summary, if you want to select data based on the labels of rows and columns, you use
loc. If you prefer to select data based on the integer positions of rows and columns, you
use iloc. The choice between them depends on whether you are working with labeled or
integer-based indexing.
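The difference is easier to see when the row labels are not integers. The sketch below uses a hypothetical DataFrame with string labels "a", "b", "c":

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["John", "Jane", "Bob"], "Age": [28, 24, 22]},
    index=["a", "b", "c"],
)

# loc slices by label and includes both endpoints: rows "a" and "b"
by_label = df.loc["a":"b", "Name":"Age"]

# iloc slices by position and excludes the stop position: row 0 only
by_position = df.iloc[0:1, 0:2]

print(by_label)
print(by_position)

# loc also accepts a boolean array to pick rows
print(df.loc[df["Age"] < 25, "Name"])
```

With the default integer index (0, 1, 2, ...) labels and positions coincide, which is why the two indexers can look interchangeable until a slice is involved.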

4. How do you filter rows in a DataFrame based on
a condition?
To filter rows in a DataFrame based on a condition, you can use boolean indexing. Boolean
indexing involves creating a boolean Series that represents the condition you want to apply
and then using that boolean Series to filter the rows of the DataFrame. Here's a step-by-step
guide:

Assuming you have a DataFrame named df, and you want to filter rows based on a
condition, let's say a condition on the 'Age' column:

In [ ]: import pandas as pd

# Build a sample DataFrame named 'df'
data = {'Name': ['John', 'Jane', 'Bob', 'Alice'],
        'Age': [28, 24, 22, 30],
        'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)

# Condition for filtering (e.g., selecting rows where Age is greater than 25)
condition = df['Age'] > 25

# Applying the condition to filter rows

filtered_df = df[condition]

# Displaying the filtered DataFrame

print(filtered_df)

Name Age City

0 John 28 New York
3 Alice 30 Chicago
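Boolean indexing extends naturally to several conditions. The following sketch reuses the same sample data; the variable names are illustrative only:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John", "Jane", "Bob", "Alice"],
        "Age": [28, 24, 22, 30],
        "City": ["New York", "San Francisco", "Los Angeles", "Chicago"],
    }
)

# Combine conditions with & (and), | (or), ~ (not).
# Each condition needs its own parentheses.
older_not_ny = df[(df["Age"] > 25) & (df["City"] != "New York")]

# isin() tests membership in a list of values
west_coast = df[df["City"].isin(["San Francisco", "Los Angeles"])]

# query() expresses a filter as a string
young = df.query("Age < 25")

print(older_not_ny)
print(west_coast)
print(young)
```

Python's `and` / `or` keywords do not work element-wise on a Series, so the `&` and `|` operators are used instead.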

5. How do you handle missing values in data with

the help of pandas?
Handling missing values is a crucial step in the data cleaning process. Pandas provides
several methods for working with missing data in a DataFrame. Here are some common
techniques:

1. Detecting Missing Values:

The isnull() method can be used to detect missing values in a DataFrame. It
returns a DataFrame of the same shape, where each element is a boolean
indicating whether the corresponding element in the original DataFrame is missing.
The notnull() method is the opposite of isnull() and returns True for
non-missing values.

In [ ]: import pandas as pd

# Assuming 'df' is your DataFrame

missing_values = df.isnull()
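Because the boolean result is hard to read on a large DataFrame, it is commonly summarized. A small sketch, assuming a made-up DataFrame with a few gaps:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John", "Jane", None],
        "Age": [28, np.nan, 22],
        "City": ["New York", None, None],
    }
)

# Number of missing values in each column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# True for every row that has at least one missing value
rows_with_missing = df.isnull().any(axis=1)
print(rows_with_missing)
```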


2. Dropping Missing Values:

The dropna() method can be used to remove rows or columns containing
missing values.
The thresh parameter sets the minimum number of non-null values a row or
column must have in order to be kept.

In [ ]: # Drop rows with any missing values

df_no_missing_rows = df.dropna()

# Drop columns with any missing values

df_no_missing_cols = df.dropna(axis=1)

# Keep only rows with at least 3 non-null values (drop rows with fewer)

df_thresh = df.dropna(thresh=3)
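To make the effect of these options concrete, here is a sketch on a hypothetical four-row DataFrame. It also shows the `subset` and `how` parameters of `dropna()`, which the text above does not cover:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John", "Jane", "Bob", None],
        "Age": [28, np.nan, 22, np.nan],
        "City": ["New York", "San Francisco", None, None],
    }
)

# Keep rows that have at least 2 non-null values (row 3 has none)
kept_thresh = df.dropna(thresh=2)

# Only look at the 'Age' column when deciding which rows to drop
kept_subset = df.dropna(subset=["Age"])

# Drop a row only if every value in it is missing
kept_all = df.dropna(how="all")

print(kept_thresh)
print(kept_subset)
print(kept_all)
```

None of these calls modifies `df`; each returns a new DataFrame.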

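3. Filling Missing Values:

The fillna() method can be used to fill missing values with a specified constant
or with per-column values. Forward fill and backward fill are available through the
ffill() and bfill() methods; passing method='ffill' or method='bfill' to fillna()
is deprecated in recent Pandas releases.
Commonly, mean or median values are used to fill missing values in numerical
columns.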

In [ ]: # Fill missing values with a constant
df_fill_constant = df.fillna(0)

# Fill missing values in numeric columns with the column mean
# (numeric_only=True skips text columns such as 'Name')
df_fill_mean = df.fillna(df.mean(numeric_only=True))

# Forward fill missing values (use the previous value)
# ffill() replaces fillna(method='ffill'), which is deprecated in pandas 2.1+
df_ffill = df.ffill()

# Backward fill missing values (use the next value)
df_bfill = df.bfill()
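Filling with a statistic is often done column by column. The sketch below assumes a made-up DataFrame with two numeric columns and passes a dictionary to `fillna()` so each column gets its own fill value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John", "Jane", "Bob"],
        "Age": [28.0, np.nan, 22.0],
        "Score": [np.nan, 80.0, 90.0],
    }
)

# A dict maps each column to its own fill value
filled = df.fillna({"Age": df["Age"].median(), "Score": df["Score"].mean()})
print(filled)

# Forward fill cannot fill a gap in the first row: there is no previous value
ffilled = df.ffill()
print(ffilled)
```

The leading NaN in "Score" survives the forward fill, so a forward fill is sometimes followed by a backward fill or a constant fill.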
