Day64 - Pandas Interview Questions
Pandas is widely used in the field of data analysis for several reasons:
1. Ease of Use: Pandas provides a simple and intuitive syntax for data manipulation. Its
data structures are designed to be easy to use and interact with.
2. Data Cleaning and Transformation: Pandas makes it easy to clean and transform data.
It provides functions for handling missing data, reshaping data, merging and joining
datasets, and performing various data transformations.
3. Data Exploration: Pandas allows data analysts to explore and understand their datasets
quickly. Descriptive statistics, data summarization, and various methods for slicing and
dicing data are readily available.
4. Data Input/Output: Pandas supports reading and writing data in various formats,
including CSV, Excel, SQL databases, and more. This makes it easy to work with data
from different sources.
5. Integration with Other Libraries: Pandas integrates well with other popular data
science and machine learning libraries in Python, such as NumPy, Matplotlib, and
scikit-learn. This allows for a seamless workflow when performing more complex analyses.
6. Time Series Analysis: Pandas provides excellent support for time series data, including
tools for date range generation, frequency conversion, and resampling.
7. Community and Documentation: Pandas has a large and active community, which
means there is extensive documentation and a wealth of online resources, tutorials, and
forums available for users to seek help and guidance.
8. Open Source: Being an open-source project, Pandas allows users to contribute to its
development and improvement. This collaborative nature has helped Pandas evolve and
stay relevant in the rapidly changing landscape of data analysis and data science.
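A few of the points above (data cleaning, data exploration, and time series support) can be sketched in a handful of lines. This is only an illustration: the `sales` DataFrame, its `units` column, and the values in it are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales data with one missing value (illustrative only)
dates = pd.date_range("2023-01-01", periods=6, freq="D")
sales = pd.DataFrame({"units": [10, np.nan, 14, 8, 12, 16]}, index=dates)

# Data cleaning: fill the missing value with the column mean
cleaned = sales.fillna(sales["units"].mean())

# Data exploration: count, mean, std, min, quartiles, and max in one call
summary = cleaned["units"].describe()

# Time series: resample the daily values into 2-day totals
two_day_totals = cleaned["units"].resample("2D").sum()

print(summary)
print(two_day_totals)
```

Filling with the mean is just one possible choice here; dropping the row with `dropna()` or interpolating may suit other datasets better.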
In summary, Pandas is popular in data analysis because it simplifies the process of working
with structured data, provides powerful tools for data manipulation, and has become a
2. Labeled Axes: Both rows and columns of a DataFrame are labeled. This means that
each row and each column has a unique label or index associated with it, allowing for
easy access and manipulation of data.
3. Flexible Size: DataFrames can grow and shrink in size. You can add or remove rows and
columns as needed.
4. Heterogeneous Data Types: Different columns in a DataFrame can have different data
types. For example, one column might contain integers, while another column contains
strings.
6. Missing Data Handling: DataFrames can handle missing data gracefully. Pandas
provides methods for detecting, removing, or filling missing values.
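In [ ]: import pandas as pd
# Illustrative data (hypothetical values); one key per column described below
data = {'Name': ['Asha', 'Ben'], 'Age': [28, 22], 'City': ['Pune', 'Leeds']}
df = pd.DataFrame(data)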
In this example, each column represents a different attribute (Name, Age, City), and each
row represents a different individual. The DataFrame provides a convenient way to work with
this tabular data in a structured and labeled format.
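The DataFrame characteristics listed earlier (labeled axes, flexible size, mixed column types, and missing data handling) can be checked on a small example. The sketch below assumes made-up records and hypothetical row labels `r1` to `r3`; only the column names Name, Age, and City come from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical records (illustrative values only)
people = pd.DataFrame(
    {
        "Name": ["Asha", "Ben", "Chen"],
        "Age": [28, np.nan, 35],
        "City": ["Pune", "Leeds", "Taipei"],
    },
    index=["r1", "r2", "r3"],
)

# Labeled axes: rows and columns both carry labels
row_labels = list(people.index)
column_labels = list(people.columns)

# Heterogeneous data types: each column keeps its own dtype
print(people.dtypes)

# Flexible size: add a column, then drop a row (drop returns a new DataFrame)
people["Country"] = ["India", "UK", "Taiwan"]
smaller = people.drop(index="r3")

# Missing data handling: detect the gap, then fill it with the column mean
missing_mask = people["Age"].isna()
filled_age = people["Age"].fillna(people["Age"].mean())
```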
11/24/23, 11:51 AM Day64 - Pandas Interview Questions
In [ ]: import pandas as pd
In [ ]: import pandas as pd
In summary, if you want to select data based on the labels of rows and columns, you use
loc . If you prefer to select data based on the integer positions of rows and columns, you
use iloc . The choice between them depends on whether you are working with labeled or
integer-based indexing.
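As a minimal sketch of the difference, assume a small made-up DataFrame whose rows are labeled `a`, `b`, and `c` (the data and labels are hypothetical). Note that label slices with loc include the end label, while position slices with iloc exclude the end position.

```python
import pandas as pd

# Hypothetical data with string row labels (illustrative only)
df = pd.DataFrame(
    {"Name": ["Asha", "Ben", "Chen"], "Age": [28, 22, 35]},
    index=["a", "b", "c"],
)

# loc: select by label (row labeled "b", column labeled "Age")
by_label = df.loc["b", "Age"]

# iloc: select by integer position (second row, second column)
by_position = df.iloc[1, 1]

# Slicing: loc includes the end label, iloc excludes the end position
label_slice = df.loc["a":"b"]     # rows "a" and "b"
position_slice = df.iloc[0:2]     # rows at positions 0 and 1

print(by_label, by_position)
print(label_slice.equals(position_slice))
```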
Assuming you have a DataFrame named df, and you want to filter rows based on a
condition, let's say a condition on the 'Age' column:
In [ ]: import pandas as pd
# Condition for filtering (e.g., selecting rows where Age is greater than 25)
condition = df['Age'] > 25
# Apply the boolean condition to keep only the matching rows
filtered_df = df[condition]
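A self-contained sketch of this filtering pattern, using a small made-up DataFrame (the names, ages, and cities are hypothetical). It also shows one way to combine two conditions: each condition goes in parentheses and they are joined with `&` (and) or `|` (or).

```python
import pandas as pd

# Hypothetical data (illustrative only)
df = pd.DataFrame(
    {
        "Name": ["Asha", "Ben", "Chen"],
        "Age": [28, 22, 35],
        "City": ["Pune", "Leeds", "Taipei"],
    }
)

# A boolean Series: True where Age is greater than 25
condition = df["Age"] > 25

# Indexing with the boolean Series keeps only the True rows
filtered_df = df[condition]

# Combining conditions: wrap each one in parentheses and join with &
older_in_pune = df[(df["Age"] > 25) & (df["City"] == "Pune")]

print(filtered_df)
print(older_in_pune)
```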
In [ ]: import pandas as pd