100+ Python Questions

The document contains 100 Python interview questions and answers tailored for data analysts, covering topics such as Python basics, NumPy, Pandas, data cleaning, visualization, SQL integration, and statistical analysis. It provides practical code snippets and explanations for each question, making it a comprehensive resource for preparing for data analyst interviews. Key areas include handling missing values, data manipulation, and creating visualizations.

Uploaded by

Chetan Priyanka

Python Interview Q&A for Data Analysts

(100 Questions)
These questions cover Pandas, NumPy, data cleaning, data visualization, SQL integration, and problem-solving, the areas
where data analysts are most often asked Python interview questions.

1. Python Basics for Data Analysis (Q1–Q15)

Q1. Why is Python popular in Data Analysis?


A: Easy to learn, rich libraries (Pandas, NumPy, Matplotlib), strong community support.

Q2. What are key Python libraries for Data Analysis?


A: Pandas, NumPy, Matplotlib, Seaborn, SciPy, Statsmodels, Scikit-learn.

Q3. What is Jupyter Notebook?


A: An interactive environment for writing, testing, and visualizing Python code.

Q4. What are Python data types important for analysis?


A: int, float, str, list, tuple, dict, set.

Q5. Difference between Python list and NumPy array?


A: List is flexible but slower; NumPy array is faster and optimized for numerical operations.
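
A minimal sketch of the difference, using made-up price values. It shows that arithmetic on a list needs a loop or comprehension, while the same operator on a NumPy array applies to every element:

```python
import numpy as np

prices_list = [10.0, 20.0, 30.0]   # illustrative values
prices_arr = np.array(prices_list)

# A list needs an explicit loop (or comprehension) for element-wise math.
doubled_list = [p * 2 for p in prices_list]

# A NumPy array applies the operation to every element at once.
doubled_arr = prices_arr * 2

# Multiplying a list by an integer repeats it instead of scaling it.
repeated_list = prices_list * 2
```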

Q6. What are Python’s mutable vs immutable objects?


A: Mutable → list, dict. Immutable → int, float, str, tuple.
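
A short illustration with hypothetical variable names: a list can be changed in place, a tuple raises TypeError on assignment, and string methods return a new string rather than altering the original.

```python
scores = [70, 85]
scores.append(90)            # lists can be changed in place

point = (1, 2)
try:
    point[0] = 5             # tuples do not support item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False

name = "ana"
upper_name = name.upper()    # returns a new string; name is unchanged
```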

Q7. How do you install Python libraries?


A: Using pip install package_name.

Q8. What is a virtual environment?


A: Isolated workspace to manage project-specific dependencies.

Q9. Difference between Python script and Jupyter Notebook?


A: Script is sequential code, Notebook supports interactive coding + visualization.

Q10. What is the use of Python’s with statement?


A: Resource management (e.g., auto-closing files).
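
A small sketch of the auto-closing behaviour. The file name with_demo.txt and its contents are assumptions made for the example; the file is written to the system temp directory and removed at the end.

```python
import os
import tempfile

# Hypothetical file path in the system temp directory, used only for illustration.
path = os.path.join(tempfile.gettempdir(), "with_demo.txt")

with open(path, "w") as f:
    f.write("region,sales\n")

# The file object is closed as soon as the with block ends.
closed_after_block = f.closed

with open(path) as f:
    header = f.readline().strip()

os.remove(path)
```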

Q11. What is Python’s id() function?


A: Returns an integer that uniquely identifies an object for its lifetime; in CPython this is the object's memory address.

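Q12. What is the difference between Python 2 and Python 3 in data analysis?


A: Python 3 strings are Unicode by default, and it is the only version supported by current releases of the major libraries; Python 2 reached end of life in January 2020.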

Q13. What is the difference between is and == in Python?


A: is → identity, == → value equality.
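
An illustrative comparison with two equal lists and an alias. The variable names are placeholders:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list with the same contents
c = a           # a second name for the same list

same_value = a == b     # contents are equal
same_object = a is b    # two distinct objects
alias = a is c          # both names refer to one object

# `is` is the usual way to test for None.
value = None
is_missing = value is None
```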

Q14. How do you handle missing values in Python?


A: Using Pandas dropna() or fillna().

Q15. What is type casting in Python?


A: Converting data types (e.g., int("10")).
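
A few common conversions, sketched with made-up values. Note that int() on a float drops the fraction rather than rounding, and non-numeric text raises ValueError:

```python
count = int("10")        # str -> int
price = float("19.99")   # str -> float
label = str(42)          # int -> str
truncated = int(3.9)     # float -> int drops the fraction (no rounding)

try:
    int("ten")           # non-numeric text cannot be converted
    failed = False
except ValueError:
    failed = True
```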
2. NumPy (Q16–Q25)

Q16. What is NumPy?


A: Numerical Python library for fast numerical computations.

Q17. How to create a NumPy array?

import numpy as np

arr = np.array([1, 2, 3])

Q18. Difference between list and NumPy array?


A: List is general-purpose; NumPy array supports vectorized operations.

Q19. How to create a 2D array in NumPy?

np.array([[1,2,3],[4,5,6]])

Q20. What is broadcasting in NumPy?


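A: The rules NumPy uses to perform element-wise operations on arrays of different but compatible shapes: dimensions are compared from the last one backward, and a dimension of size 1 (or a missing one) is stretched to match the other array.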
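
A minimal sketch of broadcasting on a small, made-up 2×3 array. A shape (3,) array is stretched across the rows, and a shape (2, 1) array is stretched across the columns:

```python
import numpy as np

# Illustrative data: 2 rows x 3 columns.
sales = np.array([[10, 20, 30],
                  [40, 50, 60]])

# Shape (3,) is stretched across both rows of the (2, 3) array.
col_adjust = np.array([1, 2, 3])
adjusted = sales + col_adjust

# Shape (2, 1) is stretched across the three columns.
row_scale = np.array([[1], [10]])
scaled = sales * row_scale
```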

Q21. How to calculate mean, median, std in NumPy?

np.mean(arr), np.median(arr), np.std(arr)

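Q22. What is the difference between np.zeros() and np.ones()?


A: Both create a new array of a given shape; np.zeros() fills it with 0 and np.ones() fills it with 1. The default dtype is float64 for both.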

Q23. How to generate random numbers in NumPy?

np.random.rand(3, 2)  # 3x2 array of uniform random values in [0, 1)

Q24. What is slicing in NumPy arrays?


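A: Accessing subsets: arr[1:4] selects the elements at positions 1 to 3 (the end index is excluded). A basic slice is a view of the original array, not a copy.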

Q25. How to find unique values in a NumPy array?

np.unique(arr)

3. Pandas (Q26–Q45)

Q26. What is Pandas?


A: A library for data manipulation and analysis.

Q27. Difference between Pandas Series and DataFrame?


A: Series = 1D, DataFrame = 2D table.

Q28. How to create a Pandas DataFrame?

import pandas as pd

df = pd.DataFrame({"Name":["A","B"],"Age":[20,25]})

Q29. How to read CSV file in Pandas?

pd.read_csv("file.csv")

Q30. How to check DataFrame shape?


df.shape

Q31. How to get column names in Pandas?

df.columns

Q32. How to filter rows in Pandas?

df[df["Age"] > 25]

Q33. How to handle missing values?

• df.dropna() → remove

• df.fillna(value) → replace
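
A small sketch of both options on a made-up DataFrame with one missing age. Filling with the column mean is just one possible choice of replacement value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C"], "Age": [20.0, np.nan, 30.0]})

# Remove every row that contains at least one missing value.
dropped = df.dropna()

# Replace missing ages with the mean of the non-missing ages.
filled = df.fillna({"Age": df["Age"].mean()})
```

Both calls return a new DataFrame; df itself still contains the missing value.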

Q34. How to group data in Pandas?

df.groupby("Department")["Salary"].mean()

Q35. How to merge two DataFrames?

pd.merge(df1, df2, on="ID")

Q36. Difference between loc and iloc?

• loc → label-based

• iloc → integer position-based
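
An illustrative DataFrame with string row labels makes the distinction visible. The labels "x", "y", "z" are assumptions for the example:

```python
import pandas as pd

df = pd.DataFrame({"Age": [20, 25, 30]}, index=["x", "y", "z"])

by_label = df.loc["y", "Age"]    # row labelled "y"
by_position = df.iloc[1, 0]      # second row, first column

# loc slices include the end label; iloc slices exclude the end position.
loc_slice = df.loc["x":"y"]
iloc_slice = df.iloc[0:1]
```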

Q37. How to sort DataFrame?

df.sort_values(by="Salary", ascending=False)

Q38. How to rename columns in Pandas?

df.rename(columns={"old":"new"})

Q39. How to apply custom function on DataFrame?

df["Col"].apply(lambda x: x*2)

Q40. How to check data types of columns?

df.dtypes

Q41. How to convert column to datetime in Pandas?

df["Date"] = pd.to_datetime(df["Date"])

Q42. How to pivot table in Pandas?

df.pivot_table(values="Sales", index="Region", columns="Year")

Q43. How to reset index in Pandas?

df.reset_index(drop=True)

Q44. How to check duplicate rows?

df.duplicated().sum()

Q45. How to remove duplicates in Pandas?

df.drop_duplicates()
4. Data Cleaning & Transformation (Q46–Q60)

Q46. How do you handle outliers?


A: Using IQR, Z-score, or capping.
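
One way to combine two of these techniques, sketched on made-up values: flag points outside the usual 1.5 × IQR fences, then cap them at the fences with clip(). The 1.5 multiplier is a common convention, not a fixed rule.

```python
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 100])   # illustrative values

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag values outside the fences.
is_outlier = (s < lower) | (s > upper)

# Capping: pull values outside the fences back to the nearest fence.
capped = s.clip(lower=lower, upper=upper)
```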

Q47. How to replace values in DataFrame?

df["Gender"] = df["Gender"].replace({"M":"Male","F":"Female"})

Q48. How to check null values in DataFrame?

df.isnull().sum()

Q49. How to combine text columns?

df["FullName"] = df["First"] + " " + df["Last"]

Q50. How to change column data type?

df["Age"] = df["Age"].astype(int)  # raises an error if the column contains NaN; fill or drop missing values first

Q51. How to extract year from datetime column?

df["Year"] = df["Date"].dt.year

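Q52. What is the difference between applymap, map, and apply?

• map → element-wise on a Series

• apply → a function applied to each row or column of a DataFrame (or each value of a Series)

• applymap → element-wise on an entire DataFrame (renamed to DataFrame.map in pandas 2.1)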
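
A compact comparison on a made-up 2×2 DataFrame. Because the element-wise DataFrame method has a different name depending on the pandas version, this sketch picks whichever one is available:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Series.map: element-wise on one column.
a_squared = df["a"].map(lambda x: x ** 2)

# DataFrame.apply: the function receives a whole column (axis=0) or row (axis=1).
col_totals = df.apply(lambda col: col.sum(), axis=0)
row_totals = df.apply(lambda row: row.sum(), axis=1)

# Element-wise over the whole DataFrame: use DataFrame.map where it exists,
# otherwise fall back to applymap on older pandas versions.
elementwise = df.map if hasattr(df, "map") else df.applymap
doubled = elementwise(lambda x: x * 2)
```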

Q53. How to normalize column values?

(df["col"] - df["col"].min()) / (df["col"].max()-df["col"].min())

Q54. How to concatenate DataFrames?

pd.concat([df1, df2])

Q55. How to one-hot encode categorical variables?

pd.get_dummies(df, columns=["Category"])

Q56. How to bin continuous values?

pd.cut(df["Age"], bins=[0,18,30,50,100])

Q57. How to change column order in Pandas?

df = df[["Col2","Col1","Col3"]]

Q58. How to melt a DataFrame?

pd.melt(df, id_vars=["ID"], value_vars=["Math","Science"])

Q59. How to check correlation in Pandas?

df.corr(numeric_only=True)  # pandas 2.0+ raises on non-numeric columns unless numeric_only=True

Q60. How to detect skewness?


df["col"].skew()

5. Visualization (Q61–Q70)

Q61. How to plot bar chart in Matplotlib?

import matplotlib.pyplot as plt

df["Col"].value_counts().plot(kind="bar")

plt.show()

Q62. How to plot histogram?

df["Col"].hist()

Q63. How to plot line chart?

df.plot(x="Date", y="Sales")

Q64. How to plot scatter plot?

df.plot.scatter(x="Age", y="Salary")

Q65. What is Seaborn?


A: Advanced data visualization library built on Matplotlib.

Q66. How to plot heatmap in Seaborn?

import seaborn as sns

sns.heatmap(df.corr(numeric_only=True), annot=True)

Q67. How to plot boxplot in Seaborn?

sns.boxplot(x="Category", y="Sales", data=df)

Q68. How to show multiple plots in one figure?

plt.subplot(2, 2, 1)  # 2x2 grid, first panel

plt.plot(df["Sales"])

Q69. How to set figure size in Matplotlib?

plt.figure(figsize=(10,5))

Q70. How to save plot as image?

plt.savefig("chart.png")  # call before plt.show(); the figure may be empty afterwards

6. SQL & Python (Q71–Q75)

Q71. How do you connect Python to SQL?


A: Using libraries like pyodbc, sqlalchemy, sqlite3.
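
A minimal sketch using sqlite3 from the standard library with an in-memory database, so it runs without a database server. The sales table and its rows are made up for illustration; pyodbc or SQLAlchemy connections are created differently but are passed to Pandas in the same way.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")   # in-memory database, nothing written to disk
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100.0), ("South", 250.0), ("North", 50.0)],
)
conn.commit()

query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
"""
totals = pd.read_sql(query, conn)
conn.close()
```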

Q72. How to read SQL data into Pandas?

pd.read_sql("SELECT * FROM table_name", conn)  # table_name is a placeholder; TABLE itself is a reserved SQL keyword

Q73. How to insert Pandas DataFrame into SQL?

df.to_sql("table", conn, if_exists="replace", index=False)


Q74. What is the difference between read_sql_query and read_sql_table?

• read_sql_query → runs a SQL query and returns the result

• read_sql_table → fetches a full table by name (requires a SQLAlchemy connection)

Q75. How to handle large datasets from SQL in Python?


A: Use chunks (chunksize), optimize queries, use indexes.
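
A sketch of chunked reading, assuming a made-up orders table in an in-memory SQLite database. With chunksize set, pd.read_sql returns an iterator of DataFrames, so only one chunk is held in memory at a time:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, float(i)) for i in range(1, 11)],   # 10 illustrative rows
)
conn.commit()

total = 0.0
n_chunks = 0
# Each iteration yields a DataFrame with at most 4 rows.
for chunk in pd.read_sql("SELECT amount FROM orders", conn, chunksize=4):
    total += chunk["amount"].sum()
    n_chunks += 1
conn.close()
```

Aggregating per chunk, as here, avoids ever building the full table in memory; pushing the SUM into the SQL query itself would be cheaper still.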

7. Statistics & Analysis (Q76–Q85)

Q76. How to calculate correlation in Python?

df.corr(numeric_only=True)

Q77. How to calculate standard deviation?

df["col"].std()

Q78. How to calculate variance?

df["col"].var()

Q79. How to calculate skewness and kurtosis?

df["col"].skew(), df["col"].kurt()

Q80. How to detect outliers using IQR?

Q1, Q3 = df["col"].quantile([0.25,0.75])

IQR = Q3 - Q1

outliers = df[(df["col"] < Q1-1.5*IQR) | (df["col"] > Q3+1.5*IQR)]

Q81. How to calculate moving average?

df["col"].rolling(3).mean()

Q82. How to calculate correlation heatmap?

sns.heatmap(df.corr(numeric_only=True), annot=True)

Q83. How to calculate percentile?

np.percentile(df["col"], 90)

Q84. How to calculate z-score?

from scipy.stats import zscore

df["zscore"] = zscore(df["col"])

Q85. How to calculate covariance?

df.cov(numeric_only=True)

8. Case Studies & Coding (Q86–Q100)

Q86. Find top 5 highest salaries from DataFrame.


df.nlargest(5,"Salary")

Q87. Find employees with salary above average.

df[df["Salary"] > df["Salary"].mean()]

Q88. Count number of employees per department.

df["Dept"].value_counts()

Q89. Find customers who purchased more than 5 times.

counts = df["Customer"].value_counts()

counts[counts > 5]

Q90. Find 2nd highest salary.

df["Salary"].drop_duplicates().nlargest(2).iloc[-1]  # drop_duplicates so a tie for the top salary is not returned as the 2nd highest

Q91. Calculate total sales by region.

df.groupby("Region")["Sales"].sum()

Q92. Show top 10 products by revenue.

df.groupby("Product")["Revenue"].sum().nlargest(10)

Q93. Find percentage contribution of each category.

(df.groupby("Category")["Sales"].sum() / df["Sales"].sum())*100

Q94. Find duplicate customers.

df[df.duplicated("CustomerID", keep=False)]  # keep=False returns every row of a repeated CustomerID, not only the later repeats

Q95. Get month with highest sales.

df.groupby(df["Date"].dt.month)["Sales"].sum().idxmax()

Q96. Find average order value (AOV).

df["Sales"].sum()/df["OrderID"].nunique()

Q97. Calculate churn rate.


A: (Lost Customers / Total Customers at Start) * 100.
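
The formula as a small helper, with hypothetical counts (200 customers at the start of the period, 50 lost). The function name and arguments are illustrative:

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return churn as a percentage of the starting customer count."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start * 100


rate = churn_rate(customers_at_start=200, customers_lost=50)
```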

Q98. Detect null percentage in each column.

df.isnull().mean()*100

Q99. Calculate profit margin % per order.

df["ProfitMargin"] = (df["Profit"]/df["Sales"])*100

Q100. Build customer segmentation using Pandas.

df.groupby("Segment")["Sales"].agg(["mean","sum","count"])
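
The line above assumes a Segment column already exists. If it does not, one possible approach is to derive segments from total spend per customer with pd.cut. The data, the thresholds (100 and 500), and the labels below are all assumptions chosen for illustration:

```python
import pandas as pd

orders = pd.DataFrame({
    "Customer": ["A", "A", "B", "C", "C", "C"],
    "Sales": [100, 150, 40, 500, 300, 400],
})

# Total spend per customer.
spend = orders.groupby("Customer")["Sales"].sum()

# Hypothetical spend bands: (0, 100], (100, 500], (500, inf).
segments = pd.cut(
    spend,
    bins=[0, 100, 500, float("inf")],
    labels=["Low", "Mid", "High"],
)

# Summarize spend within each segment.
summary = spend.groupby(segments, observed=True).agg(["mean", "sum", "count"])
```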
