Python Interview Q&A for Data Analysts
( 100 Questions )
Covers Pandas, NumPy, data cleaning, data visualization, SQL integration, and problem-solving: the areas
where data analysts are most often asked Python interview questions.
1. Python Basics for Data Analysis (Q1–Q15)
Q1. Why is Python popular in Data Analysis?
A: Easy to learn, rich libraries (Pandas, NumPy, Matplotlib), strong community support.
Q2. What are key Python libraries for Data Analysis?
A: Pandas, NumPy, Matplotlib, Seaborn, SciPy, Statsmodels, Scikit-learn.
Q3. What is Jupyter Notebook?
A: An interactive environment for writing, testing, and visualizing Python code.
Q4. What are Python data types important for analysis?
A: int, float, str, list, tuple, dict, set.
Q5. Difference between Python list and NumPy array?
A: A list is a flexible container that can mix types but is slow for math; a NumPy array holds elements of a single type and is optimized for fast, vectorized numerical operations.
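A small sketch of the difference in practice, using made-up prices (the variable names are illustrative): multiplying a list repeats it, while multiplying an array scales every element.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]          # hypothetical sample data

# With a list, element-wise math needs an explicit loop or comprehension.
scaled_list = [p * 1.5 for p in prices]

# `*` on a list means repetition, not arithmetic.
repeated = prices * 2

# With a NumPy array, the same arithmetic is vectorized.
arr = np.array(prices)
scaled_arr = arr * 1.5

print(scaled_list)
print(repeated)
print(scaled_arr)
```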
Q6. What are Python’s mutable vs immutable objects?
A: Mutable → list, dict, set. Immutable → int, float, str, tuple.
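To make the distinction concrete, here is a short sketch with illustrative names: a change made through one name of a mutable object is visible through every other name, while "changing" an immutable object always builds a new one.

```python
# Mutable: both names refer to the same list, so the append shows up in both.
scores = [70, 85]
alias = scores
alias.append(90)

# Immutable: operations on a tuple or string return a new object.
point = (1, 2)
moved = point + (3,)        # new tuple; point is unchanged

name = "ana"
upper_name = name.upper()   # new string; name is unchanged

print(scores)               # [70, 85, 90]
print(point, moved)         # (1, 2) (1, 2, 3)
print(name, upper_name)     # ana ANA
```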
Q7. How do you install Python libraries?
A: Using pip install package_name.
Q8. What is a virtual environment?
A: Isolated workspace to manage project-specific dependencies.
Q9. Difference between Python script and Jupyter Notebook?
A: Script is sequential code, Notebook supports interactive coding + visualization.
Q10. What is the use of Python’s with statement?
A: Resource management (e.g., auto-closing files).
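A minimal sketch of the idea, assuming a throwaway file in a temporary directory (the file name and contents are made up): each with block closes the file when the block ends, even if an exception is raised inside it.

```python
import os
import tempfile

# Hypothetical temporary file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")

# The file is closed automatically when the block exits.
with open(path, "w", encoding="utf-8") as f:
    f.write("region,sales\nNorth,100\n")

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(f.closed)   # True: no explicit f.close() was needed
print(lines)      # ['region,sales', 'North,100']
```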
Q11. What is Python’s id() function?
A: Returns an integer that uniquely identifies an object for its lifetime (in CPython, this is the object's memory address).
Q12. What is difference between Python 2 and Python 3 in data analysis?
A: Python 3 strings are Unicode by default, and current versions of Pandas and NumPy support only Python 3; Python 2 reached end of life in 2020.
Q13. What is the difference between is and == in Python?
A: is → identity, == → value equality.
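A short illustration with made-up lists: two objects can be equal without being the same object, and is None is the usual identity check for missing values in plain Python.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # same contents, but a separate object
c = a           # a second name for the same object

same_value = (a == b)
same_object_ab = (a is b)
same_object_ac = (a is c)

value = None
is_missing = value is None   # identity check against the None singleton

print(same_value, same_object_ab, same_object_ac, is_missing)
# True False True True
```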
Q14. How do you handle missing values in Python?
A: Using Pandas dropna() or fillna().
Q15. What is type casting in Python?
A: Converting data types (e.g., int("10")).
2. NumPy (Q16–Q25)
Q16. What is NumPy?
A: Numerical Python library for fast numerical computations.
Q17. How to create a NumPy array?
import numpy as np
arr = np.array([1, 2, 3])
Q18. Difference between list and NumPy array?
A: List is general-purpose; NumPy array supports vectorized operations.
Q19. How to create a 2D array in NumPy?
np.array([[1,2,3],[4,5,6]])
Q20. What is broadcasting in NumPy?
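A: NumPy's rule for operating on arrays of different but compatible shapes: dimensions of size 1 (or missing leading dimensions) are stretched to match the other array without copying data.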
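As a sketch of the rule, with illustrative arrays: a shape (3,) array is applied to every row of a (2, 3) array, and a shape (2, 1) array is applied to every column.

```python
import numpy as np

sales = np.array([[10, 20, 30],
                  [40, 50, 60]])        # shape (2, 3)

col_weights = np.array([1, 10, 100])    # shape (3,): stretched across rows
row_offsets = np.array([[1], [2]])      # shape (2, 1): stretched across columns

weighted = sales * col_weights
shifted = sales + row_offsets

print(weighted)
print(shifted)
```

Shapes that cannot be aligned this way, such as (2, 3) with (2,), raise a ValueError.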
Q21. How to calculate mean, median, std in NumPy?
np.mean(arr), np.median(arr), np.std(arr)  # np.std defaults to the population std (ddof=0)
Q22. What is difference between np.zeros() and np.ones()?
A: np.zeros(shape) creates an array filled with 0s; np.ones(shape) creates one filled with 1s. Both default to float64.
Q23. How to generate random numbers in NumPy?
np.random.rand(3,2)
Q24. What is slicing in NumPy arrays?
A: Accessing subsets: arr[1:4].
Q25. How to find unique values in a NumPy array?
np.unique(arr)
3. Pandas (Q26–Q45)
Q26. What is Pandas?
A: A library for data manipulation and analysis.
Q27. Difference between Pandas Series and DataFrame?
A: Series = 1D, DataFrame = 2D table.
Q28. How to create a Pandas DataFrame?
import pandas as pd
df = pd.DataFrame({"Name":["A","B"],"Age":[20,25]})
Q29. How to read CSV file in Pandas?
pd.read_csv("file.csv")
Q30. How to check DataFrame shape?
df.shape
Q31. How to get column names in Pandas?
df.columns
Q32. How to filter rows in Pandas?
df[df["Age"] > 25]
Q33. How to handle missing values?
• df.dropna() → remove
• df.fillna(value) → replace
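A small sketch of both options on made-up data (the column names and the "Unknown" placeholder are assumptions for illustration): dropna removes incomplete rows, while fillna can take a different replacement per column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value in each column.
df = pd.DataFrame({"Age": [20, np.nan, 30],
                   "City": ["Pune", None, "Delhi"]})

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: fill per column, here the median for numbers and a placeholder for text.
filled = df.fillna({"Age": df["Age"].median(), "City": "Unknown"})

print(dropped)
print(filled)
```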
Q34. How to group data in Pandas?
df.groupby("Department")["Salary"].mean()
Q35. How to merge two DataFrames?
pd.merge(df1, df2, on="ID")
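Interviewers often follow up by asking about join types. The sketch below uses two made-up DataFrames to show how the how parameter changes which IDs survive; the default is an inner join.

```python
import pandas as pd

# Hypothetical sample tables sharing an "ID" key.
df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["A", "B", "C"]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "Salary": [500, 600, 700]})

inner = pd.merge(df1, df2, on="ID")               # only IDs present in both
left = pd.merge(df1, df2, on="ID", how="left")    # all IDs from df1
outer = pd.merge(df1, df2, on="ID", how="outer")  # IDs from either table

print(inner)
print(left)
print(outer)
```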
Q36. Difference between loc and iloc?
• loc → label-based (slice end is inclusive)
• iloc → integer position-based (slice end is exclusive)
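The difference is easiest to see when the index labels are not 0, 1, 2. A minimal sketch with a made-up index:

```python
import pandas as pd

# Hypothetical data with non-default index labels.
df = pd.DataFrame({"Name": ["A", "B", "C"], "Age": [20, 25, 30]},
                  index=[10, 20, 30])

by_label = df.loc[10, "Age"]   # row labelled 10, column "Age"
by_position = df.iloc[0, 1]    # first row, second column

label_slice = df.loc[10:20]    # labels 10 through 20, end included
position_slice = df.iloc[0:1]  # position 0 only, end excluded

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```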
Q37. How to sort DataFrame?
df.sort_values(by="Salary", ascending=False)
Q38. How to rename columns in Pandas?
df.rename(columns={"old":"new"})
Q39. How to apply custom function on DataFrame?
df["Col"].apply(lambda x: x*2)
Q40. How to check data types of columns?
df.dtypes
Q41. How to convert column to datetime in Pandas?
pd.to_datetime(df["Date"])
Q42. How to pivot table in Pandas?
df.pivot_table(values="Sales", index="Region", columns="Year")
Q43. How to reset index in Pandas?
df.reset_index(drop=True)
Q44. How to check duplicate rows?
df.duplicated().sum()
Q45. How to remove duplicates in Pandas?
df.drop_duplicates()
4. Data Cleaning & Transformation (Q46–Q60)
Q46. How do you handle outliers?
A: Using IQR, Z-score, or capping.
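As one possible sketch of the IQR approach combined with capping (the sample values are made up, and the 1.5 multiplier is the common convention rather than a fixed rule):

```python
import pandas as pd

# Hypothetical sample with one obvious outlier.
s = pd.Series([10, 12, 11, 13, 12, 100], name="col")

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag values outside the fences.
is_outlier = (s < lower) | (s > upper)

# Capping: pull values outside the fences back to the fence itself.
capped = s.clip(lower=lower, upper=upper)

print(s[is_outlier])
print(capped)
```

Whether to drop, cap, or keep flagged values depends on the analysis; capping keeps the row count unchanged.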
Q47. How to replace values in DataFrame?
df["Gender"] = df["Gender"].replace({"M":"Male","F":"Female"})
Q48. How to check null values in DataFrame?
df.isnull().sum()
Q49. How to combine text columns?
df["FullName"] = df["First"] + " " + df["Last"]
Q50. How to change column data type?
df["Age"] = df["Age"].astype(int)
Q51. How to extract year from datetime column?
df["Year"] = df["Date"].dt.year
Q52. What is difference between applymap, map, apply?
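• map → element-wise on a Series
• apply → a function along rows or columns of a DataFrame (or element-wise on a Series)
• applymap → element-wise on an entire DataFrame (deprecated since pandas 2.1 in favor of DataFrame.map)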
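A compact sketch of the three on a made-up DataFrame. Because DataFrame.map replaced applymap in pandas 2.1, the example picks whichever one the installed version provides; that fallback is an assumption added here for portability.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})   # hypothetical sample data

# Series.map: element-wise on one column (here via a lookup dict).
mapped = df["a"].map({1: "one", 2: "two"})

# DataFrame.apply: the function receives a whole column (axis=0) or row (axis=1).
col_totals = df.apply(lambda col: col.sum())
row_totals = df.apply(lambda row: row.sum(), axis=1)

# Element-wise over the whole DataFrame: DataFrame.map on pandas >= 2.1,
# applymap on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap
squared = elementwise(lambda x: x ** 2)

print(mapped.tolist())
print(col_totals.tolist(), row_totals.tolist())
print(squared)
```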
Q53. How to normalize column values?
(df["col"] - df["col"].min()) / (df["col"].max()-df["col"].min())
Q54. How to concatenate DataFrames?
pd.concat([df1, df2])
Q55. How to one-hot encode categorical variables?
pd.get_dummies(df, columns=["Category"])
Q56. How to bin continuous values?
pd.cut(df["Age"], bins=[0,18,30,50,100])
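A short sketch of the same bins with readable labels (the label names and sample ages are made up). By default each bin includes its right edge and excludes its left edge, so 18 falls in the first bin.

```python
import pandas as pd

ages = pd.Series([5, 18, 25, 45, 70])                 # hypothetical ages
labels = ["Child", "Young adult", "Adult", "Senior"]  # hypothetical labels

# Bins are (0, 18], (18, 30], (30, 50], (50, 100].
groups = pd.cut(ages, bins=[0, 18, 30, 50, 100], labels=labels)

print(groups.tolist())
```

A value equal to the lowest edge (0 here) would become NaN unless include_lowest=True is passed.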
Q57. How to change column order in Pandas?
df = df[["Col2","Col1","Col3"]]
Q58. How to melt a DataFrame?
pd.melt(df, id_vars=["ID"], value_vars=["Math","Science"])
Q59. How to check correlation in Pandas?
df.corr(numeric_only=True)  # pandas 2.0+ raises an error on non-numeric columns otherwise
Q60. How to detect skewness?
df["col"].skew()
5. Visualization (Q61–Q70)
Q61. How to plot bar chart in Matplotlib?
import matplotlib.pyplot as plt
df["Col"].value_counts().plot(kind="bar")
plt.show()
Q62. How to plot histogram?
df["Col"].hist()
Q63. How to plot line chart?
df.plot(x="Date", y="Sales")
Q64. How to plot scatter plot?
df.plot.scatter(x="Age", y="Salary")
Q65. What is Seaborn?
A: Advanced data visualization library built on Matplotlib.
Q66. How to plot heatmap in Seaborn?
import seaborn as sns
sns.heatmap(df.corr(numeric_only=True), annot=True)
Q67. How to plot boxplot in Seaborn?
sns.boxplot(x="Category", y="Sales", data=df)
Q68. How to show multiple plots in one figure?
plt.subplot(2,2,1); plt.plot(df["Sales"])
Q69. How to set figure size in Matplotlib?
plt.figure(figsize=(10,5))
Q70. How to save plot as image?
plt.savefig("chart.png")
6. SQL & Python (Q71–Q75)
Q71. How do you connect Python to SQL?
A: Using libraries like pyodbc, sqlalchemy, sqlite3.
Q72. How to read SQL data into Pandas?
pd.read_sql("SELECT * FROM table_name", conn)
Q73. How to insert Pandas DataFrame into SQL?
df.to_sql("table", conn, if_exists="replace", index=False)
Q74. What is difference between read_sql_query and read_sql_table?
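• read_sql_query → runs a SQL query and returns the result as a DataFrame
• read_sql_table → loads an entire table by name (requires a SQLAlchemy connectable)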
Q75. How to handle large datasets from SQL in Python?
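A: Read in chunks with the chunksize parameter, select only the rows and columns you need in the query, and make sure filtered columns are indexed in the database.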
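A minimal sketch of chunked reading, assuming an in-memory SQLite database and a made-up orders table: with chunksize set, read_sql_query returns an iterator of DataFrames, so only one chunk is held in memory at a time.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database with a small "orders" table.
conn = sqlite3.connect(":memory:")
pd.DataFrame({"region": ["N", "S"] * 5,
              "sales": list(range(10))}).to_sql("orders", conn, index=False)

total_sales = 0
n_chunks = 0

# Each iteration yields a DataFrame with at most 4 rows.
for chunk in pd.read_sql_query("SELECT region, sales FROM orders",
                               conn, chunksize=4):
    total_sales += int(chunk["sales"].sum())
    n_chunks += 1

conn.close()
print(total_sales, n_chunks)   # 45 3
```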
7. Statistics & Analysis (Q76–Q85)
Q76. How to calculate correlation in Python?
df.corr(numeric_only=True)
Q77. How to calculate standard deviation?
df["col"].std()
Q78. How to calculate variance?
df["col"].var()
Q79. How to calculate skewness and kurtosis?
df["col"].skew(), df["col"].kurt()
Q80. How to detect outliers using IQR?
Q1, Q3 = df["col"].quantile([0.25,0.75])
IQR = Q3 - Q1
outliers = df[(df["col"] < Q1-1.5*IQR) | (df["col"] > Q3+1.5*IQR)]
Q81. How to calculate moving average?
df["col"].rolling(3).mean()
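A quick sketch on made-up values showing what the window does at the start of the series: the first window - 1 results are NaN unless min_periods is lowered.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])   # hypothetical daily values

# 3-period moving average; the first two results are NaN (incomplete window).
ma = s.rolling(3).mean()

# Allow partial windows at the start instead of NaN.
ma_partial = s.rolling(3, min_periods=1).mean()

print(ma.tolist())
print(ma_partial.tolist())   # [10.0, 15.0, 20.0, 30.0, 40.0]
```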
Q82. How to plot a correlation heatmap?
sns.heatmap(df.corr(numeric_only=True), annot=True)
Q83. How to calculate percentile?
np.percentile(df["col"], 90)
Q84. How to calculate z-score?
from scipy.stats import zscore
df["zscore"] = zscore(df["col"])
Q85. How to calculate covariance?
df.cov(numeric_only=True)
8. Case Studies & Coding (Q86–Q100)
Q86. Find top 5 highest salaries from DataFrame.
df.nlargest(5,"Salary")
Q87. Find employees with salary above average.
df[df["Salary"] > df["Salary"].mean()]
Q88. Count number of employees per department.
df["Dept"].value_counts()
Q89. Find customers who purchased more than 5 times.
counts = df["Customer"].value_counts()
counts[counts > 5]
Q90. Find 2nd highest salary.
df["Salary"].drop_duplicates().nlargest(2).iloc[-1]  # drop_duplicates handles ties for the top salary
Q91. Calculate total sales by region.
df.groupby("Region")["Sales"].sum()
Q92. Show top 10 products by revenue.
df.groupby("Product")["Revenue"].sum().nlargest(10)
Q93. Find percentage contribution of each category.
(df.groupby("Category")["Sales"].sum() / df["Sales"].sum())*100
Q94. Find duplicate customers.
df[df.duplicated("CustomerID", keep=False)]  # keep=False shows every row of a repeated customer
Q95. Get month with highest sales.
df.groupby(df["Date"].dt.month)["Sales"].sum().idxmax()
Q96. Find average order value (AOV).
df["Sales"].sum()/df["OrderID"].nunique()
Q97. Calculate churn rate.
A: (Lost Customers / Total Customers at Start) * 100.
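The formula can be wrapped in a small helper. This is only a sketch: the function name, argument names, and sample numbers are made up for illustration.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return churn as a percentage of the customers at the start of the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start * 100


# Hypothetical period: 200 customers at the start, 50 lost.
rate = churn_rate(200, 50)
print(rate)   # 25.0
```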
Q98. Detect null percentage in each column.
df.isnull().mean()*100
Q99. Calculate profit margin % per order.
df["ProfitMargin"] = (df["Profit"]/df["Sales"])*100
Q100. Build customer segmentation using Pandas.
df.groupby("Segment")["Sales"].agg(["mean","sum","count"])