Python - NumPy, Pandas, Matplotlib Q&A (5M & 10M)
1) Explain NumPy datatypes in detail with examples:
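NumPy arrays store elements of one fixed-size dtype: integers (int8 to int64, uint8 to uint64), floats (float16, float32, float64), bool, complex (complex64, complex128), and fixed-width strings.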
Example:
import numpy as np
arr = np.array([1,2,3], dtype=np.float32)
print(arr, arr.dtype)
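As a further sketch of how dtypes behave, the snippet below inspects a few inferred dtypes and converts between them with astype(). The array names and values are illustrative assumptions, not part of the original answer.

```python
import numpy as np

# Illustrative arrays; names and values are assumptions for this sketch.
ints = np.array([1, 2, 3], dtype=np.int32)   # 32-bit signed integers
flags = np.array([True, False, True])         # dtype inferred as bool
cplx = np.array([1 + 2j, 3 - 1j])             # dtype inferred as complex128
words = np.array(["a", "bc"])                 # fixed-width Unicode string dtype

print(ints.dtype, ints.itemsize)              # itemsize = bytes per element
print(flags.dtype, cplx.dtype, words.dtype)

# astype() returns a converted copy; float -> int truncates toward zero.
floats = np.array([1.7, -2.9, 3.2])
as_int = floats.astype(np.int64)
print(as_int)
```

Choosing a smaller dtype (for example float32 instead of float64) halves memory use at the cost of precision.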
2) Write a Python program using NumPy to create an array of 10 integers, and find maximum, minimum, mean and standard deviation:
import numpy as np
arr = np.arange(1,11)
print('Max:', arr.max())
print('Min:', arr.min())
print('Mean:', arr.mean())
print('Std:', arr.std())  # population standard deviation (ddof=0 by default)
3) Explain fancy indexing & sorting in NumPy with examples:
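Fancy indexing uses an integer array or list to pick several elements at once. np.sort() returns a sorted copy of the array.
import numpy as np
arr = np.array([10,20,30,40,50])
print(arr[[0,3,4]]) # Fancy indexing: elements at positions 0, 3 and 4
print(np.sort(arr)) # Sorting: returns a sorted copy, arr itself is unchanged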
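Since the array above is already in order, a sketch with unsorted data may show the effect of sorting more clearly. The array name scores and its values are assumptions made for illustration. It also shows np.argsort, which combines sorting with fancy indexing.

```python
import numpy as np

scores = np.array([40, 10, 50, 20, 30])   # illustrative unsorted data

# Fancy indexing with a list of positions returns a new array (a copy).
picked = scores[[0, 3, 4]]
print(picked)

# np.sort returns a sorted copy; the original array is unchanged.
ascending = np.sort(scores)
print(ascending)

# np.argsort returns the indices that would sort the array;
# using them as a fancy index yields the sorted values.
order = np.argsort(scores)
print(order, scores[order])

# Reverse the sorted copy for descending order.
descending = ascending[::-1]
print(descending)
```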
4) Explain Pandas Series and DataFrame objects with examples:
Series = 1D labeled array, DataFrame = 2D labeled table.
Example:
import pandas as pd
s = pd.Series([1,2,3], index=['a','b','c'])   # values labeled 'a', 'b', 'c'
df = pd.DataFrame({'A':[1,2],'B':[3,4]})      # columns 'A' and 'B', rows 0 and 1
print(s)
print(df)
5) Write a Pandas program to combine two datasets and perform group-wise aggregation:
import pandas as pd
df1 = pd.DataFrame({'ID':[1,2,3],'Value':[10,20,30]})
df2 = pd.DataFrame({'ID':[1,2,3],'Category':['X','Y','X']})
merged = pd.merge(df1, df2, on='ID')
print(merged.groupby('Category')['Value'].mean())
6) Explain different methods of handling missing data in Pandas with examples:
- dropna(): remove missing values
- fillna(value): fill with value
- interpolate(): estimate missing values
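The three methods listed above can be compared on one small Series. This is a minimal sketch; the data values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative Series with two missing values.
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

dropped = s.dropna()       # removes the NaN entries, keeps the original labels
filled = s.fillna(0)       # replaces every NaN with 0
interp = s.interpolate()   # linear interpolation between neighbours (default)

print(dropped.tolist())
print(filled.tolist())
print(interp.tolist())
```

All three calls return new objects and leave s unchanged.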
7) None vs NaN:
None = Python null object, type NoneType. NaN = Not a Number, float.
In Pandas numeric columns, None becomes NaN. Both detected by isnull().
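A short sketch of the difference, using a made-up numeric Series:

```python
import numpy as np
import pandas as pd

print(type(None))          # NoneType
print(type(np.nan))        # float
print(np.nan == np.nan)    # False: NaN never compares equal, even to itself

# In a numeric Series, None is converted to NaN and the dtype becomes float64.
s = pd.Series([1, None, 3])
print(s.dtype)
mask = s.isnull()
print(mask.tolist())
```

Because NaN is not equal to itself, missing values should be tested with isnull() (or its alias isna()) rather than with ==.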
8) What is NumPy?
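NumPy (Numerical Python) is a library for numerical computing. It provides the fast N-dimensional array type ndarray, plus linear algebra, statistics, random number generation and Fourier transforms.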
9) Difference between Python lists & NumPy arrays:
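A list can hold elements of different types and is slower for numeric work because operations need explicit loops. A NumPy array is homogeneous (one dtype), stored compactly, and supports fast vectorized operations.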
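The difference can be sketched as follows; the variable names are illustrative.

```python
import numpy as np

nums_list = [1, 2, 3, 4]
nums_arr = np.array(nums_list)

# List: element-wise arithmetic needs an explicit loop or comprehension.
doubled_list = [x * 2 for x in nums_list]

# Array: the operation is applied to every element at once (vectorized).
doubled_arr = nums_arr * 2

print(doubled_list)
print(doubled_arr)

# The * operator means repetition for lists, not multiplication.
repeated = nums_list * 2
print(repeated)

# Mixed numeric types in an array are coerced to one common dtype.
mixed = np.array([1, 2.5])
print(mixed.dtype)
```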
10) Short note on aggregation functions:
Functions that summarize data: np.sum, np.mean, np.min, np.max, np.std.
11) Write two examples in NumPy:
import numpy as np
arr = np.array([1,2,3])
print(np.sum(arr))   # 6
print(np.mean(arr))  # 2.0
12) What is Boolean masking?
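Filtering an array with a condition: the comparison produces a Boolean array, and indexing with it keeps only the elements where it is True.
import numpy as np
arr = np.array([10,20,30])
print(arr[arr>15])  # [20 30]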
13) Main data structures in Pandas:
Series (1D) and DataFrame (2D). Panel (3D) was deprecated and has since been removed from pandas; a DataFrame with a MultiIndex covers that use case.
14) Difference between loc[] and iloc[]:
loc = label-based indexing (uses index and column labels), iloc = integer position-based indexing (uses 0-based positions).
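A minimal sketch of the two accessors on a small DataFrame. The frame and its labels are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 20, 30], "B": [1, 2, 3]},
    index=["x", "y", "z"],
)

by_name = df.loc["y", "A"]    # row label "y", column label "A"
by_place = df.iloc[1, 0]      # second row, first column (the same cell)
print(by_name, by_place)

# Slicing differs: loc includes the end label, iloc excludes the end position.
label_slice = df.loc["x":"y"]
pos_slice = df.iloc[0:1]
print(len(label_slice), len(pos_slice))
```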
15) Hierarchical indexing:
Using multiple index levels (a MultiIndex) on one axis.
import pandas as pd
s = pd.Series([1,2,3,4], index=[['A','A','B','B'],['x','y','x','y']])
print(s)
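Selection on such a Series might look like the sketch below, which rebuilds the same data and picks values by outer level, by full key, and by reshaping.

```python
import pandas as pd

s = pd.Series(
    [1, 2, 3, 4],
    index=[["A", "A", "B", "B"], ["x", "y", "x", "y"]],
)

print(s.index.nlevels)        # number of index levels

outer = s["A"]                # all entries under outer label "A"
print(outer)

single = s.loc[("B", "y")]    # one value via an (outer, inner) tuple
print(single)

table = s.unstack()           # inner level becomes the columns
print(table)
```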
16) Two methods to handle missing data:
- dropna()
- fillna(value)
17) Python code to plot line chart and scatter plot:
import matplotlib.pyplot as plt
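x=[1,2,3]; y=[2,4,1]
plt.plot(x,y)     # line chart: points joined by line segments
plt.scatter(x,y)  # scatter plot: markers drawn at the same points, on the same axes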
plt.show()
18) Explain Histograms and density plots:
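A histogram groups values into bins and shows how many fall in each bin (frequency distribution). A density plot shows a smoothed estimate of the probability density, so the area under the curve is 1.
plt.hist(data, bins=10, density=True)  # data: 1D sequence of numbers; density=True scales the bars so their total area is 1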
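The line above assumes a variable named data already exists. A self-contained sketch follows, using synthetic normally distributed data (an assumption for illustration). The smooth curve is a hand-rolled Gaussian kernel density estimate with a common rule-of-thumb bandwidth; it is not a Matplotlib function, and libraries such as SciPy or seaborn offer ready-made alternatives.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=1000)   # synthetic sample

fig, ax = plt.subplots()

# Histogram normalized so that the bar areas sum to 1.
heights, edges, _ = ax.hist(data, bins=10, density=True, alpha=0.5)
area = np.sum(heights * np.diff(edges))

# Hand-rolled Gaussian KDE: average of one Gaussian bump per data point.
n = len(data)
h = 1.06 * data.std() * n ** (-1 / 5)              # rule-of-thumb bandwidth
xs = np.linspace(data.min(), data.max(), 200)
z = (xs[:, None] - data[None, :]) / h
kde = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

ax.plot(xs, kde)                                   # density curve over the bars
ax.set_xlabel("Value")
ax.set_ylabel("Density")
plt.show()
```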
19) Four types of plots: Line, Scatter, Bar, Histogram.
20) Use of xlabel() & ylabel():
plt.xlabel('X-axis') sets x-label, plt.ylabel('Y-axis') sets y-label.
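A short sketch of both functions on a line chart. The data and label texts are made up for illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [2, 4, 1]

plt.plot(x, y, marker="o")
plt.xlabel("Time (s)")        # text shown under the horizontal axis
plt.ylabel("Distance (m)")    # text shown beside the vertical axis
plt.title("Labelled line chart")

# The labels are stored on the current Axes and can be read back.
x_text = plt.gca().get_xlabel()
y_text = plt.gca().get_ylabel()
print(x_text, "|", y_text)

plt.show()
```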
Difference between line and scatter plots: a line plot connects consecutive points with line segments, while a scatter plot draws each point as a separate marker.
Histogram: frequency distribution plot.