Assignment 1: Python Libraries for Data Science
NumPy
1. What is NumPy, and why is it essential for numerical computations in Python?
NumPy (Numerical Python) is the fundamental Python library for numerical computing. It provides:
- A high-performance multidimensional array object (ndarray)
- Mathematical functions to operate on arrays
- Tools for integrating C/C++ and Fortran code
Importance:
- It supports vectorized operations, which apply to whole arrays in compiled code and are typically much faster than equivalent Python loops.
- It forms the foundation for many other libraries like Pandas, SciPy, and scikit-learn.
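To illustrate the vectorization point, the sketch below computes the same sum of squares twice, once with a Python loop and once with a single NumPy expression. The data and the variable names are made up for the example.

```python
import numpy as np

# Hypothetical data: the integers 0..9, chosen only for illustration
values = np.arange(10)

# Pure-Python loop: one multiplication per iteration
loop_total = 0
for v in values:
    loop_total += int(v) * int(v)

# Vectorized: the multiplication and the sum both run inside NumPy
vector_total = int(np.sum(values * values))

print(loop_total, vector_total)  # both are 285
```

On arrays this small the speed difference is negligible; it usually becomes noticeable as the array grows.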
2. Create a 3x3 NumPy array filled with random integers between 1 and 10.
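import numpy as np
# randint excludes the upper bound, so 11 gives integers from 1 to 10
array = np.random.randint(1, 11, (3, 3))
# Sum, mean, and standard deviation over all nine elements
array.sum(), array.mean(), array.std()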
3. Reshape the array into a 1x9 vector.
vector = array.reshape(1, 9)
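Note that reshape(1, 9) returns a two-dimensional row vector, not a one-dimensional array. A minimal sketch of the difference, using a fixed stand-in array (an assumption, so the output is reproducible) in place of the random one:

```python
import numpy as np

# Deterministic stand-in for the random 3x3 array
array = np.arange(1, 10).reshape(3, 3)

row_vector = array.reshape(1, 9)  # still 2D: one row, nine columns
flat = array.reshape(-1)          # 1D: -1 lets NumPy infer the length

print(row_vector.shape)  # (1, 9)
print(flat.shape)        # (9,)
```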
SciPy
1. Discuss the main modules of SciPy and their applications.
SciPy modules include:
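- scipy.integrate: Numerical integration and ODE solvers
- scipy.optimize: Minimization, root finding, and curve fitting
- scipy.linalg: Linear algebra (solving systems, decompositions, eigenvalues)
- scipy.signal: Signal processing (filtering, convolution, spectral analysis)
- scipy.stats: Probability distributions and statistical tests
- scipy.fft: Fast Fourier transforms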
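As a brief illustration of two of these modules, the sketch below integrates a function with scipy.integrate and solves a small linear system with scipy.linalg. The integrand and the system are arbitrary examples chosen so that the exact answers are easy to check by hand.

```python
import numpy as np
from scipy import integrate, linalg

# scipy.integrate: integrate x**2 from 0 to 3 (the exact value is 9)
area, abs_error = integrate.quad(lambda x: x**2, 0, 3)

# scipy.linalg: solve the system  2x + y = 5,  x + 3y = 10
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = linalg.solve(A, b)

print(area)      # approximately 9.0
print(solution)  # approximately [1. 3.]
```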
2. Use scipy.optimize to find the minimum of f(x) = x² + 5x + 6.
from scipy.optimize import minimize
f = lambda x: x**2 + 5*x + 6
# Keep the result: result.x is the minimizer, result.fun the minimum value
result = minimize(f, x0=0)
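Because f is a quadratic, the numerical answer can be checked against the closed form x = -b / (2a). The sketch below does that comparison; the names a, b, c, x_exact, and f_exact are introduced here only for the check.

```python
from scipy.optimize import minimize

def f(x):
    # minimize passes x as a length-1 array; return a plain float
    return float(x[0]**2 + 5*x[0] + 6)

result = minimize(f, x0=[0.0])

# For a*x**2 + b*x + c with a > 0, the minimum lies at x = -b / (2a)
a, b, c = 1, 5, 6
x_exact = -b / (2 * a)                      # -2.5
f_exact = a * x_exact**2 + b * x_exact + c  # -0.25

print(result.x[0], result.fun)  # close to -2.5 and -0.25
```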
3. Plot the function using Matplotlib.
import matplotlib.pyplot as plt
x = np.linspace(-10, 5, 100)
y = f(x)
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('f(x)')
plt.show()
Pandas
1. What are the two primary data structures in Pandas?
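- Series: a one-dimensional labeled array holding values of a single dtype.
- DataFrame: a two-dimensional labeled table whose columns can have different dtypes.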
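A minimal sketch of both structures follows. The values and index labels are hypothetical and serve only to show label-based access.

```python
import pandas as pd

# Hypothetical values, used only to show the two structures
ages = pd.Series([25, 30], index=['A', 'B'], name='Age')
people = pd.DataFrame(
    {'Age': [25, 30], 'Salary': [60000, 45000]},
    index=['A', 'B'],
)

print(ages['B'])               # 30: label-based lookup in a Series
print(type(people['Salary']))  # each DataFrame column is itself a Series
print(people.shape)            # (2, 2)
```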
2. Load a CSV (or create DataFrame) with Name, Age, Salary.
import pandas as pd
df = pd.DataFrame({'Name': ['A', 'B'], 'Age': [25, 30], 'Salary': [60000, 45000]})
3. Filter rows where Salary > 50000.
df[df['Salary'] > 50000]
4. Group the data by Age and calculate the average salary.
df.groupby('Age')['Salary'].mean()
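With only two rows, each age group holds a single person, so the group mean just repeats the salary. The sketch below runs the same filter and groupby on a slightly larger frame; all records in it are made up for illustration.

```python
import pandas as pd

# Hypothetical records: ages repeat so that each group has two members
staff = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Age': [25, 30, 30, 25],
    'Salary': [60000, 45000, 55000, 42000],
})

high_earners = staff[staff['Salary'] > 50000]        # boolean-mask filter
mean_by_age = staff.groupby('Age')['Salary'].mean()  # one mean per distinct Age

print(high_earners['Name'].tolist())             # ['A', 'C']
print(mean_by_age.loc[25], mean_by_age.loc[30])  # 51000.0 50000.0
```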
5. Handle missing values.
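df.fillna(0)  # replace missing values with a chosen fill value (0 here)
df.dropna()   # or drop every row that contains a missing value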
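The frame built earlier has no missing values, so the calls above leave it unchanged. The sketch below uses a hypothetical frame with one missing salary to show the effect of counting, filling, and dropping. Filling with the column mean is one possible choice, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing salary
records = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Salary': [60000.0, np.nan, 45000.0],
})

missing_per_column = records.isna().sum()  # count NaNs in each column
# Fill the gap with the mean of the known salaries (mean skips NaN by default)
filled = records.fillna({'Salary': records['Salary'].mean()})
dropped = records.dropna()                 # or discard incomplete rows

print(missing_per_column['Salary'])  # 1
print(filled['Salary'].tolist())     # [60000.0, 52500.0, 45000.0]
print(len(dropped))                  # 2
```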
Matplotlib
1. Key features of Matplotlib
- Line, bar, pie charts
- Labels, titles, legends
- Subplots and gridlines
2. Generate a line plot for population growth.
years = np.arange(2015, 2025)
population = [10000, 11000, ..., 30000]  # values elided; supply one value per year (10 in total)
plt.plot(years, population)
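A runnable sketch of the line plot follows. The population figures are invented, evenly spaced between the two endpoints given above, because the real series is not listed. The Agg backend and the output filename are also assumptions, chosen so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np

years = np.arange(2015, 2025)  # 2015 through 2024, ten values
# Made-up figures: evenly spaced from 10000 to 30000, one per year
population = np.linspace(10000, 30000, len(years))

fig, ax = plt.subplots()
ax.plot(years, population, marker='o')
ax.set_xlabel('Year')
ax.set_ylabel('Population')
ax.set_title('Population growth (illustrative data)')
fig.savefig('population_growth.png')  # hypothetical output file
```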
3. Create a figure with bar and pie chart.
fig, axs = plt.subplots(1, 2)
axs[0].bar(...)
axs[1].pie(...)
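The elided arguments could be filled in as in the sketch below. The labels and counts are hypothetical, and the same data is drawn in both panels to keep the example short.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt

# Made-up category totals, shared by both charts
labels = ['A', 'B', 'C']
counts = [5, 3, 2]

fig, axs = plt.subplots(1, 2, figsize=(8, 4))
axs[0].bar(labels, counts)
axs[0].set_title('Bar chart')
axs[1].pie(counts, labels=labels, autopct='%1.0f%%')  # show percentages on wedges
axs[1].set_title('Pie chart')
fig.tight_layout()
fig.savefig('bar_and_pie.png')  # hypothetical output file
```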