Python for Data Science & Machine Learning
1. Data Selection in pandas: loc vs iloc
- `loc` is label-based: it selects rows/columns using labels (names).
- `iloc` is integer-position based: it selects rows/columns using index positions.
import pandas as pd
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35]
}, index=['a', 'b', 'c'])
print(df.loc['a']) # Row with index label 'a'
print(df.iloc[0]) # First row (position 0)
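One difference that is easy to miss shows up when slicing: a `loc` slice includes both endpoint labels, while an `iloc` slice excludes the stop position, like ordinary Python slicing. A minimal sketch, rebuilding the same `df` so it runs on its own (the names `by_label` and `by_position` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35]},
    index=["a", "b", "c"],
)

# Label slice: both endpoints are included, so rows 'a' and 'b' come back.
by_label = df.loc["a":"b", "age"]

# Position slice: the stop position is excluded, so only row 0 comes back.
by_position = df.iloc[0:1, 1]

print(by_label.tolist())     # [25, 30]
print(by_position.tolist())  # [25]
```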
2. Data Wrangling with pandas
- Handling missing values
- Renaming, filtering, grouping
- Applying functions to columns
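df.dropna() # Remove rows with NaNs (returns a new DataFrame)
df.fillna(0) # Replace NaNs with 0 (returns a new DataFrame)
df.rename(columns={'age': 'Age'}) # Rename a column (df itself is unchanged)
df[df['age'] > 25] # Filter rows where age > 25
df.groupby('name')['age'].mean() # Mean age per name
df['age'].apply(lambda x: x + 1) # Apply a function to each value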
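The `df` from section 1 has no missing values, so the calls above are easier to follow on a frame that does. The sketch below uses a small made-up table (`people`, with a hypothetical `team` column) and keeps each result in its own variable, since none of these methods modify the original frame:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration: Bob's age is missing.
people = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Dana"],
    "team": ["x", "y", "x", "y"],
    "age": [25.0, np.nan, 35.0, 40.0],
})

dropped = people.dropna()                         # Bob's row is removed
filled = people.fillna({"age": 0})                # Bob's age becomes 0
renamed = people.rename(columns={"age": "Age"})   # new frame with column 'Age'
over_25 = people[people["age"] > 25]              # NaN > 25 is False, so Bob is excluded
mean_age = people.groupby("team")["age"].mean()   # NaN is skipped by default

print(len(dropped))           # 3
print(list(people.columns))   # ['name', 'team', 'age'] (original is unchanged)
print(mean_age.to_dict())     # {'x': 30.0, 'y': 40.0}
```

To keep a result, assign it back, for example `people = people.dropna()`.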
3. NumPy Basics
NumPy provides fast numerical operations on arrays.
import numpy as np
arr = np.array([1, 2, 3])
print(arr.mean()) # 2.0
print(np.arange(0, 10, 2)) # Even numbers from 0 up to (not including) 10: [0 2 4 6 8]
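Much of that speed comes from operating on whole arrays at once instead of looping in Python. A short sketch of elementwise arithmetic and boolean masking (the variable names are only illustrative):

```python
import numpy as np

arr = np.array([1, 2, 3])

doubled = arr * 2                    # elementwise multiply: [2 4 6]
shifted = arr + np.arange(0, 6, 2)   # [1 2 3] + [0 2 4] = [1 4 7]
mask = arr > 1                       # boolean array: [False  True  True]
selected = arr[mask]                 # keep only elements where mask is True: [2 3]

print(doubled.tolist())   # [2, 4, 6]
print(shifted.tolist())   # [1, 4, 7]
print(selected.sum())     # 5
```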
4. Machine Learning with scikit-learn
- Splitting data
- Preprocessing
- Fitting models
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) # Hold out 20% for testing; fixed seed for a reproducible split
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train) # Learn mean/std from the training data only
X_test = scaler.transform(X_test) # Reuse the training statistics on the test data
model = LogisticRegression()
model.fit(X_train, y_train)
print(model.score(X_test, y_test)) # Accuracy on the held-out test set
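The same split, scale, and fit steps can also be bundled into a single estimator, which makes it harder to fit the scaler on test data by accident. This is one possible arrangement, not the only one; the `random_state=0` seed and `stratify=y` are assumptions added here for a reproducible, class-balanced split:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Assumed settings: fixed seed, and class proportions preserved in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# fit() scales the training data and then fits the model;
# score() applies the already-fitted scaler before predicting.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

accuracy = pipe.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```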