Machine Learning Algorithms Cheat Sheet
1. Linear Regression
**Overview**: Linear Regression is a linear approach to modeling the relationship between a
dependent variable and one or more independent variables.
**Key Hyperparameters**:
- `fit_intercept`: Whether to calculate the intercept for the model. Default is `True`.
- `copy_X`: If `True` (the default), `X` is copied before fitting; otherwise it may be overwritten. (The former `normalize` parameter was deprecated in scikit-learn 1.0 and removed in 1.2; standardize features with a `StandardScaler` in a pipeline instead, as shown in the sketch after the example.)
**Example Code**:
```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.datasets import make_regression
# Synthetic example data (swap in your own X and y)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Model initialization
lr = LinearRegression(fit_intercept=True)  # `normalize` was removed in scikit-learn 1.2
# Model fitting
lr.fit(X_train, y_train)
# Predictions
y_pred = lr.predict(X_test)
# Evaluation
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
```
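Because `normalize` is gone from current scikit-learn releases, the usual replacement is to standardize features inside a pipeline. A minimal sketch, reusing the train/test split from above:
```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The scaler is fit on the training data only and reapplied at predict time
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, y_train)
print(f'Pipeline MSE: {mean_squared_error(y_test, pipe.predict(X_test))}')
```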
2. Logistic Regression
**Overview**: Logistic Regression is used for binary classification problems. It models the probability
of the positive class with the logistic (sigmoid) function `1 / (1 + exp(-z))`, where `z` is a linear combination of the features.
**Key Hyperparameters**:
- `penalty`: Used to specify the norm used in the penalization (`'l1'`, `'l2'`, `'elasticnet'`, or `None`; the string `'none'` was deprecated in scikit-learn 1.2 and removed in 1.4).
- `C`: Inverse of regularization strength; smaller values specify stronger regularization.
- `solver`: Algorithm to use in the optimization problem (`'newton-cg'`, `'lbfgs'`, `'liblinear'`, `'sag'`,
`'saga'`).
**Example Code**:
```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
# Synthetic example data (swap in your own X and y)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Model initialization
log_reg = LogisticRegression(penalty='l2', C=1.0, solver='lbfgs', max_iter=1000)
# Model fitting
log_reg.fit(X_train, y_train)
# Predictions
y_pred = log_reg.predict(X_test)
# Evaluation
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
```
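Since the model is inherently probabilistic, class probabilities are often more useful than hard labels. A minimal sketch using `predict_proba` (column order follows `log_reg.classes_`):
```python
# Probability estimates per class; columns are ordered as in log_reg.classes_
proba = log_reg.predict_proba(X_test)
# Probability of the positive class for the first five test samples
print(proba[:5, 1])
```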
3. Decision Tree
**Overview**: Decision Tree is a non-parametric supervised learning method used for classification
and regression.
**Key Hyperparameters**:
- `criterion`: The function to measure the quality of a split (`'gini'` for Gini impurity, `'entropy'` for
information gain).
- `max_depth`: The maximum depth of the tree.
- `min_samples_split`: The minimum number of samples required to split an internal node.
- `min_samples_leaf`: The minimum number of samples required to be at a leaf node.
**Example Code**:
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
# Synthetic example data (swap in your own X and y)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Model initialization
dt = DecisionTreeClassifier(criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1)
# Model fitting
dt.fit(X_train, y_train)
# Predictions
y_pred = dt.predict(X_test)
# Evaluation
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
```
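An unbounded tree (`max_depth=None`) can overfit, so it is worth inspecting what was learned. A minimal sketch using `export_text` from `sklearn.tree`:
```python
from sklearn.tree import export_text

# Human-readable dump of the learned split rules
print(export_text(dt))
print(f'Depth: {dt.get_depth()}, leaves: {dt.get_n_leaves()}')
```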
4. Random Forest
**Overview**: Random Forest is an ensemble method that trains many decision trees on bootstrap
samples and aggregates their predictions, reducing variance relative to a single tree for both classification and regression.
**Key Hyperparameters**:
- `n_estimators`: The number of trees in the forest.
- `criterion`: The function to measure the quality of a split (`'gini'`, `'entropy'`).
- `max_depth`: The maximum depth of each tree.
- `min_samples_split`: The minimum number of samples required to split an internal node.
- `min_samples_leaf`: The minimum number of samples required to be at a leaf node.
**Example Code**:
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
# Synthetic example data (swap in your own X and y)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Model initialization
rf = RandomForestClassifier(n_estimators=100, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1)
# Model fitting
rf.fit(X_train, y_train)
# Predictions
y_pred = rf.predict(X_test)
# Evaluation
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
```
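A fitted forest exposes impurity-based feature importances, which give a quick (if biased toward high-cardinality features) ranking. A minimal sketch:
```python
import numpy as np

# Impurity-based importances: one non-negative value per feature, summing to 1.0
importances = rf.feature_importances_
for idx in np.argsort(importances)[::-1]:
    print(f'feature {idx}: {importances[idx]:.3f}')
```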
5. AdaBoost
**Overview**: AdaBoost is an ensemble method that fits weak classifiers sequentially, re-weighting
the training samples so each new learner focuses on previously misclassified examples; the weighted combination forms a strong classifier.
**Key Hyperparameters**:
- `n_estimators`: The maximum number of estimators at which boosting is terminated.
- `learning_rate`: Weight applied to each classifier at each boosting iteration.
- `estimator`: The base estimator from which the boosted ensemble is built (e.g., a shallow
`DecisionTreeClassifier`). Named `base_estimator` before scikit-learn 1.2; the old name was removed in 1.4.
**Example Code**:
```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
# Synthetic example data (swap in your own X and y)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Model initialization
# `estimator` was called `base_estimator` before scikit-learn 1.2
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1), n_estimators=50, learning_rate=1.0)
# Model fitting
ada.fit(X_train, y_train)
# Predictions
y_pred = ada.predict(X_test)
# Evaluation
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
```
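Boosting quality often plateaus well before `n_estimators` rounds; `staged_predict` yields predictions after each boosting iteration, so you can see where accuracy levels off. A minimal sketch, continuing from the fitted `ada` above:
```python
# Test accuracy after each boosting round
for i, stage_pred in enumerate(ada.staged_predict(X_test), start=1):
    print(f'{i} estimators: {accuracy_score(y_test, stage_pred):.3f}')
```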
6. K-Nearest Neighbors (KNN)
**Overview**: KNN is a non-parametric method used for classification and regression: it predicts from the
k most similar training instances. Because similarity is measured by raw distance, KNN is sensitive to feature scaling (see the sketch after the example).
**Key Hyperparameters**:
- `n_neighbors`: Number of neighbors to use.
- `weights`: Weight function used in prediction (`'uniform'`, `'distance'`).
- `algorithm`: Algorithm used to compute the nearest neighbors (`'auto'`, `'ball_tree'`, `'kd_tree'`,
`'brute'`).
**Example Code**:
```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
# Synthetic example data (swap in your own X and y)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Model initialization
knn = KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto')
# Model fitting
knn.fit(X_train, y_train)
# Predictions
y_pred = knn.predict(X_test)
# Evaluation
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
```
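Features on larger numeric scales dominate the distance computation, so KNN usually benefits from standardization. A minimal sketch wrapping the scaler and classifier in a pipeline, reusing the split from above:
```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# The scaler is fit on the training data only and reapplied at predict time
knn_pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn_pipe.fit(X_train, y_train)
print(f'Scaled-KNN accuracy: {knn_pipe.score(X_test, y_test):.3f}')
```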