Linear Classifier Examples with Python
1. Logistic Regression for Binary Classification (Iris Dataset)
Problem:
We aim to classify whether a flower is Iris Setosa or not using the Iris dataset. This is a classic
binary classification problem.
Approach:
We'll use logistic regression, a linear classifier well suited to binary outcomes. The features are
sepal length and sepal width (the first two columns of the dataset).
Solution Explanation:
We use sklearn's LogisticRegression to train the model and evaluate it with the accuracy score.
Code:
```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
iris = load_iris()
X = iris.data[:, :2]  # use only the first two features: sepal length and sepal width
y = (iris.target == 0).astype(int)  # 1 if the sample is Iris Setosa, 0 otherwise
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```
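Because logistic regression is a linear model, the fitted decision boundary can be read straight off the trained estimator. The optional snippet below reuses the model and X_test variables from the block above to print the boundary coefficients and a few class probabilities; the exact numbers will vary from run to run.
```python
# The decision boundary is the line w1*sepal_length + w2*sepal_width + b = 0.
w1, w2 = model.coef_[0]
b = model.intercept_[0]
print(f"Boundary: {w1:.2f}*sepal_length + {w2:.2f}*sepal_width + {b:.2f} = 0")

# Probability estimates for the first three test samples:
# columns are P(not Setosa) and P(Setosa), in that order.
print(model.predict_proba(X_test[:3]))
```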
2. Linear SVM for Binary Classification (Breast Cancer Dataset)
Problem:
We want to predict whether a tumor is malignant or benign using a linear SVM.
Approach:
Use a Support Vector Machine with a linear kernel. A linear SVM finds the maximum-margin hyperplane
that separates the two classes.
Solution Explanation:
We'll use sklearn's SVC with a linear kernel and evaluate the results using accuracy.
Code:
```python
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
model = SVC(kernel='linear')
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```
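Two optional follow-ups that reuse the variables from the block above: SVMs are sensitive to feature scale, so standardizing the 30 features before fitting often speeds up and stabilizes training, and with a linear kernel the separating hyperplane is exposed through coef_ and intercept_. This is a sketch, not part of the original recipe, and the exact accuracy it prints may differ on your machine.
```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The same linear SVM, but with feature standardization in front of it.
scaled_model = make_pipeline(StandardScaler(), SVC(kernel='linear'))
scaled_model.fit(X_train, y_train)
print("Accuracy with scaling:", accuracy_score(y_test, scaled_model.predict(X_test)))

# With a linear kernel, the learned hyperplane w.x + b = 0 is available directly.
print("Hyperplane weights shape:", model.coef_.shape)  # (1, 30): one weight per feature
print("Intercept:", model.intercept_)
```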
3. Perceptron for Linearly Separable Data
Problem:
Classify synthetic 2D points, generated with make_classification, into two classes.
Approach:
The perceptron is a simple online linear classifier; its mistake-driven update rule is guaranteed to
converge when the data are linearly separable.
Solution Explanation:
We train sklearn's Perceptron on synthetic data and evaluate it on a held-out test split.
Code:
```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
X, y = make_classification(n_samples=1000, n_features=2, n_redundant=0, n_informative=2,
                           n_clusters_per_class=1, random_state=0)  # fixed seed so results are reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = Perceptron()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```
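To make the "online" part of the approach concrete, the perceptron can also be trained incrementally with partial_fit, one mini-batch at a time. The sketch below reuses the train/test split from the block above; the batch size of 100 is an arbitrary choice for illustration.
```python
import numpy as np

# Incremental (online) training: feed the data in mini-batches of 100 samples.
online_model = Perceptron()
classes = np.unique(y_train)  # every class label must be declared on the first call
for start in range(0, len(X_train), 100):
    online_model.partial_fit(X_train[start:start + 100],
                             y_train[start:start + 100],
                             classes=classes)
print("Online accuracy:", accuracy_score(y_test, online_model.predict(X_test)))
```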
4. Ridge Classifier on Multi-class Classification (Digits Dataset)
Problem:
Recognize handwritten digits (0-9) using the sklearn digits dataset.
Approach:
RidgeClassifier is a linear classifier built on ridge (L2-regularized least-squares) regression: it
converts the class labels to {-1, 1} targets, fits a regression model to them, and predicts the class
with the highest score.
Solution Explanation:
We use sklearn's RidgeClassifier and test its performance on a multi-class classification task.
Code:
```python
from sklearn.datasets import load_digits
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, random_state=0)
model = RidgeClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```
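For the ten digit classes, RidgeClassifier fits one regression target per class and predicts the class with the highest score, which decision_function makes visible. The optional snippet below reuses the fitted model from the block above and also refits with a larger alpha to show the regularization knob; alpha=10.0 is just an illustrative value.
```python
# decision_function returns one score per class; the prediction is the argmax.
scores = model.decision_function(X_test[:1])
print("Scores shape:", scores.shape)                      # (1, 10): one score per digit
print("Predicted digit:", model.classes_[scores.argmax()])

# alpha controls the strength of the L2 (ridge) penalty; larger values shrink the weights more.
stronger = RidgeClassifier(alpha=10.0)
stronger.fit(X_train, y_train)
print("Accuracy with alpha=10:", accuracy_score(y_test, stronger.predict(X_test)))
```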
5. Passive Aggressive Classifier for Online Learning (News Classification)
Problem:
Classify news articles from the 20 Newsgroups dataset, restricted here to two categories (sci.space
and rec.sport.baseball).
Approach:
The Passive-Aggressive classifier is an online linear model: it leaves its weights unchanged when an
example is classified with a sufficient margin (passive) and updates them just enough to fix a margin
violation (aggressive), which makes it well suited to large-scale and streaming learning.
Solution Explanation:
The raw text is converted into numerical TF-IDF features, and a PassiveAggressiveClassifier is then
trained on them for classification.
Code:
```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
data = fetch_20newsgroups(subset='all', categories=['sci.space', 'rec.sport.baseball'])
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.3,
random_state=42)
vectorizer = TfidfVectorizer(stop_words='english')
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
model = PassiveAggressiveClassifier(max_iter=1000)
model.fit(X_train_tfidf, y_train)
predictions = model.predict(X_test_tfidf)
print("Accuracy:", accuracy_score(y_test, predictions))
```
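To show the online-learning angle mentioned in the approach, the same classifier can be updated batch by batch with partial_fit. The sketch below reuses the TF-IDF matrices from the block above; the batch size of 200 is arbitrary. Note that TF-IDF itself needs a full pass over the corpus to compute document frequencies, so a fully streaming text pipeline would typically swap TfidfVectorizer for HashingVectorizer.
```python
import numpy as np

# Streaming-style training: update the model on one mini-batch at a time.
streaming_model = PassiveAggressiveClassifier()
classes = np.unique(y_train)  # every class label must be declared on the first call
for start in range(0, X_train_tfidf.shape[0], 200):
    streaming_model.partial_fit(X_train_tfidf[start:start + 200],
                                y_train[start:start + 200],
                                classes=classes)
print("Streaming accuracy:", accuracy_score(y_test, streaming_model.predict(X_test_tfidf)))
```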