
ML Classifiers: Logistic Regression (GD/IRLS) + Naive Bayes (Gaussian/Bernoulli)

A machine-learning project implementing:

  • Multiclass (softmax) logistic regression trained with gradient descent
  • Multiclass logistic regression (K−1 parameterization) trained with Newton / IRLS
  • Gaussian Naive Bayes (with variance smoothing) + sampling
  • Bernoulli Naive Bayes (with Lidstone smoothing) + sampling
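For orientation, the softmax-GD variant can be sketched in a few lines. This is a minimal sketch, not the repo's implementation; the learning rate and iteration count are illustrative, and the stable loss uses scipy.special.log_softmax as noted below under numerical stability:

```python
import numpy as np
from scipy.special import log_softmax, softmax

def fit_softmax_gd(X, y, n_classes, lr=0.1, n_iter=500):
    """Multiclass (softmax) logistic regression via full-batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]           # one-hot targets, shape (n, K)
    for _ in range(n_iter):
        P = softmax(X @ W, axis=1)     # predicted class probabilities
        W -= lr * X.T @ (P - Y) / n    # gradient of mean cross-entropy
    return W

def mean_nll(X, y, W):
    """Mean negative log-likelihood, via log_softmax to avoid overflow/underflow."""
    return -log_softmax(X @ W, axis=1)[np.arange(len(y)), y].mean()
```

A bias term can be added the same way the scripts do it: append a constant-1 intercept feature to X.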

Benchmarked on:

  • Iris (toy multiclass dataset)
  • MNIST (OpenML) (handwritten digits)

Setup

1) Create an environment (recommended)

python -m venv .venv
source .venv/bin/activate

2) Install dependencies

pip install -r requirements.txt

Optional dev dependencies (tests):

pip install -r requirements-dev.txt

Running experiments

All scripts are designed to run from the repo root.

Iris

Gradient Descent (no bias):

python scripts/iris_gd.py

Gradient Descent (with bias via intercept feature):

python scripts/iris_gd_bias.py

IRLS/Newton vs GD (no bias):

python scripts/iris_irls_vs_gd.py

IRLS/Newton vs GD (with bias via intercept feature):

python scripts/iris_irls_vs_gd_bias.py

Scikit-learn baseline comparisons:

python scripts/iris_sklearn_baseline.py
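The baseline script presumably runs something along these lines (the split ratio, random_state, and solver settings here are assumptions, not the script's actual configuration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.3f}")
```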

MNIST (OpenML)

Gaussian NB baseline (alpha=1e-7):

python scripts/mnist_gnb_baseline.py

Gaussian NB smoothing sweep:

python scripts/mnist_gnb_sweep.py

Gaussian NB digit generation (uses alpha_best=0.1 by default):

python scripts/mnist_gnb_generate.py

Bernoulli NB eval (alpha=1e-8):

python scripts/mnist_bnb_eval.py

Bernoulli NB digit generation:

python scripts/mnist_bnb_generate.py

Note on MNIST download: fetch_openml caches downloads under openml_cache/ (ignored by git). The first MNIST run can take a while depending on network speed.
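In code, that cache directory corresponds to fetch_openml's data_home argument. A minimal loader sketch; the scripts' actual arguments may differ:

```python
from sklearn.datasets import fetch_openml

def load_mnist(data_home: str = "openml_cache"):
    """Fetch MNIST from OpenML, caching the download under data_home."""
    mnist = fetch_openml("mnist_784", version=1, as_frame=False,
                         data_home=data_home)
    return mnist.data, mnist.target  # X: (70000, 784), y: string labels '0'..'9'
```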


Outputs (figures)

  • Iris: IRLS/Newton vs Gradient Descent (no bias)
  • MNIST: Gaussian Naive Bayes smoothing sweep
  • MNIST: Generated digits (Gaussian NB)
  • MNIST: Generated digits (Bernoulli NB)


Notes on numerical stability

  • Logistic regression loss uses scipy.special.log_softmax to avoid overflow/underflow.
  • Naive Bayes prediction is computed in log-space and normalized with scipy.special.logsumexp.
  • Gaussian NB variance smoothing improves stability on high-dimensional MNIST features.

Tests (optional)

Run sanity checks:

pytest -q

License

MIT (see LICENSE).
