fast_feature_selection

Genetic algorithms and CMA-ES (covariance matrix adaptation evolution strategy) for efficient feature selection

This is the companion repository for two articles I've published on Medium about efficient feature selection for regression models:

Efficient Feature Selection via CMA-ES (Covariance Matrix Adaptation Evolution Strategy)

Efficient Feature Selection via Genetic Algorithms

The work described in the first article is mentioned (ref. [6]) in the paper cmaes : A Simple yet Practical Python Library for CMA-ES by Nomura M. and Shibata M. (2024) as a noteworthy application of CMA-ES with Margin.

Tech details:

I've used the House Prices dataset from Kaggle. After some processing, the dataset yields 213 features with 1453 observations.

The model used for regression is statsmodels.api.OLS(). The objective function used to select the best features is the Bayesian Information Criterion (BIC); lower is better.

Three feature selection techniques are explored:

Sequential Feature Search (SFS) implemented via the mlxtend library
Genetic Algorithms (GA) implemented via the deap library
Covariance Matrix Adaptation Evolution Strategy (CMA-ES) implemented via the cmaes library

SFS and GA used a multiprocessing pool with 16 workers to run the objective function. CMA-ES used a single process for everything.

Results:

Run time (less is better):

SFS:    42.448 sec
GA:     158.027 sec
CMA-ES: 48.326 sec

Number of times the objective function was invoked (less is better):

SFS:    22791
GA:     600525
CMA-ES: 20000

Objective function best value found (less is better):

baseline BIC: 34570.1662
SFS:          33708.9860
GA:           33705.5696
CMA-ES:       33703.0705
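
To put the numbers above side by side, the short script below derives two quantities from them: the BIC improvement over the baseline and the objective evaluations per second. It uses only the figures reported in this section; the variable names are made up for the example.

```python
# Figures copied from the results above.
baseline_bic = 34570.1662
results = {
    "SFS": {"seconds": 42.448, "evals": 22791, "bic": 33708.9860},
    "GA": {"seconds": 158.027, "evals": 600525, "bic": 33705.5696},
    "CMA-ES": {"seconds": 48.326, "evals": 20000, "bic": 33703.0705},
}

improvements = {}
throughput = {}
for name, r in results.items():
    # How far below the baseline BIC each method ended up.
    improvements[name] = baseline_bic - r["bic"]
    # Wall-clock throughput of the objective function.
    throughput[name] = r["evals"] / r["seconds"]
    print(
        f"{name:7s} BIC improvement: {improvements[name]:9.4f}   "
        f"evals/sec: {throughput[name]:8.1f}"
    )
```

Keep in mind that the throughput figures are not directly comparable: SFS and GA ran the objective function on 16 workers, while CMA-ES ran in a single process.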
