GENETIC ALGORITHMS:

Genetic Algorithms (GAs) are optimization and search techniques inspired by the process of
natural selection and genetics in biology. They belong to the family of evolutionary algorithms
and are often used in machine learning (ML) and artificial intelligence (AI) for solving problems
where traditional optimization methods struggle.

“Genetic Algorithms are evolutionary optimization techniques inspired by natural selection. In machine learning, they are used for feature selection, hyperparameter tuning, neural network optimization, model selection, and clustering. They work through population-based search using selection, crossover, and mutation, making them effective for complex optimization problems.”

Although computationally expensive, they are highly effective for solving problems where
traditional optimization methods fail.

How It Works in ML
1. Initialization
• Create a population of random candidate solutions (e.g., different sets of features,
hyperparameters, or model architectures).
2. Evaluation (Fitness Function)
• Evaluate each candidate using a performance measure like model accuracy, error,
or loss.
3. Selection
• Choose candidates with better performance to form the next generation.
4. Crossover (Combination)
• Combine two selected candidates to create new candidate solutions.
• Example: Combine two sets of hyperparameters to try a new combination.
5. Mutation (Variation)
• Randomly change parts of a candidate to explore new solutions.
• Example: Change a learning rate or number of neurons slightly.
6. Replacement / Iteration
• Form a new population and repeat the evaluation-selection-combination process until
a good solution is found.
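The six steps above can be sketched as one short loop. This is a minimal illustration, not a production implementation; the population size, truncation selection, and the OneMax toy problem (maximize the number of 1-bits) are all illustrative assumptions:

```python
import random

def run_ga(init, fitness, crossover, mutate, pop_size=20, generations=50):
    """Minimal generational GA: evaluate, select, recombine, mutate, repeat."""
    # 1. Initialization: a population of random candidates.
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):
        # 2-3. Evaluation and selection: keep the better half as parents.
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        # 4-6. Crossover and mutation build the replacement population.
        population = [
            mutate(crossover(*random.sample(parents, 2)))
            for _ in range(pop_size)
        ]
    return max(population, key=fitness)

# Toy run: maximize the number of 1-bits in a 6-bit list (OneMax).
random.seed(0)
best = run_ga(
    init=lambda: [random.randint(0, 1) for _ in range(6)],
    fitness=sum,
    crossover=lambda a, b: a[:3] + b[3:],
    mutate=lambda c: [1 - g if random.random() < 0.1 else g for g in c],
)
print(best, sum(best))
```

The same loop works for feature selection or hyperparameter tuning by swapping in a different `init`, `fitness`, `crossover`, and `mutate`.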

Applications in Machine Learning


1. Feature Selection
• GA selects the best subset of input features to improve accuracy and reduce
training time.
2. Hyperparameter Optimization
• GA searches for optimal hyperparameters (e.g., learning rate, number of layers,
number of neurons).
3. Neural Network Optimization
• GA can optimize network architecture and weights without using gradient-based
methods.
4. Model Selection
• GA can choose the best model or combination of models for a dataset.
5. Clustering and Unsupervised Learning
• GA can optimize cluster centers or cluster assignments for better groupings.

Advantages
• Works well in large, complex, or non-linear search spaces.
• Does not require gradient information or derivative calculations.
• Can avoid getting stuck in poor solutions (local minima).

Disadvantages
• Requires high computational resources.
• Convergence may be slow.
• No guarantee of the absolute best solution.

Example: Feature Selection with Genetic Algorithm
Problem Statement:
We have a dataset with 6 features: F1, F2, F3, F4, F5, F6.
Goal: Find the best subset of features that gives the highest classification accuracy using a
classifier (like Decision Tree or SVM).
We will solve this using a Genetic Algorithm.

Step 1: Represent Solutions


• Each subset of features is represented as a binary string (chromosome):
• 1 → include feature, 0 → exclude feature.
• Example: 101010 → select F1, F3, F5.
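This encoding can be decoded in one line (a small sketch using the feature names from the problem statement):

```python
FEATURES = ["F1", "F2", "F3", "F4", "F5", "F6"]

def decode(chromosome):
    """Return the feature names whose bit is set to 1."""
    return [name for name, bit in zip(FEATURES, chromosome) if bit == "1"]

print(decode("101010"))  # ['F1', 'F3', 'F5']
```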

Step 2: Initialize Population


• Start with 5 random candidate solutions:

Chromosome   Features Selected
101010       F1, F3, F5
111000       F1, F2, F3
000111       F4, F5, F6
110100       F1, F2, F4
011011       F2, F3, F5, F6

Step 3: Evaluate Fitness


• Train classifier with each subset and calculate accuracy:

Chromosome   Features Selected   Accuracy (%)
101010       F1, F3, F5          78
111000       F1, F2, F3          82
000111       F4, F5, F6          70
110100       F1, F2, F4          80
011011       F2, F3, F5, F6      85
• Fitness function = Accuracy of classifier.

Step 4: Selection
• Select the top-performing solutions for reproduction (higher accuracy):
• Selected Parents: 011011 (85%) and 111000 (82%)

Step 5: Crossover
• Combine selected parents to create offspring:
• Parent1: 011011

• Parent2: 111000

• Perform single-point crossover after the third bit (swap the tails):

Offspring   Features Selected
011000      F2, F3
111011      F1, F2, F3, F5, F6
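The swap shown above is a single-point crossover; a minimal sketch, with the crossover point fixed at position 3 to match the example:

```python
def single_point_crossover(parent1, parent2, point=3):
    """Swap the tails of two bit strings after the crossover point."""
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

print(single_point_crossover("011011", "111000"))  # ('011000', '111011')
```

In practice the crossover point is usually chosen at random for each pair of parents.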
Step 6: Mutation
• Randomly flip one bit in offspring to introduce variation:

Offspring   After Mutation   Features Selected
011000      011010           F2, F3, F5
111011      111111           F1, F2, F3, F4, F5, F6
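The one-bit flip used here can be sketched as follows (which bit flips is random, so outputs vary from run to run):

```python
import random

def mutate(chromosome):
    """Flip one randomly chosen bit of a bit string."""
    i = random.randrange(len(chromosome))
    flipped = "1" if chromosome[i] == "0" else "0"
    return chromosome[:i] + flipped + chromosome[i + 1:]

print(mutate("011000"))  # e.g. '011010' if bit 5 is flipped
```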

Step 7: Form New Population


• New population consists of top parents + offspring:

Chromosome   Features Selected
011011       F2, F3, F5, F6
111000       F1, F2, F3
011010       F2, F3, F5
111111       F1, F2, F3, F4, F5, F6
• Evaluate fitness for new population again:

Chromosome   Accuracy (%)
011011       85
111000       82
011010       80
111111       83

Step 8: Repeat
• Continue selection → crossover → mutation → evaluation for several generations until no
significant improvement in accuracy is observed.

Step 9: Final Result


• After a few generations, GA identifies the best feature subset:
Best Subset: 011011 → Features F2, F3, F5, F6
Maximum Accuracy Achieved: 85%

1. What is Hypothesis Space?


• In machine learning, the hypothesis space is the set of all possible models or solutions that
can be used to solve a problem.
• Example 1: For feature selection with 6 features, all 2^6 = 64 possible feature subsets
are part of the hypothesis space.
• Example 2: For a neural network, all possible combinations of weights and
architectures form the hypothesis space.
• Goal: Find the best hypothesis that gives the highest performance (accuracy, lowest error,
etc.).
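The size of this hypothesis space is easy to verify directly (a quick sketch for the 6-feature case):

```python
from itertools import product

# Each 6-bit string is one hypothesis: one possible feature subset.
hypothesis_space = ["".join(bits) for bits in product("01", repeat=6)]
print(len(hypothesis_space))  # 64
```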

2. How Genetic Algorithms Search the Hypothesis Space


Genetic Algorithms perform hypothesis space search by iteratively improving candidate
solutions:
1. Representation of Hypotheses:
• Each candidate solution is encoded as a chromosome (binary string, real numbers,
etc.).
• Example: [1,0,1,1,0,0] → select features F1, F3, F4.

2. Initial Population:
• GA starts with a random set of candidate solutions (initial population).
3. Fitness Evaluation:
• Each candidate is evaluated using a fitness function (e.g., classifier accuracy, error
rate).
4. Selection:
• The best-performing candidates are selected for the next generation.
5. Crossover (Combination):
• Parts of two candidates are combined to generate new candidates (offspring).
6. Mutation (Variation):
• Small random changes are applied to some candidates to explore new regions of the
hypothesis space.
7. Iteration / Termination:
• Repeat evaluation, selection, crossover, and mutation until a good solution or max
generations is reached.

3. Types of Hypothesis Space Search in GA


Different strategies can be used to explore the hypothesis space efficiently:
1. Blind / Random Search:
• Candidates are randomly generated without considering previous solutions.
• Simple but inefficient for large spaces.
2. Greedy Search:
• Always selects the best candidates to create new solutions.
• Fast convergence but may get stuck in local optima.
3. Steady-State Search:
• Only a few candidates are replaced each generation.
• Maintains diversity but slower exploration.
4. Generational Search:
• Entire population is replaced each generation.
• Explores space faster but risks losing good candidates.
5. Hybrid Search:
• GA is combined with local search (e.g., hill climbing).
• Efficient and can find global optimum more reliably.

4. Example (Feature Selection)


Problem: Select the best subset of 6 features for classification.
• Hypothesis space = all 64 subsets.
• Steps using GA:
1. Randomly generate 5 subsets: [F1,F3,F5], [F2,F4,F6], etc.

2. Evaluate accuracy of each subset.


3. Select top-performing subsets.
4. Apply crossover → new subsets.
5. Apply mutation → explore new subsets.
6. Repeat until the best subset [F2,F3,F5,F6] with highest accuracy is found.

✅ GA efficiently searches the hypothesis space without evaluating all 64 subsets individually.
GENETIC PROGRAMMING:

1. What is Genetic Programming (GP)?


• Genetic Programming is an extension of Genetic Algorithms where the goal is to evolve
computer programs or expressions to solve a problem.
• While GA works on fixed-length representations (like feature subsets or parameters), GP
works on variable-length structures like trees or programs.
• In machine learning, GP is used to automatically generate models, formulas, or rules.
2. Key Idea
• GP treats candidate solutions as programs or expressions instead of simple strings or
numbers.
• These programs are evolved over generations to improve performance based on a fitness
function.
• Example: GP can evolve a symbolic regression model or a decision tree without manually
designing it.

3. How GP Works (Similar to GA)


1. Representation
• Each program is represented as a tree:
• Internal nodes → functions/operators (e.g., +, −, *, /)
• Leaf nodes → inputs or constants (features, numbers)
2. Initial Population
• Generate a set of random programs (trees).
3. Fitness Evaluation
• Evaluate each program based on how well it performs on the task (accuracy, error,
etc.).
4. Selection
• Select the best-performing programs to reproduce.
5. Crossover
• Swap subtrees between two parent programs to create new programs.
6. Mutation
• Randomly change a node or subtree to create variation.
7. Iteration / Termination
• Repeat until a program achieves the desired performance or maximum
generations are reached.

4. Applications in Machine Learning


1. Symbolic Regression → Evolve formulas to fit data.
2. Feature Construction → Automatically create new features for ML models.
3. Classification Rules → Evolve rules for decision-making (like if-then rules).
4. Neural Network Design → Evolve network architectures automatically.

Example: Symbolic Regression using Genetic Programming
Problem:
We have a dataset of points (x,y) and want to find a formula that predicts y from x automatically,
without manually designing it.

Step 1: Represent Programs


• Each candidate formula is represented as a tree:
Example trees:
1. y = x + 2

       +
      / \
     x   2

2. y = x * 3 - 1

         -
        / \
       *   1
      / \
     x   3

3. y = x^2 + 5

         +
        / \
       ^   5
      / \
     x   2
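Trees like these can be represented as nested tuples and evaluated recursively. A minimal sketch; the tuple encoding and operator set are illustrative choices:

```python
def evaluate(tree, x):
    """Recursively evaluate an expression tree for input x."""
    if tree == "x":                    # leaf: the input variable
        return x
    if isinstance(tree, (int, float)):  # leaf: a constant
        return tree
    op, left, right = tree             # internal node: (operator, child, child)
    a, b = evaluate(left, x), evaluate(right, x)
    return {"+": a + b, "-": a - b, "*": a * b, "^": a ** b}[op]

tree = ("+", ("^", "x", 2), 5)         # y = x^2 + 5
print(evaluate(tree, 3))               # 14
```

Crossover then amounts to swapping sub-tuples between two trees, and mutation to replacing a sub-tuple with a new random one.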

Step 2: Initial Population


• Generate random programs like above.

Step 3: Fitness Evaluation


• Calculate how well each program predicts y for given x values.
• Fitness function: Mean Squared Error (MSE) or accuracy.
Example:
Program       Predicted y   MSE   Fitness
y = x + 2     …             5.2   0.19
y = x*3 - 1   …             2.8   0.36
y = x^2 + 5   …             7.5   0.13
• Higher fitness = better solution.
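A common way to turn an error measure into a fitness score is 1/(1 + MSE), so that lower error means higher fitness. That mapping is an assumption here (the table's exact formula is not stated); a small sketch:

```python
def mse(program, data):
    """Mean squared error of a candidate program over (x, y) pairs."""
    return sum((program(x) - y) ** 2 for x, y in data) / len(data)

def fitness(program, data):
    """Map lower error to higher fitness (one common choice)."""
    return 1.0 / (1.0 + mse(program, data))

data = [(0, 1), (1, 3), (2, 5)]            # points sampled from y = 2x + 1
print(fitness(lambda x: 2 * x + 1, data))  # 1.0 (zero error)
```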

Step 4: Selection
• Choose top-performing programs to generate the next generation.

Step 5: Crossover
• Swap subtrees between two programs to create new formulas.
Example:
• Parent1: y = x + 2

• Parent2: y = x * 3 - 1

Crossover (swap the constant leaf 2 of Parent1 with the leaf 3 of Parent2) →
Child1: y = x + 3
Child2: y = x * 2 - 1

Step 6: Mutation
• Randomly modify nodes to explore new possibilities.
Example: Change + 2 → + 4 → New formula: y = x + 4

Step 7: Repeat
• Evaluate fitness → selection → crossover → mutation → next generation.
• Continue until the best program is found.

Step 8: Final Result


• After several generations, GP finds the best formula:
Best Program: y = 2*x + 1

• This program accurately predicts y from x without manually designing the formula.
Genetic Programming: Models of Evaluation and Learning (Detailed Exam Version)
Genetic Programming (GP) is an extension of Genetic Algorithms (GA) where candidate solutions
are programs or expressions. To evolve good programs over generations, GP uses evaluation
models to measure performance and learning models to improve the population.

1. Evaluation Models in GP
Evaluation models determine how good a program is. This is done using a fitness function.

a) Supervised Evaluation
• Used when target outputs are known.
• Fitness is measured by comparing the program’s output with the actual output.
• Examples:
• Symbolic regression → predicted y vs actual y.
• Classification → accuracy, precision, recall, F1-score.
• Fitness Function Examples:
• Mean Squared Error (MSE) for regression.
• Accuracy or F1-score for classification.

b) Unsupervised Evaluation
• Used when no target output is available.
• Fitness is based on internal criteria like:
• Clustering quality.
• Data compression or simplification.
• Example: Evolving programs to group similar data points in clustering.

c) Reward-Based / Reinforcement Evaluation


• Fitness is determined by rewards received from the environment.
• Example: Evolving programs for robot navigation or game playing.
• Higher rewards → higher fitness.

2. Learning Models in GP
Learning models determine how programs improve over generations using GA principles:
a) Selection
• Choose high-fitness programs to reproduce and generate the next generation.
• Methods:
• Tournament Selection: Randomly pick a few programs and select the best.
• Roulette Wheel Selection: Probability proportional to fitness.
• Rank Selection: Programs ranked by fitness; selection probability based on rank.
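Tournament selection, for instance, takes only a few lines (the tournament size k=3 is a typical but illustrative default):

```python
import random

def tournament_select(population, fitness, k=3):
    """Tournament selection: sample k individuals, return the fittest."""
    return max(random.sample(population, k), key=fitness)

# With k equal to the population size, the overall best always wins.
print(tournament_select(list(range(10)), fitness=lambda p: p, k=10))  # 9
```

Smaller tournaments apply weaker selection pressure, which helps preserve diversity.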

b) Crossover (Recombination)
• Combine parts of two programs to create new offspring.
• Example: Swap subtrees in program trees.
• Purpose: Exploit good solutions to create potentially better programs.

c) Mutation
• Randomly change parts of a program to explore new solutions.
• Example: Replace a constant, change an operator (+ to -), or modify a subtree.
• Purpose: Explore new regions of the hypothesis space and maintain diversity.

d) Elitism
• Keep the best programs unchanged in the next generation.
• Ensures that top-performing solutions are not lost during evolution.
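Elitism can be sketched like this (the elite count of 2 is an illustrative assumption):

```python
def next_generation(population, offspring, fitness, elite=2):
    """Carry the `elite` best individuals over unchanged, fill with offspring."""
    best = sorted(population, key=fitness, reverse=True)[:elite]
    return best + offspring[: len(population) - elite]

print(next_generation([1, 5, 3, 2], [0, 0, 0, 0], fitness=lambda p: p))
# [5, 3, 0, 0]
```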

e) Adaptive Learning
• Adjust crossover and mutation rates based on population performance.
• Helps prevent premature convergence and encourages exploration when needed.

3. Example of Evaluation and Learning in GP


Task: Predict y from x (symbolic regression).
1. Initial Programs:
• y = x + 2
• y = x^2 - 1
• y = 2*x + 1
2. Evaluation:
• Compute error (fitness) for each program using Mean Squared Error.
3. Selection:
• Select the top 2 programs with lowest error.
4. Crossover:
• Swap subtrees → new programs like y = 2*x - 1.

5. Mutation:
• Randomly change constants → new program: y = 2*x + 2.

6. Iteration:
• Repeat evaluation → selection → crossover/mutation for several generations.
7. Result:
• Final program: y = 2*x + 1 → best fit for the data.
