
INDIAN INSTITUTE OF INFORMATION TECHNOLOGY, BHOPAL

Assignment 10

Lab Manual: Design and Analysis of Algorithm (IT-215)

Topic: Gene Sequencing Analysis

Branch: Information Technology Section-2

3rd Semester

Submitted by: Aryan Dixit, Scholar no.: 23U03116
Submitted to: Dr. Yatendra Sahu, Assistant Professor, IIIT Bhopal

Title: Gene Sequence Alignment (Needleman-Wunsch Algorithm) using Dynamic Programming

4. Proposed Methodology:
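The project uses the Needleman-Wunsch algorithm, a dynamic programming method
for global pairwise alignment, to align two sequences end to end. The key steps
are data collection, definition of the scoring scheme, construction of the
dynamic programming matrix, and backtracking to recover the optimal alignment.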

4.1 Data Collection:

Genetic data is sourced from public databases such as GenBank or Ensembl,
focusing on nucleotide sequences for gene regions of interest. The chosen
sequences should be homologous, so that the resulting alignments can be
interpreted in terms of evolutionary relationships.

4.2 Alignment Scoring Matrix:

The scoring scheme is foundational to Needleman-Wunsch: scores for matches,
mismatches, and gaps are fixed before the alignment is computed. Matches
receive a positive score, while mismatches and gaps receive penalties. Adjusting
these values changes which alignment is optimal: a harsher gap penalty favours
alignments with fewer gaps and more mismatches, while a milder one allows extra
gaps in order to keep identical residues aligned.
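As a minimal sketch of such a scoring scheme, assuming one match score, one mismatch penalty and one linear gap penalty (the same three parameters taken by the implementation in Section 5.1), the snippet below totals the score of an already aligned pair of rows. `ScoringScheme`, `columnScore` and `scoreAlignment` are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical linear scoring scheme (assumption: one value per event type).
struct ScoringScheme {
    int match;     // added when two residues are identical
    int mismatch;  // added when two residues differ
    int gap;       // added for every '-' column
};

// Score of a single alignment column.
int columnScore(char a, char b, const ScoringScheme &s) {
    if (a == '-' || b == '-') return s.gap;
    return a == b ? s.match : s.mismatch;
}

// Total score of two equal-length aligned rows, with gaps written as '-'.
int scoreAlignment(const std::string &rowA, const std::string &rowB,
                   const ScoringScheme &s) {
    if (rowA.size() != rowB.size())
        throw std::invalid_argument("aligned rows must have equal length");
    int total = 0;
    for (std::size_t k = 0; k < rowA.size(); ++k)
        total += columnScore(rowA[k], rowB[k], s);
    return total;
}
```

With match = 1 and mismatch = -1, the ungapped alignment GATTACA/GCATGCU scores -1 whatever the gap penalty, while the gapped alignment G-ATTACA/GCA-TGCU scores 0 with a gap penalty of -1 but -2 with a gap penalty of -2. Making gaps more expensive therefore flips which of the two alignments scores higher.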

4.3 Dynamic Programming Matrix Construction:

The alignment matrix is filled row by row using a recurrence: each cell takes the
maximum of three candidate scores computed from its diagonal, upper, and left
neighbours, corresponding to a match or mismatch, a gap in the second sequence,
or a gap in the first sequence. Each cell therefore holds the best score
achievable for aligning the corresponding prefixes of the two sequences, and the
bottom-right cell holds the optimal global alignment score for the full sequences.
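Written out explicitly, the recurrence looks as follows. Here F(i, j) is the best score for aligning the first i residues a_1, ..., a_i of one sequence with the first j residues b_1, ..., b_j of the other, s(a, b) is the match or mismatch score, and d is the linear gap penalty added per gap (a negative number such as d = -2, following the add-the-penalty convention of the code in Section 5.1).

```latex
\begin{aligned}
F(0,0) &= 0, \qquad F(i,0) = i\,d, \qquad F(0,j) = j\,d,\\
F(i,j) &= \max\begin{cases}
  F(i-1,\,j-1) + s(a_i, b_j) & \text{match or mismatch}\\
  F(i-1,\,j) + d & \text{gap in the second sequence}\\
  F(i,\,j-1) + d & \text{gap in the first sequence}
\end{cases}
\end{aligned}
```

The optimal global alignment score is F(m, n), where m and n are the lengths of the two sequences.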

4.4 Backtracking for Optimal Path:

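After filling the matrix, the algorithm backtracks from the bottom-right cell to
determine the optimal alignment path. At each cell it moves to a neighbour
(diagonal, up, or left) whose score, plus the corresponding match, mismatch, or
gap score, equals the current cell's value, and it stops at the top-left cell.
The moves taken, read in reverse, give the sequence of matches, mismatches, and
gaps that yields the maximum score.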
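Because several neighbours can tie for the maximum, the optimal path is not always unique. As an illustrative extension (not part of the original method), the hypothetical helper `countOptimalAlignments` below counts co-optimal alignments: for every cell it sums the path counts of each neighbour that attains the maximum, using the same match, mismatch and linear gap scores as Section 5.1.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: number of distinct optimal global alignments
// (co-optimal traceback paths) under a linear gap penalty.
long long countOptimalAlignments(const std::string &a, const std::string &b,
                                 int match, int mismatch, int gap) {
    const std::size_t m = a.size(), n = b.size();
    std::vector<std::vector<int>> F(m + 1, std::vector<int>(n + 1, 0));
    std::vector<std::vector<long long>> C(m + 1, std::vector<long long>(n + 1, 0));
    C[0][0] = 1;  // the empty alignment
    // First column and first row: exactly one all-gap path reaches each cell.
    for (std::size_t i = 1; i <= m; ++i) { F[i][0] = static_cast<int>(i) * gap; C[i][0] = 1; }
    for (std::size_t j = 1; j <= n; ++j) { F[0][j] = static_cast<int>(j) * gap; C[0][j] = 1; }
    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            int diag = F[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            int up = F[i - 1][j] + gap;
            int left = F[i][j - 1] + gap;
            int best = std::max({diag, up, left});
            F[i][j] = best;
            // Every predecessor that attains the maximum contributes its paths.
            if (diag == best) C[i][j] += C[i - 1][j - 1];
            if (up == best) C[i][j] += C[i - 1][j];
            if (left == best) C[i][j] += C[i][j - 1];
        }
    }
    return C[m][n];
}
```

For example, aligning AA with A under match = 1, mismatch = -1, gap = -2 gives two co-optimal alignments (AA over A- and AA over -A, both scoring -1); which one a program reports depends only on its tie-breaking order. The count can grow exponentially, so `long long` may overflow for long, repetitive sequences.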

4.5 Validation and Testing:

To evaluate the alignments, the results are compared with reference alignments
of known homologous sequences. Alignment score, accuracy, and computation time
are measured to assess performance, and the tests are repeated with different
scoring parameters to gauge how sensitive the results are to those choices.
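A minimal sketch of how some of these measurements could be taken, assuming the aligned rows use '-' for gaps. `summarize` counts matching, mismatching and gapped columns and reports identity as matches divided by alignment length (one of several identity definitions in use), and `elapsedMs` times any callable with `std::chrono::steady_clock`. Both names are hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical per-alignment summary.
struct AlignmentStats {
    int matches = 0;
    int mismatches = 0;
    int gaps = 0;          // columns with '-' in either row
    double identity = 0.0; // matches / alignment length (assumed definition)
};

AlignmentStats summarize(const std::string &rowA, const std::string &rowB) {
    if (rowA.size() != rowB.size())
        throw std::invalid_argument("aligned rows must have equal length");
    AlignmentStats s;
    for (std::size_t k = 0; k < rowA.size(); ++k) {
        if (rowA[k] == '-' || rowB[k] == '-') ++s.gaps;
        else if (rowA[k] == rowB[k]) ++s.matches;
        else ++s.mismatches;
    }
    if (!rowA.empty())
        s.identity = static_cast<double>(s.matches) / rowA.size();
    return s;
}

// Wall-clock time of one call, in milliseconds.
template <typename F>
double elapsedMs(F &&f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For G-ATTACA over GCA-TGCU this gives 4 matches, 2 mismatches, 2 gap columns and an identity of 0.5. A single timing is noisy; repeating the call and taking the median is usually more informative.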

Algorithm:

1. Initialization of matrices
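After initialization with a linear gap penalty of -1, the first row and first
column of the score matrix hold 0, -1, -2, -3, and so on: the cost of aligning
each prefix entirely against gaps. In the traceback matrix, the top-left cell is
marked ‘done’, the rest of the first row ‘left’, and the rest of the first
column ‘up’.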

2. Calculate scores to fill score matrix and traceback matrix.

3. Deduce the best alignment from traceback matrix
Traceback begins at the bottom-right cell (the last cell to be filled) and moves
in the direction stored in each cell until the ‘done’ cell is reached.
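Steps 1 and 2 can be sketched as a single function that fills both the score matrix and the traceback matrix. This is a minimal illustration under stated assumptions: the labels ‘done’, ‘diag’, ‘up’ and ‘left’ are stored as a hypothetical `Move` enum, rows index the left sequence and columns the top sequence, and ties are broken in the order diag, up, left (the same order the backtracking code in Section 5.1 checks). `DPMatrices` and `fillMatrices` are likewise hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

enum class Move { Done, Diag, Up, Left };

struct DPMatrices {
    std::vector<std::vector<int>> score;
    std::vector<std::vector<Move>> trace;
};

// Rows index the left sequence, columns the top sequence.
DPMatrices fillMatrices(const std::string &left, const std::string &top,
                        int match, int mismatch, int gap) {
    const std::size_t m = left.size(), n = top.size();
    DPMatrices d{std::vector<std::vector<int>>(m + 1, std::vector<int>(n + 1, 0)),
                 std::vector<std::vector<Move>>(m + 1, std::vector<Move>(n + 1, Move::Done))};
    // Step 1: first column = gaps in the top sequence ('up'),
    //         first row    = gaps in the left sequence ('left'); [0][0] stays 'done'.
    for (std::size_t i = 1; i <= m; ++i) {
        d.score[i][0] = static_cast<int>(i) * gap;
        d.trace[i][0] = Move::Up;
    }
    for (std::size_t j = 1; j <= n; ++j) {
        d.score[0][j] = static_cast<int>(j) * gap;
        d.trace[0][j] = Move::Left;
    }
    // Step 2: fill the remaining cells, recording which neighbour gave the maximum.
    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            int diag = d.score[i - 1][j - 1] + (left[i - 1] == top[j - 1] ? match : mismatch);
            int up = d.score[i - 1][j] + gap;
            int lft = d.score[i][j - 1] + gap;
            int best = std::max({diag, up, lft});
            d.score[i][j] = best;
            // Ties are broken in the order diag, up, left.
            d.trace[i][j] = (best == diag) ? Move::Diag : (best == up) ? Move::Up : Move::Left;
        }
    }
    return d;
}
```

Passing a gap penalty of -1 reproduces the initialization described in step 1, with the first row and column reading 0, -1, -2, and so on.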

How is the best alignment read from the traceback matrix?

A ‘diag’ cell means that the residues of the two sequences at that position are
aligned with each other (a match or a mismatch). ‘up’ means a gap is placed in
the top sequence, which can be read as an insertion; similarly, ‘left’ means a
gap is placed in the left sequence, which can be read as a deletion.
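Reading the alignment off a filled traceback matrix, following the interpretation above, could look like the sketch below. The `Move` enum and `readTraceback` are hypothetical; rows are assumed to index the left sequence and columns the top sequence, and characters are collected from the end of the alignment and reversed once at the end so the walk stays linear in the alignment length.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class Move { Done, Diag, Up, Left };

// Walks a filled traceback matrix from the bottom-right cell to 'done'.
// Returns {aligned top sequence, aligned left sequence}.
std::pair<std::string, std::string> readTraceback(
        const std::string &top, const std::string &left,
        const std::vector<std::vector<Move>> &trace) {
    std::string rowTop, rowLeft;
    std::size_t i = left.size(), j = top.size();
    while (trace.at(i).at(j) != Move::Done) {
        switch (trace[i][j]) {
        case Move::Diag:  // residues aligned (match or mismatch)
            rowTop.push_back(top.at(j - 1));
            rowLeft.push_back(left.at(i - 1));
            --i; --j;
            break;
        case Move::Up:    // gap in the top sequence
            rowTop.push_back('-');
            rowLeft.push_back(left.at(i - 1));
            --i;
            break;
        case Move::Left:  // gap in the left sequence
            rowTop.push_back(top.at(j - 1));
            rowLeft.push_back('-');
            --j;
            break;
        case Move::Done:  // unreachable: the loop condition excludes it
            break;
        }
    }
    // Characters were collected end-to-start, so reverse once.
    std::reverse(rowTop.begin(), rowTop.end());
    std::reverse(rowLeft.begin(), rowLeft.end());
    return {rowTop, rowLeft};
}
```

For top sequence AC and left sequence A, a matrix whose bottom-right cell holds ‘left’ and whose cell (1, 1) holds ‘diag’ yields AC aligned over A-. Using `.at()` makes a malformed matrix throw `std::out_of_range` instead of reading out of bounds.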

This is the optimal alignment derived using Needleman-Wunsch algorithm.

5. Implementation and Results Analysis:

5.1 Code Implementation:

#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

using namespace std;

// Performs Needleman-Wunsch global alignment and returns the alignment
// score together with the two aligned sequences ('-' marks a gap).
pair<int, pair<string, string>> needlemanWunsch(const string &seqA, const string &seqB,
                                                int match, int mismatch, int gap)
{
    int m = static_cast<int>(seqA.size());
    int n = static_cast<int>(seqB.size());

    // Create the (m+1) x (n+1) scoring matrix
    vector<vector<int>> scoreMatrix(m + 1, vector<int>(n + 1, 0));

    // Initialize the first row and column with cumulative gap penalties
    for (int i = 0; i <= m; ++i)
        scoreMatrix[i][0] = i * gap;
    for (int j = 0; j <= n; ++j)
        scoreMatrix[0][j] = j * gap;

    // Fill the score matrix: each cell takes the best of the diagonal,
    // up and left moves
    for (int i = 1; i <= m; ++i)
    {
        for (int j = 1; j <= n; ++j)
        {
            int matchScore = scoreMatrix[i - 1][j - 1] +
                             (seqA[i - 1] == seqB[j - 1] ? match : mismatch);
            int deleteScore = scoreMatrix[i - 1][j] + gap; // gap in seqB
            int insertScore = scoreMatrix[i][j - 1] + gap; // gap in seqA
            scoreMatrix[i][j] = max({matchScore, deleteScore, insertScore});
        }
    }

    // Alignment score is in the bottom-right cell
    int alignmentScore = scoreMatrix[m][n];

    // Backtrack from the bottom-right cell to recover one optimal alignment.
    // Characters are appended and both strings are reversed once at the end,
    // which keeps backtracking linear in the alignment length.
    string alignedA, alignedB;
    int i = m, j = n;

    while (i > 0 || j > 0)
    {
        if (i > 0 && j > 0 &&
            scoreMatrix[i][j] == scoreMatrix[i - 1][j - 1] +
                                     (seqA[i - 1] == seqB[j - 1] ? match : mismatch))
        {
            // Diagonal move: residues aligned (match or mismatch)
            alignedA += seqA[i - 1];
            alignedB += seqB[j - 1];
            --i;
            --j;
        }
        else if (i > 0 && scoreMatrix[i][j] == scoreMatrix[i - 1][j] + gap)
        {
            // Up move: gap in seqB
            alignedA += seqA[i - 1];
            alignedB += '-';
            --i;
        }
        else
        {
            // Left move: gap in seqA
            alignedA += '-';
            alignedB += seqB[j - 1];
            --j;
        }
    }

    reverse(alignedA.begin(), alignedA.end());
    reverse(alignedB.begin(), alignedB.end());

    return {alignmentScore, {alignedA, alignedB}};
}

int main()
{
    string seqA = "GATTACA";
    string seqB = "GCATGCU";
    int match = 1;
    int mismatch = -1;
    int gap = -2;

    // Call the Needleman-Wunsch function
    auto result = needlemanWunsch(seqA, seqB, match, mismatch, gap);

    // Output similarity score and aligned sequences
    cout << "Similarity Score: " << result.first << endl;
    cout << "Aligned Sequences:\n";
    cout << result.second.first << "\n";
    cout << result.second.second << endl;

    return 0;
}

Output: [program output screenshot]
Time Complexity: O(m×n), where m and n are the lengths of seqA and seqB.
Matrix initialization: O(m+n)
Matrix filling: O(m×n), since each of the (m+1)(n+1) cells is computed once in constant time
Backtracking: O(m+n) steps, since each step moves at least one row or one column closer to the top-left cell
Total Time Complexity = O(m×n)

Space Complexity: O(m×n), dominated by the score matrix.
Full scoring matrix: O(m×n)
Aligned sequences: O(m+n)
Total Space Complexity = O(m×n). Keeping only two rows of the matrix reduces this
to O(n), but only when the alignment score alone is needed; recovering the
alignment itself in linear space requires a divide-and-conquer approach such as
Hirschberg's algorithm.
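The row optimization mentioned above can be sketched as follows: only the previous and current rows are kept, so extra memory is O(n) while time stays O(m×n). This minimal sketch (`nwScoreLinearSpace` is a hypothetical name) uses the same scoring convention as Section 5.1 and returns the score only, because it discards the information needed to backtrack.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Needleman-Wunsch score only, keeping two rows of the matrix:
// O(n) extra memory instead of O(m*n). The alignment itself is NOT recovered.
int nwScoreLinearSpace(const std::string &a, const std::string &b,
                       int match, int mismatch, int gap) {
    const std::size_t n = b.size();
    std::vector<int> prev(n + 1), curr(n + 1);
    for (std::size_t j = 0; j <= n; ++j)
        prev[j] = static_cast<int>(j) * gap;          // row 0
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = static_cast<int>(i) * gap;          // column 0 of row i
        for (std::size_t j = 1; j <= n; ++j) {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            curr[j] = std::max({diag, prev[j] + gap, curr[j - 1] + gap});
        }
        std::swap(prev, curr);  // row i becomes the "previous" row
    }
    return prev[n];  // after the last swap, prev holds row m
}
```

With the parameters used in `main` in Section 5.1 (match = 1, mismatch = -1, gap = -2), `nwScoreLinearSpace("GATTACA", "GCATGCU", 1, -1, -2)` returns -1, the same value as the bottom-right cell of the full matrix.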

5.2 Challenges Encountered:

• Large Sequence Alignment: The alignment matrix grows with the product of the
sequence lengths, so memory becomes the bottleneck for long sequences and
requires careful management to maintain computational efficiency. Techniques
such as sparse matrices or sliding-window implementations were considered for
handling larger data.
• Scoring System Calibration: Choosing biologically meaningful scores requires
fine-tuning so that they reflect evolutionary relationships accurately. This
was approached through empirical testing and comparison with known
benchmarks.

5.3 Result Analysis:

• Visualization of Aligned Sequences: Aligned sequences are displayed with
matches, mismatches, and gaps highlighted, giving a clear view of conserved
regions and genetic divergence.
• Accuracy Metrics: Alignment accuracy is quantified by comparing the results
against benchmark reference alignments, using metrics such as sensitivity and
specificity. These show how well the algorithm distinguishes true matches from
evolutionary gaps, which is essential for phylogenetic studies.
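The display and accuracy ideas above can be made concrete with a short sketch. `midline` builds the middle row commonly printed between two aligned sequences ('|' for a match, '.' for a mismatch, blank for a gap). For accuracy, one possible definition, assumed here, treats each column that aligns two residues as a residue pair and compares pair sets: `pairSensitivity` is the fraction of a reference alignment's pairs that the test alignment reproduces, and `pairPrecision` is the fraction of the test alignment's pairs found in the reference. Precision is only a stand-in for classical specificity, which would also need a count of true negatives. All function names are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Middle line for display: '|' = match, '.' = mismatch, ' ' = gap column.
std::string midline(const std::string &rowA, const std::string &rowB) {
    if (rowA.size() != rowB.size())
        throw std::invalid_argument("aligned rows must have equal length");
    std::string mid(rowA.size(), ' ');
    for (std::size_t k = 0; k < rowA.size(); ++k) {
        if (rowA[k] == '-' || rowB[k] == '-') continue;  // gap column stays blank
        mid[k] = (rowA[k] == rowB[k]) ? '|' : '.';
    }
    return mid;
}

// Residue pairs (position in A, position in B) placed in the same column.
std::set<std::pair<int, int>> alignedPairs(const std::string &rowA, const std::string &rowB) {
    std::set<std::pair<int, int>> pairs;
    int ia = 0, ib = 0;  // positions in the ungapped sequences
    for (std::size_t k = 0; k < rowA.size() && k < rowB.size(); ++k) {
        const bool gapA = rowA[k] == '-', gapB = rowB[k] == '-';
        if (!gapA && !gapB) pairs.insert({ia, ib});
        if (!gapA) ++ia;
        if (!gapB) ++ib;
    }
    return pairs;
}

// Number of pairs in x that also occur in y.
std::size_t sharedPairs(const std::set<std::pair<int, int>> &x,
                        const std::set<std::pair<int, int>> &y) {
    std::size_t shared = 0;
    for (const auto &p : x) shared += y.count(p);
    return shared;
}

// Fraction of the reference pairs that the test alignment reproduces.
double pairSensitivity(const std::string &refA, const std::string &refB,
                       const std::string &testA, const std::string &testB) {
    const auto ref = alignedPairs(refA, refB), test = alignedPairs(testA, testB);
    return ref.empty() ? 0.0 : static_cast<double>(sharedPairs(ref, test)) / ref.size();
}

// Fraction of the test alignment's pairs that also appear in the reference.
double pairPrecision(const std::string &refA, const std::string &refB,
                     const std::string &testA, const std::string &testB) {
    const auto ref = alignedPairs(refA, refB), test = alignedPairs(testA, testB);
    return test.empty() ? 0.0 : static_cast<double>(sharedPairs(test, ref)) / test.size();
}
```

Taking G-ATTACA over GCA-TGCU as an illustrative reference, the ungapped alignment GATTACA over GCATGCU reproduces 5 of its 6 residue pairs (sensitivity 5/6), and 5 of its own 7 pairs appear in the reference (precision 5/7).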
6. Conclusion and Future Work:
The project successfully applies the Needleman-Wunsch algorithm for global
sequence alignment, producing accurate and biologically meaningful
alignments. The results showcase the algorithm’s potential in revealing
conserved genetic regions, essential for studies in comparative genomics and
evolutionary biology. Future work could include:
• Multi-sequence Alignment Extension: Expanding the algorithm to align
multiple sequences, beneficial for evolutionary studies across multiple
species.
• Incorporating Machine Learning: Using machine learning to improve
scoring by learning from known sequence alignments, adapting scoring
dynamically for different organisms or sequence types.
• Graphical Interface Development: A GUI could improve accessibility,
allowing users to adjust scoring parameters and visualize alignments in
real time.
• Integration with High-Performance Computing (HPC): Running the
algorithm on parallel computing frameworks to handle genome-wide
alignments efficiently.

7. References:
• Needleman, S. B., & Wunsch, C. D. (1970). "A general method applicable to the
search for similarities in the amino acid sequence of two proteins." Journal of
Molecular Biology, 48(3), 443-453.
• Smith, T. F., & Waterman, M. S. (1981). "Identification of common molecular
subsequences." Journal of Molecular Biology, 147(1), 195-197.
• Altschul, S. F., et al. (1990). "Basic local alignment search tool." Journal
of Molecular Biology, 215(3), 403-410.
• Durbin, R., Eddy, S. R., Krogh, A., & Mitchison, G. (1998). Biological
Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids.
Cambridge University Press.
• Mount, D. W. (2004). Bioinformatics: Sequence and Genome Analysis (2nd ed.).
Cold Spring Harbor Laboratory Press.
• Gusfield, D. (1997). Algorithms on Strings, Trees, and Sequences: Computer
Science and Computational Biology. Cambridge University Press.
• Waterman, M. S. (1995). Introduction to Computational Biology: Maps,
Sequences, and Genomes. CRC Press.
• Pearson, W. R., & Lipman, D. J. (1988). "Improved tools for biological
sequence comparison." Proceedings of the National Academy of Sciences, 85(8),
2444-2448.
• Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z.,
Miller, W., & Lipman, D. J. (1997). "Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs." Nucleic Acids Research,
25(17), 3389-3402.
