28/01/2025
IPC ASSIGNMENT 2
Problem Statement
Title: Parallel Processing of Genome Sequence Alignment Using OpenMP
Description: Genome alignment is a critical computational task in bioinformatics used to
compare DNA sequences, identify similarities, and detect mutations or evolutionary
relationships. Given the growing availability of genomic data from sequencing projects,
analyzing large genome datasets (e.g., multi-gigabyte FASTA files) is computationally expensive
and time-consuming.
Your task is to design and implement a parallel solution using OpenMP to perform
pairwise sequence alignment on genome data stored in FASTA files. The alignment should
compute a similarity score for each pair of sequences using a scoring scheme (e.g.,
match/mismatch penalties and gap penalties).
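As one possible starting point, the scoring scheme described above can be sketched as a Needleman-Wunsch style global alignment score with a linear gap penalty. The function name nw_score and the values MATCH = +1, MISMATCH = -1, GAP = -2 are assumptions chosen for illustration; the assignment leaves the algorithm and the penalties to you.

```c
#include <limits.h>
#include <stdlib.h>

/* Assumed scoring values (hypothetical, not specified by the assignment). */
enum { MATCH = 1, MISMATCH = -1, GAP = -2 };

static long max3(long a, long b, long c) {
    long m = a > b ? a : b;
    return m > c ? m : c;
}

/* Global alignment score of a[0..la) and b[0..lb) with a linear gap penalty.
   Only two rows of the DP table are kept, so memory is O(lb) while time
   stays O(la * lb). Returns LONG_MIN if memory allocation fails. */
long nw_score(const char *a, size_t la, const char *b, size_t lb) {
    long *prev = malloc((lb + 1) * sizeof *prev);
    long *curr = malloc((lb + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return LONG_MIN;
    }
    /* Row 0: aligning an empty prefix of a against b costs one gap per base. */
    for (size_t j = 0; j <= lb; ++j)
        prev[j] = (long)j * GAP;
    for (size_t i = 1; i <= la; ++i) {
        curr[0] = (long)i * GAP;
        for (size_t j = 1; j <= lb; ++j) {
            long diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? MATCH : MISMATCH);
            curr[j] = max3(diag, prev[j] + GAP, curr[j - 1] + GAP);
        }
        long *tmp = prev;   /* the finished row becomes the previous row */
        prev = curr;
        curr = tmp;
    }
    long result = prev[lb];
    free(prev);
    free(curr);
    return result;
}
```

Note that the full quadratic recurrence needs roughly 10^12 cell updates for a pair of 1 MB sequences and 10^14 for a pair of 10 MB sequences, so a banded or otherwise restricted variant may be worth considering for the larger inputs.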
Objectives:
1. Input: A FASTA file containing genome sequences, each ranging from 1 MB to 10 MB
in size.
2. Output: A similarity matrix containing scores for all pairwise sequence alignments.
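A first step towards the input objective could be a single pass over the FASTA file that records how long each sequence is, which helps with sizing buffers and the similarity matrix. The sketch below relies only on the basic FASTA convention that a header line starts with '>' and the following lines hold the sequence; the function name fasta_record_lengths and its parameters are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

/* Scan a FASTA stream and store the number of sequence characters of each
   record in lengths[]. Returns the number of records found. If there are
   more than max_records records, only the first max_records lengths are
   stored, but all records are still counted. */
size_t fasta_record_lengths(FILE *fp, size_t *lengths, size_t max_records) {
    size_t count = 0;
    int c;
    int at_line_start = 1;
    int in_header = 0;
    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {
            at_line_start = 1;
            in_header = 0;
            continue;
        }
        if (at_line_start && c == '>') {
            /* A new record begins; its header text is skipped. */
            in_header = 1;
            if (count < max_records)
                lengths[count] = 0;
            ++count;
        } else if (!in_header && count > 0 && !isspace(c)) {
            /* Sequence character (whitespace such as '\r' is ignored). */
            if (count <= max_records)
                ++lengths[count - 1];
        }
        at_line_start = 0;
    }
    return count;
}
```

With N records found, the output similarity matrix has N x N entries, of which only the N(N+1)/2 entries on or above the diagonal need to be computed if the score is symmetric.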
Constraints:
1. The scoring scheme for alignment may be based on any relevant algorithm of your choice.
2. The program should utilize OpenMP to parallelize the computation across multiple cores.
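One way to spread the pairwise computation across cores is to parallelize the outer loop over sequences, as sketched below. The scorer pair_score is a deliberately simple stand-in (it only counts identical characters at the same position) so that the example stays self-contained; it is an assumption for illustration and should be replaced by your actual alignment routine. The name similarity_matrix is likewise hypothetical.

```c
#include <stddef.h>

/* Stand-in scorer (hypothetical): counts positions where both sequences
   hold the same character, up to the end of the shorter one. */
static long pair_score(const char *a, const char *b) {
    long score = 0;
    for (size_t k = 0; a[k] != '\0' && b[k] != '\0'; ++k)
        if (a[k] == b[k])
            ++score;
    return score;
}

/* Fill the symmetric n x n matrix (row-major) with all pairwise scores. */
void similarity_matrix(const char *const *seqs, int n, long *matrix) {
    /* Iteration i computes row i of the upper triangle, so later rows are
       shorter and sequences may differ in length; schedule(dynamic) lets
       idle threads pick up the remaining rows. */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        for (int j = i; j < n; ++j) {
            long s = pair_score(seqs[i], seqs[j]);
            matrix[(size_t)i * n + j] = s;
            matrix[(size_t)j * n + i] = s;  /* mirror; only iteration i writes it */
        }
    }
}
```

Each matrix entry is written by exactly one iteration of the outer loop, so no critical or atomic directive is needed here. Compiled without OpenMP support, the pragma is ignored and the same code runs serially, which is convenient for the single-threaded baseline.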
Deliverables:
1. An OpenMP-enabled C/C++ program that reads FASTA files, performs sequence
alignment, and outputs the similarity matrix to a file.
2. A performance analysis report comparing single-threaded and parallel implementations,
highlighting the speedup achieved using OpenMP.
3. Documentation explaining the approach, the OpenMP directives used, and any
optimization techniques employed.
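For the performance analysis, speedup is commonly reported as S = T_serial / T_parallel and efficiency as E = S / p for p threads. A minimal timing sketch under these definitions follows; the helper names now_seconds, time_call, speedup and efficiency are hypothetical, and the fallback timer is only there so the code also builds without OpenMP.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Seconds since an arbitrary origin. With OpenMP this is wall-clock time
   from omp_get_wtime(); the fallback uses clock(), which measures CPU
   time and is only meaningful for single-threaded runs. */
static double now_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Run work(arg) once and return the elapsed time in seconds. */
double time_call(void (*work)(void *), void *arg) {
    double start = now_seconds();
    work(arg);
    return now_seconds() - start;
}

/* Speedup S = T_serial / T_parallel. */
double speedup(double t_serial, double t_parallel) {
    return t_serial / t_parallel;
}

/* Efficiency E = S / threads. */
double efficiency(double t_serial, double t_parallel, int threads) {
    return speedup(t_serial, t_parallel) / threads;
}
```

Repeating each measurement several times and reporting results for several thread counts generally gives a more trustworthy comparison than a single run.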
Dataset help:
To work on the problem of parallel processing of genome sequence alignment using
OpenMP, you will need access to large datasets of real genomic data (composed of
A, T, G, C) in FASTA format.
A reputable source from which you can download such data:
NCBI Datasets: The National Center for Biotechnology Information (NCBI) provides a
command-line tool to download large genome data packages.
Link: https://www.ncbi.nlm.nih.gov/
When selecting or creating FASTA files for your OpenMP-based genome sequence alignment,
the characteristics of the data should match the specific goals and challenges you want to
address. Prepare the dataset as follows:
a. Genome sequences should share high similarity but still have significant variations.
b. Similar Species Dataset:
o Human (Homo sapiens) and Neanderthal genome sequences.
o Human (Homo sapiens) vs. Chimpanzee (Pan troglodytes), or different strains of
the same bacterium (e.g., Escherichia coli K-12 vs. O157:H7).
c. Dataset Size:
o Small-scale testing: 10–100 sequences of 1–2 MB each.
Make sure the sequences vary in length, as this introduces computational challenges
such as uneven work per sequence pair.
Problem Statement
1. Parallelize a Binary Tree Traversal using tasks.
2. Write separate code for each of the following directives and be prepared to
explain its use in various applications during the viva:
#pragma omp atomic
#pragma omp barrier
#pragma omp critical
#pragma omp flush
#pragma omp for
#pragma omp master
#pragma omp ordered
#pragma omp parallel
#pragma omp parallel for
#pragma omp parallel sections
#pragma omp section
#pragma omp sections
#pragma omp single
#pragma omp task
#pragma omp taskwait
#pragma omp taskyield
#pragma omp task priority(pvalue)
#pragma omp threadprivate
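As an illustration of how several of these directives fit together, the sketch below traverses a binary tree with tasks and returns the sum of all node values. It uses parallel, single, task and taskwait. The Node layout and the function names sum_subtree and tree_sum are assumptions made for this example, not a required design.

```c
#include <stddef.h>

/* Hypothetical node type for illustration. */
typedef struct Node {
    int value;
    struct Node *left;
    struct Node *right;
} Node;

/* Post-order traversal: each child subtree is handled in its own task,
   and taskwait makes the parent wait for both before combining. */
static long sum_subtree(const Node *n) {
    if (n == NULL)
        return 0;
    long left_sum = 0;
    long right_sum = 0;
    #pragma omp task shared(left_sum)
    left_sum = sum_subtree(n->left);
    #pragma omp task shared(right_sum)
    right_sum = sum_subtree(n->right);
    #pragma omp taskwait
    return left_sum + right_sum + n->value;
}

/* Create the thread team once; a single thread starts the traversal and
   the other threads execute the tasks it generates. */
long tree_sum(const Node *root) {
    long total = 0;
    #pragma omp parallel
    {
        #pragma omp single
        total = sum_subtree(root);
    }
    return total;
}
```

Creating one task per node can cost more than the work it performs on large trees, so it may be worth switching to a plain recursive call below some depth. Compiled without OpenMP, the pragmas are ignored and the traversal runs serially with the same result.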