THE HUMAN GENOME PROJECT
Bijay Kumar Gupta
[Link]. Biochemistry & Molecular Biology
4th Semester, KUSMS
CONTENTS
Introduction
Goal of human genome project
Techniques
Conclusion
INTRODUCTION
The Human Genome Project (HGP) was a global effort to
sequence the entire human genome.
Official timeline: 1990–2003.
HISTORICAL BACKGROUND
Milestones leading to HGP:
1953: Discovery of DNA structure by Watson & Crick.
1977: Development of Sanger sequencing.
1980s: Small-scale genome mapping efforts.
ORGANIZATIONS INVOLVED
It was a collaborative effort between the U.S. Department of Energy (DOE)
and the National Institutes of Health (NIH).
Major international partners included:
The United Kingdom
Japan
France
Germany
China
GOALS OF THE HUMAN GENOME PROJECT
Optimization of the data analysis.
Sequencing the entire genome.
Identification of all the genes in the human genome.
Creating genome sequence databases to store the data.
Addressing the legal, ethical and social issues that the
project may raise.
THE PROCESS OF THE HUMAN GENOME PROJECT
The whole DNA of the cell is isolated and randomly broken into fragments.
The fragments are inserted into special vectors such as BACs (Bacterial Artificial Chromosomes) and YACs (Yeast
Artificial Chromosomes).
The vectors are then cloned in suitable hosts such as bacteria and yeast.
The polymerase chain reaction (PCR) is used to make copies of the DNA fragments.
The fragments are sequenced using Sanger sequencing.
The sequences are then arranged based on the overlapping
regions.
The sequences are then annotated and assigned to different
chromosomes.
Genetic and physical maps are also made with the help of
polymorphisms in microsatellites and restriction endonuclease
recognition sites.
TECHNIQUES INVOLVED
1. Collection and Preparation of DNA Samples
Source of DNA: The DNA for the HGP was extracted from a
small number of anonymous donors. Blood samples (from female
donors) and sperm samples (from male donors) were the primary sources.
Isolation: DNA was isolated from cells using methods such as
lysis of the cells, removal of proteins using detergents or
proteases, and precipitation of DNA using ethanol.
2. CUTTING DNA INTO SMALLER FRAGMENTS
Restriction Enzymes: DNA was cut into
smaller fragments using restriction
enzymes that recognize specific sequences.
Sonication: Alternatively, high-frequency
sound waves were used to shear DNA into
random fragments.
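As an illustration of sequence-specific cutting, the sketch below digests a short made-up DNA string at the EcoRI recognition site (GAATTC, cut between G and A). The function name `digest`, its parameters, and the example sequence are assumptions chosen for this example; only one strand is modeled.

```python
import re

def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut `sequence` at every occurrence of the recognition `site`.

    `cut_offset` is the position inside the site where the strand is cut
    (EcoRI cuts G^AATTC, so the offset is 1).
    """
    fragments = []
    start = 0
    # A lookahead match finds every site occurrence, including overlapping ones.
    for match in re.finditer(f"(?={re.escape(site)})", sequence):
        cut = match.start() + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments

# Hypothetical 22-base sequence containing two EcoRI sites.
dna = "ATGGAATTCCGTAGAATTCTTA"
fragments = digest(dna)
print(fragments)
```

Because the enzyme always cuts at the same motif, the same input always yields the same fragments, unlike sonication, which shears at random positions.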
3. CLONING OF DNA FRAGMENTS
Vector-Based Cloning: DNA fragments were inserted into vectors
(plasmids, BACs, or YACs) to create a "DNA library."
Bacterial Artificial Chromosomes (BACs): Large DNA fragments
(100–300 kb) were cloned into BACs.
Yeast Artificial Chromosomes (YACs): Larger DNA fragments
(up to 1 Mb) were cloned into YACs.
Transformation: The vectors carrying the DNA fragments were introduced
into host cells (bacteria or yeast), which multiplied to produce many copies.
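To give a feel for how large such a DNA library has to be, the sketch below applies the Clarke-Carbon estimate, N = ln(1 - P) / ln(1 - f), where f is the insert size divided by the genome size and P is the desired probability that any given sequence is present in the library. The genome size of 3164.7 million bases, the 150 kb BAC insert, and the 99% target are assumptions for illustration, not figures reported for the HGP libraries.

```python
import math

def clones_needed(genome_bp: float, insert_bp: float, probability: float = 0.99) -> int:
    """Clarke-Carbon estimate of the number of clones in a library.

    f is the fraction of the genome carried by one clone; the result is the
    clone count at which a given sequence is present with `probability`.
    """
    f = insert_bp / genome_bp
    # log1p(-f) computes ln(1 - f) accurately when f is very small.
    return math.ceil(math.log(1 - probability) / math.log1p(-f))

GENOME_BP = 3164.7e6                                 # assumed genome size
bac_clones = clones_needed(GENOME_BP, 150_000)       # assumed 150 kb BAC insert
yac_clones = clones_needed(GENOME_BP, 1_000_000)     # 1 Mb YAC insert
print(bac_clones, yac_clones)
```

Larger inserts need fewer clones for the same coverage, which is one reason large-insert vectors were attractive for mapping a genome of this size.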
4. SEQUENCING DNA FRAGMENTS
Two primary sequencing approaches were used:
a. Sanger Sequencing
Chain Termination Method: DNA synthesis was terminated using
dideoxynucleotides (ddNTPs), each labeled with a fluorescent dye.
Gel Electrophoresis: The terminated DNA fragments were separated
by size using capillary electrophoresis.
Fluorescent Detection: A laser detected the fluorescently labeled
fragments, and the dye color identified the terminal base of each.
SANGER SEQUENCING
Sanger sequencing, also known as the “chain termination
method,” was developed by the English biochemist Frederick
Sanger and his colleagues in 1977.
It is designed for determining the sequence of nucleotide bases in
a piece of DNA (commonly less than 1,000 bp in length).
Sanger sequencing with 99.99% base accuracy is considered the
“gold standard” for validating DNA sequences, including those
already sequenced through next-generation sequencing (NGS).
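The logic of the chain termination method can be sketched in a few lines: every terminated fragment ends in a labeled base, so ordering fragments by length and reading each terminal base recovers the sequence. This is a simplified toy model, not a simulation of real chemistry; the function names and the 8-base strand are made up for this example.

```python
import random

def chain_terminated_fragments(synthesized: str) -> list[str]:
    """Every prefix of the synthesized strand, as if a ddNTP had stopped
    extension once at each possible position."""
    return [synthesized[:i] for i in range(1, len(synthesized) + 1)]

def read_sequence(fragments: list[str]) -> str:
    """Order fragments by length (the role of electrophoresis) and read
    the dye-labeled terminal base of each one."""
    return "".join(frag[-1] for frag in sorted(fragments, key=len))

strand = "ATGCCGTA"                        # hypothetical synthesized strand
fragments = chain_terminated_fragments(strand)
random.Random(0).shuffle(fragments)        # fragments start out as a mixture
recovered = read_sequence(fragments)
print(recovered)
```

In this model the read length is limited only by the input string; in practice, separation by size loses resolution for long fragments, which is why reads are commonly under 1,000 bp.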
SANGER SEQUENCING VS NGS
The development of NGS technologies has accelerated genomics
research.
NGS can simultaneously sequence more than 100 genes and
whole genomes with low-input DNA.
Sanger sequencing remains widely used in the sequencing field
as it offers several prominent advantages:
cost-efficiency for sequencing single genes
99.99% accuracy, especially suitable for verification
Source: “Whole-genome sequencing” by OpenStax College, Biology
b. Shotgun Sequencing
Random Fragmentation: DNA was
fragmented randomly and sequenced.
Computational Assembly: Overlapping
sequences were identified using software
to reconstruct the complete genome
sequence.
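The computational assembly step can be illustrated with a greedy overlap merge: repeatedly join the two reads with the longest suffix-to-prefix overlap until none remain. This is a minimal sketch for intuition, not the assembler used by the HGP; the reads, the minimum overlap of 3, and the function names are assumptions, and real assemblers must also handle sequencing errors, both strands, and repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        # Drop any sequence fully contained in another one.
        contigs = [c for c in contigs
                   if not any(c != other and c in other for other in contigs)]
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

# Hypothetical reads drawn from an 18-base sequence, given out of order.
reads = ["TCGGATAC", "AGCTTAGG", "AGGCATCG"]
contigs = greedy_assemble(reads)
print(contigs)
```

The greedy choice works on this repeat-free toy input; repetitive DNA can produce equally good but wrong overlaps, which is why repeats are a known difficulty for assembly.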
5. PHYSICAL MAPPING
Chromosome Mapping: DNA fragments were ordered and
aligned along chromosomes using:
Restriction Mapping: Based on the patterns of DNA
cleavage by restriction enzymes.
Fluorescent In Situ Hybridization (FISH): Marked DNA
probes hybridized to chromosomes to determine fragment
positions.
6. COMPUTATIONAL ANALYSIS (BIOINFORMATICS)
Sequence Assembly: Overlapping sequences were assembled to
reconstruct the genome.
Error Correction: Algorithms detected and corrected sequencing
errors.
Annotation: Software identified genes, regulatory regions, and
repetitive sequences.
Data Storage and Sharing: Data was stored in databases like
GenBank and shared globally for analysis.
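One simple way to picture the error correction step is a majority vote across overlapping reads that cover the same position. The sketch below assumes the reads are already aligned and padded with "-" where a read does not cover a column; the function name and the three reads are hypothetical, and real pipelines also weigh per-base quality scores.

```python
from collections import Counter

def consensus(aligned_reads: list[str]) -> str:
    """Majority base per column; '-' marks positions a read does not cover."""
    length = max(len(read) for read in aligned_reads)
    result = []
    for col in range(length):
        column = [read[col] for read in aligned_reads
                  if col < len(read) and read[col] != "-"]
        # 'N' marks a column that no read covers.
        result.append(Counter(column).most_common(1)[0][0] if column else "N")
    return "".join(result)

# Hypothetical aligned reads; the second and third each carry one error.
aligned = [
    "ATGCCGTA--",
    "--GCCGAAGC",
    "ATGACGTAGC",
]
corrected = consensus(aligned)
print(corrected)
```

Each isolated error is outvoted by the other reads covering that column, which is why sequencing every region several times improves accuracy.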
SALIENT FEATURES OF THE HUMAN GENOME PROJECT
The human genome is made up of 3164.7 million
nucleotides.
The average gene is about 3,000 base pairs long. The largest
known human gene is dystrophin, on the X chromosome, whose
mutations cause Duchenne muscular dystrophy; it spans
2.4 million base pairs (2,400 kb). The genes for β-globin and
insulin are less than 10 kilobases long.
Initial analysis suggested that the human genome contains
approximately 30,000 genes, a figure later revised to
20,000–25,000. Earlier estimates had been 80,000 to 100,000
genes. The number of genes in humans is roughly equal to that
of mice.
The functions of more than half of the discovered genes are
unknown.
Less than 2% of the genome codes for proteins.
Approximately 1 million copies of short 5–8 base pair repeated
sequences are clustered around centromeres and near the ends of
chromosomes. They are often described as junk DNA.
Chromosome 1 has the most genes (2968) and the Y chromosome has
the fewest (231).
In humans, there are approximately 1.4 million locations where
single-base DNA differences, known as single nucleotide
polymorphisms (SNPs), occur.
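The SNP count is easier to interpret as a density. The short calculation below divides the genome size quoted above (3164.7 million nucleotides) by the 1.4 million SNP locations; the variable names are made up for this example, and the result is an average that says nothing about how evenly SNPs are distributed.

```python
GENOME_SIZE_NT = 3164.7e6   # genome size quoted in this presentation
SNP_SITES = 1.4e6           # SNP locations quoted in this presentation

bases_per_snp = GENOME_SIZE_NT / SNP_SITES   # average spacing between SNPs
snps_per_kb = 1000 / bases_per_snp           # average SNPs per kilobase

print(f"About one SNP every {bases_per_snp:,.0f} bases")
print(f"About {snps_per_kb:.2f} SNPs per kilobase")
```

On these figures, an average gene of about 3,000 base pairs would be expected to contain roughly one such site.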
CHALLENGES
Handling vast amounts of data.
Assembling repetitive DNA sequences.
Ensuring accuracy and completeness.
ACHIEVEMENTS
Sequencing of 3 billion base pairs.
Identification of approximately 20,000–25,000 human genes.
Discovery of numerous single nucleotide polymorphisms
(SNPs).
REFERENCES
Sikkema-Raddatz, B., et al. (2013). Targeted next-generation sequencing
can replace Sanger sequencing in clinical diagnostics. Human
Mutation 34(7): 1035–1042.
Sanger, F.; Coulson, A. R. (1975). A rapid method for determining
sequences in DNA by primed synthesis with DNA polymerase. J.
Mol. Biol. 94(3): 441–448.
Thank you