Practical Assignment IV
Module topic: Bioinformatics resources and databases
Session title: DNA sequence analysis
DNA sequence analysis
Introduction
DNA sequences can be extracted from public databases, or users may have their own piece of DNA
that they have sequenced and that requires some basic analysis. In this practical we explore the
features of a DNA sequence extracted from a public database and carry out some basic DNA analysis
to find important features encoded in the sequence. We will identify the features of the genomic
region and its related mRNA and gene sequences, find the open reading frames (ORFs), and try to
find suitable restriction enzyme sites for cloning most of the ORF.
Tools used in this session
For extraction and analysis of the sequence, we will use the following websites:
Primer design: https://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?LINK_LOC=BlastHome
Restriction maps: https://nc3.neb.com/NEBcutter/
Promoter binding transcription factors (PROMO):
https://alggen.lsi.upc.es/cgi-bin/promo_v3/promo/promoinit.cgi?dirDB=TF_8.3
Promoter finder: https://genome.ucsc.edu/util.html
ORF finder: https://www.ncbi.nlm.nih.gov/orffinder/
NCBI: www.ncbi.nlm.nih.gov
Task 1: Finding the DNA Sequence and extracting its features
We will work with the gene CYP2C19 in the NCBI databases. CYP2C19 is a member of the cytochrome
P450 gene family. Enzymes produced from cytochrome P450 genes are involved in the formation and
breakdown (metabolism) of various molecules and chemicals within cells. CYP2C19 is a genetically
polymorphic drug-metabolizing enzyme that plays a role in metabolizing at least 10 percent of
commonly prescribed drugs. Genetic variants of CYP2C19 can significantly influence how patients
metabolize drugs, especially clopidogrel, a widely used antiplatelet drug.
STEP 1: Finding the correct NCBI record.
Start at the NCBI homepage (www.ncbi.nlm.nih.gov) and choose Nucleotide. Answer the following
questions:
1. Run an Entrez search for CYP2C19. How many databases return results?
2. Search for genomic CYP2C19. Which database will you choose?
3. In the "Results by taxon" section, did you obtain any results for the organism Mus musculus?
4. Is a construct sequence the same as a synthetic sequence?
5. Try different page display types and sort orders to retrieve the human genomic RefSeq records
for CYP2C19. How many results do you get? Report your results.
a. What is the difference between NC_ and NG_?
b. Which record will you choose, and why?
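As a reading aid for questions 5a and 5b, the sketch below maps common RefSeq accession prefixes to the kind of molecule they denote. The dictionary and the function name `describe_accession` are our own illustration, not an NCBI tool, and the list covers only the most frequent prefixes.

```python
# Common RefSeq accession prefixes (two letters plus an underscore).
# Hand-written summary for illustration; not an exhaustive list.
REFSEQ_PREFIXES = {
    "NC_": "complete genomic molecule, usually a whole chromosome",
    "NG_": "genomic region, e.g. a gene locus with flanking sequence",
    "NT_": "genomic contig or scaffold",
    "NW_": "genomic contig or scaffold from a whole-genome shotgun assembly",
    "NM_": "curated protein-coding mRNA",
    "NR_": "curated non-coding RNA",
    "NP_": "curated protein",
    "XM_": "predicted (model) mRNA",
    "XR_": "predicted non-coding RNA",
    "XP_": "predicted protein",
}


def describe_accession(accession: str) -> str:
    """Describe the molecule type implied by a RefSeq accession such as 'NG_123456.1'."""
    prefix = accession[:3].upper()
    return REFSEQ_PREFIXES.get(
        prefix, "not a RefSeq prefix listed here (possibly a GenBank/INSDC accession)"
    )


print(describe_accession("NC_000010.11"))  # complete genomic molecule, usually a whole chromosome
```

The prefix tells you the record type before you open it; the part after the dot is the version number of that accession.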
STEP 2: Navigate the graphics in the chosen record
First, get an overview of your sequence in the Graphics view: open the Graphics tab in the
upper-left corner.
1. Take a screenshot and describe the meaning of the colors and arrow directions.
2. How many genes are there, and on which strand is each located?
3. How many mRNAs, proteins, and exons are there?
4. Are there any overlapping genes?
5. Are there any partial sequences? Why?
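Question 4 reduces to checking whether two features' coordinate ranges share at least one base. The following minimal sketch assumes you have copied each gene's start, end, and strand from the Graphics view by hand. The `Feature` type, the function name `overlapping_pairs`, and the coordinates are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    name: str
    start: int   # 1-based, inclusive, as in GenBank coordinates
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlapping_pairs(features: List[Feature]) -> List[Tuple[str, str, str]]:
    """List every pair of features whose ranges share at least one base."""
    ordered = sorted(features, key=lambda f: f.start)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start > a.end:
                break  # sorted by start, so no later feature can overlap `a`
            relation = "same strand" if a.strand == b.strand else "opposite strands"
            pairs.append((a.name, b.name, relation))
    return pairs


# Hypothetical coordinates, for illustration only.
genes = [
    Feature("geneA", 1, 1000, "+"),
    Feature("geneB", 900, 2500, "-"),
    Feature("geneC", 4000, 5200, "+"),
]
print(overlapping_pairs(genes))  # [('geneA', 'geneB', 'opposite strands')]
```

Sorting by start coordinate lets the inner loop stop early, which matters only for long feature lists; for a handful of genes a plain double loop would work just as well.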
STEP 3: Identification of flat file sections
From the header section:
1. What are the type and length of the sequence?
2. What is the version of the sequence, and how can you track changes to the sequence?
3. When was this sequence most recently updated?
4. What does PRI mean in the LOCUS line?
5. What is the status of the RefSeq sequence, and what does it mean?
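The header fields can also be read programmatically. This is a minimal sketch that assumes the modern whitespace-separated LOCUS layout (name, length, `bp`, molecule type, topology, division code, date of last modification). The LOCUS line and accession in the example are made up, only a handful of GenBank division codes are listed, and the function names are our own. For real records, a dedicated GenBank parser is more robust than splitting on whitespace.

```python
# A few GenBank division codes (not the full list).
GENBANK_DIVISIONS = {
    "PRI": "primate sequences",
    "ROD": "rodent sequences",
    "MAM": "other mammalian sequences",
    "VRT": "other vertebrate sequences",
    "INV": "invertebrate sequences",
    "PLN": "plant, fungal and algal sequences",
    "BCT": "bacterial sequences",
    "SYN": "synthetic and chimeric sequences",
    "CON": "constructed (contig) records",
}


def parse_locus(line: str) -> dict:
    """Split a modern, whitespace-separated GenBank LOCUS line into named fields."""
    tokens = line.split()
    if len(tokens) != 8 or tokens[0] != "LOCUS":
        raise ValueError("expected: LOCUS name length bp molecule topology division date")
    name, length, _unit, molecule, topology, division, date = tokens[1:]
    return {
        "name": name,
        "length": int(length),
        "molecule": molecule,
        "topology": topology,
        "division": GENBANK_DIVISIONS.get(division, division),
        "last_modified": date,
    }


def parse_version(line: str) -> tuple:
    """Split a VERSION line into the accession and its integer version number."""
    accession, _, version = line.split()[1].partition(".")
    return accession, int(version)


# Made-up header lines, for illustration only.
example = "LOCUS       EXAMPLE_01     1500 bp    DNA     linear   PRI 01-JAN-2024"
print(parse_locus(example)["division"])        # primate sequences
print(parse_version("VERSION     NG_999999.3"))  # ('NG_999999', 3)
```

The version number after the dot increases each time the sequence itself changes, so comparing versions is one way to track sequence changes.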
From the features section:
1. Name three ways to get all the information about the sequence features.
2. What are the CYP2C19 gene synonyms, Gene ID, transcript ID, and protein ID? Then click the Gene ID
you extracted to open the Gene record.
3. What is the Ensembl ID?
4. What is the location of the gene?
Click the mRNA ID and use Highlight Sequence Features:
5. Is the position of the start codon in the CDS the same as the start of the first exon? Why?
6. To answer "Why?", go to the Graphics tab, open Tools > Sequence Text View, and describe the
window.
7. How many exons and introns does the sequence have?
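Questions 5 and 7 can be cross-checked from the feature locations of a genomic record, where an mRNA or CDS spanning several exons is written as `join(start..end,...)`. The sketch below is hypothetical: the function names and coordinates are our own, it assumes a plus-strand transcript, and it ignores any `complement(...)` wrapper. If the CDS starts later than exon 1, the bases in between belong to the 5' untranslated region (5' UTR).

```python
import re
from typing import List, Tuple


def parse_location(location: str) -> List[Tuple[int, int]]:
    """Extract (start, end) ranges from a GenBank location such as 'join(1..120,300..450)'.

    Partial markers '<' and '>' are dropped, and a complement(...) wrapper is
    ignored, so strand information is lost.
    """
    return [(int(s), int(e)) for s, e in re.findall(r"<?(\d+)\.\.>?(\d+)", location)]


def exon_intron_summary(mrna_location: str, cds_location: str) -> dict:
    """Count exons and introns of one transcript and compare the CDS start with exon 1.

    Assumes a plus-strand transcript whose ranges are listed in ascending order.
    """
    exons = parse_location(mrna_location)
    # Each intron runs from the base after one exon to the base before the next.
    introns = [(end + 1, nxt - 1) for (_, end), (nxt, _) in zip(exons, exons[1:])]
    cds = parse_location(cds_location)
    return {
        "exons": len(exons),
        "introns": len(introns),
        "intron_lengths": [e - s + 1 for s, e in introns],
        "cds_starts_at_first_exon": cds[0][0] == exons[0][0],
    }


# Hypothetical feature locations, for illustration only.
print(exon_intron_summary("join(1..100,201..300,401..500)",
                          "join(51..100,201..300,401..450)"))
# {'exons': 3, 'introns': 2, 'intron_lengths': [100, 100], 'cds_starts_at_first_exon': False}
```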
Task 2: Finding the Open Reading Frame (ORF)
1. Using Graphics: in the Graphics view of the mRNA, click Configure tracks, choose Sequence from the
left bar, tick all active tracks to show the six reading frames, and click Configure.
2. Check your answer using ORF finder: https://www.ncbi.nlm.nih.gov/orffinder/. Try all the result
display options and take screenshots.
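ORF finder performs this scan for you. To illustrate what "six reading frames" means, here is a minimal Python sketch; the function names and the 75-nt default minimum length are our own choices, not ORF finder's. It scans three frames on the given strand and three on its reverse complement, and reports ATG-to-stop stretches using the standard genetic code. It reports only the first ATG of each ORF and skips ORFs that run off the end of the sequence. ORF finder's options (alternative genetic codes, alternative start codons, nested ORFs) are not reproduced.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len_nt: int = 75) -> list:
    """Find ATG...stop ORFs in all six reading frames.

    Returns (strand, frame, start, end, length_nt) tuples, longest first. Coordinates
    are 0-based and end-exclusive on the strand that was scanned; the length includes
    the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start
                    if length >= min_len_nt:
                        orfs.append((strand, frame + 1, start, i + 3, length))
                    start = None  # look for the next ATG in this frame
    return sorted(orfs, key=lambda orf: orf[4], reverse=True)


# A toy sequence, far shorter than any real ORF.
print(find_orfs("ATGAAATAG", min_len_nt=9))  # [('+', 1, 0, 9, 9)]
```

Minus-strand hits are reported in coordinates of the reverse complement, which is also how frames -1 to -3 are usually read.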
Task 3: Identify the promoter and binding transcription factors
Instructions
To find the promoter, go to https://genome.ucsc.edu/util.html and choose Genome Browser. Type the
first two letters of your gene, then select it from the suggested drop-down menu. Choose your gene
from the results page, then choose Genomic Sequence.
1. What is the promoter sequence?
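How the promoter is delimited depends on the tool. A common simplification, assumed here, is a fixed-length window immediately upstream of the transcription start site (TSS). The hypothetical function `upstream_region` sketches that idea, including the strand handling that is easy to get wrong. The 1000-base default is an arbitrary choice for illustration.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def upstream_region(genomic: str, tss: int, strand: str, length: int = 1000) -> str:
    """Return up to `length` bases immediately 5' of a transcription start site.

    `genomic` is the plus-strand sequence and `tss` is the 0-based index of the first
    transcribed base. For a minus-strand gene, upstream lies at higher indices, so the
    slice is reverse-complemented to read 5'->3' along the gene.
    """
    if strand == "+":
        return genomic[max(0, tss - length):tss]
    if strand == "-":
        return genomic[tss + 1:tss + 1 + length].translate(COMPLEMENT)[::-1]
    raise ValueError("strand must be '+' or '-'")


# Toy sequence, for illustration only.
print(upstream_region("AAAAGGGCCATT", 8, "+", 4))  # GGGC
print(upstream_region("AAAAGGGCCATT", 3, "-", 4))  # GCCC
```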
To find the transcription factors that bind the promoter, use the PROMO database:
https://alggen.lsi.upc.es/cgi-bin/promo_v3/promo/promoinit.cgi?dirDB=TF_8.3. Select the search factors
for humans. Then, in the search step, paste the promoter sequence you obtained from UCSC and submit.
1. What is the distribution of the nucleotides over the given chain?
2. Click "data txt" to get a text sheet of all the sites.
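To cross-check the nucleotide distribution reported by PROMO, the minimal sketch below counts A, C, G, and T in a pasted sequence and converts the counts to percentages. The function name is our own, and ambiguity codes such as N are skipped, which may differ from how PROMO treats them.

```python
from collections import Counter


def nucleotide_distribution(seq: str) -> dict:
    """Return {base: (count, percent)} for A, C, G and T; other characters are skipped."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: (counts[base], round(100 * counts[base] / total, 2)) for base in "ACGT"}


print(nucleotide_distribution("aacgttttN"))
# {'A': (2, 25.0), 'C': (1, 12.5), 'G': (1, 12.5), 'T': (4, 50.0)}
```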
Task 4: Identify the restriction enzymes in the sequence
Instructions
To clone the first part of the ORF, we need to generate a restriction endonuclease map. We will use
NEBcutter (https://nc3.neb.com/NEBcutter/): paste in your original DNA sequence, making sure it is the
DNA sequence (A, C, G, and T) and not the protein sequence.
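Question 1 below asks for the GC and AT content. Assuming the standard definitions, where $n_X$ is the number of bases X in the sequence and ambiguity codes such as N are excluded (our notation), the two percentages are:

```latex
\mathrm{GC}\,\% = \frac{n_G + n_C}{n_A + n_C + n_G + n_T} \times 100,
\qquad
\mathrm{AT}\,\% = \frac{n_A + n_T}{n_A + n_C + n_G + n_T} \times 100 = 100 - \mathrm{GC}\,\%
```

For example, the made-up 10-base sequence ATGCGCATTA has 3 A, 3 T, 2 G, and 2 C, so GC % = 4/10 × 100 = 40 and AT % = 60.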
1. What are the GC % and AT %?
2. Which ORF is found?
To select restriction enzymes for cloning, find two different restriction enzymes: one that cuts near
the beginning of the predicted ORF and one that cuts further along it (in the middle or, ideally, near
the end), so that the fragment is as long as possible. Each enzyme should cut the insert only once;
otherwise, you would get multiple fragments.
3. Which restriction enzymes would you select to get the largest portion of the ORF (appropriate
enzymes closest to the start and end of the ORF)? Hint: in the restriction summary table, find the
enzymes that cut closest to the beginning and the middle/end of the ORF.
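NEBcutter does this search across its full enzyme list. The selection rule described above can be sketched in Python as follows. The sketch uses a small, hand-picked set of common palindromic enzymes with their recognition sites and top-strand cut offsets (for example EcoRI G^AATTC). It keeps only enzymes that cut the whole sequence exactly once, at a position inside the ORF, and picks the two that are furthest apart. The function names are hypothetical, ORF coordinates are assumed to be 0-based, and the sketch ignores methylation sensitivity, degenerate sites, and vector compatibility.

```python
# Recognition site and top-strand cut offset for a few common palindromic enzymes.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "XhoI": ("CTCGAG", 1),     # C^TCGAG
    "NcoI": ("CCATGG", 1),     # C^CATGG
    "NotI": ("GCGGCCGC", 2),   # GC^GGCCGC
}


def cut_positions(seq: str, site: str, offset: int) -> list:
    """Return 0-based top-strand cut positions for every occurrence of `site`."""
    seq = seq.upper()
    positions, i = [], seq.find(site)
    while i != -1:
        positions.append(i + offset)
        i = seq.find(site, i + 1)  # i + 1 also catches overlapping matches
    return positions


def choose_enzyme_pair(seq, orf_start, orf_end, enzymes=ENZYMES):
    """Return (5' enzyme, 3' enzyme, fragment length) or None.

    Only enzymes that cut the whole sequence exactly once, at a position inside
    [orf_start, orf_end], are considered; the pair chosen is the one furthest apart.
    """
    single_cutters = {}
    for name, (site, offset) in enzymes.items():
        cuts = cut_positions(seq, site, offset)
        if len(cuts) == 1 and orf_start <= cuts[0] <= orf_end:
            single_cutters[name] = cuts[0]
    if len(single_cutters) < 2:
        return None
    first = min(single_cutters, key=single_cutters.get)
    last = max(single_cutters, key=single_cutters.get)
    if single_cutters[first] == single_cutters[last]:
        return None  # every candidate cuts at the same position
    return first, last, single_cutters[last] - single_cutters[first]


# Toy insert: one EcoRI site, one HindIII site, and two BamHI sites.
toy = "TT" + "GAATTC" + "A" * 6 + "AAGCTT" + "A" * 6 + "GGATCC" + "TT" + "GGATCC" + "TT"
print(choose_enzyme_pair(toy, 0, len(toy)))  # ('EcoRI', 'HindIII', 12)
```

BamHI is rejected in the toy example because it cuts twice, which is exactly the situation the instructions warn would produce multiple fragments.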
Task 5: Design primers using Primer-BLAST
To design primers, use Primer-BLAST:
https://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?LINK_LOC=BlastHome. Paste the mRNA
accession number, set the PCR product size range to 100 to 500 bp, and keep the default Tm settings.
1. Find the primer pairs.
2. Check the specificity of primer pair 3.
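Primer-BLAST checks specificity against a whole database, which cannot be reproduced locally. The sketch below only performs a simple check on a single template: it locates where a primer pair binds, reports the product size so you can confirm it falls in the 100 to 500 bp range, and gives a rough melting temperature. All function names are hypothetical. The Wallace rule used here, Tm = 2(A+T) + 4(G+C) °C, is only a crude estimate for short oligonucleotides. Primer-BLAST's Tm values come from Primer3's thermodynamic model and will differ.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def gc_percent(primer: str) -> float:
    p = primer.upper()
    return 100 * (p.count("G") + p.count("C")) / len(p)


def pcr_product(template: str, forward: str, reverse: str):
    """Locate the amplicon defined by a primer pair on one template.

    The forward primer must appear as written; the reverse primer binds the
    opposite strand, so its reverse complement must appear downstream.
    Returns (start, length) using the first exact match of each, or None.
    """
    t = template.upper()
    fwd_site = t.find(forward.upper())
    rev_site = t.find(reverse_complement(reverse), fwd_site + 1)
    if fwd_site == -1 or rev_site == -1:
        return None
    end = rev_site + len(reverse)
    return fwd_site, end - fwd_site


# Toy template and primers, for illustration only.
template = "GGG" + "ATGCATGCAT" + "A" * 20 + "CCCCAAAAGG" + "TTT"
product = pcr_product(template, "ATGCATGCAT", "CCTTTTGGGG")
print(product)                                        # (3, 40)
print(product is not None and 100 <= product[1] <= 500)  # False: toy product is only 40 bp
```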