Perl DNA Sequence Manipulation Guide

This document contains a Perl tutorial covering various tasks for working with DNA/RNA sequences including: storing sequences in variables; concatenating sequences; transcribing DNA to RNA; calculating the reverse complement of a sequence; reading protein sequences from files; determining nucleotide frequencies using regular expressions and loops; and writing results to files. The tutorial provides code examples for each task and discusses concepts like using variables, file I/O, pattern matching, and conditional logic.

Uploaded by

Jessica Mitchell


Perl tutorial

Working with DNA Sequences

#!/usr/bin/perl -w
# Storing DNA in a variable, and printing it out
# First we store the DNA in a variable called $DNA

$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Next, we print the DNA onto the screen

print $DNA;

# Finally, we'll specifically tell the program to exit.

exit;

Concatenating the DNA sequences

#!/usr/bin/perl -w
# Concatenating DNA
# Store two DNA fragments into variables called $DNA1
# and $DNA2

$DNA1 = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
$DNA2 = 'ATAGTGCCGTGAGAGTGATGTAGTA';

# Print the DNA onto the screen

print "Here are the original two DNA fragments:\n\n";

print $DNA1, "\n";
print $DNA2, "\n\n";

# Concatenate the DNA fragments into a third variable and
# print them, using "string interpolation"

$DNA3 = "$DNA1$DNA2";

print "Here is the concatenation of the first two fragments (version 1):\n\n";
print "$DNA3\n\n";

# An alternative way using the "dot operator":

# Concatenate the DNA fragments into a third variable and
# print them

$DNA3 = $DNA1 . $DNA2;

print "Here is the concatenation of the first two fragments (version 2):\n\n";
print "$DNA3\n\n";

# Print the same thing without using the variable $DNA3

print "Here is the concatenation of the first two fragments (version 3):\n\n";
print $DNA1, $DNA2, "\n";
exit;
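
Two further ways to concatenate, sketched here as an aside to the three versions above: `join` with an empty separator, and the `.=` append operator. The helper name `concat_fragments` and the array `@fragments` are illustrative assumptions, and the sketch uses `use strict` with `my` variables, which the tutorial scripts do not.

```perl
use strict;
use warnings;

# The two fragments from the script above, held in an array
# (the array name is an assumption for this sketch).
my @fragments = (
    'ACGGGAGGACGGGAAAATTACTACGGCATTAGC',
    'ATAGTGCCGTGAGAGTGATGTAGTA',
);

# join with an empty separator concatenates any number of fragments.
sub concat_fragments {
    my @parts = @_;
    return join('', @parts);
}

# The .= operator appends to an existing string in place.
my $growing = '';
$growing .= $_ for @fragments;

my $joined = concat_fragments(@fragments);

print "join: $joined\n";
print ".=:   $growing\n";
```

Both approaches scale to more than two fragments without adding a new variable for each one.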

TRANSCRIPTION: DNA -> RNA

#!/usr/bin/perl -w

# Transcribing DNA into RNA

# The DNA

$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Print the DNA onto the screen

print "Here is the starting DNA:\n\n";
print "$DNA\n\n";

# Transcribe the DNA to RNA by substituting all T's with U's.

$RNA = $DNA;
$RNA =~ s/T/U/g;
# Print the RNA onto the screen
print "Here is the result of transcribing the DNA to RNA:\n\n";
print "$RNA\n";

# Exit the program.

exit;
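
The substitution `s/T/U/g` only touches uppercase T. As a minimal sketch, assuming the input may also contain lowercase bases, the same step can be wrapped in a small subroutine built on `tr///`. The name `transcribe` is hypothetical and not part of the tutorial.

```perl
use strict;
use warnings;

# Return an RNA copy of a DNA string, replacing T with U and t with u.
# The original string is left unchanged because we work on a copy.
sub transcribe {
    my ($dna) = @_;
    (my $rna = $dna) =~ tr/Tt/Uu/;
    return $rna;
}

print transcribe('ACGGGAGGACGGGAAAATTACTACGGCATTAGC'), "\n";
```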

Reverse Complement

#!/usr/bin/perl -w
# Calculating the reverse complement of a strand of DNA

# The DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Print the DNA onto the screen

print "Here is the starting DNA:\n\n";
print "$DNA\n\n";

# Calculate the reverse complement

# First, copy the DNA into new variable $revcom
# (short for REVerse COMplement)
#
# It doesn't matter if we first reverse the string and then
# do the complementation; or if we first do the complementation
# and then reverse the string. Same result each time.
# So when we make the copy we'll do the reverse in the same
# statement.

$revcom = reverse $DNA;

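# The DNA is now reversed. We need to complement the bases
# in $revcom: substitute all bases by their complements.
# A->T, T->A, G->C, C->G

# Attempt 1: four global substitutions, one after another

$revcom =~ s/A/T/g;
$revcom =~ s/T/A/g;
$revcom =~ s/G/C/g;
$revcom =~ s/C/G/g;

# Print the reverse complement DNA onto the screen

print "Here is the reverse complement DNA:\n\n";
print "$revcom\n";

# Does this work? Why not?
# No. The first substitution turns every A into T, and the second
# then turns every T (including the new ones) back into A, so no
# T survives. G and C collide the same way, leaving only A's and G's.

# Attempt 2: start again from the reversed DNA and translate all
# bases in a single pass with tr///, which avoids the collisions.
# See the text for a discussion of tr///

$revcom = reverse $DNA;
$revcom =~ tr/ACGTacgt/TGCAtgca/;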

# Print the reverse complement DNA onto the screen

print "Here is the reverse complement DNA:\n\n";
print "$revcom\n";
print "\nThis time it worked!\n\n";
exit;
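
The working approach (reverse, then `tr///`) can be packaged as a reusable subroutine. This is a minimal sketch; the name `reverse_complement` is an assumption, and any character outside `ACGTacgt` is passed through unchanged.

```perl
use strict;
use warnings;

# Return the reverse complement of a DNA string.
sub reverse_complement {
    my ($dna) = @_;

    # Assigning to a scalar puts reverse in scalar context,
    # so it reverses the characters of the string.
    my $revcom = reverse $dna;

    # Complement every base in one pass.
    $revcom =~ tr/ACGTacgt/TGCAtgca/;
    return $revcom;
}

print reverse_complement('ACGGGAGGACGGGAAAATTACTACGGCATTAGC'), "\n";
```

Applying the subroutine twice should give back the original sequence, which makes a quick sanity check.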

Reading Proteins in files

#!/usr/bin/perl -w
# Reading protein sequence data from a file
# The filename of the file containing the protein sequence data

$proteinfilename = 'Name_Of_your_sequence_file.txt';

# First we have to "open" the file, and associate
# a "filehandle" with it. We choose the filehandle
# PROTEINFILE for readability.

open(PROTEINFILE, $proteinfilename) || die("Cannot open file \"$proteinfilename\": $!\n");

# Now we do the actual reading of the protein sequence data
# from the file, by using the angle brackets < and > to get
# the input from the filehandle. We store the lines into our
# array variable @protein.

@protein = <PROTEINFILE>;

# Now that we've got our data, we can close the file.

close PROTEINFILE;

# Print the protein onto the screen

print "Here is the protein:\n\n";
print @protein;
exit;
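
The script above uses a bareword filehandle and two-argument `open`. A common alternative, sketched here under the assumption that the file is plain sequence text with no header lines, is a lexical filehandle with three-argument `open`. The helper name `read_sequence` is hypothetical.

```perl
use strict;
use warnings;

# Read a sequence file and return its contents as one string
# with all whitespace (including line breaks) removed.
sub read_sequence {
    my ($filename) = @_;

    # Three-argument open with a lexical filehandle; $! holds the
    # system error message if the open fails.
    open(my $fh, '<', $filename)
        or die "Cannot open file \"$filename\": $!\n";
    my @lines = <$fh>;
    close $fh;

    my $sequence = join('', @lines);
    $sequence =~ s/\s//g;
    return $sequence;
}

# Usage: perl this_script.pl Name_Of_your_sequence_file.txt
if (@ARGV) {
    print read_sequence($ARGV[0]), "\n";
}
```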

Pattern matching: Motifs and Loops

Proceed ONLY if the condition is true.

Code layout:

if (condition) {
    do something
}
Finding Motifs

#!/usr/bin/perl -w
# if-elsif-else

$word = 'MNIDDKL';

# if-elsif-else conditionals

if ($word eq 'QSTVSGE') {
    print "QSTVSGE\n";
} elsif ($word eq 'MRQQDMISHDEL') {
    print "MRQQDMISHDEL\n";
} else {
    # Neither comparison matched
    print "\"$word\" matches neither peptide\n";
}
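
The `eq` tests above compare whole strings. To find a motif inside a longer sequence, the binding operator `=~` with a regular expression is the usual tool. The sketch below is an illustration only: the helper `find_motif_positions`, the example motif `D[DE]`, and the test protein (the two peptides above joined together) are all assumptions.

```perl
use strict;
use warnings;

# Return the 1-based start position of every non-overlapping match
# of $motif (interpreted as a regular expression) in $sequence.
sub find_motif_positions {
    my ($sequence, $motif) = @_;
    my @positions;

    # With /g in scalar context, each pass of the loop finds the
    # next match; $-[0] is the 0-based offset where it started.
    while ($sequence =~ /$motif/g) {
        push @positions, $-[0] + 1;
    }
    return @positions;
}

my $protein = 'MNIDDKLMRQQDMISHDEL';
my @hits    = find_motif_positions($protein, 'D[DE]');

if (@hits) {
    print "Motif found at position(s): @hits\n";
} else {
    print "Motif not found\n";
}
```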

GC CONTENT

In PCR experiments, the GC content of a primer is used to predict its annealing temperature
to the template DNA. A higher GC content indicates a higher melting temperature.

GC % = (G + C) / (A + G + C + T) x 100

where A, C, G and T are the counts of each base in the sequence.
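
The GC formula translates directly into a few lines of Perl. This is a minimal sketch: the subroutine name `gc_content` is an assumption, as is the choice to ignore any character other than A, C, G and T and to return 0 for an empty sequence.

```perl
use strict;
use warnings;

# Return the GC content of a DNA string as a percentage.
sub gc_content {
    my ($dna) = @_;

    # tr/// with an empty replacement list counts the matching
    # characters without changing the string.
    my $gc    = ($dna =~ tr/GCgc//);
    my $total = ($dna =~ tr/ACGTacgt//);

    return 0 unless $total;    # avoid dividing by zero
    return 100 * $gc / $total;
}

printf "GC content: %.1f%%\n",
    gc_content('ACGGGAGGACGGGAAAATTACTACGGCATTAGC');
```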

Logic:

for each base in the DNA
    if base is A
        count_of_A = count_of_A + 1
    if base is C
        count_of_C = count_of_C + 1
    if base is G
        count_of_G = count_of_G + 1
    if base is T
        count_of_T = count_of_T + 1
done

print count_of_A, count_of_C, count_of_G, count_of_T

The script

#!/usr/bin/perl -w
# Determining frequency of nucleotides
# Get the name of the file with the DNA sequence data

$dna_filename = 'File_name.txt';

# Remove the newline from the DNA filename
# (only needed when the name is read from the keyboard)

chomp $dna_filename;

# open the file, or exit

open(DNAFILE, $dna_filename) || die("Cannot open file \"$dna_filename\"\n");

# Read the DNA sequence data from the file, and store it
# into the array variable @DNA
@DNA = <DNAFILE>;
# Close the file
close DNAFILE;

# From the lines of the DNA file,

# put the DNA sequence data into a single string.
$DNA = join( '', @DNA);
# Remove whitespace
$DNA =~ s/\s//g;

# Now explode the DNA into an array where each letter of
# the original string is now an element in the array.
# This will make it easy to look at each position.
# Notice that we're reusing the variable @DNA for this purpose.

@DNA = split( '', $DNA );

# Initialize the counts.

# Notice that we can use scalar variables to hold numbers.
$count_of_A = 0;
$count_of_C = 0;
$count_of_G = 0;
$count_of_T = 0;
$errors = 0;

# In a loop, look at each base in turn, determine which of

# the four types of nucleotides it is, and increment the
# appropriate count.

foreach $base (@DNA)

{
if ( $base eq 'A' ) {
++$count_of_A;
}
elsif ( $base eq 'C' ) {
++$count_of_C;
}
elsif ( $base eq 'G' ) {
++$count_of_G;
}
elsif ( $base eq 'T' ) {
++$count_of_T;
}
else {
print "!!!!!!!! Error - I don\'t recognize this base: $base\n";
++$errors;
}
}

# print the results

print "A = $count_of_A\n";
print "C = $count_of_C\n";
print "G = $count_of_G\n";
print "T = $count_of_T\n";
print "errors = $errors\n";
# exit the program
exit;

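--- using regex ---

# The foreach loop in the script can be replaced by regular expressions.
# The /g modifier finds every occurrence, so each while loop runs once
# per matching base. The /i modifier ignores case, so unlike the loop
# version this also counts lowercase bases.

# Start every count at zero so a base that never occurs prints as 0
$a = $c = $g = $t = $e = 0;

while($DNA =~ /a/ig){$a++}
while($DNA =~ /c/ig){$c++}
while($DNA =~ /g/ig){$g++}
while($DNA =~ /t/ig){$t++}
while($DNA =~ /[^acgt]/ig){$e++}
print "A=$a C=$c G=$g T=$t errors=$e\n";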

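----

The counting script uses a new kind of loop, the foreach loop. This loop works over the elements
of an array. The line:

foreach $base (@DNA)

sets $base to each element of @DNA in turn and runs the block that follows once per element.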
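
As a variation on the same foreach loop, the four separate counters can be kept in one hash keyed by base. This is a sketch rather than the tutorial's method: the subroutine name `count_bases` and the `errors` key are assumptions, and only uppercase bases are counted, as in the if-elsif version.

```perl
use strict;
use warnings;

# Count the bases in a DNA string and return a reference to a hash
# with keys A, C, G, T and errors.
sub count_bases {
    my ($dna) = @_;
    my %count = (A => 0, C => 0, G => 0, T => 0, errors => 0);

    # split with an empty pattern yields one element per character;
    # foreach then visits each element in turn.
    foreach my $base (split //, $dna) {
        if ($base =~ /^[ACGT]$/) {
            $count{$base}++;
        } else {
            $count{errors}++;
        }
    }
    return \%count;
}

my $counts = count_bases('ACGGGAGGACGGGAAAATTACTACGGCATTAGC');
foreach my $key (qw(A C G T errors)) {
    print "$key = $counts->{$key}\n";
}
```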

Writing to files

# Also write the results to a file called "countbase"

$outputfile = "countbase";

open(COUNTBASE, ">$outputfile") || die("Cannot open file \"$outputfile\" to write to!!\n\n");

print COUNTBASE "A=$a C=$c G=$g T=$t errors=$e\n";

close(COUNTBASE);
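
The same write step can be sketched with a lexical filehandle and three-argument `open`, which keeps the `>` mode separate from the filename. The helper name `write_counts` and its arguments are assumptions for illustration. Checking the result of `close` is one way to catch write errors such as a full disk.

```perl
use strict;
use warnings;

# Write one line of text to $filename, replacing any existing content.
sub write_counts {
    my ($filename, $line) = @_;

    open(my $out, '>', $filename)
        or die "Cannot open file \"$filename\" to write to: $!\n";
    print {$out} $line;
    close($out)
        or die "Cannot close file \"$filename\": $!\n";
    return;
}

# Usage: perl this_script.pl countbase
# The counts in this line are placeholders, not real results.
if (@ARGV) {
    write_counts($ARGV[0], "A=0 C=0 G=0 T=0 errors=0\n");
}
```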
