Computing with DNA
James A. Foster Laboratory for Applied Logic Dept. of Computer Science University of Idaho May 8, 1997
Outline part one
Chemistry of DNA Polymerase Chain Reaction Brute Force Computing Finding Hamiltonian Paths
May 8, 1997 jaf
Outline part two
A Mathematical Model Solving SAT P-DNA is PSPACE Potential and Limitations
May 8, 1997 jaf
Chemistry of DNA
DNA molecules: paired strands of nucleotides bases attached to sugarphosphate backbones Nucleotides bases: Adenine binds with Thymine, Guanine binds with Cytosine Backbone: 5 carbons, similar to linked list One molecule's 50 binds to next ones 30 10 binds to nucleotide Paired Strands: Bases bond to complementary strand Sequences: listed 50 to 30
Laboratory for Uphill Computing
May 8, 1997 jaf
DNA Molecule Illustration
5 3
T A
G C
C G
A T
GCCA
TGCATTC
CGGT
ACGTAAG
Note:
GCCATGCATTC
CGGTACGTAAG
s is complement of s is
May 8, 1997 jaf
Laboratory for Uphill Computing
Polymerase Chain Reaction PCR
Given: collection of DNA and two primers, s; t Action: amplify strands of the form
any sequence v
Input: Repeat until satisfied 1 denature DNA with heat 2 anneal DNA with primers 3 elongate strands with DNA polymerase
svt
for
tube T of DNA, primers s and t
Note: copy number for target doubles each iteration
Laboratory for Uphill Computing
May 8, 1997 jaf
0) Given
5 3
PCR Illustration
s s t
1111111111111111 0000000000000000 11 00 1111111111111111 0000000000000000 1111111111111111 0000000000000000 11 00 1111111111111111 0000000000000000
t s t
3 5
1) Denature (heat)
1111111111111111 0000000000000000 11 00 1111111111111111 0000000000000000 11111 00000 111111111111 000000000000 11111 00000 111111111111 000000000000 11111 00000 1111111111 0000000000
t s s t
2) Anneal (Add primers)
s
111111111111111111 000000000000000000 111111111111111111 000000000000000000 11111 00000 111111111111 000000000000 11111 00000 1111111111 0000000000 11111 00000
t t s
11111 00000
3) Elongate (add polymerase)
s s
111111111111111111 000000000000000000 111111111111111111 000000000000000000 11111 00000 111111111111 000000000000 11111 00000 1111111111 0000000000 11111 00000
t t t s
11111 00000
Laboratory for Uphill Computing
May 8, 1997 jaf
Brute Force Computing
To solve problem P :
Represent input instance x as DNA Make tube T Amplify positive results in T Sample T to get answer Represent possible solutions to P x
as DNA
with every possible solution to P x
Recall: NP problems are easy to solve given a short hint". This algorithm checks all possible hints" in parallel, with a polynomial number of operations.
Laboratory for Uphill Computing
May 8, 1997 jaf
Example: Finding Hamiltonian Paths
The NP complete directed Hamitonian Path dHP problem:
Given: Directed graph G, nodes f , t Question: Is there a directed path, visiting every node exactly once, from f to t in G?
2 5
3
A possible Hamitonian path: (F,2,4,6,3,5,T)
Laboratory for Uphill Computing
May 8, 1997 jaf
DNA Algorithm for dHP
Input: Graph G, nodes f and t 0 Represent nodes, edges, paths with DNA 1 Fill tubes with all possible paths 2 Select paths from f to t 3 Select paths of correct length 4 Select paths without duplicate vertexes 5 If anything remains Then return ``yes'' Else ``no''
Laboratory for Uphill Computing
May 8, 1997 jaf
0: Representation Node v: 20 random base pair sequence S
v
Long enough not to bind to each other Short enough for PCR to work Edge u; v: build sequence S with 10-base su x from S , 10-base pre x from S except use all of S and S
uv u v f t
Path a
b c:
catenate a; b and b; c
Laboratory for Uphill Computing
May 8, 1997 jaf
10
Example Examples of Adlemans encoding Nodes S2: GTCACACTTC GGACTGACCT S2: AGGTCAGTCC GAAGTGTGAC S4: TGTGCTATGG GAACTCAGCG S4: CGCTGAGTTC CCATAGCACA S5: CACGTAAGAC GGAGGAAAAA S5: TTTTTCCTCC GTCTTACGTG Edges (2,4): GGACTGACCT TGTGCTATGG (4,5): GAACTCAGCG CACGTAAGAC Paths (2.4.5):
GTCACACTTC GGACTGACCT TGTGCTATGG GAACTCAGCG CACGTAAGAC GGAGGAAAAA
11
AGGTCAGTCC GAAGTGTGAC
CGCTGAGTTC CCATAGCACA
TTTTTCCTCC GTCTTACGTG
1: Fill tube with all paths
Amplify tubes of Sx and Sx for each node x Amplify tubes of Suv and Suv for each edge Mix all tubes into tube T
uv
Overlapping segments will bind and leave sticky ends" to promote further binding
Example
Edge (2,4) Node 4
GGACTGACCT TGTGCTATGG CGCTGAGTTC CCATAGCACA
With high probability, every possible path through G will be represented in T
Laboratory for Uphill Computing
May 8, 1997 jaf
12
2,3,4: Select candidate paths f
F T
t
Run PCR on tube T using S and S as primers, put products in T 0 Separate strands with 20n + 10 bases from T 0, put products in tube R Note: by construction, nodes can be visited at most once with high probability If any DNA is left in R, return yes", else no"
Laboratory for Uphill Computing
May 8, 1997 jaf
13
A Mathematical Model
Primitives: tubes of DNA or similar Operations:
Remove T; T 0;
of form
i
,
placing them in T 0
ig
: Remove all strings in T
Detect T : Decide if T has DNA in it Mix T
f
ig
;T
: Pour all T s into T
i
Copy T;
Tig
: Pour T into each T
We implicitly assume operations such as separation by size, ampli cation, ligation, annealing, and denaturing where needed
Laboratory for Uphill Computing
May 8, 1997 jaf
14
Solving dHP
Representation: as before
Input: for each node v and edge
Tv contains Sv and Sv 0 Tuv contains Suv and Suv 0 MixfTi; Tuv g,T RemoveT ,T0,fSf g RemoveT0,T 0,fSt g Move length 20n + 10 strings from T 0 to T 00 if DetectT 00
then return ``Yes'' else return ``No''
u; v,
Complexity: linear in number of nodes for Mix Note: S and S are DNA strands representing node v and edge u; v in input graph
v uv
Laboratory for Uphill Computing
May 8, 1997 jaf
15
Listing Permutations
Problem: input n, list all permutations of items Representation: p1i1p2i : : : p i where codes position j ", i 1; 2; : : : ; n
x n n j 2 f
pj
en-
Input: T with all valid strings for j to n CopyT ,fT1; T2; : : : ; Tng
=1
i is any string other than i MixfT1; T2; : : : ; Tn g,T first j is are distinct in each string T contains permutations of f1; 2; : : : ; ng
:
for i to n for k j to n RemoveTi,J ,fpj :i; pk ig
=1 = +1
Requires On2 operations
Laboratory for Uphill Computing
May 8, 1997 jaf
16
Solving SAT
The Problem
Given: Boolean formula in CNF p conjunctions, q literals per clause, n variables
^_ l F ~ = x
p q n i
where l = x or x for some variable x
i;j k k
=1 =1
j
i;j
Question: Is there an ~ s. t. F x~ = T ? x
n n
Example: F ~ 3 = x1 x3 x1 x2 x3 x2 x G~ 2 = x1 x1 x2 x2 x
_ ^ _ _ ^ ^ _ ^
x3
is satis able F T; T; T = T , G is not
May 8, 1997 jaf
Laboratory for Uphill Computing
17
Laboratory for Uphill Computing
May 8, 1997 jaf
Representing Truth Assignments x1 xn
v1
v2
Etc.
vn
x1
xn
Variables: x1, x2, ..., xn Extra nodes: v1, v2, ..., vn Paths: sequences of literals over these variables
18
DNA Algorithm for SAT
T0 full of all truth assignment p q Input: Boolean formula F = ^i=1_j =1li;j for i = 1 to p for j = 1 to p if li;j is positive then RemoveTi,1,T 0,fxj g else RemoveTi,1,T 0,fxj g Strands in T 0 will make clause j true Re-label T 0 as Ti if DetectTp
Input: then return ``yes'' else return ``no''
Requires On2 operations
Laboratory for Uphill Computing
May 8, 1997 jaf
19
Computational Complexity
Let P-DNA be problems solvable with polynomial steps in this model Thm Beaver: P-DNA = PSPACE Generalized Turing-complete models splicing systems, DNA TMs exist But, P-DNA computations still require exponential volume, and perhaps lots of clock time
Laboratory for Uphill Computing
May 8, 1997 jaf
20
Disadvantages
Steps" are manual and slow Reaction time proportional to volume of reactants: real time can be much slower than number of steps Required volume can be huge Processes can introduce errors Processes do not scale up well
Laboratory for Uphill Computing
May 8, 1997 jaf
21
Possible solutions to problems
Add active transport or catalyst to tubes Build targeted solutions forget brute force Compute on surfaces Change molecules
Laboratory for Uphill Computing
May 8, 1997 jaf
22
Advantages
Massive parallelism Attack special instances e.g. Keller graphs for MC Very low energy consumption 10,19 J versus 10,9 J per basic operation with no inherent lower bound Way cool
Laboratory for Uphill Computing
May 8, 1997 jaf
23
L. Adleman, Molecular Computation of Solutions to Combinatorial Problems", Science, 266:1021 1024, 1994 L. Adleman, On Constructing a Molecular Computer", manuscript, ftp: [Link] pub csinfo papers adlemanmolecular [Link], 1995 D. Beaver, A Universal Molecular Computer", Technical report, Penn. State U., 1995 J. Hartmanis, On the Weight of Computations", Bull. Euro. Assoc. for Theoretical Comp. Sci., 55:136 138, 1995 J. H. Reif, Parallel Molecular Computation", in Proc. 7th ACM Symp. on Parallel Alg. and Arch., pp. 213 223, 1995 Also see URLs from my homepage bookmarks
Further Reading
Laboratory for Uphill Computing
May 8, 1997 jaf
24