Dynamic and Scalable DNA-based Information
The physical architectures of information storage systems often dictate how information is
encoded, databases are organized, and files are accessed. Here we show that a simple
1 Department of Chemical and Biomolecular Engineering, North Carolina State University, Campus Box 7905, Raleigh, NC 27695-7905, USA. 2 Department
of Electrical and Computer Engineering, North Carolina State University, Campus Box 7911, Raleigh, NC 27695-7911, USA. ✉email: jtuck@[Link];
ajkeung@[Link]
The creation of digital information is rapidly outpacing conventional storage technologies1. DNA may provide a timely technological leap due to its high storage density, longevity2–4, and energy efficiency5. A generic DNA-based information storage system is shown in Fig. 1a, where digital information is encoded into a series of DNA sequences, synthesized as a pool of DNA strands, read by DNA sequencing, and decoded back into an electronically compatible form. Recently, a growing body of work has focused on implementing and improving each of these four steps6–11; however, a relative dearth of research has explored technologies to access and manipulate desired subsets of data within storage databases12,13, especially dynamically. This is likely because DNA synthesis and sequencing are considerably slower processes than electronic writing and reading of data;14 thus, DNA would likely serve at the level of archival or cold storage where information would be infrequently accessed from a relatively static DNA database. Yet, an archival DNA database, just like electronic versions, need not be completely static and would benefit greatly from dynamic properties15,16. For example, in-storage file operations and computations and the ability to repeatedly access DNA databases would reduce DNA synthesis costs and abrogate the need to store multiple copies of archives. Therefore, implementation of dynamic properties would bring DNA-based storage systems one step closer to practical viability.

A practical system to dynamically access information from a DNA database should satisfy three criteria. It must be: (1) physically scalable to extreme capacities; (2) compatible with efficient and dense encodings; and (3) repeatedly accessible. Ideally, it would also be modifiable to some extent. While these criteria have not yet been achieved in aggregate, we were inspired by the creative use of molecular biology approaches in prior work to address some of these challenges. For example, polymerase chain reaction (PCR) is the predominant method for information access in DNA storage systems7,12 and is scalable, especially with some modifications17, while single-stranded DNA toeholds and strand displacement have been used for DNA computation18–20, DNA search21, detection22, and rewritable12,23–25 information storage. The challenge is that in their current form these technologies have inherent limitations and tradeoffs in physical scalability, encoding density, or reusability. For example, while it is currently the most scalable and robust technique, PCR-based information access requires a portion of the database to be physically removed and amplified, with the number of data copies present dictating the number of times information can be accessed6,26,27. It also requires double-stranded DNA (dsDNA) templates to be melted in each cycle, during which time primers can bind similar off-target sequences in the data payload regions, thus requiring encoding strategies that trade off system density and capacity to avoid these cross-interactions7 (Fig. 1b).

Here, we present a dynamic DNA-based storage system that satisfies these three criteria. It is inspired by work in the synthetic biology and molecular biology communities and by the way cells naturally access information in their genome. As described in Fig. 1c, we engineer an information storage system that has as its fundamental unit a double-stranded DNA with a single-stranded overhang (ss-dsDNA). A database of information would be comprised of many such ss-dsDNA strands, with all strands that comprise a file having the same single-stranded overhang sequence or file address. The overhang also provides a handle with which a file can be separated as well as operated on in-storage. All strands have a T7 promoter enabling
[Fig. 1 image, panels a–c. Legible labels: encoding, synthesis, storage, library, PCR, sequencing; PCR access limited by the number of strand copies; file retention; file operations (lock and unlock, rename, delete). 160 nt template design: file address (20 nt), T7 promoter (23 nt), data payload (117 nt).]
Fig. 1 Molecular technologies unlock dynamic operations for DNA storage. a The generic framework for DNA-based storage systems includes encoding
of digital information to nucleotide sequences, DNA synthesis and storage, DNA sequencing, and decoding the desired information. b Schematic of
challenges faced by PCR-based file access. c Schematic of DORIS (Dynamic Operations and Reusable Information Storage). ss-dsDNA strands enable
repeatable information access through non-PCR-based magnetic separation, in vitro transcription, reverse transcription, and the return of separated files to
the database. Additionally, the overhangs of ss-dsDNAs enable in-storage file operations including lock, unlock, rename, and delete.
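As a minimal sketch of the 160 nt template design in Fig. 1c, the snippet below splits a strand into its three regions and groups strands into files by their overhang address. It assumes the template is written 5'→3' with the 20 nt address at the 3' end (consistent with the 23 nt promoter sequence being inset 20 nt from the 3' end). The function names and the placeholder sequences are hypothetical and are not taken from the paper.

```python
ADDRESS_LEN = 20    # single-stranded overhang (file address), nt
PROMOTER_LEN = 23   # common T7 promoter / primer-binding sequence, nt
PAYLOAD_LEN = 117   # data payload, nt
TEMPLATE_LEN = ADDRESS_LEN + PROMOTER_LEN + PAYLOAD_LEN  # 160 nt


def split_template(strand):
    """Split a 160 nt template (assumed 5'->3') into its three regions."""
    if len(strand) != TEMPLATE_LEN:
        raise ValueError(f"expected {TEMPLATE_LEN} nt, got {len(strand)}")
    return {
        "payload": strand[:PAYLOAD_LEN],
        "promoter": strand[PAYLOAD_LEN:PAYLOAD_LEN + PROMOTER_LEN],
        "address": strand[-ADDRESS_LEN:],  # assumed to sit at the 3' end
    }


def group_by_address(strands):
    """Group strands into files: all strands sharing an address form one file."""
    files = {}
    for strand in strands:
        files.setdefault(split_template(strand)["address"], []).append(strand)
    return files


# Placeholder sequences for illustration only (not real designs).
demo = [
    "A" * 117 + "C" * 23 + "G" * 20,
    "T" * 117 + "C" * 23 + "G" * 20,
    "A" * 117 + "C" * 23 + "T" * 20,
]
demo_files = group_by_address(demo)
```

Keying the grouping on the address alone mirrors the idea that a file is defined by its overhang, independent of what the payload contains.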
transcription of information into RNA, while the original ss-dsDNAs are retained and returned to the DNA database. This system can be created at scale, reduces off-target information access, facilitates computationally tractable design of orthogonal file addresses, increases information density and theoretical maximum capacity, enables repeatable information access with minimal strand copy number required, and supports multiple in-storage operations. This work demonstrates that scalable dynamic information access and manipulations can be practical for DNA-based information storage. For convenience, we refer to this system collectively as DORIS (Dynamic Operations and Reusable Information Storage).

Results

ss-dsDNA strands can be efficiently created in one-pot. As future DNA databases would be comprised of upwards of 10^15 distinct strands17, we first asked if ss-dsDNAs could be created in a high-throughput and parallelized manner. We ordered 160 nucleotide (nt) single-stranded DNAs (ssDNA) with a common 23 nt sequence that was inset 20 nt from the 3’ end (Fig. 1c and 2a, Supplementary Table 1). This 23 nt sequence contained the T7 RNA polymerase promoter, but was also used to bind a common primer to fill out and convert the ssDNA into a ss-dsDNA. This was achieved by several cycles of thermal annealing and DNA polymerase extension (i.e., PCR cycles but with only one primer). This resulted in ss-dsDNA strands with a 20 nt overhang (Fig. 2a, top). We optimized the ratio of ssDNA to primer, the number of cycles, and other environmental parameters (Fig. 2a, Supplementary Fig. 1) to maximize the amount of ssDNA converted to ss-dsDNA. We found that decreasing the ssDNA:primer ratio past 1:10 led to a step change in the amount of ss-dsDNA produced as quantified by gel electrophoresis (Supplementary Fig. 1b). We decided to conservatively work with a 1:20 ssDNA:primer ratio. At that ratio we found that only 4 PCR cycles were needed to convert the ssDNA into ss-dsDNA, as seen by the upward shift in the DNA gel (Fig. 2a).

Next, we tested whether this method could be used to create 3 distinct ss-dsDNAs in one-pot reactions and if each ss-dsDNA could then be specifically separated from the mixture (Fig. 2b). We mixed 3 distinct ssDNAs “A”, “B”, and “C” together, added the common primer, and performed 4 PCR cycles to create the ss-dsDNAs (here referred to as files comprised of just one unique strand each). We then used biotin-linked 20 nt DNA oligos to bind each ss-dsDNA (i.e., each file, A, B, and C, has a distinct overhang sequence or file address) and separated them out from the mixture using magnetic beads functionalized with streptavidin. Each of these oligos was able to specifically separate only its corresponding file without the other two (Fig. 2b, bottom, Eq. (1)). Importantly, this separation step could be performed at room temperature (25 °C) with only minimal gains observed at higher oligo annealing temperatures of 35 or 45 °C (Supplementary Fig. 2, Eq. (2)). The room temperature and isothermal nature of this step is useful for practical DNA storage systems and for reducing DNA degradation.

While 20 nt is a standard PCR primer length, we asked if the separation efficiency could be modulated by different overhang lengths and separation temperatures. We designed 5 ss-dsDNAs with 5–25 nt overhangs (Supplementary Fig. 3). We then separated each strand using its specific biotin-linked oligo at 15–55 °C. We observed enhanced separation efficiency for longer oligos (20mers and 25mers) and at lower temperatures (15 °C and 25 °C, Supplementary Fig. 3b). This was in agreement with a thermodynamic analysis using the Oligonucleotide Properties Calculator (Supplementary Fig. 3c, Methods, Eqs. (3)–(5))28–30.

DORIS increases density and capacity limits. One potential advantage of room temperature separations of files is that the double-stranded portions of the ss-dsDNAs remain annealed together and may block undesired oligo binding to any similar sequences in the data payload regions. The data payload region is the majority of the sequence in the middle of ss-dsDNAs that contains the stored information. To test this hypothesis, we created two ss-dsDNAs (Fig. 2c). One ss-dsDNA had an overhang that bound oligo A’ and an internal binding site for oligo B’. We experimentally verified that by using DORIS, only oligo A’ but not oligo B’ could separate out the strand. For comparison, PCR-based systems melt dsDNAs in each cycle, allowing primers to bind off-target within the data payload. As expected, when PCR was used, both oligo A’ and oligo B’ bound, with oligo B’ producing undesired truncated products. The second strand we tested had an internal binding site and an overhang that were both complementary to oligo C’. We showed that using DORIS, oligo C’ yielded only the full-length strand. In contrast, when using PCR, oligo C’ created both full-length and truncated strands.

We next asked what implications this blocking property of DORIS had for DNA-based information storage. As databases increase in size, intuitively the likelihood increases that sequences identical to address sequences (either overhangs for DORIS or primer sites for PCR) appear in data payload regions. With DORIS, this is not an issue as oligos are blocked from binding the dsDNA data payload regions. However, in PCR, primers do bind these data payload regions, so previous approaches have developed encoding algorithms that restrict primer sequences (addresses) from overlapping with any identical or similar sequence in the data payloads11,12, typically avoiding Hamming distances below ~6. This inherently reduces either the density with which databases can be encoded, due to restrictions on data payload sequence space, or their capacity, due to a reduction in the number of unique primer sequences that can be used. Density is the amount of information stored per nt (Eq. (6)), and it decreases as encoding restrictions limit what sequences can be used in the payload region (lower diversity sequence space), while capacity is the total amount of information that can be stored in a system (Eq. (7)) and is dependent on the number of addresses available, as they dictate the number of files that can be stored.

It is currently intractable to analytically solve for or comprehensively compute the number of addresses available that do not interact with the data payload region, even for moderately sized databases. Therefore, to show these relationships quantitatively, we performed Monte Carlo simulations to estimate the total number of addresses and total capacities achievable. Address sequences were (PCR) or were not (DORIS) excluded if they appeared in the data payload regions of a database with 10^9 distinct DNA strands (Fig. 2d, Methods). To simplify the analysis, we used computational codewords to encode the data payload region. Each codeword is a distinct nt sequence and holds one byte (B) of digital information. The data payload region can be made more information dense by reducing the size of the codewords so more codewords (and bytes) fit within each fixed-length strand. The tradeoff is that smaller codewords will also increase the sequence diversity of the strands (the number of possible distinct sequences per strand length) due to more codeword-codeword junctions per strand. This increases the chance of similar sequences appearing in the payload that conflict with address sequences.

The simulation assessed whether address sequences would conflict with any sequences in the payload. However, for DORIS, even if address sequences conflicted with the payload, these addresses were allowed. The simulation therefore showed that as the payload information density was increased by shrinking
[Fig. 2 image, panels a–d. Panel a: optimizing conditions to create the overhang by single primer extension (anneal, extend), with DNA gels across ssDNA:primer ratios. Panel b: ‘one-pot’ creation of ss-dsDNA for files A, B, and C, magnetic separation with oligos A′, B′, and C′, and file specificity (0–100%). Panel c: PCR vs DORIS separation, with gels and quantified fluorescence (A.U.). Panel d: number of addresses (left) and database capacity* in TB (right) versus density (bytes/strand) for DORIS and PCR.]
Fig. 2 DORIS eliminates non-specific interactions and increases density and capacity limits. a Single primer extension created ss-dsDNAs. (Bottom) 4
cycles of PCR generated the optimal amount of 160 nt ss-dsDNAs while minimizing excess ssDNA production. (Right) DNA gel showed a marked increase
in generation of ss-dsDNAs below 1:10 ssDNA:primer ratios. b Individual files can be separated from a three-file database created by a one-pot single
primer extension. Each file was bound by its corresponding biotin-linked oligo, followed by a non-PCR-based separation using functionalized magnetic
beads. File separation specificity is the percentage of the DNA separated by each oligo that is file A, B, or C, as measured by qPCR. c (Left) PCR but not DORIS
will allow oligos to bind internal off-target sites and produce undesired products. (Middle) DNA gels and (Right) their quantified fluorescence (blue for
PCR, pink for DORIS) showed that PCR-based access resulted in truncated and undesired amplicons whereas DORIS accessed only the desired strands.
d (Left) Monte Carlo simulations estimated the number of oligos found that will not interact with each other or the data payload. 400,000 oligos were
tested against different density encodings. The x-axis represents density (Eq. (4)), which is inversely related to the length of codewords used to store
discrete one-byte data values. We evaluated codeword lengths of 12 through 4. For DORIS, the encoding density was not impacted because it need not
guard against undesired binding between the oligos and data payloads. (Right) For PCR, the number of oligos that will not bind the data payload drops as
strand density increases, which means that fewer files can be stored, leading to a lower overall system capacity. For DORIS, the availability of oligos is
independent of encoding, and capacity therefore increases with denser encodings. Plotted values represent the arithmetic mean, and error bars represent
the s.d., of three replicate file separations or simulations. Gel images are representative of three independent experiments measured by RT-QPCR. Source
data are provided as a Source Data file. *Capacities may be limited by synthesis and sequencing limitations not accounted for here.
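The address-screening idea behind Fig. 2d can be illustrated with a toy Monte Carlo sketch. This is not the authors' simulation: it uses a small database, short 8 nt addresses, a 96 nt payload, random codebooks, and exact-match screening only (no Hamming-distance or address-address checks), all of which are assumptions chosen so the effect is visible at small scale. Under the PCR rule, candidate addresses that occur anywhere in a payload are discarded; under the DORIS rule they are kept.

```python
import random

NT = "ACGT"


def random_seq(length, rng):
    """Return a uniformly random nucleotide sequence."""
    return "".join(rng.choice(NT) for _ in range(length))


def simulate(codeword_len, payload_len=96, n_strands=300,
             n_candidates=200, address_len=8, seed=0):
    """Toy screen: count candidate addresses usable under each access rule."""
    if 4 ** codeword_len < 256:
        raise ValueError("codewords must be long enough to encode one byte")
    rng = random.Random(seed)

    # Hypothetical codebook: 256 distinct codewords, one per byte value.
    codebook = set()
    while len(codebook) < 256:
        codebook.add(random_seq(codeword_len, rng))
    codebook = sorted(codebook)  # fixed order keeps runs reproducible

    bytes_per_strand = payload_len // codeword_len  # encoding density
    seen = set()  # every address-length window found in any payload
    for _ in range(n_strands):
        payload = "".join(rng.choice(codebook) for _ in range(bytes_per_strand))
        for i in range(len(payload) - address_len + 1):
            seen.add(payload[i:i + address_len])

    candidates = {random_seq(address_len, rng) for _ in range(n_candidates)}
    doris_ok = len(candidates)  # DORIS: payload matches are tolerated
    pcr_ok = sum(1 for a in candidates if a not in seen)  # PCR: excluded
    return bytes_per_strand, doris_ok, pcr_ok


if __name__ == "__main__":
    for length in (12, 8, 4):
        print(length, simulate(length))
```

In this sketch the DORIS count depends only on the candidate set, while the PCR count can only be equal or lower, which is the qualitative relationship the caption describes.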
codeword length, the number of addresses available did not change for DORIS, as no restrictions were placed on addresses other than that they were not allowed to be similar to other addresses (Fig. 2d, left, pink). Also as expected, as the payload information density increased, the database capacity increased monotonically, as the number of file addresses remained the same, as did the total number of strands per file (Fig. 2d, right, pink). In contrast, for PCR, addresses that appeared in any data payload sequence were excluded; the result was that increasing payload information density initially provided a minor benefit to overall capacity (Fig. 2d, right, blue) but eventually led to a catastrophic drop in capacity as the number of addresses that did not conflict with any payload sequence quickly dropped to zero (Fig. 2d, left, blue). While it is possible to increase the number of distinct strands per address (i.e., information per file) to make up for the loss of addresses, this would result in files too large to be sequenced and decoded in a single sequencing run17. It is also important to note that our simulations were based upon very conservative codeword densities and a database size of only 10^9 DNA strands, while future storage systems are likely to exceed 10^12 strands. As database densities and DNA sequence spaces increase, the number of addresses available for PCR-based systems will drop even further while DORIS will be unaffected. Therefore, the theoretical capacity and density improvements DORIS provides could be orders of magnitude greater than what is estimated in our simulations. Furthermore, DORIS greatly simplifies address design; designing sets of orthogonal addresses for PCR-based systems that do not interact with data payload sequences will quickly become computationally intractable at large database sizes. In summary, a database comprised of ss-dsDNAs can be efficiently created in one-pot reactions, and ssDNA overhangs facilitate a non-PCR-based separation method that enhances address specificity and increases theoretical database densities and capacities.

DORIS enables repeatable file access. A key requirement but major challenge for engineering dynamic properties into storage systems is the reusability of the system. In this work, we took inspiration from natural biological systems where information is repeatedly accessed from a single permanent copy of genomic DNA through the process of transcription. As shown in Fig. 3a, dynamic access in DORIS starts by physically separating out a file of interest (ss-dsDNAs sharing the same overhang address) using biotin-linked oligos and streptavidin-based magnetic separation, in vitro transcribing (IVT) the DNA to RNA31, returning the file to the database, and reverse-transcribing the RNA into cDNA for downstream analysis or sequencing.

We implemented this system with three distinct ss-dsDNAs (A, B, and C) collectively representing a three-file database, and we accessed file A with a biotinylated oligo A’ (Fig. 3b & Supplementary Fig. 4). We then measured the amounts and compositions of the “retained database” (light shading) and “retained file” (dark shading) by qPCR (Eq. (8)). The retained database had higher levels of files B and C compared to A, as some of the file A strands were removed in the magnetic separation. The retained file contained mostly file A strands, with minimal B or C. The best net total amount of file A recovered from the retained database and retained file was approximately 90% of what was originally in the database. The high retention rate of file A suggested that a file could be re-accessed multiple times. We tested this by repeatedly accessing file A five times, and measured the amounts and compositions of files A, B, and C in the database after each access (Fig. 3c & Supplementary Fig. 4c). As expected, the overall amounts of files B and C were maintained at relatively stable levels in the database. Approximately 50% of file A strands remained after five accesses. The practical implication for DNA storage systems is that only 2 copies of each distinct sequence are needed in the initial database for every 5 times it is
[Fig. 3 image, panels a–c. Panel a: file A is separated from the database (files A, B, C) with oligo A′, in vitro transcribed by RNAP into RNA, and reverse transcribed (RT) into cDNA; the retained database and retained file DNA are measured by qPCR. Panels b and c: retention rate plots.]
Fig. 3 DORIS mimics natural transcription to repeatedly access information. a File A was separated using non-PCR-based magnetic separation while the
database was recovered (Retained Database) (n = 3 for each condition). T7-based in vitro transcription was performed directly on the bead-immobilized
file for up to 48 h to generate RNA. Reverse transcription converted the RNA to complementary DNA (cDNA) while the immobilized file A was released
back into the database (Retained File) (n = 3 for each condition). b The amount of retained database (light shading) and retained file (dark shading) after
file A was accessed by oligo A’ was measured by qPCR and plotted as a percentage of the original amount of each file that was in the database. The
specificity of file access is evident by the absence of file B and C in the Retained File. The presence of T7 RNA polymerase (RNAP) did not affect the
retention of file A. c File A was repeatedly accessed 5 times. The amounts of file A, B and C in the database were measured by qPCR and plotted as the
amount of each file in the database after each run (n = 3 for each condition), normalized to the original amount of each file prior to the 1st access. Values
represent the arithmetic mean. Error bars are s.d., n = the number of replicate file accesses. Source data are provided as a Source Data file.
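The copy-number arithmetic implied by Fig. 3c can be worked through in a short sketch. It assumes, as a simplification not stated in the paper, that the same fraction of file strands is retained on every access (geometric loss), and it ignores strand distribution effects. The function names are hypothetical.

```python
def per_access_retention(fraction_remaining, n_accesses):
    """Per-access retention r such that r ** n_accesses == fraction_remaining."""
    return fraction_remaining ** (1.0 / n_accesses)


def doris_copies_needed(retention, n_accesses):
    """Initial copies so that, on average, one copy remains after n accesses."""
    return 1.0 / (retention ** n_accesses)


def pcr_copies_needed(n_accesses):
    """PCR-based access consumes one copy of each sequence per access."""
    return n_accesses


# About 50% of file A strands remained after five accesses.
r = per_access_retention(0.5, 5)
print(round(r, 3))                          # 0.871
print(round(doris_copies_needed(r, 5), 2))  # 2.0
print(pcr_copies_needed(5))                 # 5
```

Under this assumption, roughly 87% of the file survives each access, which is how 2 initial copies can cover 5 accesses, compared with 5 copies for aliquot-based PCR access.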
[Fig. 4 image, panels a–c. Panel a: template layout (forward primer, T7 promoter, data payload, reverse primer), creation of ss-dsDNA, in vitro transcription, and RT-PCR, with gels of ss-dsDNA, RNA, and dsDNA for templates 1–6; legible lane labels: 2. 160 nt (123 nt), 3. 140 nt (103 nt), 4. 130 nt (93 nt), 5. 120 nt (83 nt), 6. 110 nt (73 nt). Panel b: x-axis is IVT length (h), 0–48.]
Fig. 4 T7-based transcription generates uniformly sized products. a Six ssDNA oligos with different lengths were designed to generate six ss-dsDNA
templates with lengths of 180 bp, 160 bp, 140 bp, 130 bp, 120 bp and 110 bp, respectively. Each ss-dsDNA comprised a consensus reverse primer binding
sequence, T7 primer binding sequence, forward primer binding sequence, and a payload sequence with varying lengths. These ss-dsDNA templates were
in vitro transcribed for 8 h, followed by RT-PCR. Product sizes were examined by agarose gel electrophoresis. b IVT time course for up to 48 h (n = 3
replicate IVT reactions for each condition). The amounts of RNA and DNA template molecules were measured by NanoDrop and plotted as their ratio.
c Gel electrophoresis of RNA and dsDNA products after 2–48 h of IVT followed by RT-PCR. Plotted values represent the arithmetic mean, and error bars
represent the s.d., of three independent IVT reactions. Gel images are representative of three independent experiments measured by RT-QPCR. Source
data are provided as a Source Data file.
accessed (ignoring the effects of strand distributions). This is an improvement over PCR-based file access, where small aliquots of the database are taken and amplified. In this case, one copy of each distinct sequence is needed for each access; furthermore, unlike in DORIS, all of the other database files will be similarly reduced in abundance even if they were not accessed. Thus, DORIS may extend the lifespan of DNA databases and allow for more frequent access for the same total mass of DNA synthesized.

We next asked how the IVT reaction might affect database stability, as it is performed at an elevated temperature of 37 °C and could degrade the ss-dsDNA. While the retained database is not exposed to the IVT, the accessed file is, and the amount of ss-dsDNA retained could be affected by the length of the IVT. Indeed, while the presence of RNA polymerase itself had no effect on the retained file, the length of IVT time did decrease the amount of retained file (Fig. 3b & Supplementary Fig. 4a). Interestingly, reannealing the retained file at 45 °C and allowing it to cool back to room temperature improved the retention rate, but longer IVT times still reduced overall file retention (Supplementary Fig. 4b). This suggests that some loss is due to the file strands unbinding from the bead-linked oligos or RNAs competing with ss-dsDNA, while some loss is due to DNA degradation. As a control to confirm that ss-dsDNA was not contaminating the cDNA generated from the transcribed RNA, we verified that cDNA was obtained only when RNA polymerase was included in the IVT reaction (Supplementary Fig. 4d).

We next focused on assessing the quality and efficiency of the IVT. To check if RNA polymerase might be creating undesired truncated or elongated transcripts, we ordered a series of six ssDNAs with a range of lengths spanning 110–180 nt (Fig. 4a & Supplementary Fig. 5). These were converted into ss-dsDNA, transcribed into RNA, and reverse transcribed and amplified into dsDNA. Clear uniform bands were seen for the ss-dsDNA, RNA, and dsDNA. Increasing IVT time did increase the yield of RNA for all templates (Fig. 4b), although just 2 h was sufficient to obtain clear RNA bands (Fig. 4c), and IVT time did not affect the length of the RNA generated. In summary, information can be repeatedly accessed from ss-dsDNAs by oligo-based separation and IVT.

Transcription can be tuned by promoter sequence. Recent works on molecular information storage have demonstrated the utility of storing additional information in the composition of mixtures of distinct molecules, including DNA32,33. As the information accessed by DORIS relies on the T7 RNA polymerase, and there is evidence that T7 promoter variants can affect transcription efficiency34–38, we asked whether the yield of T7-based transcription could be modulated by specific nucleotide sequences around the T7 promoter region while keeping the promoter itself constant to allow for one-pot ss-dsDNA generation (Fig. 2a, b). To comprehensively address this question, we designed and ordered 1088 distinct 160 nt strands as an oligo
pool. The first 1024 strands contained all possible 5 nt variant sequences upstream of the promoter sequence (NNNNN-Promoter, where N is each of the four nucleotides), and the latter 64 sequences were all 3 nt variant sequences downstream of the promoter (Promoter-NNN, Fig. 5a). As the NNNNN nucleotides were located in the ssDNA overhang, we also asked if this region being single stranded versus double stranded had any impact on relative transcriptional efficiencies. We first created ss-dsDNA by primer extension and dsDNA by PCR of the ssDNA oligo pool. Both ss-dsDNA and dsDNA databases were processed with IVT at 37 °C for 8 h, followed by RT-PCR and next-generation sequencing. Short barcodes were designed in the payload region to identify which promoter variant each sequenced transcript was derived from.

The abundance of each distinct transcript sequence was normalized to its abundance in the original ss-dsDNA (Fig. 5b) or dsDNA (Supplementary Fig. 6a) database (Eq. (9)). A broad and nearly continuous range of normalized abundances was obtained, indicating that this approach could be harnessed to create complex compositional mixtures of DNA in the future. To determine if there may be simple design principles that describe promoter efficiency, we segmented the 1088 sequences into quartiles based on transcript abundance and imported the data into the WebLogo tool39. We found that G or A at the 5th position directly upstream and C or T at the 3rd position directly downstream of the T7 promoter generally resulted in the highest RNA abundances (Fig. 5c). Segmenting the data by A/T content showed that there was a slight preference for ~50% A/T content upstream of the T7 promoter and a preference for overall low A/T content downstream of the T7 promoter (Fig. 5d).

This next-generation sequencing experiment also provided confidence that DORIS is scalable to large and complex ss-dsDNA pools. Furthermore, error analysis of the sequencing reads indicated no systematic deletions, truncations, or substitutions, and overall error levels were well below those already present from DNA synthesis (Fig. 5e).

DORIS enables in-storage file operations. Many inorganic information storage systems, even cold storage archives, maintain the ability to dynamically manipulate files. Similar capabilities in DNA-based systems would significantly increase their value and competitiveness. ssDNA overhangs have previously been used to execute computations in the context of toehold switches40–43, and we therefore hypothesized they could be used to implement in-storage file operations. As a proof of principle, we implemented locking, unlocking, renaming, and deleting files and showed these operations could be performed at room temperature (Fig. 6).

We started with the three-file database and tested the ability of a biotin-linked oligo A’ to bind and separate file A at a range of temperatures from 25 to 75 °C (Fig. 6a, bottom, no lock). Roughly 50% of file A strands were successfully separated from the database. To lock file A, we separated file A from the three-file database and mixed in a long 50 nt ssDNA (lock) that had a 20 nt sequence complementary to the ssDNA overhang of file A. With the lock in place, oligo A’ was no longer able to separate the file except at temperatures above 45 °C (Fig. 6a, bottom, no key), presumably because the lock was melted from the overhang, allowing oligo A’ to compete to bind the overhang. To unlock the file, we added the key, a 50 nt ssDNA fully complementary to the lock. We tested different unlocking temperatures and found the key was able to remove the lock at […] oligo A’ = 1: 10: 10: 15) to minimize off-target separation and ensure proper locking. We did observe that the temperature at which the lock was added influenced the fidelity of the locking process. At 98 °C, the locking process worked well. When the lock was added at 25 °C, there was leaky separation even when no key was added (Supplementary Fig. 7). This may be due to secondary structures preventing some file A strands from hybridizing with locks at low temperatures. Fortunately, locking at 45 °C had reasonable performance, thus avoiding the need to elevate the system to 98 °C. In the context of a future DNA storage system, files could first be separated, then locked at an elevated temperature, then returned to the database, thus avoiding exposure of the entire database to elevated temperatures. The entire process could otherwise be performed at room temperature.

We also implemented file renaming and deletion. To rename a file with address A to have address B, we mixed file A with a 40 nt ssDNA that binds to A, with the resultant overhang being address B (Fig. 6b). We added all components at ratios similar to the locking process (file: renaming oligo: accessing oligo = 1: 10: 15), and the renaming oligo was added at 45 °C. We then tested how many file strands each oligo A’, B’, or C’ could separate and found that the renaming process completely blocked oligos A’ or C’ from separating out the file (Fig. 6b, bottom). Only oligo B’ was able to separate the file, suggesting that almost all strands were successfully renamed from A to B. Similarly, we successfully renamed file A to C. Based on the ability of oligos to rename files with near 100% completion, we hypothesized and indeed found that a short 20 nt oligo fully complementary to A could be used to completely block the overhang of file A and essentially delete it from the database (Fig. 6b, bottom). A file could also simply be extracted from a database to delete it. However, this alternative form of blocking-based deletion suggests one way to ensure that any leftover file strands that were not completely extracted would not be spuriously accessed in the future.

Discussion

As DNA-based information storage systems approach practical implementation44,45, scalable molecular advances are needed to dynamically access information from them. DORIS represents a proof-of-principle framework for how inclusion of a few simple innovations can fundamentally shift the physical and encoding architectures of a system. In this case, ss-dsDNA strands drive multiple powerful capabilities for DNA storage: (1) they provide a physical handle for files and allow files to be accessed isothermally; (2) they increase the theoretical information density and capacity of DNA storage by inhibiting non-specific binding within data payloads and reducing the stringency and overhead of encoding; (3) they eliminate intractable computational challenges associated with designing orthogonal sets of address sequences; (4) they enable repeatable file access via in vitro transcription; (5) they provide control of relative strand abundances; and (6) they make possible in-storage file operations. We envision other innovative architectures and capabilities may be on the horizon given rapid advances in DNA origami46, molecular handles47–49, and molecular manipulations developed in fields such as synthetic biology50.

Beyond the specific capabilities enumerated above, one of the greatest benefits we envision DORIS providing is compatibility with future miniaturized and automated devices44,45. In particular, DORIS can operate isothermally and function at or close to
room temperature with the same efficiency as at higher room temperature for all steps. This has potential advantages for
temperatures. This is likely due to the long 30 nt toehold maintaining DNA integrity and database stability while also
presented by the lock, allowing the key to unzip the lock from file simplifying the design of future automated DNA storage devices.
A. We also optimized the relative molar ratios (file A: lock: key: In addition, a single DNA database sample can be reused,
[Fig. 5 graphic not reproduced. Panels: a, design of the NNNNN-T7 (n = 1024) and T7-NNN (n = 64) 160 nt oligo pools and the workflow (1. in vitro transcription, 2. RT-PCR, 3. next-generation sequencing); b, normalized abundance versus ID of NNNNN (1–1024) and ID of NNN (1–64); c, sequence logos (bits versus nucleotide sequence position) for the 1st to 4th abundance quartiles of NNNNN-T7 and T7-NNN; d, normalized abundance versus AT%; e, % errors (deletions, insertions, substitutions) versus DNA sequence position.]
Fig. 5 T7-based transcription efficiency can be controlled by surrounding sequences. a An oligo pool that had 1088 distinct sequences was designed to
generate ss-dsDNA templates. The first 1024 sequences contained all possible combinations of nucleotides upstream of the promoter sequence (NNNNN-
T7, where N is one of four DNA nucleotides), whereas the latter 64 sequences had all possible combinations of nucleotides downstream to the promoter
region (T7-NNN). Each sequence contained a barcode to identify the sequence of the variant nucleotides. The template ss-dsDNAs were processed with
IVT for 8 h, followed by RT-PCR and next-generation sequencing (n = 3 for each condition). b Transcription efficiencies of both sequence designs were
plotted by normalizing the read count of each transcribed strand to its abundance in the original library. The data was organized from lowest to highest
normalized abundance for both designs. c The sequences were further divided into four quartiles based upon normalized transcript abundance and
analyzed by the WebLogo tool. d The normalized abundance of each sequence was organized by A/T percentage. P values between each group were
calculated using One-Way ANOVA with Tukey–Kramer post-hoc and listed here for statistical significance. NNNNN-T7: p values less than 0.01 for
comparisons between 0%–100%, 80%–100% and 20%–80%; p values less than 0.001 for comparisons between 20%–100%, 40%–80%, 40%–100%,
60%–80% and 60%–100%; T7-NNN, p values less than 0.05 for comparisons between 33%–100%, 0%–100% and 0%–66%. e The percent error for
each DNA sequence position for the original synthesized database (left) and transcribed database (right). The error rate was calculated by dividing the
number of errors of a given type occurring at a nucleotide position by the total number of reads for that sequence (Supplementary Method). Plotted values
represent the arithmetic mean, and error bars represent the s.d., of three independent IVT-RT-PCR-NGS samples. Source data are provided as a Source
Data file.
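The pool sizes and groupings described in the caption follow directly from enumerating the variant positions, which a short Python sketch can make concrete. The function names and the read counts below are hypothetical, and rescaling the ratios so that the largest equals 1 is an assumption made to mirror the 0–1 axis of the plots; it is not taken from the authors' analysis pipeline.

```python
from itertools import product

BASES = "ACGT"


def all_variants(n):
    """Return every length-n DNA sequence (4**n sequences)."""
    return ["".join(p) for p in product(BASES, repeat=n)]


def at_percent(seq):
    """Percentage of positions in seq that are A or T."""
    return 100.0 * sum(base in "AT" for base in seq) / len(seq)


def normalized_abundance(transcript_reads, library_reads):
    """Transcript read count divided by library read count, per variant.

    Assumption: ratios are rescaled so the largest equals 1.
    """
    ratios = {
        variant: transcript_reads.get(variant, 0) / count
        for variant, count in library_reads.items()
        if count > 0  # skip variants absent from the sequenced library
    }
    top = max(ratios.values(), default=0)
    if top == 0:
        return {variant: 0.0 for variant in ratios}
    return {variant: ratio / top for variant, ratio in ratios.items()}


upstream = all_variants(5)    # NNNNN-T7 design
downstream = all_variants(3)  # T7-NNN design
print(len(upstream), len(downstream), len(upstream) + len(downstream))

# A/T-content groups available for each design.
print(sorted({at_percent(s) for s in upstream}))
print(sorted({round(at_percent(s)) for s in downstream}))

# Hypothetical read counts, for illustration only.
library = {"AAA": 100, "CCC": 100, "GAT": 50}
transcripts = {"AAA": 50, "CCC": 10, "GAT": 25}
print(normalized_abundance(transcripts, library))
```

Five variable positions give six possible A/T percentages and three positions give four, which matches the groups compared in panel d.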
[Fig. 6 graphic not reproduced. Panels: a, file operation: lock and unlock (schematic, and separation efficiency for No-lock, No-key, and key added at 98–14, 77–14, 55–14, 35–14, and 25–14 °C, with access at 45 °C and 75 °C); b, file operation: rename and delete (schematic, and separation efficiency by oligos A′, B′, and C′ for no modification, rename A -> B, rename A -> C, and delete).]
Fig. 6 Toeholds enable in-storage file operations. a (Top) Schematic of locking and unlocking in-storage file operations. (Bottom) Attempts to access file
A by DORIS without locking (No-Lock), with locking but without a key (No-Key), or with locking and key added at different temperatures (orange) (n = 3
for each condition). The lock was added at 98 °C. The key was added at different temperatures (orange) and then cooled to 14 °C (n = 3 for each
condition). Oligo A’ was added at different access temperatures of 25, 35, 45, or 75 °C for 2 min, followed by a temperature drop of 1 °C/min to 25 °C (n =
3 for each condition). Separation efficiency is the amount of file A recovered relative to its original quantity, as measured by qPCR. b (Top) Schematic of
rename and delete operations. File A was modified by renaming or deleting oligos. (Bottom) The completion of each operation was tested by measuring
how much of the file was separated by each individual oligo: A’, B’, or C’. Separation efficiency is the amount of file A separated relative to its original
amount in the database, as measured by qPCR. No Mod (No file modification/operation). Plotted values represent the arithmetic mean, and error bars
represent the s.d., of three independent replicate file operations/separations. Source data are provided as a Source Data file.
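The strand lengths behind the lock and unlock operations in Fig. 6a can be checked with a small sketch: a 50 nt lock with 20 nt complementary to the file overhang leaves a 30 nt toehold, and the 50 nt key is the full reverse complement of the lock. Both sequences below are hypothetical placeholders rather than the sequences used in the study, and placing the toehold at the 3′ end of the lock is an assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical 20 nt ssDNA overhang (address) of file A.
OVERHANG_A = "ACGTTGCAAGCTTAGGCATC"
# Hypothetical 30 nt toehold carried by the lock.
TOEHOLD = "TTGACCATGGTACCTAGGAATTCCGATCGA"


def make_lock(overhang, toehold):
    """Lock = region complementary to the overhang + free toehold.

    Assumption: the toehold is appended at the 3' end of the lock.
    """
    return reverse_complement(overhang) + toehold


def make_key(lock):
    """Key = strand fully complementary to the lock."""
    return reverse_complement(lock)


lock = make_lock(OVERHANG_A, TOEHOLD)
key = make_key(lock)

print("lock length:", len(lock))
print("key length:", len(key))
print("free toehold on bound lock:", len(lock) - len(OVERHANG_A))
```

Because the key pairs with all 50 nt of the lock while the file overhang pairs with only 20 nt, the key can start binding at the exposed toehold and displace the lock from the file.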
extending the lifespan of storage systems. It is also intriguing to consider what other types of in-storage operations like lock & unlock can be implemented to offer unique unforeseen capabilities in the future. All of these features lend DORIS to be easily translated to systems with automated fluid handling and magnetic actuation.
DORIS is also a fundamentally scalable system. The creation of ss-dsDNA strands is simple and high throughput, it is compatible with existing file system architectures including hierarchical addresses11,17, and it facilitates scaling of capacity. While the need to include the T7 promoter in every strand does occupy valuable data payload space, it is a worthwhile tradeoff: the T7 promoter decreases data density and capacity in a linear fashion, yet it more than compensates by simultaneously improving both metrics exponentially by allowing many sequences to appear in the data payload that normally would have to be avoided in PCR-based systems (or conversely by allowing the full set of mutually non-conflicting addresses to be used)11,17. It is also important to note that DORIS may help solve scalability issues with the encoding process as well: cross-comparing all address sequences with all data payload sequences is computationally intractable, but the need to do this is eliminated with DORIS as addresses will not physically interact with data payload sequences. Future work may assess how DORIS and other physical innovations may alter and reduce the stringency of encoding and error correction algorithms and subsequently benefit system density and capacity.
Furthermore, with the insights into the impact of the sequence space surrounding the T7 promoter on transcriptional yield, an additional layer of information could be stored in the quantitative composition of DNA mixtures.
Of course, as with all information storage systems, there are challenges and questions regarding the efficiency and accuracy of each technology that will be important to address prior to commercial implementation. For example, future work might assess how each step of DORIS performs in the context of increasingly diverse and dense pools of strands, both in terms of efficiency and information retrieval error rates. In particular, new materials, RNA polymerase enzymes, and the optimization of reaction conditions could improve DNA recovery percentages to drive DORIS towards a fully reusable system. Devoting resources and attention to such optimizations needs to be balanced with the fact that the field of molecular information storage is nascent and that there are likely a wide range of new capabilities and physical innovations that could be explored and introduced into the field.
Finally, we believe this work motivates a merging of work in the fields of DNA computation, synthetic biology, and DNA storage. In-storage computation and file operations could increase the application space of DNA storage, or identify cutting-edge application areas, such as in the highly parallel processing of extreme levels of information (e.g., medical, genomic, and financial data). DORIS complements and harnesses the benefits of prior work while providing a feasible path towards future systems with advanced capabilities.

Methods
Creation of ss-dsDNA strands. ss-dsDNA strands were created by filling in ssDNA templates (IDT DNA) with primer TCTGCTCTGCACTCGTAATAC (Eton Bioscience) at a ratio of 1:40 using 0.5 µL of Q5 High-Fidelity DNA Polymerase (NEB, M0491S) in a 50 µL reaction containing 1x Q5 polymerase reaction buffer (NEB, B9072S) and 2.5 mM each of dATP (NEB, N0440S), dCTP (NEB, N0441S), dGTP (NEB, N0442S), dTTP (NEB, N0443S). The reaction conditions were 98 °C for 30 s and then 4 cycles of: 98 °C for 10 s, 53 °C (1 °C s⁻¹ temperature drop) for 20 s, 72 °C for 10 s, with a final 72 °C extension step for 2 min. ss-dsDNA strands were purified using AMPure XP beads (Beckman Coulter, A63881) and eluted in 20 µL of water.

File separations. Oligos were purchased with a 5′ biotin modification (Eton Bioscience, Supplementary Table 1). ss-dsDNA strands were diluted to 10¹¹ strands and mixed with biotinylated oligos at a ratio of 1:40 in a 50 µL reaction containing 2 mM MgCl2 (Invitrogen, Y02016) and 50 mM KCl (NEB, M0491S). Oligo annealing conditions were 45 °C for 2 min, followed by a temperature drop at 1 °C/min to 14 °C. Streptavidin magnetic beads (NEB, S1420S) were prewashed using high salt buffer containing 20 mM Tris-HCl, 2 M NaCl and 2 mM EDTA pH 8 and incubated with ss-dsDNA strands at room temperature for 30 min. The retained database was recovered by collecting the supernatant of the separation. The beads were washed with 100 µL of high salt buffer and used directly in the in vitro transcription reaction. After transcription, the beads with the bound files were washed twice with 100 µL of low salt buffer containing 20 mM Tris-HCl, 0.15 M NaCl and 2 mM EDTA pH 8 and subsequently eluted with 95% formamide (Sigma, F9037) in water. The quality and quantity of the DNA in the retained database and file were measured by quantitative real-time PCR (Bio-Rad).

To generate ample product for gel electrophoresis analyses, the resultant cDNA was diluted 100-fold, and 1 µL was used as the template in a PCR amplification containing 0.5 µL of Q5 High-Fidelity DNA Polymerase (NEB, M0491S), 1x Q5 polymerase reaction buffer (NEB, B9072S), 0.5 µM of forward and reverse primer, 2.5 mM each of dATP (NEB, N0440S), dCTP (NEB, N0441S), dGTP (NEB, N0442S), dTTP (NEB, N0443S) in a 50 µL total reaction volume. The amplification conditions were 98 °C for 30 s and then 25 cycles of: 98 °C for 10 s, 55 °C for 20 s, 72 °C for 10 s with a final 72 °C extension step for 2 min. The products were assayed by gel electrophoresis and their concentrations were measured by Fragment Analyzer HS NGS Fragment Kit (Agilent Technologies Inc., DNF-474-0500).

Locking and unlocking. Lock and key strands were purchased from Eton Biosciences. To lock the file, purified ss-dsDNA strands were mixed with lock strands at a molar ratio of 1:10 in a 25 µL reaction containing 2 mM MgCl2 and 50 mM KCl. The mixture was annealed to 98 °C, 45 °C or 25 °C for 2 min, followed by a temperature drop at 1 °C/min to 14 °C. To unlock the file, key strands were added into the locked file mixture at a molar ratio of 10:1 to the original ss-dsDNA strand amount. The mixtures were annealed to 98, 77, 55, 35, or 25 °C for 2 min, followed by a temperature drop at 1 °C/min to 14 °C. To access the unlocked strands, file-specific biotin-modified oligos were added into the mixture at a ratio of 15:1 to the original ss-dsDNA strand amount supplemented with additional MgCl2 and KCl to a final concentration of 2 mM and 50 mM, respectively, in a 30 µL reaction.

Renaming and deleting. ss-dsDNA strands were mixed with renaming or deleting oligos at a ratio of 1:20 in a 25 µL reaction containing 2 mM MgCl2 and 50 mM KCl. The mixture was heated to 35 °C for 2 min, followed by a temperature drop at 1 °C/min to 14 °C. To delete the file, oligos were mixed with purified target file strands at a ratio of 1:20.
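The protocols above specify reagents as strand counts and molar ratios, so it can help to convert these into amounts and concentrations. The sketch below does this for the file separation step, assuming the strand count reads as 10^11 strands in a 50 µL reaction with a 1:40 strand-to-oligo ratio. The helper names are hypothetical and the calculation is ordinary unit conversion with Avogadro's number, not a procedure taken from the paper.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_to_pmol(n_molecules):
    """Convert a molecule count to picomoles."""
    return n_molecules / AVOGADRO * 1e12


def concentration_nM(n_molecules, volume_uL):
    """Molar concentration (nM) of n_molecules in volume_uL microliters."""
    moles = n_molecules / AVOGADRO
    liters = volume_uL * 1e-6
    return moles / liters * 1e9


def partner_molecules(n_strands, ratio):
    """Molecules of oligo needed for a 1:ratio strand-to-oligo mix."""
    return n_strands * ratio


strands = 1e11                            # assumed reading of the strand count
oligos = partner_molecules(strands, 40)   # 1:40 ratio from the text
volume = 50                               # reaction volume in µL

print(f"strands: {molecules_to_pmol(strands):.3f} pmol, "
      f"{concentration_nM(strands, volume):.2f} nM")
print(f"oligos:  {molecules_to_pmol(oligos):.2f} pmol, "
      f"{concentration_nM(oligos, volume):.1f} nM")
```

The same helpers apply to the other ratios quoted in the protocols (1:10 for lock strands, 10:1 for key strands, 15:1 for access oligos, 1:20 for renaming or deleting oligos) by changing `ratio` and `volume`.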
In vitro transcription. Immobilized ss-dsDNA strands bound on the magnetic beads were mixed with 30 µL of in vitro transcription buffer (NEB, E2050) containing 2 µL of T7 RNA Polymerase Mix and ATP, TTP, CTP, GTP, each at 6.6 mM. The mixture was incubated at 37 °C for 8, 16, 32, and 48 h, followed by a reannealing process where the temperature was reduced to 14 °C at 1 °C/min to enhance the retention of ss-dsDNA on the beads. The newly generated RNA transcripts were separated from the streptavidin magnetic beads and their quantity measured using the Qubit RNA HS Assay Kit (Thermo Fisher, Q32852) and Fragment Analyzer Small RNA Kit (Agilent Technologies Inc., DNF-470-0275).

Gel electrophoresis for DNA. Agarose-based DNA gels were made by mixing and microwaving 100 mL of 1x LAB buffer containing 10 mM Lithium acetate dihydrate pH 6.5 (VWR, AAAA17921-0B) and 10 mM Boric acid (VWR, 97061–974) with 1.5 mg of molecular biology grade agarose (GeneMate, 490000–002). 0.1x SYBR Safe DNA Gel Stain was added to visualize DNA (Invitrogen, S33102). DNA samples and ladder (NEB, N3231S) were loaded with 1x DNA loading dye containing 10 mM EDTA, 3.3 mM Tris-HCl (pH 8.0), 0.08% SDS and 0.02% Dye 1 and 0.0008% Dye 2 (NEB, B7024S). Electrophoresis was performed with 1x LAB buffer in a Thermo Scientific Mini Gel Electrophoresis System (Fisher Scientific, 09–528–110B) at a voltage gradient of 25 V/cm for 20 min.

Real-time PCR (qPCR). qPCR was performed in a 6 µL, 384-well plate format using SsoAdvanced Universal SYBR Green Supermix (BioRad, 1725270). The amplification conditions were 95 °C for 2 min and then 50 cycles of: 95 °C for 15 s, 53 °C for 20 s, and 60 °C for 20 s. Quantities were interpolated from the linear ranges of standard curves performed on the same qPCR plate.

Poly A tailing and template elongation. The NNN sequences in Fig. 5 and Supplementary Fig. 6 are captured in the cDNA samples. However, they cannot be immediately amplified in preparation for next-generation sequencing as a common PCR primer pair is not available. Therefore, cDNA was A-tailed with terminal transferase under the following reaction conditions: 5.0 µL of 10x TdT buffer, 5.0 µL of 2.5 mM CoCl2 solution provided with the buffer, 5.0 pmol of the amplified cDNA samples, 0.5 µL of 10 mM dATP (NEB, N0440S), and 0.5 µL of terminal transferase (20 units/µL) (NEB, M0315S) in a 50 µL total reaction volume. The mixture was incubated at 37 °C for 30 min, and then 70 °C for 10 min to deactivate the enzyme. The A-tailed samples were further amplified using the primers provided in Supplementary Table 1 to extend the length of each sequence for optimal next-generation sequencing. The PCR reaction used the following recipe: 0.5 µL of Q5 High-Fidelity DNA Polymerase (NEB, M0491S), 1x Q5 polymerase reaction buffer (NEB, B9072S), 0.5 µM of forward and reverse primer, 2.5 mM each of dATP (NEB, N0440S), dCTP (NEB, N0441S), dGTP (NEB, N0442S) and dTTP (NEB, N0443S) in a 50 µL total reaction volume. The amplification conditions were 98 °C for 30 s, 25 cycles of: 98 °C for 10 s, 55 °C for 20 s, 72 °C for 10 s, with a final 72 °C extension step for 2 min. The products were assayed by gel electrophoresis.
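The qPCR section above states that quantities were interpolated from the linear ranges of standard curves run on the same plate. A minimal sketch of that kind of interpolation follows, assuming the usual linear relationship between quantification cycle (Cq) and log10 of the starting quantity. The standards, Cq values, and function names are hypothetical, and the authors' actual analysis software and settings are not described in this section.

```python
import math


def fit_standard_curve(quantities, cq_values):
    """Least-squares fit of Cq = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def interpolate_quantity(cq, slope, intercept, cq_min, cq_max):
    """Quantity for a sample Cq, refusing to extrapolate past the standards."""
    if not cq_min <= cq <= cq_max:
        raise ValueError("Cq lies outside the range covered by the standards")
    return 10 ** ((cq - intercept) / slope)


# Hypothetical dilution series (strand counts) and measured Cq values.
standards = [1e3, 1e4, 1e5, 1e6, 1e7]
cqs = [29.5, 26.0, 22.5, 19.0, 15.5]

slope, intercept = fit_standard_curve(standards, cqs)
sample = interpolate_quantity(24.25, slope, intercept, min(cqs), max(cqs))
print(f"slope={slope:.2f}, intercept={intercept:.2f}, sample={sample:.0f} strands")
```

Rejecting Cq values outside the standards is one simple way to stay within the linear range; a real analysis would also check the fit quality of the standards themselves.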
Gel electrophoresis for RNA. All equipment was cleaned by 10% bleach (VWR, 951384) and RNaseZap (Fisher Scientific, AM9780) to minimize nuclease contamination, particularly ribonuclease (RNase) contamination. The following procedures were performed in a PCR workstation with sterile pipetting equipment to further reduce ribonuclease contamination. Agarose-based RNA gels were cast by mixing and microwaving 100 mL of 1x TAE buffer containing 0.04 M Tris-Acetate and 0.001 M EDTA pH 8.0 with 1.5 mg of molecular biology grade agarose (GeneMate, 490000–002). 0.1x of SYBR Safe Gel Stain (Invitrogen, S33102) was added to visualize the RNA. RNA samples were treated with 2 units DNase I (NEB, M0303S) and incubated at 37 °C for 10 min, followed by a purification process using Monarch RNA Cleanup Kit (NEB, T2030S). The purified samples and RNA ladder (NEB, N0364S) were mixed with 1x RNA loading dye containing 47.5% Formamide, 0.01% SDS, 0.01% bromophenol blue, 0.005% Xylene Cyanol and 0.5 mM EDTA (NEB, B0363S). The mixtures were heated at 65 °C for 10 min, followed by immediate cooling on ice for 5 min. RNA electrophoresis was performed at a voltage gradient of 15 V/cm for 45 min.

Next-generation sequencing. Amplicons were purified with AMPure XP beads (Beckman Coulter, A63881) according to the TruSeq Nano protocol (Illumina, 20015965). The quality and band sizes of libraries were assessed using the High Sensitivity NGS Fragment Analysis Kit (Advanced Analytical, DNF-474) on the 12 capillary Fragment Analyzer (Agilent Technologies Inc.). The prepared samples were submitted to Genewiz Inc. for Illumina-based next-generation sequencing (Amplicon-EZ). Ligation of Illumina sequencing adapters to the prepared samples was performed by Genewiz Inc. Next-generation sequencing data were analyzed as described in Supplementary Fig. 8.

File specificity. In Fig. 2, we calculated file specificity by the following equation

$$\text{File Specificity (\%)} = \frac{\text{Amount of strands of a specific file in a solution}}{\text{Total number of strands in the same solution}} \tag{1}$$
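Both the file specificity (Eq. 1) and the separation efficiency (Eq. 2) are ratios of qPCR-derived strand quantities reported as percentages. The sketch below writes them as two small functions. The function names and the example quantities are hypothetical, and the factor of 100 is made explicit here on the assumption that the "(%)" in the equation labels implies it.

```python
def file_specificity(file_strands, total_strands):
    """Eq. 1: share of a solution's strands that belong to one file (%)."""
    if total_strands <= 0:
        raise ValueError("total_strands must be positive")
    return 100.0 * file_strands / total_strands


def separation_efficiency(eluted_amount, amount_before):
    """Eq. 2: amount of a file eluted from the beads relative to its
    amount in the database before separation (%)."""
    if amount_before <= 0:
        raise ValueError("amount_before must be positive")
    return 100.0 * eluted_amount / amount_before


# Hypothetical qPCR-derived strand counts for a three-file database.
database = {"A": 1.0e11, "B": 1.0e11, "C": 1.0e11}
separated = {"A": 5.0e10, "B": 2.0e9, "C": 3.0e9}

print(f"specificity for A: "
      f"{file_specificity(separated['A'], sum(separated.values())):.1f}%")
print(f"separation efficiency for A: "
      f"{separation_efficiency(separated['A'], database['A']):.1f}%")
```

The two metrics answer different questions: specificity describes the purity of the separated sample, while efficiency describes how much of the target file was recovered.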
Gel imaging. Fluorescence imaging of both DNA and RNA gel samples was performed with a Li-Cor Odyssey® Fc Imaging System and the fluorescence intensity was quantified using FIJI software.

Reverse transcription. First-strand synthesis was generated by mixing 5 µL of RNA with 500 nM of reverse primer in a 20 µL reverse transcription reaction (Bio-Rad, 1708897) containing 4 µL of reaction supermix, 2 µL of GSP enhancer solution, and 1 µL of reverse transcriptase. The mixture was incubated at 42 °C for 30 or 60 min, followed by a deactivation of the reverse transcriptase at 85 °C for 5 min.

Separation efficiency. In Fig. 3, Supplementary Figs. 2, 3, and 7 we calculated the separation efficiency by the following equation.

$$\text{Separation Efficiency (\%)} = \frac{\text{A, B, or C in the separated sample eluted from bead}}{\text{Amount of A, B, or C in database before separation}} \tag{2}$$

Theoretical thermodynamic calculations. To theoretically estimate the fraction of bound oligos of various lengths and at different temperatures (Supplementary
31. Bosnes, M. et al. Solid-phase in vitro transcription and mRNA purification using Dynabeads™ superparamagnetic beads. in 5th International mRNA Health Conference (2017).
32. Arcadia, C. E. et al. Multicomponent molecular memory. Nat. Commun. 11, 691 (2020).
33. Anavy, L., Vaknin, I., Atar, O., Amit, R. & Yakhini, Z. Data storage in DNA with fewer synthesis cycles using composite DNA letters. Nat. Biotechnol. 37, 1229–1236 (2019).
34. Komura, R., Aoki, W., Motone, K., Satomura, A. & Ueda, M. High-throughput evaluation of T7 promoter variants using biased randomization and DNA barcoding. PLoS One 13, e0196905 (2018).
35. Gong, P. & Martin, C. T. Mechanism of instability in abortive cycling by T7 RNA polymerase. J. Biol. Chem. 281, 23533–23544 (2006).
36. Tang, G.-Q., Bandwar, R. P. & Patel, S. S. Extended upstream A-T sequence increases T7 promoter strength. J. Biol. Chem. 280, 40707–40713 (2005).
37. Kapanidis, A. N. et al. Initial transcription by RNA polymerase proceeds through a DNA-scrunching mechanism. Science 314, 1144–1147 (2006).
38. Potapov, V. et al. Base modifications affecting RNA polymerase and reverse transcriptase fidelity. Nucleic Acids Res. 46, 5753–5763 (2018).
39. Crooks, G. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).
40. Dalchau, N. et al. Computing with biological switches and clocks. Nat. Comput. 17, 761–779 (2018).
41. Spaccasassi, C., Lakin, M. R. & Phillips, A. A logic programming language for computational nucleic acid devices. ACS Synth. Biol. 8, 1530–1547 (2019).
42. Joesaar, A. et al. DNA-based communication in populations of synthetic protocells. Nat. Nanotechnol. 14, 369–378 (2019).
43. Wang, B., Chalk, C. & Soloveichik, D. SIMD||DNA: single instruction, multiple data computation with DNA strand displacement cascades. in International Conference on DNA Computing and Molecular Programming, 219–235 (2019).
44. Takahashi, C. N., Nguyen, B. H., Strauss, K. & Ceze, L. Demonstration of end-to-end automation of DNA data storage. Sci. Rep. 9, 4998 (2019).
45. Newman, S. et al. High density DNA data storage library via dehydration with digital microfluidic retrieval. Nat. Commun. 10, 1706 (2019).
46. Zhang, F., Nangreave, J., Liu, Y. & Yan, H. Structural DNA nanotechnology: state of the art and future perspective. J. Am. Chem. Soc. 136, 11198–11211 (2014).
47. Min, D., Arbing, M. A., Jefferson, R. E. & Bowie, J. U. A simple DNA handle attachment method for single molecule mechanical manipulation experiments. Protein Sci. 25, 1535–1544 (2016).
48. Jadhav, V. S., Brüggemann, D., Wruck, F. & Hegner, M. Single-molecule mechanics of protein-labelled DNA handles. Beilstein J. Nanotechnol. 7, 138–148 (2016).
49. Hao, Y., Canavan, C., Taylor, S. S. & Maillard, R. A. Integrated method to attach DNA handles and functionally select proteins to study folding and protein-ligand interactions with optical tweezers. Sci. Rep. 7, 10843 (2017).
50. Harroun, S. G. et al. Programmable DNA switches and their applications. Nanoscale 10, 4607–4641 (2018).

Acknowledgements
We thank Kyle J. Tomek for helpful discussions and Prof. Nathan Crook for use of their Qubit fluorometer and reagents. This work was supported by the National Science Foundation (CNS-1650148 & CNS-1901324), a North Carolina State University Research and Innovation Seed Funding Award (#2018–2509), and a North Carolina Biotechnology Center Flash Grant to AJK and JT. KNL was supported by a Department of Education Graduate Assistance in Areas of Need fellowship.

Author contributions
K.N.L., J.M.T., and A.J.K. conceived the study. K.N.L. planned and performed the wetlab experiments with guidance from A.J.K. J.M.T. planned and performed the simulations. K.N.L. and K.V. processed the next-generation sequencing data. K.N.L. and A.J.K. wrote the paper with input from all.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary information is available for this paper at [Link] 020-16797-2.

Correspondence and requests for materials should be addressed to J.M.T. or A.J.K.

Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Reprints and permission information is available at [Link]

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit [Link] licenses/by/4.0/.

© The Author(s) 2020