HTH Fems
www.fems-microbiology.org
Received 20 November 2004; received in revised form 22 December 2004; accepted 23 December 2004
Abstract
The helix-turn-helix (HTH) domain is a common denominator in basal and specific transcription factors from the three super-kingdoms of life. At its core, the domain comprises an open tri-helical bundle, which typically binds DNA with the 3rd helix. Drawing on the wealth of data that has accumulated over two decades since the discovery of the domain, we present an overview of the natural history of the HTH domain from the viewpoint of structural analysis and comparative genomics. In structural terms, the HTH domains have developed several elaborations on the basic 3-helical core, such as the tetra-helical bundle, the winged-helix and the ribbon-helix–helix type configurations. In functional terms, the HTH domains are present in the most prevalent transcription factors of all prokaryotic genomes and some eukaryotic genomes. They have been recruited to a wide range of functions beyond transcription regulation, which include DNA repair and replication, RNA metabolism and protein–protein interactions in diverse signaling contexts. Beyond their basic role in mediating macromolecular interactions, the HTH domains have also been incorporated into the catalytic domains of diverse enzymes. We discuss the general domain architectural themes that have arisen amongst the HTH domains as a result of their recruitment to these diverse functions. We present a natural classification, higher-order relationships and phyletic pattern analysis of all the major families of HTH domains. This reconstruction suggests that there were at least 6–11 different HTH domains in the last universal common ancestor of all life forms, which covered much of the structural diversity and part of the functional versatility of the extant representatives of this domain. In prokaryotes the total number of HTH domains per genome shows a strong power-equation type scaling with the gene number per genome. However, the HTH domains in two-component signaling pathways show a linear scaling with gene number, in contrast to the non-linear scaling of HTH domains in single-component systems and sigma factors. These observations point to distinct evolutionary forces in the emergence of different signaling systems with HTH transcription factors. The archaea and bacteria share a number of ancient families of specific HTH transcription factors. However, they do not share any orthologous HTH proteins in the basal transcription apparatus. This differential relationship of their basal and specific transcriptional machinery poses an apparent conundrum regarding the origins of their transcription apparatus.
© 2005 Federation of European Microbiological Societies. Published by Elsevier B.V. All rights reserved.
Edited by Mark J. Pallen.
* Corresponding author. Tel.: +1 301 594 2445; fax: +1 301 435 7794.
E-mail address: [email protected] (L Aravind).
0168-6445/$22.00 2005 Federation of European Microbiological Societies. Published by Elsevier B.V. All rights reserved.
doi:10.1016/j.femsre.2004.12.008
232 L. Aravind et al. / FEMS Microbiology Reviews 29 (2005) 231–262
Contents
1. Introduction
2. Structural scaffold of the HTH domain and its diverse elaborations
2.1. HTH domains with a simple three-helical bundle and its extensions
2.2. The winged HTH domain
2.3. Other highly modified variants of the HTH domain
3. General and specific aspects of the domain architectures of HTH proteins
3.1. Simple architectures involving the HTH domain
3.2. Combinations of HTH with other nucleic acid binding domains and protein–protein interaction domains
3.3. Combinations of the HTH domain with catalytic domains
3.4. Architectures related to two-component, PTS and serine/threonine kinase signaling
3.5. Architectures related to single-component signaling
3.6. Unusual functional adaptations of the HTH domain
4. The evolutionary classification of HTH domains
4.1. Lineages of basic tri-helical HTH domains
4.2. The tetra-helical HTH superclass and its derivatives
4.3. The wHTH superclass
4.4. Other miscellaneous families of HTH domains
5. Proteome-wide demographic trends of HTH domains
6. General considerations on the natural history of the HTH fold and implications for the evolution of transcription
7. General conclusions
8. Supplementary material
References
referred to as the helix-turn-helix (HTH) domain. Sequence and secondary structure analysis by these pioneering workers suggested that the HTH domain was a common DNA-binding motif found in several other bacterial repressors as well as activators, such as the cAMP-dependent catabolite activator protein. They perceptively suggested that all these DNA-binding domains have descended from a common ancestor, through duplication and divergence, thereby generating the diversity of transcription regulators that regulate bacterial and phage genes [9,11]. Subsequent sequence analysis revealed that DNA recognition by sigma factors was also mediated by HTH domains, similar to those observed in the specific transcription factors [14–16]. In the second half of the 1980s the conserved domains of transcription factors regulating eukaryotic development and differentiation, namely the homeodomains and Myb domains, were also noted to possess the HTH fold [17,18]. These and other investigations suggested the general significance of this module in DNA–protein interactions across a wide phylogenetic spectrum [18–21].

An explosion of structural studies in the 1990s, while strengthening the basic structure–function relation between the HTH domain and DNA binding, also produced a large amount of data regarding the diversity of DNA–protein interactions mediated by different versions of the HTH domain. A number of sequence and structural analysis studies also uncovered HTH modules in several specific eukaryotic transcription factors, chromatin proteins like histone H1, and basal transcription factors such as TFIIB and TFIIE [22–30]. These findings led to the idea that the HTH domain is probably one of the most ancient conserved features of the transcription apparatus, which was already present in the last universal common ancestor of all extant life forms (LUCA). Studies during this period also showed that although archaea and eukaryotes possess a similar basal transcription machinery, the specific transcription factors of the former are clearly closer to those of the bacteria than the eukaryotes [30–34]. The other major development in the latter half of the 1990s was the birth of the genomic era, unleashing the power of comparative genomics. Starting with the earliest comparative genomic studies it became apparent that the HTH domain was a highly prevalent domain in prokaryotes [35]. Comparative analysis also helped in identifying several major monophyletic assemblages of HTH transcription factors, each distinguished by their own distinctive sequence and structure features (for example see Refs. [36–41]). These classes often showed one or more distinctive domain architectures – i.e. the fusion of the HTH domain with additional globular domains in the same polypeptide. These globular domains, which are linked to the HTH, show a bewildering diversity, and point to the immense variety of functional contexts in which the HTH domain may be deployed. The combined use of genome sequence data and high-throughput expression data also cast light on the multi-level transcriptional regulatory networks in prokaryotes in which these HTH-containing proteins functionally interact to maintain a particular transcriptional state in the cell [42,43].

Over the years, the recruitment of the HTH domains to biological functions beyond transcriptional regulation has also become apparent. Some of these proteins, participating in functions such as DNA repair and RNA metabolism, exploit the nucleic acid-binding properties of the HTH just as in the case of transcriptional regulation (for example see: [44–48]). However, there are other instances where the HTH may be adapted to very different functions, such as mediating specific protein–protein interactions, or as a structural unit of a larger enzymatic domain [49–51].

In this article we aim to provide a synthetic overview of the structural, functional and evolutionary diversity of the HTH domain from the vantage point of over two decades of intense investigation. We currently enjoy unprecedented advantages due to an enormous amount of genomic data, numerous high-resolution structures, functional studies and sensitive computational tools to capture the highlights of the natural history of HTH domains. Our focus is principally, but not exclusively, on versions of the domain found in the prokaryotic super-kingdoms. We first discuss the structural diversity seen in this fold, followed by a discussion of the unifying themes in domain architectures of HTH proteins and their significance to different biological functions. We next provide a higher-order natural classification of these domains and discuss their genome-wide demography in light of the general adaptive tendencies that can be inferred from comparative genomics. In this context we also consider the adaptation of the HTH to functional roles beyond transcriptional regulation. Finally, we discuss the relevance of the information gleaned from these diverse areas in reconstructing the origin and evolution of transcription and its regulation.

2. Structural scaffold of the HTH domain and its diverse elaborations

The basic HTH domain is a simple fold comprised of three core helices that form a right-handed helical bundle with a partly open configuration. When it is displayed by placing the third helix in the front and in the horizontal orientation, the 3 helices of the domain form an approximately triangular outline (Fig. 1). We use this as the default orientation for all further illustrations and discussions. The characteristic sharp turn, which is a defining feature of this domain, is situated between the 2nd and the 3rd helix, and typically does not
Fig. 2. A pathway showing the structural elaboration of the simple HTH domain into its diverse versions. Strands are shown as yellow arrows with the arrow heads at the C-terminus, helices are shown as blue cylinders. The orange arrows show the probable routes of transformation of the HTH fold. The two possible origins of the L11-like HTHs are shown by dotted arrows with question marks. The topologies have been constructed using the following PDB entries: (1) simple trihelical bundle: 2hdd; (2) FF domain: 1h40; (3) Tetrahelical bundle: 1d5y; (4) Multihelical bundle: 1ais; (5) MetJ/Arc: 2cpg; (6) MerR: 1jbg; (7) L11: 1mms; (8) CRP-like: 1cgp; (9) T. vaginalis initiator: 1pp7; (10) 3-stranded wHTH: 2dtr; (11) Methionine aminopeptidase: 1xgs; (12) winged HTH: 1i1g; (13) wHTH with a C-terminal helix: 1jgs.
of several retroviral integrases [53], in which helix-3 is packed more closely against the other helices by means of a Zn ion chelated by conserved cysteines and histidines at the ends of helix-1 and helix-3 (Fig. 1). Likewise, in RPB10, a set of zinc-chelating cysteines help in stabilizing an N-terminal loop against helix-3 [54].

The tetra-helical version of the HTH domain is an elaboration of the basic tri-helical version and is characterized by an additional C-terminal helix which packs against the shallow cleft formed due to the open configuration of the tri-helical core (Figs. 1 and 2). Several major families of prokaryotic transcription factors display this version of the domain. The multi-helical version, typified by the archaeo-eukaryotic basal transcription factor TFIIB, is a further elaboration of the tetra-helical HTH, wherein two additional helices have been added to the N-terminus of the tetra-helical core, resulting in a larger globular helical bundle (Fig. 2). The Bright (ARID) domain, a eukaryotic DNA-binding domain [55], is another more divergent version of the TFIIB-like HTH domains. This version of the fold is very infrequent in the bacteria, and is represented by a circularly permuted version seen in the sporulation regulator Spo0A from spore-forming Gram-positive bacteria [56]. Other versions, which represent relatively infrequent elaborations of the basic tri-helical version, are the KorB-like HTHs [57] and the FlhD-like HTHs [58]. The former version is characterized by an additional N-terminal helix that packs against the basic 3-helix core and the latter contains a C-terminal helical extension that packs very differently from the helical extension seen in the above-mentioned classical tetra-helical forms (Fig. 2).

2.2. The winged HTH domain

The winged HTH (wHTH) domains are distinguished by the presence of a C-terminal β-strand hairpin unit (the wing) that packs against the shallow cleft of the partially open tri-helical core [24,25]. The simplest versions of the wHTH domains contain a tight helical core similar to the basic tri-helical version followed by the two-strand hairpin (Figs. 1 and 2). However, many wHTH domains display further serial elaborations of the β-sheet. In the 3-stranded version, the loop between helix-1 and helix-2 assumes an extended configuration and is incorporated as the 3rd strand in the sheet, via hydrogen bonding with the basic C-terminal hairpin (Fig. 2). In the 4-stranded version, the linker between helix-1 and helix-2 also forms a hairpin with two β-strands, and along with the C-terminal wing forms an extended β-sheet (Fig. 2). In versions that bind nucleic acids, the wing often provides an additional interface for substrate contact, typically by interacting with the minor groove of DNA through charged residues in the hairpin [24,25,27]. The two- and three-stranded versions of the wHTH are encountered in DNA-binding domains of some of the largest families of prokaryotic transcription factors, as well as several eukaryotic DNA-binding domains. The single-strand RNA-binding La domains also have a version of the wHTH fold with a slightly extended and variable insert between helix-1 and helix-2. An unusual version of the 4-stranded wHTH domain, with an additional small helical insert after helix-1, is observed in the orphan transcription-initiator-sequence-binding protein from the parabasalid protist, Trichomonas vaginalis [59] (Fig. 2). The Fur family of bacterial transcription factors, which is involved in metal-responsive transcriptional regulation, shows a regular 2/3-stranded wHTH domain, but the wing is incorporated into a large sheet formed with additional C-terminal strands. A circularly permuted version of the basic wHTH domain, with one of the strands of the wing moved to the N-terminus, is seen in the C-terminal accessory domain of the methionine aminopeptidase-2 (Fig. 2) [60].

2.3. Other highly modified variants of the HTH domain

together with the notable structural and sequence similarities with the HTH domains, suggest that the RHH domain was derived from the HTH domain through conversion of the N-terminal helix to a strand. Concomitant with this modification, the N-terminal strand, which came to lie atop the recognition helix, appears to have taken up the principal DNA-binding role of this protein.

The DNA-binding domain of the bacterial transcription regulator MerR defines an aberrant derivative of a 3-stranded version of the wHTH domain, in which helix-1 has been lost (Fig. 2). In addition to the MerR family of bacterial repressors this version of the fold is seen in a wide variety of phage, bacterial and eukaryotic DNA-binding proteins and translation factors. The topoisomerase II family has two copies of wHTH domains [63]. One of these wHTH domains contains a large insert between helix-1 and helix-2. This insert contains an extended β-sheet structure that forms a brace around double-stranded DNA. Also present in this insert is a second wHTH domain which is circularly permuted with helix-1 occurring to the C-terminus of the wing. The two structurally related ribosomal proteins L11 and S18 represent another distinctive derivative of the wHTH domain. While they share a β-strand hairpin between helix-1 and helix-2 with the 4-stranded wHTH, unlike the latter they possess only a single C-terminal strand (Fig. 2). A highly modified HTH domain seen in the catalytic domain of the phage integrases shows a rare insertion between helix-2 and helix-3 of the core domain [63]; however, this insert does not distort the structure of the core. The FF-domain [64] and the C-terminal domain of the PP2C protein phosphatase [65] define another highly modified version of the HTH domain that is currently only known from eukaryotes. This version of the domain shows the insertion of a helical insert between helix-1 and helix-2, resulting in a different packing of the elements and a distortion of the fold (Fig. 2).
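The structural variants described in this section can be summarized compactly by listing their secondary-structure elements from the N- to the C-terminus. The following Python sketch is only an illustration of that bookkeeping: the string encoding (H1, H2, H3 for the core helices, h for an additional helix, s for a β-strand), the dictionary `TOPOLOGIES` and the function `summarize` are assumptions introduced here, not notation or software from the studies cited, and the element order for each variant is read directly from the descriptions above.

```python
from collections import Counter

# Secondary-structure elements listed from N- to C-terminus.
# H1, H2, H3 = core helices; h = additional helix; s = beta-strand.
# This encoding is an illustrative assumption, not the authors' notation.
TOPOLOGIES = {
    "simple tri-helical": "H1 H2 H3",
    "tetra-helical": "H1 H2 H3 h",               # extra C-terminal helix
    "multi-helical (TFIIB-like)": "h h H1 H2 H3 h",  # two extra N-terminal helices
    "KorB-like": "h H1 H2 H3",                   # extra N-terminal helix
    "2-stranded wHTH": "H1 H2 H3 s s",           # C-terminal hairpin (the wing)
    "3-stranded wHTH": "H1 s H2 H3 s s",         # H1-H2 loop becomes a strand
    "4-stranded wHTH": "H1 s s H2 H3 s s",       # H1-H2 linker forms a hairpin
    "MerR-like": "s H2 H3 s s",                  # 3-stranded wHTH lacking helix-1
    "L11/S18-like": "H1 s s H2 H3 s",            # single C-terminal strand
    "ribbon-helix-helix": "s H2 H3",             # N-terminal helix converted to a strand
}

def summarize(topology: str) -> dict:
    """Count helices and strands and report whether helix-1 is retained."""
    elements = topology.split()
    counts = Counter("strand" if e == "s" else "helix" for e in elements)
    return {
        "helices": counts["helix"],
        "strands": counts["strand"],
        "has_helix1": "H1" in elements,
    }

if __name__ == "__main__":
    for name, topology in TOPOLOGIES.items():
        print(f"{name:28s} {topology:18s} {summarize(topology)}")
```

In this simplified encoding every variant keeps helix-2 and helix-3 adjacent, which reflects the sharp turn between them that defines the fold; variants with insertions between these two helices, such as the phage integrase catalytic domain, are deliberately left out of the sketch.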
Fig. 3. Examples of domain architectures of proteins with HTH domains. The 3 panels show the HTH domains in one-component systems, those in two-component systems and those fused to enzymatic domains. Below each protein the name of the protein or its family is indicated. The domain names and abbreviations are as shown in Table 1. Additional abbreviations are: H – any HTH domain; wH – winged HTH; Assd – archaeal specific signaling domain; Acyclase – adenylyl cyclases; AATRS – aminoacyl tRNA synthetase; TPR – tetratricopeptide repeats; HAMP – domain present in histidine kinases, adenylyl cyclases, methyl-accepting proteins and phosphatases; chro – chromodomain; BTAD – conserved domain found in bacterial signaling proteins; CPS-D1 – the domain 1 of the large subunit of the carbamoylphosphate synthetases; MGS – the methylglyoxal synthase domain that binds ornithine in the carbamoylphosphate synthetases; ParB – the OB-fold nuclease domain seen in ParB proteins; B-hel – a conserved beta-barrel seen in the Lhr helicases. The transmembrane regions are indicated by yellow boxes. The grey boxes represent uncharacterized globular domains that are unique to a particular family of proteins. Af: Archaeoglobus fulgidus, Ana: Nostoc sp., Blic: Bacillus licheniformis, Bs: Bacillus subtilis, Bthe: Bacteroides thetaiotaomicron, Ec: Escherichia coli, Llac: Lactococcus lactis, Lmon: Listeria monocytogenes, Mjan: Methanococcus jannaschii, Mlo: Mesorhizobium loti, Mmaz: Methanosarcina mazei, Mcap: Methylococcus capsulatus, Mthe: Methanothermobacter thermautotrophicus, Mtu: Mycobacterium tuberculosis, Phor: Pyrococcus horikoshii, Poke: Planomicrobium okeanokoites, Tm: Thermotoga maritima, Sent: Salmonella enterica, Spyo: Streptococcus pyogenes, Hs: Homo sapiens, Sc: Saccharomyces cerevisiae.
are likely to represent the more successful functional solutions in which the HTH domain has been involved.

3.1. Simple architectures involving the HTH domain

The simplest architectures involving the HTH domain are seen in certain proteins related to the cI repressors (e.g. the archaeal repressors typified by AF1793), most proteins of the MetJ-Arc superfamily and the Fis proteins. These proteins are almost entirely comprised of just a standalone HTH, and might, at best, have some small extensions that play a role in dimerization or interactions with other components of the transcriptional machinery. The eukaryotic ribosomal protein S10 and the histone H1 respectively contain RNA and DNA binding versions of the wHTH domain, which is fused to low-complexity sequence that forms a non-globular tail [67]. Such non-globular extensions that play a role in non-specific nucleic acid contacts and protein contacts during transcription activation are very common in eukaryotic transcription factors with HTH domains [41]. A family of bacterial proteins typified by the B. subtilis sigma D regulator YlxL (SwrB) [68,69] contains an HTH domain fused to an N-terminal transmembrane region (Fig. 3). These HTH proteins might regulate transcription under the influence of signaling events associated with the cell membrane. The next level of
Table 1
Globular domains frequently linked to the HTH domain in the same polypeptide
Domain Structure Placement of HTHa Comments
Domains in two-component, phosphorelay and S/T kinase signaling cascades
REC (receiver domain) α/β. [Flavodoxin-like topology]. 1.19N; 78.35C Is phosphorylated on an aspartate residue by the histidine kinase-dependent
PDB 1NTR phospho-relay system. Typically found fused to OmpR and LuxR-family
(in NarL-like proteins) HTH domains
Histidine kinase α + β. PDB 1BXD 0.10N; 0.03C Usually occurs as a standalone subunit, but in a few instances is fused to the
downstream transcription factor with receiver and HTH domains
3H α + β fold 1.00N A domain sharing a common fold with the HPr domain of the PTS system. It has 3
conserved histidine residues that are likely to be phosphorylated
FHA β fold PDB 1LGQ 0.23N; 0.16C A phosphoserine/threonine peptide-binding domain. Combinations with HTHs are
expanded mainly in Actinomycetes, which have numerous serine/threonine kinases
Enzymatic domains
NadR nucleotidyl transferase α/β. HUP fold; PDB 1LW7 0.48N The catalytic domain functions in the adenylation of the nicotinamide
domain (HIGH) mononucleotide in NAD biosynthesis
mlr6529-C module with Contains two distinct sub-domains 2.16N A novel conserved module typically found at the C-terminus of HTH domains of
metalloprotease-like and An N-terminal all α-unit the cI-like family. The N-terminal sub-domain possesses the same fold as the
metal-chelating domains and a C-terminal α + β unit that Zn-dependent metalloproteases but many copies of it may be catalytically
might adopt a PAS-like fold inactive as they show disruptions of the HEXXH signature. The C-terminal sub-domain
contains a conserved group of 4 cysteines, which suggests that it chelates
metals. Its predicted secondary structure suggests that it may adopt a PAS-like
fold
Biotin ligase domain BirM (Fold: Class II aaRS and 2.65N The ligase domain activates biotin to form biotinyl-5′-adenylate which acts as an
biotin synthetases) and BirC effector for transcriptional repression. Its regular activity is the transfer of the biotin
(Fold: SH3-like barrel) PDB moiety to biotin-accepting proteins
1HXD
PLPDE PDB 1DJU 9.48N Pyridoxal phosphate-dependent aminotransferases. They are typically found fused
to GntR family HTH domains
Sugar isomerase (SIS) α/β PDB 1M3S 2.39N Usually associated with regulators of polysaccharide metabolism regulons
Uroporphyrinogen-III synthase α+β 12.65N An enzyme in the porphyrin biosynthesis pathway. Usually found fused to a wHTH
domain of the OmpR family
Phosphoribosyltransferase α/β PDB 1STO 0.77N These enzymes transfer PRPP, an activated form of phosphoribose, to orotate and
(PRTase) purines in nucleotide biosynthesis. Independent fusions of HTHs to orotate and
purine phosphoribosyltransferases are seen
Sugar kinase α/β. RNase H fold 6.77N Sugar kinases involved in polysaccharide metabolism are usually found fused to a
distinct subfamily of wHTH domains of the MarR family
β-D-Xylosidase α/β (TIM barrel) PDB: 1UHVD 0.58N This enzyme is found fused to the AraC-like HTH domains in regulators of
polysaccharide metabolism in low GC Gram positive bacteria
MJ0056 α+β 0.39N Found fused to HTH domain in well-conserved subfamily of MarR transcription
TolB-N: Domains possessing the same fold as the TolB-N terminal domain are fused to HTH domains like IagA from S. typhi. This domain is invariably followed by a further set of TPR domains
TOPRIM: Catalytic domain found in topoisomerases, primases and nucleases with a catalytic
A domain often found fused to a P-loop ATPase in the animal schlafen proteins and
between the schlafen domain and the wHTH domain. Operon architectures suggest
A nuclease domain of the OB-fold involved in plasmid partitioning. Always found
These domains are involved in both DNA binding and protein–protein interactions. They are common in the archaeo-eukaryotic lineage in HTH proteins like TFIIE,
Two related RNA-binding domains typified by the bacterial ribosomal S1 protein
All β; OB fold
The number in this column gives the frequency (per 1000 HTH proteins in prokaryotic genomes) of HTH domains located immediately adjacent to the globular domain under discussion. 'N' indicates that the HTH occurs N-terminal to the domain under discussion, whereas 'C' indicates that it occurs to the C-terminus of the domain under discussion. A total set of 31100 proteins from 184 prokaryotic genomes were analyzed. A '–' indicates that the HTH domain is either not found immediately adjacent to the module under discussion or that such combinations are very infrequent.
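The placement values in Table 1 are frequencies per 1000 HTH proteins, so they can be converted into approximate protein counts. The following Python sketch shows the arithmetic; it assumes that the 31100 proteins mentioned in the table footnote are the denominator for every frequency, which the footnote implies but does not state explicitly. The function name `approx_count` and the constant name are introduced here for illustration only.

```python
# Total number of HTH proteins analyzed, taken from the Table 1 footnote.
# Treating it as the denominator of every frequency is an assumption.
TOTAL_HTH_PROTEINS = 31100

def approx_count(freq_per_1000: float, total: int = TOTAL_HTH_PROTEINS) -> int:
    """Convert a per-1000 frequency into an approximate number of proteins."""
    return round(freq_per_1000 * total / 1000)

# Frequencies copied from Table 1 (N = HTH N-terminal, C = HTH C-terminal).
examples = {
    "REC, HTH C-terminal (78.35C)": 78.35,
    "Uroporphyrinogen-III synthase, HTH N-terminal (12.65N)": 12.65,
    "Histidine kinase, HTH C-terminal (0.03C)": 0.03,
}

if __name__ == "__main__":
    for label, freq in examples.items():
        print(f"{label}: about {approx_count(freq)} proteins")
```

Under this assumption the receiver domain fusion corresponds to roughly 2400 proteins, whereas the lowest listed frequency, 0.03 per 1000, corresponds to about one protein in the whole data set.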
general functional trends associated with such combinations. One common association of the HTH domain with catalytic domains represents the utilization of the domain as a substrate-recognition or localization domain. The HTH is observed to be linked to catalytic domains in several proteins involved in DNA replication (e.g. the FtsK–HerA superfamily ATPase domain [73] in bacterial chromosome pumping protein FtsK [74], or the AAA+ ATPase domain in MCMs and Cdc6/ORC1, which function in archaeal and eukaryotic replication initiation [75]) and repair (e.g. AlkA-type helical DNA glycosylase domain), certain restriction endonucleases (e.g. FokI [45]) and modification methylases (e.g. ScrFIA methylase) (Fig. 3). In the course of this survey we noted that the DprA protein (Smf/Dal/CilB), which is required for protecting DNA during transformation in several bacteria [76], contains a wHTH domain fused to the C-terminus of a large globular domain containing a specialized version of the Rossmann fold (Fig. 3). This wHTH domain might recruit the Rossmann fold domain to DNA, and enable it to catalyze an as yet uncharacterized DNA-modifying activity that is required for efficient transformation. Most transposases and integrases of diverse mobile DNA elements have at least one HTH domain that helps their catalytic domains associate with DNA [77]. In these instances the HTH either serves as an additional tether that recruits the catalytic domain to DNA or it participates in substrate recognition. An extreme case of this functional theme is seen in the topoisomerases: the HTH domains have supplied, on at least four independent occasions, the catalytic tyrosine of these enzymes that is covalently linked to DNA ([63,78] and LA unpublished). Similarly, in the methyl-DNA protein methyltransferase (O-6-methylguanine-DNA alkyltransferase), a wHTH domain fused to a truncated domain of the RNase H fold bears the cysteine that receives the alkyl group from damaged DNA [46]. Continuing on this theme, it has been previously noted that the catalytic domain of the lambda integrase family has itself evolved from a duplication of the HTH domains [63]. In some cases, the HTH domain has also been recruited by enzymes involved in RNA metabolism and translation for binding RNA, especially double-stranded structures peculiar to certain RNAs. For example, the GTPase module of the bacterial selenocysteine-specific elongation factor (SelB) is recruited to a specific RNA hairpin of selenoprotein encoding transcripts by a C-terminal extension containing 4 tandem copies of the wHTH domain [44]. We also detected a wHTH domain, similar to those in SelB, at the N-terminus of the bacterial RNA-processing enzyme RNase R, which might bind RNA (Fig. 3). However, in proteobacteria RNase R is also known to function as a regulator of virulence genes [79], suggesting that this HTH domain additionally functions as a conventional transcription regulator.

In related examples, the HTH domain recruits a catalytic domain that may act on proteins, rather than nucleic acids. A striking example of this is the wHTH domain fused to the N-terminus of the Rio family of protein kinases from archaea and eukaryotes [30]. The Rio family of kinases functions in 40S ribosomal subunit maturation [80,81], and the wHTH domain recruits the linked protein kinase domain to an rRNA processing protein complex. The LexA protein, the repressor of several bacterial DNA repair genes, represents another variation on this general functional theme. It contains a protease domain of the signal peptidase fold fused to a wHTH domain. The protease domain catalyzes an autocatalytic cleavage in response to a DNA damage signal and triggers dissociation of its wHTH domain from target sequences, thereby allowing transcription of DNA repair genes [82]. Architectures analogous to LexA are also seen in the repressors typified by the heat-response transcription factor HdiR from Lactococcus lactis [83], where a LexA-like protease domain is fused to a cI-like HTH instead of the wHTH seen in LexA (Fig. 3). This implies that the mechanism of transcription regulation with a proteolytic processing step was innovated independently on at least two occasions in evolution.

The next major architectural theme involving combinations of HTH and enzymatic domains appears to be related to feedback regulation of metabolic pathways. In such combinations, the HTH domain is fused to an enzymatic domain catalyzing a key step in a biosynthetic pathway, and usually regulates the transcription of genes in that pathway. One of the archetypal representatives of this architectural theme is the biotin operon repressor, BirA, which contains an N-terminal HTH domain fused to a C-terminal biotin ligase domain [23]. In the presence of biotin the enzymatic domain synthesizes the co-repressor, and the HTH domain represses the transcription of the biotin biosynthesis genes. Comparative genomics suggests that architectures involving fusions to a range of enzymes from cofactor, nucleotide, amino acid and carbohydrate metabolism are fairly common in archaea and bacteria [30] (Table 1). Some notable fusions include combinations of the HTH with nicotinamide mononucleotide adenylyl transferase and a P-loop kinase in NadR, with the pyridoxal-phosphate dependent aminotransferase domain (in bacterial HTH proteins of the GntR family), the orotate phosphoribosyltransferase (in archaea), sugar kinases (Rok family in bacteria), purine phosphoribosyltransferase (in archaea) and the threonine synthase (restricted to the genus Pyrococcus) (Fig. 3). Some of these architectures, like BirA, are widely distributed in the prokaryotic genomes and appear to be ancient. Others, like the fusion of an OmpR family wHTH with the uroporphyrinogen-III synthase, are found only in actinobacteria, while yet others, like a fusion to the threonine synthase, are restricted to a single genus, and are apparently of more recent provenance. This observation suggests that the combinations of HTHs with enzymatic domains have been repeatedly selected for throughout the span of prokaryotic evolution.

Two other specialized classes of domain architectures arise through fusions of the HTH domains with either of two types of P-loop NTPase domains, namely the NtrC-like AAA+ domains [84] and the STAND (signal transduction ATPases with numerous domains) NTPase domain [85]. Proteins containing the NtrC-like AAA+

3.4. Architectures related to two-component, PTS and serine/threonine kinase signaling

The two-component phospho-relay system, involving the histidine kinase and the receiver domain, which is phosphorylated on a conserved aspartate, comprises one of the most common signaling systems in bacteria and certain archaea [90,91]. The fusions of the receiver domain with the HTH are typical of transcriptional regulators responding to histidine kinase-dependent signal-
domains are found only in those bacteria that contain ing. Two of the most common architectures, seen in the
sigma-54, and they bind a distant enhancer element majority of bacteria, involve combinations of a single N-
and activate transcription of sigma-54 bound promot- terminal receiver domain to either a LuxR-like tetra-
ers. The AAA+ ATPase domains of these proteins per- helical HTH domain (e.g. UhpA and NarL) or wHTH
form an ATP-dependent chaperone-like activity that domain (e.g. OmpR and PhoB, Fig. 4). Less frequent fu-
converts the ‘‘closed’’ sigma-54-containing transcription sions involving HTH domains of the AraC- and the
complexes to an ‘‘open’’ configuration, which is favor- CitB families are seen in certain bacteria. Other than
able for transcription initiation [84]. The NtrC-like these simple architectures, several more complicated
AAA+ domains are fused to at least two different types architectures involving multiple receiver domains or
of HTH domains. The classical versions like NtrC and even fusions to additional histidine kinase (e.g. B. cereus
TyrR are fused to a C-terminal basic tri-helical HTH protein BC3207) and NtrC-like AAA+ ATPase (e.g. E.
domain of the Fis family [86]. The second version typi- coli NtrC) domains are also observed (Fig. 3) [90,91].
fied by the Bacillus levanase operon regulator, LevR, in- The PTS sugar-transport systems [92] use a phospho-
stead contains an N-terminal wHTH domain, suggesting relay cascade to transfer a phosphate from phosphoenol
that there have been two independent fusions of the pyruvate to a histidine on the PTS regulatory domain
HTH domain with NtrC-like AAA+ ATPases (Fig. 3). (PRD), which often co-occurs in the same polypeptide
The STAND P-loop NTPases are, as a rule, large mul- with HTH domains [93,94]. The PRDs receive the phos-
ti-domain proteins that appear to catalyze the ATP- phates from the HPr and EIIB proteins of the PTS sys-
dependent assembly of complexes in variety of signaling tem, and depending on their phosphorylation state
contexts [85]. They typically contain repetitive super- regulate transcription. Architectures involving the
structure-forming domains, such as the WD and TPR PRD domain are analogous to those involving the recei-
domains, which may serve as surfaces for the assembly ver domain of the two-component system. The simplest
of multi-protein complexes [85]. The archetypal mem- versions contain an N-terminal wHTH domain fused to
bers of the architectural class combining the HTH and a C-terminal PRD domain. The more complex forms
STAND domains are the E. coli MalT [87], Bacillus contain more than one PRD domains, or fusions to
GutR [88] and Streptomyces AfsR proteins [89]. A re- NtrC-like AAA+ domains and PTS system EIIB do-
cent analysis of the STAND superfamily revealed that mains, which determine sugar specificity [95,96] (Fig.
the HTH domains have been fused to them on several 3). The B. subtilis LicR protein contains an N-terminal
independent occasions [85]. The fusions involving the HTH fused to two PRDs and both EIIB and EIIA com-
OmpR family of wHTH domains (e.g. in AfsR) usually ponents of the PTS system [95,96], indicating that it is a
link the HTH to the N-terminus of the STAND NTPase multi-functional protein that directly regulates both su-
domain. In contrast, fusions involving the LuxR family gar uptake and transcription of sugar-utilization genes.
of HTH link it to the C-terminus of the STAND mod- The 3H domain, which is related to the HPr domain
ule, with a set of a-helical repeats occurring between of the PTS system, is also found fused to BirA-related
these two modules (e.g. GutR and MalT) (Fig. 3). The wHTH domain in several bacterial proteins typified by
STAND-domain-containing transcription regulators Tm1602 from Thermotoga maritima [97]. The 3H do-
are likely to integrate multiple signaling inputs via inter- main may represent another novel domain that may be
actions of their STAND and super-structure forming regulated by phosphorylation on its conserved histi-
domains, and are particularly prevalent in the develop- dines, perhaps via a PTS-like system.
mentally or organizationally more complex bacteria. The serine–threonine kinases are over-represented in
Another version of the HTH domain, associated with certain organizationally complex bacteria, like the cya-
a restriction endonuclease fold and STAND NTPases nobacteria and the actinobacteria. In the latter group
domains (Fig. 3), is found in the PH-type ATPases that there is class of proteins, typified by the protein EmbR,
are expanded in Pyrococci. These domains have been containing a fusion of the HTH domain with the FHA
predicted to localize the endonuclease domains of these domain [98]. The FHA domain in this protein binds
proteins to their target sequences [30]. phosphoserine peptides, and mediates its interaction
244 L. Aravind et al. / FEMS Microbiology Reviews 29 (2005) 231–262
Fig. 4. Phyletic patterns and demography of selected families of HTH domains in selected prokaryotic proteomes. The bar graph depicts actual
counts of the number of HTHs in each family per genome. Species abbreviations are as follows: Ap: Aeropyrum pernix, Atu_w:Agrobacterium
tumefaciens C58 (U. Washington), Aae: Aquifex aeolicus, Af: Archaeoglobus fulgidus DSM 4304, Bs: Bacillus subtilis, Bth_V:Bacteroides
thetaiotaomicron VPI-5482, Blo: Bifidobacterium longum, Bbr: Bordetella bronchiseptica, Brja: Bradyrhizobium japonicum, Ccr: Caulobacter
crescentus, Ct: Chlamydia trachomatis, Ctep: Chlorobium tepidum, Ctet: Clostridium tetani E88, Cgl: Corynebacterium glutamicum, Dr: Deinococcus
radiodurans, Tth: Thermus thermophilus, Dvu: Desulfovibrio vulgaris, Ec_C:Escherichia coli CFT073, Fnu: Fusobacterium nucleatum, Gsu: Geobacter
sulfurreducens, Gvi: Gloeobacter violaceus, Hasp: Halobacterium sp. NRC-1, Lint: Leptospira interrogans serovar lai 56601, Mlo: Mesorhizobium loti,
Mj: Methanococcus jannaschii, Mac: Methanosarcina acetivorans, Mth: Methanothermobacter thermautotrophicus, Mtu_H:Mycobacterium tubercu-
losis H37Rv, Neq: Nanoarchaeum equitans, Neu: Nitrosomonas europaea, Ana: Anabaena sp. PCC 7120, Oyp: Onion yellows phytoplasma, Pto:
Picrophilus torridus, Psp: Pirellula sp., Pgi: Porphyromonas gingivalis, Pae: Pseudomonas aeruginosa, Pyae: Pyrobaculum aerophilum, Pa: Pyrococcus
abyssi, Rpa: Rhodopseudomonas palustris, Son: Shewanella oneidensis, Sme: Sinorhizobium meliloti, Sau_MR:Staphylococcus aureus subsp. aureus
MRSA252, Sav_MA:Streptomyces avermitilis MA-4680, Scoe: Streptomyces coelicolor, Sso: Sulfolobus solfataricus, Sth: Symbiobacterium
thermophilum, Ssp: Synechocystis sp. PCC 6803, Tm: Thermotoga maritima, Vch: Vibrio cholerae, Wsu: Wolinella succinogenes.
with the upstream protein kinase in regulating the biogenesis of the mycobacterial cell wall [99]. Taken together, HTH domains fused to the receiver, PRD, 3H and FHA domains represent a distinct class of architectures that are typical of proteins responding to environmental and physiological stimuli downstream of signaling cascades.
3.5. Architectures related to single-component signaling
In contrast to the above-discussed signaling cascades, the single-component systems are defined as those signaling systems in which the transcription regulatory domain and the stimulus sensor domain are combined in a single protein. These architectures, which are functionally analogous to the fusions of the HTHs with the metabolic enzymes, are by far the most prevalent architectural category in prokaryotes (Table 1 and Figs. 3 and 4). In their simplest form they combine an HTH domain with a small-molecule binding domain (SMBD) [97]. More complex architectures may involve multiple SMBDs or even additional domains such as the NtrC-like AAA+. The same SMBDs found in the single-component systems may also occasionally be found fused to two-component regulators, where they may supply secondary allosteric inputs (Fig. 3).
The most common SMBDs fused to HTHs in the single-component systems are drawn from a relatively small set of ancient protein folds (Table 1): (1) The PAS-like fold, with representatives such as the PAS domain, the GAF domain, and the ligand-binding domains of the IclR-type transcription factors [66,100]. (2) The periplasmic binding protein types I and II domains, which include the ligand-binding domains of the LysR family [101–103]. (3) The ferredoxin-like fold, which includes the ACT domain and related ligand-sensing domains
of the Lrp-like transcription factors and the classic ferredoxins, which are fused to HTH domains in archaeal and cyanobacterial proteins [104–106]. (4) The double-stranded β-helix (cupin) fold, which contains the AraC-type ligand-binding domains, as well as the cNMP-binding domains [97,107]. (5) The CBS domain that occurs as an obligate dyad [108]. (6) The GyrI domain, which contains two copies of the SHS2 structural module, appears to be one of the principal ligand-binding domains of the MerR family [109].
Some other SMBDs share a common fold with enzymatic domains, but appear to be catalytically inactive versions that merely bind low-molecular-weight substrates. Examples of these are: (1) the UTRA domain, which is found in the HutC/FarR group of GntR family transcription factors and possesses the same fold as chorismate lyase [110], and (2) the DeoR ligand-binding domain, which shares a common α/β fold with the enzymes of the phosphosugar isomerase family, such as ribose phosphate isomerase [111]. The enormous amount of genomic information has made available proteins from numerous prokaryotes that display several new domain architectures. In many of these proteins, uncharacterized globular domains are fused with the HTH and other signaling domains, in architectures analogous to those of known sensory domains. Thus, these analogous architectures enable the prediction of novel sensory domains of one-component systems. By this procedure several new candidate sensory domains, with somewhat lower abundance than the previously described domains, were uncovered (Table 1). An example of such a domain is suggested by the PocR protein from Salmonella, with an AraC-like HTH domain, which binds the effector 1,2-propanediol and regulates the propanediol regulon [112]. It contains a distinct globular N-terminal domain that is also found fused to histidine kinases, chemotaxis receptors, and diguanylate phosphodiesterases of the HD-GYP family in other bacterial proteins (VA and LA unpublished). These domain combinations suggest that it is a novel, evolutionarily mobile, small-molecule-sensing domain, which probably initiates responses through the domains with which it is linked.
We also found a conserved domain in the Salmonella typhi invasion regulator IagA [113], which occurs C-terminal to the OmpR-like wHTH domain (Fig. 3). Additionally, it occurs independently in other bacterial proteins fused to adenylyl cyclase and histidine kinase domains (data not shown). Iterative sequence profile searches suggest that this domain shares a common fold with the TolB N-terminal domain [114,115], and is typically found at the N-termini of super-structure-forming repeats such as TPR and WD40 repeats (Fig. 3). These architectures, and the interactions of the TolB protein [116], suggest that this domain probably acts in unison with the super-structure-forming proteins as a potential sensor for the assembly state of certain multi-protein complexes. Hence, transcription factors like IagA with the TolB-N-related domains might regulate transcription in response to the dynamics of multi-protein complexes.
3.6. Unusual functional adaptations of the HTH domain
Beyond its usual DNA-binding role, the HTH domain appears to have been exapted for a variety of functions, where it is utilized as a molecular adaptor. For example, the permuted version of the wHTH in the N-termini of the methionine aminopeptidases appears to represent an ancient recruitment to a protein–protein interaction function [60]. Several such instances of recruitment of the HTH domain to protein–protein interactions are seen in the eukaryotes. One such example is the PINT domain, which forms the structural scaffold of the proteasomal lid, the signalosome and the eukaryotic initiation factor eIF3 [117,118]. It appears to have been derived from a prokaryotic wHTH precursor which secondarily lost its DNA-binding properties. The Snf8 family of proteins in eukaryotes contains two tandem copies of a wHTH domain related to the PINT domain. This family includes the Vps22, Vps25 and Vps36 proteins, which are required for sorting of transmembrane proteins and lipids into the multivesicular bodies in the eukaryotic vesicular transport system [119–121]. Just as in the case of the PINT domain, the duplicate wHTH domains of the Snf8 family proteins provide a scaffold for the formation of multi-protein ESCRT complexes required for vesicular trafficking. Additionally, the same complex of Snf8 family proteins is also implicated in transcriptional elongation [119], suggesting that they might have been secondarily recruited for a eukaryote-specific role in vesicular transport from an original role in transcriptional regulation. The cyclins and Retinoblastoma proteins are derivatives of the ancestral TFIIB protein which were utilized for specific protein–protein interactions, respectively involved in regulating the eukaryotic cell-cycle controlling kinases and the E2F/DP1 proteins [122]. Likewise, the DEP domain, which is found in several signaling proteins [49], the cullin C-terminal domain (found in cullins, which are the adaptors for the anaphase-promoting ubiquitin ligases) [50], and the C-terminal domain of Esa1-like histone acetylases [123] are other notable examples in which the wHTH domain has been recruited to mediate specific protein–protein interactions in diverse eukaryote-specific signaling contexts. In most of these proteins the HTH domain stands out as a distinct domain in a complex modular architectural context.
The plant isoflavone O-methyltransferases contain an N-terminal wHTH domain which is related to that in bacterial transcription factors, and appears to function as a dimerization domain [124]. The closest relatives of
these plant methyltransferases are seen in bacteria (e.g. McmR from Streptomyces lavendulae), suggesting that the plant lineage probably acquired these enzymes through lateral transfer from a bacterial source. Hence, it is possible that the wHTH domain in the bacterial precursor originally functioned as a transcription regulator that was fused to the methyltransferase domain (see above for discussion on analogous fusions) but was subsequently reused in the plants in a structural role. A similar case is presented by the carbamoyl phosphate synthetase (CPS), which contains a tandem duplication of HTH domains between the carboxyphosphate and carbamoyl phosphate synthetic modules of the enzyme (Fig. 3). These HTH domains are of the simple tri-helical version, like those encountered in bacterial transcription factors such as Fis and LuxR. However, in CPS, rather than binding DNA, they apparently function as protein–protein interaction domains that convert the enzyme to its oligomeric form in the presence of uridine [125].
4. The evolutionary classification of HTH domains
As the HTH is a small domain, which exhibits extreme sequence divergence, reconstruction of its higher-order natural classification is fraught with problems arising from the erosion of evolutionary signal. Nevertheless, the availability of numerous high-resolution structures and extensive sequence information allows us at least to reconstruct the major evolutionary radiations of this domain. This reconstruction is based on three distinct sources of information: (1) Structural features (see above for discussion) help in establishing the relationships at the highest level. (2) Sequence information can be used for clustering based on similarity scores, conventional phylogenetic analysis and cladistic analysis with discrete sequence characters. These sequence-based procedures help in resolving the relationships at a lower level, such as defining the principal sequence families, the relationships within them, and some of the higher-level groupings between sequence families. (3) Phyletic patterns (Fig. 4) help in reconstructing the temporal aspects of the evolutionary history of these domains and also help in constraining the directions of derivation of particular versions from others. The combined scenario gleaned from these directions is presented schematically in Fig. 5 (for details of the combined approach see [85]).
This higher-order evolutionary scheme is characterized by the presence of several basal lineages that retain the primitive basic tri-helical version of the HTH fold, but have no other shared-derived character that groups them together. These basal lineages are followed by the two great monophyletic lineages, namely the tetra-helical superclass and its derivatives and the wHTH and its derivatives. Beyond these, there are a few other highly derived versions whose provenance is hard to establish on account of their extreme sequence and structure divergence. Below, we briefly describe the major evolutionary lineages of the HTH along with their phyletic patterns and functional diversification.
4.1. Lineages of basic tri-helical HTH domains
Several distinct families that retain the primitive simple tri-helical HTH domain are represented in one or more of the major divisions of life. The duplicate HTH domains found in the carbamoyl phosphate synthetase represent a distinctive lineage of simple tri-helical domains present in all the 3 super-kingdoms of life. Phylogenetic trees show that these proteins follow the “standard model topology”, with a distinct archaeo-eukaryotic branch and a bacterial branch [126]. This topology and their phyletic pattern suggest that this lineage most probably goes back to the LUCA. The HTH domains bearing the catalytic tyrosine of the topoisomerase I family, which is found in all the 3 super-kingdoms, and of the archaeal topoisomerase VI are also distinctive lineages of tri-helical HTH domains with no specific relationship to any other HTH domains. The phyletic pattern of the former lineage suggests that it was probably present in the common ancestor of the 3 super-kingdoms [127].
The Rbp10 family, which is defined by the eponymous RNA polymerase core subunit [54,128], is universal in both archaea and eukaryotes and appears to have been part of the shared vertical inheritance of the archaeo-eukaryotic lineage. Likewise, the sigma factor family [129] is conserved throughout the bacteria, but bona fide representatives of this group are absent in the archaea and eukaryotes. Members of this group are characterized by a tandem duplication of the HTH domain. Sigma-70, which is the basal transcription factor of the bacteria, is usually present in single copy in all bacterial lineages and shows a noticeable phylogenetic signal suggestive of a largely vertical inheritance since the last common ancestor of all the extant bacteria [129,130]. In contrast, other sigma factor subfamilies show evidence for lateral transfer, gene loss and lineage-specific expansions (Fig. 4). In particular, the ECF subfamily appears to have been widely expanded in numerous bacterial lineages, especially in those with complex metabolic and developmental capabilities [129]. The lineage-specific diversification of the ECF subfamily of sigma factors appears to have been a major contributor to the evolution of niche-specific adaptations in bacteria by allowing diverse patterns of differential gene expression (see below). The sigma-54 family appears to have been derived independently from the remaining sigma factors and is another distinct lineage of tri-helical HTHs that is sporadically distributed in bacteria.
Fig. 5. Higher order evolutionary relationships of HTH domains. The horizontal lines show temporal epochs corresponding to three major
transitions in evolution, the last Universal common ancestor, the diversification of archaea and bacteria and the evolution of the eukaryotes. Solid
lines reflect the maximum depth of time to which a particular family can be traced. Broken lines indicate an uncertainty with respect to the exact
point of origin of a lineage. Colored circles at the termini of the lines represent broad functional classes: where yellow represents DNA binding, pink
represents RNA binding and blue, interaction with proteins. The ellipses encompass groups of lineages from which a new lineage with relatively
limited distribution could have potentially emerged. Lineages of archaeal origin are colored blue, those of bacterial origin are colored orange and
those present in archaea and bacteria are colored black. Lineages only detected in the eukaryotes are colored green. The yellow triangle reflects the
origin of the L11 family of proteins, the blue triangle reflects the origin of the winged HTHs and the red triangle reflects the origin of the tetra-helical
version. The phyletic distribution of the lineages is also shown in brackets, where A: Archaea, B: Bacteria, E: Eukaryotes, proteo: Proteobacteria
and Crown: Crown group eukaryotes. The ‘>’ reflects lateral transfer, with the arrowhead pointing to the potential direction of transfer.
The Fis family of basic tri-helical HTH domains is also found solely in the bacteria and typically appears at the C-termini of the NtrC-like AAA+ domains [131]. It appears to have emerged early in bacterial evolution and spread widely via lateral gene transfer along with the spread of sigma factors of the sigma-54 family. The Fis protein itself appears to have been secondarily derived in the proteobacteria through the gene fission of an NtrC-like regulator [131]. Likewise, the tri-helical HTH domains of the Rok family are restricted to the bacteria and always found in combination with sugar kinase domains of the Hsp70/actin fold. The archaeal Boa1 family of tri-helical HTH domains contains a unique all-β-strand domain at the N-terminus that is likely to bind its effectors (Table 1, Fig. 4). The Myb family is a pan-eukaryotic family of simple tri-helical domains that appears to have diversified into multiple members prior to the diversification of all extant eukaryotic lineages (Fig. 5). Some versions of the Myb family, the SANT subfamily, appear to have been recruited secondarily for protein–protein interactions in the eukaryotic chromatin [132]. In bacteria, the Myb domain is only seen in the RsfA-related pre-spore transcription factors of the low-GC Gram-positive bacteria [133], suggesting that it was acquired relatively late through lateral transfer from the eukaryotes. In addition to the Myb domain, the homeodomain and POU domain families [41], which are also basic tri-helical HTH domains, are widespread in the crown-group eukaryotes and metazoans, respectively. The HTH domain of the eukaryotic tumor suppressor BRCA2 [72] is another simple tri-helical version that was derived early in eukaryotes. However, it shows no specific relationships to any other HTH domains with a similar structural configuration.
Beyond these classical families there are several tri-helical HTH domains associated with diverse transposases and resolvases, such as the gamma–delta resolvase. The exact point of origin of the HTH domains associated with these mobile elements is difficult to ascertain, but they appear to have given rise to families of HTH domains found in cellular transcription factors on multiple occasions. Particularly striking examples of these include the Paired box and Pipsqueak families involved in metazoan developmental gene expression, and the CENBP family (centromere binding protein) in the crown group eukaryotes [134–137]. Likewise, the HTH domain of the bacterial YlxL(SwrB) family (Fig. 3) is also related to the HTH domains of the gamma–delta resolvase [138]. It is possible that other transcription factor families with a relatively restricted phyletic distribution, like the eukaryotic homeodomain and the bacterial sigma-54 family, have also ultimately been derived from the HTH domains of transposases. The metal-binding domain of the retroviral integrase also appears to have been derived from the HTHs of the transposase/resolvase class through acquisition of metal-chelating residues. The KorB–ParB family also contains a basic tri-helical version of the HTH domain, and functions as the partitioning protein for diverse bacterial plasmids. The KorB subfamily contains an additional C-terminal 4-helical DNA-binding domain [57] fused to the HTH domain, while the ParB subfamily contains a fusion to a nuclease domain of the OB-fold [139].
The MetJ-Arc (RHH) family of transcription factors appears to have been derived from the basic tri-helical bundle [61]. They are most frequently found as transcriptional regulators of the mobile toxin–antitoxin operons [140]. Hence, it is possible that they were originally derived in such toxin–antitoxin systems, through rapid divergence from a conventional HTH. This appears to have happened early in the evolution of one of the prokaryotic lineages, after which they were widely disseminated across the prokaryotes through horizontal mobility.
4.2. The tetra-helical HTH superclass and its derivatives
The first major monophyletic clade of HTH domains is defined by the unifying structural feature of the tetra-helical bundle. Sequence similarities help in identifying several major lineages within this group. The cI-like family, typified by the phage lambda cI protein, contains representatives from across the 3 super-kingdoms of life (Fig. 4). Several distinct subfamilies can be recognized within this family. The largest of these is the bacterial repressor subfamily typified by the protein PbsX (Xre) from the B. subtilis prophage 168 [141]. Another notable subfamily is the MBF1 subfamily, which is nearly universally conserved in the archaeo-eukaryotic lineage and functions as an adaptor that appears to bridge the specific transcriptional regulators to the basal transcription machinery [142].
The next major assemblage within the tetra-helical superclass comprises 6 major families that are exclusively prokaryotic in their distribution (Figs. 4 and 5). This assemblage includes the AraC, LuxR, LacI, DnaA, TrpR and TetR families, which are predominantly bacterial with several independent lateral transfers to archaea (Fig. 4). The first four of these families are nearly pan-bacterial in their distribution, suggesting that these HTH families had probably diverged from each other even in the common ancestor of all bacteria (Fig. 5). The latter two lineages are more limited in their distribution, but are found in most proteobacteria and low-GC Gram-positive bacteria (Fig. 4). Within some of these families several distinct lineages, often defined by specific architectural themes, can be identified. However, most of these families contain a dominant architectural theme, suggesting these might have been the earliest versions of these families. The AraC family contains a duplication of the tetra-helical version of the HTH domain [143], and typically occurs fused to a sugar-binding
domain of the DSBH fold [97,107], suggesting that they predominantly function as sugar-sensing transcription factors. However, in complex cyanobacteria, like Nostoc, the majority of AraC family HTH domains occur fused to a novel sensor domain that is also found in the siderophore alcaligin-sensing transcriptional activator, AlcR, from Bordetella [144] (Fig. 6 and Table 1). The most common subfamily of the LuxR family contains fusions to the receiver domain, as seen in the case of the E. coli protein NarL. In the case of the LacI family the dominant architecture features a fusion of the HTH domain with a small-molecule binding domain of the periplasmic solute-binding protein fold. DnaA is usually found only in a single copy in all bacterial genomes, with the HTH occurring at the C-terminus of the AAA+ domain [145]. The DnaA protein functions in replication initiation, and also as a transcription factor [146]. Additionally, sporadic versions of the tetra-helical HTH superclass are also seen in several phage transposases related to the Mu transposase.
TFIIB, a basal transcription factor in the archaeo-eukaryotic lineage, defines the TFIIB family, a derivative of the tetra-helical class. In the archaea there is typically a single version of this family (Fig. 4). However, in the eukaryotes not only did TFIIB undergo duplication, but it also spawned two more divergent families, namely the cyclins and the Rb proteins [122]. Structural comparisons suggest that the eukaryote-specific Bright domain family [55,147], which includes DNA-binding proteins involved in chromatin dynamics, was also derived from a TFIIB-like precursor prior to the radiation of eukaryotes from their common ancestor. The provenance of the only bacterial relatives of this version, namely the Spo0A protein from endospore-forming bacteria [56], remains unclear.
4.3. The wHTH superclass
The second major monophyletic clade of HTH domains is unified by the presence of a striking derived feature, the “wing”. Several major assemblages with

these families, barring some striking exceptions (see below), function as transcription factors, suggesting that they could have descended from an ancestral protein that had some role in transcriptional regulation. The MarR family has vastly proliferated in the archaea to give rise to several archaeal subfamilies and includes most of the major archaea-specific wHTH transcription factors. In bacteria the RuvB subfamily, which is derived from the ancestral HrcA lineage, appears to have secondarily acquired a role in DNA recombination after a fusion with the AAA+ domain in the ATPase subunit of the Holliday junction resolvase [148]. Another subfamily of the HrcA–RuvB family, S19AE, is a small subunit ribosomal protein in the archaeo-eukaryotic lineage, with a potential RNA-binding function (Fig. 5). A similar case is observed in the GntR family, which has vastly proliferated in bacteria, giving rise to many of the major bacterial one-component transcription factors. However, in the archaeo-eukaryotic clade it is represented by a single lineage, the small subunit ribosomal protein S25AE (Figs. 4 and 5).
In contrast to the above families, the BirA, ModE, PadR, ScpB and DtxR–Fur families appear to have had a bacterial origin followed by sporadic lateral transfers to the archaea (Fig. 4). The DtxR–Fur family appears to have specialized early on in regulating metal-dependent transcription of genes. The ScpB family is typically represented in most bacteria by a single protein, which is encoded in the same operon as a kleisin and an SMC-type ABC ATPase [149,150]. The ScpB protein contains a tandem duplication of the wHTH with a C-terminal helix after the wing (Fig. 3), and has been shown to regulate the activity of the chromosome-reorganizing SMC ATPases [149,150]. The archaeal representatives of this family appear to have been acquired through a lateral transfer from the cyanobacteria (LMI and LA unpublished). The pan-bacterial families, namely CitB, LysR and Rrf2, are largely absent in the archaea (Fig. 4). Interestingly, a single member of the Rrf2 family (GLP_14_27362_29578) is seen in the early branching eukaryote Giardia lamblia. A number of
varying distributions can be identified within the wHTH wHTH domains with distinct sequence conservation
superclass. The wHTH superclass includes the majority profiles, and occurring in large multi-domain proteins
of prokaryotic transcription factors. Thirteen major also belong to this assemblage of wHTH domains with
families of prokaryotic wHTH domains, namely the a C-terminal helix after the wing. These include (1)
BirA, ArsR, GntR, DtxR-FurR, CitB, LysR, ModE, two HTH domains of the topoisomerase II family, (2)
MarR, PadR, YtcD, Rrf2, ScpB and HrcA-RuvB fami- the HTH domain of the enigmatic topoisomerase V,
lies, are unified by the presence of a characteristic helix which is currently found only in Methanopyrus kandleri
after the wing, and comprise the largest monophyletic [78], (3) The ESA1 family of eukaryotic histone acety-
assemblage within the wHTH superclass (Fig. 5). Of ltransferases and (4) the N-terminal HTH domains of
these, representatives of the ArsR, MarR, YctD, GntR, the Rio kinases from archaea and eukaryotes. Two re-
HrcA-RuvB are seen in both archaea and bacteria (Fig. lated wHTH domains from Giardia (Genbank Gis:
4) with phylogenetic trees suggesting distinct pan-archa- 29245940, 29247865) also appear to have been derived
eal and pan-bacterial branches within them (data not from within the above-discussed assemblage of wHTH
shown). This would imply that these families possibly domains, but their precise affinities are obscured due
even go back to the LUCA (Fig. 5). The members of to rapid sequence divergence.
Fig. 6. A multiple alignment of the AlcR N-terminal module. A multiple sequence alignment of the AlcrN domain was constructed using T-Coffee after
parsing high-scoring pairs from PSI-BLAST search results. The Jpred secondary structure is shown above the alignment with H representing an α-helix
and E representing a β-strand. The 85% consensus shown below the alignment was derived using the following amino acid classes: hydrophobic
(h: ALICVMYFW, yellow shading); small (s: ACDGNPSTV, green) and polar (p: CDEHKNQRST, blue). The conserved 'C' is shaded red. The
limits of the domains are indicated by the residue positions, on each end of the sequence. The sequences are denoted by their gene name followed by
the species abbreviation and GenBank Identifier (gi). The species abbreviations are: Ana: Nostoc sp., Bbro: Bordetella bronchiseptica, Bpar:
Bordetella parapertussis, Ctet: Clostridium tetani, Ecar: Erwinia carotovora, Gvio: Gloeobacter violaceus, Pae: Pseudomonas aeruginosa, Plum:
Photorhabdus luminescens, Psyr: Pseudomonas syringae, Rpal: Rhodopseudomonas palustris, Rsol: Ralstonia solanacearum, Smel: Sinorhizobium
meliloti, Spne: Streptococcus pneumoniae, Ssp: Synechocystis sp., Styp: Salmonella typhimurium, Tden: Treponema denticola, Wsuc: Wolinella
succinogenes.
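The class-based consensus rule described in the caption can be illustrated with a short Python sketch. This is not the authors' code: the function names, the treatment of gaps as non-matching, and the order in which classes are tested (identical residue first, then h, s, p) are assumptions made for illustration. Only the three residue classes and the 85% threshold come from the caption.

```python
# Residue classes as listed in the Fig. 6 caption.
CLASSES = {
    "h": set("ALICVMYFW"),   # hydrophobic
    "s": set("ACDGNPSTV"),   # small
    "p": set("CDEHKNQRST"),  # polar
}


def column_consensus(column, threshold=0.85):
    """Return a consensus symbol for one alignment column.

    Assumed rule (hypothetical, for illustration): report the residue itself
    if it reaches the threshold, otherwise the first class (h, s, p) whose
    members reach the threshold, otherwise '.'. Gaps ('-') never match.
    """
    n = len(column)
    if n == 0:
        raise ValueError("empty column")
    for residue in sorted(set(column)):
        if residue != "-" and column.count(residue) / n >= threshold:
            return residue
    for label, members in CLASSES.items():
        if sum(1 for residue in column if residue in members) / n >= threshold:
            return label
    return "."


def consensus(alignment, threshold=0.85):
    """Return the consensus line for a list of equal-length aligned sequences."""
    if not alignment or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    width = len(alignment[0])
    return "".join(
        column_consensus([seq[i] for seq in alignment], threshold)
        for i in range(width)
    )


if __name__ == "__main__":
    # Toy alignment invented for this example; not taken from Fig. 6.
    toy = ["CAKD", "CLRS", "CIEG", "CVHN"]
    print(consensus(toy))
```

For the toy alignment the sketch reports an invariant C followed by a hydrophobic, a polar and a small column. Because several residues belong to more than one class, a different test order would change the symbol reported for some columns.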
The next major monophyletic assemblage of the wHTH superclass includes the DeoR, ArgR, LevR, YitL, Lrp-AsnC, ZBD (Z-DNA binding domain), and RNase R families. These families are unified by overall sequence similarity, and a shared pattern with a conserved glutamine or arginine residue between helix-1 and helix-2 of the HTH domain. Of these the Lrp-AsnC family is widely conserved in both bacteria and archaea (Figs. 4 and 5) and in phylogenetic trees displays distinct branches separating the majority of archaeal and bacterial members. Hence, it is possible that the Lrp-AsnC family goes back to the LUCA. The ArgR and DeoR are predominantly bacterial families, whereas the LevR group is sporadically found, mainly in low-GC Gram-positive bacteria. The RNase R family is a limited group that is represented by just a single pan-bacterial orthologous lineage in the form of the wHTH domain at the extreme N-terminus of the RNase R protein (Fig. 3). Its widespread distribution in the bacteria suggests that it emerged early in the evolution of this lineage. The ZBD family is restricted to the crown group eukaryotes and in animals it is fused to the deaminase domain involved in hyper-mutation of the immunoglobulin genes [151,152]. The restricted phyletic pattern of the ZBD family suggests that it may have evolved from one of the prokaryotic families after lateral transfer to the crown group eukaryotes (Fig. 5).

The wHTH domains found in the archaeo-eukaryotic proteins involved in replication initiation, namely the MCM proteins, the CDC6–Orc1 proteins (C-terminal to the AAA+ ATPase domains in both these cases) and the RP-A protein comprise the replication initiation family of wHTH domains. Both the MCM and the CDC6 versions appear to have been present right from the base of the archaeo-eukaryotic lineage [30,153,154]. The version associated with RP-A occurs as a standalone protein in the archaea, while it appears to have been fused to the single-strand DNA-binding OB-fold domains in the eukaryotes. Also belonging to this family are the C-terminal wHTH domains from the replicative helicase-primase enzymes of various viruses such as P4, plasmid Rep proteins and the eukaryotic RFX-type DNA binding domain seen in transcription factors like the MHC class II transcription factor HRFX1 [75,155,156]. This family might have originally evolved to recognize specific DNA features associated with the replication initiation sites, and to recruit the catalytic activities involved in pre-initiation and initiation to these sites [75]. The RFX domains are specifically related to certain phage helicase-primase wHTH domains belonging to this family. This suggests that the crown group eukaryotes may have acquired the RFX domains from such a viral replication protein and reused them as transcription regulators (Fig. 5).

The wHTH domains of the archaeal phenylalanyl tRNA synthetase α-subunits, the eukaryotic ribosomal protein S10 and the bacterial selenocysteine-specific elongation factor SelB appear to comprise a family principally associated with translation and RNA metabolism proteins (Figs. 3 and 5). Their phyletic patterns suggest that their recruitment to RNA-specific functions occurred after the separation of the major superkingdoms of life, though it is possible that a standalone precursor of this family was already present in the LUCA. Another distinct family of wHTH domains with an exclusive single-stranded RNA-binding function is the La domain family [47,48,157]. The La domain has previously only been reported from eukaryotes; however, using sequence profile analysis we show that it is homologous to the N-terminal domain of the NAD-dependent RNA 2′-phosphotransferase [158], which removes the phosphate from the 2′ ends of RNA. In the RNA 2′-phosphotransferase the La domain bears two of the four absolutely conserved catalytic histidines (LA unpublished), suggesting that it is another case of recruitment of the HTH domain for a catalytic role. The RNA 2′-phosphotransferases are highly conserved in the archaeo-eukaryotic lineage and sporadically observed in the bacteria. This suggests that the La family of HTH domains emerged early in the archaeo-eukaryotic lineage and was subsequently laterally transferred to bacteria. In the eukaryotes the non-catalytic versions (the classical La domains) were recruited for binding 3′ poly(U)-rich elements of nascent RNA polymerase III transcripts and translation regulation [47,48]. The fungal protein Frequency [157] contains an as yet functionally uncharacterized version of the La domain, which may regulate the circadian clock via RNA metabolism.

There are other distinct families of wHTH transcription factors in prokaryotes with related 2- or 3-stranded wHTH domains, but they do not appear to belong to any of the aforementioned assemblages. These include the LexA, OmpR, and IclR families that appear to be pan-bacterial families, with at best a rare presence in the archaea (Figs. 4 and 5). The classical representatives of the LexA family appear to be involved in regulating responses to DNA damage in diverse bacteria [82]. However, a highly divergent, potential offshoot of the LexA family is seen fused to the C-termini of large DNA helicases (lhr) that are found sporadically in several bacteria (Fig. 3). The eukaryotes display their own families of specific transcription factors, such as the Forkhead-histone H1 [25,26] and E2F-DP1 families [159], and the basal transcription factors, such as the TFIIF(Rap30)-TFIIIC-63K family [160] that show structurally similar wHTH domains, but lack specific sequence relationships with any of the prokaryotic families. Within the TFIIF-TFIIIC-63K family, the RNA polymerase III transcription factor, TFIIIC-63K, is conserved throughout eukaryotes, but TFIIF appears to be restricted to the crown group and the apicomplexans.
The wHTH domains of the TFIIF and TFIIIC-63K are functionally similar to those of the Forkhead-histone H1 family, suggesting that the latter were probably derived from a more widespread family of basal transcription factors [160]. The remaining eukaryotic families appear to be chiefly represented in the eukaryotic crown group, implying that they arose relatively late from pre-existing eukaryotic wHTH domains found in the basal transcription machinery or via rapid divergence from laterally transferred prokaryotic transcription factors. A similar scenario appears to be applicable for the four eukaryotic wHTH families that are not involved in DNA-binding, namely the PINT domain, the SNF8 (with two tandem copies of the wHTH domain), cullin and the DEP domain families.

Beyond these multi-member families there are highly conserved lineages of 2- or 3-stranded wHTH domains that are typically found in single orthologous groups of proteins, and cannot be linked to any of the larger assemblages. Such lineages include wHTHs found in the bacterial proteins such as FtsK, DprA and O-6-methylguanine-DNA alkyltransferase, and the archaeo-eukaryotic proteins like TFIIE (Fig. 3) [30,161].

Distinct from the 2- and the 3-stranded HTH domains are the 4-stranded HTH domains that appear to form a separate monophyletic assemblage within the wHTH clade (Fig. 5). The main prokaryotic family in this assemblage is the Crp family [36] that has a pan-bacterial distribution and a sporadic presence in a few archaea. Thus, it seems to represent a bacterial innovation that was disseminated to the archaea via lateral transfer. Members of this family are typically fused to a C-terminal cNMP-binding domain, and appear to have specialized early on as cyclic nucleotide-dependent regulators. The eukaryotes contain a single major family of this assemblage, the HSF family (heat shock transcription factor), which is present only in the crown group eukaryotes. In the animals this protein appears to have spawned two distinct sub-families that are prototyped by the ETS domain and the IRF domain (interferon regulatory factors) [162,163]. Given their relatively restricted presence in eukaryotes, it is possible that they have originated through rapid divergence from laterally transferred prokaryotic versions. A similar scenario could be envisaged for the origin of the orphan initiator binding protein from Trichomonas, which also contains a 4-stranded version of the HTH domain.

The MerR-like assemblage of truncated wHTH domains comprises derivatives of 3-stranded wHTH domains (Fig. 2). The MerR family proper [37] and the related wHTH domains present in the DNA-binding region of the bacterial phenylalanyl tRNA synthetase β-subunit [164] show a pan-bacterial distribution. In bacteria the MerR family has vastly proliferated into several distinct subfamilies, like the SoxR and CueR subfamilies [37]. However, most other versions show a more restricted distribution. A potential RNA-binding version of this domain is the N-terminal domain of translation initiation factor IF2 from certain bacteria [165]. The remaining versions, observed in the phage lambda excisionase and terminase proteins, the phage Mu-repressor family, the eukaryotic DNA repair protein XP-A and animal transcription factors of the Dachshund family, appear to be DNA binding [166–168]. The eukaryotic XP-A family is involved in nucleotide excision repair [169], and appears to have been derived at some point in eukaryotic evolution from the functionally similar phage excisionases. The principal diversification of this assemblage appears to have happened early in bacterial evolution resulting in the two ancient families, MerR and the FTRS β-subunit N-terminal domain. Given that regular 3-stranded wHTHs are also found in association with other translation proteins, like the archaeal FTRS α-subunit and SelB, it is possible that the prototype of the MerR-like version was derived from such a form through loss of the initial helix.

4.4. Other miscellaneous families of HTH domains

In addition to the above-described major assemblages, there are a few ancient HTH families with uncertain affinities. The chief amongst these are the related ribosomal proteins L11 and S18 (Fig. 2). The former is conserved in all three super-kingdoms of life and binds RNA in the L11-stalk structure, which appears to go back to the shared ancestral core of the ribosome [170]. S18 appears to have been derived from L11 in the bacteria. Despite its apparent ancient origins L11 appears to be a highly derived version of the HTH. Hence, notwithstanding certain general similarities with the 4-stranded wHTHs, it is more likely that L11 has convergently acquired these features early in evolution.

5. Proteome-wide demographic trends of HTH domains

The availability of a large number and phyletic diversity of complete genome sequences allows robust estimation of the general trends in the proteome-wide distribution of HTH domains. In order to detect the occurrences of HTH domains in proteomes, position-specific score matrices or sequence profiles were constructed for the various distinct families of these domains using seed alignments with diverse representatives. These sequence profiles were then used to iteratively search the target proteomes with the PSI-BLAST program [42]. Alternatively, the alignments were used to generate hidden Markov models, which were similarly used to search the proteomes with the HMMER program [42]. Using a combination of these procedures we determined the total number of proteins containing HTH domains encoded by all completely sequenced organisms that were available at
the time of the analysis. For prokaryotes, the plot of the total number of HTH domains against gene number per genome is best fitted by the power equation of the form $y = k \cdot x^{c}$ (where $k$ and $c$ are constants; $r^2 = 0.89$; see Fig. 7(a)). The $r^2$ of 0.89 for this fit suggests that this tendency is indeed strongly maintained across a wide diversity of genomes. This non-linear scaling of HTH domain numbers with gene number is consistent with recent studies that have suggested that the transcription factor counts follow a power equation with respect to gene number [171]. Given that the HTH domains are the main transcription factors in most of the prokaryotic genomes, it is clear that the trend observed for transcription factors is principally a reflection of the distribution of HTH domains (Fig. 7). This distribution function suggests that as gene number increases, a greater-than-linear number of HTH domain regulators is required per gene.

Examination of major architectural classes of HTH indicates that there is an interesting differential class-wise partitioning of the trends. HTH proteins belonging to two-component, serine/threonine kinase and PTS signaling cascades are almost entirely missing in archaea. In the bacteria, where they are abundantly present, their numbers show a linear scaling with respect to gene count per genome ($r^2 = 0.8$) (Fig. 7(c)) [42]. This suggests that each HTH in two-component and related phosphorylation cascades regulates a fixed number of target genes, and as the gene numbers grow larger, the regulators increase in direct proportion to the increase in their targets. This is consistent with a model of evolution in which the two-component systems and their target genes undergo duplication with approximately the same probability as the size of a bacterial genome increases in evolution. In contrast to the two-component systems, the one-component systems, which chiefly comprise those HTH domains fused to specific SMBDs or metabolic enzymes, scaled non-linearly with increase in gene count per genome. The best fit for the one-component systems was obtained with the power equation of the form $y = k \cdot x^{c}$ ($r^2 = 0.76$, Fig. 7(b)). This equation suggests that the HTH domains belonging to the one-component system are likely to be a significant contributor to the overall power equation-type distribution of the HTH domains. This observation implies that as genome size increases a greater than proportional increase in the numbers of one-component transcription factors is required for controlling the newly added genes. This tendency might correlate with the need to regulate specialized groups of their genes, by combining the distinct inputs sensed by the effector-binding domains of multiple sets of one-component transcription factors in the metabolically or organizationally complex bacteria with large genomes.

The counts of sigma factors, too, are on average positively correlated with gene count per genome (Fig. 7(d)), and scale non-linearly with it (the best fit being provided by a quadratic curve of the form $y = a \cdot x^{2} + b \cdot x$, where $a$ and $b$ are constants; $r^2 = 0.76$). The non-linear scaling of the sigma factors suggests that in the more complex genomes the additional genes are distributed amongst several functionally specialized gene batteries, which are under the regulation of devoted sigma factors responding to specific situations. Interestingly, a few genomes show a significantly greater than expected number of sigma factors (Fig. 7(d)). The most striking example is seen in the case of Phytoplasma asteris, which, like other Mycoplasmas, has a highly reduced genome with just over 700 genes [172]. Whereas the other Mycoplasmas have only a basal sigma-54, P. asteris has, in addition to sigma-54, a recent lineage-specific expansion of 11 sigma factors that are related to the Bacillus sigma F. Likewise, Bacteroides thetaiotaomicron and Nitrosomonas show recent lineage-specific expansions of ECF-type sigma factors that have given rise to at least 10 closely related paralogous members in their proteomes. In the case of P. asteris there is evidence that the sigma factors may constitute a novel transposon [173]. While this possibility also exists in the case of the other bacteria that show a greater than expected number of sigma factors, it is likely that some of them might have been utilized as transcriptional regulators. This potential link between sigma factors and transposons is consistent with the repeated recruitment of HTH domains from transposases as specific transcriptional regulators.

The genomic data from eukaryotes is still not adequate to discuss general genome-wide trends. However, the available data suggests that eukaryotes display different tendencies from the prokaryotes. The HTH domains of the homeo and Myb proteins are amongst the most prominent DNA-binding domains of transcriptional regulators in the plant and animal lineages. However, their numbers are relatively low in the other eukaryotic lineages. This suggests that the rise in prominence of HTH transcription factors may have been a relatively late phenomenon that occurred on multiple occasions in the eukaryotic crown group. The parasitic apicomplexa, including those forms that have genome sizes comparable to some free-living fungi, have far fewer transcription factors in general [174]. The early branching eukaryote Giardia lamblia has at least 13 Myb domains and 2 Bright domains, but apparently no other representatives of the HTH domains found in the eukaryotic specific transcription factors (LA unpublished). Furthermore, the scaling of the total number of transcription factors in eukaryotes with gene counts per genome is not equivalent to what is observed in the prokaryotes. This difference may arise due to two main reasons: (1) The emergence of a complex apparatus for chromatin-structure regulation might have changed the nature of transcriptional control in eukaryotes. (2) Most eukaryotic transcription factors are effectively
Fig. 7. Scaling of HTH domains with gene number per genome. All graphs show a scatter plot of number of proteins with HTH domains in a given
proteome (Y-axis) versus the number of protein-coding genes in that organism (X-axis). (a) The Y-axis is the overall number of proteins with HTH
domains. (b) The Y-axis is the number of predicted one-component system proteins with HTH domains. (c) The Y-axis is the number of two-component
system and other phospho-relay system proteins with HTH domains. (d) The Y-axis shows the number of sigma factors. For each graph the
best-fitting trend line along with its $r^2$ value is shown.
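As an illustration of how a trend line of the power-equation form y = k · x^c can be fitted, the sketch below estimates k and c by ordinary least squares on log-transformed counts and reports r² in log space. Whether the authors fitted in log space is not stated, so that choice, the function names and the synthetic gene counts are all assumptions. No values from Fig. 7 are reproduced.

```python
import math


def fit_power_law(x, y):
    """Fit y = k * x**c by least squares on (log x, log y).

    Returns (k, c, r2), where r2 is the coefficient of determination of the
    straight-line fit in log-log space (an assumed convention).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    if any(v <= 0 for v in x) or any(v <= 0 for v in y):
        raise ValueError("power-law fitting in log space needs positive values")
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    if sxx == 0:
        raise ValueError("all x values are identical")
    c = sxy / sxx                      # exponent = slope in log-log space
    log_k = mean_y - c * mean_x        # intercept = log of the prefactor
    ss_res = sum((b - (log_k + c * a)) ** 2 for a, b in zip(log_x, log_y))
    ss_tot = sum((b - mean_y) ** 2 for b in log_y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return math.exp(log_k), c, r2


def regulators_per_gene(genes, k, c):
    """Regulators per gene implied by y = k * x**c, i.e. k * x**(c - 1)."""
    return k * genes ** (c - 1)


if __name__ == "__main__":
    # Synthetic, noise-free data generated from k = 0.001 and c = 1.5.
    genes = [500, 1000, 2000, 4000, 8000]
    hth_counts = [0.001 * g ** 1.5 for g in genes]
    k, c, r2 = fit_power_law(genes, hth_counts)
    print(f"k={k:.4g}, c={c:.3f}, r2={r2:.3f}")
```

When the fitted exponent c exceeds 1, the ratio y/x = k · x^(c−1) grows with x, which is the sense in which regulators per gene increase with gene number. A linear trend corresponds to c = 1 and a constant ratio.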
downstream of signaling cascades that communicate from the cell membrane or the cytoplasm to the nucleus. Hence, there are few equivalents of the genuine prokaryote-type two-component systems in the eukaryotes. Additionally, the eukaryotes might also extensively employ post-transcriptional control mechanisms, involving regulatory RNAs, resulting in a lower dependence on transcriptional regulators [175,176]. A combination of these factors might account for the difference in the average number of genes controlled per transcription factor in eukaryotes and prokaryotes.

6. General considerations on the natural history of the HTH fold and implications for the evolution of transcription

With the exception of the ribosomal protein L11, no bona fide HTH domains with an ancestral RNA-binding role can be confidently traced back to the LUCA. In the HrcA and GntR families the representatives from the archaeo-eukaryotic clade are ribosomal proteins (S19AE and S25AE), whereas all the known bacterial representatives of these families are specific transcription factors (Fig. 5). All other versions of the HTH associated with translation and RNA metabolism, such as those found in La/RNA 2′ phosphatase, SelB and ribosomal protein S10E, appear to have been derived after the separation of the archaeo-eukaryotic and bacterial clades. The simplest interpretation of these observations is that the majority of HTH domains associated with RNA metabolism settled into their extant functional niches only after the divergence of the major lineages from the LUCA. Hence, excluding L11, it is likely that other HTH domains associated with RNA metabolism in the LUCA performed more generic functions compared to their extant counterparts. The DNA-binding property is strongly preserved across diverse lineages and structural variants of the HTH fold, and involves helix-3, despite the several variations in the details of the interactions of individual versions of the domain. This observation, taken together with the more sporadic
distribution of the versions of the domain associated with translation and RNA metabolism, suggests that the ancestor of most of the extant versions of the HTH, excluding L11, was a DNA-binding protein. Furthermore, the diversification of this domain was potentially associated with the emergence of DNA as genetic material [75,153].

While there are HTH domains in the basal transcription factors of both the bacterial (sigma factors) and archaeo-eukaryotic (TFIIB, TFIIE, and MBF1) lineages, none of these can be considered as being truly orthologous [30,32,33]. In contrast, several families of the HTH domains in specific transcription factors appear to be extensively shared by the bacteria and archaea (Fig. 4). Though several of the prokaryotic families shared by bacteria and archaea can be easily explained as arising from relatively recent lateral transfer between the prokaryotic super-kingdoms, some others like the MarR, ArsR, YtcD, Lrp, HrcA and GntR families appear to show distinct pan-archaeal and pan-bacterial groups, suggesting that they were present in the earliest representatives of each of the super-kingdoms, and hence potentially go back to the LUCA (Fig. 5). Despite the basal transcription machinery shared with the archaea, the specific transcription factors that are traceable to the last eukaryotic common ancestor do not belong to the same families as the specific transcription factors seen in the archaea [30]. These patterns raise a profound conundrum regarding the origin of the phyletic patterns of HTH domains in the basal and specific transcriptional regulators of extant organisms.

Even though several scenarios could in principle explain these patterns, there are only a few parsimonious alternatives that account for the currently available data. Given that both the basal transcription machinery and the domain architecture of the RNA polymerase catalytic subunits are very different in the bacterial and the archaeo-eukaryotic lineages [130], one could extrapolate that the basal transcription factors arose only after the two great lineages had separated from the LUCA. In this situation, the sharing of the specific transcription factors by the archaea and the bacteria could be explained in two possible ways: (1) The LUCA had several specific transcription factors but no basal transcription factors. (2) Alternatively, there were neither specific nor basal transcription factors in the LUCA and both types emerged after the lineages separated. However, multiple very early lateral transfer events resulted in prokaryotic lineages sharing a common set of specific transcription factors. This latter scenario is consistent with the evidence for extensive lateral transfer between the two prokaryotic super-kingdoms throughout their evolution [32,177]. At least in the case of the HrcA and GntR families, the striking difference in functions of the extant bacterial and archaeo-eukaryotic representatives suggests that the ancestral versions of these families in the LUCA performed a generic nucleic-acid-binding function. They appear to have been secondarily recruited as specific transcription factors only in the bacteria, thereby supporting the second scenario. The basal transcription factors and ribosomal proteins tend to show a stronger signal of vertical inheritance as compared to specific transcription factors, which are prone to rampant gene loss and lateral transfer [32,177]. This is not unexpected, given the functional constraints acting on the basal transcription factors, and might confound the reconstruction of the actual evolutionary scenario for the specific transcription factors.

Although the first scenario of basal transcription factors emerging after the specific transcription factors might appear counter-intuitive, detailed analysis of the reconstructed house-keeping functions in the LUCA suggests that it is hardly implausible. Earlier studies on the DNA replication and chromosome partitioning systems suggest that the central enzymes of the DNA replication apparatus appear to have emerged only after the split of the archaeo-eukaryotic and bacterial lineages [32,127,178]. Thus, the DNA replication system resembles the basal transcription apparatus in its origins. More specifically, the absence of an ancestral DNA polymerase and associated replication enzymes in the LUCA suggests that it probably had a system of replication involving reverse transcription [127]. However, simpler but completely functional DNA-dependent RNA polymerase (DdRP) subunits have been reconstructed for the LUCA [130,179]. Hence, it is possible that the LUCA did not use basal transcription factors and the DdRPs chiefly functioned as enzymes that supplied the RNA template for the replication process involving a reverse transcription step. The precursors of the specific transcription factors might have still functioned under these circumstances, primarily acting as general repressors that regulated the synthesis of the RNA template. The basal transcription factors probably arose only when the genome got organized into multiple tandem operons, each needing its own transcription initiation signal. This suggestion is consistent with the fact that the prokaryotic specific transcription factors can function equally well with both archaeo-eukaryotic- and bacterial-type basal transcription machinery and RNA polymerases, despite the numerous differences [180].

Irrespective of the scenarios, the HTH domains of both the simple tri-helical type (e.g. carbamoylphosphate synthetase and topoisomerase I) and the wHTH type (e.g. topoisomerase II family) are present in proteins that can be confidently traced back to the LUCA. Depending on the scenario, there were at least 6–11 different HTH domains in the cellular genomes, suggesting that the fold had undergone structural and functional differentiation even before the period of the LUCA. Subsequently, in the course of prokaryotic evolution, the
HTH domains rapidly expanded in several prokaryotic genomes to rank amongst the folds with the highest representation [32,35]. The currently available evidence suggests that the common ancestor of extant eukaryotes arose later, via an endosymbiotic event involving an archaeal precursor for the nucleus, translation and secretion apparatus and an α-proteobacterial precursor for the mitochondrion. However, neither the mitochondrial nor the nuclear genome appears to have retained the specific transcription factors that might have been inherited from the respective prokaryotic precursors. Instead, the principal pan-eukaryotic HTH transcription factor families, like Bright and MYB domains, are only distantly related to the prokaryotic counterparts. The origin of the eukaryotes saw the emergence of a distinct nuclear compartment, extensive RNA-dependent post-transcriptional gene regulation, a complex chromatin structure and the proliferation of enzymatic complexes involved in chromatin dynamics, such as the Swi2 ATPases, acetylases and deacetylases [181]. The compartmentalization of the cell probably rendered the prokaryote-type one-component systems ineffective in the eukaryotes. Furthermore, the regulators of chromatin dynamics, such as the histone deacetylases and associated Swi2 ATPases, took up the role of transcriptional repressors [181], which probably made the ancestral prokaryote-type repressors superfluous. Hence, the origin of the eukaryotes was probably accompanied by a massive loss of the prokaryote-type transcription factors, along with the innovation, through rapid sequence divergence, of new versions that suited the eukaryotic milieu. Some of the HTH domains inherited from the ancestral prokaryotic genomes were also reused by the eukaryotes as adaptor modules in signaling systems unrelated to DNA-binding or transcription regulation. However, many versions inherited from the archaea appear to have persisted in the basal transcription machinery, where they were indispensable for transcription of the nuclear genome.
Finally, the rise of organizational complexity in plant, animal and fungal lineages went hand-in-hand with the emergence of new specific transcriptional regulators. Some were drawn from pre-existing HTH families, like the Myb domain, while others like the HSF, Homeo, Pou, Pipsqueak and Paired families arose through rapid divergence from different sources. The HTH domains of transposons provided the source material for some of these domains, whereas others might have diverged rapidly from laterally transferred prokaryotic transcription factors. Thus, on the one hand, prokaryotes appear to share a sizeable common set of transcription factors, whose phyletic patterns are chiefly governed by the lateral transfers and gene losses acting over and above a basic signal of vertical inheritance. On the other hand, the eukaryotes share only a few unique ancient DNA-binding domains, and their transcription factors have chiefly emerged through multiple lineage-specific expansions.
7. General conclusions
The HTH domain, one of the best-studied of the double-stranded DNA-binding domains, is one of the key protein domains in the transcriptional apparatus of all extant organisms. With the “hindsight” of over two decades of investigations since the discovery of the domain, we attempt to provide a synthetic overview of the natural history of the HTH domain from the viewpoint of comparative genomics. Despite the HTH being a rather simple structural scaffold, it is observed to be capable of considerable structural variety and functional versatility, while still preserving a core set of correlated structure–function features. Most HTH domains, despite their structural diversity, participate in a variety of functions that depend on their DNA-binding properties. These include their central role in mediating the substrate interactions of various enzymes that operate on DNA, and their role as both basal and specific transcription regulators. Thus, the HTH domains are the predominant transcription factors in all prokaryotic organisms and the more complex eukaryotes, such as the plants, animals and fungi. Beyond these DNA-binding functions, the HTH domains have been recruited on multiple occasions in an RNA-binding capacity and as mediators of protein–protein interactions. The last universal common ancestor already had anywhere between 6 and 11 distinct versions of the HTH domain, which covered much of the structural diversity, and at least some of the functional diversity, seen in the extant versions of the domain. Though several families of specific transcription factors are shared by the two prokaryotic kingdoms and may even go back to their common ancestor, the HTH proteins in the basal transcription factors do not appear to be orthologous. This presents an interesting evolutionary conundrum, whose solution might emerge from new data on alternative transcription and replication systems, like those in viruses and other selfish elements [75,156].
The HTH domain occurs frequently in modular proteins, whose domain architectures are often correlated with the general functional properties of the protein. In prokaryotes the dominant domain architecture is the one-component system that combines the HTH with a sensor domain. While many of the sensor domains of commonly known one-component systems have been characterized previously, several others remain structurally and functionally unexplored, and suggest a new direction for exploring the intricacies of biological sensors in prokaryotic systems. The extensive use of divergent HTH domains in protein–protein interactions, especially in eukaryotes, is another area that might
develop further in the future as the actual mechanistic details of such interactions become clearer. In prokaryotic systems the wealth of sequence and structure data might finally allow us to investigate some of the more difficult problems, such as the overall transcriptional regulatory network of organisms, and the details of how target DNA sequence and ligand specificity are achieved by transcriptional regulators. We hope that the overview presented by us will provide a framework for such future investigations.
8. Supplementary material
A complete list of gis of the HTH domains detected in 183 completely sequenced prokaryotic genomes, and alignments of major families will be available by ftp at ftp.ncbi.nih.gov/pub/aravind/.
References
[1] Ptashne, M. (2004) Genetic Switch: Phage Lambda Revisited. Cold Spring Harbor Laboratory Press, New York.
[2] Tahirov, T.H., Temiakov, D., Anikin, M., Patlan, V., McAllister, W.T., Vassylyev, D.G. and Yokoyama, S. (2002) Structure of a T7 RNA polymerase elongation complex at 2.9 Å resolution. Nature 420, 43–50.
[3] Zhang, G., Campbell, E.A., Minakhin, L., Richter, C., Severinov, K. and Darst, S.A. (1999) Crystal structure of Thermus aquaticus core RNA polymerase at 3.3 Å resolution. Cell 98, 811–824.
[4] Darst, S.A., Polyakov, A., Richter, C. and Zhang, G. (1998) Structural studies of Escherichia coli RNA polymerase. Cold Spring Harb. Symp. Quant. Biol. 63, 269–276.
[5] Haldenwang, W.G. (1995) The sigma factors of Bacillus subtilis. Microbiol. Rev. 59, 1–30.
[6] Stragier, P. and Losick, R. (1990) Cascades of sigma factors revisited. Mol. Microbiol. 4, 1801–1806.
[7] Kornberg, R.D. (1999) Eukaryotic transcriptional control. Trends Cell Biol. 9, 46–49.
[8] Kornberg, R.D. (1998) Mechanism and regulation of yeast RNA polymerase II transcription. Cold Spring Harb. Symp. Quant. Biol. 63, 229–232.
[9] Ohlendorf, D.H., Anderson, W.F. and Matthews, B.W. (1983) Many gene-regulatory proteins appear to have a similar α-helical fold that binds DNA and evolved from a common precursor. J. Mol. Evol. 19, 109–114.
[10] Ohlendorf, D.H., Anderson, W.F., Fisher, R.G., Takeda, Y. and Matthews, B.W. (1982) The molecular basis of DNA–protein recognition inferred from the structure of cro repressor. Nature 298, 718–723.
[11] Sauer, R.T., Yocum, R.R., Doolittle, R.F., Lewis, M. and Pabo, C.O. (1982) Homology among DNA-binding proteins suggests use of a conserved super-secondary structure. Nature 298, 447–451.
[12] Steitz, T.A., Ohlendorf, D.H., McKay, D.B., Anderson, W.F. and Matthews, B.W. (1982) Structural similarity in the DNA-binding domains of catabolite gene activator and cro repressor proteins. Proc. Natl. Acad. Sci. USA 79, 3097–3100.
[13] Matthews, B.W., Ohlendorf, D.H., Anderson, W.F. and Takeda, Y. (1982) Structure of the DNA-binding region of lac repressor inferred from its homology with cro repressor. Proc. Natl. Acad. Sci. USA 79, 1428–1432.
[14] Gribskov, M. and Burgess, R.R. (1986) Sigma factors from E. coli, B. subtilis, phage SP01, and phage T4 are homologous proteins. Nucl. Acids Res. 14, 6745–6763.
[15] Yura, T., Tobe, T., Ito, K. and Osawa, T. (1984) Heat shock regulatory gene (htpR) of Escherichia coli is required for growth at high temperature but is dispensable at low temperature. Proc. Natl. Acad. Sci. USA 81, 6803–6807.
[16] Landick, R., Vaughn, V., Lau, E.T., VanBogelen, R.A., Erickson, J.W. and Neidhardt, F.C. (1984) Nucleotide sequence of the heat shock regulatory gene of E. coli suggests its protein product may be a transcription factor. Cell 38, 175–182.
[17] Frampton, J., Leutz, A., Gibson, T. and Graf, T. (1989) DNA-binding domain ancestry. Nature 342, 134.
[18] Otting, G., Qian, Y.Q., Muller, M., Affolter, M., Gehring, W. and Wuthrich, K. (1988) Secondary structure determination for the Antennapedia homeodomain by nuclear magnetic resonance and evidence for a helix-turn-helix motif. EMBO J. 7, 4305–4309.
[19] Brennan, R.G. and Matthews, B.W. (1989) The helix-turn-helix DNA binding motif. J. Biol. Chem. 264, 1903–1906.
[20] Dodd, I.B. and Egan, J.B. (1990) Improved detection of helix-turn-helix DNA-binding motifs in protein sequences. Nucl. Acids Res. 18, 5019–5026.
[21] Dodd, I.B. and Egan, J.B. (1987) Systematic method for the detection of potential lambda Cro-like DNA-binding regions in proteins. J. Mol. Biol. 194, 557–564.
[22] Schultz, S.C., Shields, G.C. and Steitz, T.A. (1991) Crystal structure of a CAP-DNA complex: the DNA is bent by 90°. Science 253, 1001–1007.
[23] Wilson, K.P., Shewchuk, L.M., Brennan, R.G., Otsuka, A.J. and Matthews, B.W. (1992) Escherichia coli biotin holoenzyme synthetase/bio repressor crystal structure delineates the biotin- and DNA-binding domains. Proc. Natl. Acad. Sci. USA 89, 9257–9261.
[24] Brennan, R.G. (1993) The winged-helix DNA-binding motif: another helix-turn-helix takeoff. Cell 74, 773–776.
[25] Clark, K.L., Halay, E.D., Lai, E. and Burley, S.K. (1993) Co-crystal structure of the HNF-3/fork head DNA-recognition motif resembles histone H5. Nature 364, 412–420.
[26] Ramakrishnan, V., Finch, J.T., Graziano, V., Lee, P.L. and Sweet, R.M. (1993) Crystal structure of globular domain of histone H5 and its implications for nucleosome binding. Nature 362, 219–223.
[27] Swindells, M.B. (1995) Identification of a common fold in the replication terminator protein suggests a possible mode for DNA binding. Trends Biochem. Sci. 20, 300–302.
[28] Kodandapani, R., Pio, F., Ni, C.Z., Piccialli, G., Klemsz, M., McKercher, S., Maki, R.A. and Ely, K.R. (1996) A new pattern for helix-turn-helix recognition revealed by the PU.1 ETS-domain-DNA complex. Nature 380, 456–460.
[29] Gajiwala, K.S. and Burley, S.K. (2000) Winged helix proteins. Curr. Opin. Struct. Biol. 10, 110–116.
[30] Aravind, L. and Koonin, E.V. (1999) DNA-binding proteins and evolution of transcription regulation in the archaea. Nucl. Acids Res. 27, 4658–4670.
[31] Bell, S.D. and Jackson, S.P. (2001) Mechanism and regulation of transcription in archaea. Curr. Opin. Microbiol. 4, 208–213.
[32] Makarova, K.S., Aravind, L., Galperin, M.Y., Grishin, N.V., Tatusov, R.L., Wolf, Y.I. and Koonin, E.V. (1999) Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell. Genome Res. 9, 608–628.
[33] Bell, S.D. and Jackson, S.P. (1998) Transcription and translation in Archaea: a mosaic of eukaryal and bacterial features. Trends Microbiol. 6, 222–228.
[34] Rivera, M.C., Jain, R., Moore, J.E. and Lake, J.A. (1998) Genomic evidence for two functionally distinct gene classes. Proc. Natl. Acad. Sci. USA 95, 6239–6244.
[35] Koonin, E.V., Tatusov, R.L. and Rudd, K.E. (1995) Sequence similarity analysis of Escherichia coli proteins: functional and evolutionary implications. Proc. Natl. Acad. Sci. USA 92, 11921–11925.
[36] Korner, H., Sofia, H.J. and Zumft, W.G. (2003) Phylogeny of the bacterial superfamily of Crp-Fnr transcription regulators: exploiting the metabolic spectrum by controlling alternative gene programs. FEMS Microbiol. Rev. 27, 559–592.
[37] Brown, N.L., Stoyanov, J.V., Kidd, S.P. and Hobman, J.L. (2003) The MerR family of transcriptional regulators. FEMS Microbiol. Rev. 27, 145–163.
[38] Rigali, S., Derouaux, A., Giannotta, F. and Dusart, J. (2002) Subdivision of the helix-turn-helix GntR family of bacterial regulators in the FadR, HutC, MocR, and YtrA subfamilies. J. Biol. Chem. 277, 12507–12515.
[39] Gallegos, M.T., Schleif, R., Bairoch, A., Hofmann, K. and Ramos, J.L. (1997) AraC/XylS family of transcriptional regulators. Microbiol. Mol. Biol. Rev. 61, 393–410.
[40] Weickert, M.J. and Adhya, S. (1992) A family of bacterial regulators homologous to Gal and Lac repressors. J. Biol. Chem. 267, 15869–15874.
[41] Gehring, W.J., Affolter, M. and Burglin, T. (1994) Homeodomain proteins. Annu. Rev. Biochem. 63, 487–526.
[42] Babu, M.M., Luscombe, N.M., Aravind, L., Gerstein, M. and Teichmann, S.A. (2004) Structure and evolution of transcriptional regulatory networks. Curr. Opin. Struct. Biol. 14, 283–291.
[43] Teichmann, S.A. and Babu, M.M. (2004) Gene regulatory network growth by duplication. Nat. Genet. 36, 492–496.
[44] Selmer, M. and Su, X.-D. (2002) Crystal structure of an mRNA-binding fragment of Moorella thermoacetica elongation factor SelB. EMBO J. 21, 4145–4153.
[45] Wah, D.A., Hirsch, J.A., Dorner, L.F., Schildkraut, I. and Aggarwal, A.K. (1997) Structure of the multimodular endonuclease FokI bound to DNA. Nature 388, 97–100.
[46] Moore, M.H., Gulbis, J.M., Dodson, E.J., Demple, B. and Moody, P.C. (1994) Crystal structure of a suicidal DNA repair protein: the Ada O6-methylguanine-DNA methyltransferase from E. coli. EMBO J. 13, 1495–1501.
[47] Alfano, C., Sanfelice, D., Babon, J., Kelly, G., Jacks, A., Curry, S. and Conte, M.R. (2004) Structural analysis of cooperative RNA binding by the La motif and central RRM domain of human La protein. Nat. Struct. Mol. Biol. 11, 323–329.
[48] Dong, G., Chakshusmathi, G., Wolin, S.L. and Reinisch, K.M. (2004) Structure of the La motif: a winged helix domain mediates RNA binding via a conserved aromatic patch. EMBO J. 23, 1000–1007.
[49] Wong, H.C., Mao, J., Nguyen, J.T., Srinivas, S., Zhang, W., Liu, B., Li, L., Wu, D. and Zheng, J. (2000) Structural basis of the recognition of the dishevelled DEP domain in the Wnt signaling pathway. Nat. Struct. Biol. 7, 1178–1184.
[50] Zheng, N., Schulman, B.A., Song, L., Miller, J.J., Jeffrey, P.D., Wang, P., Chu, C., Koepp, D.M., Elledge, S.J., Pagano, M., Conaway, R.C., Conaway, J.W., Harper, J.W. and Pavletich, N.P. (2002) Structure of the Cul1-Rbx1-Skp1-F boxSkp2 SCF ubiquitin ligase complex. Nature 416, 703–709.
[51] Guo, F., Gopaul, D.N. and van Duyne, G.D. (1997) Structure of Cre recombinase complexed with DNA in a site-specific recombination synapse. Nature 389, 40–46.
[52] Otting, G., Qian, Y.Q., Billeter, M., Muller, M., Affolter, M., Gehring, W.J. and Wuthrich, K. (1990) Protein–DNA contacts in the structure of a homeodomain–DNA complex determined by nuclear magnetic resonance spectroscopy in solution. EMBO J. 9, 3085–3092.
[53] Cai, M., Zheng, R., Caffrey, M., Craigie, R., Clore, G.M. and Gronenborn, A.M. (1997) Solution structure of the N-terminal zinc binding domain of HIV-1 integrase. Nat. Struct. Biol. 4, 567–577.
[54] Mackereth, C.D., Arrowsmith, C.H., Edwards, A.M. and McIntosh, L.P. (2000) Zinc-bundle structure of the essential RNA polymerase subunit RPB10 from Methanobacterium thermoautotrophicum. Proc. Natl. Acad. Sci. USA 97, 6316–6321.
[55] Iwahara, J. and Clubb, R.T. (1999) Solution structure of the DNA binding domain from Dead ringer, a sequence-specific AT-rich interaction domain (ARID). EMBO J. 18, 6084–6094.
[56] Zhao, H., Msadek, T., Zapf, J., Madhusudan, Hoch, J.A. and Varughese, K.I. (2002) DNA complexed structure of the key transcription factor initiating development in sporulating bacteria. Structure (Camb) 10, 1041–1050.
[57] Khare, D., Ziegelin, G., Lanka, E. and Heinemann, U. (2004) Sequence-specific DNA binding determined by contacts outside the helix-turn-helix motif of the ParB homolog KorB. Nat. Struct. Mol. Biol. 11, 656–663.
[58] Campos, A., Zhang, R.G., Alkire, R.W., Matsumura, P. and Westbrook, E.M. (2001) Crystal structure of the global regulator FlhD from Escherichia coli at 1.8 Å resolution. Mol. Microbiol. 39, 567–580.
[59] Schumacher, M.A., Lau, A.O.T. and Johnson, P.J. (2003) Structural basis of core promoter recognition in a primitive eukaryote. Cell 115, 413–424.
[60] Liu, S., Widom, J., Kemp, C.W., Crews, C.M. and Clardy, J. (1998) Structure of human methionine aminopeptidase-2 complexed with fumagillin. Science 282, 1324–1327.
[61] Gomis-Rueth, F.X., Solà, M., Acebo, P., Párraga, A., Guasch, A., Eritja, R., Gonzalez, A., Espinosa, M., del Solar, G. and Coll, M. (1998) The structure of plasmid-encoded transcriptional repressor CopG unliganded and bound to its operator. EMBO J. 17, 7404–7415.
[62] Cordes, M.H., Walsh, N.P., McKnight, C.J. and Sauer, R.T. (1999) Evolution of a protein fold in vitro. Science 284, 325–328.
[63] Grishin, N.V. (2000) Two tricks in one bundle: helix-turn-helix gains enzymatic activity. Nucl. Acids Res. 28, 2229–2233.
[64] Allen, M., Friedler, A., Schon, O. and Bycroft, M. (2002) The structure of an FF domain from human HYPA/FBP11. J. Mol. Biol. 323, 411–416.
[65] Das, A.K., Helps, N.R., Cohen, P.T. and Barford, D. (1996) Crystal structure of the protein serine/threonine phosphatase 2C at 2.0 Å resolution. EMBO J. 15, 6798–6809.
[66] Aravind, L., Mazumder, R., Vasudevan, S. and Koonin, E.V. (2002) Trends in protein evolution inferred from sequence and structure analysis. Curr. Opin. Struct. Biol. 12, 392–399.
[67] Lewis, J.D., Saperas, N.R., Song, Y., Zamora, M.J., Chiva, M. and Ausió, J. (2004) Histone H1 and the origin of protamines. Proc. Natl. Acad. Sci. USA 101, 4148–4152.
[68] Werhane, H., Lopez, P., Mendel, M., Zimmer, M., Ordal, G.W. and Márquez-Magaña, L.M. (2004) The last gene of the fla/che operon in Bacillus subtilis, ylxL, is required for maximal sigmaD function. J. Bacteriol. 186, 4025–4029.
[69] Kearns, D.B., Chu, F., Rudner, R. and Losick, R. (2004) Genes governing swarming in Bacillus subtilis and evidence for a phase variation mechanism controlling surface motility. Mol. Microbiol. 52, 357–369.
[70] Campbell, E.A., Muzzin, O., Chlenov, M., Sun, J.L., Olson, C.A., Weinman, O., Trester-Zedlitz, M.L. and Darst, S.A. (2002) Structure of the bacterial RNA polymerase promoter specificity sigma subunit. Mol. Cell 9, 527–539.
[71] Aasland, R., Stewart, A.F. and Gibson, T. (1996) The SANT domain: a putative DNA-binding domain in the SWI-SNF and ADA complexes, the transcriptional co-repressor N-CoR and TFIIIB. Trends Biochem. Sci. 21, 87–88.
[72] Yang, H., Jeffrey, P.D., Miller, J., Kinnucan, E., Sun, Y., Thoma, N.H., Zheng, N., Chen, P.-L., Lee, W.-H. and Pavletich, N.P. (2002) BRCA2 function in DNA binding and recombination from a BRCA2-DSS1-ssDNA structure. Science 297, 1837–1848.
[73] Iyer, L.M., Makarova, K.S., Koonin, E.V. and Aravind, L. (2004) Comparative genomics of the FtsK-HerA superfamily of pumping ATPases: implications for the origins of chromosome segregation, cell division and viral capsid packaging. Nucl. Acids Res. 32, 5260–5279.
[74] Aussel, L., Barre, F.X., Aroyo, M., Stasiak, A., Stasiak, A.Z. and Sherratt, D. (2002) FtsK Is a DNA motor protein that activates chromosome dimer resolution by switching the catalytic state of the XerC and XerD recombinases. Cell 108, 195–205.
[75] Giraldo, R. and Fernández-Tresguerres, M.E. (2004) Twenty years of the pPS10 replicon: insights on the molecular mechanism for the activation of DNA replication in iteron-containing bacterial plasmids. Plasmid 52, 69–83.
[76] Berge, M., Mortier-Barrière, I., Martin, B. and Claverys, J.-P. (2003) Transformation of Streptococcus pneumoniae relies on DprA- and RecA-dependent protection of incoming DNA single strands. Mol. Microbiol. 50, 527–536.
[77] Pietrokovski, S. and Henikoff, S. (1997) A helix-turn-helix DNA-binding motif predicted for transposases of DNA transposons. Mol. Gen. Genet. 254, 689–695.
[78] Belova, G.I., Prasad, R., Nazimov, I.V., Wilson, S.H. and Slesarev, A.I. (2002) The domain organization and properties of individual domains of DNA topoisomerase V, a type 1B topoisomerase with DNA repair activities. J. Biol. Chem. 277, 4959–4965.
[79] Tobe, T., Sasakawa, C., Okada, N., Honma, Y. and Yoshikawa, M. (1992) vacB, a novel chromosomal gene required for expression of virulence genes on the large plasmid of Shigella flexneri. J. Bacteriol. 174, 6359–6367.
[80] Angermayr, M., Roidl, A. and Bandlow, W. (2002) Yeast Rio1p is the founding member of a novel subfamily of protein serine kinases involved in the control of cell cycle progression. Mol. Microbiol. 44, 309–324.
[81] LaRonde-LeBlanc, N. and Wlodawer, A. (2004) Crystal structure of A. fulgidus Rio2 defines a new family of serine protein kinases. Structure (Camb) 12, 1585–1594.
[82] Peat, T.S., Frank, E.G., McDonald, J.P., Levine, A.S., Woodgate, R. and Hendrickson, W.A. (1996) Structure of the UmuD′ protein and its regulation in response to DNA damage. Nature 380, 727–730.
[83] Savijoki, K., Ingmer, H., Frees, D., Vogensen, F.K., Palva, A. and Varmanen, P. (2003) Heat and DNA damage induction of the LexA-like regulator HdiR from Lactococcus lactis is mediated by RecA and ClpP. Mol. Microbiol. 50, 609–621.
[84] Zhang, X., Chaney, M., Wigneshweraraj, S.R., Schumacher, J., Bordes, P., Cannon, W. and Buck, M. (2002) Mechanochemical ATPases and transcriptional activation. Mol. Microbiol. 45, 895–903.
[85] Leipe, D.D., Koonin, E.V. and Aravind, L. (2004) STAND, a class of P-loop NTPases including animal and plant regulators of programmed cell death: multiple, complex domain architectures, unusual phyletic patterns, and evolution by horizontal gene transfer. J. Mol. Biol. 343, 1–28.
[86] Wang, Y., Zhao, S., Somerville, R.L. and Jardetzky, O. (2001) Solution structure of the DNA-binding domain of the TyrR protein of Haemophilus influenzae. Protein Sci. 10, 592–598.
[87] Larquet, E., Schreiber, V., Boisset, N. and Richet, E. (2004) Oligomeric assemblies of the Escherichia coli MalT transcriptional activator revealed by cryo-electron microscopy and image processing. J. Mol. Biol. 343, 1159–1169.
[88] Poon, K.K., Chu, J.C. and Wong, S.L. (2001) Roles of glucitol in the GutR-mediated transcription activation process in Bacillus subtilis: glucitol induces GutR to change its conformation and to bind ATP. J. Biol. Chem. 276, 29819–29825.
[89] Lee, P.-C., Umeyama, T. and Horinouchi, S. (2002) afsS is a target of AfsR, a transcriptional factor with ATPase activity that globally controls secondary metabolism in Streptomyces coelicolor A3(2). Mol. Microbiol. 43, 1413–1430.
[90] Pao, G.M. and Saier, M.H. (1995) Response regulators of bacterial signal transduction systems: selective domain shuffling during evolution. J. Mol. Evol. 40, 136–154.
[91] West, A.H. and Stock, A.M. (2001) Histidine kinases and response regulator proteins in two-component signaling systems. Trends Biochem. Sci. 26, 369–376.
[92] Stülke, J. and Hillen, W. (1998) Coupling physiology and gene regulation in bacteria: the phosphotransferase sugar uptake system delivers the signals. Naturwissenschaften 85, 583–592.
[93] Stülke, J., Arnaud, M., Rapoport, G. and Martin-Verstraete, I. (1998) PRD–a protein domain involved in PTS-dependent induction and carbon catabolite repression of catabolic operons in bacteria. Mol. Microbiol. 28, 865–874.
[94] Hu, K.-Y. and Saier, M.H. (2002) Phylogeny of phosphoryl transfer proteins of the phosphoenolpyruvate-dependent sugar-transporting phosphotransferase system. Res. Microbiol. 153, 405–415.
[95] Reizer, J. and Saier, M.H. (1997) Modular multidomain phosphoryl transfer proteins of bacteria. Curr. Opin. Struct. Biol. 7, 407–415.
[96] Tobisch, S., Stülke, J. and Hecker, M. (1999) Regulation of the lic operon of Bacillus subtilis and characterization of potential phosphorylation sites of the LicR regulator protein by site-directed mutagenesis. J. Bacteriol. 181, 4995–5003.
[97] Anantharaman, V., Koonin, E.V. and Aravind, L. (2001) Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-molecule-binding domains. J. Mol. Biol. 307, 1271–1292.
[98] Hofmann, K. and Bucher, P. (1995) The FHA domain: a putative nuclear signalling domain found in protein kinases and transcription factors. Trends Biochem. Sci. 20, 347–349.
[99] Molle, V., Kremer, L., Girard-Blanc, C., Besra, G.S., Cozzone, A.J. and Prost, J.-F. (2003) An FHA phosphoprotein recognition domain mediates protein EmbR phosphorylation by PknH, a Ser/Thr protein kinase from Mycobacterium tuberculosis. Biochemistry 42, 15300–15309.
[100] Taylor, B.L., Zhulin, I.B. and Johnson, M.S. (1999) Aerotaxis and other energy-sensing behavior in bacteria. Annu. Rev. Microbiol. 53, 103–128.
[101] Tyrrell, R., Verschueren, K.H., Dodson, E.J., Murshudov, G.N., Addy, C. and Wilkinson, A.J. (1997) The structure of the cofactor-binding fragment of the LysR family member, CysB: a familiar fold with a surprising subunit arrangement. Structure 5, 1017–1032.
[102] Tam, R. and Saier, M.H. (1993) Structural, functional, and evolutionary relationships among extracellular solute-binding receptors of bacteria. Microbiol. Rev. 57, 320–346.
[103] Vartak, N.B., Reizer, J., Reizer, A., Gripp, J.T., Groisman, E.A., Wu, L.F., Tomich, J.M. and Saier, M.H. (1991) Sequence and evolution of the FruR protein of Salmonella typhimurium: a pleiotropic transcriptional regulatory protein possessing both activator and repressor functions which is homologous to the periplasmic ribose-binding protein. Res. Microbiol. 142, 951–963.
[104] Ettema, T.J.G., Brinkman, A.B., Tani, T.H., Rafferty, J.B. and Van Der Oost, J. (2002) A novel ligand-binding domain involved in regulation of amino acid metabolism in prokaryotes. J. Biol. Chem. 277, 37464–37468.
[105] Aravind, L. and Koonin, E.V. (1999) Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J. Mol. Biol. 287, 1023–1040.
[106] Bull, P.C. and Cox, D.W. (1994) Wilson disease and Menkes disease: new handles on heavy-metal transport. Trends Genet. 10, 246–252.
[107] Dunwell, J.M., Culham, A., Carter, C.E., Sosa-Aguirre, C.R. and Goodenough, P.W. (2001) Evolution of functional diversity in the cupin superfamily. Trends Biochem. Sci. 26, 740–746.
[108] Bateman, A. (1997) The structure of a domain common to archaebacteria and the homocystinuria disease protein. Trends Biochem. Sci. 22, 12–13.
[109] Heldwein, E.E. and Brennan, R.G. (2001) Crystal structure of the transcription activator BmrR bound to DNA and a drug. Nature 409, 378–382.
[110] Aravind, L. and Anantharaman, V. (2003) HutC/FarR-like bacterial transcription factors of the GntR family contain a small molecule-binding domain of the chorismate lyase fold. FEMS Microbiol. Lett. 222, 17–23.
[111] Zhang, R.G., Andersson, C.E., Savchenko, A., Skarina, T., Evdokimova, E., Beasley, S., Arrowsmith, C.H., Edwards, A.M., Joachimiak, A. and Mowbray, S.L. (2003) Structure of Escherichia coli ribose-5-phosphate isomerase: a ubiquitous enzyme of the pentose phosphate pathway and the Calvin cycle. Structure (Camb) 11, 31–42.
[112] Ailion, M., Bobik, T.A. and Roth, J.R. (1993) Two global regulatory systems (Crp and Arc) control the cobalamin/propanediol regulon of Salmonella typhimurium. J. Bacteriol. 175, 7200–7208.
[113] Miras, I., Hermant, D., Arricau, N. and Popoff, M.Y. (1995) Nucleotide sequence of iagA and iagB genes involved in invasion of HeLa cells by Salmonella enterica subsp. enterica ser. Typhi. Res. Microbiol. 146, 17–20.
[114] Abergel, C., Bouveret, E., Claverie, J.M., Brown, K., Rigal, A., Lazdunski, C. and Benedetti, H. (1999) Structure of the Escherichia coli TolB protein determined by MAD methods at 1.95 Å resolution. Struct. Fold Des. 7, 1291–1300.
[115] Carr, S., Penfold, C.N., Bamford, V., James, R. and Hemmings, A.M. (2000) The structure of TolB, an essential component of the tol-dependent translocation system, and its protein–protein interaction with the translocation domain of colicin E9. Struct. Fold Des. 8, 57–66.
[116] Walburger, A., Lazdunski, C. and Corda, Y. (2002) The Tol/Pal system function requires an interaction between the C-terminal domain of TolA and the N-terminal domain of TolB. Mol. Microbiol. 44, 695–708.
[117] Hofmann, K. and Bucher, P. (1998) The PCI domain: a common theme in three multiprotein complexes. Trends Biochem. Sci. 23, 204–205.
[118] Aravind, L. and Ponting, C.P. (1998) Homologues of 26S proteasome subunits are regulators of transcription and translation. Protein Sci. 7, 1250–1254.
[119] Wernimont, A. and Weissenhorn, W. (2004) Crystal structure of subunit VPS25 of the endosomal trafficking complex ESCRT-II. BMC Struct. Biol. 4, 10.
[120] Teo, H., Perisic, O., González, B. and Williams, R.L. (2004) ESCRT-II, an endosome-associated complex required for protein sorting: crystal structure and interactions with ESCRT-III and membranes. Dev. Cell 7, 559–569.
[121] Hierro, A., Sun, J., Rusnak, A.S., Kim, J., Prag, G., Emr, S.D. and Hurley, J.H. (2004) Structure of the ESCRT-II endosomal trafficking complex. Nature 431, 221–225.
[122] Gibson, T.J., Thompson, J.D., Blocker, A. and Kouzarides, T. (1994) Evidence for a protein domain superfamily shared by the cyclins, TFIIB and RB/p107. Nucl. Acids Res. 22, 946–952.
[123] Yan, Y., Barlev, N.A., Haley, R.H., Berger, S.L. and Marmorstein, R. (2000) Crystal structure of yeast Esa1 suggests a unified mechanism for catalysis and substrate binding by histone acetyltransferases. Mol. Cell 6, 1195–1205.
[124] Zubieta, C., He, X.Z., Dixon, R.A. and Noel, J.P. (2001) Structures of two natural product methyltransferases reveal the basis for substrate specificity in plant O-methyltransferases. Nat. Struct. Biol. 8, 271–279.
[125] Kim, J. and Raushel, F.M. (2001) Allosteric control of the oligomerization of carbamoyl phosphate synthetase from Escherichia coli. Biochemistry 40, 11030–11036.
[126] Lawson, F.S., Charlebois, R.L. and Dillon, J.A. (1996) Phylogenetic analysis of carbamoylphosphate synthetase genes: complex evolutionary history includes an internal duplication within a gene which can root the tree of life. Mol. Biol. Evol. 13, 970–977.
[127] Leipe, D.D., Aravind, L. and Koonin, E.V. (1999) Did DNA replication evolve twice independently? Nucl. Acids Res. 27, 3389–3401.
[128] Rubbi, L., Labarre-Mariotte, S., Chédin, S. and Thuriaux, P. (1999) Functional characterization of ABC10alpha, an essential polypeptide shared by all three forms of eukaryotic DNA-dependent RNA polymerases. J. Biol. Chem. 274, 31485–31492.
[129] Gruber, T.M. and Gross, C.A. (2003) Multiple sigma subunits and the partitioning of bacterial transcription space. Annu. Rev. Microbiol. 57, 441–466.
[130] Iyer, L.M., Koonin, E.V. and Aravind, L. (2004) Evolution of bacterial RNA polymerase: implications for large-scale bacterial phylogeny, domain accretion, and horizontal gene transfer. Gene 335, 73–88.
[131] Morett, E. and Bork, P. (1998) Evolution of new protein function: recombinational enhancer Fis originated by horizontal gene transfer from the transcriptional regulator NtrC. FEBS Lett. 433, 108–112.
[132] Gruene, T., Brzeski, J., Eberharter, A., Clapier, C.R., Corona, D.F.V., Becker, P.B. and Mueller, C.W. (2003) Crystal structure and functional analysis of a nucleosome recognition module of the remodeling factor ISWI. Mol. Cell 12, 449–460.
[133] Juan Wu, L. and Errington, J. (2000) Identification and characterization of a new prespore-specific regulatory gene, rsfA, of Bacillus subtilis. J. Bacteriol. 182, 418–424.
[134] Izsak, Z., Khare, D., Behlke, J., Heinemann, U., Plasterk, R.H. and Ivics, Z. (2002) Involvement of a bifunctional, paired-like DNA-binding domain and a transpositional enhancer in sleeping beauty transposition. J. Biol. Chem. 277, 34581–34588.
[135] Czerny, T., Schaffner, G. and Busslinger, M. (1993) DNA sequence recognition by Pax proteins: bipartite structure of the paired domain and its binding site. Genes Dev. 7, 2048–2061.
[136] Tanaka, Y., Nureki, O., Kurumizaka, H., Fukai, S., Kawaguchi, S., Ikuta, M., Iwahara, J., Okazaki, T. and Yokoyama, S. (2001) Crystal structure of the CENP-B protein–DNA complex: the DNA-binding domains of CENP-B induce kinks in the CENP-B box DNA. EMBO J. 20, 6612–6618.
[137] Smit, A.F. and Riggs, A.D. (1996) Tiggers and DNA transposon fossils in the human genome. Proc. Natl. Acad. Sci. USA 93, 1443–1448.
[138] Subramanian, G., Koonin, E.V. and Aravind, L. (2000) Comparative genome analysis of the pathogenic spirochetes Borrelia burgdorferi and Treponema pallidum. Infect. Immun. 68, 1633–1648.
[139] Close, S.M. and Kado, C.I. (1992) A gene near the plasmid pSa origin of replication encodes a nuclease. Mol. Microbiol. 6, 521–527.
[140] Anantharaman, V. and Aravind, L. (2003) New connections in the prokaryotic toxin-antitoxin network: relationship with the eukaryotic nonsense-mediated RNA decay system. Genome Biol. 4.
[141] Wood, H.E., Devine, K.M. and McConnell, D.J. (1990) Characterisation of a repressor gene (xre) and a temperature-
sensitive allele from the Bacillus subtilis prophage, PBSX. Gene 96, 83–88.
[142] Takemaru, K.-i., Li, F.Q., Ueda, H. and Hirose, S. (1997) Multiprotein bridging factor 1 (MBF1) is an evolutionarily conserved transcriptional coactivator that connects a regulatory factor and TATA element-binding protein. Proc. Natl. Acad. Sci. USA 94, 7251–7256.
[143] Rhee, S., Martin, R.G., Rosner, J.L. and Davies, D.R. (1998) A novel DNA-binding motif in MarA: the first structure for an AraC family transcriptional activator. Proc. Natl. Acad. Sci. USA 95, 10413–10418.
[144] Brickman, T.J., Kang, H.Y. and Armstrong, S.K. (2001) Transcriptional activation of Bordetella alcaligin siderophore genes requires the AlcR regulator with alcaligin as inducer. J. Bacteriol. 183, 483–489.
[145] Fujikawa, N., Kurumizaka, H., Nureki, O., Terada, T., Shirouzu, M., Katayama, T. and Yokoyama, S. (2003) Structural basis of replication origin recognition by the DnaA protein. Nucl. Acids Res. 31, 2077–2086.
[146] Messer, W. and Weigel, C. (2003) DnaA as a transcription regulator. Meth. Enzymol. 370, 338–349.
[147] Wilsker, D., Patsialou, A., Dallas, P.B. and Moran, E. (2002) ARID proteins: a diverse family of DNA binding proteins implicated in the control of cell growth, differentiation, and development. Cell Growth Differ. 13, 95–106.
[148] Yamada, K., Miyata, T., Tsuchiya, D., Oyama, T., Fujiwara, Y., Ohnishi, T., Iwasaki, H., Shinagawa, H., Ariyoshi, M., Mayanagi, K. and Morikawa, K. (2002) Crystal structure of the RuvA–RuvB complex: a structural basis for the Holliday junction migrating motor machinery. Mol. Cell 10, 671–681.
[149] Soppa, J., Kobayashi, K., Noirot-Gros, M.-F., Oesterhelt, D., Ehrlich, S.D., Dervyn, E., Ogasawara, N. and Moriya, S. (2002) Discovery of two novel families of proteins that are proposed to interact with prokaryotic SMC proteins, and characterization of the Bacillus subtilis family members ScpA and ScpB. Mol. Microbiol. 45, 59–71.
[150] Mascarenhas, J., Soppa, J.R., Strunnikov, A.V. and Graumann, P.L. (2002) Cell cycle-dependent localization of two novel prokaryotic chromosome segregation and condensation proteins in Bacillus subtilis that interact with SMC protein. EMBO J. 21, 3108–3118.
[151] Schwartz, T., Rould, M.A., Lowenhaupt, K., Herbert, A. and Rich, A. (1999) Crystal structure of the Zalpha domain of the human editing enzyme ADAR1 bound to left-handed Z-DNA. Science 284, 1841–1845.
[152] Schade, M., Turner, C.J., Lowenhaupt, K., Rich, A. and Herbert, A. (1999) Structure–function analysis of the Z-DNA-binding domain Zalpha of dsRNA adenosine deaminase type I reveals similarity to the (α + β) family of helix-turn-helix proteins. EMBO J. 18, 470–479.
[153] Giraldo, R. and Diaz-Orejas, R. (2001) Similarities between the DNA replication initiators of Gram-negative bacteria plasmids (RepA) and eukaryotes (Orc4p)/archaea (Cdc6p). Proc. Natl. Acad. Sci. USA 98, 4938–4943.
[154] Liu, J., Smith, C.L., DeRyckere, D., DeAngelis, K., Martin, G.S. and Berger, J.M. (2000) Structure and function of Cdc6/Cdc18: implications for origin recognition and checkpoint control. Mol. Cell 6, 637–648.
[155] Emery, P., Strubin, M., Hofmann, K., Bucher, P., Mach, B. and Reith, W. (1996) A consensus motif in the RFX DNA binding domain and binding domain mutants with altered specificity. Mol. Cell Biol. 16, 4486–4494.
[156] Yeo, H.-J., Ziegelin, G.N., Korolev, S., Calendar, R., Lanka, E. and Waksman, G. (2002) Phage P4 origin-binding domain structure reveals a mechanism for regulation of DNA-binding activity by homo- and heterodimerization of winged helix proteins. Mol. Microbiol. 43, 855–867.
[157] Anantharaman, V., Koonin, E.V. and Aravind, L. (2002) Comparative genomics and evolution of proteins involved in RNA metabolism. Nucl. Acids Res. 30, 1427–1464.
[158] Spinelli, S.L., Kierzek, R., Turner, D.H. and Phizicky, E.M. (1999) Transient ADP-ribosylation of a 2′-phosphate implicated in its removal from ligated tRNA during splicing in yeast. J. Biol. Chem. 274, 2637–2644.
[159] Zheng, N., Fraenkel, E., Pabo, C.O. and Pavletich, N.P. (1999) Structural basis of DNA recognition by the heterodimeric cell cycle transcription factor E2F-DP. Genes Dev. 13, 666–674.
[160] Groft, C.M., Uljon, S.N., Wang, R. and Werner, M.H. (1998) Structural homology between the Rap30 DNA-binding domain and linker histone H5: implications for preinitiation complex assembly. Proc. Natl. Acad. Sci. USA 95, 9117–9122.
[161] Meinhart, A., Blobel, J. and Cramer, P. (2003) An extended winged helix domain in general transcription factor E/IIE alpha. J. Biol. Chem. 278, 48267–48274.
[162] Landsman, D. and Wolffe, A.P. (1995) Common sequence and structural features in the heat-shock factor and Ets families of DNA-binding domains. Trends Biochem. Sci. 20, 225–226.
[163] Mackereth, C.D., Scharpf, M., Gentile, L.N., MacIntosh, S.E., Slupsky, C.M. and McIntosh, L.P. (2004) Diversity in structure and function of the Ets family PNT domains. J. Mol. Biol. 342, 1249–1264.
[164] Dou, X., Limmer, S. and Kreutzer, R. (2001) DNA-binding of phenylalanyl-tRNA synthetase is accompanied by loop formation of the double-stranded DNA. J. Mol. Biol. 305, 451–458.
[165] Laursen, B.S., Mortensen, K.K., Sperling-Petersen, H.U. and Hoffman, D.W. (2003) A conserved structural motif at the N terminus of bacterial translation initiation factor IF2. J. Biol. Chem. 278, 16320–16328.
[166] Sam, M.D., Cascio, D., Johnson, R.C. and Clubb, R.T. (2004) Crystal structure of the excisionase–DNA complex from bacteriophage lambda. J. Mol. Biol. 338, 229–240.
[167] Wojciak, J.M., Iwahara, J. and Clubb, R.T. (2001) The Mu repressor–DNA complex contains an immobilized ‘wing’ within the minor groove. Nat. Struct. Biol. 8, 84–90.
[168] de Beer, T., Fang, J., Ortega, M., Yang, Q., Maes, L., Duffy, C., Berton, N., Sippy, J., Overduin, M., Feiss, M. and Catalano, C.E. (2002) Insights into specific DNA recognition during the assembly of a viral genome packaging machine. Mol. Cell 9, 981–991.
[169] Ikegami, T., Kuraoka, I., Saijo, M., Kodo, N., Kyogoku, Y., Morikawa, K., Tanaka, K. and Shirakawa, M. (1998) Solution structure of the DNA- and RPA-binding domain of the human repair factor XPA. Nat. Struct. Biol. 5, 701–706.
[170] Agrawal, R.K., Linde, J., Sengupta, J., Nierhaus, K.H. and Frank, J. (2001) Localization of L11 protein on the ribosome and elucidation of its involvement in EF-G-dependent translocation. J. Mol. Biol. 311, 777–787.
[171] van Nimwegen, E. (2003) Scaling laws in the functional content of genomes. Trends Genet. 19, 479–484.
[172] Oshima, K., Kakizawa, S., Nishigawa, H., Jung, H.-Y., Wei, W., Suzuki, S., Arashida, R., Nakata, D., Miyata, S.-i., Ugaki, M. and Namba, S. (2004) Reductive evolution suggested from the complete genome sequence of a plant-pathogenic phytoplasma. Nat. Genet. 36, 27–29.
[173] Lee, I.-M., Zhao, Y. and Bottner, K.D. (2003) Identification of putative insertion sequence (IS) associated with members of the aster yellows (AY) phytoplasma group. Phytopathology, 93.
[174] Templeton, T.J., Iyer, L.M., Anantharaman, V., Enomoto, S., Abrahante, J.E., Subramanian, G.M., Hoffman, S.L., Abrahamsen, M.S. and Aravind, L. (2004) Comparative analysis of apicomplexa and genomic diversity in eukaryotes. Genome Res. 14, 1686–1695.
[175] Bartel, D.P. (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281–297.
[176] Mattick, J.S. (2001) Non-coding RNAs: the architects of eukaryotic complexity. EMBO Rep. 2, 986–991.
[177] Boucher, Y., Douady, C.J., Papke, R.T., Walsh, D.A., Boudreau, M.E.R., Nesbø, C.L., Case, R.J. and Doolittle, W.F. (2003) Lateral gene transfer and the origins of prokaryotic groups. Annu. Rev. Genet. 37, 283–328.
[178] Edgell, D.R. and Doolittle, W.F. (1997) Archaea and the origin(s) of DNA replication proteins. Cell 89, 995–998.
[179] Iyer, L.M., Koonin, E.V. and Aravind, L. (2003) Evolutionary connection between the catalytic subunits of DNA-dependent RNA polymerases and eukaryotic RNA-dependent RNA polymerases and the origin of RNA polymerases. BMC Struct. Biol. 3, 1.
[180] Brinkman, A.B., Bell, S.D., Lebbink, R.J., de Vos, W.M. and van der Oost, J. (2002) The Sulfolobus solfataricus Lrp-like protein LysM regulates lysine biosynthesis in response to lysine availability. J. Biol. Chem. 277, 29537–29549.
[181] Grewal, S.I.S. and Moazed, D. (2003) Heterochromatin and epigenetic control of gene expression. Science 301, 798–802.