Papers by Patrick Schnable

Plant Journal, 2010
Sequence capture technologies, pioneered in mammalian genomes, enable the resequencing of targeted genomic regions. Most capture protocols require blocking DNA, the production of which in large quantities can prove challenging. A blocker-free, two-stage capture protocol was developed using NimbleGen arrays. The first capture depletes the library of repetitive sequences, while the second enriches for target loci. This strategy was used to resequence non-repetitive portions of an approximately 2.2 Mb chromosomal interval and a set of 43 genes dispersed in the 2.3 Gb maize genome. This approach achieved approximately 1800–3000-fold enrichment and 80–98% coverage of targeted bases. More than 2500 SNPs were identified in target genes. Low rates of false-positive SNP predictions were obtained, even in the presence of captured paralogous sequences. Importantly, it was possible to recover novel sequences from non-reference alleles. The ability to design novel repeat-subtraction and target capture arrays makes this technology accessible in any species.

Homologous Recombination in Maize
We have divided this chapter into two major sections: somatic and meiotic recombination. Somatic recombination in plants has been mostly monitored with artificial recombination substrates in transgenic systems. Although, in this area, maize has lagged behind other plants that can be more easily transformed, excellent progress has been achieved recently, as detailed in the first section. Specific topics discussed in this section are site-specific and targeted recombination. Research on meiotic recombination, particularly intragenic recombination, has been historically strong in maize relative to other plants, principally because the maize endosperm provides distinct advantages as an experimental unit of observation for recombination studies. It is, at the same time, large enough so that many traits can be scored and small enough so that many kernels can be screened. Many of the genes utilized in meiotic recombinational analyses affect anthocyanin pigmentation in the aleurone layer of the endosperm, as will be evident in the second section. In this section we discuss the distribution of recombination junctions at the genomic, regional, and genic levels, as well as modifiers that affect that distribution. We consider the special case of tandem duplications and gene families as recombination substrates and discuss how recombination has been used as a tool in the genetic analysis of paramutation and disease resistance.

Mutations at the glossy1 (gl1) locus of maize (Zea mays L.) quantitatively and qualitatively affect the deposition of cuticular waxes on the surface of seedling leaves. The gl1 locus has been molecularly cloned by transposon tagging with the Mutator transposon system. The epi23 cDNA was isolated by subtractive hybridization as an epidermis-specific mRNA from Senecio odora (Kleinia odora). The deduced amino acid sequences of the GL1 and EPI23 proteins are very similar to each other and to two other plant proteins in which the sequences were deduced from their respective mRNAs. These are the Arabidopsis CER1 protein, which is involved in cuticular wax deposition on siliques, stems, and leaves of that plant, and the protein coded by the rice expressed sequence tag RICS2751A. All four proteins are predicted to be localized in a membrane via a common NH2-terminal domain, which consists of either five or seven membrane-spanning helices. The COOH-terminal portion of each of these proteins, although less conserved, is predicted to be a water-soluble, globular domain. These sequence similarities indicate that these plant orthologs may belong to a superfamily of membrane-bound receptors that have been extensively characterized from animals, including the HIV co-receptor fusin (also termed CXCR4).
Trends in Plant Science, 2011
2010 marks the 10th anniversary of the completion of the first plant genome sequence (Arabidopsis thaliana). Triggered by advancements in sequencing technologies, many crop genome sequences have been produced, with eight published since 2008. To date, however, only the rice (Oryza sativa) genome sequence has been finished to a quality level similar to that of the Arabidopsis sequence. This trend to produce draft genomes could affect the ability of researchers to address biological questions of speciation and recent evolution or to link sequence variation accurately to phenotypes. Here, we review the current crop genome sequencing activities, discuss how variability in sequence quality impacts utility for different studies and provide a perspective for a paradigm shift in selecting crops for sequencing in the future.
BMC Biochemistry, 2009
Background: Eukaryotic aldehyde dehydrogenases (ALDHs, EC 1.2.1), which oxidize aldehydes into carboxylic acids, have been classified into more than 20 families. In mammals, Family 2 ALDHs detoxify acetaldehyde. It has been hypothesized that plant Family 2 ALDHs oxidize acetaldehyde generated via ethanolic fermentation, producing acetate for acetyl-CoA biosynthesis via acetyl-CoA synthetase (ACS), similar to the yeast pathway termed the "pyruvate dehydrogenase (PDH) bypass". Evidence for this pathway in plants has been obtained from pollen.
BMC Bioinformatics, 2009
Background: Few microarrays have been quantitatively calibrated to identify optimal hybridization conditions because it is difficult to precisely determine the hybridization characteristics of a microarray using biologically variable cDNA samples.

Plant Journal, 2008
Vegetative phase change is the developmental transition from the juvenile phase to the adult phase in which a plant becomes competent for sexual reproduction. The gain of ability to flower is often accompanied by changes in patterns of differentiation in newly forming vegetative organs. In maize, juvenile leaves differ from adult leaves in morphology, anatomy and cell wall composition. Whereas the normal sequence of juvenile followed by adult is repeated with every sexual generation, this sequence can be altered in maize by the isolation and culture of the shoot apex from an adult phase plant: an 'adult' meristem so treated reverts to forming juvenile vegetative organs. To begin to unravel the as-yet poorly understood molecular mechanisms underlying phase change in maize, we compared gene expression in two juvenile sample types, leaf 4 and culture-derived leaves 3 or 4, with an adult sample type (leaf 9) using cDNA microarrays. All samples were leaf primordia at plastochron 6. A gene was scored as 'phase induced' if it was up- or downregulated in both juvenile sample types, compared with the adult sample type, with at least a twofold change in gene expression at a P-value of ≤0.005. Some 221 expressed sequence tags (ESTs) were upregulated in juveniles, and 28 ESTs were upregulated in adults. The largest class of juvenile-induced genes was comprised of those involved in photosynthesis, suggesting that maize plants are primed for energy production early in vegetative growth by the developmental induction of photosynthetic genes.
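
The 'phase induced' scoring rule in the abstract above can be illustrated with a short sketch. This is not the authors' analysis code: the function name, its arguments, and the assumption that the twofold and P-value thresholds apply to each juvenile-versus-adult comparison separately are all hypothetical choices made for illustration.

```python
import math


def phase_induced(ratio_leaf4, p_leaf4, ratio_cultured, p_cultured,
                  min_fold=2.0, max_p=0.005):
    """Classify one gene from two juvenile/adult expression ratios.

    Each ratio is juvenile expression divided by adult expression
    (hypothetical inputs). Returns "juvenile" or "adult" when both
    comparisons pass the thresholds in the same direction, else None.
    """
    threshold = math.log2(min_fold)

    def direction(ratio, p_value):
        # 0 means "not significant or below the fold-change cutoff".
        if p_value > max_p or ratio <= 0:
            return 0
        log_fold = math.log2(ratio)
        if log_fold >= threshold:
            return 1   # higher in the juvenile sample
        if log_fold <= -threshold:
            return -1  # higher in the adult sample
        return 0

    d_leaf4 = direction(ratio_leaf4, p_leaf4)
    d_cultured = direction(ratio_cultured, p_cultured)
    if d_leaf4 == 0 or d_leaf4 != d_cultured:
        return None
    return "juvenile" if d_leaf4 == 1 else "adult"
```

Requiring agreement between both juvenile sample types is what separates phase-related changes from effects specific to one tissue or to culture conditions.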

Assembly of Large Genomes from Paired Short Reads
The de novo assembly of genomes from high-throughput short reads is an active area of research. Several promising methods have been recently developed, with applicability largely restricted to the smaller and less complex bacterial genomes. In this paper, we present a method for assembling large genomes from high-coverage paired short reads. Our method exploits large distributed memory and parallelism available on multiprocessor systems to handle memory-intensive phases of the algorithm, effectively allowing scaling to large genomes. We present parallel algorithms to construct a bidirected string graph that is several orders of magnitude smaller than the raw sequence data and to extract features from paired reads. We also present a heuristic method that uses these features to guide the extension of partial graph traversals corresponding to large genomic contigs. In addition, we propose a simple model for error correction and derive a lower bound on the coverage needed for its use. We present a validation of our framework with short reads from D. melanogaster and S. cerevisiae synthetically generated at 300-fold coverage. Assembly of the D. melanogaster genome resulted in large contigs (50% of the genome covered by contigs larger than 102 kb), accurate to 99.9% of the bases, in under 4 hours of wall clock time on a 512-node Blue Gene/L.
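
The contig statistic quoted above ("50% of the genome covered by contigs larger than 102 kb") resembles what is commonly called NG50. A minimal sketch of such a calculation follows; the function name and inputs are hypothetical, and it assumes contigs do not overlap and that the genome size is known, which may differ from how the authors computed their figure.

```python
def contig_length_at_coverage(contig_lengths, genome_size, fraction=0.5):
    """Return the length L such that contigs of length >= L together
    cover at least `fraction` of the genome, or None if the contigs
    never reach that coverage.
    """
    target = fraction * genome_size
    covered = 0
    # Accumulate the longest contigs first until the target is reached.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None
```

Measuring against the genome size rather than the total assembled length keeps the statistic comparable between assemblies of differing completeness.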

Genetics, 2009
Digestion-ligation-amplification (DLA), a novel adaptor-mediated PCR-based method that uses a single-stranded oligo as the adaptor, was developed to overcome difficulties of amplifying unknown sequences flanking known DNA sequences in large genomes. DLA specifically overcomes the problems associated with existing methods for amplifying genomic sequences flanking Mu transposons, including high levels of nonspecific amplification. Two DLA-based strategies, MuClone and DLA-454, were developed to isolate Mu-tagged alleles. MuClone allows for the amplification of subsets of the numerous Mu transposons in the genome, using unique three-nucleotide tags at the 3′ ends of primers, simplifying the identification of flanking sequences that cosegregate with mutant phenotypes caused by Mu insertions. DLA-454, which combines DLA with 454 pyrosequencing, permits the efficient cloning of genes for which multiple independent insertion alleles are available without the need to develop segregating populations. The utility of each approach was validated by independently cloning the gl4 (glossy4) gene. Mutants of gl4 lack the normal accumulation of epicuticular waxes. The gl4 gene is a homolog of the Arabidopsis CUT1 gene, which encodes a condensing enzyme involved in the synthesis of very-long-chain fatty acids, which are precursors of epicuticular waxes.

Genetics, 2008
Rates of Mu transposon insertions and excisions are both high in late somatic cells of maize. In contrast, although high rates of insertions are observed in germinal cells, germinal excisions are recovered only rarely. Plants doubly homozygous for deletion alleles of rad51A1 and rad51A2 do not encode functional RAD51 protein (RAD51−). Approximately 1% of the gametes from RAD51+ plants that carry the MuDR-insertion allele a1-m5216 include at least partial deletions of MuDR and the a1 gene. The structures of these deletions suggest they arise via the repair of MuDR-induced double-strand breaks via nonhomologous end joining. In RAD51− plants these germinal deletions are recovered at rates that are at least 40-fold higher. These rates are not substantially affected by the presence or absence of an a1-containing homolog. Together, these findings indicate that in RAD51+ germinal cells MuDR-induced double-strand breaks (DSBs) are efficiently repaired via RAD51-directed homologous recombination with the sister chromatid. This suggests that RAD51− plants may offer an efficient means to generate deletion alleles for functional genomic studies. Additionally, the high proportion of Mu-active, RAD51− plants that exhibit severe developmental defects suggests that RAD51 plays a critical role in the repair of MuDR-induced DSBs early in vegetative development.

Journal of Genetics and Genomics, 2008
The maize (Zea mays) spikelet consists of two florets, each of which contains three developmentally synchronized anthers. Morphologically, the anthers in the upper and lower florets proceed through apparently similar developmental programs. To test for global differences in gene expression and to identify genes that are coordinately regulated during maize anther development, RNA samples isolated from upper and lower floret anthers at six developmental stages were hybridized to cDNA microarrays. Approximately 9% of the tested genes exhibited statistically significant differences in expression between anthers in the upper and lower florets. This finding indicates that several basic biological processes are differentially regulated between upper and lower floret anthers, including metabolism, protein synthesis and signal transduction. Genes that are coordinately regulated across anther development were identified via cluster analysis. Analysis of these results identified stage-specific, early in development, late in development and bi-phasic expression profiles. Quantitative RT-PCR analysis revealed that four genes whose homologs in other plant species are involved in programmed cell death are up-regulated just prior to the time the tapetum begins to visibly degenerate (i.e., the mid-microspore stage). This finding supports the hypothesis that developmentally normal tapetal degeneration occurs via programmed cell death.

Journal of Ecology, 2010
1. If we are to understand the mechanisms underlying species responses to climate change in natural systems, studies are needed that focus on responses of non-model species under field conditions. We measured transcriptional profiles of individuals of Andropogon gerardii, a C4 grass native to North American grasslands, in a field experiment in which both temperature and precipitation were manipulated to simulate key aspects of forecasted climate change. 2. By using microarrays developed for a closely related model species, Zea mays, we were able to compare the relative influence of warming versus altered soil moisture availability on expression levels of over 7000 genes, identify responsive functional groups of genes and correlate changes in gene transcription with physiological responses. 3. We observed more statistically significant shifts in transcription levels of genes in response to thermal stress than in response to water stress. We also identified candidate genes that demonstrated transcription levels closely associated with physiological variables, in particular chlorophyll fluorescence. 4. Synthesis. These results suggest that an ecologically important species responds differently to different environmental aspects of forecast climate change. These translational changes have the potential to influence phenotypic characters and ultimately adaptive responses.

BMC Bioinformatics, 2008
Background: A primary reason for using two-color microarrays is that the use of two samples labeled with different dyes on the same slide, that bind to probes on the same spot, is supposed to adjust for many factors that introduce noise and errors into the analysis. Most users assume that any differences between the dyes can be adjusted out by standard methods of normalization, so that measures such as log ratios on the same slide are reliable measures of comparative expression. However, even after normalization, probe-specific dye and slide variation remains in the data. We define a method to quantify the amount of the dye-by-probe and slide-by-probe interaction. This serves as a diagnostic, both visual and numeric, of the existence of probe-specific dye bias. We show how this improved the performance of two-color array analysis for arrays for genomic analysis of biological samples ranging from rice to human tissue.

Advances in next generation sequencing technology have facilitated the discovery of single nucleotide polymorphisms (SNPs). Sequenom-based SNP-typing assays were developed for 1,359 maize SNPs identified via comparative next-generation transcriptomic sequencing. Approximately 75% of these SNPs were successfully converted into genetic markers that can be scored reliably and used to generate a SNP-based genetic map by genotyping Recombinant Inbred Lines (RILs) from the Intermated B73 × Mo17 (IBM) population. The quantitative nature of Sequenom-based SNP assays led to the development of a time- and cost-efficient strategy to genetically map mutants via quantitative Bulked Segregant Analysis (BSA). This strategy was used to rapidly map the loci associated with several dozen recessive mutants. Because a mutant can be mapped using as few as eight multiplexed sets of SNP assays on a bulk of as few as 20 mutant F2 individuals, this strategy is expected to be widely adopted for mapping in many species.

The Etched1 gene of Zea mays (L.) encodes a zinc ribbon protein that belongs to the transcriptionally active chromosome (TAC) of plastids and is similar to the transcription factor TFIIS
Plant Journal, 2004
Etched1 (et1) is a pleiotropic, recessive mutation of maize that causes fissured and cracked mature kernels and virescent seedlings. Microscopic examinations of the et1 phenotype revealed an aberrant plastid development in mutant kernels and mutant leaves. Here, we report on the cloning of the et1 gene by transposon tagging, the localization of the gene product in chloroplasts, and its putative function in the plastid transcriptional apparatus. Several alleles of Mutator (Mu)-induced et1 mutants, the et1-reference (et1-R) mutant, and Et1 wild-type were cloned and analyzed at the molecular level. Northern analyses with wild-type plants revealed that Et1 transcripts are present in kernels, leaves, and other types of tissue, and no Et1 expression could be detected in the et1 mutants analyzed. The ET1 protein is imported by chloroplasts and has been immunologically detected in transcriptionally active chromosome (TAC) fractions derived from chloroplasts. Accordingly, the relative transcriptional activity of TAC fractions was significantly reduced in chloroplasts of et1-R plants. ET1 is the first zinc ribbon (ZR) protein shown to be targeted to plastids. With regard to its localization and its striking structural similarity to the eukaryotic transcription elongation factor TFIIS, it is feasible that ET1 functions in plastid transcription elongation by reactivation of arrested RNA polymerases.

Plant Journal, 2007
All above-ground plant organs are derived from shoot apical meristems (SAMs). Global analyses of gene expression were conducted on maize (Zea mays L.) SAMs to identify genes preferentially expressed in the SAM. The SAMs were collected from 14-day-old B73 seedlings via laser capture microdissection (LCM). The RNA samples extracted from LCM-collected SAMs and from seedlings were hybridized to microarrays spotted with 37 660 maize cDNAs. Approximately 30% (10 816) of these cDNAs were prepared as part of this study from manually dissected B73 maize apices. Over 5000 expressed sequence tags (ESTs) (about 13% of the total) were differentially expressed (P < 0.0001) between SAMs and seedlings. Of these, 2783 and 2248 ESTs were up- and down-regulated in the SAM, respectively. The expression in the SAM of several of the differentially expressed ESTs was validated via quantitative RT-PCR and/or in situ hybridization. The up-regulated ESTs included many regulatory genes including transcription factors, chromatin remodeling factors and components of the gene-silencing machinery, as well as about 900 genes with unknown functions. Surprisingly, transcripts that hybridized to 62 retrotransposon-related cDNAs were also substantially up-regulated in the SAM. Complementary DNAs derived from the LCM-collected SAMs were sequenced to identify additional genes that are expressed in the SAM. This generated around 550 000 ESTs (454-SAM ESTs) from two genotypes. Consistent with the microarray results, approximately 14% of the 454-SAM ESTs from B73 were retrotransposon-related. Possible roles of genes that are preferentially expressed in the SAM are discussed.

Theoretical and Applied Genetics, 2006
Temperature gradient capillary electrophoresis (TGCE) is a high-throughput method to detect segregating single nucleotide polymorphisms and InDel polymorphisms in genetic mapping populations. Existing software that analyzes TGCE data was, however, designed for mutation analysis rather than genetic mapping. Genetic recombinant analysis and mapping assistant (GRAMA) is a new tool that automates TGCE data analysis for the purpose of genetic mapping. Data from multiple TGCE runs are analyzed, integrated, and displayed in an intuitive visual format. GRAMA includes an algorithm to detect peaks in electropherograms and can automatically compare its peak calls with those produced by another software package. Consequently, GRAMA provides highly accurate results with a low false positive rate of 5.9% and an even lower false negative rate of 1.3%. Because of its accuracy and intuitive interface, GRAMA boosts user productivity more than twofold relative to previous manual methods of scoring TGCE data. GRAMA is written in Java and is freely available at

Plant Journal, 2005
Prior analyses established that the maize (Zea mays L.) gl8a gene encodes 3-ketoacyl reductase, a component of the fatty acid elongase required for the biosynthesis of very long chain fatty acids (VLCFAs). A paralogous gene, gl8b, has been identified that is 96% identical to gl8a. The gl8a and gl8b genes map to syntenic chromosomal regions, have similar, but not identical, expression patterns, and encode proteins that are 97% identical. Both of these genes are required for the normal accumulation of cuticular waxes on seedling leaves. The chemical composition of the cuticular waxes from gl8a and gl8b mutants indicates that these genes have at least overlapping, if not redundant, functions in cuticular wax biosynthesis. Although gl8a and gl8b double mutant kernels have endosperms that cannot be distinguished from wild-type siblings, these kernels are non-viable because their embryos fail to undergo normal development. Double mutant kernels accumulate substantially reduced levels of VLCFAs. VLCFAs are components of a variety of compounds, for example, cuticular waxes, suberin, and sphingolipids. Consistent with their essential nature in yeast, the accumulation of the ceramide moiety of sphingolipids is substantially reduced and their fatty acid composition altered in gl8a and gl8b double mutant kernels relative to wild-type kernels. Hence, we hypothesize that sphingolipids or other VLCFA-containing compounds are essential for normal embryo development.

Abstract: This study presents results on training both finite state classifiers and interpolated Markov models as classifiers for polymerase chain reaction primers. The goal of the study is to find techniques to decrease the number of primers that fail to amplify correctly within a large genomics project. Standard primer design packages already select primers in a manner consistent with current knowledge of the biophysics of DNA. The classifiers trained in this effort are used to capture lab- and organism-specific features of primer data and are used to post-process the output of standard primer design packages. The finite state classifiers in this study are trained with a novel evolutionary algorithm that uses an incremental fitness reward system and multi-population hybridization. This hybridization is akin to population seeding, not the more usual hybridization of evolutionary computation with other techniques. The interpolated Markov model is a form of Markov model that adapts to data-rich and data-sparse portions of the training set by using a variable order in its modeling. The interpolated Markov models exhibited slightly superior performance and trained far faster. The finite state classifiers provide a substantially different classification, however, and require less training data.
Parallel de novo assembly of large genomes from high-throughput short reads
The advent of high-throughput short read technology is revolutionizing life sciences by providing an inexpensive way to sequence genomes at high coverage. Exploiting this technology requires the development of a de novo short read assembler, which is an important open problem that is garnering significant research effort. Current methods are largely limited to microbial organisms, whose genomes are two to