ExpRev IdentifyingCodingVariants
Chee-Seng Ku*1, Mengchu Wu1, David N Cooper2, Nasheen Naidoo3, Yudi Pawitan4, Brendan Pang5, Barry Iacopetta6 and Richie Soong1

1 Cancer Science Institute of Singapore (CSI Singapore), #12-01, MD6, Centre for Translational Medicine, NUS Yong Loo Lin School of Medicine, National University of Singapore, 14 Medical Drive, 117599, Singapore
2 Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, UK
3 Saw Swee Hock School of Public Health, National University of Singapore, Singapore
4 Department of Medical Epidemiology & Biostatistics, Karolinska Institutet, Stockholm, Sweden
5 Department of Pathology, National University Health System, Singapore
6 School of Surgery, The University of Western Australia, WA, Australia
*Author for correspondence: Tel.: +65 81388095; Fax: +65 68739664; [email protected]

The advent of next-generation sequencing technologies has revolutionized the study of genetic variation in the human genome. Whole-genome sequencing currently represents the most comprehensive strategy for variant detection genome-wide but is costly for large sample sizes, and variants detected in noncoding regions remain largely uninterpretable. By contrast, whole-exome sequencing has been widely applied in the identification of germline mutations underlying Mendelian disorders, somatic mutations in various cancers and de novo mutations in neurodevelopmental disorders. Since whole-exome sequencing focuses upon the entire set of exons in the genome (the exome), it requires additional exome-enrichment steps compared with whole-genome sequencing. Although the availability of multiple commercial exome-enrichment kits has made whole-exome sequencing technically feasible, it has also added to the overall cost. This has led to the emergence of transcriptome (or RNA) sequencing as a potential alternative approach to variant detection within protein coding regions, since the transcriptome of a given tissue represents a quasi-complete set of transcribed genes (mRNAs) and other noncoding RNAs. A further advantage of this approach is that it bypasses the need for exome enrichment. Here we discuss the relative merits and limitations of these approaches as they are applied in the context of variant detection within gene coding regions.

KEYWORDS: EXOME • EXOME ENRICHMENT • NEXT-GENERATION SEQUENCING • SINGLE NUCLEOTIDE VARIANTS • TRANSCRIPTOME

The advent of next-generation sequencing (NGS) technologies has revolutionized our approach to performing structural and functional genomics studies [1,2]. The detection and characterization of genetic variation (ranging from single-nucleotide variants [SNVs] and small insertions and deletions [indels] to larger structural rearrangements) in the human genome have been greatly facilitated by NGS technologies such as whole-genome sequencing (WGS) [3–5]. This has also driven the 1000 Genomes Project, which, upon completion, aims to provide a comprehensive map of human genetic variants. Findings from the pilot phases of this project have already provided new insights into the nature and extent of human genetic variation [6]. However, this undertaking is well beyond the technical and financial capabilities of individual laboratories. The high cost of WGS (in relation to sequencing, data storage and analysis), together with the challenges inherent in analyzing and interpreting variants detected in noncoding regions [7–9], have now made whole-exome sequencing (WES) a more popular approach in the context of variant detection [10–13]. WES has been applied to the detection of both germline and somatic variants [14,15]. As most of the disease-causing mutations in Mendelian disorders reside within gene coding regions, this has promoted the use of WES in unraveling new causal variants for these disorders [16,17]. This approach has also been widely employed in attempts to identify the somatic driver mutations within the exomes of various cancers [18]. WES has higher sensitivity and specificity for detecting SNVs than small indels [19,20]. In addition, WES also allows the detection of larger copy-number variations using depth of coverage from mapped short-sequence reads through the development of appropriate bioinformatics tools [21].

Although NGS technologies have been available since 2005, the isolation and enrichment of the entire set of all exons in the human genome (the exome) was not technically feasible until the development of commercial high-throughput exome-enrichment kits [22,23]. However, the cost of the exome-enrichment step, which constitutes a substantial proportion of the total cost of WES,
represents a ‘bottleneck’ that impedes the scale-up of WES to large sample sizes. More recently, the cost of sequencing has fallen rapidly owing to the increasing throughput of sequencing data (up to hundreds of gigabases) per instrument run by the latest sequencing platforms. As a result, multiple samples (up to tens of exomes) can now be multiplexed to avoid redundant sequencing while still achieving adequate sequencing depth. This is known as post-hybridization sample multiplexing or barcoding. By contrast, this barcoding protocol became available for the exome-enrichment steps only comparatively recently [24,25]; although it should further decrease the cost of exome enrichment and/or WES, its technical performance and effect on sequencing data from the sample barcoding in these prehybridization steps have not yet been tested experimentally by the end user.

To further optimize the cost–effectiveness of variant detection within coding regions, transcriptome or RNA sequencing (RNA-seq) has been proposed as a potential substitute for WES [26,27]. From a theoretical standpoint, this approach represents a promising alternative since, by definition, the transcriptome comprises all transcripts for both coding RNAs (i.e., mRNAs) and noncoding RNAs in a given tissue. Hence, RNA-seq would also be able to detect variants within the coding regions [28–30]. In addition, the use of RNA-seq bypasses the need for exome-enrichment steps, thereby rendering this approach more cost effective than WES. It would also obviate the need for target-probe hybridization steps and the technical limitations during exome enrichment. For example, owing to the uneven capture efficiency across exons experienced when using available exome-enrichment kits, capture of all exons is incomplete. Moreover, some sequence reads map outside the targeted regions (‘off-target hybridization’), leading to the production of unusable sequence reads for downstream analysis [22,23,31,32]. However, the application of RNA-seq in this context is not without its shortcomings and limitations. It is important to bear in mind that the transcriptome is tissue specific, so the set of genes transcribed varies between tissue types. As a result, sequencing a transcriptome from a specific tissue would be incapable of capturing all variants in the exome; hence, the transcriptome of a specific tissue is invariably and unavoidably only a subset of the exome. In certain diseases, this would not be a problem since the tissue specificity of the disease-associated gene product would already be known. However, in cases where the disease would be caused by the complete loss of the gene product, RNA-seq should probably be avoided.

In this article, we focus on the technical and logistical difficulties to be overcome in applying these different approaches to variant detection in coding regions rather than the cost, because the cost of the various technologies (exome enrichment and sequencing) is currently falling quite rapidly. Although the total cost of RNA-seq may well be less than WES, the limitations of this approach in detecting coding variants, as well as the specific research question posed, must be taken into consideration when selecting an analytical approach. Unlike WES, where the major application is variant detection, RNA-seq is also able to measure transcript expression levels and detect novel fusion genes. In this review, we discuss the role of WES and RNA-seq in variant detection in coding regions, highlight their respective pros and cons, and make recommendations with regard to which approach to use in different circumstances.

High-throughput sequencing technologies

Currently available NGS technologies, such as the Illumina® HiSeq™ and Life Technologies™ SOLiD4™, are able to generate hundreds of millions of short sequence reads (50–125 bp) totaling several hundred gigabases of sequencing data per instrument run. By contrast, the Roche 454 GS FLX produces approximately 1 million longer sequence reads (~500 bp). These sequencing technologies have been widely used in various studies ranging from large-scale targeted sequencing of candidate genes to WES and WGS. However, owing to the large number of sequence reads generated by HiSeq and SOLiD, these platforms are more suitable for RNA-seq, which requires millions of reads for applications such as profiling expression levels [33,34]. In terms of the accuracy of variant detection, all three NGS technologies have higher raw base error rates than Sanger sequencing. However, this can be improved through deeper sequencing to achieve a higher consensus accuracy rate. For WES (or WGS) of genomic DNA, an average sequencing depth of 30–50× is usually deemed to be sufficient to detect most germline SNVs accurately. However, greater sequencing depth would be needed to detect somatic point mutations in primary cancer tissue in order to allow for tissue contamination and genetic heterogeneity within the tissue [35,36]. By contrast, the sequencing depth or coverage of RNA-seq is difficult to estimate, because calculation of the coverage of the transcriptome is less straightforward given that the true number and level of different transcript isoforms is not usually known. Moreover, transcriptional activity varies greatly between genes (low- vs high-abundance transcripts) [27]. Although NGS technologies have greater specificity to detect both germline and somatic SNVs, further validation using Sanger sequencing is still common practice.

The arrival of third-generation sequencing (TGS) technologies, such as the true single-molecule sequencing (Helicos Biosciences) and single-molecule real-time sequencing (Pacific Biosciences), has revolutionized variant detection [37,38]. These technologies are characterized by single DNA molecule sequencing without the need for amplification steps, such as emulsion PCR (SOLiD and 454 GS FLX platforms) and bridge amplification (Illumina sequencing platform), thereby avoiding errors inherent in these polymerase-mediated amplification steps. In addition, since a single DNA molecule is sequenced (rather than a cluster of clonally amplified DNA templates), this avoids the phenomenon of ‘dephasing’ (i.e., the uneven sequencing of the clonally amplified DNA templates within a cluster), which constitutes another source of sequencing error [39]. Theoretically, TGS technologies would be expected to achieve a lower read base error rate as compared with NGS platforms; however, the opposite has actually been found to be the case. The read base error rate of the Helicos true single-molecule sequencing HeliScope™ is 4–5%, which is higher than any of the NGS technologies (<2%). In addition, with HeliScope, the dominant error type is indels [40]. All four of these
sequencing platforms have been previously applied in WGS studies of the human genome. The limitations of TGS technologies (i.e., higher read base error rate and indel errors) are explicable in terms of the weak signal generated by single-molecule sequencing or due to incorporation of unlabeled nucleotides [39]. Despite these limitations, TGS technologies offer additional advantages for direct RNA-seq, such as avoiding the need to convert RNA into cDNA. Single-molecule sequencing also avoids the amplification biases inherent in measuring transcript levels [30,41].

Whole-exome sequencing

WES represents an approach to sequencing the entire set of exons in the human exome (comprising 200,000 exons). Since this approach focuses specifically on the coding regions, exome-enrichment steps are needed before the genomic DNA can be subjected to massively parallel sequencing. The development of commercial whole human exome-enrichment kits by Agilent, NimbleGen and Illumina has been largely responsible for the popularity of this approach [22]. The relative performance of these different WES platforms has recently been compared [23]. WES has gained favor largely because gene coding regions harbor >85% of mutations in monogenic disease states [42] and it has been widely applied in the identification of the underlying germline mutations in numerous Mendelian disorders of previously unknown genetic etiology. In addition to its use as a discovery tool, WES is now increasingly being employed in diagnostic applications [43]. Furthermore, this approach has also been widely adopted in the study of somatic driver mutations within gene coding regions in various cancers [15,44,45]. When trios (parents–offspring) have been available, WES has been successfully used to identify de novo mutations in a number of different neurodevelopmental disorders [46–48]. Despite it being widely considered as a transient technology, WES studies have already generated significant new findings in the context of human genetic disease.

As noted earlier, exome enrichment is a prerequisite for WES. During enrichment, the genomic regions of interest (the exome) are captured through hybrid selection of DNA fragments using oligonucleotide probes, whereas the unwanted DNA sequences (the noncoding regions) are removed prior to sequencing. Although protein coding regions are almost invariably the main focus, exome-enrichment kits are also designed to capture sequences outside the exome. For example, the Illumina® TruSeq™ exome-enrichment kit was designed to target a region of 62 Mb [101], more than double the size of the human exome (~30 Mb). In addition to the comprehensive coverage of the major exon/gene databases such as consensus coding sequence (CCDS) and RefSeq, this enrichment kit also provides broad coverage of noncoding DNA in exon-flanking regions (promoters and untranslated regions). Furthermore, approximately 78% of predicted miRNAs are also captured.

Although whole-exome-enrichment kits were designed to capture the exons listed in the major databases (such as CCDS and RefSeq), their coverage is by no means complete. For example, the NimbleGen version 2 exome-enrichment kit (Roche) is predicted to provide 99.2% coverage of CCDS, but only 49.6% of RefSeq would be covered [31]. Furthermore, it is noteworthy that the human gene complement is far from being fully characterized, as demonstrated recently by Mercer et al. [49]. This study used a capture array covering 2265 contiguous regions that collectively comprised a total size of approximately 0.77 Mb and was subjected to deep RNA-seq. By focusing on regions containing well-annotated protein-coding genes, Mercer et al. identified an additional 204 unannotated isoforms of 55 protein-coding loci, representing a 2.8-fold increase over the current catalog of isoforms for these loci [49]. This suggests that considerable functional genomic complexity remains to be resolved even for quite well-characterized loci. Therefore, given that the structures of many human genes are still inadequately characterized, it is apparent that WES (unlike RNA-seq) is intrinsically incomplete and inevitably biased towards the currently known (and still limited number of) exons of protein-coding genes.

Although multiple sequence-enrichment methods are available [24], the commercial enrichment kits come in two formats, that is, array-based and in-solution hybrid capture. The difference between on-solid and in-solution capture methods is that the oligonucleotide probes are either tethered on microarrays or are suspended in solution (oligonucleotide probes attached to beads), respectively. The coupling of these enrichment methods with NGS technologies has made WES technically more feasible and cost effective. However, a major limitation of the exome-enrichment kits is that some exons are not captured, resulting in some variants within these regions going undetected and hence being refractory to analysis. Thus, WES may have to be supplemented with conventional PCR-based Sanger sequencing methods in order to capture these ‘missed’ exons.

Another critical limitation of the enrichment kits is the uneven capture of exons as a consequence either of the technical limitations of target-probe hybridization or the variable GC content of the regions. This, coupled with the uneven sequencing characteristic of the NGS technologies, has resulted in an inadequate sequencing depth for some of the regions. GC-rich sequence stretches can be difficult to capture and, in the worst case scenario, these GC-rich regions are not captured at all. For example, two exons without any sequencing coverage were found to contain very high GC content (76.1 and 63.6%, respectively) compared with an average GC content of 37.6% for the 50 best-covered exons [50]. Those variants called in regions characterized by poor sequencing coverage usually receive a poor accuracy score and hence are usually filtered out. As a result, a higher overall/average sequencing depth is needed to ensure that the poorly covered regions achieve the minimum sequencing depth for accurate variant detection. Despite this unevenness, an average sequencing depth of 30–50× is usually deemed sufficient for the detection of most germline SNVs. WES at this depth has high sensitivity and specificity to detect SNVs (>90%), but the specificity for small indels is much lower [14,19,20,51].

Transcriptome sequencing

The transcriptome is the collection of all protein coding and noncoding transcripts (RNAs) in a given tissue. However, the coding
www.expert-reviews.com 243
Review Ku, Wu, Cooper et al.
Table 1. Comparison of whole-exome sequencing and RNA sequencing in the context of detecting coding region variants.

Aspect of comparison: Tissue specificity of exome versus transcriptome
  Whole-exome sequencing: Exome shared across different tissues/cell types (same set of genomic DNA and germline DNA variants, but different somatic mutational profiles)
  RNA-seq: Transcriptome varies between different tissues/cell types (different sets of transcribed genes)

Aspect of comparison: Time-course difference
  Whole-exome sequencing: Same set of genomic DNA and germline DNA variants throughout life, but somatic mutations accumulate with mitotic cell divisions in a tissue-/cell type-specific manner
  RNA-seq: Transcriptome is ‘dynamic’ in that it changes with time in a given tissue/cell type in response to both internal and external stimuli

Aspect of comparison: Application
  Whole-exome sequencing: Variant detection in coding regions
  RNA-seq: Expression analysis of coding and noncoding transcripts; discovery of new transcripts; studying alternative splicing patterns; detection of transcript fusions; analysis of allele-specific expression; variant detection in coding regions of transcribed genes in a given tissue

Aspect of comparison: Application in variant detection to identify:
  – Germline variants for Mendelian disorders
      Whole-exome sequencing: Yes
      RNA-seq: Has not been applied
  – Somatic variants in cancers
      Whole-exome sequencing: Yes
      RNA-seq: Yes
  – De novo variants
      Whole-exome sequencing: Yes
      RNA-seq: Has not been applied

Aspect of comparison: Sources of DNA/RNA sample:
  – Germline variants
      Whole-exome sequencing: DNA from any tissues (commonly DNA is extracted from a peripheral blood sample)
      RNA-seq: mRNA from tissue of interest to detect variants in transcribed genes
  – Somatic variants
      Whole-exome sequencing: DNA from disease tissue or tissue of interest
      RNA-seq: mRNA from disease tissue or tissue of interest

Aspect of comparison: Need for exome enrichment
  Whole-exome sequencing: Yes
  RNA-seq: No (mRNA extraction protocol)

Aspect of comparison: Representativeness of the entire coding region
  Whole-exome sequencing: Capture and sequence almost all of the 200,000 exons in the human genome
  RNA-seq: Only a subset of exons from the transcribed genes (pooling of transcriptome from multiple tissues to enhance the detection of germline variants in exons is theoretically sound, but could be technically laborious)

Aspect of comparison: Variant detection in protein-coding regions
  Whole-exome sequencing: Both transcribed and nontranscribed genes
  RNA-seq: Only transcribed genes in a given tissue (RNA-seq would be incapable of capturing all variants within the coding regions)

Aspect of comparison: Variant detection beyond protein-coding regions
  Whole-exome sequencing: Whole exome-enrichment kits have also been designed to capture miRNA regions
  RNA-seq: Variant detection in noncoding regions is also feasible through whole-transcriptome sequencing rather than mRNA sequencing

Aspect of comparison: Incomplete capture of the entire coding regions
  Whole-exome sequencing: Some exons failed to be captured by the exome-enrichment kits and hence variants in the missing regions cannot be detected
  RNA-seq: Incomplete in terms of the variants in nontranscribed genes (in a given tissue) that cannot be detected

Aspect of comparison: Uneven capture and sequencing
  Whole-exome sequencing: Uneven capture of exons as a consequence either of technical limitations or the variable GC content of chromosomal regions
  RNA-seq: Different expression levels of different genes/transcripts lead to a natural ‘unevenness’ in the transcriptome

Aspect of comparison: Other issues associated with variant detection
  Whole-exome sequencing: Inherent limitations in target–probe hybridization during exome enrichment
  RNA-seq: Expression imbalance of two different strands is likely to generate false-positive results in variant detection. The possibility of errors in the reverse transcription of cDNA or the existence of RNA editing must also be considered. Mutations that cause rapid mRNA degradation of the transcripts containing them may also be missed by RNA-seq

Aspect of comparison: Diagnostic application
  Whole-exome sequencing: Widely tested to detect germline variants underlying Mendelian disorders
  RNA-seq: Not widely tested

RNA-seq: RNA sequencing.
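The ‘representativeness’ and ‘uneven capture and sequencing’ rows of Table 1 can be made concrete with a small calculation. The Python sketch below is illustrative only: the gene names, per-gene read depths and the minimum-depth threshold are invented assumptions, not values from any of the cited studies, and `callable_fraction` is a hypothetical helper rather than part of any published pipeline. It shows why the fraction of coding SNVs that RNA-seq can recover differs depending on whether all genes or only the genes expressed in the sampled tissue are taken as the denominator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingSNV:
    """A coding variant known from genomic DNA (hypothetical toy record)."""
    gene: str
    position: int


def callable_fraction(snvs, depth_by_gene, min_depth):
    """Return the fraction of SNVs whose gene reaches min_depth RNA-seq reads.

    Genes missing from depth_by_gene are treated as not transcribed (depth 0).
    """
    if not snvs:
        return 0.0
    hits = sum(1 for s in snvs if depth_by_gene.get(s.gene, 0) >= min_depth)
    return hits / len(snvs)


# Toy data (assumed for illustration): five coding SNVs in four genes.
snvs = [
    CodingSNV("GENE_A", 101),
    CodingSNV("GENE_A", 250),
    CodingSNV("GENE_B", 77),
    CodingSNV("GENE_C", 12),
    CodingSNV("GENE_D", 900),
]

# Assumed average RNA-seq read depth per gene in one tissue.
# GENE_C is not transcribed in this tissue; GENE_B is a low-abundance transcript.
rna_depth = {"GENE_A": 120, "GENE_B": 5, "GENE_C": 0, "GENE_D": 30}

MIN_DEPTH = 8  # arbitrary calling threshold chosen for this sketch

# Denominator 1: every coding SNV, whether or not its gene is transcribed.
all_frac = callable_fraction(snvs, rna_depth, MIN_DEPTH)

# Denominator 2: only SNVs in genes with at least one read in this tissue.
expressed = [s for s in snvs if rna_depth.get(s.gene, 0) > 0]
expr_frac = callable_fraction(expressed, rna_depth, MIN_DEPTH)

print(f"callable among all coding SNVs:     {all_frac:.2f}")
print(f"callable among expressed-gene SNVs: {expr_frac:.2f}")
```

In this toy setting, restricting the denominator to expressed genes raises the apparent recovery rate, while the low-abundance gene is still missed; raising `MIN_DEPTH` or lowering the depths shifts both fractions accordingly.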
component can be extracted from the transcriptome sample and constructed into a sequencing library before being subjected to massively parallel sequencing. As with WES, the large number of sequence reads produced are then mapped to a reference genome. RNA-seq data have previously been used in several major applications, such as the expression analysis of coding and noncoding transcripts, the discovery of new transcripts, the study of alternative splicing patterns, the detection of transcript fusions and the analysis of allele-specific expression [28–30].

In comparison to the genome (or genomic DNA), the transcriptome is both tissue- and cell-type specific. It is also dynamic in that it changes with time (within the same tissue and even within the same cell type) in response to both internal and external stimuli. Thus, the transcriptome derived from any one tissue type will not represent the entire exome (i.e., all cells may have essentially the same genome/exome, but not all genes are expressed in a specific tissue/cell type). The focus on the exome (or mRNAs) is achieved naturally through transcription and technically through mRNA extraction and library preparation methods, thereby bypassing the need for an exome-enrichment step. This also leads to the incomplete capture of the exome; variants in nontranscribed genes cannot be detected. This has important implications in the context of identifying variants in all coding regions. It has been shown experimentally that only approximately 40% of all coding SNVs can be identified by RNA-seq using peripheral blood mononuclear cells as the RNA source. However, when this analysis was focused exclusively on peripheral blood mononuclear cell-expressed genes, approximately 81% of coding SNVs were identified [27]. This suggests that RNA-seq is only a feasible alternative for identifying exonic variants in tissue-specific transcribed genes. Table 1 summarizes the comparison of WES and RNA-seq in detecting coding region variants.

A further complication of RNA-seq is that different genes have different expression levels at different times. This also leads to a natural unevenness of the transcriptome (i.e., high- vs low-abundance mRNAs), irrespective of the tissue under study. More specifically, by performing RNA-seq on an acute myeloid leukemia (AML) sample alongside the corresponding remission sample, Greif et al. found that the read depth per gene ranged from zero to over 1000 [52]. Put another way, a total of 10,152 genes had an average read depth of at least sevenfold while 6989 genes had an average read depth of 20-fold or greater in both samples [52]. This suggests that some transcripts (expressed to some levels in a tissue) will have inadequate coverage for variants to be called accurately, dependent on the sequencing depth.

From a theoretical perspective, the unevenness of the expression levels could be rectified by increasing the overall sequencing depth to ensure that the low-abundance transcripts are adequately sequenced to allow efficient variant detection. However, this will lead to highly redundant sequencing of the most abundant transcripts. This is particularly problematic in those transcriptome samples harboring the greatest variability in transcript expression levels. Since this ‘further sequencing’ approach is not commonly pursued because it is not cost effective, the accuracy of variant detection is likely to be compromised in the context of low-abundance transcripts. Increasing the sequencing depth is also not without its adverse effects. A very high sequencing depth will increase the sensitivity of variant detection in low-abundance transcripts, but it also compromises the specificity [27]. This is because, at low levels of coverage, reads that could produce false-positive calls are not sufficiently abundant to pass through the quality control filters. However, as the sequencing depth increases and more reads are added to the dataset, the number of incorrect alignments and hence sequencing errors increases, resulting in more false-positive calls passing through the quality control filters. In either scenario, the critical challenges in variant detection using RNA-seq due to uneven transcript levels are apparent. This natural variability in transcript expression levels could be even larger than the unevenness resulting from the technical limitations in exome enrichment of genomic DNA. Although the challenge of variant detection in low-abundance transcripts can be alleviated by RNA CaptureSeq enrichment for low-level transcripts using cDNA-tiling arrays prior to high-throughput sequencing [49,53], this further adds to the cost of the approach, thereby offsetting the benefits of reduced cost potentially offered by this approach. However, it is also noteworthy that RNA-seq (as opposed to WES) is not biased towards currently known exons and that CaptureSeq permits a very deep RNA-seq coverage of genomic regions of interest and has a high likelihood of revealing novel gene variants.

The definition of low- versus high-abundance transcripts is somewhat arbitrary, and the existence of alternative splicing further complicates it. In any discussion of the detection of variants in low- versus high-abundance transcripts, alternative splicing must be taken into consideration. Let us take a simple hypothetical example with five alternative transcripts for a given gene; transcript ‘A’ is highly expressed; for example, fourfold higher than each of the remaining four transcripts in a given tissue. However, transcript ‘A’ does not contain exon ‘I’, although this particular exon is expressed in all four other transcripts. Hence, variants in exon ‘I’ would not be detected in the highly abundant transcript ‘A’, but would be detected collectively in the other alternative transcripts expressed at a lower level. It should therefore be emphasized that the ability to detect variants in a particular exon depends upon the ‘aggregate abundance’ of the alternative transcripts covering the exon rather than their ‘individual abundance’ per se.

Furthermore, the expression imbalance of two different strands is also likely to generate false-positive results in variant detection using RNA-seq [54]. For this reason, some SNVs in the genomic DNA may be missed by RNA-seq as a consequence of allele expression imbalances [55], namely when an individual is heterozygous for a given SNV (in genomic DNA) but the reference allele is much more highly expressed than the mutant allele. Other complications associated with RNA-seq include: a low percentage of uniquely aligned sequence reads; incorrect alignment at the ends of a read due to splicing; and the possibility of errors in the reverse transcription of cDNA or the existence of RNA editing [27]. Finally, beyond these technical limitations, mutations that cause rapid mRNA degradation of the transcripts containing them may also be missed by RNA-seq. These limitations
are noteworthy as some of them are specific to the RNA-seq approach.

Despite these limitations, several studies have demonstrated the successful application of RNA-seq to the discovery of novel driver mutations in cancer [52,56–60]. For example, Shah et al. identified a recurrent missense point mutation (402C/G or C134W) in FOXL2 by analyzing four ovarian adult-type granulosa-cell tumors using whole-transcriptome paired-end RNA-seq [58]. For comparison, the study also performed RNA-seq for 11 nongranulosa-cell ovarian tumors; transcriptome variants that were absent in these 11 tumors were classified as granulosa-cell tumor-specific variants and subjected to further analysis. The specific FOXL2 mutation was further validated in the cDNA and genomic DNA of all four index samples by two additional independent methods. Furthermore, the mutation was found to be somatic in origin in two patients from whom normal constitutional tissue was available [58]. Similarly, five nonsynonymous mutations specific to the tumor sample were identified by performing RNA-seq of an AML sample (bone marrow aspirate) with the corresponding remission sample (peripheral blood). These mutations included a nonsense mutation affecting the RUNX1 gene (a known mutational target in AML) and a missense mutation in the TLE4 gene (which encodes a RUNX1-interacting protein) [52]. Taken together, these studies have demonstrated that RNA-seq represents a promising tool for detecting point mutations or SNVs within coding regions of transcribed genes that could turn out to play a key role in tumorigenesis.

Germline versus somatic mutations

The suitability of WES and RNA-seq as mutation screening techniques is also dependent upon whether the variants in question are germline or somatic. Germline variants are heritable and hence are shared across different tissues and cell types. For the detection of germline variants, genomic DNA is required for WES and can in principle be derived from any tissue (except in the case of gonosomal mosaics). By contrast, our ability to detect germline variants from a transcriptome sample of a given tissue is confined to the transcribed genes of that tissue. However, the transcriptomes from multiple different tissues could in principle (and in practice) be combined to increase the completeness of germline variant detection in the entire coding region. Although this approach is theoretically and technically sound, it is experimentally laborious to prepare and combine mRNA samples from multiple tissues, and consequently this may also decrease the cost–effectiveness of […]

[…] cancer tissue in comparison to constitutional DNA derived from normal blood or skin samples from the same individual. Although RNA-seq can also be used to detect somatic mutations, finding a matched normal sample for comparison can be challenging. Normal tissue is unlikely to express exactly the same genes as the tumor sample because different tissues have different sets of transcribed genes in addition to varying in terms of transcript abundance [35,36]. However, gene expression patterns/levels have been found to be comparable between an AML sample (a bone marrow aspirate with more than 90% blasts) and a remission sample (peripheral blood with a normal white blood cell count) [52].

As with germline variants, somatic variants are likely to be detected by RNA-seq in highly transcribed genes, but combining transcriptomes from multiple tissues (to increase the abundance of low-transcribed genes) would be inappropriate in the context of somatic variants. However, it is unlikely that somatic variants (e.g., even though they may be protein truncating) detected in nontranscribed genes will be functionally important in the specific tissue under study. This therefore justifies using the RNA-seq approach for detecting somatic variants in genes that are transcribed only in the tumor tissue under study. Although somewhat speculative, it may be that somatic variants detected in genes that are not transcribed in a given tumor tissue could also be functionally important and might mediate their effects through mechanisms other than protein disruption.

In terms of detecting somatic variants, both WES and RNA-seq approaches are limited by the impurity of the primary tumor tissue (i.e., it inevitably contains a mixture of cancer and noncancer cells). Genetic heterogeneity of the cancer cells further complicates the situation, since different subclones within the same tumor tissue can harbor different mutational profiles. These issues make the detection of somatic variants even more challenging and require greater sequencing depth to be resolved [35,36].

Diagnostic applications

In addition to new discoveries, the arrival of NGS technologies has also created new opportunities in molecular diagnostics. WES has been shown to be a promising tool in a diagnostic setting for rare Mendelian disorders. In a pioneering study by Choi et al., the genetic diagnosis of congenital chloride-losing diarrhea in a patient was confirmed through WES by revealing a homozygous missense variant in SLC26A3 (a gene known to underlie the disease) [43]. The patient was initially diagnosed as having
RNA-seq. The detection of somatic variants is a somewhat different Bartter syndrome based on superficial phenotyping [43] . Its diag-
procedure as it requires comparison with ‘constitutional DNA or nostic utility is also becoming more evident in cases with broad
RNA’ from a different tissue in order to exclude germline events. In and previously unsuspected phenotypic heterogeneity. This has
addition, detection of somatic variants requires the specific tissue of been well illustrated by the identification of a homozygous PEX1
the disease of interest; for example, tissue from the primary tumor. mutation in a patient with a clinical diagnosis of Leber congeni-
The need to obtain the appropriate tissue for the disease of interest tal amaurosis, which is identical to a mutation known to cause
has essentially limited the studies of somatic mutations to cancer Zellweger syndrome [61] .
[52,56–59] . It may be quite challenging to obtain the appropriate tis- In addition, the clinical utility of WES has also been demon-
sues to study other diseases, such as schizophrenia (brain tissue), or strated in the case of one patient affected by two different genetic
to establish the identity of the appropriate tissue to study specific disorders. Two different mutations in SLC45A2 and G6PC3,
systemic diseases, such as diabetes or systemic lupus erythematosus. respectively, were identified in a single patient with an indeter-
WES has been widely used to detect somatic variants in primary minate clinical phenotype. These mutations were sufficient to
account for the two different clinical phenotypes manifested by this patient: oculocutaneous albinism Type 4 and neutropenia [62]. Last, WES is also a powerful tool for disorders with genetic heterogeneity, such as Charcot–Marie–Tooth disease, an inherited peripheral neuropathy characterized by extensive locus heterogeneity (mutations in more than 35 genes have been identified to date). Indeed, a WES study of two affected members in a family with Charcot–Marie–Tooth disease identified a nonsynonymous mutation in GJB1, a known Charcot–Marie–Tooth disease gene, thereby confirming the molecular diagnosis [63].

Apart from its application in diagnostics, WES also has several advantages over the conventional targeted sequencing of candidate genes by PCR-based Sanger sequencing methods, which prioritize genes for sequencing on a ‘one-by-one’ basis. Despite the potential to use WES as a diagnostic tool, the technical challenges and ethical issues involved in adopting this approach in clinical laboratories must also be appreciated. A major challenge will be to analyze the large amount of sequencing data, since WES typically generates >10,000 genetic variants per genome. Thus, a robust variant-filtering pipeline must be applied to identify the disease-causing variants. In addition, the sensitivity and specificity of WES to detect SNVs and small indels need to be further improved to attain clinical standards. The common practice of validation of the results from WES by Sanger sequencing unnecessarily increases the cost of a diagnostic test. Furthermore, the incomplete capture of some exons and uneven sequencing depth could potentially lead to a negative result. It is therefore important to generate a report (detailing the quality of a WES run; e.g., what was not captured and what was sequenced unreliably owing to inadequate sequencing depth) for diagnostic applications. On the other hand, there are a number of unresolved and quite complex ethical issues including the disclosure of findings that might be considered incidental or unrelated to the original purpose of the diagnostic test and whether the patients have the right to demand full access to the results generated from their WES diagnostic test. Since WES is a powerful information generation tool, this also raises a concern as to whether clinicians and medical geneticists have a responsibility to sift through the list of variants to identify known pathological mutations for other diseases and what level of scrutiny should be exercised [64–67].

The discussion of the potential for RNA-seq to be used as a diagnostic tool in other applications, such as the measurement of transcript expression levels and the detection of fusion gene transcripts, is beyond the scope of this article. However, in terms of applying RNA-seq as a diagnostic tool for variant detection, it has not been widely tested empirically. It should be noted that variant detection is a minor application of RNA-seq. However, in most cases, RNA-seq is being used to examine whether the variant-harboring genes are expressed. The limited interest in applying RNA-seq in this context is probably attributable to the various technical limitations and challenges, as discussed earlier. In the context of cancer, RNA-seq may be more suitable for detecting somatic mutations in transcribed genes and in the prediction of therapeutic response. As with WES, RNA-seq is a genome-wide approach and a powerful information generation tool. Hence it would be very applicable to cancers characterized by genetic heterogeneity, rendering the ‘one-by-one’ approach inefficient. In this scenario, RNA-seq would serve multiple roles to detect genetic aberrations in a single experiment. By contrast, RNA-seq might not be an appropriate choice as a diagnostic tool if most of the cases could be accounted for by a single genetic alteration.

Expert commentary
In our view, WES of genomic DNA is a more powerful tool to detect germline variants than RNA-seq. The WES approach is, however, reliant upon commercial exome-enrichment kits to capture the entire set of exons. As such, incomplete capture of exons in some regions due to technical limitations represents a key challenge. Nevertheless, this limitation can be remedied by conventional PCR-based Sanger sequencing methods as long as the number (or the total genomic size) of these missing exons is small, since traditional methods are laborious and not readily scaled up. By contrast, high-throughput multiplexed PCR methods, such as RainDance™ and Fluidigm® technologies, could be applied if a considerable number of exons are lacking (up to hundreds of exons) [68]. The enriched genomic DNA can then be sequenced using the medium-throughput NGS machines, such as the Roche 454 Genome Sequencer Junior Sequencing System [69], the Life Technologies Ion Torrent Personal Genome Machine Sequencer [70] and the Illumina MiSeq Personal Sequencing System [102]. As the total size of genomic DNA from the ‘missing’ regions is not large (ranging from tens to hundreds of kilobases), the sequencing capabilities of these medium-throughput NGS machines, such as >35 Mb (454 Junior) to >1 Gb (Ion Torrent and MiSeq), will adequately suit this application. In addition, amplicon-sequencing protocols have also been commercially developed for these medium-throughput NGS machines. Furthermore, sample multiplexing can further optimize cost–effectiveness.

By contrast, the transcriptome from a specific tissue/cell only represents a subset of the exome. As a result, only the variants in the expressed genes or transcripts in that tissue can be detected by RNA-seq. However, where detection of germline variants is required, this can be remedied, at least in principle, by combining the transcriptome samples from multiple tissues in order to extend variant detection to the entire exome. This is feasible because germline variants are not tissue specific. However, the considerable variability in transcript expression levels presents a critical challenge to the detection of variants using RNA-seq. This will lead to either insufficient coverage of low-abundance transcripts to enable accurate variant detection or redundant sequencing of the high-abundance transcripts to ensure that the sequencing/coverage depth is sufficient for low-abundance transcripts. Both outcomes are undesirable.

In the context of somatic mutations, both WES and RNA-seq may be applied but both approaches have their pros and cons. Most of the studies that have detected somatic mutations in the cancer genome (utilizing genomic DNA extracted from cancer tissues or cell lines) have adopted either WES or WGS approaches. WES of genomic DNA detects somatic variants in all of the coding regions (both transcribed and nontranscribed genes) in the cancer tissue. It is arguable that deleterious somatic mutations,
such as nonsense/protein-truncating mutations in expressed genes, are more likely to be functionally important. However, somatic mutations located in nontranscribed genes need not necessarily be functionally inert if they coincide with regulatory elements or produce biological alterations other than protein disruptions. This does not therefore render WES redundant (in terms of generating data in nontranscribed genes) to this application. By contrast, RNA-seq only detects somatic mutations in transcribed genes in the specific tissue that is relevant to the disease of interest. Furthermore, RNA-seq possesses several distinct advantages that cannot be substituted by WES. In addition to mutation detection, RNA-seq also permits other analyses, such as the measurement of transcript expression levels, the investigation of alternative splicing patterns and the detection of fusion transcripts.

Ultimately, the choice of which approach to adopt will be dependent on both the research question posed and the original hypothesis. For example, if the study was designed to identify aberrantly expressed transcripts or fusion transcripts differentiating cancer from noncancer tissues in addition to somatic mutation detection, RNA-seq would clearly be the method of choice. On the other hand, if the aim was solely to identify somatic mutations in cancer tissues, both methods could be applied depending upon whether the detection of somatic mutations in nontranscribed genes would be important to the research question posed; if so, then WES would be the approach of choice. By contrast, if one was interested in detecting somatic mutations in transcribed genes, then RNA-seq would be the more appropriate technique to use. However, it should be noted that mutations in low-abundance transcripts might not be detected accurately in the absence of a high depth of sequencing coverage. We and others believe that WES is technically and bioinformatically less challenging for interrogating somatic mutations; this viewpoint is supported by WES being more widely applied in studies of cancer mutations than RNA-seq [15,44,45,71–74].

Five-year view
The main reason to use RNA-seq as a means to detect variants in coding regions would be its cost–effectiveness, since this approach obviates the need for exome-enrichment steps (assuming that the sequencing cost is comparable to WES). A direct total cost comparison between these two approaches is difficult, because these technologies (and their associated costs) are rapidly changing and also differ by vendor. This weakens the justification for applying RNA-seq based on cost alone. However, even if RNA-seq is cheaper, the challenge of detecting mutations using this approach must be appreciated. At present, RNA-seq has not been applied as widely as WES in variant detection. The advent of sample barcoding protocols in the prehybridization steps may be expected to reduce the cost of exome enrichment significantly in WES.

Moving beyond the coding regions, WES is likely to be a transient technology that will eventually be replaced by WGS. However, several factors, including the total cost (sequencing costs plus other indirect costs incurred for bioinformatic analysis and data storage), analytical challenges of a large dataset and our limited ability to interpret the functional/clinical significance of variants in noncoding regions, have for the time being made WES a more popular method. Furthermore, the number of variants identified in the exome or transcriptome is much more manageable (and easier to prioritize for subsequent validation) than is the case for WGS. It is nevertheless foreseeable that these challenges will be overcome in the not-too-distant future. The cost of WGS to a sequencing depth of 50× provided as a commercial service is now below US$5000. As the total cost of WGS becomes more affordable, it is expected to become the dominant tool for many applications in structural and functional genomics studies, including variant detection in the entire genome. It is expected that the generation of excess information will then become of minor importance. Hence, our current difficulty in interpreting variants in noncoding regions will not always serve to impede the application of WGS. Similarly, RNA-seq will also benefit from further technological and bioinformatical advances. Here, RNA-seq refers specifically to the sequencing of mRNAs. However, in a wider context, the transcriptome encompasses all the transcripts, including coding (mRNAs) and noncoding RNAs, such as miRNAs and long intergenic noncoding RNAs. Future advances in sequencing technologies will enable complete transcriptome sequencing to allow variant detection in both coding and noncoding regions. More importantly, the key role of RNA-seq in interrogating noncoding RNAs in cancer has been recently demonstrated. A comprehensive analysis of long noncoding RNAs in 102 prostate cancer tissue samples and cell lines by deep RNA-seq identified 121 noncoding RNAs, termed prostate cancer-associated noncoding RNA transcripts, whose expression patterns appear to be capable of distinguishing benign, localized cancer from metastatic cancer samples, suggesting that cancer-specific functions of these noncoding RNAs may help to drive tumorigenesis [75]. This study demonstrates the utility of RNA-seq in defining functionally important (yet unannotated) elements of the genome. It must also be noted that beyond the ability to detect clinically important variants of noncoding RNAs, present data have also shown the importance of detecting variable levels of expression of these noncoding RNAs. The functional role of long noncoding RNAs in cancer and the important role of RNA-seq in identifying the relevant noncoding RNAs are increasingly being recognized [75–77]. Hence, the importance of RNA-seq in defining the complement and the abundance of protein-coding and noncoding RNAs should be appreciated.

Acknowledgements
C-S Ku contributed to the conceptualization of this article. C-S Ku, M Wu and DN Cooper contributed to the writing of the article and the preparation of the table. N Naidoo, Y Pawitan, B Pang, B Iacopetta and R Soong were involved in the discussion and critical reading. C-S Ku approved the final version and had final responsibility for this article.

Financial & competing interests disclosure
The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
No writing assistance was utilized in the production of this manuscript.

Key issues
• The detection and characterization of genetic variations in the human genome have been greatly facilitated by next-generation sequencing technologies such as whole-genome sequencing.
• The high cost of whole-genome sequencing, together with the challenges inherent in analyzing and interpreting variants detected in noncoding regions, has made whole-exome sequencing (WES) a popular approach in the context of variant detection.
• WES has been applied to the detection of both germline and somatic variants and de novo variants in trios.
• Since WES focuses specifically on the coding regions, exome-enrichment steps are required before the genomic DNA can be subjected to massively parallel sequencing, thereby adding substantially to the total cost of WES.
• To further optimize the cost–effectiveness of variant detection within coding regions, transcriptome or RNA sequencing (RNA-seq) has been proposed as a potential substitute for WES.
• Because the transcriptome from a specific tissue/cell only represents a subset of the exome, only the variants in the expressed genes or transcripts in that tissue can be detected by RNA-seq.
• It is arguable that deleterious somatic mutations, such as nonsense/protein-truncating mutations in expressed genes, are more likely to be functionally important.
• The considerable variability in transcript expression levels also presents a critical challenge to the detection of variants using RNA-seq.
• The choice of which approach to adopt will be dependent on both the research question posed and the original hypothesis.
• Unlike WES, where the major application is variant detection, RNA-seq has other applications, such as the measurement of transcript expression levels and the detection of novel fusion genes.
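
The expression-variability issue listed above can be made concrete with a small calculation. The following Python sketch is illustrative only and is not taken from any of the studies cited in this article: it assumes that reads are distributed uniformly along each transcript, that a transcript's abundance can be expressed as the number of reads per million sequenced reads that derive from it, and that a depth of 20× suffices for variant calling. The function names and all numerical values are hypothetical.

```python
def expected_depth(total_reads, read_length, reads_per_million, transcript_length):
    """Approximate mean per-base depth over one transcript.

    Assumes reads are spread uniformly along the transcript and that
    `reads_per_million` of every million sequenced reads come from it.
    """
    reads_from_transcript = total_reads * reads_per_million / 1_000_000
    return reads_from_transcript * read_length / transcript_length


def reads_needed(min_depth, read_length, reads_per_million, transcript_length):
    """Smallest total read count giving `min_depth` over the transcript.

    All arguments are expected to be integers so that the ceiling
    division below is exact.
    """
    numerator = min_depth * transcript_length * 1_000_000
    denominator = reads_per_million * read_length
    return -(-numerator // denominator)  # ceiling division on integers


# Hypothetical values chosen only for illustration.
READ_LENGTH = 100         # bases per read
TRANSCRIPT_LENGTH = 2000  # bases, taken to be the same for both transcripts
MIN_DEPTH = 20            # depth assumed adequate for variant calling
LOW_RPM = 1               # low-abundance transcript: 1 read per million
HIGH_RPM = 1000           # high-abundance transcript: 1000 reads per million

# Total reads required so that the low-abundance transcript reaches MIN_DEPTH.
total = reads_needed(MIN_DEPTH, READ_LENGTH, LOW_RPM, TRANSCRIPT_LENGTH)
low_depth = expected_depth(total, READ_LENGTH, LOW_RPM, TRANSCRIPT_LENGTH)
high_depth = expected_depth(total, READ_LENGTH, HIGH_RPM, TRANSCRIPT_LENGTH)
print(total, low_depth, high_depth)
```

Under these assumed values, 400 million reads are required for the low-abundance transcript to reach 20×, at which point the high-abundance transcript is covered at 20,000×, that is, 1000-fold more than is needed. Real transcriptomes span a wider and more continuous range of abundances, so the sketch indicates only the direction of the trade-off, not its magnitude in any given experiment.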
References
Papers of special note have been highlighted as:
• of interest
•• of considerable interest

1 Mardis ER. The impact of next-generation sequencing technology on genetics. Trends Genet. 24(3), 133–141 (2008).
2 Shendure J, Ji H. Next-generation DNA sequencing. Nat. Biotechnol. 26(10), 1135–1145 (2008).
3 Wheeler DA, Srinivasan M, Egholm M et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 452(7189), 872–876 (2008).
4 Bentley DR, Balasubramanian S, Swerdlow HP et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456(7218), 53–59 (2008).
5 Wang J, Wang W, Li R et al. The diploid genome sequence of an Asian individual. Nature 456(7218), 60–65 (2008).
6 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467(7319), 1061–1073 (2010).
7 Mardis ER. The $1,000 genome, the $100,000 analysis? Genome Med. 2(11), 84 (2010).
8 Sboner A, Mu XJ, Greenbaum D, Auerbach RK, Gerstein MB. The real cost of sequencing: higher than you think! Genome Biol. 12(8), 125 (2011).
9 Koboldt DC, Ding L, Mardis ER, Wilson RK. Challenges of sequencing human genomes. Brief Bioinform. 11(5), 484–498 (2010).
10 Teer JK, Mullikin JC. Exome sequencing: the sweet spot before whole genomes. Hum. Mol. Genet. 19(R2), R145–R151 (2010).
11 Ng SB, Nickerson DA, Bamshad MJ, Shendure J. Massively parallel sequencing and rare disease. Hum. Mol. Genet. 19(R2), R119–R124 (2010).
12 Majewski J, Schwartzentruber J, Lalonde E, Montpetit A, Jabado N. What can exome sequencing do for you? J. Med. Genet. 48(9), 580–589 (2011).
13 Singleton AB. Exome sequencing: a transformative technology. Lancet Neurol. 10(10), 942–946 (2011).
14 Ng SB, Turner EH, Robertson PD et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461(7261), 272–276 (2009).
   • The first study to demonstrate the feasibility of whole-exome sequencing (WES) to identify known causal mutations underlying a Mendelian disorder.
15 Varela I, Tarpey P, Raine K et al. Exome sequencing identifies frequent mutation of the SWI/SNF complex gene PBRM1 in renal carcinoma. Nature 469(7331), 539–542 (2011).
   • One of the recent studies that have applied WES to identify cancer somatic mutations.
16 Ku CS, Naidoo N, Pawitan Y. Revisiting Mendelian disorders through exome sequencing. Hum. Genet. 129(4), 351–370 (2011).
17 Bamshad MJ, Ng SB, Bigham AW et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat. Rev. Genet. 12(11), 745–755 (2011).
18 Wong KM, Hudson TJ, McPherson JD. Unraveling the genetics of cancer: genome sequencing and beyond. Annu. Rev. Genomics Hum. Genet. 12, 407–430 (2011).
19 Ng SB, Buckingham KJ, Lee C et al. Exome sequencing identifies the cause of a mendelian disorder. Nat. Genet. 42(1), 30–35 (2010).
   • One of the first studies to demonstrate the feasibility of WES to identify new causal mutations and genes for Mendelian disorders with previously unknown genetic etiology.
20 Ng SB, Bigham AW, Buckingham KJ et al. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat. Genet. 42(9), 790–793 (2010).
21 Sathirapongsasuti JF, Lee H, Horst BA et al. Exome sequencing-based copy-number variation and loss of heterozygosity detection: exome CNV. Bioinformatics 27(19), 2648–2654 (2011).
22 Asan NF, Xu Y, Jiang H et al. Comprehensive comparison of three commercial human whole-exome capture platforms. Genome Biol. 12(9), R95 (2011).
23 Clark MJ, Chen R, Lam HY et al. Performance comparison of exome DNA sequencing technologies. Nat. Biotechnol. 29(10), 908–914 (2011).
   • Comprehensive comparison of three major commercial exome sequencing platforms from Agilent, Illumina® and Nimblegen applied to the same human blood sample.
24 Mertes F, Elsharawy A, Sauer S et al. Targeted enrichment of genomic DNA regions for next-generation sequencing. Brief Funct. Genomics 10(6), 374–386 (2011).
25 Harakalova M, Mokry M, Hrdlickova B et al. Multiplexed array-based and in-solution genomic enrichment for flexible and cost-effective targeted next-generation sequencing. Nat. Protoc. 6(12), 1870–1886 (2011).
26 Chepelev I, Wei G, Tang Q, Zhao K. Detection of single nucleotide variations in expressed exons of the human genome using RNA-seq. Nucleic Acids Res. 37(16), e106 (2009).
   •• One of the first studies to apply the RNA sequencing (RNA-seq) technique to identify single nucleotide variants in expressed exons from the human genome.
27 Cirulli ET, Singh A, Shianna KV et al. Screening the human exome: a comparison of whole genome and whole transcriptome sequencing. Genome Biol. 11(5), R57 (2010).
   •• A systematic evaluation of the performance of RNA-seq to identify human coding variants by comparing variants identified through high-coverage whole-genome sequencing to those identified by high-coverage RNA-seq in the same individual.
28 Wang Z, Gerstein M, Snyder M. RNA-seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10(1), 57–63 (2009).
29 Morozova O, Hirst M, Marra MA. Applications of new sequencing technologies for transcriptome analysis. Annu. Rev. Genomics Hum. Genet. 10, 135–151 (2009).
30 Ozsolak F, Milos PM. RNA sequencing: advances, challenges and opportunities. Nat. Rev. Genet. 12(2), 87–98 (2011).
31 Parla JS, Iossifov I, Grabill I, Spector MS, Kramer M, McCombie WR. A comparative analysis of exome capture. Genome Biol. 12(9), R97 (2011).
32 Sulonen AM, Ellonen P, Almusa H et al. Comparison of solution-based exome capture methods for next generation sequencing. Genome Biol. 12(9), R94 (2011).
33 Metzker ML. Sequencing technologies – the next generation. Nat. Rev. Genet. 11(1), 31–46 (2010).
34 Mardis ER. A decade’s perspective on DNA sequencing technology. Nature 470(7333), 198–203 (2011).
35 Meyerson M, Gabriel S, Getz G. Advances in understanding cancer genomes through second-generation sequencing. Nat. Rev. Genet. 11(10), 685–696 (2010).
36 Robison K. Application of second-generation sequencing to cancer genomics. Brief Bioinform. 11(5), 524–534 (2010).
37 Bowers J, Mitchell J, Beer E et al. Virtual terminator nucleotides for next-generation DNA sequencing. Nat. Methods 6(8), 593–595 (2009).
38 Eid J, Fehr A, Gray J et al. Real-time DNA sequencing from single polymerase molecules. Science 323(5910), 133–138 (2009).
39 Schadt EE, Turner S, Kasarskis A. A window into third-generation sequencing. Hum. Mol. Genet. 19(R2), R227–R240 (2010).
40 Li Y, Wang J. Faster human genome sequencing. Nat. Biotechnol. 27(9), 820–821 (2009).
41 Ozsolak F, Platt AR, Jones DR et al. Direct RNA sequencing. Nature 461(7265), 814–818 (2009).
42 Botstein D, Risch N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat. Genet. (Suppl. 33), 228–237 (2003).
43 Choi M, Scholl UI, Ji W et al. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc. Natl Acad. Sci. USA 106(45), 19096–19101 (2009).
   • The first proof-of-principle study to demonstrate the feasibility of using WES in a diagnostic context.
44 Makinen N, Mehine M, Tolvanen J et al. MED12, the mediator complex subunit 12 gene, is mutated at high frequency in uterine leiomyomas. Science 334(6053), 252–255 (2011).
45 Lilljebjorn H, Rissler M, Lassen C et al. Whole-exome sequencing of pediatric acute lymphoblastic leukemia. Leukemia doi:10.1038/leu.333 (2011) (Epub ahead of print).
46 O’Roak BJ, Deriziotis P, Lee C et al. Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat. Genet. 43(6), 585–589 (2011).
   • One of the first studies to demonstrate the feasibility of WES in identifying de novo variants in trios by sequencing the exomes of individuals with a sporadic autism spectrum disorder and their parents.
47 Xu B, Roos JL, Dexheimer P et al. Exome sequencing supports a de novo mutational paradigm for schizophrenia. Nat. Genet. 43(9), 864–868 (2011).
48 Girard SL, Gauthier J, Noreau A et al. Increased exonic de novo mutation rate in individuals with schizophrenia. Nat. Genet. 43(9), 860–863 (2011).
49 Mercer TR, Gerhardt DJ, Dinger ME et al. Targeted RNA sequencing reveals the deep complexity of the human transcriptome. Nat. Biotechnol. 30(1), 99–104 (2011).
50 Hoischen A, Gilissen C, Arts P et al. Massively parallel sequencing of ataxia genes after array-based enrichment. Hum. Mutat. 31(4), 494–499 (2010).
51 Stitziel NO, Kiezun A, Sunyaev S. Computational and statistical approaches to analyzing variants identified by exome sequencing. Genome Biol. 12(9), 227 (2011).
52 Greif PA, Eck SH, Konstandin NP et al. Identification of recurring tumor-specific somatic mutations in acute myeloid leukemia by transcriptome sequencing. Leukemia 25(5), 821–827 (2011).
   •• One of the successful studies of cancer mutations using the RNA-seq approach.
53 Roberts A, Pachter L. RNA-Seq and find: entering the RNA deep field. Genome Med. 3(11), 74 (2011).
54 Heap GA, Yang JH, Downes K et al. Genome-wide analysis of allelic expression imbalance in human primary cells by high-throughput transcriptome resequencing. Hum. Mol. Genet. 19(1), 122–134 (2010).
55 Palacios R, Gazave E, Goni J et al. Allele-specific gene expression is widespread across the genome and biological processes. PLoS One 4(1), e4150 (2009).
56 Sugarbaker DJ, Richards WG, Gordon GJ et al. Transcriptome sequencing of malignant pleural mesothelioma tumors. Proc. Natl Acad. Sci. USA 105(9), 3521–3526 (2008).
57 Shah SP, Morin RD, Khattra J et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature 461(7265), 809–813 (2009).
   •• One of the successful studies of cancer mutations using the RNA-seq approach.
58 Shah SP, Kobel M, Senz J et al. Mutation of FOXL2 in granulosa-cell tumors of the ovary. N. Engl. J. Med. 360(26), 2719–2729 (2009).
59 Levin JZ, Berger MF, Adiconis X et al. Targeted next-generation sequencing of a cancer transcriptome enhances detection of sequence variants and novel fusion transcripts. Genome Biol. 10(10), R115 (2009).
60 Kridel R, Meissner B, Rogic S et al. Whole transcriptome sequencing reveals recurrent NOTCH1 mutations in mantle cell lymphoma. Blood 119(9), 1963–1971 (2012).
61 Majewski J, Wang Z, Lopez I et al. A new ocular phenotype associated with an unexpected but known systemic disorder and mutation: novel use of genomic diagnostics and exome sequencing. J. Med. Genet. 48(9), 593–596 (2011).
62 Cullinane AR, Vilboux T, O’Brien K et al. Homozygosity mapping and whole-exome sequencing to detect SLC45A2 and G6PC3 mutations in a single patient with oculocutaneous albinism and neutropenia. J. Invest. Dermatol. 131(10), 2017–2025 (2011).
63 Montenegro G, Powell E, Huang J et al. Exome sequencing allows for rapid gene identification in a Charcot–Marie–Tooth family. Ann. Neurol. 69(3), 464–470 (2011).
64 Berg JS, Khoury MJ, Evans JP. Deploying whole genome sequencing in clinical practice and public health: meeting the challenge one bin at a time. Genet. Med. 13(6), 499–504 (2011).
65 Bick D, Dimmock D. Whole exome and whole genome sequencing. Curr. Opin. Pediatr. 23(6), 594–600 (2011).
66 Marian AJ. Medical DNA sequencing. Curr. Opin. Cardiol. 26(3), 175–180 (2011).
67 Tabor HK, Berkman BE, Hull SC, Bamshad MJ. Genomics really gets personal: how exome and whole genome sequencing challenge the ethical framework of human genetics research. Am. J. Med. Genet. A 155A(12), 2916–2924 (2011).
68 Jones MA, Bhide S, Chin E et al. Targeted polymerase chain reaction-based enrichment and next generation sequencing for diagnostic testing of congenital disorders of glycosylation. Genet. Med. 13(11), 921–932 (2011).
69 Artuso R, Fallerini C, Dosa L et al. Advances in Alport syndrome diagnosis using next-generation sequencing. Eur. J. Hum. Genet. 20(1), 50–57 (2012).
70 Rothberg JM, Hinz W, Rearick TM et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature 475(7356), 348–352 (2011).
71 Yan XJ, Xu J, Gu ZH et al. Exome sequencing identifies somatic mutations of DNA methyltransferase gene DNMT3A in acute monocytic leukemia. Nat. Genet. 43(4), 309–315 (2011).
72 Gui Y, Guo G, Huang Y et al. Frequent mutations of chromatin remodeling genes in transitional cell carcinoma of the bladder. Nat. Genet. 43(9), 875–878 (2011).
73 Tiacci E, Trifonov V, Schiavoni G et al. BRAF mutations in hairy-cell leukemia. N. Engl. J. Med. 364(24), 2305–2315 (2011).
74 Wei X, Walia V, Lin JC et al. Exome sequencing identifies GRIN2A as frequently mutated in melanoma. Nat. Genet. 43(5), 442–446 (2011).
75 Prensner JR, Iyer MK, Balbin OA et al. Transcriptome sequencing across a prostate cancer cohort identifies PCAT-1, an unannotated lincRNA implicated in disease progression. Nat. Biotechnol. 29(8), 742–749 (2011).
76 Gibb EA, Brown CJ, Lam WL. The functional role of long non-coding RNA in human carcinomas. Mol. Cancer 10, 38 (2011).
77 Prensner JR, Chinnaiyan AM. The emergence of lncRNAs in cancer biology. Cancer Discovery 1(5), 391–407 (2011).

Websites
101 TruSeq Exome Enrichment Kit. www.illumina.com/products/truseq_exome_enrichment_kit.ilmn
102 MiSeq Personal Sequencer. www.illumina.com/systems/miseq.ilmn