RNA Modifications Methods and Protocols

The document is a comprehensive guide on RNA modifications, detailing methodologies and protocols for detecting, quantifying, and analyzing various types of RNA modifications. It highlights the significance of post-transcriptional RNA modifications in gene expression and physiological processes, emphasizing the need for accurate identification and study of these modifications. The book serves researchers in epitranscriptomics and post-transcriptional gene regulation, providing tools applicable across diverse organisms.

Methods in Molecular Biology 2298

Mary McMahon, Editor

RNA Modifications
Methods and Protocols

METHODS IN MOLECULAR BIOLOGY

Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, UK

For further volumes:


[Link]
For over 35 years, biological scientists have come to rely on the research protocols and
methodologies in the critically acclaimed Methods in Molecular Biology series. The series was
the first to introduce the step-by-step protocols approach that has become the standard in all
biomedical protocol publishing. Each protocol is provided in readily-reproducible step-by-
step fashion, opening with an introductory overview, a list of the materials and reagents
needed to complete the experiment, and followed by a detailed procedure that is supported
with a helpful notes section offering tips and tricks of the trade as well as troubleshooting
advice. These hallmark features were introduced by series editor Dr. John Walker and
constitute the key ingredient in each and every volume of the Methods in Molecular Biology
series. Tested and trusted, comprehensive and reliable, all protocols from the series are
indexed in PubMed.
RNA Modifications

Methods and Protocols

Edited by

Mary McMahon
Department of Urology, University of California, San Francisco, San Francisco, CA, USA

ISSN 1064-3745 ISSN 1940-6029 (electronic)
Methods in Molecular Biology
ISBN 978-1-0716-1373-3 ISBN 978-1-0716-1374-0 (eBook)
[Link]
© Springer Science+Business Media, LLC, part of Springer Nature 2021
The chapter 20 is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://
[Link]/licenses/by/4.0/). For further details see license information in the chapter.
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction
on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations
and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to
be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty,
expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been
made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer
Nature.
The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
Preface

Post-transcriptional RNA modifications offer a wealth of chemical and functional diversity
to RNA and are emerging as key regulators of gene expression and function, as well as
governors of important physiological processes. Collectively referred to as the
epitranscriptome, over 150 distinct types of modifications have been identified on diverse
classes of both coding and non-coding RNA. Advances in methodologies to map RNA
modifications have significantly propelled the discovery of new types of modifications,
especially within messenger RNA, as well as the identification of new substrates of seemingly
well-characterized RNA-modifying enzymes. Moreover, a new wave of research has supported
the discovery of novel biological functions of modified residues and of “writers,” “readers,”
and “erasers” of various modifications, alterations of which are becoming increasingly linked
to disease pathologies. As the need to better understand RNA modifications is increasing, so
too is the need to accurately identify, measure, and study modified residues. This book
describes some of the most recent advances and up-to-date methodologies to detect, quantify,
analyze, and elucidate the biological function of different types of RNA modifications.
Importantly, the methodologies and tools described herein can be applied to a wide variety
of organisms and can be used to address biological and clinical questions. We hope these
methods and protocols will serve those working directly in the fields of epitranscriptomics
and post-transcriptional gene regulation, as well as scientists and clinicians interested in
bioinformatic tools to study RNA modifications and techniques to dissect their roles in
physiology and disease.

San Francisco, CA, USA Mary McMahon

Contents

Preface
Contributors

PART I BIOINFORMATIC TOOLS TO STUDY RNA MODIFICATIONS

1 RNA Post-Transcriptional Modification Mapping Data Analysis Using RNA Framework
Ilaria Manfredonia and Danny Incarnato
2 An Informatics Pipeline for Profiling and Annotating RNA Modifications
Qi Liu, Xiaoqiang Lang, and Richard I. Gregory

PART II DETECTING RNA MODIFICATIONS USING NANOPORE DIRECT RNA SEQUENCING

3 EpiNano: Detection of m6A RNA Modifications Using Oxford Nanopore Direct RNA Sequencing
Huanle Liu, Oguzhan Begik, and Eva Maria Novoa
4 Adaptation of Human Ribosomal RNA for Nanopore Sequencing of Canonical and Modified Nucleotides
Miten Jain, Hugh E. Olsen, Mark Akeson, and Robin Abu-Shumays

PART III NEXT-GENERATION SEQUENCING APPROACHES TO DETECT AND CAPTURE MODIFIED RNAS

5 AlkAniline-Seq: A Highly Sensitive and Specific Method for Simultaneous Mapping of 7-Methyl-guanosine (m7G) and 3-Methyl-cytosine (m3C) in RNAs by High-Throughput Sequencing
Virginie Marchand, Lilia Ayadi, Valérie Bourguignon-Igel, Mark Helm, and Yuri Motorin
6 Transcriptome-Wide Detection of Internal N7-Methylguanosine
Li-Sheng Zhang, Chang Liu, and Chuan He
7 miCLIP-MaPseq Identifies Substrates of Radical SAM RNA-Methylating Enzyme Using Mechanistic Cross-Linking and Mismatch Profiling
Vanja Stojković, David E. Weinberg, and Danica Galonić Fujimori
8 Mapping RNA Modifications Using Photo-Crosslinking-Assisted Modification Sequencing
Bryan R. Cullen and Kevin Tsai
9 Quantitative and Single-Nucleotide Resolution Profiling of RNA 5-Methylcytosine
Jun Li, Xingyu Wu, Trung Do, Vy Nguyen, Jing Zhao, Pei Qin Ng, Alice Burgess, Rakesh David, and Iain Searle
10 A Small RNA-Seq Protocol with Less Bias and Improved Capture of 2′-O-Methyl RNAs
Erwin L. van Dijk and Claude Thermes

PART IV ASSESSING RNA MODIFICATIONS USING QPCR- AND MOLECULAR BIOLOGY-BASED METHODS

11 Assessing 2′-O-Methylation of mRNA Using Quantitative PCR
Brittany A. Elliott and Christopher L. Holley
12 Relative Quantification of Residue-Specific m6A RNA Methylation Using m6A-RT-QPCR
Ane Olazagoitia-Garmendia and Ainara Castellanos-Rubio
13 Monitoring the 5-Methoxycarbonylmethyl-2-Thiouridine (mcm5s2U) Modification Utilizing the Gamma-Toxin Endonuclease
Jenna M. Lentini and Dragony Fu
14 Analysis of Queuosine tRNA Modification Using APB Northern Blot Assay
Cansu Cirzi and Francesca Tuorto
15 Detecting ADP-Ribosylation in RNA
Deeksha Munnur and Ivan Ahel

PART V MASS SPECTROMETRY- AND NMR-BASED METHODS FOR RNA MODIFICATIONS ANALYSIS

16 Detecting Internal N7-Methylguanosine mRNA Modifications by Differential Enzymatic Digestion Coupled with Mass Spectrometry Analysis
Xue-Jiao You and Bi-Feng Yuan
17 A General LC-MS-Based Method for Direct and De Novo Sequencing of RNA Mixtures Containing both Canonical and Modified Nucleotides
Ning Zhang, Shundi Shi, Xiaohong Yuan, Wenhao Ni, Xuanting Wang, Barney Yoo, Tony Z. Jia, Wenjia Li, and Shenglong Zhang
18 Quantification of Modified Nucleosides in the Context of NAIL-MS
Matthias Heiss, Kayla Borland, Yasemin Yoluç, and Stefanie Kellner
19 A Method to Monitor the Introduction of Posttranscriptional Modifications in tRNAs with NMR Spectroscopy
Alexandre Gato, Marjorie Catala, Carine Tisné, and Pierre Barraud

PART VI APPROACHES TO ASSESS KINETICS, DETERMINANTS, AND FUNCTIONS OF RNA MODIFICATIONS

20 Effects of mRNA Modifications on Translation: An Overview
Bijoyita Roy
21 Assaying the Molecular Determinants and Kinetics of RNA Pseudouridylation by H/ACA snoRNPs and Stand-Alone Pseudouridine Synthases
Dominic P. Czekay, Sarah K. Schultz, and Ute Kothe
22 Investigating Pseudouridylation Mechanisms by High-Throughput in Vitro RNA Pseudouridylation and Sequencing
Nicole M. Martinez and Wendy V. Gilbert
23 Targeted RNA m6A Editing Using Engineered CRISPR-Cas9 Conjugates
Xiao-Min Liu and Shu-Bing Qian

Index
Contributors

ROBIN ABU-SHUMAYS • Biomolecular Engineering Department and Genomics Institute,
University of California, Santa Cruz, CA, USA
IVAN AHEL • Sir William Dunn School of Pathology, University of Oxford, Oxford, UK
MARK AKESON • Biomolecular Engineering Department and Genomics Institute, University
of California, Santa Cruz, CA, USA
LILIA AYADI • Université de Lorraine, CNRS, INSERM, EpiRNASeq Core Facility,
UMS2008/US40 IBSLor, Nancy, France; Université de Lorraine, CNRS, UMR7365
IMoPA, Nancy, France
PIERRE BARRAUD • Expression génétique microbienne, UMR 8261, CNRS, Université de
Paris, Institut de biologie physico-chimique (IBPC), Paris, France
OGUZHAN BEGIK • Centre for Genomic Regulation (CRG), The Barcelona Institute of
Science and Technology, Barcelona, Spain; Department of Neuroscience, Garvan Institute
of Medical Research, Darlinghurst, NSW, Australia; St. Vincent’s Clinical School, UNSW
Sydney, Darlinghurst, NSW, Australia
KAYLA BORLAND • Department of Chemistry, Ludwig Maximilians University Munich,
Munich, Germany
VALÉRIE BOURGUIGNON-IGEL • Université de Lorraine, CNRS, INSERM, EpiRNASeq Core
Facility, UMS2008/US40 IBSLor, Nancy, France; Université de Lorraine, CNRS,
UMR7365 IMoPA, Nancy, France
ALICE BURGESS • Department of Molecular and Biomedical Sciences, School of Biological
Sciences, The University of Adelaide, Adelaide, SA, Australia
AINARA CASTELLANOS-RUBIO • Department of Genetics, Physical Anthropology and Animal
Physiology, University of the Basque Country (UPV-EHU), Leioa, Spain; Biocruces Bizkaia
Health Research Institute, Barakaldo, Spain; Ikerbasque, Basque Foundation for Science,
Bilbao, Spain; CIBER de Diabetes y Enfermedades Metabolicas Asociadas (CIBERDEM),
Instituto de Salud Carlos III, Madrid, Spain
MARJORIE CATALA • Expression génétique microbienne, UMR 8261, CNRS, Université de
Paris, Institut de biologie physico-chimique (IBPC), Paris, France
CANSU CIRZI • Division of Epigenetics, German Cancer Research Center (DKFZ),
Heidelberg, Germany; Faculty of Biosciences, University of Heidelberg, Heidelberg,
Germany
BRYAN R. CULLEN • Department of Molecular Genetics and Microbiology, Duke University
Medical Center, Durham, NC, USA
DOMINIC P. CZEKAY • Department of Chemistry and Biochemistry, Alberta RNA Research
and Training Institute, University of Lethbridge, Lethbridge, AB, Canada
RAKESH DAVID • Department of Molecular and Biomedical Sciences, School of Biological
Sciences, The University of Adelaide, Adelaide, SA, Australia
TRUNG DO • Department of Molecular and Biomedical Sciences, School of Biological Sciences,
The University of Adelaide, Adelaide, SA, Australia
BRITTANY A. ELLIOTT • Department of Medicine, Duke University Medical Center, Durham,
NC, USA
DRAGONY FU • Department of Biology, Center for RNA Biology, University of Rochester,
Rochester, NY, USA

DANICA GALONIĆ FUJIMORI • Department of Cellular and Molecular Pharmacology,
University of California San Francisco, San Francisco, CA, USA; Department of
Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA,
USA
ALEXANDRE GATO • Expression génétique microbienne, UMR 8261, CNRS, Université de
Paris, Institut de biologie physico-chimique (IBPC), Paris, France
WENDY V. GILBERT • Department of Molecular Biophysics and Biochemistry, Yale University,
New Haven, CT, USA
RICHARD I. GREGORY • Stem Cell Program, Division of Hematology/Oncology, Boston
Children’s Hospital, Boston, MA, USA; Department of Biological Chemistry and
Molecular Pharmacology, Harvard Medical School, Boston, MA, USA; Department of
Pediatrics, Harvard Medical School, Boston, MA, USA; Harvard Initiative for RNA
Medicine, Boston, MA, USA; Harvard Stem Cell Institute, Cambridge, MA, USA
CHUAN HE • Department of Chemistry, The University of Chicago, Chicago, IL, USA;
Howard Hughes Medical Institute, The University of Chicago, Chicago, IL, USA;
Department of Biochemistry and Molecular Biology, Institute for Biophysical Dynamics, The
University of Chicago, Chicago, IL, USA
MATTHIAS HEISS • Department of Chemistry, Ludwig Maximilians University Munich,
Munich, Germany
MARK HELM • Institute of Pharmacy and Biochemistry, Johannes Gutenberg University
Mainz, Mainz, Germany
CHRISTOPHER L. HOLLEY • Department of Medicine, Duke University Medical Center,
Durham, NC, USA
DANNY INCARNATO • Department of Molecular Genetics, Groningen Biomolecular Sciences
and Biotechnology Institute (GBB), University of Groningen, Groningen, Netherlands
MITEN JAIN • Biomolecular Engineering Department and Genomics Institute, University of
California, Santa Cruz, CA, USA
TONY Z. JIA • Earth-Life Science Institute, Tokyo Institute of Technology, Meguro-ku, Tokyo,
Japan; Blue Marble Space Institute of Science, Seattle, WA, USA
STEFANIE KELLNER • Department of Chemistry, Ludwig Maximilians University Munich,
Munich, Germany
UTE KOTHE • Department of Chemistry and Biochemistry, Alberta RNA Research and
Training Institute, University of Lethbridge, Lethbridge, AB, Canada
XIAOQIANG LANG • Precision Medicine Research Center, West China Hospital, Sichuan
University, Chengdu, Sichuan, China
JENNA M. LENTINI • Department of Biology, Center for RNA Biology, University of Rochester,
Rochester, NY, USA
JUN LI • Department of Molecular and Biomedical Sciences, School of Biological Sciences, The
University of Adelaide, Adelaide, SA, Australia
WENJIA LI • Department of Computer Science, New York Institute of Technology, New York,
NY, USA
CHANG LIU • Department of Chemistry, The University of Chicago, Chicago, IL, USA;
Howard Hughes Medical Institute, The University of Chicago, Chicago, IL, USA
HUANLE LIU • Centre for Genomic Regulation (CRG), The Barcelona Institute of Science
and Technology, Barcelona, Spain
QI LIU • Stem Cell Program, Division of Hematology/Oncology, Boston Children’s Hospital,
Boston, MA, USA; Department of Biological Chemistry and Molecular Pharmacology,
Harvard Medical School, Boston, MA, USA
XIAO-MIN LIU • School of Life Science and Technology, China Pharmaceutical University,
Nanjing, China
ILARIA MANFREDONIA • Department of Molecular Genetics, Groningen Biomolecular Sciences
and Biotechnology Institute (GBB), University of Groningen, Groningen, The Netherlands
VIRGINIE MARCHAND • Université de Lorraine, CNRS, INSERM, EpiRNASeq Core Facility,
UMS2008/US40 IBSLor, Nancy, France
NICOLE M. MARTINEZ • Department of Molecular Biophysics and Biochemistry, Yale
University, New Haven, CT, USA
YURI MOTORIN • Université de Lorraine, CNRS, INSERM, EpiRNASeq Core Facility,
UMS2008/US40 IBSLor, Nancy, France; Université de Lorraine, CNRS, UMR7365
IMoPA, Nancy, France
DEEKSHA MUNNUR • Sir William Dunn School of Pathology, University of Oxford, Oxford,
UK
PEI QIN NG • Department of Molecular and Biomedical Sciences, School of Biological
Sciences, The University of Adelaide, Adelaide, SA, Australia
VY NGUYEN • Department of Molecular and Biomedical Sciences, School of Biological
Sciences, The University of Adelaide, Adelaide, SA, Australia
WENHAO NI • Department of Biological and Chemical Sciences, New York Institute of
Technology, New York, NY, USA
EVA MARIA NOVOA • Centre for Genomic Regulation (CRG), The Barcelona Institute of
Science and Technology, Barcelona, Spain; Department of Neuroscience, Garvan Institute
of Medical Research, Darlinghurst, NSW, Australia; St. Vincent’s Clinical School, UNSW
Sydney, Darlinghurst, NSW, Australia; Universitat Pompeu Fabra (UPF), Barcelona,
Spain
ANE OLAZAGOITIA-GARMENDIA • Department of Genetics, Physical Anthropology and Animal
Physiology, University of the Basque Country (UPV-EHU), Leioa, Spain; BioCruces
Bizkaia Health Research Institute, Barakaldo, Spain
HUGH E. OLSEN • Biomolecular Engineering Department and Genomics Institute,
University of California, Santa Cruz, CA, USA
SHU-BING QIAN • Division of Nutritional Sciences, Cornell University, Ithaca, NY, USA
BIJOYITA ROY • RNA and Genome Editing, New England Biolabs Inc., Ipswich, MA, USA
SARAH K. SCHULTZ • Department of Chemistry and Biochemistry, Alberta RNA Research
and Training Institute, University of Lethbridge, Lethbridge, AB, Canada
IAIN SEARLE • Department of Molecular and Biomedical Sciences, School of Biological
Sciences, The University of Adelaide, Adelaide, SA, Australia
SHUNDI SHI • Department of Chemical Engineering, Columbia University, New York, NY,
USA
VANJA STOJKOVIĆ • Department of Cellular and Molecular Pharmacology, University of
California San Francisco, San Francisco, CA, USA
CLAUDE THERMES • Institute for Integrative Biology of the Cell, UMR9198, CNRS CEA
Univ Paris-Sud, Université Paris-Saclay, Gif sur Yvette Cedex, France
CARINE TISNÉ • Expression génétique microbienne, UMR 8261, CNRS, Université de Paris,
Institut de biologie physico-chimique (IBPC), Paris, France
KEVIN TSAI • Department of Molecular Genetics and Microbiology, Duke University Medical
Center, Durham, NC, USA; Institute of Biomedical Sciences, Academia Sinica, Taipei,
Taiwan
FRANCESCA TUORTO • Division of Biochemistry, Mannheim Institute for
Innate Immunoscience (MI3), Medical Faculty Mannheim, Heidelberg University,
Mannheim, Germany; Center for Molecular Biology of Heidelberg University (ZMBH),
Mannheim, Germany
ERWIN L. VAN DIJK • Institute for Integrative Biology of the Cell, UMR9198, CNRS CEA
Univ Paris-Sud, Université Paris-Saclay, Gif sur Yvette Cedex, France
XUANTING WANG • Department of Chemical Engineering, Columbia University, New York,
NY, USA
DAVID E. WEINBERG • Department of Cellular and Molecular Pharmacology, University of
California San Francisco, San Francisco, CA, USA
XINGYU WU • Department of Molecular and Biomedical Sciences, School of Biological
Sciences, The University of Adelaide, Adelaide, SA, Australia
YASEMIN YOLUÇ • Department of Chemistry, Ludwig Maximilians University Munich,
Munich, Germany
BARNEY YOO • Department of Chemistry, Hunter College, City University of New York, New
York, NY, USA
XUE-JIAO YOU • Sauvage Center for Molecular Sciences, Department of Chemistry, Wuhan
University, Wuhan, China
BI-FENG YUAN • Sauvage Center for Molecular Sciences, Department of Chemistry, Wuhan
University, Wuhan, China
XIAOHONG YUAN • Department of Biological and Chemical Sciences, New York Institute of
Technology, New York, NY, USA
LI-SHENG ZHANG • Department of Chemistry, The University of Chicago, Chicago, IL, USA;
Howard Hughes Medical Institute, The University of Chicago, Chicago, IL, USA
NING ZHANG • Department of Biological and Chemical Sciences, New York Institute of
Technology, New York, NY, USA; Department of Chemical Engineering, Columbia
University, New York, NY, USA
SHENGLONG ZHANG • Department of Biological and Chemical Sciences, New York Institute of
Technology, New York, NY, USA
JING ZHAO • Department of Molecular and Biomedical Sciences, School of Biological Sciences,
The University of Adelaide, Adelaide, SA, Australia
Part I

Bioinformatic Tools to Study RNA Modifications


Chapter 1

RNA Post-Transcriptional Modification Mapping Data Analysis Using RNA Framework
Ilaria Manfredonia and Danny Incarnato

Abstract
RNA post-transcriptional modifications (PTMs) are progressively gaining relevance in the study of
coding-independent functions of RNA. RNA PTMs act as dynamic regulators of several aspects of RNA
physiology, from translation to half-life. Rising interest is supported by the advance of high-throughput
techniques enabling the detection of these modifications on a transcriptome-wide scale. To this end, here
we illustrate the usefulness of RNA Framework, a comprehensive toolkit for the analysis of RNA PTM
mapping experiments, by reanalyzing two published transcriptome-scale datasets of N1-methyladenosine
(m1A) and pseudouridine (Ψ) mapping, based on two different experimental strategies.

Key words RNA post-transcriptional modifications, RNA immunoprecipitation, High-throughput
sequencing, m1A, N1-methyladenosine, Pseudouridine

1 Introduction

Although the discovery of the very first RNA post-transcriptional
modification (PTM) dates back to 1951 [1] and, since then, over
150 PTMs have been identified [2], the study of RNA PTMs
remained elusive for over 50 years. This was largely due to a lack
of suitable methods for detection of RNA PTMs, especially on
low-abundance transcripts. Thanks to the advent of high-throughput
sequencing methods, it is now possible to identify, localize, and
contextualize RNA PTMs. Over the past decade, RNA PTMs have
been implicated in modulating RNA structure (by affecting the
formation of both intra- and intermolecular interactions), half-life,
translation, protein binding, and many other aspects of RNA
physiology [3].
Currently available sequencing-based techniques for mapping
RNA PTMs can be broadly divided into two major categories. The
first category involves techniques that rely on the use of a specific
antibody to enrich for PTM-containing RNA fragments, dubbed
RNA immunoprecipitation (RIP) approaches. The second
category, instead, includes techniques based on the exploitation of
individual chemical properties of the modified nucleobases, with
the aim of achieving single-base resolution mapping.

Mary McMahon (ed.), RNA Modifications: Methods and Protocols, Methods in Molecular Biology, vol. 2298,
[Link] © Springer Science+Business Media, LLC, part of Springer Nature 2021

A major bottleneck of these experiments is represented by data
analysis. To this end, we have recently introduced RNA Framework
as a generalized toolkit for the analysis of NGS-based RNA PTM
mapping experiments [4]. Herein we guide the reader through the
steps necessary for the successful analysis of data derived from both
IP-based and single-base resolution experiments, by exploiting two
previously published datasets.

2 Materials

2.1 RNA Framework

RNA Framework (obtainable from our Git repository: https://[Link]/dincarnato/RNAFramework)
is implemented in Perl. It requires Perl v5.12 (or greater), with ithreads support and
a 64-bit architecture system running Linux or any other UNIX-based OS.
The following software and packages are also required:
1. Bowtie v1.1.2 or greater ([Link]
[Link]), and/or Bowtie v2.2.7 or greater ([Link]
[Link]/bowtie2/[Link]).
2. SAMTools v1.2 or greater ([Link]
3. BEDTools v2.0 or greater ([Link]
bedtools2/).
4. Cutadapt v2.1 or greater ([Link]
stable/[Link]).
5. ViennaRNA Package v2.2.0 or greater ([Link]
[Link]/RNA/).
6. Perl non-CORE modules ([Link]
– DBD::mysql.
– RNA (installed by the ViennaRNA package).
– XML::LibXML.
– Config::Simple.

To start using RNA Framework, first clone it from our GitHub
repository by typing:

$ git clone [Link]

This will create the “RNAFramework” folder. RNA Framework
executables can then be simply added to the user’s PATH by typing:

$ export PATH=$PATH:$(pwd)/RNAFramework
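
Before moving on, it can be useful to confirm that the required programs are actually reachable from the shell. The following is a minimal sketch and is not part of RNA Framework: the helper name missing_tools is hypothetical, and the executable names passed to it (bowtie, bowtie2, samtools, bedtools, cutadapt, RNAfold) are assumptions about how the packages listed above are typically installed. The Perl modules are not checked here.

```shell
# Hypothetical helper (not part of RNA Framework): print the name of every
# tool given as an argument that cannot be found on the current PATH.
missing_tools() {
    for tool in "$@"; do
        # "command -v" prints nothing and returns non-zero when a tool is absent
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names are assumptions; adjust them to the local installation.
missing_tools perl bowtie bowtie2 samtools bedtools cutadapt RNAfold rf-index rf-map
```

An empty output means that every listed executable was found. Any name that is printed points to a package that still needs to be installed or added to PATH.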

Table 1
List of SRA accession IDs for the datasets used in this chapter

Accession ID    Description                         Reference
SRR2086044      m1A-seq (H. sapiens, polyA+)        [5]
SRR2086045      Input (H. sapiens, polyA+)          [5]
SRR1327248      CMC (S. cerevisiae, total RNA)      [6]
SRR1327249      CMC+ (S. cerevisiae, total RNA)     [6]

2.2 Data Retrieval

To illustrate a typical data analysis workflow with RNA Framework,
we chose two published datasets. The first one is m1A-seq [5], a
RIP approach applied to map N1-methyladenosine (m1A) in
H. sapiens polyadenylated transcripts. The second dataset is
Pseudo-Seq [6], a single-base resolution method based on the
ability of N-cyclohexyl-N′-(2-morpholinoethyl)carbodiimide
metho-p-toluenesulfonate (CMC) to generate an alkali-resistant
adduct with the N3 of pseudouridine (Ψ) residues. The adduct results in
reverse transcription (RT) drop-off, which can be used to
map Ψ in S. cerevisiae ribosomal RNAs (rRNAs).
1. Data can be obtained from the NCBI Sequence Read Archive
([Link] Using the NCBI SRA
Toolkit ([Link]
toolkitsoft/) it is possible to obtain the raw FASTQ files by
typing:

$ fastq-dump -A <SRA accession ID>

2. SRA accession IDs used in this chapter are reported in Table 1.
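
The retrieval step above can be repeated for all four datasets with a short loop. This is only a sketch under stated assumptions: the function name fetch_fastq and the DRY_RUN variable are hypothetical conveniences, while the accession IDs are those listed in Table 1 and the fastq-dump call is the one shown above. By default the function only prints the commands, so that nothing is downloaded by accident.

```shell
# Accession IDs taken from Table 1.
ACCESSIONS="SRR2086044 SRR2086045 SRR1327248 SRR1327249"

# Hypothetical wrapper: print (DRY_RUN=1, the default) or run one
# fastq-dump call per accession ID passed as an argument.
fetch_fastq() {
    for acc in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf 'fastq-dump -A %s\n' "$acc"
        else
            fastq-dump -A "$acc"
        fi
    done
}

# Word splitting of $ACCESSIONS is intentional here.
fetch_fastq $ACCESSIONS
```

Setting DRY_RUN=0 before calling fetch_fastq performs the actual downloads, provided the NCBI SRA Toolkit is installed.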

3 Methods

For a detailed list of all program-specific parameters, all tools can be
invoked with the “-h” (or “--help”) flag. Also, detailed documentation
of RNA Framework and its components can be found online
at [Link]
Bowtie (either version 1 or 2) is the default read aligner used by
RNA Framework [7, 8]. Alternatively, the user can employ any
other read alignment tool, as long as it is able to report alignments
in SAM/BAM format. In this case, the next two paragraphs can be
skipped and it is possible to proceed directly to Subheading 3.3.

3.1 Generation of the Reference Index

1. Data derived from any of the aforementioned PTM mapping
techniques (IP-based and single-base resolution) can be
mapped to the reference transcriptome using Bowtie v1, a fast
ungapped read aligner. Bowtie reference transcriptome index
generation is automated through the rf-index module.
Besides including a set of pre-built reference indexes (displayable
with the “-lp” flag and downloadable through the “-pb
<n>” parameter, where <n> is the code identifying the
desired index), rf-index further allows the creation of tailored
reference indexes. For this purpose, it relies on querying the
UCSC genome database ([Link] for a user-specified
genome assembly and gene annotation. A complete
list of genome assemblies is available at https://[Link]/FAQ/[Link].
2. For the purpose of this chapter, we are going to need two
reference indexes: one on H. sapiens RefSeq protein-coding
transcripts and one on S. cerevisiae rRNAs. To obtain the
pre-built indexes, type:

$ rf-index -pb 6 # H. sapiens transcriptome
$ rf-index -pb 3 # S. cerevisiae rRNAs

3. Alternatively, the user can generate a new reference transcriptome
index using the desired genome assembly and gene annotation,
for example:

$ rf-index -g hg38 -a refGene -n -co # H. sapiens transcriptome

In this example, the reference index will be created using the
hg38 assembly and the refGene (RefSeqGene) annotation. The
“-n” flag instructs rf-index to use gene symbols instead of gene
IDs (where possible). If multiple transcript isoforms are available,
only the longest will be picked as representative of the gene. The
“-co” flag instead allows selecting only protein-coding transcripts.
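
The "longest isoform per gene" rule described above can be illustrated with standard UNIX tools. This is a sketch of the selection rule only, not a description of how rf-index implements it: the function name longest_per_gene, the three-column tab-separated layout (gene symbol, transcript ID, length), and the example identifiers are all hypothetical.

```shell
# Hypothetical input: tab-separated lines of gene symbol, transcript ID, length.
longest_per_gene() {
    # Sort by gene, then by length in descending order, and keep only the
    # first (i.e., longest) transcript encountered for each gene.
    sort -t "$(printf '\t')" -k1,1 -k3,3nr | awk -F '\t' '!seen[$1]++'
}

# Made-up example records, for illustration only.
printf 'GENE1\tNM_A\t1200\nGENE1\tNM_B\t3400\nGENE2\tNM_C\t900\n' | longest_per_gene
```

In this example, NM_B is kept for GENE1 because it is the longer of the two isoforms, while GENE2 keeps its only transcript.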
It is worth noting that, more recently, approaches for single-base
resolution mapping of PTMs based on reverse transcription
(RT) conditions favoring the read-through on sites of PTM-adduct
formation have been proposed [9, 10]. Analogously to mutational
profiling strategies for RNA structure mapping, these methods
result in the recording of read-through sites as mutations or
insertions and deletions (indels) in the resulting cDNA molecules. As
Bowtie v1 is not able to handle indels, it is advisable to use Bowtie
v2 for the analysis of these experiments. This can be easily done by
invoking both rf-index and rf-map with the “-b2” (or “--bowtie2”)
flag. However, we are not going to focus on the analysis of these
types of experiments in this chapter.

3.2 Mapping of Reads

Following index generation, it is possible to proceed to read
mapping using the rf-map module. All the necessary read preprocessing
steps, including adaptor clipping and low-quality base
trimming, are automated as part of this tool. As a result, a sorted
BAM file will be generated for each sample being analyzed. Before
proceeding with read mapping, it is also advisable to check base
qualities using the FastQC tool ([Link]
[Link]/projects/fastqc/).

3.2.1 Mapping of m1A-Seq Reads

To map m1A-seq data, simply type:

$ rf-map -ca3 GATCGGAAGAGCACACGTCTGA -cq5 20 -bm 7 -ba \
    -bi Hsapiens_refGene_longest_bt2/reference -o rf_map_m1Aseq/ \
    SRR208604*.fastq

Through the “-ca3” parameter it is possible to specify the
sequence of the 3′ adaptor to be clipped (in this case corresponding
to the Illumina TruSeq adaptor). Quality trimming (Phred <20) is
by default performed only from the 3′ end (controlled through the
“-cq3” parameter). The “-cq5” parameter enables the quality
trimming also from the 5′ end. The “-bm” parameter allows specifying
the maximum number of equally-scoring mappings allowed
for each read. If this number is exceeded, the read is discarded. The
default behavior of Bowtie v1, when multiple equally scoring
mapping positions are identified for a read, is to stochastically
report only one of them. Use of the “-ba” flag instructs Bowtie to
report all the possible mapping positions in the output BAM file.
The path to the reference index is provided through the “-bi”
parameter. As the folder generated by rf-index contains multiple
files, only the common prefix needs to be specified here. The “-o”
parameter is used instead to specify the path to the output directory
(if it does not exist, it will be created).
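The handling of multi-mapping reads described above can be illustrated with a short Python sketch. This is only a toy model of the filtering logic, not the Bowtie or rf-map implementation: the function name filter_multimappers, its arguments, and the read and transcript identifiers are all hypothetical.

```python
import random


def filter_multimappers(mappings, max_hits=1, report_all=False, seed=0):
    """Toy model of multi-mapping read handling (hypothetical helper).

    mappings: dict of read ID -> list of equally-scoring (transcript, position) hits.
    max_hits: reads with more equally-scoring hits than this are discarded.
    report_all: if True keep every hit, otherwise keep one hit chosen at random.
    """
    rng = random.Random(seed)
    kept = {}
    for read_id, hits in mappings.items():
        if len(hits) > max_hits:
            continue  # too many equally good positions: the read is discarded
        if report_all:
            kept[read_id] = list(hits)
        else:
            kept[read_id] = [rng.choice(hits)]
    return kept


# Hypothetical example: read3 has 8 equally-scoring hits and exceeds max_hits=7
example = {
    "read1": [("TX_A", 100)],
    "read2": [("TX_A", 5), ("TX_B", 40), ("TX_C", 7)],
    "read3": [("TX_%d" % i, 10) for i in range(8)],
}

for read_id, hits in filter_multimappers(example, max_hits=7, report_all=True).items():
    print(read_id, len(hits))
```

In this sketch, max_hits plays the role of “-bm” and report_all the role of “-ba”; with report_all left at False, a single position is picked at random for each retained read.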

3.2.2 Mapping of Pseudo-Seq Reads

To map Pseudo-Seq data, simply type:

$ rf-map -ca3 TGGAATTCTCGGGTGCCAAGG -bi Scerevisiae_rRNA_bt/reference \
    -o rf_map_pseudoseq/ SRR132724*.fastq

In this case, most parameters are left as default. Particularly, as
these reads come from a total RNA sample, multi-mapping reads
are not expected; therefore, the “-bm” parameter is left with its
default value of 1. Most importantly the “-cq5” parameter has been
omitted. It is essential to remember that this parameter must be
avoided when mapping experiments relying on the detection of RT
drop-off events (e.g., Pseudo-Seq/Ψ-seq, 2′OMe-seq; see Note 1).
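Why dynamic 5′ quality trimming interferes with RT drop-off detection can be seen in a minimal Python sketch. It assumes Phred+33 encoded qualities and a deliberately simplified rule (leading bases are removed until the first base with quality of at least 20); the trimming algorithm actually used by rf-map may differ, and the function names and reads below are hypothetical.

```python
def phred33_scores(quality_string):
    """Decode a Phred+33 encoded quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def dynamic_trim_5p(seq, qual, min_q=20):
    """Remove leading bases whose quality is below min_q (simplified rule)."""
    scores = phred33_scores(qual)
    n = 0
    while n < len(scores) and scores[n] < min_q:
        n += 1
    return seq[n:], qual[n:], n


# Two hypothetical reads with the same sequence, hence the same true 5' end.
# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
reads = [("ACGTACGT", "IIIIIIII"), ("ACGTACGT", "##IIIIII")]
for seq, qual in reads:
    trimmed_seq, _, removed = dynamic_trim_5p(seq, qual)
    print(trimmed_seq, "bases removed from the 5' end:", removed)
```

The two reads lose a different number of 5′ bases, so after mapping their 5′ ends no longer coincide, and the position of the inferred RT stop becomes read dependent.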
8 Ilaria Manfredonia and Danny Incarnato

3.3 Counting Reads

1. BAM files generated by rf-map (or any other alignment tool)
are processed by the rf-count module, the core of RNA Frame-
work. This tool is able to calculate the number of RT stops or
mutations (where needed) and the read coverage for each base
of the analyzed transcripts. In our case scenario, m1A-seq sam-
ples are processed by:

$ rf-count -co -r -f Hsapiens_refGene_longest_bt2/[Link] \
    -o rf_count_m1Aseq/ rf_map_m1Aseq/SRR208604*.bam

The analysis of RIP experiments relies on the detection of
regions of local read enrichment (peaks); therefore, the only
relevant information is the per-base read coverage. As the
default behavior of rf-count is to count also RT drop-off events
(more computationally intensive), the “-co” flag is used to
disable counting of RT stops.
2. Input SAM/BAM files must be sorted lexicographically by
transcript ID and numerically by position. By default,
rf-count assumes that the provided files are not sorted, so it
will automatically proceed with sorting. As BAM files generated
by rf-map are already sorted, the “-r” flag is used to
skip sorting and save some computational time.
3. Besides the input file(s), the only other requirement for
rf-count is the reference FASTA file (used to generate the
index), provided through the “-f” parameter. In the case of
indexes generated via rf-index, the FASTA file can be found in
the index folder.
4. For the analysis of Pseudo-Seq samples, instead, it is sufficient
to avoid the “-co” flag, to calculate per-base RT drop-off
counts (and coverage):

$ rf-count -r -f Scerevisiae_rRNA_bt/[Link] -o
rf_count_pseudoseq/ rf_map_pseudoseq/SRR132724*.bam

It is essential to remember that in case static trimming of
read 5′ bases has been performed prior to mapping (or through
the “-b5” parameter of rf-map), this will result in a shift of the
actual RT stop position by a number of bases equal to the
number of bases that have been trimmed (for instructions on
how to deal with statically trimmed reads, please see Note 2).
5. As a result, rf-count will generate an RC (RNA count) file for
each sample. RC files are binary files optimized for fast random
access. The complete format specification is available online
([Link]
#rc-rna-count-format). Manipulation and visualization of RC
files can be performed via the rf-rctools utility (see Note 3).
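The counting logic outlined in the steps above, including the correction for statically trimmed 5′ bases, can be sketched in Python as follows. This is a simplified illustration and not the rf-count implementation. It assumes plus-strand alignments given as 0-based inclusive (start, end) pairs on a single transcript, and it assigns the RT stop to the corrected read start; whether a tool assigns the stop to this base or to the preceding one is a convention not specified here. The function name count_reads and the example alignments are hypothetical.

```python
def count_reads(alignments, transcript_len, trimmed_5p=0):
    """Per-base coverage and RT-stop counts for one transcript (toy model).

    alignments: list of 0-based inclusive (start, end) positions of mapped reads.
    trimmed_5p: number of bases statically trimmed from the 5' end of every read.
    """
    coverage = [0] * transcript_len
    rt_stops = [0] * transcript_len
    for start, end in alignments:
        for pos in range(start, end + 1):
            coverage[pos] += 1
        # The mapped start lies trimmed_5p bases downstream of the true read start
        true_start = start - trimmed_5p
        if true_start >= 0:
            rt_stops[true_start] += 1
    return coverage, rt_stops


alignments = [(3, 7), (3, 8), (5, 9)]  # hypothetical reads on a 10-nt transcript
coverage, stops_uncorrected = count_reads(alignments, 10)
_, stops_corrected = count_reads(alignments, 10, trimmed_5p=2)
print("coverage:        ", coverage)
print("stops (no shift):", stops_uncorrected)
print("stops (shift 2): ", stops_corrected)
```

Without the correction, every RT stop is reported two bases downstream of its true position, which is the shift that the “-t5” parameter of rf-count is meant to compensate (see Note 2).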

3.4 Calling PTMs

The term “calling” is inherited from the original algorithms used
for the identification of protein-binding sites on DNA from ChIP-
seq experiments. In this case, it consists of the identification of
putative post-transcriptionally modified sites within target tran-
scripts. Two different PTM calling modules are available in RNA
Framework, depending on whether the analysis of RIP or single-
base resolution mapping experiments is being performed.

3.4.1 Calling of m1A Peaks

Peak calling from RIP data is performed using the rf-peakcall
module. This tool can work both in the absence and presence of
an input (or control) sample (although in the latter case, called
peaks are of higher confidence). To call m1A peaks, type:

$ rf-peakcall -c rf_count_m1Aseq/[Link] -I rf_count_m1Aseq/[Link] \
    -i rf_count_m1Aseq/[Link] -ec 10 -g -mp -eo -r

Parameters “-c” and “-I” specify the path to the input
(or control, e.g., IgG) and IP RC files, respectively. To speed up
the analysis, transcripts whose median coverage in the input sample
is below a user-defined threshold can be discarded through the
parameter “-ec” (or “--median-coverage”). This parameter does
not take into account the coverage in the IP sample.
Flags “-g” and “-mp” enable the generation of transcript-level
and meta-gene-level coverage plots, respectively (Fig. 1). When flag
“-eo” is enabled, meta-gene coverage plots are generated only on
transcripts with a detected enrichment (called peak).
Flag “-r” enables the refinement of peak boundaries. As peaks
are called by sliding a window of a fixed size (150 nt by default,
controlled through the “-w” parameter), with a user-defined offset
(half of the window size by default, controlled through the “-f”
parameter), in some cases, the called interval might not include the
full peak (or it might include additional low-enrichment bases).
When peak refinement is active, rf-peakcall will progressively
enlarge (or shrink) the interval to only include bases exceeding
the enrichment threshold (threefold by default, controlled through
the “-e” parameter).
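The sliding-window strategy and the refinement of peak boundaries can be illustrated with the following Python sketch. It is a simplified model written for this explanation and should not be read as the rf-peakcall algorithm: the per-base enrichment (IP over control with a pseudocount), the use of the window mean, and all function names are assumptions, and a 10-nt window is used only to keep the example small.

```python
def base_enrichment(ip, ctrl, pseudo=1.0):
    """Per-base IP/control ratio; the pseudocount is an assumption of this sketch."""
    return [(i + pseudo) / (c + pseudo) for i, c in zip(ip, ctrl)]


def call_windows(enr, window=150, offset=None, min_enrich=3.0):
    """Slide a fixed-size window and keep windows whose mean enrichment passes."""
    offset = offset or window // 2  # default offset: half of the window size
    called = []
    for start in range(0, max(1, len(enr) - window + 1), offset):
        end = min(start + window, len(enr))  # half-open interval [start, end)
        if sum(enr[start:end]) / (end - start) >= min_enrich:
            called.append((start, end))
    return called


def refine(interval, enr, min_enrich=3.0):
    """Shrink, then enlarge, an interval so it spans only enriched bases."""
    start, end = interval
    while start < end and enr[start] < min_enrich:
        start += 1
    while end > start and enr[end - 1] < min_enrich:
        end -= 1
    if start == end:
        return None
    while start > 0 and enr[start - 1] >= min_enrich:
        start -= 1
    while end < len(enr) and enr[end] >= min_enrich:
        end += 1
    return (start, end)


# Hypothetical 40-nt transcript with an enriched region at positions 12-22
ctrl = [9] * 40
ip = [49 if 12 <= pos <= 22 else 9 for pos in range(40)]
enr = base_enrichment(ip, ctrl)
windows = call_windows(enr, window=10)
peaks = sorted({refine(w, enr) for w in windows})
print("called windows:", windows)
print("refined peaks: ", peaks)
```

In this example, two overlapping windows are called, neither of which matches the enriched region exactly; after refinement, both collapse onto the same interval covering only the enriched bases.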

3.4.2 Calling of Ψ Sites

Calling of PTMs from single-base resolution experiments is instead
performed using the rf-modcall module. To call sites, type:

$ rf-modcall -u rf_count_pseudoseq/SRR1327248.rc -t rf_count_pseudoseq/[Link] \
    -i rf_count_pseudoseq/index.rci

Only two parameters are essential to perform the analysis,
namely “-u” and “-t,” that specify the untreated (CMC− for
Ψ-seq/Pseudo-Seq, or high dNTP for 2Ome-seq) and treated
(CMC+ for Ψ-seq/Pseudo-Seq, or low dNTP for 2Ome-seq) samples,
respectively. The RC index file (RCI) can be optionally specified
through the “-i” parameter. In case this is omitted, rf-modcall
will automatically regenerate it at runtime.

[Figure 1: three stacked panels, Control (RPM), IP (RPM), and % of peaks, plotted against relative position (%)]

Fig. 1 Meta-gene coverage plots for the input (in green) and m1A IP (in blue) samples. On the bottom, the
relative distribution of m1A peaks is also shown (in black)

As for rf-peakcall, analysis is performed using a sliding window,
centered on the base being analyzed. The size of the window
(150 nt by default) can be adjusted through the “-w” (or “--win-
dow”) parameter.

3.5 Output Files and Interpretation

1. Results are reported in XML format. For each nucleotide of a
transcript, two values are calculated, the score and the ratio. The
score is obtained by evaluating the enrichment of RT drop-off
events on a given base in the treated sample, relative to both
surrounding bases and the same base in the untreated sample; it
provides a numerical estimate of the likelihood of a given
residue to be modified. The ratio is instead a measure of the
relative modification stoichiometry, calculated as the ratio
between the RT drop-off count and the read coverage of the
base in the treated sample only.

[Figure 2: bar plots of per-base Score (top) and Ratio (bottom) along the 18S rRNA, with Ψ sites marked below]

Fig. 2 Per-base scores and ratios from Pseudo-Seq analysis of S. cerevisiae 18S rRNA. Known Ψ sites are
highlighted in orange

2. Visualization of rf-modcall’s XML files is made possible
through the rf-wiggle utility (see Note 4). Alternatively,
Dr. Artem Babaian (University of British Columbia) has devel-
oped the RRNAFramework R package ([Link]
ababaian/RRNAframework) that allows users to easily import
RNA Framework’s XML files in R (Fig. 2):

library(RRNAFramework)
XML<-readModXML("SRR1327249_vs_SRR1327248_sites/[Link]")
par(mfrow = c(2,1))
barplot([Link](unlist(XML$score)), las=1, ylab="Score")
barplot([Link](unlist(XML$ratio)), las=1, ylab="Ratio")
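The ratio described in step 1 above follows directly from its definition (RT drop-off count divided by read coverage, in the treated sample only) and can be sketched in a few lines of Python. The function name and the example counts are hypothetical, and returning 0.0 for bases without coverage is an assumption of this sketch rather than documented rf-modcall behavior.

```python
def modification_ratio(rt_stops, coverage):
    """Per-base ratio between RT drop-off counts and read coverage (treated sample)."""
    return [
        stops / cov if cov > 0 else 0.0  # assumption: uncovered bases get 0.0
        for stops, cov in zip(rt_stops, coverage)
    ]


# Hypothetical counts for four consecutive bases of a treated sample
rt_stops = [0, 5, 40, 2]
coverage = [100, 100, 80, 0]
print(modification_ratio(rt_stops, coverage))
```

A base at which half of the covering reads terminate therefore receives a ratio of 0.5, independently of the untreated sample, which only enters the calculation of the score.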

4 Notes

1. To inspect FASTQ files using FastQC, simply type:

$ fastqc <FASTQ file>

The result will be an HTML file reporting many details on
the sample, including the distribution of base qualities. If
low-quality bases are detected at the 5′ end, trimming them
out might improve mapping. Quality trimming of bases con-
trolled by the “-cq5” parameter of rf-map is dynamic; thus, it
can result in a different number of bases being trimmed from
each read. While this does not represent an issue when analyz-
ing RIP experiments, it can instead impair the analysis of single-
base resolution mapping experiments based on the detection of
RT drop-off events. To avoid this, it is essential to adopt a static
trimming strategy by providing rf-map with a user-defined
number of low-quality bases to be trimmed from any read,
through the use of the “-b5” parameter (see also Note 2).
2. When static trimming of 5′ end bases is performed (for example
through the “-b5” parameter of rf-map), it is necessary to
invoke rf-count by providing the number of trimmed bases
through the “-t5” parameter. Alternatively (only for align-
ments generated with Bowtie) it is possible to specify the
“-fh” flag to make rf-count automatically infer the number of
trimmed bases directly from the BAM header.
3. To inspect an RC file using rf-rctools, simply type:

$ rf-rctools view <RC file>

Multiple RC files (e.g., replicates) can be merged into a
single RC file by:

$ rf-rctools merge <RC file #1> <RC file #2> ... <RC file #n>

4. To generate WIGGLE tracks that can be visualized using
Integrative Genomics Viewer ([Link]
software/igv/), type:

$ rf-wiggle <XML file #1> <XML file #2> ... <XML file #n>

By default, the score will be used to generate the track. To
plot the ratio instead, simply type:

$ rf-wiggle -r <XML file #1> <XML file #2> ... <XML file #n>

Acknowledgments

This work was supported by funding from the University of Groningen
(Groningen, Netherlands) and the Groningen Biomolecular
Sciences and Biotechnology Institute (GBB) to D.I.

References

1. Cohn WE, Volkin E (1951) Nucleoside-5′-phosphates from ribonucleic acid. Nature
   167:483–484. [Link] 167483a0
2. Behm-Ansmant I, Helm M, Motorin Y (2011) Use of specific chemical reagents for detection
   of modified nucleotides in RNA. J Nucleic Acids 2011:1–17. [Link] s1355838299981335
3. Incarnato D, Oliviero S (2017) The RNA epistructurome: uncovering RNA function by
   studying structure and post-transcriptional modifications. Trends Biotechnol
   35:318–333. [Link] tibtech.2016.11.002
4. Incarnato D, Morandi E, Simon LM, Oliviero S (2018) RNA framework: an all-in-one toolkit
   for the analysis of RNA structures and post-transcriptional modifications. Nucleic Acids
   Res 46:e97–e97. [Link] nar/gky486
5. Dominissini D, Nachtergaele S, Moshitch-Moshkovitz S, Peer E, Kol N, Ben-Haim MS,
   Dai Q, Segni AD, Salmon-Divon M, Clark WC, Zheng G, Pan T, Solomon O, Eyal E,
   Hershkovitz V, Han D, Doré LC, Amariglio N, Rechavi G, He C (2016) The
   dynamic N(1)-methyladenosine methylome in eukaryotic messenger RNA. Nature
   530:441–446. [Link] nature16998
6. Carlile TM, Rojas-Duran MF, Zinshteyn B, Shin H, Bartoli KM, Gilbert WV (2014)
   Pseudouridine profiling reveals regulated mRNA pseudouridylation in yeast and human cells.
   Nature 515:143–146. [Link] 1038/nature13802
7. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods
   9:357–359. [Link] 1923
8. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient
   alignment of short DNA sequences to the human genome. Genome Biol 10:R25. [Link]
   org/10.1186/gb-2009-10-3-r25
9. Linder B, Grozhik AV, Olarerin-George AO, Meydan C, Mason CE, Jaffrey SR (2015)
   Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome. Nat
   Methods 12:767–772. [Link] 1038/nmeth.3453
10. Zhou KI, Clark WC, Pan DW, Eckwahl MJ, Dai Q, Pan T (2018) Pseudouridines have
    context-dependent mutation and stop rates in high-throughput sequencing. RNA Biol
    15:892–900. [Link] 15476286.2018.1462654

Chapter 2

An Informatics Pipeline for Profiling and Annotating RNA Modifications

Qi Liu, Xiaoqiang Lang, and Richard I. Gregory

Abstract
While over 150 distinct types of chemical modifications are known to occur on various cellular RNAs and
can be dynamically controlled, the function of most of these modifications remains poorly defined.
Collectively, these RNA modifications have been recently termed the “epitranscriptome”. Identification
and annotation of individual RNA modifications throughout the transcriptome are key for studying the role
of the epitranscriptome in the regulation of gene expression and for elucidating the functional relevance of
particular RNA modifications in diverse physiological and disease processes. In this protocol, we demon-
strate how to identify and annotate RNA modifications based on the informatic analysis of methylated RNA
immunoprecipitation and sequencing (MeRIP-seq) data, using RNAmod, a convenient one-stop online
interactive platform for the annotation, analysis, and visualization of mRNA modifications.

Key words RNA modification, Epitranscriptome, Identification, Annotation, Software, Tool, Web
server, RNA-binding protein, RBP, MeRIP-seq

1 Introduction

More than 150 distinct kinds of chemical modifications of RNA are
known to exist. Collectively these diverse RNA modifications have
been recently termed the “epitranscriptome” [1]. While the func-
tion of the majority of these individual modifications remains
largely unknown, it is becoming increasingly apparent that certain
modifications of different classes of cellular RNAs can play impor-
tant regulatory roles in various biological processes [2, 3]. The
discovery that particular RNA modifications, for example N6-
methyladenosine (m6A) modification of messenger RNAs
(mRNAs), can be dynamically regulated by the interplay between
the activity of the methyltransferase (METTL3) and demethylases
(FTO and ALKBH5) [4], as well as the development of methylated
RNA immunoprecipitation and sequencing (MeRIP-seq) methods
that revealed the striking prevalence of m6A modification of a large
subset of mRNAs throughout the transcriptome, has led to a recent
explosion of research focused on the m6A epitranscriptome (as well
as other modifications) in various biological contexts [5].

Mary McMahon (ed.), RNA Modifications: Methods and Protocols, Methods in Molecular Biology, vol. 2298,
[Link] © Springer Science+Business Media, LLC, part of Springer Nature 2021

An expanding list of mRNA modifications has been identified,
including N6-methyladenosine (m6A) [5], N1-methyladenosine
(m1A) [6], cytidine N4-acetylation (ac4C) [7], and N7-methylgua-
nosine (m7G) [8], which have been implicated in many aspects of
the mRNA life cycle, including splicing, nuclear export, mRNA
stability, and translation [2, 3]. Dysregulation of some of these
mRNA modifications is linked to several diseases, including neuro-
logical disorders and cancers [9–12]. The development of new
methods to globally profile the extent of individual RNA modifica-
tions has greatly facilitated studies of the epitranscriptome. Next-
generation sequencing (NGS)-based technologies, such as MeRIP-
seq (methylated RNA immunoprecipitation sequencing), have
been widely used to profile different types of mRNA modifications
throughout the transcriptome in various cell types and
under various conditions [5]. Meanwhile, several tools have been
developed based on different machine learning algorithms that
have been used to systematically predict RNA modifications, such
as SRAMP [13], M6AMRFS [14], WHISTLE [15], RAM-ESVM
[16], and DeepM6ASeq [12].
New methods including MeRIP-seq have revealed that individ-
ual modifications typically occur at certain locations on target
mRNAs, and within specific sequence contexts. For example, N6-
methyladenosine (m6A), the most abundant mRNA modification,
is enriched in 3′ untranslated regions (3′ UTRs) near translation
stop codons, and typically occurs at a DRACH (D = A, G, or U;
H = A, C, or U) motif [5]; N1-methyladenosine (m1A) is highly
enriched at the 5′ untranslated region (5′ UTR) of mRNA [6];
cytidine N4-acetylation (ac4C) is enriched within the 5′ regions of
coding sequences [7]; N7-methylguanosine (m7G) is enriched at
the 5′ UTR region and AG-rich sequence contexts [8]. Therefore,
annotating the modified mRNA targets and determining the
detailed distribution of dynamic modifications are essential for
studying their functions.
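As a small illustration of what a sequence context such as DRACH means in practice, the following Python sketch scans an RNA sequence for the motif. The definitions of D and H are those given above, and R = A or G is the standard IUPAC purine code; the function name find_drach and the example sequence are hypothetical, and the sketch is not part of any of the tools discussed in this chapter.

```python
import re

# DRACH expressed as a regular expression: D = A/G/U, R = A/G, H = A/C/U.
# The lookahead allows overlapping occurrences to be reported.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")


def find_drach(sequence):
    """Return (0-based start, matched 5-mer) for every DRACH occurrence."""
    rna = sequence.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start(), m.group(1)) for m in DRACH.finditer(rna)]


example = "CCGGACUAAGACAUU"  # hypothetical sequence
for start, motif in find_drach(example):
    # The candidate methylated adenosine is the third base of the motif
    print(motif, "at position", start, "with the central A at", start + 2)
```

Matching the motif only identifies candidate sites; whether a given DRACH adenosine is actually modified has to be established experimentally, for example from the MeRIP-seq peaks analyzed in this protocol.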
RNAmod [17] is a very convenient Web-based platform for
functional annotation and visualization of mRNA modifications, as
well as for annotation of binding features of RNA-binding proteins
(RBPs), for multiple species including human, mouse, rat, zebrafish,
and fly. BED format, a commonly used format to represent
chromosomal locations, is the input format of RNAmod. Conveniently,
BED format files can be produced by almost all peak/site
calling tools, such as MACS [18] and exomePeak [19]. In this
protocol, we first explain how to identify RNA modifications from
MeRIP-seq data, and then describe tools to annotate the modifica-
tions using RNAmod. m6A MeRIP-seq is used as test data in this
protocol, which could be very easily adapted and applied to the
study of other RNA modifications.

2 Materials: Software and Data

1. Data: The example used for the purpose of this chapter is m6A
MeRIP-seq of human HepG2 cells (GEO accession:
GSE110323).
2. Human reference genome sequence (hg19) and comprehensive
gene annotations (Release 19), downloaded from the GENCODE
database ([Link]
release_19.html).
3. SRA toolkit (version 2.9.0) for sequence extraction from SRA
files ([Link]
4. FastQC (version 0.11.8) for visually checking the sequencing
quality of MeRIP-seq data ([Link]
[Link]/projects/fastqc).
5. Trim Galore (version 0.6.5) for trimming the adapters and low
quality sequences from MeRIP-seq data ([Link]
[Link]/projects/trim_galore).
6. STAR (version 2.7.3a) for mapping sequencing reads to the
reference genome ([Link]
7. Samtools (version 1.9) for indexing the SAM/BAM mapping
files ([Link]
8. MACS2 (version 2.2.5) for modification peak calling from
MeRIP-seq ([Link]
9. RNAmod for modification annotation ([Link]
[Link]/RNAmod or [Link]
RNAmod).

3 Methods

This protocol contains two parts: modification calling based on
m6A MeRIP-seq data (Subheading 3.1) and modification annotation
using RNAmod (Subheading 3.2) (Fig. 1). All of the analysis
commands in the modification calling part are executed under a
UNIX/Linux operating system, in which the “$” character represents
the shell prompt at which commands are entered. The commands
assume that all of the installed software required for the analysis
has been added to the PATH environment variable, so that users do
not need to type the absolute path of each program; alternatively,
the absolute path of the software can be used.

3.1 RNA Modification Calling from m6A MeRIP-Seq Data

1. MeRIP-seq library preparation and sequencing:
MeRIP-seq can be performed as described by Dominissini
et al. [14]. Briefly, mRNA is first purified from total RNA. Then
mRNA is fragmented and immunoprecipitated with an
anti-modification antibody, and the immunoprecipitated RNA is
washed and eluted by competition with the modified nucleotide.
The purified RNA fragments from MeRIP are then used
for library construction and sequencing as described.

[Figure 1 flowchart: INPUT raw data (.fastq) and IP raw data (.fastq) → Quality control (FastQC and trim_galore) → Read mapping (STAR) → Modification calling (MACS2/exomePeak) → Modification annotation (RNAmod)]

Fig. 1 A flowchart summarizing the bioinformatic analysis steps involved in
modification calling and annotations based on MeRIP-Seq. First a quality check
of the FASTQ reads is performed with FastQC, and then the adapter and
low-quality reads are removed with trim_galore. After that the cleaned reads
are aligned to the reference genome using STAR. m6A modification peaks are
called using MACS2 or exomePeak. Finally, RNAmod is used to annotate
modification sites

2. Download MeRIP-seq data from the SRA database:
The m6A MeRIP-seq data used as an example herein to
demonstrate the steps of modification calling are available in
GEO database (accession number: GSE110323) [20] of
NCBI. Download the m6A MeRIP-seq raw sequencing reads
from the NCBI database using the prefetch program of SRA
toolkit in a terminal window (see Note 1):

$ prefetch SRR6686554 SRR6686555 SRR6686557 SRR6686558 \
    SRR6686560 SRR6686561 SRR6686563 SRR6686564
$ mv ~/ncbi/public/sra/*.sra ./

Table 1
Example of published m6A MeRIP-seq samples

GEO sample ID  SRA ID      Cell type  Treatment  Sample type  Replicate
GSM2987444     SRR6686554  HepG2      shCtrl     INPUT        rep1
GSM2987445     SRR6686555  HepG2      shCtrl     INPUT        rep2
GSM2987447     SRR6686557  HepG2      shCtrl     m6A IP       rep1
GSM2987448     SRR6686558  HepG2      shCtrl     m6A IP       rep2
GSM2987450     SRR6686560  HepG2      shMETTL3   INPUT        rep1
GSM2987451     SRR6686561  HepG2      shMETTL3   INPUT        rep2
GSM2987453     SRR6686563  HepG2      shMETTL3   m6A IP       rep1
GSM2987454     SRR6686564  HepG2      shMETTL3   m6A IP       rep2

Here, the m6A MeRIP-seq dataset is used as example data
to illustrate the stepwise analysis for modification calling based
on MeRIP-seq data. The features of the eight samples that were
sequenced as part of the dataset are summarized in Table 1.
The downloaded SRA files (with file extension “.sra”) can be
moved into the current working directory (“./”). The symbol
“~” represents the home directory ($HOME). The users can
use the “pwd” command to get the path of the current working
directory.
3. Extract FASTQ data from SRA files:
Extract FASTQ format files from SRA files using fastq-
dump in SRA toolkit by entering the following commands
(see Note 2):

$ for sraID in SRR6686554 SRR6686555 SRR6686557 SRR6686558 \
    SRR6686560 SRR6686561 SRR6686563 SRR6686564; do
    fastq-dump --gzip ${sraID}.sra
  done

The compressed FASTQ files will be extracted into the
current working directory with the file extension “.[Link]”.
4. Check the sequencing quality:
Visually check the sequencing quality, such as quality type,
distribution, and adapter contamination, using FastQC:

$ for sraID in SRR6686554 SRR6686555 SRR6686557 SRR6686558 \
    SRR6686560 SRR6686561 SRR6686563 SRR6686564; do
    fastqc ${sraID}.[Link]
  done

The quality reports, which are in HTML format (with file
extension “_fastqc.html”), will be generated in the current
working directory.
5. Quality control:
Perform adapter cleaning and low-quality filtering using
trim_galore if the raw sequencing data contain adapters and/or
low-quality sequences (see Note 3):

$ mkdir clean_outdir
$ for sraID in SRR6686554 SRR6686555 SRR6686557 SRR6686558 \
    SRR6686560 SRR6686561 SRR6686563 SRR6686564; do
    trim_galore ${sraID}.[Link] --phred33 --adapter AGATCGGAAGAGCACA \
        --length 20 --quality 30 --gzip --output_dir clean_outdir
  done

Here, after adapter cleaning and low-quality filtering, the
clean sequences (with file extension “_trimmed.[Link]”) will be
found in the “clean_outdir” directory created using the “mkdir”
command. Low-quality read ends (Phred score below 30) are trimmed,
and reads that become shorter than 20 bp are discarded.
“--phred33” indicates the encoding type of the sequence quality,
which can be identified in the FastQC report (see Note 4).
6. Short read mapping:
Map the cleaned sequence generated in the previous step to
the reference genome sequence using a splicing-aware short
read aligner tool, such as STAR [21] and TopHat [22], which
can generate mapping results in SAM/BAM format. Here,
STAR is used to align the cleaned sequences (INPUT and
m6A IP samples) to human reference genome sequences
(hg19):

$ STAR --runThreadN 8 --runMode genomeGenerate --genomeDir hg19 \
    --genomeFastaFiles [Link] --sjdbGTFfile [Link]-[Link]
$ for sraID in SRR6686554 SRR6686555 SRR6686557 SRR6686558 \
    SRR6686560 SRR6686561 SRR6686563 SRR6686564; do
    STAR --genomeDir hg19 --sjdbGTFfile [Link] \
        --outSAMtype BAM SortedByCoordinate --outFilterMultimapNmax 1 \
        --outFilterMismatchNmax 2 --runThreadN 8 --readFilesCommand zcat \
        --readFilesIn clean_outdir/${sraID}_trimmed.[Link] \
        --outFileNamePrefix $sraID
  done

During the mapping process, a maximum of two mismatches
(--outFilterMismatchNmax 2) is allowed and only uniquely
mapped reads are reported (--outFilterMultimapNmax 1).
“hg19” is the name of the genome sequence index, which is
built using the genomeGenerate mode of STAR. “gencode.v19.[Link]”
is the gene annotation for the hg19 reference
genome in GENCODE [23]. The BAM mapping files with
file extension “[Link]” will be generated
in the current working directory.
7. Index the BAM files:
Index the BAM files using Samtools:

$ for sraID in SRR6686554 SRR6686555 SRR6686557 SRR6686558 \
    SRR6686560 SRR6686561 SRR6686563 SRR6686564; do
    samtools index ${sraID}[Link]
  done

Here, the BAM files are indexed so that they can be loaded
into the Integrative Genomics Viewer (IGV) [24] to visually
check the modification peaks.
8. Modification site/peak calling:
Call the m6A RNA modification peaks based on INPUT
and m6A IP sample using MACS2 [18] or exomePeak [19] (see
Note 5). Here, MACS2 is used to call the modifications:

$ inputs=(SRR6686554 SRR6686555 SRR6686560 SRR6686561)
$ ips=(SRR6686557 SRR6686558 SRR6686563 SRR6686564)
$ for (( i=0; i<${#inputs[@]}; i++ )); do
    # Pair each m6A IP sample with the INPUT sample at the same array index
    inputSample=${inputs[$i]}[Link]
    ipSample=${ips[$i]}[Link]
    macs2 callpeak --treatment $ipSample --control $inputSample \
        --format BAM --outdir macs2_out --name ${ips[$i]} --qvalue 0.05
  done

The outputs of MACS2 will be generated in macs2_out
directory, containing four BED format files
(SRR6686557_summits.bed, SRR6686558_summits.bed,
SRR6686563_summits.bed and SRR6686564_summits.bed).
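Before uploading peak files for annotation, it can be useful to check that they are well-formed BED files. The following Python sketch is one possible sanity check under stated assumptions: it relies only on the three mandatory BED columns (chromosome, 0-based start, and end-exclusive end), keeps any further columns untouched, and requires each interval to span at least one base. The function name parse_bed and the example records are hypothetical and do not come from the dataset used in this chapter.

```python
def parse_bed(lines):
    """Parse BED lines into dictionaries, validating the three mandatory columns."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip empty lines, comments, and optional track/browser header lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError("BED lines need at least 3 tab-separated fields: %r" % line)
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end <= start:
            raise ValueError("invalid interval: %r" % line)
        records.append({"chrom": chrom, "start": start, "end": end, "extra": fields[3:]})
    return records


# Hypothetical records in the style of a peak summit file
example_lines = [
    "chr1\t1000\t1001\texample_peak_1\t12.5",
    "chr2\t500\t501\texample_peak_2\t8.1",
]
records = parse_bed(example_lines)
for rec in records:
    print(rec["chrom"], rec["start"], rec["end"], rec["end"] - rec["start"])
```

To check a real file, the same function can be applied to an open file handle, for example parse_bed(open("macs2_out/SRR6686557_summits.bed")).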

3.2 Modification Annotation Using RNAmod

1. RNAmod overview:
The overall workflow of RNAmod [17] (see Note 6) is
shown in Fig. 2. RNAmod first extracts gene features from
the reference genome annotation and then maps the submitted
modification sites/peaks onto different RNA features. Subse-
quently, RNAmod performs various coverage calculations,
metagene analyses, and annotations focusing on mRNAs. The
web server of RNAmod provides three functional modules,
including single case, group case, and gene case. The single-
case module allows the users to annotate RNA modifications
22 Qi Liu et al.

RNAmod
Single case

Group case

Gene case
Datasets
Users

Outputs

Gene Gene Transcription


Heatmaps Motifs Gene list
features characteristics start/end sites

Gene Splice Translation


JBrowse Functions Site list
bio-types sites start/end sites

Fig. 2 Overall workflow of RNAmod. RNAmod provides three functional modules, including single case, group
case, and gene case, and is freely available at [Link] or [Link]
[Link]/RNAmod

for a single sample. Group-case module allows the users to


annotate and compare the modification distribution between
two groups or multiple groups. The gene analysis module is
used to analyze the modification distribution in a specific gene
context. Meanwhile, RNAmod integrated JBrowse tool [25]
can be used to check and compare the modifications in the
context of known RNA modification sites and RNA-binding
sites of RBPs.
2. Data input of RNAmod:
RNAmod uses BED format as input (see Note 7), which is
commonly used to represent chromosomal locations and can
be generated by many modification calling tools, such as
MACS [18] and exomePeak [19]. Besides, BED format can
be easily converted from other text formats. Compressing the BED
file in .gz or .zip format is recommended to speed up the
upload.
RNAmod provides several flexible parameters to define the
gene features. For instance, the flank size (bp) is the length of the
flanking interval used to define the transcription start/end
regions, translation start/end regions, and splice site regions.
The number of bins sets the number of bins into which every gene
feature is divided in coverage plots and metagene analyses. The
scale RNA feature option determines whether the lengths of the
CDS and UTRs are scaled in the mRNA metagene plot. In group-case
input, there is an additional parameter to control whether the
intersection or union set in each group is used in the annotation.
Note that all input web pages include examples to
help users provide correct inputs.

The BED format m6A modification peak files of the control
(shCtrl) and METTL3 knockdown (shMETTL3) samples
(Table 1) generated by MACS2 in Subheading 3.1, step 8,
can be directly submitted to RNAmod. Select Homo sapiens
(hg19) as the reference genome, upload the four peak files,
and click the submit button to start an analysis using the default
parameters. After submission of the data, the data analysis
queue system returns a job ID (example: tyXuBoOhKh4dW63x),
which is a random 16-character string
and can be used to retrieve the results once the job is finished.
3. The outputs of RNAmod:
The outputs of RNAmod are presented in intuitive web
interfaces, which typically contain the following types of infor-
mation (Fig. 3): (1) overall statistic of modification sites/peaks
across different gene features, including promoter, 50 UTR
(UTR5), CDS, 30 UTR (UTR3), stop codon, intron, and
intergenic regions: in the plots, y-axis denotes the frequency
of peaks/sites while x-axis represents different gene features in
the plot; (2) overall statistics of modifications across different
gene biotypes, such as protein-coding gene, long noncoding

Fig. 3 Screenshots of the RNAmod analysis outputs. The typical outputs contain the following types of
information: (1) modification location distribution; (2) gene type statistics; (3) coverage plots for different
features; (4) distribution around transcription start/end sites; (5) distribution around translation start/end sites;
(6) distribution around splicing sites; (7) comparison of RNA characteristics between groups; (8) mRNA
metagene plot; (9) enriched sequence motifs; (10) heatmap of modification distribution; (11) Gene Ontology
functional enrichments; (12) pathway functional enrichments; (13) modification target gene annotation list;
(14) detailed list of modification sites; and (15) JBrowse visualization of modification in known modifications
and RBP-binding site context. In the coverage plots, the 95% confidence interval (mean ± standard error of the
mean × 1.96) is shown
24 Qi Liu et al.

RNA (lncRNA), pseudogenes, ribosomal RNAs (rRNA), and
microRNA (miRNA); (3) coverage plot for modification sites/
peaks overlapping with different mRNA features: in the plot, all
features are divided into equal bins; the number of sites/peaks
distributed in each bin is counted and the mean coverage is
then calculated among all the genes; (4) coverage plot for
modification sites/peaks around transcription start sites and
transcription end sites: the number of sites/peaks at each
location is counted around the flanking regions (1000 nt flank
size for both upstream and downstream by default); (5) coverage
plot for modification sites/peaks around translation start
sites (TSS) and translation end sites (TES): the number of sites/
peaks at each nucleotide location is counted around the flanking
regions (upstream and downstream) of TSS/TES on the
transcript; (6) coverage plot for modification sites/peaks around 5′
splice sites (5PSS) and 3′ splice sites (3PSS): for the coverage
analysis around splice sites, the number of sites/peaks in each
nucleotide location is counted; then, mean depth coverage for
the specific location is calculated for all the genes; (7) compari-
son of RNA characteristics of genes with modifications
between groups: y-axis in three plots (from left to right) repre-
sents sequence length, GC content, and minimum free energy
(MFE), respectively; (8) the metagene plot of modification
sites across mRNA features, including 5′ UTR, CDS, and 3′
UTR, which are represented using rectangles in green, yellow,
and blue colors, respectively; y-axis denotes the distribution
density of modification sites; (9) enriched sequence motifs
detected by Homer [26] for the modification sites/peaks: the
seqLogo plots for enriched motifs are shown, in which the
logo uses a stack of letters to represent columns of the align-
ment and the height of each stack is proportional to the
sequence conservation at that position; (10) heatmaps of mod-
ifications around transcription start/end sites (genomic
regions) and around translation start/end sites (transcript
regions): in the heatmap, x-axis represents the nucleotide position
around the site (5′ → 3′ direction) while y-axis represents
all modified genes sorted by the number of modifications in
each gene; (11) Gene Ontology (GO) functional enrichment
for modification target genes: all the enriched terms in three
GO domains (biological process, cellular component, and
molecular function) are contained in one table, in which the
specific GO domain can be selected using the filter in “GO”
column; (12) functional pathway enrichments for modification
target genes, including KEGG for all supported species, Reactome
pathway, Disease Ontology, Network of Cancer Genes,
DisGeNET disease genes, and MSigDB functional gene sets
(chemical and genetic perturbation genes, microRNA target
motif genes, cancer gene neighborhoods, cancer module

genes, oncogenic signature genes, and immunologic signature
genes) for human, mouse, rat, zebrafish, fly, and C. elegans;
(13) the detailed list of modification target genes, in which the
gene information includes transcript ID, gene symbol, and
gene biotype: note that the “stop codon” region
overlaps with the 3′ UTR and CDS; (14) detailed annotation
list of modification sites: the gene information for the sites
includes transcript ID, gene symbol, gene biotype, and location
of the gene; and (15) JBrowse integration visualization for the
modification sites in known RNA modifications and
RBP-binding site context. Results are shown in interactive
tables and figures on the web page, all of which can be
downloaded in different formats. RNAmod also lets
users display the distribution charts in different ways and
provides a link to download all the results.
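
The binning behind the coverage plots in output (3), together with the 95% confidence interval mentioned in the legend of Fig. 3, can be sketched as follows. This is a minimal illustration and not RNAmod's own code: the function names, the choice of four bins, and the toy site positions are all assumptions made for the example.

```python
import math

def bin_counts(positions, feature_length, n_bins):
    """Count sites in each of n_bins equal-width bins along one feature."""
    counts = [0] * n_bins
    for pos in positions:  # 0-based offsets, 0 <= pos < feature_length
        counts[min(pos * n_bins // feature_length, n_bins - 1)] += 1
    return counts

def mean_with_ci(values):
    """Return (mean, 1.96 * standard error of the mean) for one bin."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return mean, 1.96 * math.sqrt(variance / n)

# Toy input (assumed): site offsets and feature length for three genes.
genes = [([5, 50, 95], 100), ([10, 190], 200), ([299], 300)]
per_gene = [bin_counts(sites, length, 4) for sites, length in genes]
profile = [mean_with_ci([counts[b] for counts in per_gene]) for b in range(4)]

for b, (mean, half_width) in enumerate(profile):
    print(f"bin {b}: {mean:.2f} +/- {half_width:.2f}")
```

Because every feature is divided into the same number of bins, genes of different lengths contribute to the same relative positions before the mean is taken.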

4 Notes

1. The default output directory of prefetch in SRA toolkit is
“$HOME/ncbi/public/sra/”, which can be changed by the
users using vdb-config.
2. If the sequencing data are in paired-end format, use the
“--split-files” parameter in fastq-dump to extract the paired-end
reads.
3. trim_galore uses Cutadapt to trim the adapter, so Cutadapt must be
installed on the system. If Cutadapt is not in the PATH
environment variable, use the “--path_to_cutadapt” parameter to specify the
path to the Cutadapt executable.
4. The encoding type of sequencing quality can be checked in
FastQC report, in which “Sanger/Illumina 1.9” and “Illumina
1.5” indicate “phred33” and “phred64” encoding types in
trim_galore, respectively.
5. MACS2 uses one BAM mapping file as input while exomePeak
can use BAM files of multiple replicates in one command. Refer
to the link [Link]
html/[Link] to see how to use exomePeak.
6. RNAmod is freely available at [Link]
RNAmod or [Link]
The data used in this protocol are available at GitHub
([Link]
7. For the BED format ([Link]
[Link]#format1), at least the first three fields (chro-
mosome, start, and end) are required.
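
The three-field requirement in Note 7 can be checked before uploading a peak file. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the fields are assumed to be tab-separated, and the example lines are invented rather than taken from the protocol data.

```python
def check_bed_line(line):
    """Return (chrom, start, end) if the line has the three required BED fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 tab-separated fields, got {len(fields)}")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError(f"invalid interval {start}-{end}")
    return chrom, start, end

def check_bed(lines):
    """Validate data lines, skipping blank lines and track/browser/comment headers."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        records.append(check_bed_line(line))
    return records

# Invented example lines: one header, one 4-field peak, one 3-field peak.
example = ["track name=peaks\n", "chr1\t1000\t1200\tpeak_1\n", "chr2\t500\t650\n"]
records = check_bed(example)
print(records)
```

Extra columns such as the peak name are ignored by this check, since only chromosome, start, and end are required.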

Acknowledgments

R.I.G. is supported by R35CA232115 and R01CA233671 grants
from the National Cancer Institute of the NIH.

References

1. Helm M, Motorin Y (2017) Detecting RNA modifications in the epitranscriptome: predict and validate. Nat Rev Genet 18(5):275–291. [Link]
2. Peer E, Rechavi G, Dominissini D (2017) Epitranscriptomics: regulation of mRNA metabolism through modifications. Curr Opin Chem Biol 41:93–98. [Link]cbpa.2017.10.008
3. Shi H, Wei J, He C (2019) Where, when, and how: context-dependent functions of RNA methylation writers, readers, and erasers. Mol Cell 74(4):640–650. [Link]1016/[Link].2019.04.025
4. Meyer KD, Jaffrey SR (2017) Rethinking m(6)A readers, writers, and erasers. Annu Rev Cell Dev Biol 33:319–342. [Link]1146/annurev-cellbio-100616-060758
5. Dominissini D, Moshitch-Moshkovitz S, Schwartz S, Salmon-Divon M, Ungar L, Osenberg S, Cesarkas K, Jacob-Hirsch J, Amariglio N, Kupiec M, Sorek R, Rechavi G (2012) Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature 485(7397):201–206. [Link]10.1038/nature11112
6. Li X, Xiong X, Zhang M, Wang K, Chen Y, Zhou J, Mao Y, Lv J, Yi D, Chen XW, Wang C, Qian SB, Yi C (2017) Base-resolution mapping reveals distinct m(1)A methylome in nuclear- and mitochondrial-encoded transcripts. Mol Cell 68(5):993–1005.e1009. [Link]org/10.1016/[Link].2017.10.019
7. Arango D, Sturgill D, Alhusaini N, Dillman AA, Sweet TJ, Hanson G, Hosogane M, Sinclair WR, Nanan KK, Mandler MD, Fox SD, Zengeya TT, Andresson T, Meier JL, Coller J, Oberdoerffer S (2018) Acetylation of cytidine in mRNA promotes translation efficiency. Cell 175(7):1872–1886.e1824. [Link]10.1016/[Link].2018.10.030
8. Zhang LS, Liu C, Ma H, Dai Q, Sun HL, Luo G, Zhang Z, Zhang L, Hu L, Dong X, He C (2019) Transcriptome-wide mapping of internal N(7)-methylguanosine methylome in mammalian mRNA. Mol Cell 74(6):1304–1316.e1308. [Link]1016/[Link].2019.03.036
9. Choe J, Lin S, Zhang W, Liu Q, Wang L, Ramirez-Moya J, Du P, Kim W, Tang S, Sliz P, Santisteban P, George RE, Richards WG, Wong KK, Locker N, Slack FJ, Gregory RI (2018) mRNA circularization by METTL3-eIF3h enhances translation and promotes oncogenesis. Nature 561(7724):556–560. [Link]8
10. Kadumuri RV, Janga SC (2018) Epitranscriptomic code and its alterations in human disease. Trends Mol Med 24(10):886–903. https://[Link]/10.1016/[Link].2018.07.010
11. Livneh I, Moshitch-Moshkovitz S, Amariglio N, Rechavi G, Dominissini D (2020) The m(6)A epitranscriptome: transcriptome plasticity in brain development and function. Nat Rev Neurosci 21(1):36–51. https://[Link]/10.1038/s41583-019-0244-z
12. Barbieri I, Kouzarides T (2020) Role of RNA modifications in cancer. Nat Rev Cancer. [Link]2
13. Zhou Y, Zeng P, Li YH, Zhang Z, Cui Q (2016) SRAMP: prediction of mammalian N6-methyladenosine (m6A) sites based on sequence-derived features. Nucleic Acids Res 44(10):e91. [Link]gkw104
14. Dominissini D, Moshitch-Moshkovitz S, Salmon-Divon M, Amariglio N, Rechavi G (2013) Transcriptome-wide mapping of N(6)-methyladenosine by m(6)A-seq based on immunocapturing and massively parallel sequencing. Nat Protoc 8(1):176–189. [Link]
15. Chen K, Wei Z, Zhang Q, Wu X, Rong R, Lu Z, Su J, de Magalhaes JP, Rigden DJ, Meng J (2019) WHISTLE: a high-accuracy map of the human N6-methyladenosine (m6A) epitranscriptome predicted using a machine learning approach. Nucleic Acids Res 47(7):e41. [Link]gkz074
16. Chen W, Xing P, Zou Q (2017) Detecting N(6)-methyladenosine sites from RNA transcriptomes using ensemble Support Vector Machines. Sci Rep 7:40242. [Link]10.1038/srep40242
17. Liu Q, Gregory RI (2019) RNAmod: an integrated system for the annotation of mRNA modifications. Nucleic Acids Res 47(W1):W548–W555. [Link]1093/nar/gkz479
18. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS (2008) Model-based analysis of ChIP-Seq (MACS). Genome Biol 9(9):R137. [Link]gb-2008-9-9-r137
19. Meng J, Lu Z, Liu H, Zhang L, Zhang S, Chen Y, Rao MK, Huang Y (2014) A protocol for RNA methylation differential analysis with MeRIP-Seq data and exomePeak R/Bioconductor package. Methods 69(3):274–281. [Link]008
20. Huang H, Weng H, Zhou K, Wu T, Zhao BS, Sun M, Chen Z, Deng X, Xiao G, Auer F, Klemm L, Wu H, Zuo Z, Qin X, Dong Y, Zhou Y, Qin H, Tao S, Du J, Liu J, Lu Z, Yin H, Mesquita A, Yuan CL, Hu YC, Sun W, Su R, Dong L, Shen C, Li C, Qing Y, Jiang X, Wu X, Guan JL, Qu L, Wei M, Muschen M, Huang G, He C, Yang J, Chen J (2019) Histone H3 trimethylation at lysine 36 guides m(6)A RNA modification co-transcriptionally. Nature 567(7748):414–419. [Link]org/10.1038/s41586-019-1016-7
21. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15–21. https://[Link]/10.1093/bioinformatics/bts635
22. Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25(9):1105–1111. [Link]btp120
23. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, Barnes I, Bignell A, Boychenko V, Hunt T, Kay M, Mukherjee G, Rajan J, Despacio-Reyes G, Saunders G, Steward C, Harte R, Lin M, Howald C, Tanzer A, Derrien T, Chrast J, Walters N, Balasubramanian S, Pei B, Tress M, Rodriguez JM, Ezkurdia I, van Baren J, Brent M, Haussler D, Kellis M, Valencia A, Reymond A, Gerstein M, Guigo R, Hubbard TJ (2012) GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res 22(9):1760–1774. [Link]135350.111
24. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP (2011) Integrative genomics viewer. Nat Biotechnol 29(1):24–26. [Link]1038/nbt.1754
25. Skinner ME, Uzilov AV, Stein LD, Mungall CJ, Holmes IH (2009) JBrowse: a next-generation genome browser. Genome Res 19(9):1630–1638. [Link]094607.109
26. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK (2010) Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38(4):576–589. [Link]molcel.2010.05.004
Part II

Detecting RNA Modifications Using Nanopore Direct RNA Sequencing
Chapter 3

EpiNano: Detection of m6A RNA Modifications Using Oxford Nanopore Direct RNA Sequencing
Huanle Liu, Oguzhan Begik, and Eva Maria Novoa

Abstract
RNA modifications play pivotal roles in the RNA life cycle and RNA fate, and are now appreciated as a major
posttranscriptional regulatory layer in the cell. In the last few years, direct RNA nanopore sequencing
(dRNA-seq) has emerged as a promising technology that can provide single-molecule resolution maps of
RNA modifications in their native RNA context. While native RNA can be successfully sequenced using this
technology, the detection of RNA modifications is still challenging. Here, we provide an upgraded version
of EpiNano (version 1.2), an algorithm to predict m6A RNA modifications from dRNA-seq datasets. The
latest version of EpiNano contains models for predicting m6A RNA modifications in dRNA-seq data that
has been base-called with Guppy. Moreover, it can now train models with features extracted from both base-
called dRNA-seq FASTQ data and raw FAST5 nanopore outputs. Finally, we describe how EpiNano can be
used in stand-alone mode to extract base-calling “error” features and current intensity information from
dRNA-seq datasets. In this chapter, we provide step-by-step instructions on how to produce in vitro tran-
scribed constructs to train EpiNano, as well as detailed information on how to use EpiNano to train, test,
and predict m6A RNA modifications in dRNA-seq data.

Key words Oxford Nanopore Technologies, Direct RNA sequencing, Native RNA, RNA modifica-
tion, Base-calling “errors”, In vitro transcription, Support vector machine, N6-methyladenosine,
Nanopore sequencing

1 Introduction

Chemical modifications in RNA have been well documented for
over half a century. In the 1950s, pseudouridine was discovered to
be the most abundant RNA modification present in cellular RNAs
[1]. Later studies showed that internal modifications were also
present in mRNAs and long noncoding RNAs (lncRNAs), reveal-
ing that N6-methyladenosine (m6A) was the most abundant
mRNA modification [2–5]. Interest in functionally dissecting and
mapping RNA modifications transcriptome-wide re-emerged in the

Huanle Liu and Oguzhan Begik contributed equally to this work.

Mary McMahon (ed.), RNA Modifications: Methods and Protocols, Methods in Molecular Biology, vol. 2298,
[Link] © Springer Science+Business Media, LLC, part of Springer Nature 2021

32 Huanle Liu et al.

past decade, largely triggered by the discovery of the biological
function of m6A demethylases FTO [6] and ALKBH5 [6, 7]. At
the same time, the availability of novel methods to map m6A RNA
modifications transcriptome-wide (m6A-seq) opened new possibi-
lities to study m6A modifications across a wide variety of conditions
and tissues [8, 9]. Using m6A-seq, m6A RNA modifications were
found to play pivotal roles in a wide variety of biological processes,
including cellular differentiation [10–13] and sex determination
[14, 15], among others.
While the field of RNA modifications owes much of its progress to improved
methods for detection using next-generation sequencing (NGS)
technologies [8, 9, 16–18], these methods present several caveats:
(1) they lack single-molecule resolution [19]; (2) they are limited
to those RNA modifications for which there are commercial anti-
bodies and chemicals that are selective toward a particular RNA
modification [20]; (3) they cannot provide isoform-specific
information, due to the short-read nature of Illumina-based
technologies; and (4) they are limited to those regions that can be reverse
transcribed and/or PCR amplified. Direct RNA nanopore sequenc-
ing (dRNA-seq) offers an alternative to NGS-based methods to
detect RNA base modifications in a transcriptome-wide fashion
[21–24]. Indeed, it is capable of sequencing native RNA molecules,
including RNA modifications, in their native RNA context, and
with single-molecule resolution.
In this chapter, we describe the use of EpiNano, an algorithm
to detect RNA base modifications from data generated using direct
RNA nanopore sequencing. We exemplify the usage of EpiNano to
detect m6A RNA modifications both in in vitro-transcribed con-
structs and in in vivo datasets.

2 Materials

2.1 In Vitro Transcribed (IVT) RNAs with RNA Modifications

1. Plasmids containing “curlcake” sequences, to be used as templates
for the in vitro transcription reaction. The “curlcakes”
are a set of synthetic sequences that comprise all possible
5-mers (median occurrence of each 5-mer = 10), while minimizing
the RNA secondary structure, and thus can be used to
systematically identify the perturbations of current intensity
caused by the presence of a given RNA modification in all
possible 5-mer contexts (n = 1024). Plasmids can be obtained
from Addgene:
pUC57-Curlcake1 (2329 bp, Addgene # 139340).
pUC57-Curlcake2 (2543 bp, Addgene # 139341).
pUC57-Curlcake3 (2678 bp, Addgene # 139342).
pUC57-Curlcake4 (2795 bp, Addgene # 139343).
Identifying m6A Modifications Using Nanopore Sequencing 33

2. Competent E. coli cells: 10-beta Competent E. coli, high efficiency.
3. SOC medium: 20 g/L Tryptone, 5 g/L yeast extract, 4.8 g/L
MgSO4, 3.603 g/L dextrose, 0.5 g/L NaCl, 0.186 g/L KCl.
4. Agar plates with ampicillin (100 μg/mL).
5. LB broth.
6. Ampicillin (100 mg/mL): Dissolve 1 g of sodium ampicillin in
10 mL of nuclease-free water.
7. Qiagen Plasmid Maxi Kit.
8. Molecular grade ethanol.
9. Nuclease-free water.
10. BSA (NEB).
11. Cutsmart Buffer, BamHI-HF, and EcoRV-HF.
12. Phenol:chloroform:isoamyl alcohol [Link], saturated with
10 mM Tris, pH 8.0, 1 mM EDTA.
13. Chloroform.
14. 3 M Sodium acetate, pH 5.2 (molecular biology grade).
15. Pellet Paint® Co-Precipitant or glycogen.
16. Agarose.
17. 1× TBE buffer: Dissolve 10.8 g Tris and 5.5 g boric acid in
900 mL distilled water. Add 4 mL 0.5 M Na2EDTA, pH 8.0.
Adjust the volume to 1 L.
18. GelRed nucleic acid stain (10,000×).
19. Gel loading dye, purple (6×).
20. AmpliScribe™ T7 High Yield Transcription Kit.
21. N6-methyl-ATP (Jena Bioscience).
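
The figure of 1024 contexts quoted for the curlcakes in item 1 follows from four bases at each of five positions (4^5). The sketch below enumerates them and counts 5-mer occurrences in a sequence; it is illustrative only, the function names are hypothetical, and the short toy sequence stands in for a real curlcake sequence, which is not reproduced here.

```python
from itertools import product

def all_kmers(k, alphabet="ACGU"):
    """Enumerate every possible k-mer over the given alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=k)]

def kmer_counts(sequence, k):
    """Count overlapping k-mers in a sequence."""
    counts = {}
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts

kmers = all_kmers(5)
toy = "GGACUGGACU"  # invented toy sequence, not a curlcake
counts = kmer_counts(toy, 5)
missing = [kmer for kmer in kmers if kmer not in counts]

print(len(kmers), "possible 5-mers;", len(missing), "absent from the toy sequence")
```

Applied to the four curlcake sequences, the same count would be expected to leave no 5-mer missing, which is the design property the protocol relies on.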

2.2 Cleanup of IVT RNAs

1. RNeasy Mini Kit.
2. Nuclease-free water.
3. Qubit™ RNA HS Assay Kit.

2.3 PolyA Tailing, Cleanup, and RNA Quality Check

1. SUPERase In.
2. E. coli poly(A) polymerase and accompanying buffer.
3. 10 mM ATP.
4. Agencourt RNAClean XP beads.
5. Nuclease-free water.
6. Molecular grade ethanol.
7. Magnetic separator, suitable for 1.5 mL Eppendorf tubes.
8. Hula Mixer.
9. Nanodrop or similar.
10. Agilent TapeStation and accompanying reagents: RNA ScreenTape
Sample Buffer, RNA ScreenTape Ladder, RNA ScreenTape,
optical tube strip caps (8× strip), optical tube strips (8×
strip), and loading tips.

2.4 Direct RNA Nanopore Sequencing Library Preparation

1. Direct RNA sequencing kit (SQK-RNA002).
2. Flow cell priming kit (EXP-FLP001).
3. 1.5 mL Eppendorf DNA LoBind tubes, 0.2 mL thin-walled
PCR tubes.
4. Nuclease-free water.
5. Freshly prepared 70% ethanol in nuclease-free water.
6. SuperScript III Reverse Transcriptase and accompanying
reagents.
7. 10 mM dNTP solution.
8. Concentrated T4 DNA Ligase 2 M U/mL (NEB).
9. NEBNext® Quick Ligation Reaction Buffer.
10. Agencourt RNAClean XP beads.
11. Qubit RNA HS Assay Kit, Qubit dsDNA HS Assay Kit, and
Qubit™ Assay Tubes.
12. Hula Mixer (gentle rotator mixer).
13. Magnetic separator, suitable for 1.5 mL Eppendorf tubes.

2.5 Software

1. Guppy, version 3.1.5 or later ([Link]com/): base-calling algorithm.
2. Minimap2, version 2.14-r886 ([Link]minimap2): mapping algorithm.
3. Samtools, version 0.1.19 ([Link]samtools): sorting and manipulation of BAM files.
4. Sam2tsv, version a779a30d6af485d9cd669aa3752465132cf21eec ([Link]):
conversion of BAM to plain text files and reorganization of the
read-reference alignment.
5. EpiNano, version 1.2 ([Link]EpiNano): extraction of base-calling “error” features from
FASTQ files and BAM/SAM alignment files, and optionally
current intensity information from Nanopolish eventalign
outputs.
6. Nanopolish, version 0.11.2 or later ([Link]nanopolish): extraction of event information from FAST5 files.

2.6 Datasets

1. Direct RNA sequencing data from S. cerevisiae polyA(+)-selected
RNA, both from WT and ΔIME4 strains, can be
found in SRA (SRP184486, FAST5) and GEO (code:
GSE126213, FASTQ).
2. Direct RNA sequencing data from S. cerevisiae 25S ribosomal
RNA, both from WT and snR34 knockout strains, can be
found in the EpiNano GitHub repository ([Link]
com/enovoa/EpiNano/tree/master/test_data/).
3. Direct RNA sequencing data from in vitro-transcribed syn-
thetic constructs (“curlcakes”) containing unmodified
(“unm”) as well as m6A-modified (“mod”) nucleosides can
be found in SRA (code: SRP174366, FAST5) and GEO
(code: GSE124309, FASTQ).

3 Methods

3.1 Preparation of Modified and Unmodified In Vitro Transcribed Constructs to Train EpiNano

3.1.1 Plasmid Transformation and Isolation

1. Set the water bath to 42 °C and place the competent E. coli
cells on ice.
2. Add 3 μL of the curlcake plasmid to 25 μL of competent
cells and mix well. Incubate on ice for 30–45 min.
3. Heat shock the cells at 42 °C for 45 s, and then incubate on ice
for 5 min.
4. Add 500 μL of warm SOC/SOB medium. Do not pipette;
incubate at 37 °C for 1 h with shaking at 220 rpm (use a
thermomixer).
5. Spread 200 μL of the transformation on agar plates containing
100 μg/mL ampicillin.
6. Incubate in a 37 °C incubator overnight.
7. The next day, pick a colony and inoculate it into 200 mL LB (with
100 μg/mL ampicillin) for overnight (O/N) culture (for Maxiprep).
8. Centrifuge the culture in 250 mL centrifuge vessels (either
disposable or reusable autoclaved ones) at 10,000 × g for 10 min
and isolate plasmid DNA using the Plasmid Maxi
Kit according to the manufacturer’s instructions, resuspending
the final DNA pellet in 500 μL RNase-free water.

3.1.2 Enzymatic Digestion of Plasmids and DNA Cleanup

1. Digest 10 μg DNA (see Note 1) in a 250 μL volume as outlined
below and incubate for 4 h (or O/N) at 37 °C.

BSA (100×)              2.5 μL
Cutsmart buffer (10×)   25 μL
BamHI-HF                1 μL (20 units)
EcoRV-HF                1 μL (20 units)
DNA                     x μL
dH2O                    Up to 250 μL

2. To clean up the plasmid DNA, add one volume of phenol:
chloroform:isoamyl alcohol ([Link]) to your sample, and
shake by hand thoroughly for approximately 20 s.
3. Centrifuge at room temperature for 5 min at 16,000 × g.
Carefully remove the upper aqueous phase and transfer it
to a fresh tube. Be sure not to carry over any phenol
during pipetting.
4. Repeat the two previous steps until no protein is visible at the
interface.
5. Mix an equal volume of chloroform with the aqueous phase.
Shake briefly and centrifuge at 12,000 × g for 3–5 min.
6. Mix the upper phase with 0.1× volume of sodium acetate, 2.5× volume of
absolute ethanol, and 1 μL glycogen or 2 μL Pellet Paint. Incubate
for 15 min at RT, overnight at −20 °C, or 1 h at −80 °C.
7. Centrifuge the sample at 4 °C for 30 min at 16,000 × g to
pellet the DNA.
8. Carefully remove the supernatant without disturbing the DNA
pellet. Add 150 μL of 70% ethanol.
9. Centrifuge the sample at 4 °C for 2 min at 16,000 × g. Carefully
remove the supernatant.
10. Allow the pellet to air-dry and resuspend the pellet in 25 μL of
RNase-free H2O.
11. Measure the DNA concentration using a Nanodrop or similar.
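
The “x μL” of DNA and the “up to 250 μL” of water in the digestion mix of step 1 depend on the plasmid concentration. A minimal sketch of that arithmetic is given below; the function name and the example concentration of 500 ng/μL are assumptions for illustration, while the fixed volumes are copied from the digestion table.

```python
def digest_volumes(dna_conc_ng_per_ul, dna_mass_ug=10.0, total_ul=250.0):
    """Return the volume (uL) of each component of the restriction digest."""
    fixed = {
        "BSA (100x)": 2.5,
        "Cutsmart buffer (10x)": 25.0,
        "BamHI-HF": 1.0,
        "EcoRV-HF": 1.0,
    }
    dna_ul = dna_mass_ug * 1000.0 / dna_conc_ng_per_ul  # ug -> ng, then ng / (ng/uL)
    water_ul = total_ul - dna_ul - sum(fixed.values())
    if water_ul < 0:
        raise ValueError("DNA is too dilute to fit 10 ug into the reaction volume")
    return {**fixed, "DNA": dna_ul, "dH2O": water_ul}

# Assumed example: a Maxiprep eluate measured at 500 ng/uL.
mix = digest_volumes(500.0)
for component, volume in mix.items():
    print(f"{component}: {volume:.1f} uL")
```

The check on negative water volume flags plasmid preparations that would need to be concentrated before digestion.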

3.1.3 Agarose Gel Electrophoresis to Confirm Plasmid Digestion

1. Dissolve 1 g agarose in 100 mL 1× TBE/TAE buffer in a
microwavable flask and microwave for 1–3 min until the agarose
is completely dissolved.
2. Let it cool on the benchtop for 5 min, add 10 μL GelRed
nucleic acid stain (10,000×), and mix.
3. Pour the mixture into the gel container and allow to set.
4. Mix 1 μL digested DNA sample with 1 μL loading dye (6×) and
4 μL nuclease-free water.
5. Load the DNA sample into the well and run the gel for 30 min
at 100 V.
6. Image the gel using a Bio-Rad gel imager or similar to ensure
that DNA is completely linearized.

3.1.4 In Vitro Transcription Using AmpliScribe T7-Flash Transcription Reaction

1. For each plasmid, set up an IVT reaction by combining the
following reaction components from the AmpliScribe™ T7
High Yield Transcription Kit with linearized DNA from the
step above at RT, in the order listed below (see Note 2).
Substitution of m6ATP in the place of ATP will result in the
generation of RNA containing m6A residues.

Component                                   Volume
RNase-free water                            Up to 20 μL
Linearized template DNA                     1 μg
AmpliScribe T7-Flash 10× reaction buffer    2 μL
100 mM DTT                                  2 μL
100 mM ATP (or m6ATP)                       1.8 μL
100 mM CTP                                  1.8 μL
100 mM GTP                                  1.8 μL
100 mM UTP                                  1.8 μL
RiboGuard RNase inhibitor                   0.5 μL
AmpliScribe T7-Flash enzyme solution        2 μL
Total                                       20 μL

2. Incubate the reaction for 4 h at 42 °C.
3. Add 2 μL of RNase-free DNase I to the reaction and incubate
for 20 min at 37 °C.
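
As a quick check on the IVT reaction table, the final concentration of each stock in the 20 μL reaction follows from C1 × V1 = C2 × V2. The sketch below applies that relation to the volumes listed in the table; the function and dictionary names are hypothetical and the calculation is only a convenience, not part of the kit instructions.

```python
def final_concentration(stock, added_ul, total_ul):
    """Dilution arithmetic C1*V1 = C2*V2; the result is in the units of `stock`."""
    return stock * added_ul / total_ul

TOTAL_UL = 20.0

# (stock concentration, volume added in uL), copied from the reaction table.
components = {
    "ATP or m6ATP (mM)": (100.0, 1.8),
    "CTP (mM)": (100.0, 1.8),
    "GTP (mM)": (100.0, 1.8),
    "UTP (mM)": (100.0, 1.8),
    "DTT (mM)": (100.0, 2.0),
    "reaction buffer (x)": (10.0, 2.0),
}

final = {
    name: final_concentration(stock, volume, TOTAL_UL)
    for name, (stock, volume) in components.items()
}
for name, value in final.items():
    print(f"{name}: {value:.1f}")
```

Each nucleotide therefore ends up at about 9 mM, DTT at 10 mM, and the buffer at 1×, whether ATP or m6ATP is used.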

3.1.5 RNA Cleanup Using RNeasy Qiagen Kit

1. Bring the volume of the IVT reaction to 100 μL with RNase-free
water.
2. Follow the step-by-step instructions of the RNeasy Kit according
to the manufacturer’s protocol.
3. To elute RNA, pipette 20 μL RNase-free water directly onto the
RNeasy Mini column membrane. Centrifuge for 15 min at
8000 × g to elute.
4. Add another 20 μL RNase-free water directly onto the RNeasy
Mini column membrane and centrifuge for 15 min at
8000 × g to elute.
5. Measure the quality/quantity of the eluate.

3.1.6 PolyA Tailing

1. Mix the following reagents to proceed with the polyA tailing
reaction:

Reagent                                          Volume
Purified RNA (sample from step 3.1.5 above)      1–10 μg in 15.5 μL nuclease-free water
RNase inhibitor (SUPERase In)                    0.5 μL
10× E. coli poly(A) polymerase reaction buffer   2 μL
ATP (10 mM)                                      1 μL
E. coli poly(A) polymerase                       1 μL
Total                                            20 μL

2. Incubate the reaction at 37 °C for 20 min and proceed immediately
to cleanup.

3.1.7 Bead Cleanup of RNA Using RNAClean XP Beads

1. Vortex the RNAClean XP bead stock until homogeneous and
add 36 μL (1.8×) RNAClean beads to the RNA sample.
2. Mix gently by pipetting up and down 10 times and incubate for
5 min at RT.
3. Place the reaction on the magnet and let it settle for 5–10 min.
4. Slowly aspirate the solution and discard.
5. Add freshly prepared 70% ethanol and incubate for 30 s at RT. Remove
ethanol completely and air-dry for 2 min.
6. Add 20 μL water and pipette the beads up and down. Incubate
for 5 min at RT.
7. Use a magnetic stand to separate the beads from the RNA.
Transfer the RNA solution into a new tube.
8. Measure the quality and quantity of the RNA and confirm the
polyA tailing.
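
The 36 μL of beads in step 1 is 1.8 times the 20 μL polyA tailing reaction. The short sketch below makes that ratio explicit so the bead volume can be recomputed if the reaction is scaled; the function names are hypothetical, and the component volumes are those of the polyA tailing table.

```python
def sample_volume(component_volumes_ul):
    """Total reaction volume (uL) from its component volumes."""
    return sum(component_volumes_ul)

def bead_volume(sample_ul, ratio):
    """Volume of bead suspension (uL) for a given bead-to-sample ratio."""
    return sample_ul * ratio

# PolyA tailing reaction: RNA in water, RNase inhibitor, buffer, ATP, polymerase.
polya_reaction_ul = sample_volume([15.5, 0.5, 2.0, 1.0, 1.0])
beads_ul = bead_volume(polya_reaction_ul, 1.8)

print(f"{polya_reaction_ul:.1f} uL sample -> {beads_ul:.1f} uL beads")
```

Keeping the ratio as an explicit parameter also makes it easy to see when a later cleanup in the protocol uses a different bead-to-sample ratio.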

3.1.8 Quality Check of PolyA-Tailed RNAs Using TapeStation

1. Load both non-polyA-tailed and polyA-tailed IVT constructs
into the TapeStation (see Note 3) according to the manufacturer’s
instructions. Expected results are displayed in Fig. 1.

3.2 Direct RNA Sequencing Library Preparation

3.2.1 Preparing Input RNA

1. Pool the four curlcake constructs (unmodified and m6A-modified
separately) in a DNA LoBind tube, adding 200 ng of
each (800 ng total).
2. Adjust the volume to 9 μL with nuclease-free water and mix
thoroughly by inversion.
3. Spin down briefly in a microfuge.
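
The pooling in steps 1 and 2 amounts to taking 200 ng of each construct and topping up to 9 μL. The following sketch works out the volumes from measured concentrations; the function name and the four example concentrations are invented for illustration and are not values from the protocol.

```python
def pooling_volumes(conc_ng_per_ul, mass_ng=200.0, final_ul=9.0):
    """Return (volume per construct in uL, water to add in uL) for an equal-mass pool."""
    volumes = {name: mass_ng / conc for name, conc in conc_ng_per_ul.items()}
    water_ul = final_ul - sum(volumes.values())
    if water_ul < 0:
        raise ValueError("pooled RNA exceeds the final volume; the RNA is too dilute")
    return volumes, water_ul

# Invented Qubit readings (ng/uL) for the four polyA-tailed constructs.
concentrations = {
    "Curlcake1": 200.0,
    "Curlcake2": 100.0,
    "Curlcake3": 400.0,
    "Curlcake4": 160.0,
}
volumes, water_ul = pooling_volumes(concentrations)

for name, volume in volumes.items():
    print(f"{name}: {volume:.2f} uL")
print(f"nuclease-free water: {water_ul:.2f} uL")
```

The same calculation is run once for the unmodified set and once for the m6A-modified set, since the two pools are prepared separately.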

3.2.2 Adapter Ligation

1. In a 0.2 mL thin-walled PCR tube, mix the reagents in the
following order, including some components from the Direct
RNA sequencing kit:
[Fig. 1 image: TapeStation electronic gel with lanes for the ladder and for the CC1 unmodified and m6A products, each without and with polyA tail; ladder marks at 6000, 4000, 2000, 1000, 500, 200, and 25]

Fig. 1 TapeStation image of the Curlcake1 (CC1) IVT products, which were in vitro transcribed with or without
m6A modifications, before and after polyA tailing. Ladder illustrates the size of distinct bands on the electronic
gel. Each IVT product shows increased size upon polyA tailing

Reagent                                      Volume (μL)
5× NEBNext Quick ligation reaction buffer    3
RNA                                          9
RT adapter (RTA)                             1
Concentrated T4 DNA ligase                   1.5
RNase inhibitor (SUPERase)                   0.5
Total                                        15

2. Mix by pipetting and spin down.
3. Incubate the reaction for 10 min at RT. In the meantime,
proceed to the reverse transcription step.

3.2.3 Reverse Transcription and Cleanup

1. Mix the following reagents together to make the reverse transcription
master mix:

Reagent                   Volume (μL)
Nuclease-free water       9
10 mM dNTPs               2
5× first-strand buffer    8
0.1 M DTT                 4
Total                     23

2. Add the master mix to the 0.2 mL PCR tube containing the RT
adapter-ligated RNA from the adapter ligation step
above. Mix by pipetting.
3. Add 2 μL SuperScript III reverse transcriptase to the reaction
and mix by pipetting.
4. Place the tube in a thermal cycler, incubate at 50 °C for 50 min
and 70 °C for 10 min, and bring the sample to 4 °C before
proceeding to the next step.
5. Transfer the sample to a 1.5 mL DNA LoBind Eppendorf tube.
6. Resuspend the stock of Agencourt RNAClean XP beads by
vortexing, add 72 μL beads to the reverse transcription reaction,
and mix by pipetting.
7. Incubate on a Hula Mixer for 5 min at RT.
8. Prepare 200 μL of fresh 70% ethanol in nuclease-free water.
9. Spin down the sample and pellet on a magnet.
10. Keep the tube on the magnet and wash the beads with 150 μL
70% ethanol without disturbing the pellet.
11. Remove the ethanol and discard. Spin down tubes, place back
on the magnetic rack, and remove any residual ethanol.
12. Remove the tube from the magnetic rack and resuspend the
pellet in 20 μL nuclease-free water. Incubate for 5 min at RT.
13. Pellet the beads on the magnet until the eluate is clear and
colorless.
14. Pipette 20 μL of the eluate into a clean 1.5 mL Eppendorf DNA
LoBind tube.
15. Measure cDNA and RNA on a Qubit or similar.

3.2.4 RMX Adapter Ligation and Cleanup

1. In a clean 1.5 mL Eppendorf DNA LoBind tube, mix the
reagents in the following order:

Reagent                                                         Volume (μL)
Reverse-transcribed RNA from the “reverse transcription” step   20
5× NEBNext Quick ligation reaction buffer                       8
RNA adapter (RMX)                                               6
Nuclease-free water                                             3
Concentrated T4 DNA ligase                                      3

2. Mix by pipetting and incubate for 10 min at RT.
3. Resuspend the stock of Agencourt RNAClean XP beads by
vortexing, add 40 μL of beads to the adapter ligation reaction,
and mix by pipetting.
4. Incubate on a Hula Mixer for 5 min at RT.
5. Spin down the sample and pellet on a magnet. Keep the tube
on the magnet and pipette off the supernatant.
6. Add 150 μL of the wash buffer (WSB) provided in the Direct
RNA sequencing kit to the beads. Resuspend the beads by
flicking the tube. Return the tube to the magnetic rack, allow
beads to pellet, and pipette off the supernatant. Repeat. Allow
to air-dry for 2 min.
7. Remove the tube from the magnetic rack and resuspend the
pellet in 21 μL elution buffer. Incubate for 10 min at RT.
8. Pellet the beads on the magnet until the eluate is clear and colorless.
9. Remove and retain 21 μL of eluate in a clean 1.5 mL Eppendorf
DNA LoBind tube.
10. Measure cDNA and RNA on a Qubit or similar.
11. Mix the library with 17.5 μL water.
12. Add 37.5 μL RRB buffer (mix RRB by vortexing before using)
and mix well. The library is now ready to be loaded onto the flow cell
(see Note 4).

3.3 Analysis of Direct RNA Sequencing Datasets: Base-Calling and Mapping

3.3.1 Base-Calling

Base-calling should be performed using Oxford Nanopore Technologies’
Guppy base-caller, such as in the example shown below
(see Note 5):

guppy_basecaller --device cuda:0 -c rna_r9.4.1_70bps_hac.cfg \
  --compress_fastq -i path/to/fast5_directory -r \
  -s /path/to/save/basecalling_out --fast5_out

3.3.2 Mapping

Map the base-called reads to the reference FASTA sequences using
minimap2 [25], and keep only the mapped reads:

minimap2 --MD -t 6 -ax map-ont [Link] [Link] |
  samtools view -hbS -F 3844 - | samtools sort -@ 6 - sample.reads

Alternatively, reads can also be aligned to a reference genome
using the following command (see Note 6):

minimap2 --MD -t 6 -ax splice -k14 -uf [Link] [Link].fastq |
  samtools view -hbS -F 3844 - | samtools sort -@ 6 -
[Link]
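
The value 3844 passed to “samtools view -F” in both commands is a sum of SAM flag bits; any read carrying one of those bits is discarded. The sketch below decodes it using the bit definitions of the SAM format; the helper names are hypothetical and the snippet does not call samtools itself.

```python
# Flag bits as defined in the SAM format specification.
SAM_FLAGS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped", 0x8: "mate_unmapped",
    0x10: "reverse", 0x20: "mate_reverse", 0x40: "read1", 0x80: "read2",
    0x100: "secondary", 0x200: "qc_fail", 0x400: "duplicate", 0x800: "supplementary",
}

def decode_flag(flag):
    """List the names of all bits set in a SAM flag value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]

def passes_filter(read_flag, exclude=3844):
    """Mimic 'samtools view -F exclude': keep reads with none of the excluded bits."""
    return (read_flag & exclude) == 0

excluded = decode_flag(3844)
print(excluded)
print(passes_filter(16), passes_filter(256))
```

In other words, the filter keeps primary alignments on either strand and drops unmapped, secondary, QC-failed, duplicate, and supplementary records.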

3.4 Extraction of Features to Detect RNA Modifications in Direct RNA Sequencing Datasets Using EpiNano

The latest version of the EpiNano suite (version 1.2) consists of five
main programs or modules:

– Epinano_Variants computes systematic base-calling “errors”
(mismatch, deletion, insertion, and per-base quality score) for
each base along the mapped reads and reports their relative
frequencies in a plain text file.
– Epinano_Current extracts raw current intensity values and dwell
time for each reference base.
– Epinano_Predict trains models using the features extracted with
the aforementioned two modules and makes predictions using
trained models.
– Epinano_DiffErr predicts RNA modifications based on the differences
in base-calling “errors” between two samples (typically
wild type and knockout).
– Epinano_Plot produces scatterplots or barplots depicting the
differences in base-calling “errors” or modification probabilities
between two samples, highlighting positions that are identified
by the algorithm as significantly altered, i.e., predicted as
differentially modified.
EpiNano 1.2 can predict RNA modifications from direct RNA
sequencing datasets using two distinct strategies: (1) EpiNano-
SVM, which employs pre-trained SVM models to predict RNA
modifications, and (2) EpiNano-Error, which uses the differences
between base-calling “errors” (mismatches, deletions, insertions)
between two samples, as well as alterations in per-base qualities, to
predict RNA modifications (Fig. 2). Both strategies rely on the fact
that RNA modifications appear as systematic base-calling “errors”
in direct RNA sequencing datasets.

[Fig. 2 diagram. Feature extraction: Epinano_Variants (per-site and per-kmer extraction of quality, insertion, deletion, and mismatch) and Epinano_Current (per-site and per-kmer extraction that additionally includes current intensity). Prediction of RNA-modified sites: Epinano_DiffError (compute sum of errors, outlier detection) and Epinano_Predict (train SVM models; predict RNA mods using pre-trained models, ProbM). Epinano_Plot: per-transcript plots (WT-KO) and feature scatterplots (WT vs KO).]

Fig. 2 Overview of the five main modules included in the EpiNano 1.2 suite. The latest version of EpiNano can
predict RNA modifications using two distinct strategies: (1) EpiNano-Error, which detects RNA modifications
using differential base-calling “errors” that are detected between two samples (typically wild type and
knockout), and (2) EpiNano-SVM, which detects RNA modifications using support vector machines (SVM),
where the SVM models have been pre-trained with modified and unmodified datasets

3.4.1 Extraction of Base-Calling “Error” Features Using EpiNano

1. Clone the EpiNano repository from GitHub ([Link]
com/enovoa/EpiNano) using git:

git clone [Link] git

2. Extract base-calling errors using the module Epinano_Variants.
This module relies on sam2tsv from the jvarkit toolkit
([Link] to extract base qualities
and compute variant frequencies from direct RNA sequencing
data. The user must provide as input both a BAM file containing
the mapped reads and a reference transcriptome or genome
in FASTA format. The $EPINANO_HOME variable corresponds
to the location of the EpiNano script folder:

python $EPINANO_HOME/Epinano_Variants.py -t 6 -R reference.fasta -b [Link] -s /path/to/sam2tsv/[Link] --type g

The “--type” flag indicates the type of reference that was used
to obtain the bam file. If the reads were mapped to a genome
reference with splicing-aware mapping options, “--type g”
should be specified, and EpiNano will discriminate the reads
mapped to the forward strand from those mapped to the reverse
strand. Otherwise, by default, the script assumes that the bam file
was generated by mapping the reads to a reference transcriptome and
that the reads should only be mapped to the forward strand.
Epinano_Variants outputs two feature tables: (1) sample.
[Link], which contains base-calling “error” information
for each reference position, and (2) sample.per_si-
[Link], which contains the same base-called features
organized in sliding 5-mer windows (see Note 7).
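
To make the two output tables more concrete, the sketch below computes per-site "error" frequencies from toy observations and then arranges them in sliding 5-mer windows. It is an illustration only: the function names, the input layout, and the normalization (counts divided by the number of observations at a site) are assumptions made for this example and are not taken from the Epinano_Variants source code.

```python
from collections import Counter


def per_site_frequencies(observations):
    """Turn (position, outcome) pairs into per-site frequencies.

    outcome is one of "match", "mismatch", "deletion", "insertion".
    Frequencies are normalized by the number of observations per site
    (an assumption of this sketch).
    """
    counts = {}
    for pos, outcome in observations:
        counts.setdefault(pos, Counter())[outcome] += 1
    table = {}
    for pos in sorted(counts):
        cov = sum(counts[pos].values())
        table[pos] = {
            "cov": cov,
            "mis": counts[pos]["mismatch"] / cov,
            "del": counts[pos]["deletion"] / cov,
            "ins": counts[pos]["insertion"] / cov,
        }
    return table


def sliding_kmer_windows(reference, table, k=5):
    """Arrange per-site mismatch frequencies in overlapping k-mer windows."""
    windows = []
    for start in range(len(reference) - k + 1):
        positions = range(start + 1, start + k + 1)  # 1-based coordinates
        if all(p in table for p in positions):
            windows.append({
                "kmer": reference[start:start + k],
                "window": f"{start + 1}-{start + k}",
                "mis": [table[p]["mis"] for p in positions],
            })
    return windows


# Toy data: 4 reads cover a 6-nt reference; 2 of them mismatch at position 3.
reference = "GGACTA"
observations = []
for pos in range(1, 7):
    for read in range(4):
        outcome = "mismatch" if (pos == 3 and read < 2) else "match"
        observations.append((pos, outcome))

table = per_site_frequencies(observations)
windows = sliding_kmer_windows(reference, table)
```

In the first window (GGACT) the affected site is the middle base, so its mismatch frequency would correspond to the "mis3" feature used later in this chapter; in the second window (GACTA) the same site is the second base.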

3.4.2 Extraction of Current Intensity Values Using EpiNano

The latest version of EpiNano (v.1.2) relies on the use of Nanopolish
[26] to extract the current signal-level information and collapse it
on a single-position basis. We offer a custom bash script, which
carries out Nanopolish’s eventalign function and further collapses
the current intensity and dwell time values. Epinano_Current
will produce a file of per-position results consisting of raw current
intensity values and their corresponding mean, median, and standard
deviations as well as a second file with results organized in
5-mer windows:

sh $EPINANO_HOME/misc/Epinano_Current.sh -b [Link] -r [Link] -f [Link] -t 6 -m g -d fast5_folder/

Finally, we can merge both variants and current features using
the following script:

python $EPINANO_HOME/misc/Join_variants_currents.py
--variants sample.per_site_var.[Link]
--intensity sample.per_site_current.[Link]
--outfile sample.5mer.all_features.csv
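
The collapsing and merging steps described in this section can be sketched as follows. This is a simplified illustration under stated assumptions: the event tuples, the column names, and the functions collapse_events and join_features are hypothetical and do not reproduce the Nanopolish eventalign output format or the EpiNano scripts.

```python
import statistics
from collections import defaultdict


def collapse_events(events):
    """Collapse per-read events into per-position summary statistics.

    events: iterable of (reference_id, position, current_pA, dwell_s).
    """
    grouped = defaultdict(list)
    for ref_id, pos, current, dwell in events:
        grouped[(ref_id, pos)].append((current, dwell))
    summary = {}
    for key, values in grouped.items():
        currents = [c for c, _ in values]
        dwells = [d for _, d in values]
        summary[key] = {
            "n_events": len(values),
            "mean_current": statistics.mean(currents),
            "median_current": statistics.median(currents),
            # The sample standard deviation needs at least two values.
            "stdev_current": statistics.stdev(currents) if len(currents) > 1 else 0.0,
            "mean_dwell": statistics.mean(dwells),
        }
    return summary


def join_features(variants, currents):
    """Inner join of two per-position tables keyed by (reference_id, position)."""
    return {key: {**variants[key], **currents[key]}
            for key in variants if key in currents}


events = [
    ("tx1", 10, 100.0, 0.01),
    ("tx1", 10, 110.0, 0.03),
    ("tx1", 10, 120.0, 0.02),
    ("tx1", 11, 90.0, 0.02),
]
currents = collapse_events(events)
variants = {("tx1", 10): {"mis": 0.5}, ("tx1", 99): {"mis": 0.0}}
merged = join_features(variants, currents)
```

Only positions present in both tables survive the join, which is why position 99 (no current data) is absent from the merged result.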

3.5 Predicting RNA Modifications In Vivo Using Trained SVM Models (EpiNano-SVM)

3.5.1 Train EpiNano Models

1. Label and merge the “modified” and “unmodified” datasets.
To train a model, we first have to label the files containing
EpiNano-extracted features that will be used for training the
model (“mod.per_site.[Link]” and “unm.per_si-
[Link]”) by adding the corresponding labels (“mod”
and “unm”), as shown below:

bash $EPINANO_HOME/misc/Epinano_LabelSamples.sh -m mod.per_-
[Link] -u unm.per_site.[Link] -o combined.per_site_raw_feature.[Link]

2. Train the model using EpiNano. Epinano_Predict is the
module to train EpiNano models using features that have
been previously extracted using either Epinano_Variants
or Epinano_Current, and is executed as shown below:

python $EPINANO_HOME/Epinano_Predict.py
--train combined.per_site_raw_feature.[Link]
--predict combined.per_site_raw_feature.[Link]
--accuracy_estimation --out_prefix train_and_test
--columns 8,13,23 --modification_status_column 26

While the user can choose to train the algorithm with one
sample (--train) and test it on an independent sample (--predict),
it is also possible to use the same input file both for training
and testing the model, as depicted in the example above. In this
scenario, Epinano_Predict will train the models with 50% of the
input data and make predictions with the remaining 50% of
the data.
In the above command, “--columns” denotes the column
indexes of features that are used for training models (in this case,
corresponding to “q3,” “mis3,” and “del3”), while
“--modification_status_column” indicates the index of the column of
prior knowledge of the modification statuses, i.e., the labels “mod”
and “unm.” Switching on --accuracy_estimation will report
the accuracy of the trained model(s). Unless “--kernel” is used,
Epinano_Predict will train models with multiple kernels. Finally,
the user can visualize the accuracy of their trained models in the
form of receiver operating characteristic (ROC) curves (Fig. 3).
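
For readers who wish to compute ROC curves and the area under the curve (AUC) themselves, the following minimal sketch shows the calculation from modification probabilities and known labels. The function names and the toy data are assumptions made for illustration; this is not the code used by Epinano_Predict, and it assumes that both classes ("mod" and "unm") are present.

```python
def roc_points(labels, scores):
    """Sweep the decision threshold and return (FPR, TPR) points.

    labels: "mod" or "unm" for each site; scores: predicted P(modified).
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for label in labels if label == "mod")
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Sites with identical scores cross the threshold together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == "mod":
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = ["mod", "mod", "unm", "mod", "unm", "unm"]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
area = auc(roc_points(labels, scores))
```

In this toy example, 8 of the 9 possible modified/unmodified pairs are ranked correctly, so the AUC is 8/9.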

3.5.2 Predict RNA Modifications Using Trained SVM Models

Epinano_Predict.py can predict RNA modifications on a given
dataset using previously trained EpiNano models (specified with
“--model”). In the example below, we employ a previously trained
model “[Link]” that will
predict m6A modifications in RRACH k-mers on a dataset that is
specified with “--predict” (see Note 8). This SVM model has
been trained on RRm6ACH and RRACH k-mers produced using
in vitro transcription, using the steps described in Sect. 3.4. The
features used to train the model were q3, mis3, and del3, which
correspond to the per-base quality, mismatch frequency, and deletion
frequency of the middle position of the k-mer. It is important
to note that a given model should only be used to predict modifications
on the same set of k-mers that were used to train the model;
that is, if the model is trained on GGACA k-mers, it should only be
used to predict m6A modifications on GGACA k-mers (see Note 9):

python $EPINANO_HOME/Epinano_Predict.py
--model $EPINANO_HOME/models/[Link]-
[Link]
--predict sample.per_site.[Link]
--columns 8,13,23
--out_prefix sample_mod_prediction
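
Because a model should only be applied to the k-mers on which it was trained, it can be useful to list the candidate sites before prediction. The sketch below (a hypothetical helper, not part of EpiNano) locates RRACH 5-mers in a reference sequence using the IUPAC codes R = A/G and H = A/C/U (T in a DNA-alphabet FASTA), and reports the 1-based position of the central adenosine:

```python
import re

# A lookahead is used so that overlapping RRACH 5-mers are all reported.
RRACH = re.compile(r"(?=([AG][AG]AC[ACTU]))")


def find_rrach_sites(sequence):
    """Return (1-based position of the central A, 5-mer) for each RRACH match."""
    seq = sequence.upper()
    return [(m.start() + 3, m.group(1)) for m in RRACH.finditer(seq)]


sites = find_rrach_sites("TTGGACTAAACAGGACA")
ggaca_only = [site for site in sites if site[1] == "GGACA"]
print(sites)
```

The second list keeps only GGACA sites, as would be appropriate for a model trained exclusively on GGACA k-mers.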

Fig. 3 ROC curves depicting the modification detection performance using in vitro test data (not used for
training) which includes 263 RRACH k-mers. (a) All three SVM models were trained with the same subset of
features (q3, mis3, and del3) described previously [21], using three different kernels (linear, poly and rbf). (b)
The models were trained using the difference of these features (Δq3, Δmis3, and Δdel3) between the
modified and unmodified positions, in a pairwise manner. AUC represents the area under the curve

EpiNano relies on systematic base-calling “errors” caused by
the presence of RNA modifications, and as such, it can be confounded
by other base-calling “errors” that are present in the data but are
unrelated to modifications, leading to a high number of false
positives (see Note 10). To remove the
false positives, we recommend coupling the sequencing run of
interest to a knockout or knockdown condition. For example, if
the user is sequencing the RNA of a given cell line and is interested
in detecting m6A modifications, they should also sequence the
matched METTL3 knockdown condition to remove false-positive
predictions. In addition, if EpiNano is run in “paired” mode, it can
be used with pre-trained SVM models that rely on the differences in
the base-calling “errors” observed between two samples (e.g.,
WT-KO) to predict the RNA modifications (Fig. 4), which are
available as part of the EpiNano 1.2 release.
Here we tested the performance of EpiNano 1.2 in S. cerevisiae
wild type and ime4Δ direct RNA sequencing datasets, which are
publicly available (see Note 11). The performance of EpiNano on
known m6A-modified sites from in vivo yeast mRNAs is depicted in
Fig. 4, using two different pre-trained models, which are available
in the EpiNano GitHub repository. We find that SVM models
trained on base-calling “error” differences perform slightly better
than those relying on absolute base-calling feature values.
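
The feature differences used by the “paired” models can be illustrated with a short sketch. The table layout, the function name delta_features, and the minimum-coverage filter of 20 reads (in the spirit of Note 10) are assumptions made for this example rather than EpiNano's actual implementation:

```python
def delta_features(wt, ko, features=("q3", "mis3", "del3"), min_cov=20):
    """Compute WT minus KO feature differences for sites seen in both samples.

    wt, ko: dicts keyed by (reference_id, position); each value holds the
    coverage ("cov") and the per-site features.
    """
    deltas = {}
    for key in sorted(set(wt) & set(ko)):
        # Skip poorly covered sites, where frequencies are unreliable.
        if wt[key]["cov"] < min_cov or ko[key]["cov"] < min_cov:
            continue
        deltas[key] = {"delta_" + f: wt[key][f] - ko[key][f] for f in features}
    return deltas


wt = {
    ("tx1", 100): {"cov": 50, "q3": 8.0, "mis3": 0.5, "del3": 0.25},
    ("tx1", 200): {"cov": 10, "q3": 9.0, "mis3": 0.25, "del3": 0.0},
}
ko = {
    ("tx1", 100): {"cov": 40, "q3": 12.0, "mis3": 0.125, "del3": 0.0},
    ("tx1", 200): {"cov": 45, "q3": 12.0, "mis3": 0.0, "del3": 0.0},
}
deltas = delta_features(wt, ko)
```

Here position 100 shows a lower base quality and higher mismatch and deletion frequencies in the wild type than in the knockout, whereas position 200 is excluded because its wild-type coverage is too low.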

Fig. 4 ROC curve depicting the performance of m6A modification detection in in vivo data. All shown models
were trained with a linear kernel but using distinct features. Raw features across replicates were combined as
previously described [21] (see Note 12)

3.6 Predicting RNA Modifications In Vivo from Base-Calling “Error” Differences (EpiNano-Error)

In Sect. 3.5 we have showcased the use of base-calling “error”
features to train support vector machine (SVM) models to predict
m6A RNA modifications, using the Epinano_Predict module,
which was the original strategy employed by EpiNano to predict
RNA modifications [21]. In the EpiNano 1.2 suite, we now provide
a new module, Epinano_DiffErr, that can predict RNA modifications
by identifying those positions that show differential base-calling
“errors” (previously extracted using Epinano_Variants) when
comparing two samples (e.g., a wild-type and knockout condition):

Rscript $EPINANO_HOME/Epinano_DiffErr.R -k [Link].csv -w [Link] -d 0.1 -t 3 -p -o diffErr -f mis

[Fig. 5 plot. Title: “Summed Error Linear Regression”; x-axis: Summed Error (KO); y-axis: Summed Error (WT); both axes range from 0.00 to 1.00; labeled positions: 2826, 2827, 2877, 2880, 2882, 2883, and 2885.]
Fig. 5 Scatterplots and bar plots generated by Epinano_Plot. The RNA-modified sites that are known to be
affected by the knockout are positions 2826 and 2880. These plots can be generated using test data from
$EPINANO_HOME/test_data

In the example above, Epinano_DiffErr predicts RNA modifications
using mismatch frequency differences (-f mis). We
should note that distinct RNA modification types affect distinct
base-calling “error” features, and therefore, the user should choose
whichever feature is most affected by the RNA modification type
that is being studied. Epinano_DiffErr also offers the possibility
of using the combination of all base-calling “errors” simultaneously
(-f sum_err).
To identify RNA-modified sites, Epinano_DiffErr relies on
two metrics. The first one is based on the z-score deviance of error
frequencies between two samples. The second relies on fitting a
linear regression model between the features of the two samples,
and modifications are then determined by identifying data points
with significant residuals (inferred from z-scores or as Bonferroni-corrected
p-values through t-test of the studentized residuals). The
thresholds for these two metrics can be adjusted by the user using
parameters -t and -d, respectively.
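
The two ideas behind these metrics can be sketched in a few lines. The code below is a simplified illustration only: it flags outliers using plain z-scores of the WT minus KO differences and of ordinary least-squares residuals, whereas the module itself is written in R and also offers studentized residuals with Bonferroni correction. The function names, thresholds, and toy data are assumptions.

```python
import statistics


def zscore_outliers(wt, ko, threshold=3.0):
    """Indices whose WT-KO error difference deviates strongly from the mean."""
    diffs = [w - k for w, k in zip(wt, ko)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        return []
    return [i for i, d in enumerate(diffs) if abs(d - mean) / sd > threshold]


def regression_outliers(wt, ko, threshold=3.0):
    """Indices with large standardized residuals in a WT ~ KO linear fit."""
    mean_x = statistics.mean(ko)
    mean_y = statistics.mean(wt)
    sxx = sum((x - mean_x) ** 2 for x in ko)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ko, wt))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(ko, wt)]
    sd = statistics.stdev(residuals)
    if sd == 0:
        return []
    return [i for i, r in enumerate(residuals) if abs(r) / sd > threshold]


# Toy data: 30 sites with similar error frequencies in both samples,
# except site 12, which has many more errors in the wild type.
ko = [0.05 + 0.01 * (i % 5) for i in range(30)]
wt = [k + 0.002 * ((i % 3) - 1) for i, k in enumerate(ko)]
wt[12] += 0.6

print(zscore_outliers(wt, ko), regression_outliers(wt, ko))
```

Both approaches flag only site 12 in this example; on real data the two metrics can disagree, which is one reason to inspect the resulting plots.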

While Epinano_DiffErr can only identify RNA-modified
sites that are changing between the two conditions/samples
ied (i.e., it cannot predict RNA modifications “de novo”), it can be
applied, in principle, to any RNA modification type—as long as the
RNA modification type affects the current intensity and/or base-
called features.
Finally, the user can choose to visualize the predicted RNA-
modified sites using the Epinano_Plot module:

Rscript $EPINANO_HOME/Epinano_Plot.R [Link]-[Link]

This module takes as input a comma-separated file with predictions,
such as the one generated by Epinano_DiffErr in the
previous step, and will highlight the predicted sites in scatterplots
or bar plots (Fig. 5).

4 Notes

1. DNA volume should never be more than 25% of the total
volume. Enzyme volume should never be more than 10% of
total volume.
2. If the reaction is prepared at a colder temperature, a cloudy
solution will appear, which indicates the precipitation of sper-
midine and DTT.
3. A bioanalyzer can also be used instead of a TapeStation.
4. Keep the library on ice if it is not immediately loaded.
5. New versions of Guppy base-caller are released every few
months. The Guppy base-caller is available from the Nanopore
community ([Link] The
current code and downstream examples used in this chapter
correspond to Guppy version 3.1.5.
6. We recommend that users try different aligners and alignment
parameters to find an optimized approach to read mapping.
7. If the reference FASTA used is large (i.e., not a small set of
synthetic sequences such as the “curlcakes”), we recommend
splitting the dataset into smaller subsets. This will greatly
reduce the computation time and required memory.
8. This model, which has been trained on RRACH k-mers base-
called with Guppy 3.1.5, is available in the EpiNano GitHub
repository ([Link]
9. There is a strong dependency between base-calling “errors”
and sequence context. Thus, if the user chooses to train on
diverse k-mers simultaneously, we recommend minimizing the
diversity of the k-mers included in the training.

10. False positives can also be caused by low coverage. RNA mod-
ifications should be predicted on sites with high coverage. We
recommend a minimum coverage of 20–30 reads for a k-mer to
be included in the analysis.
11. FAST5 data used in this work are the same as in [21] and can
be obtained from the SRA database through the accession code
SRP174366. Intermediate datasets used for this chapter can be
found at [Link]
EpinanoBookChapter.
12. Raw features across the three replicates were merged as previously
described [21] using the following pseudo-code:

if (probM1 >= 0.5 and probM2 >= 0.5 and probM3 >= 0.5):
    probM = 1
else:
    probM = (probM1 + probM2 + probM3) / 3
if (probM_wt / probM_ko > 1.5):
    prediction = 'mod'
else:
    prediction = 'unm'
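
The pseudo-code in Note 12 can be turned into a small runnable function, sketched below. Two points are assumptions of this sketch and not statements from the source: the comparison operator, which was lost in typesetting, is taken to be "greater than or equal to", and a knockout probability of zero is handled explicitly to avoid a division by zero. The function names are hypothetical.

```python
def merge_replicates(probs, threshold=0.5):
    """Merge per-replicate modification probabilities into a single value.

    If every replicate reaches the threshold, the site is given probability 1;
    otherwise the replicate probabilities are averaged.
    """
    if all(p >= threshold for p in probs):  # ">=" is an assumption
        return 1.0
    return sum(probs) / len(probs)


def call_site(probs_wt, probs_ko, min_ratio=1.5):
    """Label a site 'mod' when the WT/KO probability ratio exceeds min_ratio."""
    prob_wt = merge_replicates(probs_wt)
    prob_ko = merge_replicates(probs_ko)
    if prob_ko == 0:
        # Assumption: with no KO signal, any WT signal counts as modified.
        return "mod" if prob_wt > 0 else "unm"
    return "mod" if prob_wt / prob_ko > min_ratio else "unm"


print(call_site([0.9, 0.8, 0.7], [0.2, 0.3, 0.1]))
print(call_site([0.6, 0.4, 0.5], [0.5, 0.4, 0.6]))
```

The first site is called modified because all wild-type replicates support it and the knockout signal is weak; the second is not, because wild type and knockout are indistinguishable.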

Acknowledgments

We thank all members of the Novoa lab for their valuable insights
and discussion. We thank Rebeca Medina for obtaining the TapeS-
tation image used for Fig. 1. OB is supported by an international
PhD fellowship (UIPA) from the University of New South Wales.
This work was supported by the Australian Research Council
(DP180103571 to EMN) and the Spanish Ministry of Economy,
Industry and Competitiveness (MEIC) (PGC2018-098152-A-100
to EMN). We acknowledge the support of the MEIC to the EMBL
partnership, Centro de Excelencia Severo Ochoa, and CERCA
Program/Generalitat de Catalunya.

References
1. Cohn WE, Volkin E (1951) Nucleoside-5′-phosphates from ribonucleic acid. Nature 167:483–484
2. Adams JM, Cory S (1975) Modified nucleosides and bizarre 5′-termini in mouse myeloma mRNA. Nature 255:28–33
3. Desrosiers R, Friderici K, Rottman F (1974) Identification of methylated nucleosides in messenger RNA from Novikoff hepatoma cells. Proc Natl Acad Sci U S A 71:3971–3975
4. Dubin DT, Taylor RH (1975) The methylation state of poly A-containing messenger RNA from cultured hamster cells. Nucleic Acids Res 2:1653–1668
5. Perry RP, Kelley DE, Friderici K, Rottman F (1975) The methylated constituents of L cell messenger RNA: evidence for an unusual cluster at the 5′ terminus. Cell 4:387–394
6. Jia G, Fu Y, Zhao X, Dai Q, Zheng G, Yang Y, Yi C, Lindahl T, Pan T, Yang Y-G, He C (2011) N6-methyladenosine in nuclear RNA is a major substrate of the obesity-associated FTO. Nat Chem Biol 7:885–887
7. Zheng G, Dahl JA, Niu Y, Fedorcsak P, Huang C-M, Li CJ, Vågbø CB, Shi Y, Wang W-L, Song S-H, Lu Z, Bosmans RPG, Dai Q, Hao Y-J, Yang X, Zhao W-M, Tong W-M, Wang X-J, Bogdan F, Furu K, Fu Y, Jia G, Zhao X, Liu J, Krokan HE, Klungland A, Yang Y-G, He C (2013) ALKBH5 is a mammalian RNA demethylase that impacts RNA metabolism and mouse fertility. Mol Cell 49:18–29
8. Dominissini D, Moshitch-Moshkovitz S, Schwartz S, Salmon-Divon M, Ungar L, Osenberg S, Cesarkas K, Jacob-Hirsch J, Amariglio N, Kupiec M, Sorek R, Rechavi G (2012) Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature 485:201–206
9. Meyer KD, Saletore Y, Zumbo P, Elemento O, Mason CE, Jaffrey SR (2012) Comprehensive analysis of mRNA methylation reveals enrichment in 3′ UTRs and near stop codons. Cell 149:1635–1646
10. Hu Y, Ouyang Z, Sui X, Qi M, Li M, He Y, Cao Y, Cao Q, Lu Q, Zhou S, Liu L, Liu L, Shen B, Shu W, Huo R (2020) Oocyte competence is maintained by m6A methyltransferase KIAA1429-mediated RNA metabolism during mouse follicular development. Cell Death Differ. [Link]0516-1
11. Lee H, Bao S, Qian Y, Geula S, Leslie J, Zhang C, Hanna JH, Ding L (2019) Stage-specific requirement for Mettl3-dependent m6A mRNA methylation during haematopoietic stem cell differentiation. Nat Cell Biol 21:700–709
12. Zhao BS, He C (2015) Fate by RNA methylation: m6A steers stem cell pluripotency. Genome Biol 16:43
13. Geula S, Moshitch-Moshkovitz S, Dominissini D, Mansour AA, Kol N, Salmon-Divon M, Hershkovitz V, Peer E, Mor N, Manor YS, Ben-Haim MS, Eyal E, Yunger S, Pinto Y, Jaitin DA, Viukov S, Rais Y, Krupalnik V, Chomsky E, Zerbib M, Maza I, Rechavi Y, Massarwa R, Hanna S, Amit I, Levanon EY, Amariglio N, Stern-Ginossar N, Novershtern N, Rechavi G, Hanna JH (2015) Stem cells. m6A mRNA methylation facilitates resolution of naïve pluripotency toward differentiation. Science 347:1002–1006
14. Lence T, Akhtar J, Bayer M, Schmid K, Spindler L, Ho CH, Kreim N, Andrade-Navarro MA, Poeck B, Helm M, Roignant J-Y (2016) m6A modulates neuronal functions and sex determination in Drosophila. Nature 540:242–247
15. Haussmann IU, Bodi Z, Sanchez-Moran E, Mongan NP, Archer N, Fray RG, Soller M (2016) m6A potentiates Sxl alternative pre-mRNA splicing for robust Drosophila sex determination. Nature 540:301–304
16. Helm M, Motorin Y (2017) Detecting RNA modifications in the epitranscriptome: predict and validate. Nat Rev Genet 18:275–291
17. Li X, Xiong X, Yi C (2016) Epitranscriptome sequencing technologies: decoding RNA modifications. Nat Methods 14:23–31
18. Linder B, Grozhik AV, Olarerin-George AO, Meydan C, Mason CE, Jaffrey SR (2015) Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome. Nat Methods 12:767–772
19. Novoa EM, Mason CE, Mattick JS (2017) Charting the unknown epitranscriptome. Nat Rev Mol Cell Biol 18:339–340
20. Jonkhout N, Tran J, Smith MA, Schonrock N, Mattick JS, Novoa EM (2017) The RNA modification landscape in human disease. RNA 23:1754–1769
21. Liu H, Begik O, Lucas MC, Ramirez JM, Mason CE, Wiener D, Schwartz S, Mattick JS, Smith MA, Novoa EM (2019) Accurate detection of m6A RNA modifications in native RNA sequences. Nat Commun 10:4079
22. Lorenz DA, Sathe S, Einstein JM, Yeo GW (2020) Direct RNA sequencing enables m6A detection in endogenous transcript isoforms at base-specific resolution. RNA 26:19–28
23. Parker MT, Knop K, Sherwood AV, Schurch NJ, Mackinnon K, Gould PD, Hall AJ, Barton GJ, Simpson GG (2020) Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m6A modification. eLife 9. [Link]
24. Price AM, Hayer KE, McIntyre ABR, Gokhale NS, Della Fera AN, Mason CE, Horner SM, Wilson AC, Depledge DP, Weitzman MD (2019) Direct RNA sequencing reveals m6A modifications on adenovirus RNA are necessary for efficient splicing. bioRxiv 865485
25. Li H (2018) Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34:3094–3100
26. Loman NJ, Quick J, Simpson JT (2015) A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat Methods 12:733–735
Chapter 4

Adaptation of Human Ribosomal RNA for Nanopore Sequencing of Canonical and Modified Nucleotides
Miten Jain, Hugh E. Olsen, Mark Akeson, and Robin Abu-Shumays

Abstract
Historically, RNA has been sequenced as cDNA copies derived from reverse transcription of cellular RNA
followed by PCR amplification. Recently, RNA sequencing using nanopores has emerged as an alternative.
Using this technology, individual cellular RNA strands are read directly as they are driven through nanoscale
pores by an applied voltage. The speed of translocation is regulated by a helicase that is loaded onto each
RNA strand by an adapter that also facilitates capture by the nanopore electric field. Here we describe a
technique for adapting human ribosomal RNA subunits for nanopore sequencing. Using this strategy, a
single Oxford Nanopore MinION run delivered 470,907 sequence reads of which 396,048 aligned
to ribosomal RNA, with 28S, 18S, 5.8S, and 5S coverage of 6053, 369,472, 16,058, and 4465
reads, respectively. Example alignments that reveal putative nucleotide modifications are provided.

Key words Ribosome, RNA, Nanopore, Sequencing, Single molecule

1 Introduction

Nanopore sequencing was invented in 1989 and first implemented
as a commercial device in 2014 [1]. Briefly, arrays of hundreds to
thousands of independently addressable nanopores are formed in
thin films on an application-specific integrated circuit (ASIC). An
applied voltage produces an ionic current through each pore. When
an RNA or DNA strand is captured and translocated single file
through the pore, the current changes in discrete steps on the
millisecond timescale. These steps correspond to the sequence of
nucleotides passing through the pore. Neural networks trained on
known DNA or RNA sequences are used to convert the ionic
current segment series into nucleotide sequences. Typical median
single-strand read accuracies are 94% for DNA and 87% for RNA.
Mary McMahon (ed.), RNA Modifications: Methods and Protocols, Methods in Molecular Biology, vol. 2298,
[Link] © Springer Science+Business Media, LLC, part of Springer Nature 2021

The nanopore sensor reads RNA directly, permitting detection
of base modifications in the context of surrounding canonical bases
[2–6]. One of the main challenges to developing computational
methods for base modification detection is the requirement for
high-confidence training data. Additionally, the heterogeneity and
abundance of RNA modifications further increase the complexity.
Software tools like nanopolish [7] and Guppy [8] have recently
enabled detection of 5-methylcytosine in nanopore genomic DNA
data. Many groups are working on improved computational meth-
ods for detecting base modifications, both for RNA and DNA.
Modified ribonucleotides regulate ribosome function through
tuning of RNA folding, and through interactions with ribosomal
proteins and tRNAs [9–12]. Therefore, substantial current research
involves using nanopore sequencing as a tool for identification of
base modifications in rRNA. To date a majority of the modified
bases in human 18S rRNA have been associated with differences
between canonical and modified base ionic current signals (personal
communication). Here, we provide a detailed description of our
nanopore-based human nuclear-encoded rRNA sequencing proto-
col, with the key steps diagrammed in Fig. 1.

Fig. 1 Schematic of the human rRNA adaptation and nanopore sequencing protocol. 18S rRNA is given as an
example. (a) (i) Total RNA extraction using TRIzol and chloroform. (ii) Annealing and ligation of a nanopore-
specific adapter to the 3′ end of human 18S rRNA (red). Thirteen nucleotides of the adapter are reverse
complements to 13 nucleotides of the target strand. (iii) Ligation of the nanopore sequencing adapter that has
the motor protein preloaded. (iv) Addition of the adapted library to the nanopore for sequencing. (v) Basecalling
of individual nanopore RNA strands using Guppy. This process yields individual RNA strand sequence reads.
(b) A representative ionic current trace for a human 18S rRNA strand processed on the nanopore. Ionic current
components: (i) strand capture; (ii) ONT and 18S rRNA-splint adapter translocation; (iii) human 18S rRNA
translocation; and (iv) exit of the strand into the trans compartment. (c) Expanded view of the region where the
ionic current transitions from the nanopore adapter to the 3′ end of the adapted 18S rRNA
Nanopore Sequencing of Human Nuclear rRNA 55

2 Materials

2.1 Total RNA Isolation from Flash-Frozen Cell Pellets

1. TRI Reagent.
2. 1-Bromo-3-chloro-propane or chloroform.
3. Isopropanol.
4. 100% Ethanol.
5. Nuclease-free water.
6. 10× TE (Tris-EDTA): 100 mM Tris–HCl pH 7.6, 10 mM
EDTA. Sterile filter the stock solutions through 0.2μm filters
and dilute to 1× as needed.
7. Qubit™ HS RNA kit.
8. Flash-frozen cell pellet from cell line of interest.

2.2 Nanopore Sequencing of Biological Human rRNA

1. NEB Quick Ligase Buffer and T4 DNA Ligase (2000 U/μL).
2. ONT SQK-RNA002 kit (or newer).
3. Tris-NaCl-EDTA buffer: 10 mM Tris–HCl, pH 8, 1 mM
EDTA, 50 mM NaCl.
4. 100% Ethanol.
5. Beckman Coulter Agencourt RNAClean XP Beads.
6. Magnet for bead-based purifications.
7. Nuclease-free water.
8. Oligomers for preparing biological human rRNA splints. The
sequences of the individual oligomers are given below:
Top strand: This oligomer is common across all the splints.
Note that this oligomer needs to have a 5′-phosphate group for
ligation in sequencing library preparation:
5′-/5PHOS/GGCTTCTTCTTGCTCTTAGGTAGTAGGTTC-3′
Human 5S rRNA bottom strand:
5′-CCTAAGAGCAAGAAGAAGCCAAAGCCTACAGCA-3′
Human 5.8S rRNA bottom strand:
5′-CCTAAGAGCAAGAAGAAGCCAAGCGACGCTCAG-3′
Human 18S rRNA bottom strand:
5′-CCTAAGAGCAAGAAGAAGCCTAATGATCCTTCC-3′
Human 28S rRNA bottom strand:
5′-CCTAAGAGCAAGAAGAAGCCGACAAACCCTTGT-3′
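
The design of these splints can be checked computationally. The sketch below (helper names are hypothetical) verifies that the first 20 nucleotides of each bottom strand are the reverse complement of the first 20 nucleotides of the common top strand, and derives the 13-nucleotide rRNA 3′-end sequence (in DNA alphabet) that each bottom strand is designed to anneal to. The 13-nucleotide length follows the description given in Fig. 1; the 20-nucleotide duplex length is inferred from the sequences themselves.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


TOP = "GGCTTCTTCTTGCTCTTAGGTAGTAGGTTC"
BOTTOM = {
    "5S": "CCTAAGAGCAAGAAGAAGCCAAAGCCTACAGCA",
    "5.8S": "CCTAAGAGCAAGAAGAAGCCAAGCGACGCTCAG",
    "18S": "CCTAAGAGCAAGAAGAAGCCTAATGATCCTTCC",
    "28S": "CCTAAGAGCAAGAAGAAGCCGACAAACCCTTGT",
}


def split_bottom(bottom, target_len=13):
    """Split a bottom strand into its top-strand-pairing part and its
    rRNA-pairing part, and return the rRNA 3' end the latter recognizes."""
    duplex_part = bottom[:-target_len]
    target_part = bottom[-target_len:]
    return duplex_part, target_part, reverse_complement(target_part)


for name, bottom in BOTTOM.items():
    duplex_part, target_part, rrna_end = split_bottom(bottom)
    pairs_with_top = duplex_part == reverse_complement(TOP[:len(duplex_part)])
    print(name, pairs_with_top, rrna_end)
```

The same approach can be used to design a splint for another RNA: replace the last 13 nucleotides of the bottom strand with the reverse complement of the 3′ end of the target.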

2.3 Generating and Sequencing Canonical rRNAs (Optional, Only Needed for IVT rRNAs)

1. 10× Tris-NaCl-EDTA buffer: 100 mM Tris–HCl pH 8,
10 mM EDTA, 500 mM NaCl. Filter sterilize the stock solutions
through 0.2μm filters and dilute to 1× as needed.
2. Nuclease-free water.
3. Oligonucleotides for IVT templates: Prepare oligonucleotides
as 100μM stocks in nuclease-free water or TE. The
oligonucleotides produce IVT products for which all but the ~12
5′-most nucleotides can be sequenced. For suggestions to
address this see Note 1.
For 5S and 5.8S rRNA, 5S and 5.8S top strand (the T7
promoter site is underlined):
5′-CATCATCATTTAATACGACTCACTATAG-3′
5S bottom strand:
5′-AAAGCCTACAGCACCCGGTATTCCCAGGCGG
TCTCCCATCCAAGTACTAACCAGGCCCGACCCTGC
TTAGCTTCCGAGATCAGACGAGATCGGGCGCGTTC
AGGGTGGTATGGCCGTAGACTATAGTGAGTCGTATT
AAATGATGATG-3′
5.8S bottom strand:
5′-AAGCGACGCTCAGACAGGCGTAGCCCCGGGA
GGAACCCGGGGCCGCAAGTGCGTTCGAAGTGTCGA
TGATCAATGTGTCCTGCAATTCACATTAATTCTCGCA
GCTAGCTGCGTTCTTCATCGACGCACGAGCCGAGTG
ATCCACCGCTAAGAGTCGCTATAGTGAGTCGTATT
AAATGATGATG-3′
4. HiScribe™ T7 Quick High Yield RNA Synthesis kit.
5. RNasin (40 units/μL) or equivalent RNase inhibitor.
6. DNase I (RNase-free) (2000 units/mL) and 10× DNase I
buffer.
7. 100% Ethanol.
8. Beckman Coulter Agencourt RNAClean XP Beads.
9. Qubit™ dsDNA BR Assay Kit.
10. Qubit™ RNA BR Assay Kit.
11. Q5 DNA polymerase 2× master mix.
12. Genomic DNA from a human cell line (1–2μg).
13. 18S PCR primers: Prepare primers as 100μM stocks in
nuclease-free water or TE. The primers produce an IVT template
for an RNA for which all but the ~12 5′-most nucleotides can
be sequenced. For suggestions to address this see Note 1.
18S forward primer (the T7 promoter site is underlined):
5′-CATCATCATTTAATACGACTCACTATAGTACCTG
GTTGATCCTGCCAGTAGC-3′
18S reverse primer:
5′-TAATGATCCTTCCGCAGGTTCACCTACGGAAACC-3′
14. 10× TBE (Tris-borate-EDTA): 890 mM Tris base, pH 7.6,
890 mM boric acid, 20 mM EDTA. Sterile filter the stock
solutions through 0.2μm filters and dilute to 1× as needed.
15. 50× TAE (Tris-acetate-EDTA): 2 M Tris base, 1 M glacial
acetic acid, 50 mM EDTA. Sterile filter the stock solutions
through 0.2μm filters and dilute to 1× as needed.

16. Agarose gel standards DNA and RNA ladders.
17. SYBR™ Gold.
18. 6× gel electrophoresis loading dye with 1:1000 SYBR™ Gold.
19. Molecular biology-grade agarose.
20. RNase AWAY™ or other product for eliminating surface
RNases.
21. pcDNA 3.1(+) plasmid containing a 28S rRNA gene made by
Taoka et al. [13]. Requests for this plasmid can be directed to
the authors of this manuscript.
22. XhoI (20,000 units/mL) and 10× buffer.
23. Razor blades for gel excision.
24. Magnet for bead-based purifications.
25. D-tube dialyzer MWCO 3.5 kD MIDI columns (Novagen) or
other product for gel purification.
26. E. coli poly(A) polymerase (5000 units/mL) with 10× buffer.
27. 10 mM Adenosine 5′-triphosphate (ATP).
28. Molecular biology-grade bis-acrylamide 29:1 (40% solution):
for 5S and 5.8S rRNA gel purification.
29. 3 M NaOAc pH 5.2, nuclease free.
30. Glycogen, RNA grade (20 mg/mL).
31. NEB Quick Ligase Buffer and T4 DNA Ligase (2000 U/μL).
32. ONT SQK-RNA002 (or newer).

2.4 Nanopore Sequencing Hardware and Software, and Reagents

Please refer to the Oxford Nanopore Technologies website for
details: [Link] For new users, the MinION Starter Pack is a
good option.

3 Methods

For general RNA work practices see Note 2.

3.1 RNA Isolation

1. Add 4 mL of TRI Reagent per frozen pellet of 5 × 10⁷ cells,
and vortex immediately.
2. Incubate this sample at room temperature for 5 min.
3. Add 400μL BCP (1-bromo-3-chloro-propane) or 200μL
CHCl3 (chloroform) per mL of sample, followed by vigorous
mixing by inversion.
4. Incubate this mixture at room temperature for 5 min and then
mix vigorously again.
5. Spin the mixed sample for 10 min at 12,000 × g (4 °C).
6. Pool the aqueous phase in a LoBind Eppendorf tube and add
an equal volume of isopropanol.
7. Mix the tube followed by incubation at room temperature for
15 min.
8. Spin for 15 min at 12,000 × g (4 °C).
9. Remove the supernatant.
10. Wash the RNA pellet by adding 750μL 80% ethanol.
11. Spin for 5 min at 12,000 × g (4 °C).
12. Remove the supernatant.
13. Air-dry the pellet for 10 min.
14. Resuspend the pellet in nuclease-free water (100μL final volume)
or TE buffer and quantify RNA using Qubit or similar.
15. The RNA can be stored at −80 °C.

3.2 Oligomer Splint Preparation for RNA Adaptation

The sequences of the individual oligomers are shown in Subheading 2.2.

1. To make each oligomer splint adapter, hybridize the top strand
with one of the bottom strands (four annealing reactions in
total) at 10μM each in Tris-NaCl-EDTA (TNE) buffer.
2. Heat the mixture at 75 °C for 1 min before slowly cooling to
room temperature in a thermocycler.

3.3 Splint Annealing, Ligation, and Cleanup
The following steps are taken from the Oxford Nanopore Direct
RNA Sequencing protocol using the SQK-RNA002 kit (see
Note 3).
1. Prepare the RNA in nuclease-free water by transferring
1000 ng RNA to a 1.5 mL Eppendorf DNA LoBind tube.
Adjust the volume to 9μL with nuclease-free water. Mix thor-
oughly by flicking the tube to avoid unwanted shearing. Spin
down briefly in a microfuge.
2. In a 0.2 mL thin-walled PCR tube, mix the reagents in the
order outlined below. The custom rRNA splint adapter mix is
made by combining 1μL volume from each of the 10μM olig-
omer splints (5S, 5.8S, 18S, 28S).

Reagent                                        Volume
NEBNext quick ligation reaction buffer (5×)    3.0μL
RNA CS (RCS), 110 nM (optional)                0.5μL
RNA                                            6.5μL (or 6.0μL if adding RCS)
Custom rRNA splint adapter mix                 4.0μL
T4 DNA Ligase (2000 U/μL)                      1.5μL
Total                                          15.0μL

3. Mix by pipetting and spin down.


4. Incubate the reaction for 10 min at room temperature.
5. Add 25μL of nuclease-free H2O to bring up the volume to a
total of 40μL.
6. Resuspend the stock of Agencourt RNAClean XP beads by
vortexing.
7. Add 72μL (1.8×) of resuspended RNAClean XP beads to the
reaction and mix by pipetting.
8. Incubate on a Hula mixer (rotator mixer) for 5 min at room
temperature.
9. Prepare 200μL of fresh 70% ethanol with nuclease-free water.
10. Spin down the sample and place on the magnetic stand. Keep
the tube on the magnet, and pipette off the supernatant.
11. Keep the tube on magnet and wash the beads with 150μL of
freshly prepared 70% ethanol without disturbing the pellet as
described below.
12. Keeping the magnetic rack on the benchtop, rotate the
bead-containing tube by 180°. Wait for the beads to migrate toward
the magnet and form a pellet.
13. Rotate the tube 180° again (back to the starting position) and
wait for the beads to pellet.
14. Remove the 70% ethanol using a pipette, and discard.
15. Spin down and place the tube back on the magnet. Pipette off
any residual 70% ethanol.
16. Remove the tube from the magnetic rack and resuspend pellet
in 20μL nuclease-free water.
17. Incubate for 5 min at room temperature.
18. Pellet the beads on a magnet until the eluate is clear and
colorless.
19. Pipette 20μL of eluate into a clean 1.5 mL Eppendorf DNA
LoBind tube.
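Step 1 of this subheading asks for a fixed RNA mass in a fixed volume. The sketch below shows one way to work out the pipetting volumes from a Qubit reading. The function name and the example concentration of 250 ng/μL are illustrative assumptions and are not part of the ONT protocol.

```python
def rna_input_volume_ul(target_ng, conc_ng_per_ul, final_volume_ul):
    """Return (rna_ul, water_ul) needed to deliver target_ng of RNA
    in final_volume_ul, given a measured stock concentration."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    rna_ul = target_ng / conc_ng_per_ul
    if rna_ul > final_volume_ul:
        # The stock is too dilute to fit the requested mass in the volume.
        raise ValueError("RNA stock too dilute; concentrate the sample first")
    return rna_ul, final_volume_ul - rna_ul


# Hypothetical Qubit reading of 250 ng/uL; 1000 ng in 9 uL as in step 1.
rna_ul, water_ul = rna_input_volume_ul(1000, 250.0, 9.0)
print(f"RNA: {rna_ul:.2f} uL, nuclease-free water: {water_ul:.2f} uL")
```

If the function raises an error, the stock is too dilute for the stated volume and would need to be concentrated before proceeding.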

3.4 Nanopore Sequencing Adapter Ligation and Cleanup
The following steps are taken from the Direct RNA Sequencing
protocol using the SQK-RNA002 kit, both of which are supplied
by Oxford Nanopore Technologies.
1. In a clean 1.5 mL Eppendorf DNA LoBind tube, mix the
reagents in the following order:

Reagent                                               Volume
Splint-annealed RNA from Subheading 3.3 (step 19)     20.0μL
NEBNext quick ligation reaction buffer (5×)           8.0μL
RNA adapter (RMX)                                     6.0μL
Nuclease-free water                                   3.0μL
T4 DNA Ligase (2000 U/μL)                             3.0μL
Total                                                 40.0μL

2. Mix by pipetting.
3. Incubate the reaction for 10 min at room temperature.
4. Resuspend the stock of Agencourt RNAClean XP beads by
vortexing.
5. Add 60μL (1.5×) of resuspended RNAClean XP beads to the
adapter ligation reaction and mix by pipetting.
6. Incubate on a Hula mixer (rotator mixer) for 5 min at room
temperature.
7. Spin down the sample and pellet on a magnet. Keep the tube
on the magnet, and pipette off the supernatant.
8. Add 150μL of the wash buffer (WSB) from the SQK-RNA002
kit to the beads. Close the tube lid and resuspend the beads by
flicking the tube.
9. Return the tube to the magnetic rack, allow beads to pellet, and
pipette off the supernatant.
10. Repeat steps 8 and 9.
11. Remove the tube from the magnetic rack and resuspend pellet
in 21μL elution buffer from the kit by gently flicking the tube.
12. Incubate for 10 min at room temperature.
13. Pellet the beads on a magnet until the eluate is clear and
colorless.
14. Remove and retain 21μL of eluate into a clean 1.5 mL Eppen-
dorf DNA LoBind tube.
15. [Optional] Quantify 1μL of nanopore-adapted rRNA using the
Qubit fluorometer RNA HS assay (recovery aim ~200 ng).

3.5 Nanopore Sequencing
For this part we recommend following the instructions provided by
Oxford Nanopore Technologies. Below are the steps to follow for
the SQK-RNA002 kit. The steps involved here include:
1. Priming the flow cell.
2. Loading the nanopore-adapted rRNA.
3. Starting the sequencing run.

3.6 Basecalling
Nanopore data are basecalled using Guppy software that is provided
by ONT. Guppy has a model to be used for Direct RNA Sequencing
runs. Please follow instructions from ONT for using the newest
version of Guppy.

3.7 Data Analysis and Visualization
Nanopore basecalling yields sequence reads in FASTQ format.
These can be aligned to the reference human rRNA sequences
(in FASTA format) using minimap2 (with -ax map-ont setting)
[14]. This process will yield a SAM file (human-readable alignment
format) that can then be converted to a sorted BAM file
(machine-readable alignment file) using SAMtools [15]. Once this is done,
the sorted BAM file and the reference sequence FASTA file can be
uploaded into Integrative Genomics Viewer (IGV) [16] for visual
inspection or used for downstream analyses. IGV can be
downloaded from the Broad Institute at no cost (Fig. 2) (see Notes 4
and 5).
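The alignment and conversion steps described above can be sketched as a short Python helper that assembles the command lines. This is a minimal illustration, not the authors' pipeline: the file names are placeholders, and the final `samtools index` step is our addition (IGV generally expects an index alongside a BAM file). The commands are only printed here; running them requires minimap2 and SAMtools to be installed.

```python
import shlex


def alignment_commands(reference_fasta, reads_fastq, prefix):
    """Build minimap2/SAMtools command lines for aligning direct RNA
    reads to rRNA reference sequences. Nothing is executed here."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        # Align with the map-ont preset named in the text; write SAM.
        ["minimap2", "-ax", "map-ont", "-o", sam, reference_fasta, reads_fastq],
        # Convert the SAM file to a coordinate-sorted BAM file.
        ["samtools", "sort", "-o", bam, sam],
        # Index the BAM so a genome browser can load it (assumed extra step).
        ["samtools", "index", bam],
    ]


# Placeholder file names, chosen only for illustration.
commands = alignment_commands("rRNA_reference.fasta", "reads.fastq", "rRNA")
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are available on the PATH.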

3.8 Anticipated Throughput
A conventional nanopore sequencing experiment in our laboratory
using GM12878 cell line RNA yielded 470,907 reads, of which
396,048 aligned to one of the four rRNA reference sequences (5S,
5.8S, 18S, and 28S). The breakdown of aligned sequences by
specific rRNA is shown in Table 1.

3.9 Optional: Nanopore Sequencing of Human rRNA Copies Composed of Canonical Nucleotides
IVT-derived rRNA strands can be analyzed using nanopore devices
and used to train algorithms that identify modified rRNA
nucleotides. In summary, RNA synthesis of the canonical controls is
performed by in vitro transcription (IVT) using the HiScribe™
T7 Quick High Yield RNA Synthesis kit per the manufacturer’s
protocol. For the 5S and 5.8S rRNAs, synthetic DNA
oligonucleotides are used for the templates. For the 18S rRNA, a PCR template
is amplified from human genomic DNA. For the 28S rRNA, a
plasmid template constructed by Taoka et al. [13] is used. The
IVT products are purified prior to sequencing. Gel purification is
recommended for all IVTs and is required for making 28S rRNA.
The library preparation for the purified 5S, 5.8S, and 18S IVT
products follows the same procedures described for biological
rRNA, starting in Subheading 3.3. The gel-purified 28S
IVT product is polyadenylated with E. coli poly(A) polymerase.
Following cleanup of the polyadenylated 28S IVT product, it is
sequenced using the standard SQK-RNA002 kit protocol.

Fig. 2 IGV alignment of human 18S rRNA reads. Reads were aligned to the 18S rDNA reference genome
sequence [13] using minimap2. The top axis is the nucleotide position in the 1858 nt long 18S rRNA reference
sequence [13]. Gray color represents alignment coverage where read and reference sequences agree. In the
body of the alignment, purple colors represent insertions, white spaces with black lines represent deletions,
and the red, blue, green, and orange colors represent base-specific mismatches in the alignment. Each
horizontal line is an individual nanopore strand read. The density plot below the reference coordinates
represents alignment coverage. The expanded view shows examples of features within a 40 nt region.
These include alignment coverage, insertions, deletions, mismatches, and alignment coverage density, as
described above. The consistent C miscalls at position 406 reveal a likely uridine-to-pseudouridine
modification

Table 1
Number of aligned nanopore strand reads per rRNA class

rRNA Number of reads


5S 4465
5.8S 16,058
18S 369,472
28S 6053
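As a quick arithmetic check of the throughput figures in Subheading 3.8 and Table 1, the per-class read counts can be summed and expressed as fractions. The variable names below are ours; the numbers are taken directly from the text.

```python
# Read counts reported in Subheading 3.8 and Table 1.
total_reads = 470_907
aligned_reads = {"5S": 4465, "5.8S": 16_058, "18S": 369_472, "28S": 6053}

# Total aligned reads and the fraction of all reads that aligned.
aligned_total = sum(aligned_reads.values())
fraction_aligned = aligned_total / total_reads

# Share of the aligned reads contributed by each rRNA class.
share = {name: count / aligned_total for name, count in aligned_reads.items()}

print(f"aligned: {aligned_total} ({fraction_aligned:.1%} of all reads)")
for name, value in share.items():
    print(f"{name}: {value:.1%} of aligned reads")
```

The per-class counts sum to the 396,048 aligned reads quoted in the text, and the 18S rRNA accounts for the large majority of them.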

3.9.1 Synthesis of 5S and 5.8S rRNA
1. In two PCR tubes, one for the 5S and the other for the 5.8S
rRNA, combine 0.7μL 10× Tris-NaCl-EDTA, 165 pmol
(1.65μL of 100μM) of the top-strand oligonucleotide,
33 pmol (0.66μL of 50μM) of bottom-strand oligonucleotide,
and nuclease-free H2O (4.69μL) to a total volume of 7μL.
2. Heat the mixture to 75 °C for 15 s and slowly cool to 25 °C in a
thermocycler to anneal the top and bottom DNA strands.
3. Assemble the IVT reactions in PCR tubes using the HiScribe™
T7 Quick High Yield RNA Synthesis kit reagents as follows:

Reagent Volume
Annealed top and bottom DNA strands from step 2 7.0μL
NTP buffer 10.0μL
T7 RNA polymerase mix 2.0μL
RNasin (40 units/μL) or equivalent RNase inhibitor 1μL
Total 20.0μL

4. Allow reaction to proceed for 16 h at 37 °C in a thermocycler.
5. Add 3μL of 10× DNase I buffer, 1μL DNase I (RNase-free)
(2000 units/mL), and 6.0μL of nuclease-free water to bring
the reaction volume to 30μL. Incubate at 37 °C for 15 min.
6. Add 10μL of nuclease-free H2O to bring the volume to 40μL.
Transfer solution to a 1.5 mL Eppendorf DNA LoBind tube.
7. Follow the procedure for Agencourt RNAClean XP bead
purification as described in Subheading 3.9.2 below. Use 1.8×
beads (72μL) and use 70% ethanol for washes.
8. Determine the concentration of the eluate using the Qubit™
fluorometer RNA BR assay. See Note 6 regarding concentra-
tion determination method and see Note 7 regarding monitor-
ing reactions with gels.
9. Purify the RNA as described in Subheading 3.9.7 below.
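The oligonucleotide amounts in step 1 follow from the identity 1 μM = 1 pmol/μL. The short sketch below reproduces the 165 pmol and 33 pmol figures; the function name is ours and is used only for illustration.

```python
def picomoles(concentration_um, volume_ul):
    """Amount in pmol for a given concentration (uM) and volume (uL).
    Uses the identity 1 uM = 1 pmol/uL."""
    return concentration_um * volume_ul


top_pmol = picomoles(100, 1.65)    # top-strand oligonucleotide
bottom_pmol = picomoles(50, 0.66)  # bottom-strand oligonucleotide

print(f"top strand: {top_pmol:.0f} pmol")
print(f"bottom strand: {bottom_pmol:.0f} pmol")
print(f"top:bottom molar ratio: {top_pmol / bottom_pmol:.1f}")
```

With these volumes the top strand is present in fivefold molar excess over the bottom strand.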
3.9.2 RNA Ampure XP Bead Purifications (for Use in Subheadings 3.9.1, 3.9.2, 3.9.3, 3.9.4, 3.9.5, 3.9.6, 3.9.7, and 3.9.8)
1. Resuspend the stock of Agencourt RNAClean XP beads by
vortexing.
2. Based on the size of the rRNA species, different ratios of beads
to solution volume are used:
(a) 1.8× for 5S and 5.8S rRNA, for example 72μL of beads
for a 40μL solution
(b) 0.8× for 18S rRNA
(c) 0.6× for 28S rRNA
Add the appropriate volume of beads and mix by pipetting
up and down.
3. Incubate on a Hula mixer (rotator mixer) for 5 min at room
temperature.
4. Make 500μL fresh ethanol solution for washes to be used in
step 6:
(a) For 5S and 5.8S make 80% ethanol: 400μL 100% ethanol
+100μL nuclease-free water.
(b) For 18S and 28S make 70% ethanol: 350μL 100% ethanol
+150μL nuclease-free water.
5. Spin down the sample and pellet on a magnet. Keep the tube
on the magnet, and pipette off the supernatant.
6. Keep the tube on the magnet and wash the beads with 150μL
of freshly prepared 70% (or 80%) ethanol without disturbing
the pellet as described below.
7. Keeping the magnetic rack on the benchtop, rotate the
bead-containing tube by 180°. Wait for the beads to migrate toward
the magnet and form a pellet.
8. Rotate the tube 180° again (back to the starting position) and
wait for the beads to pellet.
9. Remove the ethanol solution using a pipette, and discard.
10. Spin down and place the tube back on the magnet. Pipette off
any residual ethanol solution.
11. Repeat steps 6–10.
12. Remove the tube from the magnetic rack and resuspend pellet
in 15μL nuclease-free water by pipetting up and down.
13. Incubate on a Hula mixer (rotator mixer) for 5 min at room
temperature.
14. Return to the magnet and wait for the pellet to form and the
solution to clear.
15. Remove and retain ~15μL of eluate to a clean 1.5 mL Eppen-
dorf DNA LoBind tube.
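The bead ratios in step 2 and the ethanol mixes in step 4 can be captured in a small lookup, sketched below. The function and dictionary names are ours; the ratios and ethanol percentages are those listed in steps 2 and 4 of this subheading.

```python
# Bead-to-sample ratios from step 2 and wash ethanol percentages from step 4.
BEAD_RATIO = {"5S": 1.8, "5.8S": 1.8, "18S": 0.8, "28S": 0.6}
ETHANOL_PERCENT = {"5S": 80, "5.8S": 80, "18S": 70, "28S": 70}


def bead_volume_ul(species, sample_ul):
    """Volume of RNAClean XP beads to add for a given sample volume."""
    return round(BEAD_RATIO[species] * sample_ul, 1)


def ethanol_wash_mix_ul(species, total_ul=500):
    """Return (ethanol_ul, water_ul) for the fresh wash solution."""
    ethanol_ul = total_ul * ETHANOL_PERCENT[species] / 100
    return ethanol_ul, total_ul - ethanol_ul


print(bead_volume_ul("5S", 40))      # 40 uL sample of 5S or 5.8S rRNA
print(bead_volume_ul("18S", 40))     # 40 uL sample of 18S rRNA
print(bead_volume_ul("28S", 100))    # 100 uL sample of 28S material
print(ethanol_wash_mix_ul("5S"))     # 80% ethanol wash
print(ethanol_wash_mix_ul("28S"))    # 70% ethanol wash
```

The returned volumes match the bead amounts quoted in the individual subheadings, for example 72μL for a 40μL 5S sample and 60μL for a 100μL 28S digest.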
3.9.3 PCR Synthesis of the DNA Template for 18S rRNA IVT
Generate the DNA template with PCR using Q5 master mix (2×).
1. Input the following PCR protocol into the thermocycler:
Initial denaturation: 30 s at 98 °C
33 cycles of:
5 s denaturation at 98 °C
10 s annealing at 72 °C
100 s extension at 72 °C
Final extension: 120 s at 72 °C
2. Assemble the reaction in a PCR tube on ice:

Reagent                               Volume
Q5 master mix (2×)                    25μL
Genomic DNA from human cell line      X μL (2.4μg)
18S forward primer (10μM stock)       2.5μL
18S reverse primer (10μM stock)       2.5μL
Nuclease-free H2O                     50 − (X + 30) μL
Total                                 50.0μL

3. Run PCR using the protocol described in step 1.


4. Verify the presence of a single amplified band of correct size
(~1.8 kb) by running 1μL of the PCR reaction with a standard
ladder on a 1% agarose gel using TAE or TBE buffer. Given that
the amplification product is in PCR buffer, it runs more slowly
than the standard ladder and may appear as high as 3 kb com-
pared with the ladder.
5. Assuming that a robust single band is seen on the gel for the
PCR reaction, transfer the solution to a clean 1.5 mL Eppen-
dorf DNA LoBind tube.
6. Follow the procedure for Agencourt RNAClean XP bead
purification described in Subheading 3.9.2. Use 0.8× beads (40μL)
and use 70% ethanol for washes.
7. Determine the concentration of bead-purified PCR template
using the Qubit™ dsDNA BR assay kit.
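Several reaction tables in this chapter specify the template as "X μL" and the water as the total volume minus X and the fixed reagents. A minimal sketch of that calculation follows. The function name and the example stock concentrations (200 ng/μL and 500 ng/μL) are hypothetical and chosen only to illustrate the arithmetic.

```python
def template_and_water_ul(target_ng, conc_ng_per_ul, fixed_reagents_ul, total_ul):
    """Return (template_ul, water_ul) so that the reaction reaches total_ul.

    target_ng         -- mass of template required by the table
    conc_ng_per_ul    -- measured stock concentration (for example by Qubit)
    fixed_reagents_ul -- summed volume of all other reagents in the table
    """
    template_ul = target_ng / conc_ng_per_ul
    water_ul = total_ul - (template_ul + fixed_reagents_ul)
    if water_ul < 0:
        # The template stock is too dilute to fit into the reaction.
        raise ValueError("template too dilute for this reaction volume")
    return template_ul, water_ul


# 18S PCR: 2.4 ug genomic DNA, 30 uL of fixed reagents, 50 uL total.
print(template_and_water_ul(2400, 200.0, 30.0, 50.0))
# XhoI digest: 4 ug plasmid, 20 uL of fixed reagents, 100 uL total.
print(template_and_water_ul(4000, 500.0, 20.0, 100.0))
```

The same helper applies to the IVT and polyadenylation tables by substituting their fixed reagent volumes and totals.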

3.9.4 Synthesis of Canonical 18S rRNA from the PCR-Derived DNA Template
1. Assemble the IVT reaction using the HiScribe™ T7 Quick
High Yield RNA Synthesis kit reagents as follows:

Reagent                                               Volume
Purified DNA from Subheading 3.9.3                    X μL (0.5–1μg)
NTP buffer                                            10.0μL
T7 RNA polymerase mix                                 2.0μL
RNasin (40 units/μL) or equivalent RNase inhibitor    1μL
Nuclease-free water                                   20 − (X + 13) μL
Total                                                 20.0μL

2. Run reaction for 2–3 h at RT.


3. Add 3μL of 10× DNase I buffer, 1μL DNase I (2000 units/
mL) (RNase-free), and 6.0μL of nuclease-free water to bring
the reaction volume to 30μL. Incubate at 37 °C for 15 min.
4. Add 10μL of nuclease-free H2O to bring the volume to 40μL.
Transfer solution to a 1.5 mL DNA LoBind tube.
5. Follow the procedure for Agencourt RNAClean XP bead
purification as described in Subheading 3.9.2. Use 0.8× beads
(32μL) and use 70% ethanol for washes.
6. Determine the concentration using the Qubit™ fluorometer
RNA BR assay kit.
7. Purify the RNA as described in Subheading 3.9.7 below.

3.9.5 Preparing a Linearized DNA Plasmid Bearing a 28S Gene
A pcDNA3.1(+) plasmid containing the human 28S gene,
described by Taoka et al. [13], is used for synthesizing the canonical
28S rRNA. The plasmid is transformed into competent DH5-alpha
E. coli and plasmid DNA isolated using a standard miniprep
protocol.
Preparation of linearized template:
1. Set up a XhoI restriction digest using 4μg of the plasmid as
follows (see Note 8):

Reagent                         Volume
28S containing plasmid (4μg)    X μL
10× buffer for XhoI             10.0μL
XhoI (20,000 units/mL)          10.0μL
Nuclease-free H2O               100 − (X + 20) μL
Total                           100.0μL

2. Digest for 2 h at 37 °C.
3. Clean up and concentrate the reaction product using
Agencourt RNAClean XP bead purification as described in
Subheading 3.9.2. Use 0.6× beads (60μL) and use 70% ethanol for
washes.
4. Prepare samples for electrophoresis and gel purification (see
Note 9). The samples include (A) DNA ladder with 6× DNA
loading dye containing 1:1000 SYBR™ Gold; (B) DNA digest
with 6× DNA loading dye containing 1:1000 SYBR™ Gold:
this is done by using 15μL of bead-purified digested DNA
(step 3) + 3μL 6× loading dye containing 1:1000 SYBR™
Gold; and (C) ~200 ng uncut plasmid in 6× loading dye
containing 1:1000 SYBR™ Gold.
5. Prepare a 0.8% TAE/agarose gel in 1× TAE using a comb that
can accommodate 20μL of sample per well. If 50 mL is
appropriate for your gel rig, measure out 0.4 g agarose and add it to
50 mL 1× TAE. Microwave the solution to dissolve the
agarose. Allow to cool so that it may be handled. Pour into gel rig
using appropriate comb. Wait for gel to solidify. Put gel in gel
box and cover with 1× TAE to indicated fill line so that there is
about 1 cm of buffer covering the gel. Carefully load samples A,
B, and C in wells. With the negative electrode toward the base
of the gel, run the gel for approximately 90 min at 120 V or until
bands are well separated.
6. Place the gel on a piece of Saran Wrap in a darkroom. Wear face
shield, glasses, lab coat, and gloves. Using a handheld UV
source, note positions of the uncut and cut plasmid samples
in their respective lanes. Having identified the linearized DNA
(~11 kb) band for sample B, excise it using a clean razor blade.
Shave off excess gel material from the excised band with the
razor blade. Work efficiently to avoid damaging the DNA from
prolonged UV exposure.
7. Purify the DNA from the gel. Several commercially available
products and do-it-yourself protocols are available for gel puri-
fication. We use the D-tube Dialyzer Midi columns according
to the manufacturer’s protocol (see Note 10).
8. Regardless of the specific method used for gel purification, the
final volume of the purified DNA should be ~15μL in nuclease-
free water or TE.
9. Quantitate the DNA concentration using Qubit™ dsDNA BR
Assay kit.

3.9.6 Synthesis of 28S rRNA from the Plasmid Template
1. Assemble the IVT reaction using the HiScribe™ T7 Quick
High Yield RNA Synthesis kit reagents in a PCR tube as
follows:

Reagent                                                         Volume
Purified linearized DNA from Subheading 3.9.5                   X μL (1.5–2.0μg)
NTP buffer                                                      20.0μL
T7 RNA polymerase mix                                           4.0μL
RNasin (40 units/μL) or equivalent RNase inhibitor, optional    2.0μL
Nuclease-free water                                             40 − (X + 26) μL
Total                                                           40.0μL

2. Run the reaction for 3 h at 37 °C.
3. Add 5μL of 10× DNase I buffer, 2μL DNase I (2000 units/
mL), and 3.0μL of nuclease-free water to bring the reaction
volume to 50μL. Incubate at 37 °C for 15 min.
4. Transfer solution to a 1.5 mL DNA LoBind tube.
5. Follow the procedure for Agencourt RNAClean XP bead
purification as described in Subheading 3.9.2. Use 0.6× beads
(30μL) and use 70% ethanol for washes.
6. Determine the concentration using the Qubit fluorometer
RNA BR assay kit.
7. Gel purify the RNA as described in Subheading 3.9.7 below.

3.9.7 Purification of Canonical Transcripts
1. The IVT products must be purified prior to sequencing. In our
experience, gel purification is the method that results in the
best read coverage and throughput. This is especially true for
the 28S IVT product. Other options for purification include
phenol/chloroform extraction, followed by ethanol
precipitation, and use of spin columns (see Note 11).
2. The general scheme for gel purification involves running the
IVT sample along with a sizing ladder on a denaturing PAGE
or a non-denaturing agarose gel. The gel is pre- or post-stained
with SYBR™ Gold and the appropriately sized RNA (121 nt
for 5S, 155 nt for 5.8S, 1.8 kb for 18S, and 5 kb for 28S) is
identified and excised.
3. Following gel excision, we have had success using the D-tube
Dialyzers MWCO 3.5 kD according to the manufacturer’s
protocol to purify the RNA. This involves electroelution and
ethanol precipitation. Other gel purification kits require differ-
ent steps and may need to be optimized.
4. Following purification determine the concentration of the sam-
ple using Qubit.

5. Once purified, the 5S, 5.8S, and 18S RNA are ready for
sequencing (Subheading 3.9.9).
6. The purified 28S RNA is next polyadenylated as described
below in Subheading 3.9.8.

3.9.8 Polyadenylation of the 28S rRNA IVT Product
1. Set up a 40μL reaction containing 3–4μg of gel-purified
canonical 28S rRNA.

Reagent                                                   Volume
Gel-purified canonical 28S rRNA from Subheading 3.9.6     X μL (3.0–4.0μg)
10× buffer                                                4.0μL
10 mM ATP                                                 4.0μL
E. coli poly(A) polymerase                                2.0μL
Nuclease-free H2O                                         40 − (X + 10) μL
Total                                                     40.0μL

2. Run for 1 h at 37 °C.
3. Follow the procedure for Agencourt RNAClean XP bead
purification as described in Subheading 3.9.2. Use 0.6× beads
(24μL) and use 70% ethanol for washes.
4. Measure the concentration using a Qubit™ BR RNA Assay kit.
One μg of this material will be used for the library (see Sub-
heading 3.9.10).

3.9.9 Library Preparation Using Splint Adapters for Canonical 5S, 5.8S, and 18S rRNAs
Splint annealing follows the procedures described in Subheading
3.2. Please note that the 28S splint is not needed. The custom
rRNA splint adapter mix listed below is made by combining 1μL
volume from each of the 10μM oligomer splints (5S, 5.8S, 18S).
The following amounts of purified IVT rRNA are recommended:
500 ng of 5S, 500 ng of 5.8S, and 350 ng of 18S.
1. Set up the first ligation as follows:

Reagent                                        Volume
NEBNext quick ligation reaction buffer (5×)    3.0μL
RNA CS (RCS), 110 nM (optional)                0.5μL
IVT rRNAs for 5S, 5.8S, 18S                    7.5μL (or 7.0μL if adding RCS)
Custom rRNA splint adapter mix                 3.0μL
T4 DNA Ligase (2000 U/μL)                      1.5μL
Total                                          15.0μL

2. Follow the procedures beginning in Subheading 3.3, step 3,
through Subheading 3.7 for the sequencing, basecalling,
data analysis, and visualization of the data. They are the same
as described for the biological rRNAs. For information on
expected throughput see Note 12.

3.9.10 Library Preparation for Polyadenylated Canonical 28S rRNA
Because the sample is polyadenylated, the standard ONT mRNA
library protocol (SQK-RNA002) is used. We suggest starting with
1μg of polyadenylated 28S IVT RNA for the library.
1. Set up the first ligation as follows in a 1.5 mL DNA LoBind
tube.

Reagent                                        Volume
NEBNext quick ligation reaction buffer (5×)    3.0μL
RNA CS (RCS), 110 nM (optional)                0.5μL
RNA (1μg polyadenylated 28S IVT rRNA)          X μL (or X − 0.5μL if adding RCS)
RTA adapter                                    1.0μL
Nuclease-free H2O                              Bring to 13.5μL
T4 DNA Ligase (2000 U/μL)                      1.5μL
Total                                          15.0μL

2. Follow the procedures beginning in Subheading 3.3, step 3,


through Subheading 3.7 to complete the library preparation,
sequencing, basecalling, data analysis, and visualization of the
data. For information on expected throughput see Note 13.

4 Notes

1. Concerning 5′ coverage of canonical (IVT) rRNAs, in our
current design, the 5′-most ~12 nucleotides do not get
sequenced. One improvement to address this would be to
design the IVT template to produce a transcript with an
additional ~15 nt 5′ of the rRNA start. For the 5S and 5.8S rRNAs,
the bottom oligonucleotides could be redesigned. For
example, for 5S the top strand would remain the same:
5′-CATCATCATTTAATACGACTCACTATAG-3′
But the bottom strand could be:
5′-AAAGCCTACAGCACCCGGTATTCCCAGGCGG
TCTCCCATCCAAGTACTAACCAGGCCCGACCCTGC
TTAGCTTCCGAGATCAGACGAGATCGGGCGCGTTC
AGGGTGGTATGGCCGTAGANNNNNNNNNNNNNNN
CTATAGTGAGTCGTATTAAATGATGATG-3′
where N is a randomly chosen sequence. This would
produce sequence 5′ of the start of the rRNA.
For 18S, one could design the forward PCR primer to
generate an amplicon with sequence 5′ to the start of the
rRNA. This is shown below where N indicates nucleotides
between the T7 promoter and 18S start.
T7 promoter (5′ end), N spacer, then 18S start (3′ end):
5′-CATCATCATTTAATACGACTCACTATAGNNNNNNNNNNNNTACCTGGTTGATCCTGCCAGTAGC-3′
To pursue this, PCR optimization will be required. Likely
the best results would come from using the sequences just 5′ to
the 18S start found in the genomic DNA sequence for the
positions indicated by “N” above.
2. When working with RNA, wear gloves, use pipette tips with
barrier filters, use a designated RNA bench if possible, use
designated pipetman, clean surfaces with RNase AWAY™ or
other RNase surface decontaminant, and use nuclease-free
water.
3. There are some alterations (highlighted in bold) in this proto-
col to ensure recovery of the smaller rRNA molecules (5S and
5.8S).
4. It is important to be mindful that there are 17 gene copies for
5S rRNA, 6 gene copies for 5.8S rRNA, 5 gene copies for 18S
rRNA, and 5 gene copies for 28S rRNA in the human nuclear
genome. Some of the gene copies have identical sequences
(e.g., 5S and 5.8S), and others vary in sequence composition
(e.g., 18S and 28S). For the purpose of alignments, we use one
gene copy for each rRNA. Our recommendation is to use the
reference sequences curated by Taoka et al. [13].
5. Conventional nanopore sequencing cannot read ~12
nucleotides at the 5′ end of each strand due to the architecture of the
helicase/pore interface.
6. Concerning concentration determination:
NanoDrop vs. Qubit. We have noticed that using a NanoDrop
to estimate the concentration of IVT-generated rRNA
overestimates the concentration of the transcript. We think this is
because unincorporated ribonucleotides remaining after
cleanup contribute to the measurement. This issue is avoided
by determining the concentration using Qubit. The same is true
following polyadenylation, where ATP carried over from
the reaction contributes to the NanoDrop reading, so that the
actual amount of transcript present is overestimated.

7. The production of IVT rRNAs for sequencing is a multistep
process. Therefore, particularly when running the procedure
for the first time, or when troubleshooting, it is worthwhile to
monitor reactions by running small amounts of reactions on
gels to assess the quality of the RNAs produced. The downside
of this is that it takes time and can decrease the amount of final
material available for sequencing. If this is an issue, multiple
reactions can be set up and pooled to ensure adequate amounts
of product.
8. Alternatively, SpeI may be used to linearize the plasmid for
IVT [13].
9. Regarding the gel purification of linearized 28S plasmid DNA
following XhoI digest, purification of the linearized DNA is
necessary because uncut plasmid present can dominate the IVT
reaction producing unwanted concatenated sequences.
10. We have used the D-tube Dialyzers MWCO 3.5 kD (Novagen)
Midi columns for gel purifications of DNA and RNA according
to the manufacturer’s protocol. We are able to get good recov-
ery; however the procedure is relatively lengthy compared to
some other gel purification kits we have not evaluated. It is best
to run the electroelution steps in TAE as opposed to TBE.
After electroelution the sample is ethanol precipitated with 3
volumes of ethanol overnight at −20 °C. The ethanol
precipitation includes 0.3 M NaOAc pH 5.2 and 1μL of glycogen is
added to help follow the position of the pellet. Following
precipitation and 70% ethanol washes, the pellet is dried,
brought up in 15μL nuclease-free water, and quantitated by
Qubit.
11. In our hands, particularly for long transcripts, such as the 28S
rRNA, gel excision and electroelution of the appropriate-sized
transcript from a gel help to ensure complete sequence cover-
age. Smaller incomplete transcripts are more efficiently cap-
tured during nanopore sequencing and can prevent the
detection of full-length product in a mixture. We recommend
this for the purification of all IVT rRNAs. In the case of 5S and
5.8S, the unpurified IVT reactions can be run on denaturing
PAGE gels (8%) prior to excision. For the 18S and 28S IVT
reactions, the samples are run on 0.8% TAE agarose gels. For
these gels care is taken to clean the gel box and combs with
RNase away before casting the gel. Given that the IVT reac-
tions contain incomplete products and due to losses in purifi-
cation, run a minimum of 4μg of the IVT reaction for
purification. For best success, for each rRNA IVT run several
lanes of 4μg IVT, excise the appropriate band slices, purify, and
pool. We recommend running a gel to verify that a transcript of
the correct size is produced and that the purification method
chosen is effective in enriching for the transcript size of
interest. It is always better to have too much material than too little.
When possible, set up multiple IVT reactions to safeguard
against losses and to ensure success.
12. Regarding throughput for IVT runs for 5S, 5.8S, and 18S
rRNA, in the past, the number of reads per flow cell was
especially low for 5S (3589 reads) and 5.8S rRNA
(5471 reads) compared to 18S rRNA (90,081 reads). The
current protocol as described herein uses 500 ng per flow cell
of gel-purified 5S and 5.8S canonical rRNA which should
improve those yields.
13. 28S canonical rRNA expected throughput and explanation of
why a plasmid-based template was used: The expected
throughput using the current protocol is 291,929 reads. We
found that using the linearized 28S-containing plasmid for the
template, followed by polyadenylation of the resultant IVT
product, gave more reads and better quality alignments than
when we used a PCR-based template for IVT. PCR amplifica-
tion of the 28S was challenging. We were not able to generate
an amplicon using the Q5 polymerase (NEB) but were
successful using the Primestar GXL polymerase (Takara). Using
RNA synthesized from the 28S PCR template followed by
the splint ligation resulted in 4230 reads compared with
291,929 for the method described starting in Subheading
3.9.5.
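The redesigned 5S oligonucleotides proposed in Note 1 can be checked computationally before ordering. The sketch below confirms that the reverse complement of the top strand matches the 3′ end of the proposed bottom strand, so the two strands can anneal over the T7 promoter, and counts the N spacer. The sequences are copied from Note 1; the helper name is ours.

```python
# Complement table; N is kept as N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


# Top strand carrying the T7 promoter, as given in Note 1.
TOP = "CATCATCATTTAATACGACTCACTATAG"

# Proposed bottom strand from Note 1, written 5' to 3'.
BOTTOM = (
    "AAAGCCTACAGCACCCGGTATTCCCAGGCGG"
    "TCTCCCATCCAAGTACTAACCAGGCCCGACCCTGC"
    "TTAGCTTCCGAGATCAGACGAGATCGGGCGCGTTC"
    "AGGGTGGTATGGCCGTAGA"
    "NNNNNNNNNNNNNNN"
    "CTATAGTGAGTCGTATTAAATGATGATG"
)

promoter_site = reverse_complement(TOP)
anneals = BOTTOM.endswith(promoter_site)
spacer_length = BOTTOM.count("N")

print("promoter-complementary 3' end:", promoter_site)
print("top strand anneals to bottom strand:", anneals)
print("spacer length (nt):", spacer_length)
```

The same check can be applied to a redesigned 5.8S bottom strand once its sequence has been chosen.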

References
1. Deamer D, Akeson M, Branton D (2016) Three decades of nanopore sequencing. Nat Biotechnol 34:518–524
2. Garalde DR, Snell EA, Jachimowicz D et al (2018) Highly parallel direct RNA sequencing on an array of nanopores. Nat Methods. [Link]
3. Workman RE, Tang AD, Tang PS et al (2019) Nanopore native RNA sequencing of a human poly(A) transcriptome. Nat Methods 16:1297–1305
4. Smith AM, Jain M, Mulroney L et al (2019) Reading canonical and modified nucleobases in 16S ribosomal RNA using nanopore native RNA sequencing. PLoS One 14:e0216709
5. Kim D, Lee J-Y, Yang J-S et al (2020) The architecture of SARS-CoV-2 transcriptome. Cell 181:914–921.e10
6. Viehweger A, Krautwurst S, Lamkiewicz K et al Direct RNA nanopore sequencing of full-length coronavirus genomes provides novel insights into structural variants and enables modification analysis. [Link] 1101/483693
7. Simpson JT, Workman RE, Zuzarte PC et al (2017) Detecting DNA cytosine methylation using nanopore sequencing. Nat Methods. [Link]
8. Methylation calling - Medaka 1.0.3 documentation. [Link] medaka/[Link]. Accessed 10 Jul 2020
9. Liang X-H, Liu Q, Fournier MJ (2009) Loss of rRNA modifications in the decoding center of the ribosome impairs translation and strongly delays pre-rRNA processing. RNA 15:1716–1728
10. King TH, Liu B, McCully RR, Fournier MJ (2003) Ribosome structure and activity are altered in cells lacking snoRNPs that form pseudouridines in the peptidyl transferase center. Mol Cell 11:425–435
11. Narla A, Ebert BL (2010) Ribosomopathies: human disorders of ribosome dysfunction. Blood 115:3196–3205
12. Lafontaine DLJ (2015) Noncoding RNAs in eukaryotic ribosome biogenesis and function. Nat Struct Mol Biol 22:11–19
13. Taoka M, Nobe Y, Yamaki Y et al (2018) Landscape of the complete RNA chemical modifications in the human 80S ribosome. Nucleic Acids Res 46:9289–9298
14. Li H (2018) Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. https://[Link]/10.1093/bioinformatics/bty191
15. Li H, Handsaker B, Wysoker A et al (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079
16. Robinson JT, Thorvaldsdóttir H, Winckler W et al (2011) Integrative genomics viewer. Nat Biotechnol 29:24–26

Part III

Next-Generation Sequencing Approaches to Detect and Capture Modified RNAs

Chapter 5

AlkAniline-Seq: A Highly Sensitive and Specific Method for Simultaneous Mapping of 7-Methyl-guanosine (m7G) and 3-Methyl-cytosine (m3C) in RNAs by High-Throughput Sequencing

Virginie Marchand, Lilia Ayadi, Valérie Bourguignon-Igel, Mark Helm,
and Yuri Motorin

Abstract
Epitranscriptomics is an emerging field where the development of high-throughput analytical technologies
is essential to profile the dynamics of RNA modifications under different conditions. Despite important
advances during the last 10 years, the number of RNA modifications detectable by next-generation
sequencing is restricted to a very limited subset. Here, we describe a highly efficient and fast method called
AlkAniline-Seq to map simultaneously two different RNA modifications: 7-methyl-guanosine (m7G) and
3-methyl-cytosine (m3C) in RNA. Our protocol is based on three subsequent chemical/enzymatic steps
allowing the enrichment of RNA fragments ending at position n + 1 to the modified nucleotide, without
any prior RNA selection. Therefore, AlkAniline-Seq demonstrates an outstanding sensitivity and specificity
for these two RNA modifications. We have validated AlkAniline-Seq using bacterial, yeast, and human total
RNA, and here we present, as an example, a synthetic view of the complete profiling of these RNA
modifications in S. cerevisiae tRNAs.

Key words 7-Methyl-guanosine, 3-Methyl-cytosine, High-throughput sequencing, RNA modification mapping, Bacteria

1 Introduction

RNA modifications are extremely diverse throughout evolution
and are present in both noncoding and coding RNAs. Mapping of
these RNA modifications at single-nucleotide resolution represents
a big step forward since only 7 (m6A, m6Am, m1A, m5C, hm5C,
Nm, and ψ) [1–9] out of >150 RNA modifications can be detected
by high-throughput sequencing. Therefore, there is an urgent need
to develop new methods for analyzing many still insufficiently
studied RNA modifications. One attractive candidate is 7-methyl-
guanosine (m7G), a modified nucleotide which does not disrupt

Mary McMahon (ed.), RNA Modifications: Methods and Protocols, Methods in Molecular Biology, vol. 2298,
[Link] © Springer Science+Business Media, LLC, part of Springer Nature 2021

78 Virginie Marchand et al.

[Fig. 1 graphic; panel contents recovered from the image text]
A. Culture, RNA extraction and quality control: human HEK cells (TRIzol reagent) or bacteria E. coli and yeast S. cerevisiae (hot acid phenol extraction); RNA precipitation; RNA quantity (spectrophotometer; absorbance vs. wavelength, 220–330) and RNA quality control (capillary electrophoresis of total RNA from yeast S. cerevisiae showing tRNA, 18S, and 25S rRNA).
B. AlkAniline-Seq: AlkAniline treatment of total RNA (mild alkaline hydrolysis, extensive 5′ and 3′ dephosphorylation, aniline cleavage, each followed by RNA precipitation); library preparation and clean-up (3′ adaptor ligation, RT primer hybridization, 5′ adaptor ligation, first strand cDNA synthesis, PCR, clean up); library quantity (fluorometer) and library quality control (capillary electrophoresis).
C. Bioinformatic pipeline: multiplexing and sequencing; scoring example (starting reads = 6, passing reads = 2, total 5′-ends = 14; normalized cleavage = 6*1000/14 = 428 units; stop-ratio = 6/(6+2) = 0.75); bioinformatic workflow: trimming (Trimmomatic; TruSeq3-SE.[Link], LEADING:30, TRAILING:30, SLIDINGWINDOW:4:15, MINLEN:8, AVGQUAL:30), alignment (Bowtie 2.0; -D 15 -R 2 -N 0 -L 10 -i S,1,1.15), keep only mapped reads (samtools view -h -F 4 -b), conversion to BED (bedtools), count 5′-ends and coverage (awk), score calculation and graphs (R).

Fig. 1 Overview of experimental and analysis steps of AlkAniline-Seq protocol. (a) Human total RNA is
extracted using the standard TRIzol protocol; extraction of total RNAs from bacterial and yeast cells is
Mapping of m7G and m3C in RNA 79

base-pairing and thus may potentially be present even in protein-
coding regions of mRNAs. A specific detection of m7G in RNA was
described in the 1970s and consists of a two-step chemical
approach, combining sodium borohydride (NaBH4) reduction fol-
lowed by aniline cleavage [10, 11]. However, we found that cou-
pling these reactions to a next-generation sequencing technique
leads to a high background signal and numerous false-positive hits
[12]. Therefore, we designed a completely novel strategy (named
AlkAniline-Seq) to map m7G (and as revealed later, also m3C)
residues in RNA by combining a three-step protocol: a mild alkaline
hydrolysis, an extensive 3′- and 5′-dephosphorylation, and finally
aniline cleavage of the RNA chain. A key feature in our protocol is
the direct 5′-adapter ligation to the 5′-phosphate resulting from
phosphodiester bond scission induced by aniline cleavage at the
RNA abasic site (Fig. 1). In contrast to other NGS approaches, this
provides the basis for both high sensitivity and specificity of the
AlkAniline-Seq technology [12]. Here, we describe in detail the
procedure for AlkAniline-Seq from the RNA extraction to the
bioinformatic analysis, and as an example, we show the complete
profiling of m7G/m3C tRNA modifications in yeast S. cerevisiae
(Fig. 2).

2 Materials

Prepare all solutions using RNase-free water. Wear gloves to prevent
degradation of RNA samples by RNases.

Fig. 1 (continued) performed using the “hot acid phenol” protocol. Quantification and quality of RNA are
assessed by spectrophotometer and RNA integrity is evaluated using capillary electrophoresis. (b) RNAs are
subjected to mild alkaline hydrolysis, generating fragments of about 200–300 nt in length. Fragments are
extensively dephosphorylated to remove all possible 5′- and 3′-phosphate residues from RNA. Afterwards,
RNAs are subjected to aniline cleavage at the abasic sites generated by decomposition of m7G/m3C residues
upon alkaline hydrolysis. Library preparation is performed by direct ligation of pre-adenylated 3′-adapter,
followed by RT primer annealing and ligation of 5′-adapter and RT primer extension. The resulting cDNA is
converted to sequencing library by second-strand DNA synthesis and limited PCR step for barcoding and
inclusion of Illumina P5 and P7 sequences. Quantification of the library is performed using a fluorometer and
the quality is assessed by capillary electrophoresis. (c) Scoring of AlkAniline-Seq signals is done by calculation
of both normalized cleavage (ratio of reads starting at a given RNA position to total number of reads mapped to
this RNA) and stop-ratio, corresponding to the ratio of reads starting at a given position to reads overlapping
it. Normalized cleavage provides exceptional selectivity, while stop-ratio is very sensitive, but captures
numerous false-positive hits. Bioinformatic analysis consists of trimming step to keep adapter-free reads,
bowtie 2.0 end-to-end alignment followed by counting of reads mapped to different positions in RNA and
calculation of AlkAniline-Seq scores

Fig. 2 AlkAniline-Seq signals for m7G and m3C residues in yeast S. cerevisiae tRNAs. (a) m7G residues
detected in 10 m7G46-modified yeast tRNAs. Initiator tRNAMeti also has an m7G46 residue but is not illustrated
here since it has a low coverage in sequencing. tRNA positions are numbered sequentially from 5′- to 3′-end;
however specific tRNA numbering scheme (with 17a, 20ab, and extra nucleotides in variable loop) is not

2.1 Total RNA Extraction

2.1.1 Yeast and Bacteria Total RNA Extraction by Hot Acid Phenol

1. Yeast or bacteria cell culture (10 mL of culture grown to an OD600 of 0.7–2).
2. RNase-free 1.5 mL microcentrifuge tubes.
3. RNase-free water.
4. AE buffer: 50 mM NaOAc in water, pH 5.2, 10 mM EDTA.
5. 10% (w/v) SDS.
6. Acid phenol, pH 4.5.
7. Phenol:chloroform:isoamyl alcohol mix ([Link], v/v/v).
8. Chloroform.
9. 3 M NaOAc, pH 5.2.
10. 96% Ethanol.
11. 80% Ethanol.
12. Dry ice.
13. Refrigerated tabletop centrifuge.
14. Water bath or heating block set to 65 °C.

2.1.2 Human Total RNA Extraction by TRIzol™

1. Human HEK 293 cells (8–10 × 10⁶ cells grown to 90% confluence in a cell culture dish).
2. 1× PBS (8.1 mM Na2HPO4, 1.47 mM KH2PO4, 137 mM NaCl, and 2.7 mM KCl).
3. RNase-free 1.5 mL microcentrifuge tubes.
4. RNase-free water.
5. TRIzol™ reagent.
6. Chloroform.
7. Refrigerated tabletop centrifuge.
8. Isopropanol.
9. Glycoblue™ coprecipitant: 15 mg/mL.
10. 75% Ethanol.
11. Cell scraper.

Fig. 2 (continued) respected. Bar plot represents normalized cleavage score for a given RNA species in four
yeast strains (WT—wild-type BY 4741 strain, ΔTRM140—strain deleted for m3C32:tRNA-methyltransferase,
ΔTRM8/ΔTRM82—two strains deleted for genes encoding subunits of heterodimeric m7G46:tRNA-methyl-
transferase). Position of m7G46 is indicated by a gray dot. Some of the yeast tRNA species also contain
adjacent dihydrouridine (D47), purple dot. This residue is also partially cleaved during mild alkaline hydrolysis
(see small peaks at position 47) but becomes visible only in the absence of a major signal corresponding to
m7G. Some weak cleavage signals are also visible in the region 16–20, and correspond to D residues in the
D-loop of tRNA. (b) Two yeast tRNAs containing m3C32 residues (shown as pink dot). Closely related isoforms
of yeast tRNASer (AGA, CGA, and TGA) are collapsed into unique tRNASer(NGA) sequence. tRNA positions are
numbered as in (a)
2.2 RNA Quantification and Quality Assessment

1. UV-visible spectrophotometer for small volumes: Any kind of UV-visible spectrophotometer allowing measurements of 1 μL samples. We use NanoDrop™ 2000.
2. RNase-free 1.5 mL microcentrifuge tubes.
3. RNase-free water.
4. Agilent 2100 Bioanalyzer or 2200 TapeStation (Agilent Technologies) or Experion (BioRad) or LabChip GX (Caliper): We use an Agilent 2100 Bioanalyzer.
5. Agilent RNA 6000 Pico kit (quantitative range 50–5000 pg/μL).
6. Chip priming station.
7. Tabletop centrifuge.

2.3 AlkAniline-Seq

2.3.1 Alkaline Hydrolysis

1. Sodium bicarbonate buffer: 100 mM Sodium bicarbonate, pH 9.2.
2. RNase-free water.
3. Individual RNase-free 0.2 mL PCR tubes.
4. PCR thermal cycler (we use Agilent SureCycler 8000).
5. RNase-free 1.5 mL microcentrifuge tubes.
6. 96% Ethanol.
7. 15 mg/mL Glycoblue™ coprecipitant.
8. 3 M NaOAc, pH 5.2.
9. 80% Ethanol.
10. Dewar containing liquid nitrogen.
11. Refrigerated tabletop centrifuge.

2.3.2 Extensive RNA Dephosphorylation and RNA Precipitation

1. RNase-free 0.2 mL PCR tubes, strips of 8.
2. Flat PCR caps, strips of 8.
3. PCR thermal cycler.
4. 5 U/μL Antarctic phosphatase.
5. 40 U/μL RiboLock RNase inhibitor.
6. RNase-free 1.5 mL microcentrifuge tubes.
7. Phenol:chloroform:isoamyl alcohol mix ([Link], v/v/v).
8. Chloroform.
9. 15 mg/mL Glycoblue™ coprecipitant.
10. 96% Ethanol.
11. 3 M NaOAc, pH 5.2.
12. 80% Ethanol.
13. Refrigerated tabletop centrifuge.

2.3.3 Aniline Cleavage and RNA Precipitation

1. Aniline: 1 M in acetic acid, pH 4.5.
2. RNase-free 1.5 mL microcentrifuge tubes.
3. RNase-free water.
4. Agitating heating block (we use Eppendorf Thermomixer®).
5. 15 mg/mL Glycoblue™ coprecipitant.
6. 96% Ethanol.
7. 3 M NaOAc, pH 5.2.
8. 80% Ethanol.
9. Refrigerated tabletop centrifuge.

2.4 Library Preparation

1. NEBNext® Multiplex Small RNA Library Prep Set for Illumina® (set 1 or 2) (see Note 1).
2. RNase-free 0.2 mL PCR tubes, strips of 8.
3. Flat PCR caps, strips of 8.
4. Thermal cycler.

2.5 Library Purification

1. GeneJET PCR Purification Kit or equivalent.
2. RNase-free 1.5 mL microcentrifuge tubes.
3. RNase-free 1.5 mL DNA low-binding tubes.
4. Tabletop centrifuge.

2.6 Library Quantification and Quality Assessment

1. Any kind of fluorometer able to quantify DNA library with high sensitivity (e.g., Qubit® 2.0 fluorometer).
2. Qubit® dsDNA HS Assay kit (0.2–100 ng).
3. Thin-walled polypropylene tubes of 500 μL compatible with the fluorometer (e.g., Qubit® Assay Tube or Axygen® PCR-05-C tubes).
4. Agilent 2100 Bioanalyzer (Agilent Technologies).
5. Agilent HS DNA kit (quantitative range 5–500 pg/μL).
6. Chip priming station.
7. RNase-free 1.5 mL microcentrifuge tubes.
8. Tabletop centrifuge.

2.7 Library Sequencing

1. Any Illumina sequencer (from MiSeq to NovaSeq).
2. Any appropriate sequencing kit for a single read length of 35–50 nt.

2.8 Bioinformatic Analysis

1. Unix (Linux) server (we use Illumina Compute Dell R815 server).
2. Adapter trimming software Trimmomatic (current version 0.36 [Link]
3. Alignment software Bowtie 2.0 (current version 2.2.9 http://[Link]/bowtie2/[Link]).
4. Samtools (current version 1.9, [Link] [Link]).
5. Bedtools v2.25.0 ([Link] est/[Link]).
6. R environment ver. 3.3.3 for calculations of normalized cleavage and stop-ratio scores and data analysis.

3 Methods

3.1 Total RNA Extraction

The following protocol details total RNA isolation from yeast/bacteria using hot acid phenol and is adapted from [13].

3.1.1 Yeast and Bacteria Total RNA Extraction by Hot Acid Phenol

1. Transfer yeast/bacteria cell culture into 1.5 mL microcentrifuge tubes and pellet cells by centrifugation at 1200 × g for 5 min at room temperature. Discard the supernatant.
2. Resuspend cells in 1 mL of RNase-free water. Centrifuge for 1 min at full speed at room temperature. Discard the supernatant.
3. Resuspend the cell pellet in 400 μL of AE buffer.
4. Add 40 μL of 10% SDS and vortex until the pellet is completely resuspended.
5. Add 440 μL of acid phenol. Vortex.
6. Incubate for 4 min at 65 °C and then cool the mixture rapidly on dry ice for 2–3 min.
7. Centrifuge the samples for 10 min at full speed at room temperature. Carefully transfer the aqueous (upper) phase to a new 1.5 mL microcentrifuge tube.
8. Add 420 μL of phenol:chloroform:IAA, vortex, and centrifuge for 10 min at full speed at room temperature.
9. Transfer the aqueous phase to a new 1.5 mL centrifuge tube. Add 400 μL of chloroform. Vortex and centrifuge at full speed at room temperature for 10 min.
10. Transfer the aqueous phase to a new 1.5 mL centrifuge tube. Add 40 μL of 3 M NaOAc and 1 mL of 96% ethanol. Place at −80 °C for at least 30 min.
11. Centrifuge for 30 min at full speed at 4 °C.
12. Discard the supernatant and wash the pellet with 500 μL of 80% ethanol.
13. Centrifuge for 5 min at full speed at 4 °C.
14. Discard the supernatant; centrifuge again for a short spin.
15. Remove any remaining liquid with a pipette.
16. Incubate samples with open lid for 2 min at 37 °C or 5 min at room temperature.
17. Resuspend each pellet in 10 μL of RNase-free water and pool your samples.
18. Quantify yeast or bacteria total RNA samples by measuring A260nm using a UV spectrophotometer (see Note 2 and Subheading 3.2). Check the quality of the corresponding samples by using the Agilent 2100 Bioanalyzer (see Subheading 3.2).

3.1.2 Human Total RNA Extraction by TRIzol™

Isolate total RNA using TRIzol™ following the manufacturer’s instructions.

1. Wash HEK 293 cells grown in a cell culture dish with 1.5 mL of 1× PBS.
2. After PBS removal, add 1 mL of TRIzol™ directly to the cell culture dish to lyse the cells, scrape the cells, and pipet the lysate up and down several times to homogenize.
3. Incubate for 5 min at room temperature to get complete RNP dissociation.
4. Add 200 μL of chloroform, vortex, and incubate for 5 min at room temperature.
5. Centrifuge for 15 min at 12,000 × g at room temperature.
6. Transfer the aqueous phase containing RNA to a new 1.5 mL microcentrifuge tube, add 500 μL of isopropanol and 1 μL of Glycoblue™, and mix by inverting the tube several times.
7. Incubate for 10 min at room temperature.
8. Centrifuge for 10 min at 12,000 × g at 4 °C.
9. Discard the supernatant and wash the pellet with 1 mL of 75% ethanol.
10. Centrifuge for 5 min at 12,000 × g at 4 °C.
11. Discard the supernatant, and centrifuge again for a short spin.
12. Remove any remaining liquid.
13. Incubate with open lid for 2 min at 37 °C or 5 min at room temperature.
14. Resuspend the pellet in 50 μL of RNase-free water.
15. Quantify human total RNA samples by measuring A260nm using a UV spectrophotometer (see Subheading 3.2). Check the quality of your samples by using the Agilent 2100 Bioanalyzer (see Subheading 3.2).

3.2 RNA Quantification and Quality Control

Carry out all procedures at room temperature.

3.2.1 RNA Quantification

1. On the NanoDrop 2000 start screen, select the “Nucleic Acid” application.
2. After the wavelength verification test, select the type of sample to measure, in this case “RNA.”
3. Prepare the blank/buffer solution used for sample resuspension but without any trace of RNA (e.g., RNase-free water).
4. Load 1 μL of the blank solution to the bottom pedestal, lower the arm, and click on the “Blank” button.
5. Wipe the upper and lower pedestal using a dry wipe, load 1 μL of one of your samples of interest to the bottom pedestal, lower the arm, and click “Measure.”
6. Analyze the data obtained for your different RNA samples. For “pure” RNA, the ratio of A260/A280 should be 2; the ratio of A260/A230 should be in the range of 1.8–2.2 (see Notes 3 and 4).
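
The purity criteria in step 6 can be sketched as a small check. This is only an illustration, not part of the published protocol: the function names are hypothetical, the factor of 40 ng/μL per A260 unit is the usual approximation for RNA at a 1 cm path length (an assumption here, since the instrument software reports concentrations itself), and the thresholds are the values quoted in step 6 and Note 3.

```python
def rna_concentration_ng_per_ul(a260, dilution_factor=1.0):
    """Estimate RNA concentration, assuming 40 ng/uL per A260 unit (1 cm path)."""
    return a260 * 40.0 * dilution_factor


def purity_report(a260, a280, a230, min_260_280=1.8, range_260_230=(1.8, 2.2)):
    """Compare absorbance ratios with the thresholds quoted in the text.

    min_260_280 defaults to 1.8, the value Note 3 accepts for RNA in water.
    """
    ratio_280 = a260 / a280
    ratio_230 = a260 / a230
    low, high = range_260_230
    return {
        "A260/A280": ratio_280,
        "A260/A230": ratio_230,
        "A260/A280_ok": ratio_280 >= min_260_280,
        "A260/A230_ok": low <= ratio_230 <= high,
    }


# Hypothetical readings for one sample
report = purity_report(a260=1.0, a280=0.5, a230=0.5)
```

A sample that fails the A260/A230 check would be a candidate for the extra extraction described in Note 4.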

3.2.2 RNA Quality Assessment

1. Before starting, equilibrate all solutions of the kit at room temperature for at least 30 min in the dark. Vortex and spin down before use.
2. Transfer 550 μL of gel matrix (red-cap vial) into a spin filter provided in the kit.
3. Centrifuge for 10 min at 1500 × g at room temperature.
4. Prepare 65 μL aliquots of the gel and store them at 4 °C for a maximum of 1 month.
5. Prepare the gel-dye mix by adding 1 μL of RNA dye concentrate to a gel aliquot.
6. Centrifuge for 10 min at 13,000 × g at room temperature.
7. Dilute your RNA samples, as quantified on the NanoDrop, to 3–5 ng/μL with RNase-free water to be within the optimal concentration range of the assay.
8. Add 1 μL of your diluted RNA samples to 11 different 1.5 mL tubes already containing 5 μL of RNA marker (green-cap vial) (see Note 5). Mix by pipetting up and down.
9. Mix 1 μL of the ladder (see Note 6) with 5 μL of RNA marker (green-cap vial). Mix by pipetting up and down.
10. Prepare the chip priming station. Adjust the syringe clip to the topmost position.
11. Load 9 μL of the gel-dye mix in the well marked with a “G” surrounded by a black circle.
12. Close the chip priming station properly and press the plunger of the syringe until it is held by the clip.
13. Wait for 30 s and then release the clip.
14. Wait for 5 s until the plunger stops and pull it slowly back to the 1 mL position of the syringe.
15. Open the chip priming station and load 9 μL of the gel-dye mix in the two other wells marked “G.”
16. Load 9 μL of the conditioning solution (white-cap vial) in the well marked “CS.”
17. Load 6 μL of the diluted ladder in the well marked with a ladder.
18. Load 6 μL of the diluted RNA samples in the wells marked 1–11.
19. Inspect the chip and make sure that no liquid spills are present on the edges of the wells.
20. Insert the chip in the Agilent 2100 Bioanalyzer and close the lid (see Note 7).
21. Select the following assay “Eukaryote Total RNA Pico series II” in the 2100 Expert Software screen.
22. Press “Start” to begin the chip run (see Note 8).
23. After the run, immediately remove the chip and clean the electrodes with the electrode cleaner filled with 350 μL of RNase-free water.
24. Analyze the results of the chip (see Fig. 1).
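
Step 7 asks for a dilution to 3–5 ng/μL before loading the chip. A minimal sketch of the underlying C1·V1 = C2·V2 arithmetic is given below; the function name and the example concentrations are hypothetical and only illustrate the calculation.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Return (stock volume, water volume) in uL for a simple dilution.

    Uses C1 * V1 = C2 * V2; raises if the stock is more dilute than the target.
    """
    if stock_ng_per_ul <= 0 or target_ng_per_ul <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_ng_per_ul > stock_ng_per_ul:
        raise ValueError("target concentration exceeds stock concentration")
    stock_volume = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume, final_volume_ul - stock_volume


# Hypothetical example: 400 ng/uL stock diluted to 4 ng/uL in 100 uL
stock_ul, water_ul = dilution_volumes(400, 4, 100)
```

For very concentrated stocks the computed stock volume may fall below what can be pipetted accurately, in which case a two-step serial dilution is the usual workaround.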

3.3 AlkAniline-Seq

3.3.1 Alkaline Hydrolysis

1. Prepare one 1.5 mL tube per sample to be analyzed (“precipitation tube”) containing 10 μL of NaOAc, 1 μL of Glycoblue™, and 1 mL of 96% ethanol for subsequent precipitation of the sample (store at −20 °C until further use).
2. Dilute your RNA samples to a concentration of 10 ng/μL with RNase-free water.
3. To individual PCR tubes, add 10 μL of each of your diluted RNA samples, and keep on ice until further use.
4. Add 10 μL of bicarbonate buffer and mix by pipetting up and down.
5. Incubate in a thermal cycler preheated at 95 °C. Start a timer and incubate for 5 min (see Note 9).
6. Proceed with the next sample every 30 s.
7. Stop each reaction after the required time at 95 °C by spinning down the PCR microtube and add the whole sample into the corresponding 1.5 mL precipitation tube from step 1.
8. Mix by inverting the tube several times and snap freeze in liquid nitrogen.
9. Recover tubes from liquid nitrogen and centrifuge your samples for 30 min at 4 °C at full speed in a microcentrifuge.
10. Remove the supernatant and make sure not to lose the pellet.
11. Wash with 600 μL of 80% ethanol.
12. Centrifuge your samples for 10 min at 4 °C at full speed.
13. Remove the supernatant.
14. Centrifuge your samples for a short spin.
15. Remove any remaining liquid.
16. Incubate your samples with open lid for 2 min at 37 °C or 5 min at room temperature.
17. Resuspend the pellet in 16 μL of RNase-free water.
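
Steps 5–7 describe a staggered incubation: each sample spends the same time at 95 °C, and a new sample is started every 30 s. The timing can be laid out in advance with a short helper. This is a sketch with a hypothetical function name; the defaults (5 min incubation, 30 s interval) are taken from the steps above, and Note 9 indicates the incubation time may need adjusting.

```python
def staggered_schedule(n_samples, incubation_s=300, interval_s=30):
    """Return (sample number, start time, stop time) in seconds from t = 0.

    Sample i starts i * interval_s after the first one and is stopped
    incubation_s later, so every sample gets the same hydrolysis time.
    """
    if n_samples < 1:
        raise ValueError("need at least one sample")
    return [
        (i + 1, i * interval_s, i * interval_s + incubation_s)
        for i in range(n_samples)
    ]


def as_min_sec(seconds):
    """Format a duration in seconds as m:ss for a bench timer."""
    return f"{seconds // 60}:{seconds % 60:02d}"


schedule = staggered_schedule(3)
```

With the defaults, the first sample comes out while later samples are still being loaded once more than ten samples are processed, so large batches may be easier to handle in groups.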

3.3.2 Extensive Dephosphorylation and RNA Precipitation

1. Combine 16 μL of your treated RNA samples in a PCR tube with 2 μL of phosphatase buffer, 1 μL of RiboLock RNase Inhibitor, and 1 μL of Antarctic phosphatase.
2. Mix by pipetting up and down.
3. Incubate the PCR tubes for 1 h at 37 °C and then for 5 min at 70 °C (to inactivate the phosphatase) and hold at 4 °C in a thermal cycler.
4. Add 180 μL of RNase-free water and 200 μL of phenol:chloroform:IAA mix, and vortex.
5. Centrifuge for 10 min at full speed at room temperature.
6. Transfer the supernatant to a new 1.5 mL tube, add 200 μL of chloroform, and vortex.
7. Centrifuge for 10 min at full speed at room temperature.
8. Transfer the supernatant to a new 1.5 mL tube and precipitate the sample by adding 20 μL of 3 M NaOAc, 1 μL of Glycoblue, and 1 mL of 96% ethanol.
9. Incubate at −80 °C for 30 min and centrifuge your samples for 30 min at 4 °C at full speed.
10. Remove the supernatant and wash the pellet with 500 μL of 80% ethanol.
11. Centrifuge for 10 min at full speed at 4 °C.
12. Remove the supernatant and dry the pellet for 2 min at 37 °C.

3.3.3 Aniline Cleavage and RNA Precipitation

1. Resuspend the pellet in 20 μL of aniline.
2. Incubate for 15 min at 60 °C in the dark.
3. Stop the reaction by adding 180 μL of RNase-free water, 20 μL of NaOAc, 1 μL of Glycoblue, and 600 μL of 96% ethanol and mix by inverting the tube.
4. Incubate at −80 °C for at least 1 h.
5. Centrifuge for 30 min at 4 °C at full speed.
6. Remove the supernatant and wash the pellet with 500 μL of 80% ethanol.
7. Centrifuge for 10 min at full speed at 4 °C.
8. Remove the supernatant and air-dry the pellet for 2 min at 37 °C.
9. Resuspend the pellet in 6 μL of RNase-free water.

3.4 Library Preparation

Upon opening the NEBNext® Multiplex Small RNA Library Prep Set for Illumina®, resuspend the 5′ SR adapter (yellow-cap vial) in 120 μL of RNase-free water and store at −80 °C.

1. Mix 6 μL of RNA sample with 1 μL of 3′ SR adapter (green-cap vial) (previously diluted ½ in RNase-free water) in a PCR tube.
2. Incubate for 2 min at 70 °C in a preheated thermal cycler. Transfer immediately to ice.
3. Add 10 μL of 3′ ligation buffer (green-cap vial) and 3 μL of 3′ ligation enzyme (green-cap vial).
4. Incubate for 1 h at 25 °C in a thermal cycler.
5. Add 4.5 μL of RNase-free water and 1 μL of SR RT primer (pink-cap vial) (previously diluted ½ in RNase-free water).
6. Incubate for 5 min at 75 °C, 15 min at 37 °C, and 15 min at 25 °C.
7. Within the last 15 min of incubation, transfer 1.1 × n μL (n = number of samples) of the 5′ SR adapter (yellow-cap vial) (previously diluted ½ in RNase-free water) to a separate PCR tube.
8. Denature the 5′ SR adapter in a thermal cycler for 2 min at 70 °C and immediately place the tube on ice (see Note 10).
9. Add 1 μL of 5′ SR adapter (denatured in step 8), 1 μL of 5′ ligation reaction buffer (yellow-cap vial), and 2.5 μL of ligase enzyme mix (yellow-cap vial).
10. Incubate for 1 h at 25 °C in a thermal cycler.
11. Add the following components to the adapter-ligated RNA mix from the previous step: 8 μL of first-strand synthesis reaction buffer (red-cap vial), 1 μL of murine RNase inhibitor (red-cap vial), and 1 μL of ProtoScript II reverse transcriptase (red-cap vial). Mix well by pipetting up and down.
12. Incubate for 1 h at 50 °C.
13. Immediately proceed to PCR amplification (see Note 11). Add the following components to the RT reaction mix from the previous step: 50 μL of LongAmp Taq Master Mix (blue-cap vial), 2.5 μL of SR primer (blue-cap vial), 2.5 μL of index primer (see Note 12), and 5 μL of RNase-free water. Mix well.
14. Perform the following PCR cycling conditions: 1 cycle of initial denaturation for 30 s at 94 °C; 12–15 cycles of denaturation for 15 s at 94 °C, annealing for 30 s at 62 °C, and extension for 15 s at 70 °C; and 1 cycle of final extension for 5 min at 70 °C. Hold at 4 °C.
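
The index primers chosen in step 13 must satisfy the color-balance rule described in Note 12 (red channel reads A and C, green channel reads G and T, and each sequencing cycle of the index needs at least one base from each channel across the pool). A minimal sketch of such a check is shown below; the function name is hypothetical and the index sequences in the example are invented for illustration, not the kit's actual barcodes.

```python
RED_CHANNEL = set("AC")    # bases read with the red laser/LED (per Note 12)
GREEN_CHANNEL = set("GT")  # bases read with the green laser/LED (per Note 12)


def unbalanced_cycles(indexes):
    """Return 1-based index cycles lacking a red-channel or green-channel base.

    An empty list means every cycle has at least one base per channel.
    """
    if not indexes:
        raise ValueError("no index sequences given")
    length = len(indexes[0])
    if any(len(index) != length for index in indexes):
        raise ValueError("all index sequences must have the same length")
    bad = []
    for cycle in range(length):
        bases = {index[cycle].upper() for index in indexes}
        if not (bases & RED_CHANNEL) or not (bases & GREEN_CHANNEL):
            bad.append(cycle + 1)
    return bad


# Invented 4-nt indexes, for illustration only
balanced = unbalanced_cycles(["ACGT", "GTAC"])
problem = unbalanced_cycles(["ACAG", "CATG"])
```

The check matters most for small pools, where two or three barcodes can easily share the same channel at a given cycle.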

3.5 Purification of the Library

Using the GeneJET PCR Purification Kit, carry out all procedures at room temperature.

1. Transfer the PCR mix to a 1.5 mL tube and add 100 μL of binding buffer. Mix thoroughly.
2. Transfer the solution to the purification column. Centrifuge at full speed for 30 s. Discard the flow-through.
3. Add 700 μL of wash buffer to the column and centrifuge at full speed for 30 s. Discard the flow-through.
4. Centrifuge the empty column for an additional 1 min.
5. Transfer the column to a clean 1.5 mL DNA low-binding tube. Add 30 μL of elution buffer to the center of the column membrane and centrifuge at full speed for 1 min.
6. Store the purified library at −20 °C until further use.

3.6 Library Quantification

1. Before starting, incubate all solutions of the Qubit dsDNA HS assay kit at room temperature for at least 30 min. The kit provides the concentrated assay reagent, dilution buffer, and pre-diluted standards.
2. Prepare the dye working solution by diluting the concentrated assay reagent 1:200 in dilution buffer. Prepare 200 μL of working solution for each sample and two additional standards.
3. Prepare the two standards annotated “C” and “D” by mixing 10 μL of standard with 190 μL of working solution.
4. Add working solution to 1 μL of library sample to obtain 200 μL in total.
5. Vortex the tubes for 2 s and incubate for 2 min at room temperature.
6. Insert the tubes into the Qubit® 2.0 Fluorometer and proceed with measurements: on the home screen of the Qubit® 2.0 Fluorometer, choose the type of assay (e.g., “HS DNA”) for which you want to perform a new calibration.
7. Press “Yes” to read new standards.
8. When indicated, insert the standard tube and press “Read.” Standards #1 and #2 correspond to standards “C” and “D,” respectively.
9. Once the calibration is done, insert each sample and press “Read” to make the measurements. Check that the value of your samples is within the assay’s range, and press “Calculate Stock Conc” (see Note 13).
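
The volumes in steps 2–4 and the stock concentration reported in step 9 follow from simple proportions, sketched below. The function names are hypothetical, "1:200" is read as one part reagent in 200 parts final volume (an assumption), and the back-calculation simply scales the in-tube reading by the assay dilution, which is what a stock-concentration calculation amounts to.

```python
def qubit_working_solution(n_samples, n_standards=2,
                           volume_per_tube_ul=200.0, dilution=200):
    """Volumes (uL) of reagent and buffer for n_samples plus the standards."""
    if n_samples < 0:
        raise ValueError("n_samples must not be negative")
    total = (n_samples + n_standards) * volume_per_tube_ul
    reagent = total / dilution
    return {"total_ul": total, "reagent_ul": reagent,
            "buffer_ul": total - reagent}


def stock_concentration(assay_tube_conc, sample_volume_ul=1.0,
                        assay_volume_ul=200.0):
    """Scale the in-tube concentration back to the undiluted library.

    The result has the same units as assay_tube_conc.
    """
    return assay_tube_conc * assay_volume_ul / sample_volume_ul


mix = qubit_working_solution(n_samples=10)
stock = stock_concentration(0.5)  # hypothetical in-tube reading
```

Preparing 200 μL per tube leaves a small excess, since samples use 199 μL and standards 190 μL of working solution.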

3.7 Library Quality Assessment

1. Before starting the experiments, incubate all solutions of the Agilent High Sensitivity DNA kit at room temperature for at least 30 min in the dark. Vortex them and spin them down before use.
2. Add 15 μL of high-sensitivity DNA dye concentrate (blue-cap vial) into a high-sensitivity DNA gel matrix vial (red-cap vial) (see Note 14).
3. Vortex for 10 s and transfer the gel-dye mix to the center of the spin filter.
4. Centrifuge for 10 min at 2240 × g.
5. Add 1 μL of each library to 11 different 1.5 mL tubes already containing 5 μL of high-sensitivity DNA marker (green-cap vial). Mix by pipetting up and down.
6. Mix 1 μL of the ladder (yellow-cap vial) with 5 μL of high-sensitivity DNA marker (green-cap vial). Mix by pipetting up and down.
7. Prepare the chip priming station. Adjust the syringe clip to the lowest position.
8. Load 9 μL of the gel-dye mix in the well marked with a “G” surrounded by a black circle.
9. Close the chip priming station properly and press the plunger of the syringe until it is held by the clip.
10. Wait for 1 min and then release the clip.
11. Wait for 5 s until the plunger stops and pull it slowly back to the 1 mL position of the syringe.
12. Open the chip priming station and load 9 μL of the gel-dye mix in the three other wells marked “G.”
13. Load 6 μL of the diluted ladder in the well marked with a ladder.
14. Load 6 μL of the diluted library samples in the wells labeled 1–11.
15. Insert the chip in the Agilent 2100 Bioanalyzer, close the lid, and select the following assay “High Sensitivity DNA” in the 2100 Expert Software screen.
16. Press “Start” to begin the chip run.
17. After the run, immediately remove the chip and clean the electrodes with the electrode cleaner filled with 350 μL of RNase-free water.
18. Analyze the results of the chip.

3.8 Library Sequencing

1. For sequencing, libraries are multiplexed and diluted to 6–8 pM final concentration.
2. Recommended sequencing depth or coverage for RNAs is ~5–10 million reads/sample.
3. Sequencing length may vary from 35 to 50 nt in a single-read mode.
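
Diluting to a picomolar loading concentration requires converting the fluorometer reading (ng/μL) into molarity, and the read-depth recommendation sets how many libraries fit in one run. The sketch below illustrates both calculations. It assumes the common approximation of 660 g/mol per base pair of double-stranded DNA and a mean library size taken from the capillary electrophoresis trace; the function names and example numbers are hypothetical.

```python
def library_molarity_nm(conc_ng_per_ul, mean_length_bp, mass_per_bp=660.0):
    """Convert a dsDNA library concentration from ng/uL to nM.

    Assumes an average mass of 660 g/mol per base pair.
    """
    if mean_length_bp <= 0:
        raise ValueError("mean_length_bp must be positive")
    return conc_ng_per_ul * 1e6 / (mass_per_bp * mean_length_bp)


def samples_per_run(run_output_reads, reads_per_sample=10_000_000):
    """Number of libraries that fit in a run at the requested depth."""
    return run_output_reads // reads_per_sample


# Hypothetical library: 9.9 ng/uL with a mean size of 150 bp
molarity = library_molarity_nm(9.9, 150)
n_libraries = samples_per_run(25_000_000, reads_per_sample=5_000_000)
```

Because the molarity scales inversely with fragment length, an inaccurate size estimate shifts the loading concentration as much as an inaccurate quantification does.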

3.9 Bioinformatic Analysis

1. Trim adapter sequences from raw reads (FastQ files) using Trimmomatic with the following parameters: java -jar [Link] SE -phred33 [Link] out-[Link] ILLUMINACLIP :TruSeq3-S[Link] LEADING:30 TRAILING:30 SLIDINGWINDOW:4:15 AVGQUAL:30 MINLEN:17 (see Note 15).
2. Align the trimmed reads to the appropriate reference sequence (E. coli or yeast rRNA/tRNA dataset, described in [14]) using bowtie2 with the following parameters: bowtie2 -D 15 -R 2 -N 0 -L 10 -i S,1,1.15 -x <bt2-idx> -U <r> -S <sam>. The use of soft trimming is not recommended.
3. Mapped reads are extracted from the *.sam file by RNA ID and converted to *.bed format using bedtools v2.25.0.
4. Count the 5′-ends in the produced *.bed file using the Unix awk command: awk '{print $2}' <*.bed> | sort | uniq -c | awk '{print $3,$2,$1,$4}' | sort -n.
5. Calculate normalized cleavage and stop-ratio scores for each position. Normalized cleavage is the number of reads starting at a given position divided by the total number of reads mapped to the given RNA species, multiplied by 1000. Stop-ratio is the number of reads starting at a given position divided by the number of reads covering this position in RNA (Fig. 1). Normalized cleavage varies from 10–25 for background values to 1000 if all reads in RNA start at one single position. Stop-ratio varies from 0 to 1; values >0.75 generally correspond to m7G/m3C signals in stoichiometrically modified RNA (Fig. 2).
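
The two scores defined in step 5 can be illustrated with a short calculation. This is a sketch under stated assumptions, not the authors' R code: reads are represented as hypothetical (start, end) pairs in 1-based inclusive coordinates on a single RNA, the function names are invented, and the toy data are constructed to reproduce the arithmetic shown in Fig. 1 (6 starting reads, 2 passing reads, 14 5′-ends in total).

```python
from collections import Counter


def five_prime_end_counts(reads):
    """Count how many reads start (5'-end) at each position."""
    return Counter(start for start, _end in reads)


def alkaniline_scores(reads):
    """Normalized cleavage and stop-ratio for every position with a read start.

    reads: list of (start, end) tuples, 1-based inclusive, for one RNA.
    """
    if not reads:
        raise ValueError("no mapped reads for this RNA")
    starts = five_prime_end_counts(reads)
    total_five_prime_ends = len(reads)
    scores = {}
    for position, n_start in sorted(starts.items()):
        # Reads covering the position include those that start at it.
        covering = sum(1 for start, end in reads if start <= position <= end)
        scores[position] = {
            "normalized_cleavage": n_start * 1000 / total_five_prime_ends,
            "stop_ratio": n_start / covering,
        }
    return scores


# Toy data: 6 reads start at position 48, 2 reads pass through it,
# and 6 further reads map elsewhere on the same RNA (14 reads in total).
example_reads = (
    [(48, 80)] * 6
    + [(40, 80)] * 2
    + [(1, 30), (5, 30), (10, 35), (12, 36), (20, 40), (90, 120)]
)
example_scores = alkaniline_scores(example_reads)
```

Here position 48 gets a normalized cleavage of 6 × 1000/14 and a stop-ratio of 6/(6 + 2), matching the figure. The per-position coverage loop is quadratic and is meant only for illustration; real datasets would use precomputed coverage.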

4 Notes

1. The kit NEBNext® Multiplex Small RNA Library Prep Set for
Illumina® (set 1) includes a set of 12 barcoding primers (num-
bered 1–12) that will be used for multiplexing reactions during
PCR amplification. There is also a version set 2 with primers
(numbers 13–24). If you do not need these barcoding primers,
you may order a similar kit without the primers and use any
other source of barcoding primers (Illumina or NEB).
2. The typical amount obtained with 1 mL of a haploid wild-type
yeast culture (BY4741 or BY4742) or bacteria culture (DH5α)
grown to an OD600 of 1 is about 15–30 μg.
3. If your RNA sample is diluted with RNase-free water instead of
10 mM Tris-EDTA (TE) pH 8.0, the ratio of A260/A280 may
be below 2.0 due to the lower pH of water [15]. A ratio of
A260/A280 of 1.8 for samples diluted in RNase-free water is
considered “pure” for RNA.
4. If your RNA sample is contaminated by phenol or chaotropic
salts (e.g., guanidinium thiocyanate used in TRIzol™ extrac-
tion or other protocols), this will result in a ratio of A260/
A230 below 1.8. Another round of phenol:chloroform:isoamyl
alcohol (PCA) extraction and two successive steps of chloro-
form extraction followed by ethanol precipitation are recom-
mended in this case before alkaline digestion.
5. In case you are working with fewer than 11 samples, add 1 μL of
RNase-free water to the empty wells.
6. The ladder loaded in the Pico RNA chip is provided in a
separate package and should be prepared before the beginning
of the experiment: spin down the tube and transfer 10 μL to an
RNase-free tube. Heat for 2 min at 70 °C. Cool down on ice
and add 90 μL of RNase-free water. Prepare 5 μL aliquots using
the Safe-Lock PCR tubes provided in the kit and store at
−70 °C. Before use, thaw one tube and keep it on ice. The
ladder is quite stable at −70 °C and may be used for at least
4 months.
7. RNase contamination problems of the Bioanalyzer electrodes
are very frequent and will affect the RNA integrity number of
your samples. Therefore, if the Agilent 2100 Bioanalyzer is also
frequently used to run DNA chips, it is strongly recommended
to use a dedicated electrode cartridge only for RNA assays. In
addition, we recommend for each chip to load an internal RNA
control (total RNA preparation with a known RIN >9). If you
encounter contamination problems, soak the electrode cartridge
in an RNaseZap® decontamination solution for at least 10 min,
then rinse the electrodes with RNase-free water, and let them
dry out overnight.
8. The Agilent 2100 Bioanalyzer is very sensitive to vibrations and
this may affect your results. Therefore, make sure that no
vibrations occur during the run.
9. Fragmentation time should be adjusted for each RNA prepara-
tion depending on the species and quality of the RNA used. We
recommend testing 3–4 different times of fragmentation to
define the appropriate conditions for mild hydrolysis.
10. Do not leave the heated adapter on ice for more than 5–10 min
before proceeding to the next step; this may impact your
library preparation.
11. We recommend proceeding immediately with PCR amplification.
However, if this is not possible, inactivate the RT by heating
for 15 min at 70 °C and cool down the reaction at 4 °C for
1–3 h or safely store the reactions at −20 °C overnight.
12. Make sure to use only combinations of compatible primers for
barcoding. Most Illumina sequencers use a green laser
(or LED) to read G and T nucleotides and a red laser
(or LED) to read A and C nucleotides. Within each sequencing
cycle, at least one nucleotide for each color channel must be
read in the index to ensure proper reading of the barcode
sequence. Use the following guide as a reference (ScriptSeq™
Index PCR primers, Illumina) to verify barcode compatibility,
or check compatibility with the Illumina Experiment Manager
software.
13. This quantification step is crucial. Make sure to quantify all
your libraries properly since an under- or overestimated quan-
tification will interfere with subsequent sequencing read pro-
portion and quality.
14. The high-sensitivity DNA gel-dye mix is stable for 1 month at
4 °C protected from light.
15. The MINLEN parameter can vary, but we use a minimal length
of 17 nt during trimming to avoid ambiguously mapped reads.
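
The color-balance rule in Note 12 can be checked before pooling with a few lines of code. The following is a minimal sketch, assuming the channel assignment stated in that note (green for G and T, red for A and C) and index sequences of equal length; the function name and the example barcodes are hypothetical and are not taken from any Illumina kit.

```python
# Channel assignment as described in Note 12 (assumption: four-channel chemistry).
GREEN = set("GT")  # bases read in the green channel
RED = set("AC")    # bases read in the red channel


def unbalanced_cycles(barcodes):
    """Return the 1-based index cycles that lack a base in either color channel."""
    if not barcodes:
        raise ValueError("need at least one barcode")
    length = len(barcodes[0])
    if any(len(b) != length for b in barcodes):
        raise ValueError("all barcodes in a pool must have the same length")
    bad = []
    for cycle in range(length):
        bases = {b[cycle].upper() for b in barcodes}
        # A cycle is readable only if both channels see at least one base.
        if not (bases & GREEN) or not (bases & RED):
            bad.append(cycle + 1)
    return bad


# Hypothetical 6-nt index sequences, for illustration only.
pool_ok = ["ATCACG", "GATGTA"]
pool_bad = ["ATCACG", "ACCACG"]
print(unbalanced_cycles(pool_ok))
print(unbalanced_cycles(pool_bad))
```

An empty list means every index cycle has signal in both channels; any listed cycle suggests adding or swapping a barcode in the pool.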

Acknowledgments

This work was supported by a joint ANR-DFG grant HTRNAMod
(ANR-13-ISV8-0001/HE 3397/8-1) to MH and YM.

References
1. Hussain S, Aleksic J, Blanco S, Dietmann S, Frye M (2013) Characterizing 5-methylcytosine in the mammalian epitranscriptome. Genome Biol 14:215
2. Novoa EM, Mason CE, Mattick JS (2017) Charting the unknown epitranscriptome. Nat Rev Mol Cell Biol 18:339–340
3. Schwartz S (2016) Cracking the epitranscriptome. RNA N Y N 22:169–174
4. Helm M, Motorin Y (2017) Detecting RNA modifications in the epitranscriptome: predict and validate. Nat Rev Genet 18:275–291
5. Molinie B, Wang J, Lim KS, Hillebrand R, Lu Z-X, Van Wittenberghe N, Howard BD, Daneshvar K, Mullen AC, Dedon P et al (2016) m(6)A-LAIC-seq reveals the census and complexity of the m(6)A epitranscriptome. Nat Methods 13:692–698
6. Schwartz S, Motorin Y (2017) Next-generation sequencing technologies for detection of modified nucleotides in RNAs. RNA Biol 14:1124–1137
7. Meyer KD, Jaffrey SR (2014) The dynamic epitranscriptome: N6-methyladenosine and gene expression control. Nat Rev Mol Cell Biol 15:313–326
8. Li X, Peng J, Yi C (2017) Transcriptome-wide mapping of N1-methyladenosine methylome. Methods Mol Biol Clifton NJ 1562:245–255
9. Schwartz S, Bernstein DA, Mumbach MR, Jovanovic M, Herbst RH, León-Ricardo BX, Engreitz JM, Guttman M, Satija R, Lander ES et al (2014) Transcriptome-wide mapping reveals widespread dynamic-regulated pseudouridylation of ncRNA and mRNA. Cell 159:148–162
10. Zueva VS, Mankin AS, Bogdanov AA, Baratova LA (1985) Specific fragmentation of tRNA and rRNA at a 7-methylguanine residue in the presence of methylated carrier RNA. Eur J Biochem 146:679–687
11. Wintermeyer W, Zachau HG (1975) Tertiary structure interactions of 7-methylguanosine in yeast tRNA Phe as studied by borohydride reduction. FEBS Lett 58:306–309
12. Marchand V, Ayadi L, Ernst FGM, Hertler J, Bourguignon-Igel V, Galvanin A, Kotter A, Helm M, Lafontaine DLJ, Motorin Y (2018) AlkAniline-Seq: profiling of m7G and m3C RNA modifications at single nucleotide resolution. Angew Chem Int Ed Engl 57(51):16785–16790
13. Schmitt ME, Brown TA, Trumpower BL (1990) A rapid and simple method for preparation of RNA from Saccharomyces cerevisiae. Nucleic Acids Res 18:3091–3092
14. Marchand V, Pichot F, Thüring K, Ayadi L, Freund I, Dalpke A, Helm M, Motorin Y (2017) Next-generation sequencing-based ribomethseq protocol for analysis of tRNA 2′-O-methylation. Biomol Ther 7(1):13
15. Wilfinger WW, Mackey K, Chomczynski P (1997) Effect of pH and ionic strength on the spectrophotometric assessment of nucleic acid purity. BioTechniques 22:474–476, 478–481
Chapter 6

Transcriptome-Wide Detection of Internal N7-Methylguanosine
Li-Sheng Zhang, Chang Liu, and Chuan He

Abstract
m7G-seq detects internal 7-methylguanosine (m7G) sites within mRNAs and noncoding RNAs by mis-
incorporation signatures. A chemical-assisted sequencing approach selectively converts internal m7G sites
into abasic sites, triggering misincorporation at these sites in the presence of a specific reverse transcriptase.
The further enrichment of m7G-induced abasic sites by biotin pull-down reveals hundreds of internal m7G
sites in human mRNA. The misincorporation ratio before pull-down enrichment can be used for estimating
the methylation fraction of some highly methylated m7G sites.

Key words 7-Methylguanosine, m7G-seq, RNA epitranscriptomics, mRNA methylation, Misincorporation

1 Introduction

N7-methylguanosine (m7G) is a well-known RNA modification at
the mRNA cap region [1, 2] that stabilizes transcripts against
exonucleolytic degradation [3] and affects translation [4]. In
addition, m7G can exist internally at position 46 of human
cytoplasmic tRNAs [5] and position 1639 of human 18S rRNA [6],
installed by the METTL1-WDR4 complex and WBSCR22, respectively
[7, 8]. These internal m7G methylations play functional roles
in RNA processing and have been linked to human diseases
[9, 10]. To investigate the existence and distribution of internal
m7G within human mRNAs, we developed a chemical-assisted
method termed “m7G-seq” to sequence internal m7G at base
resolution in the form of misincorporation signatures.
To map internal m7G at base precision, m7G-seq targets the
unique chemical reactivity of m7G (Fig. 1). Due to the positive
charge on the five-membered ring, NaBH4-mediated reduction
converts m7G into reduced m7G selectively [11], without affecting
unmodified G. Subsequent heating (55 °C) under acidic conditions
(pH 4.5) induces depurination of the reduced m7G, generating

Mary McMahon (ed.), RNA Modifications: Methods and Protocols, Methods in Molecular Biology, vol. 2298,
[Link] © Springer Science+Business Media, LLC, part of Springer Nature 2021


[Figure 1, reaction scheme. Panels: m7G → (NaBH4, reduction) → reduced m7G → (H+, pH 4.5, depurination) → abasic site (AP site) → (biotin hydrazide, hydrazidation) → biotinylated AP site. Reverse transcription outcomes shown: no misincorporation; G→T/C misincorporation.]

Fig. 1 Schematic diagram displaying the chemical reactivity of positively charged m7G under NaBH4 reduction,
depurination, and biotin labeling reactions in m7G-seq. Only the reduced m7G can further generate abasic sites
and then biotinylated AP sites in the presence of biotin hydrazide under acidic conditions. Biotinylated AP sites,
before or after pull-down, induce misincorporation when performing reverse transcription using HIV RT

an RNA abasic site, which can be further captured by biotin-ligated
hydrazide to produce a biotinylated RNA. After reverse transcription
using HIV reverse transcriptase (RT), the abasic sites
(or biotinylated abasic sites) are read predominantly as T, as well as
C [12]. Thus, we can identify the internal m7G sites based on these
misincorporation signatures at single-base resolution [13].

2 Materials

Prepare all solutions using RNase-free water. Purchase, prepare,
and store all buffers at room temperature or −20 °C (following
the manufacturer’s instructions). Properly follow all waste disposal
regulations when disposing of waste materials.

2.1 Preparation of Fragmented mRNA

1. Biological samples: Cells or tissue of interest.
2. DPBS buffer: DPBS, no calcium, no magnesium (Gibco™, 14190144).
3. TRIzol™ Reagent.
4. Chloroform.
5. Isopropanol.
6. 70% Ethanol.
7. Dynabeads mRNA DIRECT kit.
8. Qubit™ RNA HS Assay Kit.
9. RNase-free water (DEPC-treated, DNase, RNase free/Mol. Biol.).
10. Ambion 10× Fragmentation Reagent.
11. Zymo Research Oligo Clean & Concentrator.

2.2 mRNA Decapping

1. 10× Decapping Reaction Buffer and Tobacco Decapping Plus 2 enzyme.
2. 20 U/μL SUPERase•In™ RNase Inhibitor.

2.3 End Repair and 3′-Adapter Ligation

1. 10× T4 Polynucleotide Kinase Reaction Buffer.
2. 10 U/μL T4 Polynucleotide Kinase.
3. 10× CutSmart Buffer and Shrimp Alkaline Phosphatase (rSAP) enzyme.
4. RNA 3′-adapter: 5′-rApp-AGATCGGAAGAGCGTCGTG-3SpC3 (synthesized by IDT).
5. 10× T4 RNA Ligase Reaction Buffer and T4 RNA ligase 2, truncated KQ.
6. PEG8000.
7. 5′-Deadenylase.
8. RecJf (NEB, M0264L).
9. Zymo Research RNA Clean and Concentrator.

2.4 Conversion of m7G Site into Abasic Site

1. 1.0 M Sodium borohydride (NaBH4) (Sigma-Aldrich, 213462-25G) solution: Freshly prepared in water.
2. 50 mM EZ-Link Hydrazide-Biotin (in DMSO).
3. 1 M MES buffer, pH 4.5: Dissolve 1 pack BupH™ MES Buffered Saline Packs (Thermo Scientific™, 28390) in 50 mL H2O and adjust the pH to 4.5 using AcOH.
4. Dynabeads™ MyOne™ Streptavidin C1 beads and 2× B&W buffer.
5. 1× IP wash buffer: 50 mM Tris–HCl, pH 7.4, 300 mM NaCl, 0.05% (v/v) NP-40.
6. 1× proteinase K digestion buffer: 50 mM Tris–HCl, pH 7.4, 75 mM NaCl, 5 mM EDTA, 1% (w/v) SDS.
7. Proteinase K, recombinant, PCR grade.

2.5 Reverse Transcription

1. RT primer: 5′-ACACGACGCTCTTCCGATCT-3′ (synthesized by IDT).
2. 10× AMV Reverse Transcriptase Reaction Buffer and AMV Reverse Transcriptase.
3. 10 mM dNTP solution mix.
4. RNaseOUT™ Recombinant Ribonuclease Inhibitor.
5. Recombinant HIV reverse transcriptase (Worthington, LS05003).
6. RNase H.

2.6 cDNA 3′-Ligation and PCR Amplification

1. cDNA 3′-linker: 5′Phos-NNNNNNAGATCGGAAGAGCACACGTCTG-3SpC3 (synthesized by IDT).
2. T4 RNA Ligase 1 (ssRNA Ligase), High Concentration (NEB, M0437M).
3. NEBNext Multiplex Oligos.

3 Methods

3.1 Preparation of Fragmented mRNA

1. Starting from one 15 cm plate of the cells of interest, wash cells once with 5 mL ice-cold DPBS buffer. Isolate cellular total RNA using TRIzol reagent following the manufacturer’s protocol and using isopropanol precipitation.
2. With the purified cellular total RNA, extract mRNA by two rounds of polyA+ purification with the Dynabeads mRNA DIRECT kit following the manufacturer’s protocol. mRNA concentration is measured using the Qubit™ RNA HS Assay Kit with a Qubit 2.0 fluorometer.
3. Starting with 10 μg of human mRNA (Step 2 above), dissolve the mRNA in 18 μL RNase-free water followed by adding 2 μL 10× Fragmentation Buffer. Heat the mixture at 70 °C for 15 min. mRNA will be fragmented into 50–100 nt pieces (see Note 1).
4. Fragmented mRNA is purified with the Oligo Clean & Concentrator and eluted with RNase-free water.

3.2 mRNA Decapping

1. Using a maximum of 6 μg fragmented mRNA dissolved in 34 μL RNase-free water, add 5 μL 10× Decapping Reaction Buffer, 1 μL 50 mM MnCl2, and 2 μL SUPERase•In™ RNase Inhibitor. Mix well and then add 8 μL decapping enzyme. Mix well and incubate the reaction at 37 °C for 2 h.
2. Decapped mRNA is purified using the Oligo Clean & Concentrator and eluted with 40 μL RNase-free water.

3.3 End Repair and 3′-Adapter Ligation

1. After decapping the fragmented mRNA, 40 μL RNA is mixed with 5 μL 10× T4 Polynucleotide Kinase Reaction Buffer. Mix well and add 5 μL T4 PNK. Mix well and incubate the mixture at 37 °C for 1 h (see Note 2).
2. 3′-End-repaired RNA is extracted from the solution using the Oligo Clean & Concentrator and eluted with 40 μL RNase-free water.
3. 40 μL RNA is then mixed with 5 μL 10× CutSmart Buffer. Mix well and add 5 μL rSAP enzyme. Mix well and conduct the 5′-/3′-dephosphorylation step at 37 °C for 1.5 h (see Note 3).
4. Dephosphorylated RNA is extracted from the solution with RNA Clean and Concentrator and eluted in 22 μL RNase-free water.
5. To start the RNA 3′-adapter ligation [14], incubate the repaired and dephosphorylated RNA fragments (22 μL) with 1.6 μL of 100 μM RNA 3′-adapter at 70 °C for 2 min and transfer immediately to ice.
6. Add 5 μL 10× T4 RNA Ligase Reaction Buffer, 15 μL 50% PEG8000, and 2 μL SUPERase•In™ RNase Inhibitor to the RNA-adapter mixture. Mix well and add 4 μL T4 RNA ligase 2 (truncated KQ). Mix well and incubate at 25 °C for 2 h followed by 16 °C for 12 h (see Note 4).
7. The reaction is then diluted to 94 μL with RNase-free water, and the 5′-ends of excess adapters are digested by adding 4 μL 5′-deadenylase and incubating at 30 °C for 1 h, followed by the addition of 2 μL RecJf for ssDNA digestion at 37 °C for another hour.
8. 3′-End-ligated RNA is then extracted using RNA Clean and Concentrator and eluted with 20 μL RNase-free water. Save 20 ng as “input.”

3.4 Conversion of m7G Sites into Abasic Sites

1. The 3′-end-ligated RNA (in a volume of 20 μL) is subjected to reduction by adding 20 μL of 1.0 M NaBH4 solution. Incubate the reaction at 25 °C for 1 h, with occasional low-speed shaking (see Note 5).
2. Quench the reaction using 300 μL RNA-binding buffer provided in the RNA Clean and Concentrator and then extract the RNA according to the manufacturer’s protocol. Elute RNA into 35 μL RNase-free water.
3. Add 5 μL MES buffer to the eluted RNA (35 μL from the last step). Mix immediately with 10 μL EZ-Link Hydrazide-Biotin. Incubate the mixture at 55 °C for 1 h (see Note 6).
4. Extract RNA using the Oligo Clean & Concentrator and elute with 20 μL RNase-free water. Save 20 ng as “before pull-down.”
5. Using the streptavidin C1 beads, wash 10 μL of beads twice with 1× B&W buffer (according to the manufacturer’s protocol, see Note 7), and resuspend in 20 μL 2× B&W buffer. Mix the bead suspension with 20 μL RNA (Step 4) and then incubate at 4 °C for 20 min.
6. After the biotin pull-down, wash beads five times with 1× IP wash buffer and proceed to proteinase K digestion by adding 45 μL 1× proteinase K digestion buffer and 5 μL proteinase K to the beads. Mix well and incubate at 55 °C for 30 min under high-speed shaking.
7. The flow-through is saved and RNA recovered with the RNA Clean and Concentrator. Elute RNA in 12 μL RNase-free water. Save as “pull-down.”

3.5 Reverse Transcription

1. The 3′-ligated RNA (as “input”), RNA before pull-down assay (as “before pull-down”), and eluted RNA after pull-down (as “pull-down”) are subjected to reverse transcription. RNAs are dissolved in 12 μL RNase-free water and incubated with 1 μL of 2.0 μM RT primer at 65 °C for 2 min, followed by moving immediately onto ice.
2. Add 2 μL 10 mM dNTPs, 2 μL 10× AMV Reverse Transcriptase Reaction Buffer, and 1 μL RNaseOUT recombinant RNase inhibitor to the 13 μL RNA-primer mixture. Mix well and add 2 μL recombinant HIV reverse transcriptase. Mix well and incubate the reaction at 37 °C for 1.5 h (see Note 8).
3. Add 1 μL RNase H to the mixture and incubate at 37 °C for 20 min.
4. cDNA is purified with the Oligo Clean & Concentrator and eluted with 7 μL RNase-free water.

3.6 cDNA 3′-Ligation and PCR Amplification

1. The purified cDNA is then subjected to cDNA 3′-adaptor ligation [14]. The cDNA is first denatured with 1 μL of 50 μM cDNA 3′-linker at 75 °C for 2 min, followed by transferring immediately to ice.
2. Add 3 μL 10× T4 RNA Ligase Reaction Buffer, 15 μL 50% PEG8000, and 3 μL 10 mM ATP to the 8 μL cDNA-adapter mixture. Mix well and add 1 μL of T4 RNA ligase 1. Mix well and incubate the reaction at 25 °C for 12 h (see Note 9).
3. 3′-Ligated cDNA is purified with the Oligo Clean & Concentrator and eluted with 20 μL RNase-free water.
4. The library is then PCR amplified with the universal primer and indexed primers using the NEBNext Multiplex Oligos for Illumina. All libraries are sequenced on an Illumina NextSeq 500 with single-end 80 bp read length.

3.7 Data Processing and Analysis

Proceed with data processing and analysis to identify m7G-seq-induced mutations as previously described [13].
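
To make the misincorporation signature concrete, the short sketch below computes a per-site misincorporation ratio from base counts at a single reference G. It is an illustration under stated assumptions only, not the published analysis pipeline [13]: the function name, the dictionary layout, and the read counts are hypothetical.

```python
def misincorporation_ratio(counts, ref_base="G"):
    """Fraction of reads at one reference position that do not match ref_base.

    counts maps each called base (A, C, G, T) to its read count at that position.
    """
    total = sum(counts.get(base, 0) for base in "ACGT")
    if total == 0:
        return 0.0  # no coverage, no estimate
    mismatches = total - counts.get(ref_base, 0)
    return mismatches / total


# Hypothetical read counts at one reference G, for illustration only.
before_pulldown = {"G": 80, "T": 15, "C": 4, "A": 1}
input_control = {"G": 99, "T": 1, "C": 0, "A": 0}

print(misincorporation_ratio(before_pulldown))
print(misincorporation_ratio(input_control))
```

Comparing the "before pull-down" library against the untreated "input" library at the same position is one simple way to separate chemistry-induced misincorporation from background sequencing error.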

4 Notes

1. 10× Fragmentation Buffer, which is a buffer based on Zn2+, is used for generating RNA fragments of 50–100 nt. Traditional fragmentation buffer (based on Mg2+) might give RNA fragments with lengths of 100–150 nt. Fragmented RNA longer than 100 nt may cause difficulties in the subsequent RNA 3′-ligation and cDNA 3′-ligation steps, where a shorter size is optimal for efficient ligation.
2. The chemical-assisted RNA fragmentation leaves damaged structures at the 3′-ends, which need to be further repaired by T4 PNK enzyme. PNK repair generates an OH group at the 3′-ends.
3. The decapping reaction produces a monophosphate group at the 5′-end of cap-containing RNA fragments. These 5′-monophosphate groups are removed by subsequent alkaline phosphatase (rSAP) treatment. This procedure ensures that the 5′-monophosphate will not react with hydrazide in the step of converting internal m7G into an abasic site. Only RNA fragments with internal m7G modification will be enriched by the biotin pull-down.
4. For RNA 3′-ligation, we incubate the reaction at 16 °C for 12 h to ensure ligation efficiency. Due to the long incubation time, we include SUPERase•In™ RNase Inhibitor to protect RNA from degradation. However, SUPERase•In™ RNase Inhibitor contains Na+, which is harmful for most ligation reactions. Here, we used T4 RNA ligase 2 (truncated KQ) as a robust enzyme that tolerates a low concentration of Na+ in the reaction mixture.
5. 20 μL of 1.0 M NaBH4 (freshly prepared in water) solution is used as a 2× buffer for the reduction reaction at internal m7G sites. Adding 20 μL of 1.0 M KBH4 (freshly prepared in water) instead of NaBH4 could further enhance the reduction efficiency [15].
6. When mixing the RNA and EZ-Link Hydrazide-Biotin in MES buffer, heating may trigger the generation of abasic sites. However, a heating temperature above 55 °C might lead to unexpected cleavage at abasic sites.
7. When washing the streptavidin C1 beads, it is advised to add SUPERase•In™ RNase Inhibitor (used as 20×) into 1× B&W buffer. This will ensure that the washed beads are RNase free.
8. In the reverse transcription reaction, the wild-type HIV reverse transcriptase performs very well in generating misincorporation at abasic sites. Other RT enzymes such as ProtoScript II RT, SuperScript II RT, SuperScript III RT, and AMV RT gave much lower misincorporation rates.
9. For the cDNA 3′-linker ligation, T4 RNA ligase 1 is employed, which is commonly used as a ssRNA ligase. In this case, a higher PEG8000 concentration (a final 25% v/v) is applied to ensure an efficient ligation. The PEG8000 concentration in RNA 3′-ligation in the presence of T4 RNA ligase 2 (truncated KQ) should be kept at around 15% v/v.
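
The PEG8000 concentrations quoted in Note 9 follow directly from the volumes listed in Subheadings 3.3 and 3.6. The sketch below recomputes them from the component volumes given in those steps; the function name is hypothetical, and the calculation assumes volumes are simply additive.

```python
def final_percent(stock_percent, stock_ul, component_volumes_ul):
    """Final concentration (%) of a stock after dilution into the summed reaction volume."""
    total_ul = sum(component_volumes_ul)
    return stock_percent * stock_ul / total_ul


# RNA 3'-ligation (Subheading 3.3, steps 5-6), volumes in uL:
# RNA, adapter, 10x buffer, 50% PEG8000, RNase inhibitor, ligase
rna_ligation = [22, 1.6, 5, 15, 2, 4]

# cDNA 3'-ligation (Subheading 3.6, steps 1-2), volumes in uL:
# cDNA, linker, 10x buffer, 50% PEG8000, ATP, ligase
cdna_ligation = [7, 1, 3, 15, 3, 1]

print(round(final_percent(50, 15, rna_ligation), 1))   # about 15% PEG8000
print(round(final_percent(50, 15, cdna_ligation), 1))  # 25% PEG8000
```

The same helper can be reused when scaling a reaction, since the final PEG8000 percentage changes whenever any component volume is altered.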

Acknowledgments

This work was supported by NIH HG008935 (C.H.). The Mass
Spectrometry Facility of the University of Chicago is funded by
National Science Foundation (CHE-1048528). We thank
Dr. Pieter W. Faber and the Genomics Facility of the University
of Chicago for their generous help with high-throughput
sequencing.

References
1. Cowling VH (2009) Regulation of mRNA cap methylation. Biochem J 425:295–302
2. Furuichi Y (2015) Discovery of m7G-cap in eukaryotic mRNAs. Proc Jpn Acad Ser B Phys Biol Sci 91:394–409
3. Murthy KG, Park P, Manley JL (1991) A nuclear micrococcal-sensitive, ATP-dependent exoribonuclease degrades uncapped but not capped RNA substrates. Nucleic Acids Res 19:2685–2692
4. Muthukrishnan S, Both GW, Furuichi Y, Shatkin AJ (1975) 5′-terminal 7-methylguanosine in eukaryotic mRNA is required for translation. Nature 255:33–37
5. Guy MP, Phizicky EM (2014) Two-subunit enzymes involved in eukaryotic post-transcriptional tRNA modification. RNA Biol 11:1608–1618
6. Sloan KE, Warda AS, Sharma S, Entian KD, Lafontaine DLJ, Bohnsack MT (2017) Tuning the ribosome: the influence of rRNA modification on eukaryotic ribosome biogenesis and function. RNA Biol 14:1138–1152
7. Leulliot N, Chaillet M, Durand D, Ulryck N, Blondeau K, van Tilbeurgh H (2008) Structure of the yeast tRNA m7G methylation complex. Structure 16:52–61
8. Haag S, Kretschmer J, Bohnsack MT (2015) WBSCR22/Merm1 is required for late nuclear pre-ribosomal RNA processing and mediates N7-methylation of G1639 in human 18S rRNA. RNA 21:180–187
9. Õunap K, Kasper L, Kurg A, Kurg R (2013) The human WBSCR22 protein is involved in the biogenesis of the 40S ribosomal subunits in mammalian cells. PLoS One 8:e75686
10. Shaheen R, Abdel-Salam GMH, Guy MP, Alomar R, Abdel-Hamid MS, Afifi HH, Ismail SI, Emam BA, Phizicky EM, Alkuraya FS (2015) Mutation in WDR4 impairs tRNA m7G46 methylation and causes a distinct form of microcephalic primordial dwarfism. Genome Biol 16:210
11. Wintermeyer W, Zachau HG (1970) A specific chemical chain scission of tRNA at 7-methylguanosine. FEBS Lett 11:160–164
12. Kupfer PA, Leumann CJ (2005) RNA abasic sites: preparation and trans-lesion synthesis by HIV-1 reverse transcriptase. Chembiochem 6:1970–1973
13. Zhang LS, Liu C, Ma H, Dai Q, Sun HL, Luo G, Zhang Z, Zhang L, Hu L, Dong X, He C (2019) Transcriptome-wide mapping of internal N7-methylguanosine methylome in mammalian mRNA. Mol Cell 74:1304–1316
14. Li X, Xiong X, Zhang M, Wang K, Chen Y, Zhou J, Mao Y, Lv J, Yi D, Chen XW, Yi C (2017) Base-resolution mapping reveals distinct m1A methylome in nuclear- and mitochondrial-encoded transcripts. Mol Cell 68:993–1005
15. Lin S, Liu Q, Lelyveld VS, Choe J, Szostak JW, Gregory RI (2018) Mettl1/Wdr4-mediated m7G tRNA methylome is required for normal mRNA translation and embryonic stem cell self-renewal and differentiation. Mol Cell 71:244–255
Chapter 7

miCLIP-MaPseq Identifies Substrates of Radical SAM RNA-Methylating Enzyme Using Mechanistic Cross-Linking and Mismatch Profiling
Vanja Stojković, David E. Weinberg, and Danica Galonić Fujimori

Abstract
The family of radical SAM RNA-methylating enzymes comprises a large group of proteins that contains
only a few functionally characterized members. Several enzymes in this family have been implicated in the
regulation of translation and antibiotic susceptibility, emphasizing their significance in bacterial physiology
and their relevance to human health. While the few characterized enzymes have been shown to modify diverse
RNA substrates, highlighting a potentially broad substrate scope within the family, many enzymes in this class
have no known substrates. Precise knowledge of RNA substrates and modification sites for uncharacterized
family members is important for unraveling their biological function. Here, we describe a strategy
for substrate identification that takes advantage of mechanism-based cross-linking between the enzyme and
its RNA substrates, which we named individual-nucleotide-resolution cross-linking and immunoprecipita-
tion combined with mutational profiling with sequencing (miCLIP-MaPseq). Identification of the position
of the modification site is achieved using thermostable group II intron reverse transcriptase (TGIRT),
which introduces a mismatch at the site of the cross-link.

Key words RNA methylation, Radical SAM, Substrate identification, Methyl adenosine, RlmN, Cfr,
TGIRT

1 Introduction

There are more than 100 chemically distinct RNA modifications,
of which methylation is the most common. RNA methylation is
ubiquitous across all domains of life; however, the exact location,
biological function, and corresponding RNA-methylating enzymes
are poorly understood. Recent strategies that combine
immunoprecipitation of modified RNA or chemical treatment of RNA with
next-generation sequencing have allowed mapping of the location
and abundance of a subset of RNA modifications, such as
N6-methyladenosine (m6A), 5-methylcytosine (m5C), and
pseudouridine (Ψ) [1–8]. These approaches take advantage of either the
unique chemical reactivity of the methyl group (e.g., detection of

m5C via RNA bisulfite sequencing) [3] or the availability of
modification-specific antibodies (e.g., transcriptome-wide
identification of m6A and Ψ) [1, 4, 9]. Additionally, strategies based on UV
cross-linking and immunoprecipitation, known as CLIP-based
methods, have been used to identify enzyme-substrate pairs for
RNA-modifying enzymes [10–14]. These methods employ UV
irradiation to generate covalent protein-RNA adducts that can be
subsequently isolated and enriched to identify RNA-interacting
partners. Despite many advances that resulted from CLIP-based
methods, the main disadvantage of UV cross-linking is its low
cross-linking efficiency. An alternative approach developed for sub-
strate identification for some RNA methyltransferases exploits the
formation of a covalent catalytic intermediate between the enzyme
and its RNA substrate [3, 15]. While inherently limited to enzyme
families that form a covalent intermediate in their catalytic mechan-
isms, this approach allows for highly efficient cross-linking of
enzyme-substrate pairs. This strategy has earlier been applied to
NSun RNA methyltransferase family members, which methylate
cytosines in RNA to yield m5C. The covalent enzyme-substrate
intermediate is trapped either by using a 5-azacitidine (Aza) analog,
as in Aza-IP [3], or by mutation of the key cysteine residue in these
enzymes that is necessary for the resolution of the covalent inter-
mediate, as in methylation-iCLIP (miCLIP) approach [15].
Radical SAM RNA-methylating enzymes employ a radical-
based mechanism to generate 2-methyladenosine (m2A, RlmN
enzymes) and 8-methyladenosine (m8A, Cfr enzymes). Mechanis-
tic studies by our group [16] and others [17–20] have revealed that
substrate methylation by radical SAM RNA-methylating enzymes
proceeds through an enzyme-substrate covalent intermediate dis-
tinct from those formed by RNA m5C methyltransferases [21]. The
hallmark of the methylation is formation of a methylene-bridged
covalent intermediate between a Cys residue in the enzyme (C355
in E. coli RlmN) and amidine carbon of the adenosine substrate
(Fig. 1) [16–20, 22, 23]. Subsequently, the enzyme-RNA covalent
adduct is resolved by a second conserved cysteine (C118 in E. coli
RlmN) [16]. Mutation of C118 (C118A) stabilizes the protein-
RNA intermediate, enabling isolation of the enzyme-RNA covalent
pairs by immunoprecipitation (Fig. 1) [16]. By combining this key
mechanistic feature with next-generation sequencing, we have
developed a novel strategy where individual-nucleotide-resolution
cross-linking and immunoprecipitation are combined with muta-
tional profiling with sequencing (miCLIP-MaPseq) [24]. This
method can allow for the identification of substrates and modifica-
tion sites for any member of the radical SAM RNA-methylating
enzyme family. The method was developed and validated using the
most well-characterized member of the family, RlmN from E. coli
that is known to modify 23S rRNA, as well as a subset of tRNA
substrates [25, 26].

Fig. 1 Mechanistic scheme for RlmN-mediated methylation of RNA showing key steps. The stable covalent
intermediate trapped by C118A mutation is shown

miCLIP-MaPseq relies on immunoprecipitation of a stable
covalent complex between the mutant enzyme and RNA, followed
by high-throughput RNA sequencing (Fig. 2). Following isolation,
the protein-RNA species are digested using Proteinase K, which
leaves a peptide scar on the RNA at the site of the protein-RNA
cross-link formation. RNA is then size-selected on a denaturing
TBE-urea gel (Fig. 3). RNA species larger than 300 nucleotides are
fragmented prior to dephosphorylation [27], while smaller RNA
fragments are dephosphorylated without prior fragmentation.
After size selection, RNAs are converted to cDNA using the
TGIRT reverse transcriptase, and the resulting library is subjected
to PCR amplification and high-throughput sequencing. One main
advantage of miCLIP-MaPseq is that it uses TGIRT to generate
cDNA. This reverse transcriptase is highly processive and intro-
duces a mismatch when it encounters the protein scar on RNA
and thus allows identification of methylation sites using mutational
profiling. Here, we provide a detailed miCLIP-MaPseq protocol
that can be easily modified and implemented to identify substrates
of any member of the radical SAM RNA-methylating family.

2 Materials

2.1 Cell Lysis and Target Protein Immunoprecipitation

1. Lysis buffer: 50 mM Tris–HCl pH 7.5, 150 mM NaCl, 1 mM EDTA, 1% (v/v) Triton X-100.
2. 100 mM PMSF.
3. TBS buffer: 50 mM Tris–HCl pH 7.5, 150 mM NaCl.

Fig. 2 Schematic representation of library preparation strategy for identification of substrates and methylation
sites of RlmN. Red bars represent the fraction of mismatches at a specific nucleotide on substrate RNA

4. Glycine buffer: 100 mM glycine–HCl pH 3.5.
5. Stringent-TBS wash: 50 mM Tris–HCl pH 7.5, 500 mM NaCl.
6. Resin recycle solution: 50% Glycerol, 50% TBS, 0.02% sodium azide.
7. 10 mM Tris pH 7.5.
8. RQ1 RNase-free DNase.
9. Anti-FLAG M2 resin.
10. IP dilution solution: 12 μL of 5 mg/mL 1× FLAG peptide and 388 μL of 10 mM Tris pH 7.5.
11. LB medium.
12. 1.5 mL Eppendorf tubes.
13. E. coli strain encoding a FLAG-tagged wild-type RNA-modifying enzyme of interest and the corresponding FLAG-tagged mutant RNA-modifying enzyme. For the example described herein, use E. coli BW25113 strain encoding FLAG-tagged wild-type RlmN and E. coli BW25113 FLAG-tagged C118A RlmN.

2.2 RNA Isolation

1. Proteinase K.
2. GlycoBlue.
3. 3 M Sodium acetate pH 5.5.
4. Isopropanol.

Fig. 3 Gel analysis of isolated RNA after immunoprecipitation and Proteinase K
treatment of FLAG-tagged C118A RlmN. RNA was size selected into four regions
(A–D) as indicated on the gel, and each region was individually sequenced. Lanes
1–3: Isolated RNA after immunoprecipitation and Proteinase K treatment of
FLAG-tagged C118A RlmN, where the amount of sample loaded in lane 1 is
half of the amount loaded in each of the lanes 2 and 3; lane 4: low-range
single-stranded RNA markers

5. 80% Ethanol.
6. Novex 10% TBE-urea precast gel.
7. 1× TBE running buffer: Dilute 10× TBE running buffer to 1× using DEPC-treated water.
8. 2× RNA loading dye.
9. Low-range ssRNA ladder.
10. SYBR Gold Nucleic Acid Gel Stain.
11. 18G × 1 ½ syringe needle.
12. Costar SpinX column.
13. Nuclease-free water.
14. RNase-free non-stick 0.5 mL tubes and RNase-free 1.5 mL tubes.
15. 100% Ethanol.

2.3 RNA Fragmentation

1. 10× Fragmentation Reagent mix.
2. Nuclease-free water.
3. RNase-free PCR tubes.

2.4 Library 1. T4 polynucleotide kinase.


Preparation 2. SUPERase-In.
3. TGIRT-III template-switching kit.
4. 10× PNK buffer: 70 mM Tris–HCl pH 7.5, 10 mM MgCl2,
5 mM DTT.
5. 5× TGIRT reaction buffer: 100 mM Tris–HCl pH 7.5, 2.25 M
NaCl, 25 mM MgCl2.
6. Novex 8% TBE precast gel.
7. 5× GelPilot DNA Loading Dye.
8. 5′ DNA Adenylation Kit.
9. Zymo Oligo Clean & Concentrator kit.
10. Thermostable 5′ AppDNA/RNA Ligase.
11. 10× NEBuffer 1.
12. 50 mM MnCl2.
13. MinElute PCR Purification Kit.
14. Phusion High-Fidelity DNA Polymerase.
15. Deoxynucleotide (dNTP) Solution Mix.
16. 10 mM Tris, pH 8.
17. Oligos used for library preparation:
R2 RNA (provided in a kit by InGex; 3SpC3 is a C3 Spacer
phosphoramidite): 5′-rArGrA rUrCrG rGrArA rGrArG
rCrArC rArCrG rUrCrU rGrArArCrUrCrCrArG rUrCrA
rC/3SpC3/-3′.
R2R DNA (provided in a kit by InGex): 5′-GTG ACT
GGA GTT CAG ACG TGT GCT CTT CCG ATC TN
(N = equimolar A, T, G, C).
R1R DNA (IDT; 3SpC3 is a C3 Spacer phosphoramidite):
5′-/5Phos/GAT CGT CGG ACT GTA GAA CTC TGA ACG
TGT AG/3SpC3/-3′.
Illumina multiplex primer (IDT): 5′-AAT GAT ACG GCG
ACC ACC GAG ATC TAC ACG TTC AGA GTT CTA CAG
TCC GAC GAT C-3′.
Illumina barcode primer (IDT): 5′-CAA GCA GAA GAC
GGC ATA CGA GAT [barcode] GTG ACT GGA GTT CAG
ACG TGT GCT CTT CCG ATC T-3′.
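When retyping the oligos above for ordering, a quick complementarity check can catch transcription errors. The sketch below is an illustration, not part of the published protocol: it tests that the R2 RNA is the reverse complement of the R2R DNA (ignoring the 3′ N overhang) and that the 3′ end of the Illumina multiplex primer is the reverse complement of R1R DNA. The helper names are hypothetical.

```python
def clean(seq: str) -> str:
    """Remove spacing and normalize case."""
    return seq.replace(" ", "").upper()

def revcomp(dna: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(dna))

# Sequences copied from the list above, without modifications or the 3' N.
R2_RNA = clean("AGA UCG GAA GAG CAC ACG UCU GAACUCCAG UCA C")
R2R_DNA = clean("GTG ACT GGA GTT CAG ACG TGT GCT CTT CCG ATC T")
R1R_DNA = clean("GAT CGT CGG ACT GTA GAA CTC TGA ACG TGT AG")
MULTIPLEX = clean(
    "AAT GAT ACG GCG ACC ACC GAG ATC TAC ACG TTC AGA GTT CTA CAG TCC GAC GAT C"
)

# R2 RNA should pair with R2R DNA to form the template-primer duplex.
r2_duplex_ok = R2_RNA.replace("U", "T") == revcomp(R2R_DNA)
# The multiplex primer should anneal to the ligated R1R adapter.
r1_primer_ok = MULTIPLEX.endswith(revcomp(R1R_DNA))

print("R2 RNA / R2R DNA complementary:", r2_duplex_ok)
print("multiplex primer matches R1R:", r1_primer_ok)
```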

2.5 Library Quantification

1. KAPA library quantification kit.
2. 96-Well PCR plate.
3. BioRad CFX Connect real-time PCR.
4. Agilent 2100 Bioanalyzer.

2.6 Sequencing

1. Illumina HiSeq4000.

3 Methods

3.1 Expression of the FLAG-Tagged Enzyme

A C-terminal DYKDDDDK octapeptide (FLAG) sequence is fused
to the genomic version of rlmN or rlmN containing the C118A
mutation as described previously [16].
1. Inoculate 1 L of LB medium with a 1:100 dilution of an overnight
culture of E. coli BW25113 encoding either the FLAG-tagged
WT RlmN or the FLAG-tagged C118A RlmN.
2. Grow the cells at 37 °C, 200 rpm, for 90 min for WT RlmN, or
150 min for C118A RlmN (see Note 1).
3. Harvest the cells by centrifugation at 1610 × g (rotor
F9S-4×1000y) for 15 min at 4 °C. Flash freeze the pellets in liquid
nitrogen, and either store them at −80 °C or immediately
proceed with the next step.

3.2 Lysis and DNase Treatment

1. Thaw ~1.5 g of cells and resuspend them in 4 mL of cold lysis
buffer. Add 40 μL of 100 mM PMSF.
2. While keeping the cells on ice, sonicate them using the
microtip at power setting 3 and 50% duty cycle, for three
40-s pulses with 1-min breaks between pulses. To avoid foaming,
position the probe approximately 0.5 cm from the bottom
of the tube and do not let it touch the tube sides.
3. Divide the 4 mL sample among five 1.5 mL Eppendorf tubes. Add
18 μL RQ1 RNase-free DNase to each tube and incubate at
37 °C for 15 min. Spin down at 19,722 × g (Sorvall Legend
Micro 21R) for 10 min at 4 °C. Without disturbing the pellet,
transfer the supernatant to a new 1.5 mL tube. If not immediately
proceeding with the next step, store the samples at −20 °C.

3.3 Immunoprecipitation

3.3.1 Resin Preparation

1. Thoroughly suspend the anti-FLAG M2 affinity resin and
immediately transfer 375 μL to a new Eppendorf tube. Centrifuge
the resin at 8000 × g, for 1 min at 4 °C. Let the resin settle
for 1 min. Remove the supernatant making sure not to transfer
any resin.
2. Wash the resin twice with 1 mL of TBS buffer. For each wash,
add 1 mL of TBS buffer, resuspend the resin by gently pipetting,
centrifuge the resin at 8000 × g for 1 min at 4 °C, and
then let it settle for 1 min.

3. Wash the resin once with 1 mL glycine buffer (see Note 2).
4. Wash the resin three times with TBS buffer. For each wash, the
process is the same as in Step 2.

3.3.2 Binding of FLAG-Tagged Protein

1. Add approximately 75 μL of resuspended resin into each tube
containing ~800 μL sample. Let the samples incubate with the
resin on a rotator for at least 2 h at 4 °C.
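The volumes in Subheadings 3.2 to 3.3.2 are internally consistent, which the following sketch makes explicit. It uses only numbers stated in the protocol; the variable names are illustrative.

```python
# Lysate handling (Subheading 3.2).
lysate_ul = 4000.0        # 4 mL of resuspended cells
n_tubes = 5               # lysate is divided among five tubes
pmsf_ul = 40.0            # 100 mM PMSF added to the lysate
pmsf_stock_mM = 100.0

sample_per_tube_ul = lysate_ul / n_tubes
pmsf_final_mM = pmsf_stock_mM * pmsf_ul / (lysate_ul + pmsf_ul)

# Resin handling (Subheadings 3.3.1 and 3.3.2).
resin_prepared_ul = 375.0  # resin slurry transferred for washing
resin_per_tube_ul = 75.0   # resin added to each sample tube
tubes_covered = int(resin_prepared_ul // resin_per_tube_ul)

print(f"sample per tube: {sample_per_tube_ul:.0f} uL")
print(f"final PMSF: {pmsf_final_mM:.2f} mM")
print(f"tubes covered by prepared resin: {tubes_covered}")
```

When scaling the preparation up or down, keeping the resin-to-lysate ratio fixed (75 μL per ~800 μL) is a reasonable starting point, though the optimal ratio may need to be determined empirically.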

3.3.3 Elution of FLAG-Tagged Protein

1. Centrifuge the samples at 8000 × g, for 1 min at 4 °C. Let the
resin settle for 1 min. Remove the supernatant.
2. Wash the resin three times with 500 μL of stringent TBS wash
buffer.
3. Add 75 μL of IP dilution solution into each sample. Gently
resuspend and rotate samples for at least 1 h at 4 °C (see
Note 3).
4. Centrifuge the samples at 8000 × g, for 1 min at 4 °C. Let the
resin settle for 2 min.
5. Carefully transfer the supernatant to a new Eppendorf tube.
Put both resin and supernatant on ice. Store resin without
glycerol at 4 °C. Store supernatant at −20 °C.

3.3.4 Recycling of the Resin

1. Resuspend used resin in 500 μL of TBS buffer. Centrifuge the
resin at 8000 × g, for 1 min at 4 °C. Remove the supernatant.
2. Wash the resin three times with 1 mL of glycine buffer. Centrifuge
the resin at 8000 × g, for 1 min at 4 °C. Let the resin settle
for 1 min.
3. Wash the resin five times with 1 mL of resin recycle solution.
Centrifuge the resin at 8000 × g, for 1 min at 4 °C. Store the
resin at −20 °C.

3.4 Proteinase K Treatment

1. Thaw the sample obtained after immunoprecipitation on ice.
Remove 10 μL for subsequent Western blot analysis. Add 9.6 U
of Proteinase K into the rest of the sample and incubate the
reaction for 2 h at 37 °C.
2. Divide each sample into two Eppendorf tubes (~230–250 μL).
Precipitate RNA by adding 1000 μL of isopropanol, 25 μL of
3 M sodium acetate pH 5.5, and 2 μL of GlycoBlue co-precipitant
into each tube. Leave the tubes overnight at −20 °C.
3. The next day, pellet the RNA by centrifugation at 20,000 × g
for 30–40 min at 4 °C.
4. Carefully remove the supernatant and wash the pellet with
750 μL of cold 80% ethanol. Pellet the RNA again by centrifugation
at 20,000 × g for 40 min at 4 °C. Carefully remove the
supernatant and air-dry the pellets (see Note 4).
5. Resuspend pellets with 40 μL nuclease-free water (see Note 5).

3.5 Gel Purification and RNA Extraction

1. Set up the gel apparatus and pre-run the 10% TBE-urea gel at
180 V for at least 20 min in 1× TBE running buffer.
2. Add 25 μL of 2× RNA loading dye to 25 μL of sample. Prepare
molecular weight marker solution by mixing 1 μL of ssRNA
low-range ladder with 9 μL of nuclease-free water, followed by
10 μL of 2× RNA loading dye. Heat the samples at 92 °C for
4 min. In two wells load 20 μL of sample and in one well 10 μL.
Load the marker in the last lane (see Note 6).
3. Run the gel at 180 V until the lower (dark blue) dye is close to
the bottom, approximately 70 min. Incubate the gel with
50 mL of TBE buffer containing 5 μL SYBR Gold dye for
5 min. Wash the gel twice with nuclease-free water.
4. Visualize and record the gel under UV light.
5. Prepare 0.5 mL RNase-free non-stick tubes by piercing a hole
in the bottom using an 18G syringe needle. Cut the gel on a
transilluminator as indicated in Fig. 3. Put gel pieces into
0.5 mL tubes and then place the tubes into a 1.5 mL collection
tube. Centrifuge at 20,000 × g, for 3 min at 4 °C. Remove the
0.5 mL tube and add 300 μL nuclease-free water.
6. Shake samples on the thermomixer at 157 × g for 10 min at
68 °C, and then freeze them on dry ice for 10 min. Thaw
samples at room temperature for 10 min and then incubate
on the thermomixer at 157 × g for 10 min at 68 °C.
7. Cut the ends off P1000 barrier tips, transfer the sample (with
gel pieces) onto a Costar SpinX column, and spin at 20,000 × g
for 3 min at 4 °C.
8. Add to each tube 2 μL of GlycoBlue, 33 μL of 3 M sodium
acetate pH 5.5, and 900 μL of 100% ethanol. Vortex to mix.
Put at −20 °C overnight. The next morning, pellet the RNA as
described in Subheading 3.4.

3.6 RNA Fragmentation

Fragment RNAs longer than 300 nt into 50–200 nt fragments
using the following protocol:
1. Resuspend RNA obtained in Subheading 3.5 in 11 μL of
nuclease-free water. Transfer to PCR tubes and place in the
thermocycler.
2. Heat the samples for 2 min at 95 °C to denature RNA.
3. Add 1 μL of 10× Fragmentation Reagent mix. If you are
handling a large number of tubes, keep the tubes on ice.
This will ensure that most of the RNA stays denatured. Keeping
samples at room temperature will allow the RNA to refold
slowly.

4. Return the samples to the thermomixer and incubate them for
2 min at 95 °C. Add 1 μL of stop solution and place the samples
on ice.
5. Purify the RNA on a 10% TBE-urea gel. Extract 50–200 nt long
fragments and precipitate RNA as described in Subheading 3.5
(see Note 7).

3.7 Library Preparation

3.7.1 RNA 3′ End Dephosphorylation

1. Determine RNA concentration by NanoDrop. Low RNA
concentrations should be measured by Qubit or Bioanalyzer.
2. Prepare 11 μL reaction mixture by mixing 7 μL of RNA sample
with 1.1 μL of 10× PNK buffer, 1 μL of SUPERase-In, and 2 μL
of T4 polynucleotide kinase. Incubate reaction mixture at
37 °C for 1 h, followed by 3 min at 90 °C to inactivate the
enzyme.

3.7.2 Template-Switching Reaction Using TGIRT-III

1. In a PCR tube prepare 17.5 μL reaction mixture by mixing
nuclease-free water with 4 μL of 5× TGIRT reaction buffer,
2 μL of 50 mM DTT, ~50 ng of dephosphorylated RNA
sample, and 2 μL 10× TGIRT-III RT/template-primer substrate
mix (see Note 8).
2. Preincubate reaction mixture at room temperature for 30 min,
and then add 2.5 μL of 10 mM dNTPs.
3. Incubate the reaction at 60 °C for 60 min (see Note 9).
4. Add 1 μL of 5 M NaOH and incubate the sample at 65 °C for
15 min.
5. Cool to room temperature and neutralize sample with 1 μL of
5 M HCl.
6. To each sample add 100 μL of 10 mM Tris pH 8.0, 13 μL of
3 M sodium acetate pH 5.5, 3 μL of GlycoBlue, and 600 μL of
100% ethanol. Incubate overnight at −20 °C. The next morning,
pellet the cDNA as described in Subheading 3.4.
7. Pre-run an 8% TBE gel at 155 V for at least 20 min.
8. Resuspend precipitated cDNA in 5 μL of 10 mM Tris pH 8.0
and add 1.25 μL of 5× DNA loading dye. Run the gel at 155 V for
40–45 min.
9. Size-select cDNA using a 10 bp ladder as a guide. Add 300 μL of
nuclease-free water to the cut gel pieces, and extract cDNA from
the gel pieces by following the general protocol presented in
Subheading 3.5. Precipitate cDNA overnight at −20 °C by adding
3 μL of GlycoBlue, 33 μL of 3 M sodium acetate pH 5.5, and
900 μL of 100% ethanol. The next morning, pellet the cDNA as
described in Subheading 3.4 (see Note 10).
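The final working concentrations in the template-switching reaction follow from the stock concentrations and the 20 μL final volume (17.5 μL premix plus 2.5 μL dNTPs). The sketch below computes them from the values in steps 1 and 2 and the 5× buffer recipe in Subheading 2.4; the helper function is illustrative.

```python
def final_conc(stock: float, added_ul: float, total_ul: float) -> float:
    """Concentration after diluting `added_ul` of `stock` into `total_ul`."""
    return stock * added_ul / total_ul

premix_ul = 17.5   # reaction mixture before dNTP addition (step 1)
dntp_ul = 2.5      # 10 mM dNTPs added after preincubation (step 2)
total_ul = premix_ul + dntp_ul

buffer_x = final_conc(5.0, 4.0, total_ul)       # 5x TGIRT reaction buffer
nacl_mM = final_conc(2250.0, 4.0, total_ul)     # NaCl from the 5x buffer
dtt_mM = final_conc(50.0, 2.0, total_ul)        # 50 mM DTT stock
dntp_mM = final_conc(10.0, dntp_ul, total_ul)   # 10 mM dNTP stock

print(f"final volume: {total_ul} uL")
print(f"buffer: {buffer_x}x, NaCl: {nacl_mM} mM")
print(f"DTT: {dtt_mM} mM, dNTPs: {dntp_mM} mM")
```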

3.7.3 Oligo Adenylation of Illumina Read 1 Sequencing Primer (R1R DNA)

1. In a PCR tube prepare 20 μL reaction mixture by combining
2 μL of 10× 5′ DNA Adenylation Reaction Buffer, 2 μL of
1 mM ATP, 1 μL 100 μM R1R DNA, and 2 μL of Mth RNA
Ligase. We usually set up 4–8 parallel reactions.
2. Incubate reactions in a thermocycler at 65 °C for 1 h.
3. Incubate samples at 85 °C for 5 min to inactivate the enzyme.
4. Clean up the adenylated R1R DNA with an Oligo Clean &
Concentrator kit and elute in 10 μL of nuclease-free water to
obtain 10 μM adenylated R1R DNA. If doing several adenylation
reactions in separate PCR tubes, combine them for a
cleanup since higher elution volume helps with consistent and
efficient recovery of adenylated oligos.
5. Check the extent of adenylation by running the sample on a
20% TBE-7 M-urea gel.

3.7.4 Thermostable Ligation

1. In a PCR tube prepare 20 μL reaction mixture by combining
2 μL of 10× NEBuffer 1, 2 μL of 50 mM MnCl2, 4 μL of
10 μM adenylated R1R DNA, 10 μL of cDNA from template-switching,
and 2 μL of Thermostable 5′ AppDNA/RNA
Ligase.
2. Incubate reactions in a thermocycler at 65 °C for 2 h.
3. Incubate samples at 90 °C for 3 min to inactivate the enzyme.
4. Clean up the ligated cDNA with a MinElute PCR Purification
Kit and elute in 23 μL of nuclease-free water.

3.7.5 PCR Amplification

1. In an Eppendorf tube prepare a 53 μL reaction mixture by
combining 29.5 μL of nuclease-free water, 10 μL of 5× Phusion
HF buffer, 1 μL of 10 μM Illumina multiplex primer, 1 μL
of 10 μM Illumina barcode primer, 10 μL of cDNA, 1 μL of
10 mM dNTPs, and 0.5 μL of Phusion High-Fidelity DNA
Polymerase.
2. Divide the reaction mixture among three PCR tubes. Heat cDNA
at 98 °C for 5 s, then amplify it for 15, 18, or 21 cycles of 98 °C
for 5 s, 60 °C for 10 s, and 72 °C for 12 s.
3. Mix 17 μL of PCR product with 4.25 μL of 5× DNA loading
dye, load on an 8% TBE gel, and run the gel at 155 V for
45 min. Stain the gel with SYBR Gold.
4. Size-select amplified DNA using a 10 bp ladder as a guide. Add
300 μL of nuclease-free water to the cut gel pieces, and extract
DNA from the gel pieces by following the general protocol presented
in Subheading 3.5. Precipitate DNA overnight at
−20 °C by adding 3 μL of GlycoBlue, 33 μL of 3 M sodium acetate
pH 5.5, and 900 μL of 100% ethanol. The next morning, pellet
the DNA as described in Subheading 3.4. Resuspend each library in
10 μL of 10 mM Tris pH 8.
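The PCR volumes in steps 1 to 3 can be verified in a few lines. This sketch sums the listed components, divides the mix among the three tubes, and confirms that 4.25 μL of 5× dye brings 17 μL of product to 1×. The dictionary keys and the helper `dye_volume` are illustrative names.

```python
# Component volumes (uL) from step 1.
components_ul = {
    "nuclease-free water": 29.5,
    "5x Phusion HF buffer": 10.0,
    "multiplex primer (10 uM)": 1.0,
    "barcode primer (10 uM)": 1.0,
    "cDNA": 10.0,
    "dNTPs (10 mM)": 1.0,
    "Phusion polymerase": 0.5,
}

total_ul = sum(components_ul.values())
per_tube_ul = total_ul / 3  # one tube each for 15, 18, and 21 cycles

def dye_volume(sample_ul: float, dye_x: float) -> float:
    """Volume of an N-fold dye stock that gives 1x after mixing with the sample."""
    return sample_ul / (dye_x - 1)

dye_ul = dye_volume(17.0, 5.0)

print(f"total mix: {total_ul} uL, per tube: {per_tube_ul:.1f} uL")
print(f"5x dye for 17 uL of product: {dye_ul} uL")
```

Each tube receives about 17.7 μL, so loading 17 μL in step 3 uses nearly all of the product.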

3.7.6 qPCR Quantification

1. For the quantification of the libraries we use the KAPA library
quantification kit. If the kit is being used for the first time,
add 1 mL of 10× Primer Premix to the bottle of 2× KAPA
SYBR FAST qPCR Master Mix (5 mL) and mix by vortexing.
Aliquot this solution and store at −20 °C.
2. Determine the total number of reactions that will be performed.
Usually we run six DNA standards in triplicate and
each library dilution in duplicate. Using a NanoDrop, estimate
the concentration of each library and determine which dilutions
to prepare to stay within the dynamic range of the assay. The
1:5000 and 1:10,000 dilutions usually fall around the midpoint
of the assay standards.
3. Prepare 1:5, 1:50, 1:5000, and 1:10,000 library dilutions in
10 mM Tris pH 8 buffer.
4. For each reaction, prepare the following in a 96-well PCR
plate: 6 μL of Master Mix containing primers, and 4 μL of
either DNA standard or specific library dilution. Seal the plate
with optical adhesive film.
5. Run the plate with the following program in the qPCR
machine: 95 °C for 5 min, and 35 cycles of 95 °C for 30 s,
and 60 °C for 45 s.
6. Use the KAPA analysis template to calculate the slope and intercept
of the standard curve, to convert the average Cq score for each
library dilution to pM, to calculate the average size-adjusted
concentration (in pM) for each dilution, and to calculate the
size-adjusted concentration for the original undiluted library.
7. Prepare 15 μL of 10 nM library solution, containing up to
20 libraries. Store individual libraries in RNase-free low-retention
Eppendorf tubes at −20 °C (see Note 11).
8. Check the quality of the library on a Bioanalyzer prior to
submitting the sample for sequencing on an Illumina HiSeq4000 or
similar (see Note 12).
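The calculation that the KAPA analysis template performs in step 6 can be sketched as follows. This is an illustration of a standard-curve calculation, not the vendor's template: the standard concentrations, Cq values, standard length (452 bp), and library length used in the example are assumptions and should be replaced with the values from the kit documentation and your own run.

```python
import math

def fit_standard_curve(conc_pM, cq):
    """Least-squares fit of Cq against log10(concentration); returns (slope, intercept)."""
    xs = [math.log10(c) for c in conc_pM]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def library_conc_pM(cq, slope, intercept, dilution, standard_bp, library_bp):
    """Size-adjusted concentration of the undiluted library in pM."""
    diluted_pM = 10 ** ((cq - intercept) / slope)
    return diluted_pM * dilution * standard_bp / library_bp

# Hypothetical run: six tenfold standards and made-up mean Cq values.
standards_pM = [20, 2, 0.2, 0.02, 0.002, 0.0002]
standards_cq = [8.0, 11.4, 14.8, 18.2, 21.6, 25.0]

slope, intercept = fit_standard_curve(standards_pM, standards_cq)
efficiency = 10 ** (-1 / slope) - 1  # amplification efficiency implied by the slope

# Hypothetical library: 1:5000 dilution, mean Cq 13.0, average fragment 226 bp.
conc = library_conc_pM(13.0, slope, intercept, 5000, 452, 226)
print(f"slope {slope:.2f}, efficiency {efficiency:.1%}")
print(f"undiluted library: {conc / 1000:.1f} nM")
```

Once each library concentration is known in nM, equal molar amounts can be combined to prepare the 10 nM pool described in step 7.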

3.8 Sequencing Read Mapping and Analysis

3.8.1 Sequence Processing and Alignment

1. Prior to bioinformatic analysis it is important to de-multiplex
sequences if multiple samples were run within one sequencing
lane. De-multiplexing for our samples was performed by the
Center for Advanced Technology at UCSF.
2. Upload sequencing data to the Galaxy web platform and use
the public server at [Link] to analyze the data [28] (see
Note 13).
3. Process reads with FASTQ Groomer [29] and then remove
adapters using the Clip tool, also available through the Galaxy web
platform.

4. Go to Ensembl Bacteria Genome Database (EMBL-EBI),
[Link] and download
the E. coli BW25113 FASTA file and gtf file.
5. Align sequences longer than 15 nt to the genome using
Bowtie 2 with default options [30, 31]. In local mode, the
default preset is “sensitive-local” (-D 15 -R 2 -N 0 -L 20
-i S,1,0.75) (see Note 14).
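For users working outside Galaxy, the length filter and alignment in step 5 might be scripted along the following lines. This is a minimal sketch under stated assumptions: the file names and index prefix are hypothetical, the choice of local mode is inferred from the preset quoted in step 5, and the command is only assembled here, not executed.

```python
from typing import Iterable, Iterator, List, Tuple

# A FASTQ record as (header, sequence, separator, quality).
Record = Tuple[str, str, str, str]

def filter_by_length(records: Iterable[Record], min_len: int = 16) -> Iterator[Record]:
    """Keep reads longer than 15 nt, i.e., at least `min_len` bases."""
    for record in records:
        if len(record[1]) >= min_len:
            yield record

def bowtie2_command(index_prefix: str, fastq: str, sam_out: str) -> List[str]:
    """Assemble a Bowtie 2 call in local mode with the sensitive-local preset."""
    return [
        "bowtie2", "--local", "--sensitive-local",
        "-x", index_prefix,  # hypothetical index built from the BW25113 FASTA
        "-U", fastq,         # unpaired (single-end) reads
        "-S", sam_out,
    ]

reads = [
    ("@read1", "ACGTACGTACGTACGTACGT", "+", "I" * 20),  # 20 nt, kept
    ("@read2", "ACGTACGTACGTACG", "+", "I" * 15),       # 15 nt, dropped
]
kept = list(filter_by_length(reads))
print([r[0] for r in kept])
print(" ".join(bowtie2_command("bw25113", "clipped.fastq", "aligned.sam")))
```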

3.8.2 Enrichment Analysis of Reads

1. Determine the raw counts per gene by using the HTSeq-count
script, which is available through the Galaxy web platform
[32]. Select intersection-nonempty mode to handle reads overlapping
more than one feature. Sum the counts from
regions A–D for each replicate. Use this file to perform the
enrichment analysis.
2. For enrichment analysis of reads mapped to any set of genes use
the DESeq2 module. In DESeq2 specify the factor levels that will
be analyzed (e.g., sample vs. control), and select all replicates
belonging to a specific factor level. As input data use the summarized
HTSeq-count data described above. Use parametric
fitting and leave on the following options: outliers replacement,
outliers filtering, and independent filtering. For the
control samples, we generated a library from the rRNA-depleted
total RNA isolated from E. coli BW25113 strain (see
Note 15).
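Summing the per-gene counts from gel regions A–D for one replicate, as described in step 1, is a simple merge of count tables. The sketch below assumes each region's HTSeq-count output has already been parsed into a gene-to-count mapping; the gene names and counts are made up for illustration.

```python
from collections import Counter
from typing import Dict

def summarize_regions(region_counts: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum per-gene raw counts across the size-selected regions of one replicate."""
    total: Counter = Counter()
    for counts in region_counts.values():
        total.update(counts)  # Counter.update adds counts gene by gene
    return dict(total)

# Hypothetical counts for one replicate.
replicate_1 = {
    "A": {"geneX": 120, "geneY": 4},
    "B": {"geneX": 80, "geneZ": 10},
    "C": {"geneY": 6},
    "D": {"geneX": 5, "geneZ": 2},
}
summed = summarize_regions(replicate_1)
print(summed)
```

The resulting table, one per replicate, is what would be passed to DESeq2 in step 2.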

3.8.3 Analysis of Stop Sites and Mismatches

1. Download Integrative Genomics Viewer (IGV) [33]. Open the
E. coli BW25113 genome file. Open all BAM files and their
corresponding BAM_index files.
2. Select the gene of interest and determine the percent of mismatches
for a specific nucleotide by cumulative analysis of all
biological replicates (Fig. 4).
3. To determine the 5′ end of the reads (stop sites) use the script
“make_wiggle” to convert sorted and indexed BAM files to
wiggle files. This script was developed by the Weissman lab at
UCSF and is readily available through Plastid [36]. The results
can be readily visualized with IGV.
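The cumulative mismatch percentage in step 2 can also be computed from per-position base counts, for example those read off the IGV coverage track. The sketch below pools counts across replicates before computing the percentage; the counts and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

def pool_replicates(replicates: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Add up base counts at one position across biological replicates."""
    pooled: Counter = Counter()
    for counts in replicates:
        pooled.update(counts)
    return dict(pooled)

def mismatch_percent(base_counts: Dict[str, int], reference_base: str) -> float:
    """Percent of reads whose base at this position differs from the reference."""
    total = sum(base_counts.values())
    if total == 0:
        return 0.0
    mismatches = total - base_counts.get(reference_base, 0)
    return 100.0 * mismatches / total

# Hypothetical counts at one adenosine in three replicates.
replicates = [
    {"A": 30, "G": 6, "T": 4},
    {"A": 25, "G": 3, "T": 2},
    {"A": 25, "G": 5},
]
pooled = pool_replicates(replicates)
print(pooled, f"{mismatch_percent(pooled, 'A'):.1f}% mismatches")
```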

4 Notes

1. Optimal expression time for an enzyme should be determined
empirically prior to proceeding to the next step.
2. Resin cannot stay in glycine buffer for longer than 20 min.
3. When dealing with a very small sample volume, combine all the
samples from a single experiment into a single tube prior to
leaving the sample on a rotator.

Fig. 4 Examples of read profiles for specific RNAs displayed in Integrative
Genomics Viewer. (a) Read profiles for tRNAGlnUUG displayed in Integrative
Genomics Viewer (IGV) [33–35]. (b) Read profiles for tRNAHisGUG displayed in IGV. tRNAs
were isolated after immunoprecipitation of FLAG-tagged C118A RlmN. The depth

4. After centrifugation remove the ethanol using a P1000 pipette.
Recap the tube, and pulse centrifuge to bring down the remaining
ethanol. Remove the remaining liquid using a P200 pipette.
Leave the tube open at room temperature before proceeding to
another sample. By the time all the samples are finished, the
pellets should be sufficiently dry. Do not overdry the pellet
since it can be hard to re-solubilize RNA.
5. Perform Western blot analysis to ensure that the enzyme was
successfully digested. We use monoclonal anti-FLAG M2-per-
oxidase (HRP) antibody.
6. Prior to loading the samples flush the remaining urea out of the
wells using P1000 pipette. This will decrease smearing and
abnormal band shapes. Additionally, it is advisable to load a
smaller amount of sample in one of the lanes to better see
discrete bands, since loading a large amount of sample can
lead to increased smearing in the gel.
7. Under these conditions, the extent of fragmentation will
depend on the initial amount of RNA. If substantial amount
of RNA is not successfully fragmented, extract the RNA longer
than 300 nt, and repeat the fragmentation step. Make sure to
decrease the time for the re-fragmentation step (e.g., from
2 min to 1 min, or less).
8. Add RNA sample and enzyme/template-primer mix last.
9. For long or heavily modified RNAs, such as tRNAs, it is neces-
sary to run the reaction for 60 min. For short RNAs 5–15 min
is usually sufficient, but the exact time should be determined
empirically.
10. In case no pellet is observed after the first centrifugation step,
add 1 μL of GlycoBlue, place sample on dry ice for at least
30 min, thaw sample at room temperature, and then repeat
centrifugation step. Only then remove the supernatant and
perform the wash step with cold 80% ethanol.
11. When submitting multiple samples within one sequencing
lane, approximately equal amounts of each library should be
added.

Fig. 4 (continued) of the reads (counts) displayed at a specific locus is
represented as a gray bar chart (top panel). Alignment of individual reads is
represented in the bottom panel. Known modifications are represented using
abbreviations and were taken from the MODOMICS database [35]. Abbreviations:
4-thiouridine (s4U), dihydrouridine (D), queuosine (Q), 7-methylguanosine (m7G),
2′-O-methylguanosine (Gm), 2′-O-methyluridine (Um),
5-carboxymethylaminomethyl-2-selenouridine (cmnm5se2U), 5-methyluridine (m5U),
2-methyladenosine (m2A), and pseudouridine (Ψ)

12. For our application and for cost-effectiveness, 50-nucleotide
single-end runs are sufficient.
13. When uploading large sequencing files, FileZilla, an open-
source software, can be used.
14. If applying this method to a different system, we suggest
aligning sequences to the genome of interest using both Bow-
tie 1 and Bowtie 2 under various settings and then comparing
results.
15. DESeq2 considers the variability between the replicates and
normalizes read counts to account for differences in sequenc-
ing depth between samples, reporting fold change values
between the sample and the control. In our analysis, we use a
fourfold increase in abundance and adjusted P value of <0.01
as our threshold for identifying substrates in samples where
TGIRT was used as reverse transcriptase. DESeq2-adjusted P-
values are adjusted for multiple-comparison testing and are
used to lower the false-positive detection.
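The threshold in Note 15 can be applied to a DESeq2 results table with a simple filter. The sketch below assumes the table has been exported with its standard `log2FoldChange` and `padj` columns and parsed into dictionaries; the gene names and values are made up, and genes with a missing adjusted P value (which DESeq2 reports when independent filtering excludes them) are treated as not enriched.

```python
import math
from typing import Optional

def is_enriched(log2_fold_change: float, padj: Optional[float],
                min_fold: float = 4.0, max_padj: float = 0.01) -> bool:
    """Apply the fourfold-increase and adjusted P < 0.01 threshold from Note 15."""
    if padj is None or math.isnan(padj):
        return False
    return log2_fold_change >= math.log2(min_fold) and padj < max_padj

# Hypothetical rows from a DESeq2 results table.
results = [
    {"gene": "geneX", "log2FoldChange": 3.1, "padj": 1e-6},
    {"gene": "geneY", "log2FoldChange": 1.5, "padj": 1e-4},   # below fourfold
    {"gene": "geneZ", "log2FoldChange": 2.4, "padj": 0.03},   # not significant
    {"gene": "geneW", "log2FoldChange": 5.0, "padj": None},   # filtered by DESeq2
]
substrates = [r["gene"] for r in results
              if is_enriched(r["log2FoldChange"], r["padj"])]
print(substrates)
```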

Acknowledgments

This work was supported by UCSF Program for Breakthrough
Biomedical Research (PBBR) Postdoctoral Grant (to V.S.),
NIAID R01AI137270 (to D.G.F.), UCSF Program for Break-
through Biomedical Research funded in part by the Sandler Foun-
dation (to D.E.W.), and NIH Director’s Early Independence
Award DP5OD017895 (to D.E.W.).

References
1. Dominissini D, Moshitch-Moshkovitz S, Schwartz S, Salmon-Divon M, Ungar L, Osenberg S, Cesarkas K, Jacob-Hirsch J, Amariglio N, Kupiec M, Sorek R, Rechavi G (2012) Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature 485(7397):201–206. [Link] 10.1038/nature11112
2. Meyer KD, Saletore Y, Zumbo P, Elemento O, Mason CE, Jaffrey SR (2012) Comprehensive analysis of mRNA methylation reveals enrichment in 3’ UTRs and near stop codons. Cell 149(7):1635–1646. [Link] 1016/[Link].2012.05.003
3. Khoddami V, Cairns BR (2013) Identification of direct targets and modified bases of RNA cytosine methyltransferases. Nat Biotechnol 31(5):458–464. [Link] nbt.2566
4. Carlile TM, Rojas-Duran MF, Zinshteyn B, Shin H, Bartoli KM, Gilbert WV (2014) Pseudouridine profiling reveals regulated mRNA pseudouridylation in yeast and human cells. Nature 515(7525):143–146. [Link] org/10.1038/nature13802
5. Schwartz S, Agarwala SD, Mumbach MR, Jovanovic M, Mertins P, Shishkin A, Tabach Y, Mikkelsen TS, Satija R, Ruvkun G, Carr SA, Lander ES, Fink GR, Regev A (2013) High-resolution mapping reveals a conserved, widespread, dynamic mRNA methylation program in yeast meiosis. Cell 155(6):1409–1421. [Link]
6. Delatte B, Wang F, Ngoc LV, Collignon E, Bonvin E, Deplus R, Calonne E, Hassabi B, Putmans P, Awe S, Wetzel C, Kreher J, Soin R, Creppe C, Limbach PA, Gueydan C, Kruys V, Brehm A, Minakhina S, Defrance M, Steward R, Fuks F (2016) RNA biochemistry. Transcriptome-wide distribution and function of RNA hydroxymethylcytosine. Science 351(6270):282–285. [Link] science.aac5253
7. Li X, Zhu P, Ma S, Song J, Bai J, Sun F, Yi C (2015) Chemical pulldown reveals dynamic pseudouridylation of the mammalian transcriptome. Nat Chem Biol 11(8):592–597. https://[Link]/10.1038/nchembio.1836
8. Lovejoy AF, Riordan DP, Brown PO (2014) Transcriptome-wide mapping of pseudouridines: pseudouridine synthases modify specific mRNAs in S. cerevisiae. PLoS One 9(10):e110799. [Link] pone.0110799
9. Linder B, Grozhik AV, Olarerin-George AO, Meydan C, Mason CE, Jaffrey SR (2015) Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome. Nat Methods 12(8):767–772. [Link] 1038/nmeth.3453
10. Ule J, Jensen KB, Ruggiu M, Mele A, Ule A, Darnell RB (2003) CLIP identifies Nova-regulated RNA networks in the brain. Science 302(5648):1212–1215. [Link] 1126/science.1090095
11. Hafner M, Lianoglou S, Tuschl T, Betel D (2012) Genome-wide identification of miRNA targets by PAR-CLIP. Methods 58(2):94–105. [Link] ymeth.2012.08.006
12. Konig J, Zarnack K, Rot G, Curk T, Kayikci M, Zupan B, Turner DJ, Luscombe NM, Ule J (2010) iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat Struct Mol Biol 17(7):909–915. [Link] nsmb.1838
13. Haag S, Kretschmer J, Sloan KE, Bohnsack MT (2017) Crosslinking methods to identify RNA methyltransferase targets in vivo. Methods Mol Biol 1562:269–281. [Link] 1007/978-1-4939-6807-7_18
14. Zhang CL, Darnell RB (2011) Mapping in vivo protein-RNA interactions at single-nucleotide resolution from HITS-CLIP data. Nat Biotechnol 29(7):607–U686. [Link] 10.1038/nbt.1873
15. Hussain S, Sajini AA, Blanco S, Dietmann S, Lombard P, Sugimoto Y, Paramor M, Gleeson JG, Odom DT, Ule J, Frye M (2013) NSun2-mediated cytosine-5 methylation of vault noncoding RNA determines its processing into regulatory small RNAs. Cell Rep 4(2):255–261. [Link] rep.2013.06.029
16. McCusker KP, Medzihradszky KF, Shiver AL, Nichols RJ, Yan F, Maltby DA, Gross CA, Fujimori DG (2012) Covalent intermediate in the catalytic mechanism of the radical S-adenosyl-L-methionine methyl synthase RlmN trapped by mutagenesis. J Am Chem Soc 134(43):18074–18081. [Link] 1021/ja307855d
17. Grove TL, Benner JS, Radle MI, Ahlum JH, Landgraf BJ, Krebs C, Booker SJ (2011) A radically different mechanism for S-adenosylmethionine-dependent methyltransferases. Science 332(6029):604–607. https://[Link]/10.1126/science.1200877
18. Grove TL, Livada J, Schwalm EL, Green MT, Booker SJ, Silakov A (2013) A substrate radical intermediate in catalysis by the antibiotic resistance protein Cfr. Nat Chem Biol 9(7):422–427. [Link] nchembio.1251
19. Silakov A, Grove TL, Radle MI, Bauerle MR, Green MT, Rosenzweig AC, Boal AK, Booker SJ (2014) Characterization of a cross-linked protein-nucleic acid substrate radical in the reaction catalyzed by RlmN. J Am Chem Soc 136(23):8221–8228. [Link] 1021/ja410560p
20. Boal AK, Grove TL, McLaughlin MI, Yennawar NH, Booker SJ, Rosenzweig AC (2011) Structural basis for methyl transfer by a radical SAM enzyme. Science 332(6033):1089–1092. [Link]
21. King MY, Redman KL (2002) RNA methyltransferases utilize two cysteine residues in the formation of 5-methylcytosine. Biochemistry 41(37):11218–11225
22. Yan F, LaMarre JM, Rohrich R, Wiesner J, Jomaa H, Mankin AS, Fujimori DG (2010) RlmN and Cfr are radical SAM enzymes involved in methylation of ribosomal RNA. J Am Chem Soc 132(11):3953–3964. https://[Link]/10.1021/ja910850y
23. Yan F, Fujimori DG (2011) RNA methylation by radical SAM enzymes RlmN and Cfr proceeds via methylene transfer and hydride shift. Proc Natl Acad Sci U S A 108(10):3930–3934. [Link]
24. Stojkovic V, Chu T, Therizols G, Weinberg DE, Fujimori DG (2018) miCLIP-MaPseq, a substrate identification approach for radical SAM RNA methylating enzymes. J Am Chem Soc 140(23):7135–7143. [Link] 1021/jacs.8b02618
25. Benitez-Paez A, Villarroya M, Armengod ME (2012) The Escherichia coli RlmN methyltransferase is a dual-specificity enzyme that modifies both rRNA and tRNA and controls translational accuracy. RNA 18(10):1783–1795. [Link] rna.033266.112
26. Fitzsimmons CM, Fujimori DG (2016) Determinants of tRNA recognition by the radical SAM enzyme RlmN. PLoS One 11(11):e0167298. [Link] pone.0167298
27. Dominissini D, Moshitch-Moshkovitz S, Salmon-Divon M, Amariglio N, Rechavi G (2013) Transcriptome-wide mapping of N(6)-methyladenosine by m(6)A-seq based on immunocapturing and massively parallel sequencing. Nat Protoc 8(1):176–189. [Link]
28. Afgan E, Baker D, van den Beek M, Blankenberg D, Bouvier D, Cech M, Chilton J, Clements D, Coraor N, Eberhard C, Gruning B, Guerler A, Hillman-Jackson J, Von Kuster G, Rasche E, Soranzo N, Turaga N, Taylor J, Nekrutenko A, Goecks J (2016) The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res 44(W1):W3–W10. [Link] gkw343
29. Blankenberg D, Gordon A, Von Kuster G, Coraor N, Taylor J, Nekrutenko A, Galaxy T (2010) Manipulation of FASTQ data with Galaxy. Bioinformatics 26(14):1783–1785. [Link] btq281
30. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with bowtie 2. Nat Methods 9(4):357–359. [Link] nmeth.1923
31. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3):R25. https://[Link]/10.1186/gb-2009-10-3-r25
32. Anders S, Pyl PT, Huber W (2014) HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics 31(2):166–169
33. Thorvaldsdottir H, Robinson JT, Mesirov JP (2013) Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 14(2):178–192. [Link] bbs017
34. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP (2011) Integrative genomic viewer. Nat Biotechnol 29:24–26
35. Boccaletto P, Machnicka MA, Purta E, Piątkowski P, Bagiński B, Wirecki TK, de Crécy-Lagard V, Ross R, Limbach PA, Kotter A, Helm M (2017) MODOMICS: a database of RNA modification pathways. 2017 update. Nucleic Acids Res 46(D1):D303–D307
36. Dunn JG, Weissman JS (2016) Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data. BMC Genomics 17(1):958. [Link] 10.1186/s12864-016-3278-x
Chapter 8

Mapping RNA Modifications Using Photo-Crosslinking-Assisted Modification Sequencing
Bryan R. Cullen and Kevin Tsai

Abstract
Epitranscriptomic RNA modifications function as an important layer of gene regulation that modulates the
function of RNA transcripts. A key step in understanding how RNA modifications regulate biological
processes is the mapping of their locations, which is most commonly done by RNA immunoprecipitation
(RIP) using modification-specific antibodies. Here, we describe the use of a photoactivatable
ribonucleoside-enhanced cross-linking and immunoprecipitation (PAR-CLIP) method, in conjunction
with RNA modification-specific antibodies, to map modification sites. First described as photo-
crosslinking-assisted m6A sequencing (PA-m6A-seq), this method allows the mapping of RNA modifica-
tions at a higher resolution, with lower background than traditional RIP, and can be adapted to any RNA
modification for which a specific antibody is available.

Key words Epitranscriptomic RNA modification, RNA immunoprecipitation, m6A, m5C, ac4C

1 Introduction

Epitranscriptomic RNA modifications have recently emerged as a
novel layer of gene regulation, where covalent modifications such as
methylations, acetylations, or isomerizations of individual bases
lead to a change in the fate of the modified mRNA. These changes
may include any step in RNA metabolism, such as splicing, nuclear
export, translation efficiency, and RNA decay, ultimately affecting
the outcome of diseases such as cancer and viral infections [1–
3]. While mRNAs have been known to bear internal chemical
modifications for almost half a century, it was only recently that
technical advances allowed the elucidation of the functions of these
modifications [4, 5]. The most important advance was no doubt
the development of the modification mapping technique methyl-
RNA immunoprecipitation (meRIP), alternatively named m6A-seq
[6–8]. This method relies on an antibody that recognizes N6-
methyladenosine (m6A), bearing a methyl group added to the N6
position of adenosine, to enrich for m6A-containing RNA, and

Mary McMahon (ed.), RNA Modifications: Methods and Protocols, Methods in Molecular Biology, vol. 2298,
[Link] © Springer Science+Business Media, LLC, part of Springer Nature 2021


then uses high-throughput next-generation sequencing (NGS) to


efficiently identify the antibody-bound modified transcripts. Not
only is this technique straightforward and widely adapted, but also
the m6A-specific antibody can be easily switched for an antibody
recognizing a different RNA modification, such as N4-acetylcyti-
dine (ac4C) [9]. However, this RNA immunoprecipitation (IP)
approach suffers from two major issues: low resolution and high
background. The RNA subject to IP is typically chemically frag-
mented by alkaline hydrolysis, resulting in a range of RNA frag-
ments between 50 and 200 nts, averaging ~100 nts in length. After
sequencing, it is unclear where the modified residue is located
within the IP-enriched ~100 nt region, or if there is more than
one modified site. This is less of an issue with m6A, as putative
modification sites can be narrowed down to the RRACH (R = G/
A; H = A/C/U) motifs within the 100 nt region, yet several other
modifications are deposited with no clear sequence motif. Further-
more, we and others have typically observed high background from
meRIP/m6A-seq, where even the IP-enriched reads seem to have
transcriptome-wide coverage, with only a modest increase in read
depth in modification-enriched regions, thus necessitating the
search for regions that are enriched in the IP over the input control
(total RNA-seq), and perhaps a nonspecific IgG IP negative
control [10].
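
As a rough illustration of how candidate m6A sites can be narrowed down within an IP-enriched fragment, the sketch below scans a sequence for RRACH motifs (R = G/A; H = A/C/U) and reports the position of the central adenosine. The function name and the example fragment are hypothetical and are not part of the published protocol.

```python
import re

# RRACH: R = G/A, H = A/C/U; the candidate m6A is the A at the third position.
# A lookahead is used so that overlapping motifs are all reported.
RRACH = re.compile(r"(?=([GA][GA]AC[ACU]))")

def find_rrach_sites(rna):
    """Return 0-based positions of the candidate m6A (the A in RRACH)."""
    rna = rna.upper().replace("T", "U")  # accept DNA or RNA alphabets
    return [m.start() + 2 for m in RRACH.finditer(rna)]

# Hypothetical fragment, not taken from any real transcript.
fragment = "CCGGACUUUAAACAGGGUUU"
print(find_rrach_sites(fragment))
```

For modifications without a known sequence motif, no comparable filter exists, which is why the narrower footprint described next is useful.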
In order to overcome these issues with traditional RNA IP,
Chen et al. developed an improved RNA modification mapping
approach that they named photo-crosslinking-assisted
m6A sequencing (PA-m6A-seq, outlined in Fig. 1) [11], which we
have successfully adapted for use in mapping various RNA modifi-
cations on viral transcripts [12–14]. This method is a variation of
the widely used protein-RNA interaction assay, photoactivatable
ribonucleoside-enhanced cross-linking and immunoprecipitation
(PAR-CLIP), specifically probing for interactions between the
m6A antibody and RNA [15]. Prior to RNA extraction, cells of
interest are first pulsed with the photoactivatable nucleoside analog
4-thiouridine (4SU), which is incorporated into transcribed RNAs
at a level between 0.1% and 1% of all uridine residues. The extracted
4SU+ RNA is then bound to the m6A-specific antibody and sub-
jected to UV cross-linking at a wavelength of 365 nm, which
efficiently cross-links 4SU residues to RNA-bound proteins, in
this case the antibody. The cross-linked antibody-RNA complexes
are then collected by immunoprecipitation and the RNA fragments
released by Proteinase K digestion of bound antibodies. This results
in a residual amino acid adduct bound to the previously cross-
linked 4SU, a fragment of the digested antibody that leads to
misincorporation in the reverse transcription step in preparation
for RNA-seq, ultimately resulting in a T > C mutation at the cross-
linked site. As with PAR-CLIP, computational screening for this
T > C conversion in the sequencing reads allows the identification
RNA Modification Mapping Using PA-mod-seq 125

Fig. 1 Overview of PA-mod-seq. (a) Flowchart of the steps involved in the method, with PA-m6A-seq as an
example. (b) Example of PA-m6A-seq results visualized in the Integrative Genomics Viewer (IGV), with m6A sites mapped
on the first 500 nts of Simian virus 40 (SV40) VP1 mRNA (GEO database accession #GSE106698) [14]. Red/
blue bars in the upper coverage pileup track denote sites of cross-link-induced T > C conversions, with the
height of red/blue bars proportional to the occurrence of T and C residues. Blue bars in the bottom individual
read track denote the location of T > C conversions in each read. Note that a diverse variety (3+) of T > C
conversion sites are expected in a good m6A peak

of reads that truly derive from antibody-bound RNA fragments,
thus allowing the elimination of almost all background reads.
Furthermore, an RNase footprinting step is included during
immunoprecipitation, so that any RNA that is not bound and protected by
the antibody will be degraded. This step not only further decreases
the observed background but also increases the modification
mapping resolution down to the size of the antibody, which pro-
tects ~32 nts of bound RNA [11]. During early testing of this
method with nonspecific IgG control antibodies, we were unable to
build any sequencing libraries from IgG immunoprecipitated sam-
ples, demonstrating the ability of RNase to effectively degrade
background RNA fragments. Overall, PA-m6A-seq results in mod-
ification maps of RNA-seq read peaks ~32 nts wide that are con-
firmed as antibody bound due to the presence of UV cross-linking-
induced T > C conversions, with very low numbers of background
reads between peaks. Through the use of different antibodies, we
have successfully adapted this method to map multiple different
RNA modifications including not only N6-methyladenosine (m6A)
but also 5-methylcytidine (m5C) and N4-acetylcytidine (ac4C), and
envision further adaptation to map other types of modifications as
antibodies become available [13, 16, 17]. To accommodate the
variety of modifications that can be mapped with this method,
below we refer to this method as PA-mod-seq.

2 Materials

2.1 Tissue Culture and RNA Preparation

1. 0.2 M 4-Thiouridine (4SU) (Sigma T4509 or Carbosynth
NT06186): Prepare a 0.2 M stock solution by dissolving
250 mg of 4SU in 4.8 mL DMSO. Aliquot in small volumes,
and keep solution at −80 °C (see Note 1).
2. Trizol reagent.
3. Chloroform.
4. Isopropanol.
5. 70% Ethanol.
6. GlycoBlue Coprecipitant.
7. RNase-free H2O.
8. Poly(A)Purist MAG Kit.

2.2 Immunoprecipitation and Cross-Linking

1. IPP buffer: 10 mM Tris–HCl pH 7.4, 150 mM NaCl, 0.1%
NP-40 in RNase-free H2O.
2. 40 U/μL RNaseIn.
3. Modification-specific antibody: We have successfully used the
following antibodies: anti-m6A (SySy #202003), anti-m5C
(Diagenode #C15200081), anti-ac4C (Abcam #ab252215).
4. 12-Well tissue culture plate.
5. Stratagene UV Stratalinker 2400 with 365 nm light source.
6. 1000 U/μL RNase T1.
7. Protein G magnetic beads.
8. DynaMag-2 Magnet.
9. Phosphate-buffered saline (PBS).
10. PAR-CLIP IP wash buffer: 50 mM HEPES-KOH, pH 7.5,
300 mM KCl, 0.05% NP40 in RNase-free H2O.
11. PAR-CLIP high-salt wash buffer: 50 mM HEPES-KOH,
pH 7.5, 500 mM KCl, 0.05% NP40 in RNase-free H2O.

2.3 RNA End Repair

1. 10 U/μL Calf intestinal phosphatase (see Note 2).
2. 10x NEB CutSmart Buffer.
3. Phosphatase wash buffer: 50 mM Tris–HCl pH 7.5, 20 mM
EGTA-NaOH pH 7.5, 0.5% NP40 in RNase-free H2O.
4. PNK buffer without DTT: 50 mM Tris–HCl pH 7.5, 50 mM
NaCl, 10 mM MgCl2 in RNase-free H2O.
5. 1 M DTT.
6. ATP.
7. T4 Polynucleotide Kinase (T4-PNK).

2.4 RNA Elution

1. 20 mg/mL Proteinase K.
2. 4x Proteinase K buffer: 200 mM Tris–HCl pH 7.5, 300 mM
NaCl, 25 mM EDTA-NaOH pH 8, 4% SDS in RNase-free
H2O.
3. Trizol LS reagent.

2.5 Sequencing Library Preparation

1. NEBNext Small RNA Library Prep Set for Illumina.
2. Novex 10-well 6% TBE gel.

3 Methods

3.1 Tissue Culture

Start with an amount of cells that produce ~10 μg of RNA for the
immunoprecipitation step (see Notes 3 and 4). Passage cells so they
are ~70% confluent and actively growing the day before harvest.
Actively growing cells are essential to ensure efficient 4SU uptake
and incorporation into RNA.
1. Add 4SU directly to the cell culture media to a final concentra-
tion of 100 μM (see Note 1).
2. 16–24 h later, collect cells as appropriate for the cell type:
scrape off attached cells or collect suspension cells. Spin cells
down at 500 × g for 10 min to pellet (see Note 5).
3. Wash cell pellet once with ice-cold PBS, spin down at 500 × g
for 10 min, and remove PBS.
4. Lyse cell pellet directly in Trizol using 1 mL for every 10^7 cells.
Cells in Trizol can be stored at −80 °C.
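
The quantities in this subheading reduce to simple arithmetic, sketched below. The molecular weight used for 4SU (260.27 g/mol) is an assumed value that should be checked against the supplier's datasheet, the function names are illustrative only, and the small volume contributed by the added stock is ignored.

```python
MW_4SU = 260.27  # g/mol for 4-thiouridine (assumed; verify on the datasheet)

def stock_molarity(mass_mg, volume_ml, mw=MW_4SU):
    """Molarity (mol/L) of a stock made by dissolving mass_mg in volume_ml."""
    return (mass_mg / mw) / volume_ml  # mmol per mL equals mol per L

def stock_volume_ul(final_um, medium_ml, stock_m=0.2):
    """Microliters of stock needed to reach final_um (uM) in medium_ml of medium."""
    return final_um * medium_ml / (stock_m * 1000.0)

def trizol_volume_ml(n_cells, cells_per_ml=1e7):
    """Milliliters of Trizol at 1 mL per 10^7 cells."""
    return n_cells / cells_per_ml

print(round(stock_molarity(250, 4.8), 3))   # 250 mg in 4.8 mL DMSO
print(stock_volume_ul(100, 10))             # 100 uM final in 10 mL of medium
print(trizol_volume_ml(1.1e8))              # cell number taken from Note 3
```

At a 0.2 M stock and 100 μM final concentration, this is a 1:2000 dilution, so the DMSO carried into the culture stays at 0.05% (v/v).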

3.2 Total RNA Extraction Using Trizol

1. Aliquot every 1 mL of Trizol-cell lysate into 1.5 mL tubes.
2. Add 200 μL chloroform to every 1 mL of Trizol, shake vigorously
for 15 s, and incubate at room temperature for 3 min.
3. Centrifuge at 12,000 × g for 15 min at 4 °C.
4. Collect the upper aqueous phase into a new tube with 500 μL
isopropanol (avoid collecting the white interphase), and incu-
bate at room temperature for 10 min.
5. Pellet RNA by centrifugation at 12,000 × g for 15 min at 4 °C.
6. Precipitate RNA with dH2O:NaOAc:EtOH = 1:0.1:2.2 vol-
ume ([Link] μL each), and then add 1 μL of GlycoBlue.
Mix well and incubate at −80 °C for 30 min (or on dry ice for
15 min) (see Note 6).
7. Pellet RNA again at 12,000 × g for 20 min at 4 °C.
8. Wash pellet with 1 mL 70% ethanol, and centrifuge at
12,000 × g for 10 min at 4 °C.
9. Remove supernatant. Do a quick spin and carefully remove
residual ethanol with a clean pipette.
10. Resuspend RNA pellet in RNase-free H2O, using 25 μL for
every 1 mL of starting Trizol.

3.3 Poly(A) Purification

Depending on the cell type, roughly 1–2.5% of total RNA is poly
(A)+ mRNA. An IP reaction requires 8–12 μg of RNA; thus start by
using ~600 μg of total RNA for poly(A) purification, aiming for a
yield of 10 μg poly(A)+ mRNA.
1. Follow the Poly(A)Purist MAG Kit instructions to isolate poly
(A)+ RNA. Resuspend the resulting RNA pellet in 30 μL
dH2O.
2. Measure the concentration of RNA with a Nanodrop spectro-
photometer prior to starting the IP.
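
To plan the amount of starting material, the relationship between total RNA and poly(A)+ yield can be sketched as below. The function name is illustrative; the fractions are the 1–2.5% range quoted above and the 1.4% value reported in Note 3, and actual kit recovery may be lower.

```python
def total_rna_needed_ug(target_polya_ug, polya_fraction):
    """Total RNA (ug) required to obtain target_polya_ug of poly(A)+ RNA."""
    if not 0 < polya_fraction <= 1:
        raise ValueError("polya_fraction must be between 0 and 1")
    return target_polya_ug / polya_fraction

for fraction in (0.01, 0.014, 0.025):
    needed = total_rna_needed_ug(10, fraction)
    print(f"{fraction:.1%} poly(A)+: {needed:.0f} ug total RNA for 10 ug")
```

If the poly(A)+ fraction for a given cell type is unknown, planning for the low end of the range is the more conservative choice.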

3.4 Immunoprecipitation and Cross-Linking

1. Prepare IP mix by combining in a 1.5 mL tube, on ice: 10 μg
RNA (we use poly(A)-purified RNA from Subheading 3.3
above), 20 μL RNaseIn (a total of 800 units), 7.5 μg of
modification-specific antibody, and 800 μL of IPP buffer (see
Note 7).
2. Seal the tube lid with parafilm, and rotate at 4 °C for 2 h to
overnight to allow the antibodies to bind to the modified RNA.
3. Transfer the IP mix to a 12-well tissue culture plate well on ice.
Irradiate twice with 365 nm UV at 2500 × 100 μJ/cm2 in a UV
Stratalinker with the lid off, and then transfer IP mix back into a
1.5 mL tube (the original sample tube can be reused) (see
Note 8).
4. Pre-warm a water bath in a cold room to 22 °C.
5. Dilute 2 μL RNase T1 in 58 μL IPP buffer (1/30 dilution,
~33 U/μL). Add 3 μL of the 1/30 diluted RNase T1 to each IP
tube (~0.1 U/μL final), and digest for 15 min in the 22 °C water
bath. Invert the tubes every 5 min to mix, and then cool tubes
for 5 min on ice (leave the 22 °C water bath on for later use).
6. While waiting for the RNase digestion, transfer 90 μL of pro-
tein G magnetic beads per IP to a new 1.5 mL tube, place on a
magnetic rack for 2 min, and then remove buffer. Wash the
beads by resuspending beads in 1 mL PBS, incubating on the
magnetic rack for 2 min, and removing PBS. Wash a total of
two times. Then resuspend beads in the original bead volume
(90 μL per IP) of IPP buffer.
7. After the RNase/IP mix has cooled, add 90 μL of pre-washed
protein G beads to each IP tube, and then rotate at 4 °C for 1 h.
8. While waiting for the beads to bind antibody-RNA complexes,
aliquot 10 mL each of PAR-CLIP IP wash buffer and high-salt
wash buffer. Add 5 μL of 1 M DTT to each aliquot (add DTT
shortly before use).
9. After the bead capture incubation (step 7 above), isolate the
bead-antibody-RNA complexes by placing the IP tubes on a
magnetic rack for 2 min, and remove the supernatant. (This
supernatant contains the population of RNA not captured by
the antibody, so collect and save for analysis if needed.)
10. Wash the bead complexes with IP wash buffer: Resuspend
beads in 1 mL IP wash buffer, incubate tubes on the magnetic
rack for 2 min, and then carefully remove buffer without
touching beads. Repeat this wash a total of three times.
11. Resuspend beads in 100 μL of IP wash buffer, and add 1.5 μL
of RNase T1 to a final concentration of 15 U/μL. Digest for
15 min in the 22 °C water bath. Invert the tubes every few
minutes, and then cool the tubes on ice for 5 min (the dephos-
phorylation reaction mix can be prepared while waiting for
this step).
12. Wash the beads three times with the high-salt wash buffer as
before.
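
The two RNase T1 digestions in steps 5 and 11 use very different enzyme concentrations, and the arithmetic can be checked with the sketch below. The 1000 U/μL stock is taken from Subheading 2.2; the ~830 μL IP volume is an assumption (800 μL IPP buffer plus RNaseIn, RNA, and antibody), and the helper name is illustrative.

```python
def final_conc(stock_conc, added_ul, reaction_ul):
    """Concentration after adding added_ul of stock to reaction_ul (volumes add)."""
    return stock_conc * added_ul / (reaction_ul + added_ul)

STOCK_U_PER_UL = 1000.0                    # RNase T1 stock listed in Materials
diluted = STOCK_U_PER_UL * 2 / (2 + 58)    # 2 uL stock + 58 uL buffer = 1/30 dilution

# First, mild digestion in the full IP mix (IP volume of ~830 uL is assumed).
first_digest = final_conc(diluted, 3, 830)

# Second, harsher on-bead digestion with undiluted enzyme in 100 uL wash buffer.
second_digest = final_conc(STOCK_U_PER_UL, 1.5, 100)

print(f"1/30 dilution: {diluted:.1f} U/uL")
print(f"first digest:  {first_digest:.2f} U/uL")
print(f"second digest: {second_digest:.1f} U/uL")
```

The roughly 100-fold difference between the two digests reflects their purposes: the first trims unprotected RNA in solution, while the second footprints the antibody-bound fragments on the beads.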

3.5 RNA End Repair

1. Prepare dephosphorylation reaction mix by adding 5 μL CIP
(final concentration 0.5 U/μL), 10 μL of 10x NEB CutSmart
Buffer, and 85 μL of H2O. Resuspend beads from each IP in
100 μL of this reaction mix.
2. Incubate the dephosphorylation reaction on a tube shaker at
800 rpm in a 37 °C incubator for 10 min.
3. Wash beads twice with 500 μL of phosphatase wash buffer as
before.
4. Wash beads twice with 500 μL PNK buffer without DTT.
5. Prepare a PNK reaction mix, for each IP: 10 μL of T4-PNK
(final 1 U/μL), 88.5 μL of PNK buffer without DTT, 0.5 μL of
1 M DTT (final 5 mM), and 1 μL of 10 mM ATP (final
100 μM).
6. Resuspend beads in PNK reaction mix, 100 μL per IP. Incubate
the PNK reaction on a tube shaker at 800 rpm in a 37 °C
incubator for 30 min.
7. Wash beads three times with 500 μL of PNK buffer
without DTT.
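
When several IPs are processed in parallel, the per-IP reaction mixes above can be scaled into master mixes. The sketch below uses the per-IP volumes from steps 1 and 5; the 10% overage for pipetting loss is an assumption rather than a protocol requirement, and the dictionary and function names are illustrative.

```python
# Per-IP volumes in microliters, as listed in steps 1 and 5 of this subheading.
CIP_MIX = {"CIP": 5.0, "10x CutSmart buffer": 10.0, "H2O": 85.0}
PNK_MIX = {"T4-PNK": 10.0, "PNK buffer without DTT": 88.5, "1 M DTT": 0.5, "ATP": 1.0}

def master_mix(recipe, n_ips, overage=0.1):
    """Scale a per-IP recipe to n_ips reactions plus a fractional overage."""
    factor = n_ips * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in recipe.items()}

def final_concentration(stock, volume_ul, total_ul):
    """Final concentration of a component, in the same units as the stock."""
    return stock * volume_ul / total_ul

print(master_mix(CIP_MIX, 4))
print(final_concentration(10, 5, sum(CIP_MIX.values())))      # CIP, U/uL
print(final_concentration(1000, 0.5, sum(PNK_MIX.values())))  # DTT, mM
```

Both recipes sum to 100 μL per IP, which matches the resuspension volume used for each reaction.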

3.6 RNA Elution

1. Prepare the Proteinase K elution mix, for each IP: 75 μL of 4x
Proteinase K buffer, 225 μL of H2O, and 4.2 μL of Proteinase
K (~85 μg).
2. Resuspend the washed beads in Proteinase K elution mix
(300 μL per IP), and then incubate at 50 °C for 90 min,
tapping the tubes to mix every 15 min (alternatively, use an
Eppendorf thermoshaker set at 900 rpm).
3. After proteinase digestion, all RNA originally bound to anti-
bodies should be in the supernatant (~300 μL). Transfer super-
natant to a fresh tube.
4. Remove Proteinase K using Trizol LS: Add 900 μL Trizol LS to
the 300 μL eluate, and mix well.
5. Add 240 μL of chloroform, vortex for 15 s, wait for 2 min, and
then centrifuge at 12,000 × g for 15 min at 4 °C to separate the
organic phase from the aqueous phase.
6. Collect the upper aqueous phase into a new tube, and mix with
600 μL of isopropanol and 1 μL of GlycoBlue Coprecipitant.
Wait for 10 min, and then centrifuge at 12,000 × g for 20 min
at 4 °C to pellet RNA (see Note 9).
7. (Optional cleanup) Resuspend RNA pellet in 300 μL of H2O,
and then add 30 μL of sodium acetate, 1 μL of GlycoBlue, and
660 μL of 100% ethanol. Precipitate at −20 °C for 2 h or
overnight. Then centrifuge at 12,000 × g for 30 min at 4 °C
(see Note 10).
8. Wash pellet with 70% ethanol, vortex briefly, and centrifuge at
12,000 × g for 10 min at 4 °C.
9. Resuspend each pellet in 15 μL of H2O.

3.7 Sequencing Library Preparation

1. Prepare sequencing libraries with the NEBNext Small RNA
Library Prep Set for Illumina following the kit instructions.
The expected RNA size is the RNase footprint of the antibody,
which is typically ~32–50 nt. With the ligated 5′ and 3′ adapters
totaling 120 nt, it is necessary to isolate bands ~150–170 nt on
a TBE polyacrylamide gel. We have found the Invitrogen
Novex 6% TBE gel to give the best separation between our
desired product and adapter dimers (120 nt).
2. With expected pulled-down RNA sizing of 32–50 nt, we have
found Illumina sequencing at the 50 nt single-read mode to be
sufficient. While paired-end sequencing may increase the data
quality, the difference is not noticeable when sufficient read
depth is obtained.
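
The gel excision window follows directly from the insert size plus the combined adapter length, as the sketch below shows. The 5 nt tolerance used to classify a band is an assumption added for illustration, as are the function names.

```python
def expected_library_band(insert_range=(32, 50), adapter_total=120):
    """Size range (nt) of ligated product: insert plus both adapters."""
    low, high = insert_range
    return low + adapter_total, high + adapter_total

def classify_band(size_nt, insert_range=(32, 50), adapter_total=120, tolerance=5):
    """Rough label for a gel band; tolerance (nt) is an assumed margin."""
    low, high = expected_library_band(insert_range, adapter_total)
    if abs(size_nt - adapter_total) <= tolerance:
        return "adapter dimer"
    if low - tolerance <= size_nt <= high + tolerance:
        return "expected product"
    return "other"

print(expected_library_band())
for size in (120, 160, 300):
    print(size, classify_band(size))
```

Because the desired product sits only ~30–50 nt above the adapter dimer, gel resolution in this size range matters more than overall yield.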

3.8 Sequencing Data Analysis

Sequencing result analysis typically involves trimming off adaptor
sequences, discarding any reads shorter than 15 nts, aligning
sequencing reads to the genome sequence of interest, screening
for T > C conversions, and reformatting the alignment information
for visualization in the Integrative Genomics Viewer (IGV) (see
example in Fig. 1b).
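
The first two operations, adapter trimming and length filtering, can be illustrated with a minimal pure-Python sketch. This is not a substitute for the dedicated tools named below: it ignores base qualities and sequencing errors, and the adapter string in the example is a placeholder rather than the sequence of any particular kit.

```python
def trim_adapter(seq, adapter, min_overlap=5):
    """Remove a 3' adapter: a full match anywhere, or a partial match at the read end."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    # Look for the longest adapter prefix that ends the read.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq

def preprocess(reads, adapter, min_len=15):
    """Trim each (name, sequence) read and keep those of at least min_len nt."""
    kept = []
    for name, seq in reads:
        trimmed = trim_adapter(seq, adapter)
        if len(trimmed) >= min_len:
            kept.append((name, trimmed))
    return kept

ADAPTER = "AGATCGGAAGAGC"  # placeholder adapter sequence for illustration only
reads = [
    ("r1", "ACGTACGTACGTACGTACGT" + ADAPTER),  # 20 nt insert, full adapter
    ("r2", "ACGTACGT" + ADAPTER),              # 8 nt insert, discarded
    ("r3", "ACGTACGTACGTACGTAC" + "AGATCGG"),  # 18 nt insert, partial adapter
]
print(preprocess(reads, ADAPTER))
```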
1. We use the FASTX toolkit [18] for sequencing read preproces-
sing; this includes screening for reads with a FASTQ quality
score >Q33, removing adapter sequences, and selecting for a
minimum read length of 15 nts. Alternatively, we have used
Cutadapt [19] with good results.
2. We then use Bowtie [20, 21] for alignment of the sequencing
reads to the genome of interest. If you are interested in human
cellular transcripts, for example, you can align your reads to a
human genome build such as hg19. If you are interested in viral
transcripts, you should pre-align the reads to the host genome,
then take the host non-aligning reads, and align them to the
viral genome. If your model system of interest has a heavily
spliced transcriptome, it might be informative to align to the
transcriptome with a splice-aware aligner, such as TopHat
[22, 23] or STAR [24].
3. To screen for T > C conversions, analysis
packages such as PARalyzer [25] are freely available. If your model
system of interest has a simple genome like a virus, then a
simple script can be used to screen for T > C conversions.
Note that if you only use a simple T > C conversion screen
with no statistics, we would recommend manually looking at
each peak with the following criteria for reliable peaks: Each
peak needs to consist of more than three distinct reads (varying
in length or alignment location so that they are unlikely to be
PCR duplicate reads), with at least three different locations of
T > C conversions.
4. The Integrative Genomics Viewer (IGV) from the Broad Institute
[26] can be used to visualize sequencing results. Running
Bowtie with the “--sam” argument will give results output in
the SAM format. IGV reads alignments in the BAM format
more efficiently, with a requirement that BAM files be pre-
sorted and indexed. The SAMtools suite [27] can be used to
convert SAM to BAM, and sort and index the files for loading
into IGV.
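
For a small genome, the simple T > C screen and the manual peak criteria from step 3 might be sketched as follows. This is a simplified illustration under stated assumptions: reads are given as ungapped, plus-strand alignments of the form (start, sequence), every mismatch of C against a reference T is counted without any statistics, and the reference and reads shown are invented for the example.

```python
def tc_conversions(ref, start, read):
    """0-based reference positions where the reference has T and the read has C."""
    return [start + i for i, base in enumerate(read)
            if ref[start + i] == "T" and base == "C"]

def reliable_peak(ref, alignments, min_reads=4, min_sites=3):
    """Apply the manual criteria: more than three distinct reads (differing in
    start or length) and at least three different T > C locations."""
    distinct = {(start, len(read)) for start, read in alignments}
    sites = set()
    for start, read in alignments:
        sites.update(tc_conversions(ref, start, read))
    return len(distinct) >= min_reads and len(sites) >= min_sites

# Invented reference and reads, for illustration only.
ref = "AGTCTTGACTTAGCTAGTCA"
alignments = [
    (0, "AGCCTTGACT"),    # T > C at position 2
    (1, "GTCTCGACTTA"),   # T > C at position 5
    (2, "TCTTGACCTAG"),   # T > C at position 9
    (0, "AGTCTTGACTTA"),  # no conversion
]
print(reliable_peak(ref, alignments))
```

Counting reads by distinct start and length is a crude guard against PCR duplicates; a statistical tool such as PARalyzer is preferable for larger genomes.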

4 Notes

1. If the model system has an unusually low abundance of uridines
in the genes of interest, 6-thioguanosine (6SG, Sigma
#858412 or Carbosynth #NT04480) can be used as an alter-
native to 4SU. While we have not tested 6SG to date, it was
previously noted to have lower cross-linking efficiency, and will
result in G > A conversions [15].
2. We have used NEB CIP, which has recently been discontinued.
One potential alternative is alkaline phosphatase from Roche
(#11097075001, 20 U/μL). The unit definitions from both
NEB and Roche are the same: the amount of enzyme that
hydrolyzes 1 μmol of p-nitrophenylphosphate (pNPP) per minute
in a 1 mL reaction at 37 °C. Thus half the volume (2.5 μL) of the
Roche CIP would be needed per reaction.
3. We have obtained 700 μg of total RNA from 1.1 × 10^8
HIV-1-infected CEM T cells, which yields 10 μg of poly(A)+ RNA
(at 1.4% poly(A)+). For 293T cells, we typically start with
10 × 15 cm plates at 50% confluency, whereas a good starting
point for lymphoblastoid cell lines (Epstein-Barr virus immor-
talized B cells) would be 1.8 × 10^8 cells grown at a concentra-
tion of 6 × 10^5 cells/mL.
4. For HIV-1 studies, we typically use CEM T cells infected with
HIV-1, remove the input virus, resuspend cells in fresh media
1 day postinfection (dpi), supplement with 4SU 2 dpi, and
harvest at 3 dpi.
5. If mapping of RNA modifications on the genomic RNA of
viruses is desired, as can be done with HIV-1, the supernatant
of infected cells can also be collected for isolation of viral
particles. Viral particles can be pelleted through a 20% sucrose
cushion, and the virion pellet can then be lysed in Trizol to
extract virion RNA, as described by Eckwahl et al. [28]. Virion
RNA does not need to be poly(A) purified and can be directly
used for immunoprecipitation.
6. The RNA re-precipitations in Subheading 3.2, steps 6 and 7,
are optional if you typically do not see high salt contamination
after Trizol extraction, or if the downstream poly
(A) purification is not needed, as with virion-extracted RNA.
7. We recommend saving an aliquot of at least 0.5 μg of the input RNA
used for the IP reaction. This input sample can be analyzed by
RNA-seq run alongside the PA-mod-seq IP sample to measure
the relative expression level of each transcript.
8. As UV does not penetrate plastic plate lids well, it is essential to
cross-link with the lid off.
9. The RNA pellet at this step will be small yet visible. If the pellet
is not visible, add an additional 100 μL of isopropanol and 1 μL
of GlycoBlue, mix well, and repeat the 20-min centrifugation
step. A preincubation of the tube at −80 °C or on dry ice prior
to centrifugation may also enhance precipitation.
10. This optional re-precipitation is to ensure minimum salt con-
tamination going into the library preparation. If the RNA
pellet from the previous step is very small and hard to spot,
we omit this step to avoid loss of the pellet.

Acknowledgments

This research was funded in part by NIH grants R01-DA046111
and U54-GM103297 to B.R.C., along with a Duke University
Center for AIDS Research (CFAR, P30-AI064518) pilot award
to K.T.

References
1. Li S, Mason CE (2014) The pivotal regulatory landscape of RNA modifications. Annu Rev Genomics Hum Genet 15:127–150. https:// [Link]/10.1146/annurev-genom-090413-025405
2. Wang GG, Allis CD, Chi P (2007) Chromatin remodeling and cancer, part II: ATP-dependent chromatin remodeling. Trends Mol Med 13(9):373–380. https:// [Link]/10.1016/[Link].2007.07.004
3. Kennedy EM, Courtney DG, Tsai K, Cullen BR (2017) Viral epitranscriptomics. J Virol 91(9). [Link]
4. Desrosiers R, Friderici K, Rottman F (1974) Identification of methylated nucleosides in messenger RNA from Novikoff hepatoma cells. Proc Natl Acad Sci U S A 71(10):3971–3975. [Link] pnas.71.10.3971
5. Lavi S, Shatkin AJ (1975) Methylated simian virus 40-specific RNA from nuclei and cytoplasm of infected BSC-1 cells. Proc Natl Acad Sci U S A 72(6):2012–2016. [Link] 10.1073/pnas.72.6.2012
6. Meyer KD, Saletore Y, Zumbo P, Elemento O, Mason CE, Jaffrey SR (2012) Comprehensive analysis of mRNA methylation reveals enrichment in 3’ UTRs and near stop codons. Cell 149(7):1635–1646. [Link] 1016/[Link].2012.05.003
7. Dominissini D, Moshitch-Moshkovitz S, Salmon-Divon M, Amariglio N, Rechavi G (2013) Transcriptome-wide mapping of N(6)-methyladenosine by m(6)A-seq based on immunocapturing and massively parallel sequencing. Nat Protoc 8(1):176–189. [Link]
8. Dominissini D, Moshitch-Moshkovitz S, Schwartz S, Salmon-Divon M, Ungar L, Osenberg S, Cesarkas K, Jacob-Hirsch J, Amariglio N, Kupiec M, Sorek R, Rechavi G (2012) Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature 485(7397):201–206. [Link] 10.1038/nature11112
9. Arango D, Sturgill D, Alhusaini N, Dillman AA, Sweet TJ, Hanson G, Hosogane M, Sinclair WR, Nanan KK, Mandler MD, Fox SD, Zengeya TT, Andresson T, Meier JL, Coller J, Oberdoerffer S (2018) Acetylation of cytidine in mRNA promotes translation efficiency. Cell 175(7):1872–1886.e1824. [Link] 10.1016/[Link].2018.10.030
10. McIntyre ABR, Gokhale NS, Cerchietti L, Jaffrey SR, Horner SM, Mason CE (2020) Limits in the detection of m6A changes using MeRIP/m6A-seq. Sci Rep 10(1):6590. [Link] 63355-3
11. Chen K, Lu Z, Wang X, Fu Y, Luo GZ, Liu N, Han D, Dominissini D, Dai Q, Pan T, He C (2015) High-resolution N(6)-methyladenosine (m(6)A) map using photo-crosslinking-assisted m(6)A sequencing. Angew Chem Int Ed Engl 54(5):1587–1590. [Link] 10.1002/anie.201410647
12. Courtney DG, Kennedy EM, Dumm RE, Bogerd HP, Tsai K, Heaton NS, Cullen BR (2017) Epitranscriptomic enhancement of influenza A virus gene expression and replication. Cell Host Microbe 22(3):377–386.e375. [Link] 004
13. Kennedy EM, Bogerd HP, Kornepati AV, Kang D, Ghoshal D, Marshall JB, Poling BC, Tsai K, Gokhale NS, Horner SM, Cullen BR (2016) Posttranscriptional m(6)A editing of HIV-1 mRNAs enhances viral gene expression. Cell Host Microbe 19(5):675–685. https:// [Link]/10.1016/[Link].2016.04.002
14. Tsai K, Courtney DG, Cullen BR (2018) Addition of m6A to SV40 late mRNAs enhances viral structural gene expression and replication. PLoS Pathog 14(2):e1006919. [Link] org/10.1371/[Link].1006919
15. Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, Rothballer A, Ascano M Jr, Jungkamp AC, Munschauer M, Ulrich A, Wardle GS, Dewell S, Zavolan M, Tuschl T (2010) Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141(1):129–141. [Link]
16. Courtney DG, Tsai K, Bogerd HP, Kennedy EM, Law BA, Emery A, Swanstrom R, Holley CL, Cullen BR (2019) Epitranscriptomic addition of m5C to HIV-1 transcripts regulates viral gene expression. Cell Host Microbe 26(2):217–227.e216. [Link] 1016/[Link].2019.07.005
17. Tsai K, Jaguva Vasudevan AA, Martinez Campos C, Emery A, Swanstrom R, Cullen BR (2020) Acetylation of cytidine residues boosts HIV-1 gene expression by increasing viral RNA stability. Cell Host Microbe 28(2):306–312.e6. [Link] chom.2020.05.011
18. Gordon A, Hannon G (2010) FastX toolkit. [Link]
19. Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17(1):3. [Link] org/10.14806/ej.17.1.200
20. Langmead B (2010) Aligning short sequencing reads with Bowtie. Curr Protoc Bioinformatics Chapter 11:Unit 11.7. doi:[Link] 10.1002/0471250953.bi1107s32
21. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3):R25. https:// [Link]/10.1186/gb-2009-10-3-r25
22. Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25(9):1105–1111. [Link] btp120
23. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7(3):562–578. [Link] 1038/nprot.2012.016
24. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15–21. https:// [Link]/10.1093/bioinformatics/bts635
25. Corcoran DL, Georgiev S, Mukherjee N, Gottwein E, Skalsky RL, Keene JD, Ohler U (2011) PARalyzer: definition of RNA binding sites from PAR-CLIP short-read sequence data. Genome Biol 12(8):R79. [Link] org/10.1186/gb-2011-12-8-r79
26. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP (2011) Integrative genomics viewer. Nat Biotechnol 29(1):24–26. [Link] 1038/nbt.1754
27. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25(16):2078–2079. [Link] bioinformatics/btp352
28. Eckwahl MJ, Arnion H, Kharytonchyk S, Zang T, Bieniasz PD, Telesnitsky A, Wolin SL (2016) Analysis of the human immunodeficiency virus-1 RNA packageome. RNA 22(8):1228–1238. [Link] rna.057299.116

Chapter 9

Quantitative and Single-Nucleotide Resolution Profiling
of RNA 5-Methylcytosine
Jun Li, Xingyu Wu, Trung Do, Vy Nguyen, Jing Zhao, Pei Qin Ng,
Alice Burgess, Rakesh David, and Iain Searle

Abstract
RNA has coevolved with numerous posttranscriptional modifications to sculpt interactions with proteins
and other molecules. One of these modifications is 5-methylcytosine (m5C); mapping its position and
quantifying its level in different types of cellular RNAs and tissues are important objectives in the field of
epitranscriptomics. In both plants and animals, bisulfite conversion has long been the gold standard for
detection of m5C in DNA, but it can also be applied to RNA. Here, we detail methods for highly
reproducible bisulfite treatment of RNA, efficient locus-specific PCR amplification, detection of candidate
sites by sequencing on the Illumina MiSeq platform, and bioinformatic calling of non-converted sites.

Key words Bisulfite conversion, Epitranscriptome, Fluidigm Access Array, Illumina, Next-generation
sequencing, 5-Methylcytosine

1 Introduction

Cellular RNAs can be modified, or decorated, with more than
120 chemically and structurally distinct nucleoside modifications
[1]. The emerging field of epitranscriptomics [2] has been enabled
by the development of high-throughput mapping methods for
RNA modifications, typically based on second-generation sequenc-
ing. Transcriptome-wide positions of N1-methyladenosine (m1A,
[3–5]), N6-methyladenosine (m6A, [6, 7]), 5-methylcytosine
(m5C, [8]), and pseudouridine [9] have each been reported in
this way. To detect m5C in RNA, a range of methods have been
developed, including indirect (aza-IP [10], miCLIP [11]) and
direct (meRIP [7]) immunoprecipitation of methylated RNA.
Of particular interest here, the bisulfite conversion
approach previously used for DNA has been adapted to RNA
[12, 13]. Bisulfite conversion of nucleic acids takes advantage of
the differential chemical reactivity of m5C compared to

Mary McMahon (ed.), RNA Modifications: Methods and Protocols , Methods in Molecular Biology, vol. 2298,
[Link] © Springer Science+Business Media, LLC, part of Springer Nature 2021

136 Jun Li et al.

Fig. 1 Protocol overview showing the workflow for either parallel or single amplicon amplification for effective
detection of m5C. (a) Parallel amplification and sequencing of up to 2304 amplicons across 48 tissues and
48 primer pairs. Forty-eight different tissues can be selected, total RNA isolated and purified, spiked with
MGFP in vitro-transcribed control RNA and bisulfite converted. Bisulfite-converted RNA is reverse transcribed
Single-nucleotide Resolution of RNA m5C 137

unmethylated cytosines; unmethylated cytosines are deaminated to
uracil while m5C remains as a cytosine.
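
The logic of bisulfite-based detection can be illustrated in silico. In the sketch below, every cytosine is converted to uracil unless it is listed as methylated, and non-converted sites are then called by comparison with the reference. The function names and the example sequence are hypothetical, and the sketch assumes complete conversion of unmethylated cytosines, which real experiments must verify with a control such as the spiked-in unmethylated transcript.

```python
def bisulfite_convert(rna, methylated=()):
    """Convert every C to U except at the 0-based positions in `methylated`."""
    protected = set(methylated)
    return "".join(
        "U" if base == "C" and i not in protected else base
        for i, base in enumerate(rna.upper())
    )

def call_nonconverted(reference, read):
    """Positions where the reference has C and the converted read still shows C."""
    return [i for i, (ref_base, read_base)
            in enumerate(zip(reference.upper(), read.upper()))
            if ref_base == "C" and read_base == "C"]

def methylation_level(n_c, n_t):
    """Fraction of reads retaining C at a site (n_t counts converted reads)."""
    total = n_c + n_t
    return n_c / total if total else float("nan")

reference = "AUGCCGUACGC"                       # invented sequence
converted = bisulfite_convert(reference, methylated=[4])
print(converted)
print(call_nonconverted(reference, converted))
print(methylation_level(25, 75))
```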
The RNA bisulfite conversion method has been applied to
animals and plants [8, 14] with a transcriptome-wide readout based
on second-generation sequencing (for example, Illumina), and has
mapped thousands of novel candidate m5C sites in a diverse array of
RNAs, including mRNAs and long noncoding RNAs (lncRNAs).
Here, we detail protocols for RNA bisulfite conversion, locus-
specific PCR amplification of up to 2304 amplicons, and bioinfor-
matics calling of converted or non-converted sites. Sequencing of
PCR amplicons is conveniently done on the Illumina MiSeq, as this
affords multiplexing of multiple distinct amplicons while still
achieving ample read depth for estimating the proportion of m5C
at targeted positions. For instance, each of the 96 Fluidigm indexed
adapters could be assigned to a separate RNA derived from differ-
ent tissues, and 96 multiple PCR amplicons per sample could be
included in the sequencing pool, potentially generating thousands
of independent quantitative measurements of the m5C levels in a
single MiSeq run (Fig. 1).

2 Materials

Prepare all solutions using RNase-free and DNase-free H2O and
analytical grade reagents. Store and prepare all reagents at room
temperature unless indicated otherwise. Prepare and perform bisul-
fite conversion, cDNA synthesis, and PCR amplification experi-
ments in an RNase-free area. Follow all state or national safety
and waste disposal regulations when performing experiments.

2.1 Total RNA Extraction and In Vitro Transcription

1. TRIzol™ reagent.
2. Chloroform.
3. Isopropanol.
4. 75% Ethanol.
5. Ultrapure™ H2O.
6. Monster Green® Fluorescent protein phMGFP vector.

Fig. 1 (continued) (RT) to cDNA using gene-specific RT primers that include the positive control MAG5
(AT5G47480) and negative control MGFP. Target regions are PCR amplified using a Fluidigm Access Array
Integrated Fluidic Circuit (IFC); up to 2304 amplicons are harvested and eluted pools are quantified. Equal
concentrations of the pools are combined into a final pool, purified using AMPure beads, accurately quantified,
PhiX control library spiked-in, and subjected to sequencing on the Illumina MiSeq platform. (b) Single amplicon
amplification and sequencing. A single tissue is selected, RNA isolated and purified in triplicate, spiked with
MGFP in vitro-transcribed control RNA and bisulfite converted. Bisulfite conversion and cDNA synthesis are the
same as outlined above except a specific target RT primer is used. The target amplicon is PCR amplified,
triplicate amplicons are pooled, size and concentration are assessed on a Shimadzu MultiNA, and amplicons
are pooled at equal concentration. Pooled amplicons are purified, PhiX control library spiked-in, and subjected
to sequencing on the Illumina MiSeq platform
138 Jun Li et al.

7. XbaI restriction enzyme.
8. HiScribe™ T7 In Vitro Transcription Kit.
9. TURBO™ DNase.
10. Phase Lock Gel™ QuantBio (2.0 mL).
11. UltraPure™ Phenol:Water (3.75:1 v/v).
12. 100% Ethanol.
13. 3 M sodium acetate, pH 5.2.
14. 5 mg/mL Glycogen.
15. Agilent RNA 6000 Nano Kit.
16. Biological tissue samples (animal or plant).

2.2 Sodium Bisulfite Conversion

1. Sodium bisulfite solution: 40% (w/v) sodium metabisulfite, 0.6 mM hydroquinone, final pH 5.1.
   To prepare the sodium bisulfite solution, prepare the following:
   0.6 M Hydroquinone: Weigh 66 mg hydroquinone and place into a 1.5 mL tube. Add H2O to 1 mL and cover in foil to protect from light. Place in an orbital shaker to dissolve.
   40% (w/v) Sodium bisulfite: Dissolve 4 g sodium metabisulfite in 10 mL H2O in a 50 mL Falcon tube and vortex until it completely dissolves.
   Add 10 μL 0.6 M hydroquinone to the 40% sodium bisulfite solution, vortex, and adjust pH to 5.1 with 10 M NaOH. Filter the solution through a 0.2 μm filter. Cover in foil to protect from light (see Note 1).
2. 1 M Tris–HCl, pH 9.0.
3. Micro Bio-Spin™ P-6 Gel Columns.
4. Mineral oil.
5. 75% Ethanol.
6. 100% Ethanol.
7. 3 M sodium acetate, pH 5.2.
8. 5 mg/mL Glycogen.
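
The hydroquinone amounts given for the sodium bisulfite solution in item 1 can be checked with a short calculation. The following is a minimal sketch, assuming a molar mass of 110.11 g/mol for hydroquinone and ignoring the small volume of NaOH added during pH adjustment; the function names are illustrative and not part of the protocol.

```python
# Molar mass of hydroquinone (C6H6O2) in g/mol.
HYDROQUINONE_MW = 110.11


def molarity(mass_mg, volume_ml, mw_g_per_mol):
    """Molar concentration (mol/L) of mass_mg dissolved to a final volume_ml."""
    return (mass_mg / 1000.0) / mw_g_per_mol / (volume_ml / 1000.0)


def final_concentration(stock_conc, stock_volume, total_volume):
    """C1 * V1 = C2 * V2; both volumes must share the same unit."""
    return stock_conc * stock_volume / total_volume


# 66 mg hydroquinone made up to 1 mL gives the 0.6 M stock.
stock_molar = molarity(66, 1.0, HYDROQUINONE_MW)

# 10 uL (0.010 mL) of stock added to roughly 10 mL of bisulfite solution.
final_millimolar = final_concentration(stock_molar * 1000.0, 0.010, 10.0)

# 4 g sodium metabisulfite in a nominal 10 mL, expressed as g per 100 mL.
percent_w_v = 4.0 * 100.0 / 10.0

print(f"stock: {stock_molar:.3f} M, final: {final_millimolar:.3f} mM, "
      f"bisulfite: {percent_w_v:.0f}% (w/v)")
```

Both values come out close to the nominal 0.6 M stock and 0.6 mM final concentration stated in item 1.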

2.3 cDNA Synthesis

1. SuperScript™ III Reverse Transcriptase.
2. 10 mM Mixed dNTPs.
3. Single-target priming: 20 μM gene-specific oligo for each amplicon. Example primer sequences for a positive control (MAG5) and a negative control (mGFP; all C should be converted to U) are:
   MAG5: CACACACACCCATACATCCAC
   mGFP: AACAAAAAAATTAACCCCATC
4. Pooled-target priming: Up to 48 primers at 20 μM each. Design for target genes of interest.
Single-nucleotide Resolution of RNA m5C 139

2.4 PCR Amplicon Amplification

1. KAPA HiFi DNA polymerase.
2. First PCR primers: Examples of positive control (MAG5) and negative control (mGFP) first-round primers are shown (underlined sequence = tag, BC = 8 nt barcode):
   MAG5 F: ACACTGACGACATGGTTCTACAGGTAAAGGTAAAATTGGGTAATGAG
   MAG5 R: TACGGTAGCAGAGACTTGGTCT-[BC]-AGACCAAGTCTCTGCTACCGTA
   mGFP F: ACACTGACGACATGGTTCTACAGAGGGTGATGGGAAAGGTAAG
   mGFP R: TACGGTAGCAGAGACTTGGTCTCAATCATCCACACCCTTCATC
3. Second PCR primers (underlined sequence = tag, BC = 8 nt barcode):
   P5_CS1_F: AATGATACGGCGACCACCGAGATCTACACTGACGACATGGTTCTACA
   P7_BC_CS2_R: CAAGCAGAAGACGGCATACGAGAT-[BC]-TACGGTAGCAGAGACTTGGTCT
4. 10 mM Mixed dNTPs.
5. Fluidigm Access Array Integrated Fluidic Circuit (IFC) 48.48.
6. FastStart™ High Fidelity PCR System, dNTPack.
7. 20× Access Array Loading Reagent.
8. 1× Access Array Harvest Solution.
9. 1× Access Array Hydration Reagent v2.
10. Access Array Barcode primers for Illumina Sequencers-384: Single Direction.

2.5 MultiNA Microelectrophoresis System

1. DNA-500 Kit (Shimadzu Corporation).

2.6 PCR Amplicon Purification and Quantification

1. Agencourt AMPure XP beads.
2. Qubit™ dsDNA Broad Range Assay Kit.
3. KAPA Library Quantification Kit (Universal).

2.7 Library Sequencing Components

1. 0.2 M NaOH.
2. Illumina MiSeq Reagent Kit v3 (150 or 600 cycles) (see Note 2).

3 Methods

Carry out all procedures described below at room temperature unless otherwise stated.

3.1 RNA Extraction, Purification, and DNase Treatment

Total RNA is extracted and purified directly from tissue with 1 mL of TRIzol™ as per the manufacturer’s protocol. RNA is then treated with TURBO™ DNase as per the manufacturer’s protocol. Assess the integrity of the RNA by using an RNA 6000 Nano Chip on the Agilent 2100 Bioanalyzer according to the manufacturer’s protocol.

3.2 Generation of the MGFP In Vitro Transcript Spike-in Control

1. Linearize the phMGFP vector by using the restriction enzyme XbaI and purify the linearized DNA vector according to the HiScribe T7 In Vitro Transcription Kit protocol.
2. Perform in vitro transcription according to the HiScribe T7 In Vitro Transcription Kit protocol by using 1 μg of linearized DNA. An incubation period of 4 h at 37 °C with the kit components is sufficient.
3. Add 2 U TURBO™ DNase and incubate at 37 °C for 30 min.
4. Transfer the reaction to a Phase Lock Gel™ tube and make the volume of the reaction up to 100 μL with ultrapure H2O.
5. Add an equal volume of phenol:water and chloroform, shake vigorously for 15 s, and centrifuge at 15,000 × g for 5 min.
6. Add the same volume of chloroform as in step 5 to the tube, shake vigorously for 15 s, and centrifuge at 15,000 × g for 5 min again.
7. Transfer the aqueous phase to a clean 1.5 mL tube. Add 1/10 volume 3 M sodium acetate, 3 volumes of 100% ethanol, and 1 μL glycogen; vortex; and precipitate the RNA overnight at −80 °C.
8. Centrifuge RNA at 17,000 × g at 4 °C for 60 min and carefully remove the supernatant.
9. Add 1 mL 75% ethanol to the RNA, invert five times, and centrifuge at 7500 × g at 4 °C for 10 min (see Note 3).
10. Carefully remove the supernatant and let the pellet air-dry for approximately 15 min (see Note 4).
11. Resuspend the RNA in 25 μL of ultrapure H2O.
12. Optional step: Treat 5 μg of in vitro-transcribed MGFP transcript with 2 U TURBO™ DNase according to the manufacturer’s protocol at 37 °C for 30 min.
13. Assess the integrity and size of the MGFP in vitro transcripts by using an RNA 6000 Nano Chip on the Agilent 2100 Bioanalyzer according to the manufacturer’s protocol (see Note 5).

3.3 Bisulfite Conversion of RNA

1. Add 1/2000 of the MGFP RNA transcript to 2 μg DNase-treated purified total RNA. Increase the volume of the RNA sample to 20 μL with ultrapure H2O.
2. Denature RNA by heating to 75 °C for 5 min in a heat block.
3. Preheat the sodium bisulfite solution to 75 °C, add 100 μL to the RNA, vortex thoroughly, and briefly spin in a microcentrifuge (13,000 × g for 1 min).
4. Overlay the reaction mixture with 100 μL of mineral oil. Cover the tube in aluminum foil to protect the reaction mixture from light (see Note 6).
5. Incubate at 75 °C for 4 h in a heat block.
6. About 15 min before the bisulfite conversion reaction is complete, prepare two Micro Bio-Spin Columns for each conversion reaction by allowing the Tris solution in the column to drain into a collection tube. Discard the Tris flow-through, place the column back into the collection tube, and centrifuge at 1000 × g for 2 min. Transfer each column to a clean 1.5 mL tube (see Note 7).
7. Remove the bisulfite reaction mixture from the heat block and gently transfer the aqueous layer (that is under the mineral oil) containing the sodium bisulfite/RNA mixture to the Micro Bio-Spin column (see Note 8).
8. Centrifuge at 1000 × g for 4 min.
9. Carefully transfer the eluate into the second Micro Bio-Spin column placed in a 1.5 mL tube and repeat step 8.
10. Preheat the heat block to 75 °C in preparation for step 12.
11. Add an equal volume of 1 M Tris–HCl pH 9.0 to the second eluate, vortex, spin briefly, and then overlay with 175 μL of mineral oil. Cover the tube in aluminum foil to protect the reaction mixture from light.
12. Incubate at 75 °C for 1 h in the heat block.
13. Transfer the bottom aqueous layer containing the RNA to a clean 1.5 mL tube.
14. Precipitate the bisulfite-treated RNA by following steps 7–11 in Subheading 3.2 and resuspend the bisulfite-converted RNA in H2O (see Note 9).

3.4 Bisulfite Oligonucleotide Primer Design for cDNA Synthesis and PCR

1. For efficient parallel amplification of 48 target amplicons on the Fluidigm Access Array, use targeted cDNA synthesis to reduce the amplification of spurious amplicons. Targeted cDNA synthesis is achieved by designing reverse transcriptase (RT) primers 30–40 nt 3′ of the cytosine(s) to be assayed. N.B.: Design the RT primers such that they avoid areas of bisulfite-converted cytosines as inefficient BS conversion may result in unconverted cytosines and biasing of later amplification. See Fig. 2.

Fig. 2 Overview of bisulfite conversion of RNA, reverse transcription to cDNA, and PCR amplification. (a) In the
in vitro-transcribed MGFP sequence, unmodified cytosines (underlined) are converted to uracil, reverse
transcribed (RT) by reverse transcriptase to cDNA, and then PCR amplified. RT and PCR primers are designed
to avoid stretches of converted cytosines to prevent preferential amplification of converted sequences which
may incorrectly indicate efficient bisulfite conversion. (b) In MAG5 control and other candidate sequences,
primers are designed to span areas containing converted cytosines to preferentially amplify converted
sequences. C3349 is methylated in Arabidopsis thaliana and serves as an over-conversion control. Flanking
cytosines are not methylated and should be completely converted. Primers are designed with a Tm of
59–61 °C, preferably with a 3′ G nucleotide and to amplify PCR products of 170–200 bp

Fig. 3 Overview of first and second PCR amplification of target regions. (a) For the first PCR, the forward PCR primer is designed with the gene-specific sequence (GSS) and the universal forward tag called Common Sequence 1, CS1 (5′-ACACTGACGACATGGTTCTACA-3′), and the reverse PCR primer is designed with the gene-specific sequence (GSS) and the universal reverse tag called Common Sequence 2, CS2 (5′-TACGGTAGCAGAGACTTGGTCT-3′), matching the example primers in Subheading 2.4. (b) For the second PCR, the forward primer is designed with the CS1 and Illumina P5 sequences and the reverse primer contains the CS2, barcoding, and Illumina P7 sequences. The Fluidigm barcodes or indexes are 10 nt in length

2. Design primers for the first round of PCR amplification so that amplicons are small, 170–200 bp, to allow efficient amplification (see Notes 10 and 11). As the G/C content in the template is low, design long primers to ensure that the Tm is in the range of 59–61 °C. Add the CS1 sequence (5′-ACACTGACGACATGGTTCTACA-3′) to the forward primer gene-specific sequence (GSS) and CS2 (5′-TACGGTAGCAGAGACTTGGTCT-3′) to the reverse primer GSS, as in the example primers in Subheading 2.4. For the second PCR amplification, use the forward primer containing the complementary sequences to the P5 Illumina flow cell combined with CS1 (P5_CS1) and the reverse primer containing the barcode and complementary sequences to the P7 Illumina flow cell combined with CS2 (P7_BC_CS2) (see Note 12). See Fig. 3.
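
The primer design logic above depends on knowing what a template looks like after bisulfite conversion. The following is a minimal sketch of an in silico conversion helper, assuming that every unmethylated C ends up as T after RT and PCR; the function names are illustrative, and the two tag constants are copied from the 5′ ends of the example first-round primers in Subheading 2.4.

```python
# Universal tags as they appear at the 5' end of the example first-round
# primers in Subheading 2.4 (forward and reverse primers, respectively).
FORWARD_TAG = "ACACTGACGACATGGTTCTACA"
REVERSE_TAG = "TACGGTAGCAGAGACTTGGTCT"


def bisulfite_convert(seq, methylated=()):
    """Return the DNA-alphabet sequence expected after bisulfite treatment,
    RT, and PCR: every C reads as T unless its 1-based position is listed
    in `methylated` (protected from conversion)."""
    protected = set(methylated)
    seq = seq.upper().replace("U", "T")
    return "".join(
        "T" if base == "C" and pos not in protected else base
        for pos, base in enumerate(seq, start=1)
    )


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def add_tags(forward_gss, reverse_gss):
    """Prepend the universal tags to the gene-specific sequences (GSS)."""
    return FORWARD_TAG + forward_gss, REVERSE_TAG + reverse_gss


# Hypothetical RNA segment with one methylated cytosine at position 5.
template = "AUGCCGUACGGCUUACCGAUC"
converted = bisulfite_convert(template, methylated={5})
print(converted, f"GC fraction: {gc_fraction(converted):.2f}")
```

Comparing `gc_fraction` before and after conversion illustrates why long primers are needed to reach the target Tm on converted templates.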

3.5 cDNA Synthesis

1. Mix 500 ng of bisulfite-converted RNA, 1 μL of 1 mM dNTP mix, and 2 μL of 10× pooled primer mix and add ultrapure H2O to a final volume of 13 μL. Incubate the mix at 65 °C for 5 min to denature the RNA.
2. Reverse transcribe the bisulfite-converted RNA using SuperScript™ III Reverse Transcriptase according to the manufacturer’s protocol. Add either pooled 48 RT primers for parallel Access Array amplification or random hexamers for single-PCR amplicons.
   Suggested controls: Include RT minus controls for each sample as the PCR primers are not necessarily designed to span exon-exon junctions. In the controls, use 1 μL of H2O instead of reverse transcriptase.
3. After the reaction is complete, dilute the cDNAs 1:10 in ultrapure H2O for PCR amplification.

3.6 Individual PCR Amplification, Quantification, and Pooling

1. For a 10 μL PCR, add 0.2 μL of KAPA HiFi DNA Polymerase, 2 μL of 5× HiFi Fidelity buffer (with MgCl2), 0.3 μL of 10 mM dNTP, 0.4 μL of 10 μM forward primer (CS1_GSS), 0.4 μL of 10 μM reverse primer (CS2_GSS), 1 μL of diluted cDNA, and H2O to a final volume of 10 μL. Perform PCR for each amplicon in triplicate.
2. Gently finger vortex, briefly centrifuge, and place into a preheated thermal cycler.
3. Perform a two-step thermal cycling PCR program. See Table 1 for more details.
4. Pool the triplicates and perform an AMPure bead cleanup at a ratio of 1.8:1 to remove unincorporated primers and primer dimers. Repeat this step (see Notes 13 and 14).
5. Assess PCR amplicon size and concentration after separation on a Shimadzu Microchip Electrophoresis System MCE®-202 MultiNA.
6. Normalize the concentration of each amplicon in the experiment by dilution with H2O to a concentration in the range of 0.5–5 ng/μL.
7. Perform the barcoding and Illumina adapter addition PCR. In a 10 μL PCR, add 0.2 μL of KAPA HiFi DNA Polymerase, 2 μL of 5× HiFi Fidelity buffer (with MgCl2), 0.3 μL of 10 mM dNTP, 1 μL of 10 μM forward primer (P5_CS1), 1 μL of 10 μM reverse primer (P7_BC_CS2), 2 μL of diluted PCR amplicon, and H2O to a final volume of 10 μL.
8. Gently finger vortex, briefly centrifuge, and place into a preheated thermal cycler.
9. Perform a one-step thermal cycling PCR program. See Table 2 for more details.
10. Assess PCR amplicon size and concentration after separation on a Shimadzu Microchip Electrophoresis System MCE®-202 MultiNA.
11. Pool the amplicons in equimolar concentration and purify them using AMPure beads according to the manufacturer’s protocol. Use a ratio of beads to pooled amplicons of 0.9:1 to ensure binding of amplicons and not primer dimers or unincorporated primers.

Table 1
Two-step thermal cycling conditions for the amplification of individual amplicons

Stage                   Temperature (°C)   Time (s)
Initial denaturation    98                 15
Step I (10 cycles)
  Denaturation          94                 10
  Annealing             60                 30
  Extension             72                 15
Step II (20 cycles)
  Denaturation          94                 10
  Annealing             55                 30
  Extension             72                 15
Final extension         72                 60
Hold                    4                  Forever

Table 2
One-step thermal cycling conditions for the addition of barcodes and Illumina adapters

Stage                   Temperature (°C)   Time (s)
Initial denaturation    98                 15
One step (12 cycles)
  Denaturation          94                 10
  Annealing             63                 30
  Extension             72                 30
Final extension         72                 120
Hold                    4                  Forever

12. First estimate the DNA concentration using a Qubit dsDNA Broad Range Assay Kit according to the manufacturer’s protocol. Then accurately assess the DNA concentration by using KAPA Library Quantification Kit for Illumina® Platforms. Perform serial dilution of the pooled amplicons such that they fall into the dynamic range of the assay of 5.5–0.000055 pg/μL.
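
The reaction set-ups in steps 1 and 7 leave the water volume implicit ("H2O to a final volume of 10 μL"). The sketch below works out that volume and scales the components into a master mix. The component volumes are taken from steps 1 and 7; the 10% pipetting overage and the function names are assumptions for illustration only.

```python
# Per-reaction volumes (uL) for the first PCR (step 1) and second PCR (step 7).
FIRST_PCR_UL = {
    "KAPA HiFi DNA Polymerase": 0.2,
    "5x HiFi Fidelity buffer": 2.0,
    "10 mM dNTP": 0.3,
    "10 uM forward primer": 0.4,
    "10 uM reverse primer": 0.4,
    "diluted cDNA": 1.0,
}
SECOND_PCR_UL = {
    "KAPA HiFi DNA Polymerase": 0.2,
    "5x HiFi Fidelity buffer": 2.0,
    "10 mM dNTP": 0.3,
    "10 uM forward primer": 1.0,
    "10 uM reverse primer": 1.0,
    "diluted PCR amplicon": 2.0,
}


def water_volume(recipe, final_ul=10.0):
    """Volume of H2O needed to bring one reaction to final_ul."""
    return round(final_ul - sum(recipe.values()), 2)


def master_mix(recipe, n_reactions, template_key, overage=0.10, final_ul=10.0):
    """Scale all components except the template for n_reactions.

    `overage` (assumed 10% here) compensates for pipetting losses.
    """
    scale = n_reactions * (1.0 + overage)
    mix = {
        name: round(volume * scale, 2)
        for name, volume in recipe.items()
        if name != template_key
    }
    mix["H2O"] = round(water_volume(recipe, final_ul) * scale, 2)
    return mix


print("First PCR water per reaction (uL):", water_volume(FIRST_PCR_UL))
print("Second PCR water per reaction (uL):", water_volume(SECOND_PCR_UL))
print(master_mix(FIRST_PCR_UL, n_reactions=3, template_key="diluted cDNA"))
```

For triplicate reactions, the template is added separately to each tube after the master mix has been dispensed.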

3.7 Parallel PCR Amplification Using a Fluidigm Access Array Integrated Fluidic Circuit (IFC)

1. Prime the Access Array according to the manufacturer’s protocol.
2. Pre-warm the 20× Access Array loading reagent to room temperature before use. Prepare the pooled 48-oligonucleotide primer mix by mixing 2.0 μL of 50 μM CS1-GS forward, 2.0 μL of 50 μM CS2-GS reverse, 5.0 μL of 20× Access Array loading reagent, and 91 μL of H2O to a final volume of 100 μL.
3. Finger vortex the mix and centrifuge to spin the contents to the bottom of the tube.
4. Prepare the sample premix solution by mixing 30 μL 10× FastStart High Fidelity Reaction Buffer (without MgCl2), 54 μL 25 mM MgCl2, 15 μL DMSO, 6.0 μL 10 mM dNTP mix, 3.0 μL FastStart High Fidelity Enzyme Blend, 15.0 μL 20× Access Array Loading Reagent, and 57 μL H2O.
5. Finger vortex the mix and centrifuge to spin the contents to the bottom of the tube.
6. Prepare the sample mix solutions, 48 in total, in a 96-well plate. Mix 3.0 μL sample premix, 1.0 μL cDNA, and 1.0 μL Access Array Barcode library primers.
7. Thoroughly vortex the solutions for at least 30 s and then centrifuge to spin down the contents to the bottom of the plate. N.B.: Each well should receive a uniquely barcoded primer pair.
8. Load 4.0 μL of the primer solution and 4.0 μL of the sample mix solution into the primer and sample inlets of the Access Array by using an 8-channel pipette.
9. Load the Access Array into the Pre-PCR IFC Controller AX according to the manufacturer’s protocol.
10. Place the Access Array onto the FC1 Cycler and start thermal cycling by selecting the protocol AA 48 × 48 Standard v1. The thermal cycling conditions are presented in Table 3.
11. To harvest the PCR products from the Access Array follow the manufacturer’s protocol. Once the final step is completed, eject the Access Array.
12. Collect the harvested PCR products into a labeled PCR 96-well plate. Carefully transfer 10 μL of harvested PCR products from each of the sample inlets into columns 1–6 of the labeled 96-well plate by using an 8-channel pipette.
13. Assess PCR amplicon size and concentration after separation on a Shimadzu Microchip Electrophoresis System MCE®-202 MultiNA.
14. Pool the amplicons in equimolar concentration and purify them using AMPure beads according to the manufacturer’s protocol. Use a ratio of beads to pooled amplicons of 0.9:1 to ensure binding of amplicons and not primer dimers or unincorporated primers (see Note 14).
15. First estimate the DNA concentration using a Qubit dsDNA Broad Range Assay Kit according to the manufacturer’s protocol. Then accurately assess the DNA concentration by using KAPA Library Quantification Kit for Illumina® Platforms. Perform serial dilution of the pooled amplicons such that they fall into the dynamic range of the assay of 5.5–0.000055 pg/μL.

Table 3
Multistep thermal cycling conditions for the Access Array

Temperature (°C)   Time (s)   Number of cycles
50                 120        1
70                 1200       1
95                 600        1
95                 15         10
60                 30
72                 60
95                 15         2
80                 30
60                 30
72                 60
95                 15         8
60                 30
72                 60
95                 15         2
80                 30
60                 30
72                 60
95                 15         8
60                 30
72                 60
95                 15         5
80                 30
60                 30
72                 60
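
Equimolar pooling (step 14 above, and step 11 in Subheading 3.6) requires converting the mass concentrations reported by the MultiNA into molar concentrations. The following is a minimal sketch, assuming an average mass of 660 g/mol per base pair of double-stranded DNA; the amplicon names, concentrations, and lengths are hypothetical values for illustration.

```python
def ng_per_ul_to_nM(conc_ng_per_ul, length_bp, mass_per_bp=660.0):
    """Convert a dsDNA mass concentration to nM.

    Assumes an average of 660 g/mol per base pair.
    """
    return conc_ng_per_ul * 1e6 / (mass_per_bp * length_bp)


def equimolar_volumes(amplicons, fmol_each):
    """Volume (uL) of each amplicon that contributes `fmol_each` femtomoles.

    `amplicons` maps a name to (concentration in ng/uL, length in bp).
    1 nM equals 1 fmol/uL, so volume = fmol / nM.
    """
    return {
        name: fmol_each / ng_per_ul_to_nM(conc, length)
        for name, (conc, length) in amplicons.items()
    }


# Hypothetical measurements: (ng/uL, amplicon length in bp including adapters).
measured = {
    "MAG5": (1.32, 200),
    "MGFP": (2.64, 200),
    "target_1": (0.99, 300),
}

for name, volume in equimolar_volumes(measured, fmol_each=20.0).items():
    print(f"{name}: pool {volume:.2f} uL")
```

Because molarity depends on length, two amplicons at the same ng/μL but different sizes need different volumes in the pool.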

3.8 MiSeq Sequencing

1. Prepare the sample sheet using the Illumina Experiment Manager by following the manufacturer’s protocol (see Note 15).
2. Dilute the library to 10 nM in EBT buffer based on the concentrations determined by the qPCR. From this point, keep the libraries on ice.
3. Dilute the PhiX control library to 2 nM by adding 8 μL EBT buffer to 2 μL of the 10 nM PhiX control library (see Note 16).
4. Denature the pooled libraries and PhiX control library separately by adding 10 μL of 0.2 M NaOH to 10 μL of the 2 nM libraries (see Note 17).
5. Vortex thoroughly to mix and centrifuge at 1000 × g for 30 s. Incubate at room temperature for 5 min.
6. Dilute the denatured pooled libraries and PhiX control library separately to 20 pM by adding 980 μL pre-chilled HT1 to 20 μL denatured libraries.
7. Dilute the 20 pM pooled libraries and PhiX control library separately to 10 pM by adding 500 μL pre-chilled HT1 to 500 μL 20 pM libraries.
8. Combine 100 μL of the 10 pM PhiX control library with 900 μL of the 10 pM pooled libraries and vortex to mix (see Note 18).
9. Load 600 μL of the final sample into the cartridge. Ensure that air bubbles are removed by gently tapping the cartridge.
10. Perform the sequencing run according to the manufacturer’s protocol.
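
The dilution series in steps 3–8 can be followed as simple C1V1 = C2V2 arithmetic. The sketch below traces the concentrations from the 10 nM PhiX stock to the loaded sample; it assumes that 2 nM material enters the denaturation step, as stated in step 4, and the helper name is illustrative.

```python
def dilute(conc, sample_ul, diluent_ul):
    """Concentration after mixing sample_ul of stock with diluent_ul of diluent."""
    return conc * sample_ul / (sample_ul + diluent_ul)


# Step 3: 2 uL of 10 nM PhiX + 8 uL EBT buffer.
phix_nM = dilute(10.0, 2.0, 8.0)

# Step 4: 10 uL of 2 nM library + 10 uL of 0.2 M NaOH.
denatured_nM = dilute(2.0, 10.0, 10.0)
naoh_M = dilute(0.2, 10.0, 10.0)

# Step 6: 20 uL denatured library + 980 uL HT1 (1 nM = 1000 pM).
first_pM = dilute(denatured_nM * 1000.0, 20.0, 980.0)

# Step 7: 500 uL of 20 pM library + 500 uL HT1.
loading_pM = dilute(first_pM, 500.0, 500.0)

# Step 8: 100 uL PhiX + 900 uL pooled library, both at 10 pM.
phix_fraction = 100.0 / (100.0 + 900.0)

print(f"PhiX working stock: {phix_nM:.1f} nM")
print(f"After denaturation: {denatured_nM:.1f} nM in {naoh_M:.1f} M NaOH")
print(f"After HT1 dilutions: {first_pM:.0f} pM, then {loading_pM:.0f} pM")
print(f"PhiX spike-in: {phix_fraction:.0%}")
```

The same helper can be used to plan a different loading concentration, for example within the 7–10 pM range mentioned in Note 18.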

3.9 Bioinformatics Analysis of Data

1. To trim the Illumina adaptor sequences that were incorporated into the amplicons to permit sequencing of the 150 bp paired-end reads, use Trimmomatic in palindromic mode [15].
2. Sequencing reads can be aligned with meRanTK by using Bowtie2 internally [16]. Assemble reference sequences for the alignment by using the segments of RNA interrogated by sequencing prior to bisulfite conversion.
3. Extract the methylation state of individual cytosines from bisulfite-read alignments by using meRanCall. The number of reads can be extracted from the aligned sequencing reads in order to determine read coverage at a given cytosine.
4. To call differentially methylated cytosines use meRanCompare. The number of reads can be extracted from the aligned sequencing reads in order to determine read coverage at a given cytosine (Fig. 4).
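
The quantities displayed in Fig. 4 (read coverage and cytosine non-conversion percentage) follow from per-site base counts. The following is a minimal sketch of that arithmetic, not a description of how meRanCall works internally: the counts, the site names other than C3349, and the coverage and rate thresholds are all hypothetical.

```python
def non_conversion_rate(c_reads, t_reads):
    """Fraction of reads that still show C at a reference cytosine."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no coverage at this cytosine")
    return c_reads / total


def conversion_efficiency(control_sites):
    """Overall conversion efficiency from an unmethylated control transcript.

    `control_sites` is a list of (C reads, T reads), one pair per cytosine.
    """
    c_total = sum(c for c, _ in control_sites)
    t_total = sum(t for _, t in control_sites)
    return 1.0 - non_conversion_rate(c_total, t_total)


def candidate_sites(sites, min_coverage=100, min_rate=0.2):
    """Names of cytosines passing illustrative coverage and rate cut-offs."""
    return [
        name
        for name, (c, t) in sites.items()
        if c + t >= min_coverage and non_conversion_rate(c, t) >= min_rate
    ]


# Hypothetical counts for the MGFP negative control and a MAG5 amplicon.
mgfp_control = [(2, 998), (3, 997)]
mag5_sites = {"C3340": (4, 996), "C3349": (450, 550), "C3360": (30, 20)}

print(f"Conversion efficiency: {conversion_efficiency(mgfp_control):.2%}")
print("Candidate m5C sites:", candidate_sites(mag5_sites))
```

A conversion efficiency close to 100% in the spike-in control supports interpreting residual cytosines in the target amplicons as methylation rather than incomplete conversion.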

4 Notes

1. Slowly add 10 M NaOH dropwise to the sodium bisulfite solution while mixing. Slightly less than 1 mL is required to adjust the pH to 5.1.
2. The 150-cycle MiSeq Reagent Kit v3 provides 1 × 150 bp reads, whereas the 600-cycle kit allows combinations of cycles that add up to 600, for example 200 and 400 cycles.
3. Do not machine or finger vortex the RNA as this will increase the risk of RNA loss.
4. Air-drying the samples in a sterile laminar flow hood is recommended. Do not allow the RNA to completely dry as this will cause difficulties in resuspending the RNA.

Fig. 4 Representative analysis of Illumina MiSeq amplicon sequencing of negative and positive controls. (a) A region of the MGFP spiked-in in vitro control transcript showing even coverage and complete conversion of all cytosines (no methylation). The y-axis shows the read depth and the x-axis shows the cytosines (numbers) in the sequenced region. (b) A region of the MAG5 gene that shows converted cytosines and the non-converted cytosine C3349. Cytosines flanking C3349 are completely converted, demonstrating that bisulfite conversion was very efficient. The heatmaps display the cytosine non-conversion percentage

5. As the in vitro MGFP transcript will most likely be at a high concentration, it is good practice to perform a serial dilution in H2O such that the estimated concentrations are in the range of 5–50 ng/μL. Prepare and run three dilutions on the RNA Nano chip.
6. Tilt the 1.5 mL tube at a 45° angle and then slowly pipette the mineral oil directly on top of the RNA-bisulfite reaction mixture.
7. Emptying of the Micro Bio-Spin gel column takes about 2 min.
If the gel column does not empty by gravity, place the lid back
onto the column and remove again.
8. Gently pipette the reaction mixture onto the gel bed and avoid
disturbing the gel bed. Minimize the transfer of mineral oil to
the column although there will be traces which is unavoidable.
9. About 25% of the RNA is lost during the procedure, and we find that 10 μL of H2O per 2 μg RNA used in the bisulfite conversion reaction results in concentrations of ~150 ng/μL.
10. Bisulfite treatment of the RNA causes significant shearing and
we have observed that shorter amplicons are preferentially
amplified over longer amplicons.
Longer PCR amplicons increase the tendency of detecting
non-converted cytosines in RNA exhibiting strong secondary
structure.
11. Inefficient bisulfite conversion may result in unconverted cyto-
sines, so it is important to ensure that the PCR primers are not
biasing the amplification toward converted cytosines.
12. Occasionally, not all triplicates successfully amplify and it may
be necessary to optimize the PCR.
13. We elute the purified PCR products in 10–30 μL depending on the amount of amplified PCR products.
14. After purification of the amplicons, residual ethanol may remain in the purified amplicons. We find that concentrating down the pooled amplicons, even if there is <55 μL, and then adding H2O to 55 μL is the best way to remove as much ethanol as possible.
15. The sample sheet is required to insert the sample names and
adapter indices used for each sample. We have selected the
“Other” as the category followed by “Fastq only.” This option
generates FASTQ files only and also enables the deselection of
downstream processing steps like adapter trimming, allowing
trimming and mapping to be performed separately.
16. The prepared PhiX library is added to the pooled amplicon
libraries as an internal control for the MiSeq sequencing run.
17. It is best to prepare fresh 0.2 M NaOH for the denaturation of
libraries.
18. Loading 10% PhiX control library is sufficient for low-diversity libraries. We have previously loaded between 7 and 10 pM. Underloading of the libraries can give cluster densities below the optimal range and overloading of the libraries can give cluster densities above the optimal range, reducing the quality of the data. The optimal cluster density is 700–1000 K/mm².

Acknowledgments

This work was supported by an Australian Research Council Future Fellowship (FT130100525) awarded to IS, a Grains Research and Development Corporation scholarship awarded to AB, a Chinese Scholarship Council scholarship awarded to JZ, and a MOET-VIED PhD scholarship awarded to TD.

References

1. Burgess A, David R, Searle IR (2016) Deciphering the epitranscriptome: a green perspective. J Integr Plant Biol 58(10):822–835
2. Saletore Y, Meyer K, Korlach J, Vilfan ID, Jaffrey S, Mason CE (2012) The birth of the Epitranscriptome: deciphering the function of RNA modifications. Genome Biol 13(10):175
3. Bujnicki JM (2001) In silico analysis of the tRNA: m1A58 methyltransferase family: homology-based fold prediction and identification of new members from Eubacteria and Archaea. FEBS Lett 507(2):123–127
4. Droogmans L, Roovers M, Bujnicki JM, Tricot C, Hartsch T, Stalon V, Grosjean H (2003) Cloning and characterization of tRNA (m1A58) methyltransferase (TrmI) from Thermus thermophilus HB27, a protein required for cell growth at extreme temperatures. Nucleic Acids Res 31(8):2148–2156
5. Oerum S, Dégut C, Barraud P, Tisné C (2017) m1A post-transcriptional modification in tRNAs. Biomol Ther 7(1):20
6. Dominissini D, Moshitch-Moshkovitz S, Schwartz S, Salmon-Divon M, Ungar L, Osenberg S, Cesarkas K, Jacob-Hirsch J, Amariglio N, Kupiec M (2012) Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature 485(7397):201–206
7. Meyer KD, Saletore Y, Zumbo P, Elemento O, Mason CE, Jaffrey SR (2012) Comprehensive analysis of mRNA methylation reveals enrichment in 3′ UTRs and near stop codons. Cell 149(7):1635–1646
8. Squires JE, Patel HR, Nousch M, Sibbritt T, Humphreys DT, Parker BJ, Suter CM, Preiss T (2012) Widespread occurrence of 5-methylcytosine in human coding and non-coding RNA. Nucleic Acids Res:gks144
9. Lovejoy AF, Riordan DP, Brown PO (2014) Transcriptome-wide mapping of pseudouridines: pseudouridine synthases modify specific mRNAs in S. cerevisiae. PLoS One 9(10):e110799
10. Khoddami V, Cairns BR (2013) Identification of direct targets and modified bases of RNA cytosine methyltransferases. Nat Biotechnol 31(5):458–464
11. Hussain S, Sajini AA, Blanco S, Dietmann S, Lombard P, Sugimoto Y, Paramor M, Gleeson JG, Odom DT, Ule J (2013) NSun2-mediated cytosine-5 methylation of vault noncoding RNA determines its processing into regulatory small RNAs. Cell Rep 4(2):255–261
12. Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD, Pradhan S, Nelson SF, Pellegrini M, Jacobsen SE (2008) Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452(7184):215–219
13. Schaefer M, Pollex T, Hanna K, Lyko F (2008) RNA cytosine methylation analysis by bisulfite sequencing. Nucleic Acids Res 37(2):e12–e12
14. David R, Burgess A, Parker B, Li J, Pulsford K, Sibbritt T, Preiss T, Searle IR (2017) Transcriptome-wide mapping of RNA 5-methylcytosine in Arabidopsis mRNAs and noncoding RNAs. Plant Cell 29(3):445–460
15. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30(15):2114–2120
16. Rieder D, Amort T, Kugler E, Lusser A, Trajanoski Z (2015) meRanTK: methylated RNA analysis ToolKit. Bioinformatics 32(5):782–785
Chapter 10

A Small RNA-Seq Protocol with Less Bias and Improved Capture of 2′-O-Methyl RNAs
Erwin L. van Dijk and Claude Thermes

Abstract
The study of small RNAs (sRNAs) by next-generation sequencing (NGS) is challenged by bias issues during library preparation. Several types of sRNAs such as plant microRNAs (miRNAs) carry a 2′-O-methyl (2′-OMe) modification at their 3′ terminal nucleotide. This modification adds another level of difficulty as it inhibits 3′ adapter ligation. We previously demonstrated that modified versions of the “TruSeq (TS)” protocol have less bias and an improved detection of 2′-OMe RNAs. Here we describe in detail protocol “TS5,” which showed the best overall performance. We also provide guidelines for bioinformatics analysis of the sequencing data.

Key words Small RNA, Small RNA-seq, Bias, Library preparation, Next-generation sequencing, NGS, 2′-O-methyl (2′-OMe) RNA, Plant microRNA, Plant miRNA

1 Introduction

Small RNAs (sRNAs) are involved in the control of a diversity of biological processes [1]. Eukaryotic regulatory sRNAs are typically
between 20 and 30 nt in size; the three major types are microRNAs
(miRNAs), piwi-interacting RNAs (piRNAs), and small interfering
RNAs (siRNAs). Aberrant miRNA expression levels have been
implicated in a variety of diseases [2]. This underscores the impor-
tance of miRNAs in health and disease and the requirement for
accurate, quantitative research tools to detect sRNAs in general.
Next-generation sequencing (NGS) is a widely used method to
study sRNAs. Main advantages of NGS as compared with other
approaches, such as quantitative PCR (qPCR) or microarray tech-
niques, are that it does not need a prior knowledge of the sRNA
sequences and can therefore be used to discover novel RNAs, and in
addition it suffers less from background signal and saturation
effects. Furthermore, it can detect single-nucleotide differences
and has a higher throughput than microarrays. However, NGS
also has some drawbacks; the cost of a sequencing run remains

Mary McMahon (ed.), RNA Modifications: Methods and Protocols, Methods in Molecular Biology, vol. 2298,
[Link] © Springer Science+Business Media, LLC, part of Springer Nature 2021

154 Erwin L. van Dijk and Claude Thermes

relatively high and the multistep process required to convert a sample into a library for sequencing may introduce biases. In a typical sRNA library preparation process, a 3′ adapter is first ligated to the sRNA (often gel-purified from total RNA) using a truncated version of RNA ligase 2 (RNL2) and a preadenylated 3′ adapter in the absence of ATP. This increases the efficiency of sRNA-adapter ligation and reduces the formation of side reactions such as sRNA circularization or concatemerization. Subsequently, a 5′ adapter is ligated by RNA ligase 1 (RNL1), followed by reverse transcription (RT) and PCR amplification. All these steps may introduce bias [3, 4]. Consequently, read numbers may not reflect actual sRNA expression levels leading to artificial, method-dependent expression patterns. Specific sRNAs may be either over- or underrepresented in a library, and strongly underrepresented sRNAs may escape detection. The situation is particularly complicated with plant miRNAs, siRNAs in insects and plants, and piRNAs in insects, nematodes, and mammals, in which the 3′ terminal nucleotide has a 2′-O-methyl (2′-OMe) modification [1]. This modification strongly inhibits 3′ adapter ligation [5], making library preparation for these types of RNA a difficult task.
Previous work demonstrated that adapter ligation introduces
serious bias, due to RNA sequence/structure effects [6–11]. Steps
downstream of adapter ligation such as reverse transcription and
PCR do not significantly contribute to bias [6, 11, 12]. Ligation
bias is likely due to the fact that adapter molecules with a given
sequence will interact with sRNA molecules in the reaction mixture
to form co-folds that may lead to either favorable or unfavorable
configurations for ligation. Data from Sorefan et al. [7] suggest that
RNL1 prefers a single-stranded context, while RNL2 prefers a
double strand for ligation. The fact that the adapter/sRNA
co-fold structures are determined by the specific adapter and
sRNA sequences explains why specific sRNAs are over- or under-
represented with a given adapter set. It is also important to note
that within a series of sRNA libraries to be compared, the same
adapter sequences should be used. Indeed, it has previously been
observed that changing adapters by the introduction of different
barcode sequences alters miRNA profiles in sequencing libraries
[9, 13].
Randomization of adapter sequences near the ligation junction
likely reduces these biases. Sorefan and colleagues [7] used adapters
with four random nucleotides at their extremities, designated “high
definition” (HD) adapters, and showed that the use of these adap-
ters leads to libraries that better reflect true sRNA expression levels.
More recent work confirmed these observations and revealed that
the randomized region does not need to be adjacent to the ligation
junction [11]. This novel type of adapters was named “MidRand”
adapters. Together, these results demonstrate that improved
adapter design can reduce bias.
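
The idea of randomizing adapter ends can be made concrete by enumerating the pool of molecules that a degenerate adapter represents. The following is a minimal sketch, assuming four random nucleotides at the ligating end as described for the HD adapters; the core sequence is a made-up placeholder and not a real adapter sequence.

```python
from itertools import product


def randomized_adapters(core, n_random=4, random_end="5p", alphabet="ACGT"):
    """List every adapter variant with n_random degenerate positions.

    random_end="5p" places the random bases at the 5' end (the ligating
    end of a 3' adapter); "3p" places them at the 3' end (the ligating
    end of a 5' adapter).
    """
    if random_end not in ("5p", "3p"):
        raise ValueError("random_end must be '5p' or '3p'")
    variants = []
    for combo in product(alphabet, repeat=n_random):
        tag = "".join(combo)
        variants.append(tag + core if random_end == "5p" else core + tag)
    return variants


# Placeholder core sequence, for illustration only.
HYPOTHETICAL_CORE = "GATTACAGATTACA"

pool = randomized_adapters(HYPOTHETICAL_CORE, n_random=4, random_end="5p")
print(len(pool), "variants, for example", pool[0], "and", pool[-1])
```

With four degenerate positions the pool contains 4^4 = 256 distinct junction sequences, so any given sRNA is more likely to encounter an adapter variant that forms a favorable co-fold for ligation.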
Improved Capture of 20 -O-Methyl RNAs by Small RNA-seq 155

Instead of modifying the adapters, bias can be suppressed
through the optimization of reaction conditions. Polyethylene glycol
(PEG), a macromolecular crowding agent known to increase
ligation efficiency [14], has been shown to significantly reduce bias
[15, 16]. Based on these results, several “low-bias” kits appeared on
the market. These include kits that use PEG in the ligation reactions,
in combination with either classical adapters or HD adapters.
Other kits avoid ligation altogether and use 3′ polyadenylation and
template switching for 3′ and 5′ adapter addition, respectively [12]. In
yet another strategy, 3′ adapter ligation is followed by a circularization
step, thus omitting 5′ adapter ligation [17].
We searched for an sRNA library preparation protocol with the
lowest possible levels of bias and the best detection of 2′-OMe
RNAs [12, 18]. We tested some of the abovementioned “low-bias”
kits, which had a better detection of 2′-OMe RNAs than the
standard TruSeq protocol (TS). Surprisingly, however, upon modification
(the use of randomized adapters, PEG in the ligation reactions, and
removal of excess 3′ adapter by purification) the latter outperformed
the other protocols for the detection of 2′-OMe RNAs.
Here, we provide a step-by-step description of a protocol based on
the TS protocol, “TS5,” which had the best overall detection of
2′-OMe RNAs. In the TS5 protocol, the following modifications
have been introduced: (1) the adapters are randomized at their
extremities, (2) PEG is used in the ligation reactions, and (3) excess
3′ adapter is eliminated by purification on beads (Fig. 1). We also
provide a detailed protocol for the purification of sRNA from total
RNA and the preparation of preadenylated 3′ adapter.

2 Materials

2.1 Isolation of Small RNAs

1. 15% TBE-urea gel.
2. XCell SureLock Mini-Cell gel electrophoresis system.
3. Formamide loading dye: 95% Deionized formamide, 0.025%
bromophenol blue, 0.025% xylene cyanol, 5 mM EDTA, pH 8.
4. ZR Small RNA ladder (Zymo Research).
5. SYBR Gold Nucleic Acid Gel Stain.
6. “Dark Reader” transilluminator.
7. Corning Costar Spin-X centrifuge tube filters.
8. 0.3 M NaCl.
9. 20μg/μL Glycogen.
10. 100% Ethanol.
11. Nuclease-free water.
12. Optional: 2100 Bioanalyzer Instrument.
13. Optional: Bioanalyzer Small RNA Kit.
156 Erwin L. van Dijk and Claude Thermes

[Figure 1: workflow diagram. Steps shown, in order: 3′ adapter ligation of the sRNA to the 3′ HD adapter (RNL2); excess 3′ adapter removal; 5′ adapter ligation to the 5′ HD adapter (RNL1); RT; PCR adding P5, index (In), and P7. Protocol TS5 modifications as compared to the classical TruSeq protocol: HD adapters used instead of classical Illumina adapters; 3′ adapter ligation at 16 °C overnight in the presence of PEG; purification step after 3′ adapter ligation.]
Fig. 1 Schematic representation of sRNA library preparation protocol TS5. First, a preadenylated (App) 3′
“high-definition” (HD) adapter is ligated to the sRNA. In contrast to the classical protocol, the HD adapter
carries four random nucleotides at its extremity. Then, a cleanup step is performed to remove excess
unligated 3′ adapter. A 5′ HD adapter is subsequently ligated. Reverse transcription is performed using a
primer complementary to the 3′ adapter, followed by PCR amplification, during which the Illumina P5, P7, and
index (“In”) sequences are added. The modifications of protocol TS5 as compared to the standard TruSeq
protocol are summarized

2.2 Preparation of Preadenylated 3′ HD Adapter

1. 100μM 5′ phosphorylated, 3′ blocked 3′ HD adapter oligonucleotide:
See Table 1 for sequence and modifications. Note that
“3AmMO” is a 3′ amino modifier group; most suppliers can
produce oligonucleotides with this modification. Dilute in
nuclease-free water to 100μM.
2. 10 mM ATP.
3. 50% PEG8000.
Table 1
Oligonucleotides used with this protocol

Name                  5′ Modification  3′ Modification  Purification  Sequence 5′ to 3′
5′ HD (TS5) adapter   5AmMC6           -                HPLC          [5AmMC6]GTTCAGAGTTCTACAGTCCGACGATCNrNrNrN (a)
3′ HD (TS5) adapter   Phosphate        3AmMO            HPLC          [Phos]rNrNrNrNTGGAATTCTCGGGTGCCAAGG[3AmMO] (b)
RT primer             -                -                HPLC          GCCTTGGCACCCGAGAATTCCA
Universal P5 primer   -                -                HPLC          AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGA
P7-index primer       -                -                HPLC          CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA (c)

(a) Note that this oligo is a DNA-RNA chimeric; the three 3′ terminal nucleotides are RNA.
(b) Note that this oligo is a DNA-RNA chimeric; the four 5′ terminal nucleotides are RNA.
(c) NNNNNN = index:
Index 1: CGTGAT
Index 2: ACATCG
Index 3: GCCTAA
Index 4: TGGTCA
Index 5: CACTGT
Index 6: ATTGGC
Index 7: GATCTG
Index 8: TCAAGT
Index 9: CTGATC
Index 10: AAGCTA
Index 11: GTAGCC
Index 12: TACAAG

4. T4 RNA ligase 1.
5. Acid phenol:chloroform, pH 4.5.
6. Optional: QIAquick Nucleotide Removal Kit.
7. Qubit ssDNA Assay Kit.
8. Qubit Fluorometer.
9. 15% TBE-urea gel.
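
When several libraries are pooled, it can be useful to check how different the index sequences of Table 1 are from one another, because the demultiplexing command in Subheading 3.4.1 tolerates one barcode mismatch. As a general rule, tolerating one mismatch without ambiguity requires every pair of indexes to differ at three or more positions. The Python sketch below computes pairwise Hamming distances; the function names are ours and illustrative, and the index sequences are copied from Table 1.

```python
from itertools import combinations

# Index sequences copied from Table 1 (NNNNNN positions of the P7-index primer).
INDEXES = {
    1: "CGTGAT", 2: "ACATCG", 3: "GCCTAA", 4: "TGGTCA",
    5: "CACTGT", 6: "ATTGGC", 7: "GATCTG", 8: "TCAAGT",
    9: "CTGATC", 10: "AAGCTA", 11: "GTAGCC", 12: "TACAAG",
}


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_pairs(indexes):
    """Return the smallest pairwise distance and the index pairs that show it."""
    dists = {
        (i, j): hamming(indexes[i], indexes[j])
        for i, j in combinations(sorted(indexes), 2)
    }
    dmin = min(dists.values())
    return dmin, [pair for pair, d in dists.items() if d == dmin]


dmin, pairs = closest_pairs(INDEXES)
print("smallest pairwise distance:", dmin, "for index pairs", pairs)
```

The same check can be run on any subset of indexes chosen for a given sequencing run.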

2.3 Library Preparation: 3′ Adapter Ligation

1. T4 RNA ligase 2 truncated.
2. 50% PEG 8000 (provided with T4 RNA ligase).
3. RNase inhibitor.

2.4 Library Preparation: Elimination of Unligated 3′ Adapter

1. 3 M NaOAc, pH 5.2 or “Adapter Depletion Solution” from
the NEXTflex V3 kit.
2. SPRI beads (Beckman Coulter).
3. Isopropanol.
4. Ethanol 80%.
5. 10 mM Tris–HCl, pH 8.

2.5 Library Preparation: Ligation of 5′ Adapter

1. 10μM 5′ HD adapter: See Table 1.
2. T4 RNA ligase 1.
3. 10 mM ATP (provided with T4 RNA ligase 1).

2.6 Library Preparation: Reverse Transcription

1. Reverse transcription (RT) primer: See Table 1.
2. Superscript IV reverse transcriptase and accompanying buffer.
3. 100 mM DTT (provided with reverse transcriptase).
4. 12.5 mM dNTPs.
5. RNase inhibitor.

2.7 Library Preparation: PCR Amplification

1. 10μM Universal P5 primer (Table 1).
2. 10μM P7-index primer (Table 1).
3. Kapa HiFi Hotstart PCR kit; includes dNTPs and buffer.

2.8 Library Preparation: Gel Purification

1. 6% TBE gel.
2. XCell SureLock Mini-Cell gel electrophoresis system.
3. SYBR Gold Nucleic Acid Gel Stain (10,000× concentrate).
4. TrackIt 50 bp DNA ladder.
5. 20μg/μL Glycogen.
6. 3 M NaOAc, pH 5.2.
7. 10 mM Tris, pH 8.

3 Methods

3.1 Isolation of Small RNAs

As an alternative to small RNA isolation by gel purification (step 1),
a strategy using magnetic beads to enrich for small RNAs exists
([Link]
Protocol_for_miRNA_.pdf). We have not used this method ourselves,
but it is worth testing and if it works well it could significantly
simplify the protocol.
1. Extract total RNA from the sample of interest using phenol-
based reagents or any other method. Verify the RNA is of good
quality.
2. Pre-run a 15% TBE-urea gel for 15 min at 200 V.
3. While the gel is pre-running, mix 5–20μg of total RNA in a
5–15μL volume with an equal volume of formamide loading
dye in a 200μL PCR tube. Likewise, mix 10μL (200 ng) of
small-RNA ladder with an equal volume of formamide loading
dye. Incubate for 5 min at 65 °C in a thermocycler with heated
lid, and then place the tubes immediately on ice.
4. Load the ladder and the sample on the same gel with at least
one lane between them and run at 200 V until the bromophe-
nol blue (dark blue) has migrated about two-thirds of the gel
length (approximately 40 min).
5. Prepare to elute the RNA from the gel as follows: puncture the
bottom of a nuclease-free 0.5 mL microcentrifuge tube with a
21-gauge needle (make three holes). Place the punctured
0.5 mL tube in a nuclease-free round-bottom 2 mL
microcentrifuge tube.
6. Remove the gel, and incubate at room temperature with 3μL
SYBR Gold Nucleic Acid Gel Stain (10,000× concentrate) in
30 mL water for 10–15 min.
7. View the gel on a “Dark Reader” transilluminator (it is strongly
recommended to avoid UV as this might damage the RNA) and
cut out the sample RNA between the 17 nt and the 29 nt ladder
bands. Transfer the gel piece to the prepared 0.5 mL tube from
step 5 above.
8. Centrifuge the 0.5 mL tube in the 2 mL tube in a microcen-
trifuge at maximum speed for 2 min. Remove the 0.5 mL tube,
which should be empty. See Note 1.
9. Add 300μL of nuclease-free 0.3 M NaCl to the 2 mL tube
containing the crushed gel and rotate for 2–3 h at room temperature
or at 4 °C overnight (16 h).
10. Transfer the suspension of crushed gel pieces to a spin column
and centrifuge for 2 min at maximum speed in a
microcentrifuge.

Fig. 2 Isolation of small RNA and quality control. (a) Electrophoretic separation of Brassica napus total RNA
(10μg) on a 15% TBE-urea denaturing polyacrylamide gel. Image shows a small RNA ladder alongside the
samples as a molecular size marker. After migration the gel is stained and the RNA visualized on a
transilluminator. The region from 17 to 29 nucleotides is cut out (indicated by a red rectangle) and RNA
eluted. (b) Image shows the quality of the purified RNA as checked by capillary gel electrophoresis. Note that
this analysis provides information on the proportion of miRNA in the sample (93% in this case)

11. Add 1μL of glycogen (20μg/μL) and 950μL of room-temperature
100% ethanol. Incubate for at least 30 min at
−80 °C.
12. Centrifuge for 20 min at maximum speed in a microcentrifuge
at 4 °C. Remove the supernatant, and wash the pellet with
800μL of cold 80% ethanol. Centrifuge again for 5 min at 4 °C,
and carefully remove all supernatant. Resuspend the RNA
pellet in 15μL of nuclease-free water. Typically, ~5–20 ng of
small (17–29 nt) RNA should be recovered, depending on the
amount of input total RNA (~1–5 ng of small RNA per 1μg of
input total RNA).
13. Recommended additional step: Check the quantity and quality
of the recovered sRNA (e.g., by capillary gel electrophoresis
using an Agilent small RNA kit). Figure 2 shows an example of
the isolation of the small RNA fraction from total RNA mate-
rial. Small RNA is isolated from the 15% TBE-urea gel and can
be analyzed on a small RNA capillary electrophoresis chip. This
will allow users to estimate the amount of small RNA recovered
and the proportion of miRNA in the preparation.
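
The yield figures in step 12 can be turned into a rough molar concentration, which is the unit used for the RNA input in Subheading 3.3.1 (~0.1–1μM). The Python sketch below is only an estimate under stated assumptions: the molecular weight uses a common rule of thumb for single-stranded RNA (about 320.5 g/mol per nucleotide plus 159 g/mol), the 22 nt length is an arbitrary example, and all function names are ours.

```python
def ssrna_mw(length_nt):
    """Approximate molecular weight (g/mol) of a single-stranded RNA.

    Rule-of-thumb assumption: ~320.5 g/mol per nucleotide plus 159 g/mol.
    """
    return length_nt * 320.5 + 159.0


def expected_yield_ng(total_rna_ug, ng_per_ug=(1.0, 5.0)):
    """Range of small RNA recovered, using ~1-5 ng per ug of total RNA (step 12)."""
    low, high = ng_per_ug
    return total_rna_ug * low, total_rna_ug * high


def molarity_uM(mass_ng, volume_ul, mw):
    """Convert a mass (ng) dissolved in a volume (uL) to micromolar."""
    return mass_ng / volume_ul / mw * 1000.0


# Example: 10 ng of a 22 nt RNA resuspended in 15 uL (the volume used in step 12).
conc = molarity_uM(10.0, 15.0, ssrna_mw(22))
print(f"{conc:.3f} uM")
```

With these assumptions, 10 ng in 15μL corresponds to roughly 0.1μM, the lower end of the input range given in Subheading 3.3.1.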

3.2 Preparation of Preadenylated 3′ HD Adapter

Preadenylation of 3′ HD adapter is done in a manner similar to the
protocol described by Chen et al. [19]. Note that preadenylated
adapter can be ordered directly (/5rApp/ modification), but this is
quite expensive.

1. Set up a 100μL reaction containing the following reagents:
10μL of 5′ phosphorylated, 3′ blocked 3′ HD adapter oligonucleotide
(100μM), 10μL of T4 RNA ligase buffer (10×), 10μL
of ATP (10 mM), 40μL of 50% PEG8000, 5μL of T4 RNA
ligase 1 (50 units), and 25μL of nuclease-free water. Incubate
overnight at 20 °C.
2. Perform a classical phenol:chloroform extraction of the prea-
denylated oligonucleotide followed by ethanol precipitation.
Add 100μL of acid (pH 4.5) phenol:chloroform and vortex.
3. Spin for 5 min at room temperature, maximum speed. Care-
fully transfer 90μL of the upper phase to a new tube; add 10μL
of 3 M sodium acetate pH 5.2, 1μL of ultrapure glycogen, and
250μL of cold 100% ethanol.
4. Keep at −20 °C for at least 30 min. Centrifuge for 30 min at
4 °C at maximum speed. Remove the supernatant, wash the
pellet once with 500μL cold 80% ethanol, and resuspend in
25μL water. As an alternative to phenol:chloroform extraction
and ethanol precipitation, a nucleotide removal kit can be used.
5. Measure the concentration of the adapter using a kit for specific
detection of single-stranded DNA (e.g., Qubit ssDNA Assay
Kit). Dilute to 80 ng/μL (10μM).
6. Recommended additional step: Verify the efficacy of preadenyla-
tion by migrating 1μL 10μM adapter on a 15% TBE-urea gel
along with untreated oligonucleotide. The preadenylated
adapter should migrate slightly slower than untreated oligonu-
cleotide. If desired, the preadenylated adapter can be
gel-purified; proceed as described in Subheading 3.1, steps
2–12.
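
The dilution in step 5 can be sketched as follows. The equivalence given in the protocol (80 ng/μL = 10μM) implies a molecular weight of about 8,000 g/mol for the adapter; that value is inferred from the text rather than taken from a supplier data sheet, and the function names and the example stock concentration are ours.

```python
ADAPTER_MW = 8000.0  # g/mol, implied by "80 ng/uL (10 uM)" in step 5 (assumption)


def ng_per_ul_to_uM(conc_ng_per_ul, mw=ADAPTER_MW):
    """Convert a mass concentration (ng/uL) to micromolar."""
    return conc_ng_per_ul / mw * 1000.0


def water_to_add_ul(stock_ng_per_ul, stock_volume_ul, target_ng_per_ul=80.0):
    """Volume of water (uL) needed to dilute the whole stock to the target."""
    if target_ng_per_ul <= 0 or stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock must be at least as concentrated as the target")
    return stock_volume_ul * (stock_ng_per_ul / target_ng_per_ul - 1.0)


# Hypothetical example: the Qubit reads 200 ng/uL for the 25 uL resuspension.
print(water_to_add_ul(200.0, 25.0))   # uL of water to reach 80 ng/uL
print(ng_per_ul_to_uM(80.0))          # resulting concentration in uM
```

If the measured concentration is below 80 ng/μL, the function raises an error, since the adapter cannot be brought to 10μM by dilution.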

3.3 Library Preparation: Protocol TS5

3.3.1 3′ Adapter Ligation

1. Combine 1μL of preadenylated 3′ HD adapter (10μM) with
1μL of purified small RNA (~0.1–1μM) in a 0.2 mL microcentrifuge
tube. Incubate for 2 min at 72 °C in a thermocycler, and
then put directly on ice.
2. Add 4μL of 50% PEG 8000 (viscous solution; pipette slowly),
1μL of RNA ligase buffer (10×), 1μL of H2O, 1μL of T4 RNA
ligase 2 truncated, and 1μL of RNase inhibitor. Incubate overnight
at 16 °C.
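
As a quick consistency check on the 3′ ligation setup, the final concentrations can be worked out from the volumes in steps 1 and 2. This is a minimal Python sketch; the dictionary keys and the helper name are ours, and the volumes and stock concentrations are those listed above.

```python
# Volumes (uL) from steps 1 and 2 of the 3' adapter ligation.
LIGATION_MIX_UL = {
    "preadenylated 3' HD adapter (10 uM)": 1.0,
    "purified small RNA": 1.0,
    "50% PEG 8000": 4.0,
    "RNA ligase buffer (10x)": 1.0,
    "water": 1.0,
    "T4 RNA ligase 2 truncated": 1.0,
    "RNase inhibitor": 1.0,
}


def final_concentration(stock, added_ul, total_ul):
    """Concentration after dilution, in the same unit as the stock."""
    return stock * added_ul / total_ul


total = sum(LIGATION_MIX_UL.values())
print("total volume (uL):", total)
print("PEG 8000 (%):", final_concentration(50.0, 4.0, total))
print("3' adapter (uM):", final_concentration(10.0, 1.0, total))
print("buffer (x):", final_concentration(10.0, 1.0, total))
```

With an RNA input of ~0.1–1μM (0.01–0.1μM final), the adapter is present in roughly 10- to 100-fold molar excess, which is consistent with the need to remove unligated 3′ adapter in the next subheading.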

3.3.2 Elimination of Unligated 3′ Adapter

1. Add 10μL of nuclease-free water and mix well. Add 6μL of 3 M
NaOAc pH 5.2 or “Adapter Depletion Solution” from the
NEXTflex V3 kit and mix well. Add 40μL of magnetic purification
beads and 60μL of isopropanol. Mix well and incubate for
5 min at room temperature.
2. Put the sample in a magnetic rack until the solution appears
clear. Remove and discard the supernatant.

3. Add 180μL of freshly prepared 80% ethanol. Incubate for
~30 s, and then remove. Take care to use freshly prepared
80% ethanol and do not incubate with 80% ethanol for
extended periods.
4. Briefly spin the tube and remove residual liquid that may have
collected at the bottom of the well. Let the beads dry for 2 min,
and then resuspend in 22μL of 10 mM Tris, pH 8, or resus-
pension buffer from the NEXTflex V3 kit. Incubate for 2 min,
and then magnetize the sample until the solution appears clear.
5. Add 6μL of 3 M NaOAc, pH 5.2, or “Adapter Depletion
Solution” from the NEXTflex V3 kit to a new tube. Transfer
20μL of the supernatant from the previous step to this new
tube and mix by pipetting. Add 40μL of magnetic beads and
60μL of 100% isopropanol and mix well by pipetting. Incubate
for 5 min.
6. Magnetize the sample until the solution appears clear, and then
remove and discard the supernatant.
7. Add 180μL of freshly prepared 80% ethanol. Incubate for
~30 s, and then remove. Take care to use freshly prepared
80% ethanol and do not incubate with 80% ethanol for
extended periods.
8. Briefly spin the tube and remove residual liquid that may have
collected at the bottom of the well. Let the beads dry for 2 min
and resuspend in 11μL of nuclease-free water. Incubate for
2 min, and then magnetize the sample until the solution
appears clear.
9. Transfer 10μL of the supernatant to a new tube. Add 1μL of T4
RNA ligase buffer (10×).

3.3.3 Ligation of 5′ Adapter

1. Add 1μL of 5′ HD adapter (10μM; Table 1) to a 200μL PCR
tube in a thermocycler with heated lid. Incubate for 2 min at
70 °C, and then put the tube directly on ice.
2. Add 1μL of 10 mM ATP and 1μL of T4 RNA ligase 1. Mix well
by gently pipetting. Add 3μL of this mix to the 3′ ligated RNA
from Subheading 3.3.2, step 9, and mix by pipetting. Incubate
for 1 h at 28 °C.

3.3.4 Reverse Transcription (RT)

1. Transfer 6μL of 3′ and 5′ adapter-ligated RNA to a new 200μL
PCR tube (keep the remaining ~8μL at −80 °C for later use if
necessary). Add 1μL of RT primer (10μM; Table 1) and mix by
pipetting. Incubate for 2 min at 70 °C, and then put the tube
directly on ice.
2. Add the following reagents for RT: 2μL of 5× first-strand
buffer, 0.5μL of 12.5 mM dNTP mix, 1μL of 100 mM DTT,
1μL of RNase inhibitor, and 1μL of reverse transcriptase. Incubate
for 1 h at 50 °C.

3.3.5 PCR Amplification

1. Using the Kapa HiFi Hotstart PCR kit, add the following
reagents to the 12.5μL of RT reaction mixture: 10μL of PCR
polymerase buffer, 2μL of universal P5 primer (10μM;
Table 1), 2μL of P7-index primer (10μM; Table 1), 1μL of
12.5 mM dNTPs, 0.5μL of DNA polymerase, and 22μL of
water. Keep the reaction on ice until use.
2. Run the following PCR program: 98 °C for 30 s, 11 cycles
(98 °C for 10 s, 60 °C for 30 s, and 72 °C for 15 s), and 72 °C
for 10 min (see Note 2). Keep the reaction at 4 °C when
finished.
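
Because the number of PCR cycles is meant to be optimized (see Note 2), it can help to keep the reaction setup and the cycling program in a small script. The Python sketch below is illustrative only: the names are ours, the volumes and step times are copied from steps 1 and 2, and the computed time covers programmed hold times only, not temperature ramping.

```python
# Volumes (uL) copied from step 1 of the PCR setup.
PCR_MIX_UL = {
    "RT reaction mixture": 12.5,
    "PCR polymerase buffer": 10.0,
    "universal P5 primer (10 uM)": 2.0,
    "P7-index primer (10 uM)": 2.0,
    "12.5 mM dNTPs": 1.0,
    "DNA polymerase": 0.5,
    "water": 22.0,
}


def total_volume_ul(mix):
    """Sum of all component volumes in the reaction."""
    return sum(mix.values())


def program_hold_time_s(cycles, initial_s=30, cycle_steps_s=(10, 30, 15), final_s=600):
    """Total programmed hold time (s): initial step, cycles, final extension."""
    return initial_s + cycles * sum(cycle_steps_s) + final_s


total = total_volume_ul(PCR_MIX_UL)
primer_uM = 10.0 * PCR_MIX_UL["universal P5 primer (10 uM)"] / total
print("reaction volume (uL):", total)
print("final primer concentration (uM):", primer_uM)
print("hold time for 11 cycles (min):", program_hold_time_s(11) / 60)
```

Changing the `cycles` argument shows how the run time scales when fewer or more cycles are tested.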

3.3.6 Gel Purification and Sequencing

Gel purification of the final library product is a delicate step as a
number of additional products are formed that migrate close to the
desired library. It is important not to overload the gel, as this
increases the risk of contaminating the library with other species such
as adapter dimers. An example is shown in Fig. 3, where increasing
amounts of PCR-amplified library (from B. napus small RNA) were
loaded on the gel and the products corresponding to the expected
size (150 bp) were cut out (Fig. 3a). After elution, the purified
library was checked on a capillary gel electrophoresis chip; in addition
to the expected 150 bp product, an increasing proportion of a
130 bp species, corresponding to adapter dimers, was observed as
increasing amounts of PCR product were loaded (Fig. 3b, c).
1. Run 5, 10, and 20μL of PCR product on a native 6% TBE gel
along with a suitable ladder (e.g., TrackIt 50 bp DNA ladder).
Run the gel for about 1 h at 145 V (until the bromophenol
blue reaches the bottom; this dye migrates at the 65 bp
position).
2. Remove the gel, and incubate with nucleic acid gel stain in
water for 10–15 min.
3. View the gel on a “Dark Reader” transilluminator and cut out
the library band at 150 bp. Prepare a system to elute the DNA
from the gel as described in Subheading 3.1, step 5, and transfer
the gel piece to the 0.5 mL tube.
4. Centrifuge in a microcentrifuge at maximum speed for 2 min.
Remove the 0.5 mL tube, which should be empty now.
5. Add 300μL of nuclease-free water to the 2 mL tube containing
the crushed gel and rotate for at least 2 h at room temperature
or at 4 °C overnight.
6. Transfer the suspension of crushed gel pieces in water to a spin
column and centrifuge for 2 min at maximum speed.
7. Add 1μL of glycogen (20μg/μL), 30μL of 3 M NaOAc pH 5.2,
and 975μL of ice-cold 100% ethanol. Centrifuge for 20 min at
max speed at 4 °C.

Fig. 3 Gel purification of a B. napus small RNA library prepared following protocol TS5 and quality control. (a)
Image of a 6% native TBE gel showing increasing amounts of a PCR-amplified library from B. napus small
RNA; 2.5μL (a), 5μL (b), 10μL (c), or 20μL (d) PCR product. A 50 bp ladder is migrated alongside the samples.
PCR products migrating at the expected 150 bp position are isolated (red rectangle), DNA eluted, and purified.
(b) Representative quality control of the purified library. (c) Electropherogram representation of the same
analysis shown in (b). As can be seen, the 150 bp product is increasingly contaminated with adapter dimers
(~130 bp) as larger amounts of PCR product are loaded on gel

8. Resuspend the pellet in 20μL of 10 mM Tris pH 8. Use 1μL for
concentration measurement (e.g., Qubit system) and 1μL for
quality control (e.g., Agilent chip).
9. Submit libraries for Illumina sequencing. Recommended
sequencing depth for small RNA libraries is ~5–10 million
reads/sample.

3.4 Data Analysis

Sequences obtained from small RNA libraries can be analyzed using
the data analysis procedure described below (based on the Linux
operating system Ubuntu 16.04 LTS).

3.4.1 Treatment of Raw Sequence Files

1. Download the FASTQ sequence file(s) generated during the
sequencing run. If required, perform demultiplexing with
bcl2fastq2 (version V2.2.18.12; a manual can be downloaded
from the following link: [Link]
sequencing/sequencing_software/bcl2fastq-conversion-software/[Link]).

Use the following command:
    nohup Pathway_of_bcl2fastq/bcl2fastq --runfolder-dir Pathway_of_Run --ignore-missing-bcl --output-dir Pathway_of_Output_Directory --barcode-mismatches 1 --aggregated-tiles AUTO -r 16 -d 16 -p 16 -w 16
2. Remove adapter sequences using Cutadapt [20] version 1.15.
A manual can be downloaded here: [Link]
[Link]/en/stable/[Link]. Use the following
command:
    Pathway_to_cutadapt/cutadapt -a TGGAATTCTCGGGTGCCAAGG -n 5 -O 4 -m 10 -j 0 --nextseq-trim 10 -o Output_File_Read1_cutadapt.[Link] Input_File_Read1.[Link]
Note that the sequence in the command corresponds to
the 3′ HD adapter without the four random nucleotides; these
will therefore not be removed during this step.
3. Use seqtk ([Link]) to remove the terminal
random nucleotides in the sequencing reads. Use the
following command (file names are placeholders):
    seqtk trimfq -b 4 -e 4 Output_File_Read1_cutadapt.fastq > Output_File_Read1_trimmed.fastq
4. Use the following awk command in order to discard the
sequences shorter than 10 nt:
    awk 'BEGIN {FS = "\t" ; OFS = "\n"} {header = $0 ; getline seq ; getline qheader ; getline qseq ; if (length(seq) >= 10) {print header, seq, qheader, qseq}}' Output_File_Read1_trimmed.fastq > Length_Filtered.fastq
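
For readers who prefer to see the logic of steps 3 and 4 spelled out, the Python sketch below performs the same two operations on FASTQ records: it removes four nucleotides from each end of every read (the random nucleotides of the HD adapters) and then discards reads shorter than 10 nt. It is an illustration of the intent of the seqtk and awk commands, not a replacement for them; it assumes four-line FASTQ records, and the function names are ours.

```python
def read_fastq(lines):
    """Yield (header, sequence, separator, quality) from four-line FASTQ records."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator groups consecutive lines four by four.
    for header, seq, plus, qual in zip(stripped, stripped, stripped, stripped):
        yield header, seq, plus, qual


def trim_and_filter(records, trim5=4, trim3=4, min_len=10):
    """Trim both read ends, then keep only reads of at least min_len nt."""
    for header, seq, plus, qual in records:
        end = len(seq) - trim3
        seq_t, qual_t = seq[trim5:end], qual[trim5:end]
        if len(seq_t) >= min_len:
            yield header, seq_t, plus, qual_t


# Toy input: one 30 nt read (4 + 22 + 4) and one 13 nt read (4 + 5 + 4).
insert = "TGAGGTAGTAGGTTGTATAGTT"
example = [
    "@read1", "ACGT" + insert + "GGCC", "+", "I" * 30,
    "@read2", "ACGT" + "TTTTT" + "GGCC", "+", "I" * 13,
]
for record in trim_and_filter(read_fastq(example)):
    print("\n".join(record))
```

In this toy input only the first read survives, because the second read is reduced to 5 nt after trimming.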

3.4.2 Mapping of the Trimmed Sequences

1. Download the database corresponding to the organism of
interest from miRBase as follows. Go to [Link]
org/[Link] and download the “[Link]” file. Note that
the sequences are indicated in RNA notation. Replace the U
residues by T with the following command (see Note 3):
    sed -i '/^>/! s/U/T/g' [Link]
2. Select the miRNA sequences of your organism of interest with
the following command:
    awk '/name_of_the_organism/{print; nr[NR+1]; next}; NR in nr' [Link] > mature_name_of_the_organism_mirs.fa
3. Map the reads to the above-created file using Bowtie2 [21]
(version 2.3.0) allowing no mismatches. First, build an index
for your file with the following command:
    bowtie2-build mature_name_of_the_organism_mirs.fa mature_name_of_the_organism_mirs
4. Align the sequencing reads to the database, requiring that a
read map entirely to a miRNA of the database, without any
mismatches. To this end, use the following command (see Note 4):

    bowtie2 -N 0 -L 10 --score-min C,0,0 --end-to-end --time -x mature_name_of_the_organism_mirs -U Length_Filtered.fastq -S Length_Filtered_ALIGNMENT.sam
5. To discard the reads that did not align, use the following
command (see Note 5):
    samtools view -F 4 Length_Filtered_ALIGNMENT.sam > Reads_aligned_to_Mirs.sam
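
The reference preparation in steps 1 and 2 (conversion of U to T, then selection of one organism) can also be sketched in Python. This is an illustrative equivalent rather than the commands above: unlike the awk command, it matches the organism name as a plain substring of the header line and keeps all sequence lines up to the next header. The function names and the two example records are ours and serve only as an illustration of the FASTA layout.

```python
def rna_to_dna_fasta(lines):
    """Replace U by T on sequence lines only; header lines (>) are left untouched."""
    for line in lines:
        line = line.rstrip("\n")
        yield line if line.startswith(">") else line.replace("U", "T")


def select_organism(lines, organism):
    """Keep the FASTA records whose header contains the organism name."""
    keep = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            keep = organism in line
        if keep:
            yield line


# Illustrative records written in a miRBase-like header style.
example = [
    ">cel-let-7-5p MIMAT0000001 Caenorhabditis elegans let-7-5p",
    "UGAGGUAGUAGGUUGUAUAGUU",
    ">hsa-miR-21-5p MIMAT0000076 Homo sapiens miR-21-5p",
    "UAGCUUAUCAGACUGAUGUUGACU",
]
for line in select_organism(rna_to_dna_fasta(example), "Homo sapiens"):
    print(line)
```

Restricting the U-to-T replacement to sequence lines matters because header lines can themselves contain the letter U.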

4 Notes

1. Small amounts of gel may remain in the 0.5 mL tube; carefully
transfer with a pipette tip.
2. We typically perform 11 cycles, but this should be optimized by
the user. Try to perform the smallest possible number of PCR
cycles.
3. This command will yield a complete list of all miRNAs in
miRBase, originating from a variety of organisms.
4. The option --score-min C,0,0 ensures that alignment is without
any mismatches. For an explanation of the various parameters
in the tool, please visit the following website: [Link]
[Link]/bowtie2/[Link].
5. As a result of these steps, you should now have obtained the
aligned reads, corresponding to miRNAs.
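
As an optional follow-up to Note 5, the aligned reads can be tallied per miRNA. The Python sketch below is a hedged illustration, not part of the protocol: it reads SAM text lines, skips header lines and reads whose FLAG has the "unmapped" bit (0x4) set, which is the bit that `samtools view -F 4` filters on, and counts the remaining reads by reference name (third SAM column). The function names and the toy records are ours.

```python
from collections import Counter

UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def aligned_records(sam_lines):
    """Yield the fields of alignment lines whose FLAG lacks the unmapped bit."""
    for line in sam_lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & UNMAPPED:
            continue
        yield fields


def count_per_reference(sam_lines):
    """Count aligned reads per reference sequence name (RNAME, column 3)."""
    return Counter(fields[2] for fields in aligned_records(sam_lines))


# Toy SAM lines: two aligned reads and one unaligned read.
example = [
    "@SQ\tSN:miR-a\tLN:22",
    "r1\t0\tmiR-a\t1\t42\t22M\t*\t0\t0\tTGAGGTAGTAGGTTGTATAGTT\t" + "I" * 22,
    "r2\t16\tmiR-a\t1\t42\t22M\t*\t0\t0\tAACTATACAACCTACTACCTCA\t" + "I" * 22,
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTACGT\t" + "I" * 12,
]
print(count_per_reference(example))
```

Such counts are raw read numbers; any normalization between libraries is left to the user.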

Acknowledgments

This work was supported by the National Center for Scientific
Research (CNRS), the French Alternative Energies and Atomic
Energy Commission (CEA), and Paris-Sud University. The members
of the I2BC Next-Generation Sequencing service are acknowledged
for critical reading of the manuscript and helpful
suggestions.

References
1. Ghildiyal M, Zamore PD (2009) Small silencing RNAs: an expanding universe. Nat Rev Genet 10(2):94–108. [Link] 1038/nrg2504
2. Chang TC, Mendell JT (2007) microRNAs in vertebrate physiology and human disease. Annu Rev Genomics Hum Genet 8:215–239. [Link] 080706.092351
3. Zhuang F, Fuchs RT, Robb GB (2012) Small RNA expression profiling by high-throughput sequencing: implications of enzymatic manipulation. J Nucleic Acids 2012:360358. https:// [Link]/10.1155/2012/360358
4. van Dijk EL, Jaszczyszyn Y, Thermes C (2014) Library preparation methods for next-generation sequencing: tone down the bias. Exp Cell Res 322(1):12–20. [Link] 10.1016/[Link].2014.01.008
5. Munafo DB, Robb GB (2010) Optimization of enzymatic reaction conditions for generating representative pools of cDNA from small RNA. RNA 16(12):2537–2552. [Link] org/10.1261/rna.2242610
6. Hafner M, Renwick N, Brown M, Mihailovic A, Holoch D, Lin C, Pena JT, Nusbaum JD, Morozov P, Ludwig J, Ojo T, Luo S, Schroth G, Tuschl T (2011) RNA-ligase-dependent biases in miRNA representation in deep-sequenced small RNA cDNA libraries. RNA 17(9):1697–1712. [Link] 1261/rna.2799511
7. Sorefan K, Pais H, Hall AE, Kozomara A, Griffiths-Jones S, Moulton V, Dalmay T (2012) Reducing ligation bias of small RNAs in libraries for next generation sequencing. Silence 3(1):4. [Link] 1758-907X-3-4
8. Sun G, Wu X, Wang J, Li H, Li X, Gao H, Rossi J, Yen Y (2011) A bias-reducing strategy in profiling small RNAs using Solexa. RNA 17(12):2256–2262. [Link] rna.028621.111
9. Jayaprakash AD, Jabado O, Brown BD, Sachidanandam R (2011) Identification and remediation of biases in the activity of RNA ligases in small-RNA deep sequencing. Nucleic Acids Res 39(21):e141. [Link] nar/gkr693
10. Zhuang F, Fuchs RT, Sun Z, Zheng Y, Robb GB (2012) Structural bias in T4 RNA ligase-mediated 3′-adapter ligation. Nucleic Acids Res 40(7):e54. [Link] nar/gkr1263
11. Fuchs RT, Sun Z, Zhuang F, Robb GB (2015) Bias in ligation-based small RNA sequencing library construction is determined by adaptor and RNA structure. PLoS One 10(5):e0126049. [Link] pone.0126049
12. Dard-Dascot C, Naquin D, d’Aubenton-Carafa Y, Alix K, Thermes C, van Dijk E (2018) Systematic comparison of small RNA library preparation protocols for next-generation sequencing. BMC Genomics 19(1):118. [Link] 018-4491-6
13. Van Nieuwerburgh F, Soetaert S, Podshivalova K, Ay-Lin Wang E, Schaffer L, Deforce D, Salomon DR, Head SR, Ordoukhanian P (2011) Quantitative bias in Illumina TruSeq and a novel post amplification barcoding strategy for multiplexed DNA and small RNA deep sequencing. PLoS One 6(10):e26969. [Link] pone.0026969
14. Harrison B, Zimmerman SB (1984) Polymer-stimulated ligation: enhanced ligation of oligo- and polynucleotides by T4 RNA ligase in polymer solutions. Nucleic Acids Res 12(21):8235–8251
15. Song Y, Liu KJ, Wang TH (2014) Elimination of ligation dependent artifacts in T4 RNA ligase to achieve high efficiency and low bias microRNA capture. PLoS One 9(4):e94619. [Link] 0094619
16. Zhang Z, Lee JE, Riemondy K, Anderson EM, Yi R (2013) High-efficiency RNA cloning enables accurate quantification of miRNA expression by deep sequencing. Genome Biol 14(10):R109. [Link] 2013-14-10-r109
17. Barberan-Soler S, Vo JM, Hogans RE, Dallas A, Johnston BH, Kazakov SA (2018) Decreasing miRNA sequencing bias using a single adapter and circularization approach. Genome Biol 19(1):105. [Link] 1186/s13059-018-1488-z
18. van Dijk EL, Eleftheriou E, Thermes C (2019) Improving small RNA-seq: less bias and better detection of 2′-O-methyl RNAs. J Vis Exp (151). [Link]
19. Chen YR, Zheng Y, Liu B, Zhong S, Giovannoni J, Fei Z (2012) A cost-effective method for Illumina small RNA-Seq library preparation using T4 RNA ligase 1 adenylated adapters. Plant Methods 8(1):41. [Link] org/10.1186/1746-4811-8-41
20. Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. [Link] ej.17.1.200
21. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3):R25. https:// [Link]/10.1186/gb-2009-10-3-r25
Part IV

Assessing RNA Modifications Using qPCR- and Molecular Biology-Based Methods

Chapter 11

Assessing 2′-O-Methylation of mRNA Using Quantitative PCR
Brittany A. Elliott and Christopher L. Holley

Abstract
2′-O-methylation (Nm) is an RNA modification commonly found on rRNA and snRNA, and at the mRNA
5′-cap, but has more recently been found internally on mRNA. The study of internal Nm modifications on
mRNA is in the early stages, but we have reported that this sort of Nm modification can regulate mRNA
abundance and translation. Although there are many methods to determine the presence of Nm on rRNA,
detecting Nm on specific mRNA transcripts is technically difficult because they are much less abundant than
rRNA. Some of these methods rely on the fact that Nm modification of RNA disrupts reverse transcription
reactions when performed at low dNTP concentrations. In this chapter, we describe our approach to using
quantitative PCR in conjunction with reverse transcription at low dNTPs, which is sensitive enough to
detect changes to Nm modification of mRNA.

Key words 2′-O-methylation, RNA modifications, Reverse transcription, Low dNTP, qPCR

1 Introduction

2′-O-methylation is a posttranscriptional ribose modification that
can occur in conjunction with any RNA base (Nm, see Fig. 1). Nm
is commonly found on noncoding RNAs such as ribosomal RNA
(rRNA), snRNA, and tRNA, as well as the first and second
nucleotides at the 5′-cap of mRNA [1–3]. More recently,
transcriptome-wide mapping has suggested that Nm sites are also
present internally on mRNA and pre-mRNA [4]. Validating these
novel sites on an individual basis has been challenging, given the
low abundance of gene-specific transcripts. Using genetic models,
we have recently demonstrated snoRNA-guided Nm modification
of Pxdn mRNA and shown that this modification inhibits translation
in vivo, which is consistent with other in vitro work [5–8].
Our findings demonstrate that at least some mRNA Nm sites

Supplementary Information The online version of this chapter ([Link]


0_11) contains supplementary material, which is available to authorized users.

Mary McMahon (ed.), RNA Modifications: Methods and Protocols, Methods in Molecular Biology, vol. 2298,
[Link] © Springer Science+Business Media, LLC, part of Springer Nature 2021

172 Brittany A. Elliott and Christopher L. Holley

[Figure 1: reaction scheme. A nucleoside “N” (base) is converted to “Nm” (base) by a 2′-O-methyltransferase + SAM.]

Fig. 1 2′-O-methylation (Nm) adds a methyl group to any nucleoside ribose (red arrow). The 2′-O-methyltransferase
(i.e., fibrillarin and others) uses S-adenosyl methionine (SAM) as the methyl donor

are guided by snoRNAs and catalyzed by the enzyme fibrillarin
(FBL), which is the same mechanism used for Nm modification of
rRNA and snRNA.
Although there are no antibodies that can specifically bind and
detect Nm sites, Nm sites can be detected by mass spectrometry,
resistance to site-specific cleavage, oxidation-elimination chemistry,
and interference with reverse transcription [9–13]. Mass spectrom-
etry has the advantage of being quantitative and exquisitely specific
for detecting nucleoside modifications, but most applications
involve the digestion of RNA into single nucleosides, thereby
losing positional information about the modification(s). The chem-
istry of Nm sites can also be exploited for detection and mapping.
Nm sites are resistant to alkaline hydrolysis (by preventing 2′-OH
nucleophilic attack on the 3′-phosphate backbone), and this property
has been exploited for mapping them with primer extension
techniques and newer high-throughput mapping methods such as
RiboMeth-seq [11]. Nm sites are also resistant to oxidation-
elimination chemistry, which is the basis of the RNA-seq-based
Nm mapping methods known as Nm-seq and RibOxi-seq
[4, 12]. Unfortunately, these methods are limited by the require-
ment of a large amount of purified starting material or high-depth
RNA-sequencing and are not practical for routine detection of Nm
modifications on less abundant RNA molecules such as mRNA.
The most commonly used method for detecting Nm on RNA
relies on the fact that reverse transcription (RT) is inhibited by Nm
modifications, if the RT reaction is carried out with low amounts of
dNTPs. The presence of limiting dNTPs generally reduces proces-
sivity of the reverse transcriptase, which results in pausing or stop-
page at Nm and other points of steric hindrance (such as highly
structured regions) [14]. One of the earliest applications was to
map Nm sites on rRNA and snRNAs using radiolabeled primers
and primer extension assay at low dNTPs. This approach was very
successful and a mainstay of the field for several decades [15].

Fig. 2 Detection of Nm sites by primer extension. (a) SuperScript III Reverse Transcriptase (SSIII RT) or (b) an
engineered 2′-O-methyl-sensitive DNA polymerase (Klen Taq V669L; KTQ) was used to detect Nm on 18S
rRNA by fluorescently labeled primer extension. RNAs from hearts of WT and Rpl13a snoRNA KO mice (lacking
U32A, U33, U34, and U35A) were processed with (a) SSIII RT and low dNTP or (b) Nm-sensitive KTQ. Rpl13a
snoRNA KO mice lack U33, which guides 18S Um1326, and they also lack U32A, which is one of the two guides
for 18S Gm1328 (U32B is a redundant snoRNA with the same guide sequence). Reduction of Gm1328 can be
clearly seen in KO mice from samples processed with SSIII with low dNTPs (a, arrow), but nearby Um1326
cannot be detected in WT or KO RNA (a). In samples processed with Nm-sensitive KTQ, both Um1326 and
Gm1328 are resolved in WT mice (b, arrows). In KO mice, loss of Um1326 is observed as well as partial loss of
methylation of Gm1328. Loss of Gm1328 is incomplete due to the redundant U32B snoRNA

More recently, RT enzymes have been developed that
are inhibited by Nm at high (normal) dNTP concentrations. One
such enzyme is an Nm-sensitive DNA polymerase with RT activity:
Klen Taq V669L [16]. To avoid radioisotopes, we have used fluo-
rescently labeled RT primers to detect Nm on abundant rRNA
using both SuperScript III at low dNTPs and Klen Taq V669L at
high dNTPs, in conjunction with a genetic model that lacks snoR-
NAs guiding the modifications in question. In the presence of low
dNTPs, SuperScript III detected 18S Gm1328 in WT samples and its
reduction in KO samples (Fig. 2a). However, detection of Nm was
even better using Klen Taq V669L. Using this approach, a “stop”
product from both 18S Gm1328 and nearby Um1326 could be
easily visualized, and loss or reduction in these sites could be seen in
animals lacking the Rpl13a snoRNAs that guide them (Fig. 2b).
Reverse transcription “low dNTP” methods and engineered
enzymes suffer from several weaknesses that must be considered,
including lack of specificity, the need to know the approximate position of
the Nm site, and poor sensitivity for low-abundance transcripts with
primer extension. One contributor to the lack of specificity is that
RNA modifications other than Nm can also disrupt RT to a lesser
extent, and that inherently structured regions of RNA can strongly
inhibit RT at low dNTPs [14, 15]. It is therefore essential to have
proper controls, such as unmodified synthetic RNA or genetic
models that lack the modification being studied.

[Fig. 3 panels: (A) RNA target with RT primer and PCR primers FU, FD, and R; (B) mRNA target with a gene-specific (GSP) or oligo-dT RT primer and qPCR primers F and R; (C) methylated transcript (WT) versus unmethylated transcript (snoRNA KO or FBL KD) at steps 1. mRNA, 2. RT (high dNTP, low dNTP), and 3. qPCR; labels: less Nm, more Nm]

Fig. 3 Detecting snoRNA-guided 2′-O-methylation of mRNA using reverse transcription under low-dNTP
conditions followed by qPCR (RTL-P). (a) Schematic of primer design for RTL-P as described in Dong et al.,
2012. (b) Schematic of primer design sensitive to the presence or absence of 2′-O-methylation on mRNA.
Oligo-dT or gene-specific primers (GSP) for target and housekeeping genes are used as an RT primer. qPCR
primers (F & R) are designed to be upstream of the putative Nm site. (c) Schematic of expected results at each
step of the assay. (1) Total RNA is extracted from control and fibrillarin (Fbl) or snoRNA KO cells or tissue.
(2) Under high-dNTP conditions, RT will read through Nm and methylated transcripts will be indistinguishable
from unmethylated ones. In low-dNTP conditions, RT will frequently pause at sites of Nm, resulting in
truncated transcripts that will not be amplified by PCR. (3) RT products are amplified by qPCR. In this example,
loss of Nm modification due to snoRNA KO leads to increased read-through during RT, which leads to
increased qPCR product
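
The interpretation in Fig. 3c can be illustrated numerically. The following Python sketch is not part of the published protocol. It assumes that read-through at low dNTP can be approximated from qPCR Ct values as E^(Ct_high - Ct_low), with an assumed amplification efficiency E of 2 (perfect doubling per cycle). The function names and all Ct values are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def read_through_fraction(ct_high_dntp: float,
                          ct_low_dntp: float,
                          efficiency: float = 2.0) -> float:
    """Estimate cDNA recovered at low dNTP relative to high dNTP.

    Assumption (not from the protocol): each qPCR cycle multiplies the
    product by `efficiency`, so a Ct delay of n cycles at low dNTP
    corresponds to efficiency**(-n) as much amplifiable cDNA.
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1")
    return efficiency ** (ct_high_dntp - ct_low_dntp)


def relative_read_through(wt_cts: Tuple[float, float],
                          ko_cts: Tuple[float, float],
                          efficiency: float = 2.0) -> float:
    """Ratio of KO to WT read-through; each tuple is (Ct high dNTP, Ct low dNTP)."""
    wt = read_through_fraction(wt_cts[0], wt_cts[1], efficiency)
    ko = read_through_fraction(ko_cts[0], ko_cts[1], efficiency)
    return ko / wt


if __name__ == "__main__":
    # Hypothetical Ct values: (high dNTP, low dNTP) for each genotype.
    wt_cts = (22.0, 26.0)  # methylated transcript: strong RT pausing at low dNTP
    ko_cts = (22.0, 24.0)  # snoRNA KO: less pausing, more read-through
    print("WT read-through:", read_through_fraction(*wt_cts))
    print("KO read-through:", read_through_fraction(*ko_cts))
    print("KO/WT ratio:", relative_read_through(wt_cts, ko_cts))
```

A KO/WT ratio above 1 is consistent with loss of Nm at the target site. Because RNA structure and other modifications can also inhibit RT at low dNTPs, such a ratio is only interpretable alongside the controls described above.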

To overcome sensitivity limitations with primer extension, the
low dNTP/primer extension approach has been combined with
PCR amplification, allowing for the use of very little starting mate-
rial (see “Reverse Transcription Under Low dNTP Conditions
Followed by PCR (RTL-P)” [17]). In this method, RT is per-
formed at both high (standard) and low dNTP conditions, fol-
lowed by PCR amplification of the cDNA (Fig. 3). In its original
form (Fig. 3a, c), the products of endpoint PCR are run on an
agarose gel, quantified by imaging, and the relative inhibition of RT
by Nm is calculated by comparing the amount of product in the low
dNTP condition to the standard reaction. A forward primer 5′ to
the methylation (Nm-sensitive, FU) will generate a longer product
than a forward primer 3′ to the point of the methylation
(Nm-insensitive, FD). Following endpoint PCR, the amplified
reactions are run on an electrophoresis gel and analyzed using
densitometry of a fluorescent nucleotide reporter dye. In condi-
tions where an Nm is present, there will be less observed longer
product compared with short product (FU/F