Current Topics in Bioinformatics
(BST520)
What is Bioinformatics?
Hogeweg and Hesper: the study of information
processes in biotic systems, in contrast to
biochemistry and biophysics
Wikipedia: Bioinformatics now entails the creation
and advancement of databases, algorithms,
computational and statistical techniques and theory
to solve formal and practical problems arising from
the management and analysis of biological data.
Staying Current
Seminars & Conferences
- TIGR Meeting: Transcriptomics and Integrated Genomics Working Group, Thursdays 9:30-10:30, MRBX 1.11211
- BioC: the Bioconductor Conference, held each summer at the Fred Hutchinson Cancer Research Center (FHCRC) in Seattle
Blogs & Social Media
- R-bloggers (http://www.r-bloggers.com)
- Genomics, Evolution, and Pseudoscience (http://genome.fieldofscience.com)
- Simply Statistics (Twitter: @simplystats)
Core bioinformatics journals:
- Bioinformatics
- Biostatistics
- BMC Bioinformatics
- BMC Systems Biology
- Briefings in Bioinformatics
- PLoS Computational Biology
- Statistical Applications in Genetics and Molecular Biology
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
Other journals that publish bioinformatics research:
- Science
- PNAS
- Nature
- Nature Methods
- Nature Biotechnology
- Nucleic Acids Research
- Genome Research
- Genome Biology
- PLoS Biology
Scientific Literature
- PubMed
- Google Scholar
- RSS feeds
Introduction to R
History of R:
- based on the S programming language, developed at Bell Labs
- developed by Ross Ihaka and Robert Gentleman
- Feb 29, 2000: version 1.0.0
- June 22, 2012: version 2.15.1
http://www.r-project.org/
Advantages of R:
- open-source programming language
- functions for most statistical and graphical techniques, e.g. glm() for generalized linear models
- interfaces with C/C++, Perl, and MySQL
- contributed add-on packages
Bioconductor
History of Bioconductor:
- collection of R packages for genomic data
- started in 2001
- updated twice per year
- April 2, 2012: version 2.10
http://www.bioconductor.org
Advantages of Bioconductor:
- open source / open development
- stricter guidelines for contributed packages than CRAN
- focus on biostatistics / bioinformatics
- genomic annotation / metadata
- focus on reproducible research
R Integrated Development Environments
Advantages:
- R script formatting
- syntax highlighting
- reproducibility of interactive sessions
A few options:
- Emacs + ESS (Emacs Speaks Statistics)
- RStudio
- Eclipse (with the StatET plug-in)
Reproducible Research
Level 0:
- no R script saved: all commands typed at the R prompt
- raw data: deleted once used
- only the final table / figure saved
Level 1:
- R script: commented code that begins by loading the raw data
- raw data: saved and annotated
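The following is a minimal sketch of a Level 1 workflow. It is written in Python, and an R script would follow the same outline. The run starts by loading untouched raw data, and the annotation travels with the raw file. The file names, the JSON "sidecar" used as the annotation, and the SHA-256 checksum are all hypothetical choices made for illustration. They are not course requirements.

```python
"""Hypothetical Level 1 analysis script (a sketch, not a course template).

Assumptions: raw data live in a CSV file that is never modified, and a JSON
"sidecar" file next to it serves as the annotation (source + checksum).
"""
import csv
import hashlib
import json
import statistics
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def annotate_raw(path: Path, source: str) -> Path:
    """Save an annotation next to the raw file: its name, origin, and checksum."""
    sidecar = path.with_suffix(".json")
    record = {"file": path.name, "source": source, "sha256": sha256_of(path)}
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar


def check_raw_unchanged(path: Path) -> bool:
    """Compare the raw file against the checksum stored in its annotation."""
    record = json.loads(path.with_suffix(".json").read_text())
    return record["sha256"] == sha256_of(path)


def load_raw(path: Path) -> list:
    """Step 1 of every run: read the raw data (file opened read-only)."""
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


def summarize(rows: list, column: str) -> dict:
    """Step 2: derive results from the raw rows without writing to the raw file."""
    values = [float(row[column]) for row in rows]
    return {"n": len(values), "mean": statistics.mean(values)}


if __name__ == "__main__":
    # Toy data written to a temporary directory purely for demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw_expression.csv"
        raw.write_text("gene,value\ng1,2.0\ng2,4.0\n")
        annotate_raw(raw, source="toy example")
        assert check_raw_unchanged(raw)
        print(summarize(load_raw(raw), "value"))  # {'n': 2, 'mean': 3.0}
```

The checksum is stored in the annotation, so a later run or a collaborator can detect whether the raw file was edited by hand. Level 0 practice, where raw data is deleted once used, makes that check impossible.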
Level 2:
- a single document combining the description of the data and analysis, the R script, and the results: Sweave, knitr, etc.
- raw data: saved and annotated
- processed data: saved and annotated
Level 3:
- R/Bioconductor software package:
  - documented R functions
  - example data set(s)
  - package vignette(s)
  - version control using Subversion (svn)
- R/Bioconductor data package:
  - documented raw and processed data
  - S4 class data structures and methods
  - advanced data formats, e.g. a MySQL database
Finding Help
Being self-sufficient:
- read the manual / vignette
- search Google for keywords
- search the R/Bioconductor mailing list archives
http://tolstoy.newcastle.edu.au/R/
Being reliant:
- email the R/Bioconductor mailing list (read the posting guide first: http://www.r-project.org/posting-guide.html)
- attend office hours
- email a classmate
- email a professor
Course Project
- analysis of a data set or treatment of a methodological issue
- formal proposal due Oct 15th
- discuss ideas with the instructors early
- 50% of your grade
- possibility of publication
Homework
First homework assignment due Monday, Sept 10th, at 5pm.
Available at http://mnmccall.com/teaching/bst520