Using Databases in R

Marc Carlson
Fred Hutchinson Cancer Research Center

17-18 February, 2011

Introduction

Basic SQL

Using SQL from within R

Relational Databases

Relational database basics

- Data stored in tables
- Tables related through keys
- Relational model called a schema
- Tables designed to avoid redundancy
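
A minimal sketch of these ideas, assuming an in-memory SQLite database. The table and column names (gene_info, tx, symbol) are invented for illustration and do not come from the slides. Each gene symbol is stored once and the transcripts refer to it through a key:

```r
library(DBI)
library(RSQLite)

# Hypothetical toy schema: all names below are illustrative only.
con <- dbConnect(SQLite(), ":memory:")

# One row per gene: the symbol is stored once, keyed by gene_id.
dbExecute(con, "CREATE TABLE gene_info (gene_id INTEGER PRIMARY KEY, symbol TEXT)")
# Several transcripts may point back at one gene through gene_id.
dbExecute(con, "CREATE TABLE tx (tx_id INTEGER PRIMARY KEY,
                                 gene_id INTEGER REFERENCES gene_info(gene_id))")

dbExecute(con, "INSERT INTO gene_info VALUES (1, 'geneA'), (2, 'geneB')")
dbExecute(con, "INSERT INTO tx VALUES (10, 1), (11, 1), (12, 2)")

# The key relates the two tables without repeating the symbol per transcript.
keyed <- dbGetQuery(con,
  "SELECT tx.tx_id, gene_info.symbol
     FROM tx JOIN gene_info ON tx.gene_id = gene_info.gene_id
    ORDER BY tx.tx_id")
print(keyed)

dbDisconnect(con)
```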

Beneficial uses by R packages

- Out-of-memory data storage
- Fast access to data subsets
- Databases accessible by other software

Uses of Relational Databases in Bioconductor

Annotation packages
- Organism, genome (e.g. org.Hs.eg.db)
- Microarray platforms (e.g. hgu95av2.db)
- Homology (e.g. hom.Hs.inp.db)

Software packages
- Transcript annotations (e.g. GenomicFeatures)
- NGS experiments (e.g. Genominator)
- Annotation infrastructure (e.g. AnnotationDbi)

What do I mean by relational?

From R it looks pretty simple

Making a TranscriptDb object

> library(GenomicFeatures)
> mm9KG <-
+ makeTranscriptDbFromUCSC(genome = "mm9",
+ tablename = "knownGene")

Saving and Loading

> saveFeatures(mm9KG, file="mm9KG.sqlite")
> mm9KG <-
+ loadFeatures(system.file("extdata", "mm9KG.sqlite",
+ package = "AdvancedR2011"))

There are even accessors, etc.

> head(transcripts(mm9KG), 3)
GRanges with 3 ranges and 2 elementMetadata values
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr9 [3215314, 3215339] +
[2] chr9 [3335231, 3385846] +
[3] chr9 [3335473, 3343608] +
| tx_id tx_name
| <integer> <character>
[1] | 24312 uc009oas.1
[2] | 24315 uc009oat.1
[3] | 24313 uc009oau.1

seqlengths
chr1 ... chrY_random
197195432 ... 58682461

But what actually happens when we do this?

> options(verbose=TRUE)
> txs <- transcripts(mm9KG)

SQL QUERY: SELECT tx_chrom, tx_start, tx_end, tx_strand,
transcript._tx_id AS tx_id , tx_name FROM transcript
ORDER BY tx_chrom, tx_strand, tx_start, tx_end

Notice how the database query is pretty simple?

- DB joins promote flexible access
- BUT: there can be a cost if using A LOT of joins
- Therefore (in this case) a hybrid approach: retrieve relevant records and subset in R
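
The hybrid approach can be sketched as follows, assuming a toy stand-in for the transcript table. The rows and the names txs and chr9_late are invented for illustration. One join-free query pulls the records, and the finer filtering is done in R:

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")

# Toy stand-in for the transcript table; the values are invented.
toy <- data.frame(
  tx_name  = c("txA", "txB", "txC", "txD"),
  tx_chrom = c("chr1", "chr1", "chr9", "chr9"),
  tx_start = c(100L, 500L, 200L, 900L),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "transcript", toy)

# Step 1: a single, simple query with no joins fetches the records.
txs <- dbGetQuery(con,
  "SELECT tx_name, tx_chrom, tx_start
     FROM transcript
    ORDER BY tx_chrom, tx_start")

# Step 2: the finer-grained subsetting happens in R rather than in SQL.
chr9_late <- txs[txs$tx_chrom == "chr9" & txs$tx_start > 500, ]
print(chr9_late)

dbDisconnect(con)
```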

Remember: our DB schema has more than 1 table

SQL in 3 slides

Structured Query Language (SQL) is the most common language for interacting with relational databases.

Database Retrieval

Single table selections

SELECT * FROM gene;

SELECT gene_id, gene._tx_id FROM gene;

SELECT * FROM gene WHERE _tx_id=49245;

SELECT * FROM transcript WHERE tx_name LIKE '%oap.1';
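
To try selections like these without the workshop database, here is a small sketch that runs them from R against an in-memory table. Only the column names follow the slides; the rows are invented for illustration:

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")

# Toy rows; check.names = FALSE keeps the leading underscore in `_tx_id`.
dbWriteTable(con, "transcript", data.frame(
  `_tx_id` = 1:3,
  tx_name  = c("uc001aaa.1", "uc001aab.1", "uc001aaa.2"),
  check.names = FALSE, stringsAsFactors = FALSE
))

# All columns, all rows.
all_rows <- dbGetQuery(con, "SELECT * FROM transcript")

# A WHERE clause restricts the rows that come back.
one_row <- dbGetQuery(con, "SELECT * FROM transcript WHERE _tx_id = 2")

# In LIKE, '%' matches any run of characters, so this keeps names ending in '.1'.
like_rows <- dbGetQuery(con,
  "SELECT tx_name FROM transcript WHERE tx_name LIKE '%.1'")

print(like_rows)
dbDisconnect(con)
```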

Inner joins

SELECT gene.gene_id,transcript._tx_id
FROM gene, transcript
WHERE gene._tx_id=transcript._tx_id;

SELECT g.gene_id,t._tx_id
FROM gene AS g, transcript AS t
WHERE g._tx_id=t._tx_id
AND t._tx_id > 10;
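
The joins above use the comma-separated FROM list with the join condition in WHERE. The same inner join can also be written with an explicit INNER JOIN ... ON clause. The sketch below, on invented toy rows, runs both forms and checks that they agree:

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")

# Toy tables; the values are invented, only the column names follow the slides.
dbWriteTable(con, "gene", data.frame(
  gene_id  = c("g1", "g2", "g3"),
  `_tx_id` = c(5L, 11L, 12L),
  check.names = FALSE, stringsAsFactors = FALSE
))
dbWriteTable(con, "transcript", data.frame(
  `_tx_id` = c(5L, 11L, 12L, 13L),
  tx_name  = c("a", "b", "c", "d"),
  check.names = FALSE, stringsAsFactors = FALSE
))

# Implicit form, as on the slide: join condition lives in WHERE.
implicit <- dbGetQuery(con,
  "SELECT g.gene_id, t._tx_id
     FROM gene AS g, transcript AS t
    WHERE g._tx_id = t._tx_id AND t._tx_id > 10
    ORDER BY t._tx_id")

# Explicit form: join condition in ON, row filter in WHERE.
explicit <- dbGetQuery(con,
  "SELECT g.gene_id, t._tx_id
     FROM gene AS g INNER JOIN transcript AS t ON g._tx_id = t._tx_id
    WHERE t._tx_id > 10
    ORDER BY t._tx_id")

print(explicit)
dbDisconnect(con)
```

Transcript 13 has no matching gene row, so an inner join drops it in both forms.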

Database Modifications

CREATE TABLE

CREATE TABLE foo (
    id INTEGER,
    string TEXT
);

INSERT

INSERT INTO foo (id, string) VALUES (1, 'bar');

CREATE INDEX

CREATE INDEX fooInd1 ON foo(id);
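
A short sketch of running these three statements from R against a throwaway in-memory database. It uses dbExecute() from DBI, which is meant for statements that return no rows, and then looks in SQLite's sqlite_master catalog to confirm that the index exists. The variable names are illustrative:

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")

dbExecute(con, "CREATE TABLE foo (id INTEGER, string TEXT)")

# For an INSERT, dbExecute() returns the number of rows affected.
n_inserted <- dbExecute(con, "INSERT INTO foo (id, string) VALUES (1, 'bar')")

dbExecute(con, "CREATE INDEX fooInd1 ON foo(id)")

# sqlite_master lists the schema objects, including indexes.
idx <- dbGetQuery(con,
  "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'foo'")
rows <- dbGetQuery(con, "SELECT * FROM foo")

print(idx)
print(rows)
dbDisconnect(con)
```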

The DBI package

- Provides generic access to databases from R
- Many of its functions are convenient and simple to use

Some popular DBI functions

> library(RSQLite) # also loads DBI (we need both)

> drv <- SQLite()
> con <- dbConnect(drv, dbname=system.file("extdata",
+ "mm9KG.sqlite", package="AdvancedR2011"))
> dbListTables(con)
[1] "cds" "chrominfo"
[3] "exon" "gene"
[5] "metadata" "splicing"
[7] "transcript"
> dbListFields(con,"transcript")
[1] "_tx_id" "tx_name" "tx_chrom"
[4] "tx_strand" "tx_start" "tx_end"
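
The same inspection functions can be tried without the workshop package. This sketch assumes an in-memory database holding one invented table:

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")

# One invented row is enough to give the table a set of fields.
dbWriteTable(con, "transcript",
             data.frame(tx_name = "txA", tx_chrom = "chr9",
                        stringsAsFactors = FALSE))

tables   <- dbListTables(con)               # names of the tables in the database
fields   <- dbListFields(con, "transcript") # column names of one table
has_gene <- dbExistsTable(con, "gene")      # FALSE: no such table was created

print(tables)
print(fields)
dbDisconnect(con)
```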
The dbGetQuery approach

> dbGetQuery(con, "SELECT * FROM transcript LIMIT 3")

_tx_id tx_name tx_chrom tx_strand
1 24308 uc009oap.1 chr9 -
2 24309 uc009oao.1 chr9 -
3 24310 uc009oaq.1 chr9 -
tx_start tx_end
1 3186316 3186344
2 3133847 3199799
3 3190269 3199799

The dbSendQuery approach

If you use result sets, you also need to clear them when you are done (dbClearResult).
> res <- dbSendQuery(con, "SELECT * FROM transcript")
> fetch(res, n= 3)
_tx_id tx_name tx_chrom tx_strand
1 24308 uc009oap.1 chr9 -
2 24309 uc009oao.1 chr9 -
3 24310 uc009oaq.1 chr9 -
tx_start tx_end
1 3186316 3186344
2 3133847 3199799
3 3190269 3199799
> dbClearResult(res)
[1] TRUE
Calling fetch again before clearing the result will return the next three rows. This allows for
simple iteration.
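
The iteration can be sketched as a loop over chunks. This example assumes a toy seven-row table and uses dbFetch() and dbHasCompleted(), the current DBI names for fetching rows and testing whether a result set is exhausted. The chunk size of 3 mirrors the slide:

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "transcript", data.frame(tx_id = 1:7))  # invented rows

res <- dbSendQuery(con, "SELECT tx_id FROM transcript ORDER BY tx_id")

chunk_sizes <- integer(0)
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 3)          # at most three rows per call
  chunk_sizes <- c(chunk_sizes, nrow(chunk))
  # ... process `chunk` here ...
}

dbClearResult(res)   # put the result set away once the loop is done
dbDisconnect(con)

print(chunk_sizes)
```

Only one chunk is held in R at a time, which is the point of using a result set rather than dbGetQuery for large tables.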

Setting up a new DB

First, let's close the connection to our other DB:
> dbDisconnect(con)
[1] TRUE
Then let's make a new database. Notice that we specify the database name
with "dbname". This allows it to be written to disk instead of just to memory.
> drv <- SQLite()
> con <- dbConnect(drv, dbname="myNewDb.sqlite")
Once you have this, you may want to make a new table:
> dbGetQuery(con, "CREATE TABLE foo (id INTEGER, string TEXT)")
NULL
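
To see that a database opened with a file name really lives on disk, here is a small sketch that writes to a temporary file (chosen with tempfile() so nothing is left behind), disconnects, and reconnects. The variable names are illustrative:

```r
library(DBI)
library(RSQLite)

# A throwaway file path; a real project would use a fixed name instead.
db_file <- tempfile(fileext = ".sqlite")

con <- dbConnect(SQLite(), dbname = db_file)
dbExecute(con, "CREATE TABLE foo (id INTEGER, string TEXT)")
dbDisconnect(con)

# A fresh connection to the same file still sees the table.
con2 <- dbConnect(SQLite(), dbname = db_file)
persisted <- dbListTables(con2)
dbDisconnect(con2)

print(persisted)
unlink(db_file)  # clean up the temporary file
```

With dbname = ":memory:" instead of a file path, the table would be gone as soon as the first connection closed.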

The RSQLite package

- Provides SQLite access for R
- Much better support for complex inserts

Prepared queries

> data <- data.frame(c(226089,66745),
+ c("C030046E11Rik","Trpd52l3"),
+ stringsAsFactors=FALSE)
> names(data) <- c("id","string")
> sql <- "INSERT INTO foo VALUES ($id, $string)"
> dbBeginTransaction(con)
[1] TRUE
> dbGetPreparedQuery(con, sql, bind.data = data)
NULL
> dbCommit(con)
[1] TRUE
Notice that we want strings instead of factors in our data.frame
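
The calls above reflect the RSQLite interface at the time of the workshop. Newer DBI/RSQLite releases express the same idea with dbBegin() and the params argument of dbExecute(). The following is a sketch under that assumption, not a drop-in replacement for the slide. It uses the ':name' placeholder style and an in-memory database:

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE foo (id INTEGER, string TEXT)")

data <- data.frame(id = c(226089, 66745),
                   string = c("C030046E11Rik", "Trpd52l3"),
                   stringsAsFactors = FALSE)

sql <- "INSERT INTO foo VALUES (:id, :string)"

dbBegin(con)                                 # start a transaction
dbExecute(con, sql, params = as.list(data))  # one insert per row of `data`
dbCommit(con)                                # make the inserts permanent

out <- dbGetQuery(con, "SELECT id, string FROM foo ORDER BY rowid")
print(out)
dbDisconnect(con)
```

Binding values through placeholders, rather than pasting them into the SQL string, avoids quoting problems and lets the statement be reused for every row.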

In SQLite, you can ATTACH DBs

The SQL we want looks quite simple:

ATTACH 'mm9KG.sqlite' AS db;

So we just need to do something like this:

> db <- system.file("extdata", "mm9KG.sqlite",
+ package="AdvancedR2011")
> dbGetQuery(con, sprintf("ATTACH '%s' AS db",db))
NULL

You can join across attached DBs

The SQL for this looks like:

SELECT * FROM db.gene AS dbg, foo AS f
WHERE dbg.gene_id=f.id;

Then in R:
> sql <- "SELECT * FROM db.gene AS dbg,
+ foo AS f WHERE dbg.gene_id=f.id"
> res <- dbGetQuery(con, sql)
> res
gene_id _tx_id id string
1 226089 48508 226089 C030046E11Rik
2 226089 48509 226089 C030046E11Rik
3 226089 48511 226089 C030046E11Rik
4 226089 48510 226089 C030046E11Rik
5 66745 48522 66745 Trpd52l3
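
For readers without the workshop data, the attach-and-join pattern can be reproduced with two throwaway databases. In this sketch the second database is a temporary file and the rows of its gene table are invented. Only the gene IDs and symbol are borrowed from the example above:

```r
library(DBI)
library(RSQLite)

# Build a second, file-backed database to attach later.
other_file <- tempfile(fileext = ".sqlite")
other <- dbConnect(SQLite(), dbname = other_file)
dbWriteTable(other, "gene",
             data.frame(gene_id = c(226089L, 66745L, 226089L),
                        tx = c(1L, 2L, 3L)))  # invented rows
dbDisconnect(other)

# Main connection with its own table.
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "foo",
             data.frame(id = 66745L, string = "Trpd52l3",
                        stringsAsFactors = FALSE))

# Attach the file under the alias `db`, then join across both databases.
dbExecute(con, sprintf("ATTACH '%s' AS db", other_file))
joined <- dbGetQuery(con,
  "SELECT dbg.gene_id, f.string
     FROM db.gene AS dbg INNER JOIN foo AS f ON dbg.gene_id = f.id")

dbExecute(con, "DETACH db")
dbDisconnect(con)
unlink(other_file)

print(joined)
```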

The End.

No seriously, that was it.
There is nothing "back there" that you need to see...
That was the end.

XML

Using XML in R: the XML package (using XPath)
> ## First assemble a URL to use NCBI's web services
> entrezGenes = c(1,100008564)
> library(XML)
> baseUrl <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.f
> xsep <- paste(entrezGenes, collapse=",")
> url <- paste(baseUrl,"db=gene&id=",xsep,"&retmode=xml", sep="")
> url
[1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene
> ## Then parse this XML for use by xpathApply
> EGSet <- xmlParse(url)
> ## Then you can use xpathApply() to parse using xpath expressions
> ## xmlValue() returns the contents of that tag
> speciesNames <- unlist(xpathApply(EGSet, "//Org-ref_taxname", xml
> speciesNames
[1] "Homo sapiens" "Mus musculus"
[1] TRUE
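
The XPath step can be tried offline on an inline document. The XML below is a heavily simplified, hand-written stand-in for the efetch response: its nesting is an assumption, and only the Org-ref_taxname element name comes from the example above:

```r
library(XML)

# Hand-written miniature document; the real efetch output is far larger.
xml_text <- "<Entrezgene-Set>
  <Entrezgene><Org-ref><Org-ref_taxname>Homo sapiens</Org-ref_taxname></Org-ref></Entrezgene>
  <Entrezgene><Org-ref><Org-ref_taxname>Mus musculus</Org-ref_taxname></Org-ref></Entrezgene>
</Entrezgene-Set>"

# asText = TRUE tells xmlParse() the argument is XML content, not a file or URL.
doc <- xmlParse(xml_text, asText = TRUE)

# '//' selects matching elements at any depth; xmlValue() extracts their text.
taxa <- unlist(xpathApply(doc, "//Org-ref_taxname", xmlValue))
print(taxa)
```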
