This is the R script/materials repository of the "Mastering R Skills" course in the 2019/2020 Spring term, part of the MSc in Business Analytics at CEU.
- Syllabus
- Technical Prerequisites
- API ingest and data transformation exercises
- Report on the current price of 0.42 BTC
- Report on the current price of 0.42 BTC in HUF
- Move helpers to a new R package
- Report on the price of 0.42 BTC in the past 30 days
- Report on the price of 0.42 BTC and 1.2 ETH in the past 30 days
- Report on the price of cryptocurrency assets read from a database
- Report on the price of cryptocurrency assets based on the transaction history read from a database
- Profiling, benchmarks
- Reporting exercises
- Homeworks
- Home assignment
- References
Please find in the syllabus folder of this repository.
-
Bookmark, watch or star this repository so that you can easily find it later
-
Please bring your own laptop and make sure to install R and RStudio before attending the first class!
💪 R packages to be installed from CRAN via
install.packages:data.tablehttrjsonlitelubridateggplot2scaleszooRMySQLRSQLiteopenxlsxgooglesheetsdevtoolsroxygen2panderloggerbotor(requires Python andboto3Python module)
💪 R packages to be installed from GitHub via
remotes::install_github:daroczig/binancerdaroczig/loggerdaroczig/dbr
If you get stuck, feel free to use the preconfigured, shared RStudio Server at http://mr.ceudata.net (I will share the usernames and passwords at the start of the class). In such case, you can skip all the steps prefixed with "💪" as the server already have that configured.
-
Join the #ba-mr-2019 Slack channel in the
ceu-bizanalyticsSlack group. -
If you do not already have a GitHub account, create one
-
Create a new GitHub repository called
mastering-r -
💪 Install
gitfrom https://git-scm.com/ -
💪 Verify that in RStudio, you can see the path of the
gitexecutable binary in the Tools/Global Options menu's "Git/Svn" tab -- if not, then you might have to restart RStudio (if you installed git after starting RStudio) or installed git by not adding that to the PATH on Windows. Either way, browse the "git executable" manually (in somebinfolder look for theegitexecutable file). -
Create an RSA key via Tools/Global options/Git/Create RSA Key button (optionally with a passphrase for increased security -- that you have to enter every time you push and pull to and from GitHub), then copy the public key (from
~/.ssh/id_rsa.pub) and add that to you SSH keys on your GitHub profile. -
Create a new project in RStudio choosing "version control", then "git" and paste the SSH version of the repo URL copied from GitHub (from point 4) in the pop-up -- now RStudio should be able to download the repo. If it asks you to accept GitHub's fingerprint, say "Yes".
-
If RStudio/git is complaining that you have to set your identity, click on the "Git" tab in the top-right panel, then click on the Gear icon and then "Shell" -- here you can set your username and e-mail address in the command line, so that RStudio/git integration can work. Use the following commands:
$ git config --global user.name "Your Name" $ git config --global user.email "Your e-mail address"
Close this window, commit, push changes, all set.
Find more resources in Jenny Bryan's "Happy Git and GitHub for the useR" tutorial if in doubt or contact me.
We have 0.42 Bitcoin. Let's write an R script reporting on the current value of this asset in USD.
Click here for a potential solution ...
library(devtools)
install_github('daroczig/binancer')
library(binancer)
coin_prices <- binance_ticker_all_prices()
library(data.table)
coin_prices[from == 'BTC' & to == 'USDT', to_usd]
## alternative solution
coin_prices <- binance_coins_prices()
coin_prices[symbol == 'BTC', usd]
## don't forget that we need to report on the price of 0.42 BTC instead of 1 BTC
coin_prices[symbol == 'BTC', usd * 0.42]Let's do the same report as above, but instead of USD, now let's report in Hungarian Forints.
Click here for a potential solution ...
## How to get USD/HUF rate?
## See eg https://exchangeratesapi.io for free API access
## Loading data without any dependencies
https://api.exchangeratesapi.io/latest
https://api.exchangeratesapi.io/latest?base=USD
readLines('https://api.exchangeratesapi.io/latest?base=USD')
## Parse JSON
library(jsonlite)
fromJSON(readLines('https://api.exchangeratesapi.io/latest?base=USD'))
fromJSON('https://api.exchangeratesapi.io/latest?base=USD')
## Extract the USD/HUF exchange rate from the list
usdhuf <- fromJSON('https://api.exchangeratesapi.io/latest?base=USD&symbols=HUF')$rates$HUF
coin_prices[symbol == 'BTC', 0.42 * usd * usdhuf]Click here for a potential solution ... after cleaning up
## loading requires packages on the top of the script
library(binancer)
library(httr)
## constants
BITCOINS <- 0.42
## get Bitcoin price in USD
coin_prices <- binance_coins_prices()
btcusdt <- coin_prices[symbol == 'BTC', usd]
## get USD/HUF exchange rate
usdhuf <- fromJSON('https://api.exchangeratesapi.io/latest?base=USD&symbols=HUF')$rates$HUF
## report
BITCOINS * btcusdt * usdhufClick here for a potential solution ... with logging
library(binancer)
library(httr)
library(data.table)
library(logger)
BITCOINS <- 0.42
coin_prices <- binance_coins_prices()
log_info('Found {coin_prices[, .N]} coins on Binance')
btcusdt <- coin_prices[symbol == 'BTC', usd]
log_info('The current Bitcoin price is ${btcusdt}')
usdhuf <- fromJSON('https://api.exchangeratesapi.io/latest?base=USD&symbols=HUF')$rates$HUF
log_info('1 USD currently costs {usdhuf} Hungarian Forints')
log_eval(forint(BITCOINS * btcusdt * usdhuf), level = INFO)
log_info('{BITCOINS} Bitcoins now worth {round(btcusdt * usdhuf * BITCOINS)} HUF')Click here for a potential solution ... with validating values received from the API
library(binancer)
library(httr)
library(data.table)
library(logger)
library(checkmate)
BITCOINS <- 0.42
coin_prices <- binance_coins_prices()
log_info('Found {coin_prices[, .N]} coins on Binance')
btcusdt <- coin_prices[symbol == 'BTC', usd]
log_info('The current Bitcoin price is ${btcusdt}')
assert_number(btcusdt, lower = 1000)
usdhuf <- fromJSON('https://api.exchangeratesapi.io/latest?base=USD&symbols=HUF')$rates$HUF
log_info('1 USD currently costs {usdhuf} Hungarian Forints')
assert_number(usdhuf, lower = 250, upper = 500)
log_info('{BITCOINS} Bitcoins now worth {round(btcusdt * usdhuf * BITCOINS)} HUF')Click here for a potential solution ... with auto-retries for API errors
library(binancer)
library(httr)
library(data.table)
library(logger)
library(checkmate)
BITCOINS <- 0.42
get_bitcoin_price <- function() {
tryCatch(
binance_coins_prices()[symbol == 'BTC', usd],
error = function(e) get_bitcoin_price())
}
btcusdt <- get_bitcoin_price()
log_info('The current Bitcoin price is ${btcusdt}')
assert_number(btcusdt, lower = 1000)
usdhuf <- fromJSON('https://api.exchangeratesapi.io/latest?base=USD&symbols=HUF')$rates$HUF
log_info('1 USD currently costs {usdhuf} Hungarian Forints')
assert_number(usdhuf, lower = 250, upper = 500)
log_info('{BITCOINS} Bitcoins now worth {round(btcusdt * usdhuf * BITCOINS)} HUF')Click here for a potential solution ... with auto-retries for API errors with exponential backoff
library(binancer)
library(httr)
library(data.table)
library(logger)
library(checkmate)
BITCOINS <- 0.42
get_bitcoin_price <- function(retried = 0) {
tryCatch(
binance_coins_prices()[symbol == 'BTC', usd],
error = function(e) {
## exponential backoff retries
Sys.sleep(1 + retried^2)
get_bitcoin_price(retried = retried + 1)
})
}
btcusdt <- get_bitcoin_price()
log_info('The current Bitcoin price is ${btcusdt}')
assert_number(btcusdt, lower = 1000)
usdhuf <- fromJSON('https://api.exchangeratesapi.io/latest?base=USD&symbols=HUF')$rates$HUF
log_info('1 USD currently costs {usdhuf} Hungarian Forints')
assert_number(usdhuf, lower = 250, upper = 500)
log_info('{BITCOINS} Bitcoins now worth {round(btcusdt * usdhuf * BITCOINS)} HUF')Click here for a potential solution ... with better currency formatter
round(btcusdt * usdhuf * BITCOINS)
format(btcusdt * usdhuf * BITCOINS, big.mark = ',', digits = 10)
format(btcusdt * usdhuf * BITCOINS, big.mark = ',', digits = 6)
library(scales)
dollar(btcusdt * usdhuf * BITCOINS)
dollar(btcusdt * usdhuf * BITCOINS, prefix = '', suffix = ' HUF')
forint <- function(x) {
dollar(x, prefix = '', suffix = ' HUF')
}
forint(btcusdt * usdhuf * BITCOINS)-
Click File / New Project / New folder and create a new R package (maybe call it
mr, also create a git repo for it) -- that will fill in your newly created folder with a package skeleton delivering thehellofunction in thehello.Rfile. -
Get familiar with:
-
the
DESCRIPTIONfile- semantic versioning: https://semver.org
- open-source license, see eg http://r-pkgs.had.co.nz/description.html#license or https://rstats-pkgs.readthedocs.io/en/latest/licensing.html
-
the
Rsubfolder -
the
mansubfolder -
the
NAMESPACEfile
-
-
Install the package (in the Build menu), load it and try
hello(), then?hello -
Create a git repo (if not done that already) and add/commit this package skeleton
-
Add a new function called
forintin theRsubfolder:forint.Rforint <- function(x) { dollar(x, prefix = '', suffix = ' HUF') }
-
Install the package, re-load it, and try running
forinteg calling on42-- realize it's failing -
After loading the
scalespackage (that delivers thedollarfunction), it works ... we need to prepare our package to loadscales::dollarwithout user interventation -
Also, look at the docs of
forint-- realize it's missing, so let's learn aboutroxygen2and update theforint.Rfile to explicitely list the function to be exported and note thatdollaris to be imported from thescalespackage:forint.R#' Formats Hungarian Forint #' @param x number #' @return string #' @export #' @importFrom scales dollar #' @examples #' forint(100000) #' forint(10.3241245125125) forint <- function(x) { dollar(x, prefix = '', suffix = ' HUF') }
-
Run
roxygen2on the package by enabling it in the "Build" menu's "Configure Build Tools", then "Document" it (if there's no such option, probably you need to install theroxygen2package first), and make sure to check what changes happened in theman,NAMESPACE(you might need to delete the original one) andDESCRIPTIONfiles. It's also a good idea to automatically runroxygen2before each install, so I'd suggests marking that option as well. The resulting files should look something like:DESCRIPTIONPackage: mr Type: Package Title: Demo R package for the Mastering R class Version: 0.1.0 Author: Gergely <***@***.***> Maintainer: Gergely <***@***.***> Description: Demo R package for the Mastering R class License: AGPL Encoding: UTF-8 LazyData: true RoxygenNote: 7.1.0 Imports: scalesNAMESPACE# Generated by roxygen2: do not edit by hand export(forint) importFrom(scales,dollar) -
Keep committing to the git repo
-
Delete
hello.Rand rerunroxygen2/ reinstall the package -
Add a new function that gets the most recent price of a Bitcoin in USD with retries:
converter.R#' Hitting the Binance API to get the most recent price of a Bitcoin in USD #' #' This is a wrapper around the \code{binancer} package ... #' @export #' @param retried the number if retries previously done before the exponential backoff sleep #' @importFrom binancer binance_coins_prices get_bitcoin_price <- function(retried = 0) { tryCatch( ## not using data.table syntax here and falling back to data.frame ## so that this could run on systems wihtout data.table as well subset(binance_coins_prices(), symbol == 'BTC')$usd, error = function(e) { ## exponential backoff retries Sys.sleep(1 + retried^2) get_bitcoin_price(retried = retried + 1) }) }
-
Now you can run the original R script hitting the Binance and ExchangeRatesAPI by using these helper functions:
library(binancer)
library(jsonlite)
library(logger)
library(checkmate)
library(scales)
log_threshold(TRACE)
library(mr)
BITCOINS <- 0.42
log_info('Number of Bitoins: {BITCOINS}')
btcusdt <- get_bitcoin_price()
log_info('The value of 1 Bitcoin: {dollar(btcusdt)}')
assert_number(btcusdt, lower = 1000)
usdhuf <- fromJSON('https://api.exchangeratesapi.io/latest?base=USD&symbols=HUF')$rates$HUF
log_info('The value of 1 USD: {forint(usdhuf)}')
assert_number(usdhuf, lower = 250, upper = 500)
log_eval(forint(BITCOINS * btcusdt * usdhuf))Let's do the same report as above, but instead of reporting the most recent value of the asset, let's report on the daily values from the past 30 days.
Click here for a potential solution ... with fixed USD/HUF exchange rate
library(binancer)
library(httr)
library(data.table)
library(logger)
library(ggplot2)
library(mr)
## ########################################################
## CONSTANTS
BITCOINS <- 0.42
## ########################################################
## Loading data
## USD in HUF
usdhuf <- fromJSON('https://api.exchangeratesapi.io/latest?base=USD&symbols=HUF')$rates$HUF
log_info('The current USD price is {forint(usdhuf)}')
## Bitcoin price in USD
btcusdt <- binance_klines('BTCUSDT', interval = '1d', limit = 30)
str(btcusdt)
balance <- btcusdt[, .(date = as.Date(close_time), btcusd = close)]
str(balance)
balance[, btchuf := btcusd * usdhuf]
balance[, btc := BITCOINS]
balance[, value := btc * btchuf]
str(balance)
## ########################################################
## Report
ggplot(balance, aes(date, value)) +
geom_line() +
xlab('') +
ylab('') +
scale_y_continuous(labels = forint) +
theme_bw() +
ggtitle('My crypto fortune',
subtitle = paste(BITCOINS, 'BTC'))
Click here for a potential solution ... with daily corrected USD/HUF exchange rate
library(binancer)
library(httr)
library(data.table)
library(logger)
library(scales)
library(ggplot2)
library(mr)
## ########################################################
## CONSTANTS
BITCOINS <- 0.42
## ########################################################
## Loading data
## USD in HUF
usdhuf <- fromJSON('https://api.exchangeratesapi.io/latest?base=USD&symbols=HUF')$rates$HUF
## try with a single date?
fromJSON('https://api.exchangeratesapi.io/2020-05-01?base=USD&symbols=HUF')
## no, it's just a single day
# fromJSON('https://api.exchangeratesapi.io/history?start_at=2020-05-01&base=USD&symbols=HUF')
## need end
fromJSON('https://api.exchangeratesapi.io/history?start_at=2020-05-01&end_at=2020-05-30&base=USD&symbols=HUF')
## we can do a much better job!
library(httr)
response <- GET(
'https://api.exchangeratesapi.io/history',
query = list(
start_at = Sys.Date() - 30,
end_at = Sys.Date(),
base = 'USD',
symbols = 'HUF'
))
exchange_rates <- content(response)
str(exchange_rates)
exchange_rates <- exchange_rates$rates
library(data.table)
usdhuf <- data.table(
date = as.Date(names(exchange_rates)),
usdhuf = as.numeric(unlist(exchange_rates)))
str(usdhuf)
## Bitcoin price in USD
btcusdt <- binance_klines('BTCUSDT', interval = '1d', limit = 30)
str(btcusdt)
balance <- btcusdt[, .(date = as.Date(close_time), btcusd = close)]
str(balance)
str(usdhuf)
merge(balance, usdhuf, by = 'date')
## oh no, missing records??
## rolling join to look up the most recently available USD/HUF rate
## (published on business days) for each calendar day
setkey(balance, date)
setkey(usdhuf, date)
balance <- usdhuf[balance, roll = TRUE]
str(balance)
balance[, btchuf := btcusd * usdhuf]
balance[, btc := BITCOINS]
balance[, value := btc * btchuf]
str(balance)
## ########################################################
## Report
ggplot(balance, aes(date, value)) +
geom_line() +
xlab('') +
ylab('') +
scale_y_continuous(labels = forint) +
theme_bw() +
ggtitle('My crypto fortune',
subtitle = paste(BITCOINS, 'BTC'))Let's do the same report as above, but now we not only have 0.42 Bitcoin, but 1.2 Ethereum as well.
Click here for a potential solution ...
library(binancer)
library(httr)
library(data.table)
library(logger)
library(scales)
library(ggplot2)
library(mr)
## ########################################################
## CONSTANTS
BITCOINS <- 0.42
ETHEREUMS <- 1.2
## ########################################################
## Loading data
## USD in HUF
exchange_rates <- content(GET(
'https://api.exchangeratesapi.io/history',
query = list(
start_at = Sys.Date() - 40,
end_at = Sys.Date(),
base = 'USD',
symbols = 'HUF'
)))$rates
usdhuf <- data.table(
date = as.Date(names(exchange_rates)),
usdhuf = as.numeric(unlist(exchange_rates)))
## Cryptocurrency prices in USD
btcusdt <- binance_klines('BTCUSDT', interval = '1d', limit = 30)
ethusdt <- binance_klines('ETHUSDT', interval = '1d', limit = 30)
coinusdt <- rbind(btcusdt, ethusdt)
str(coinusdt)
## oh no, how to keep the symbol??
balance <- coinusdt[, .(date = as.Date(close_time), btcusd = close, symbol = ???)]
## DRY (don't repeat yourself)
balance <- rbindlist(lapply(c('BTC', 'ETH'), function(s) {
binance_klines(paste0(s, 'USDT'), interval = '1d', limit = 30)[, .(
date = as.Date(close_time),
usdt = close,
symbol = s
)]
}))
balance[, amount := switch(
symbol,
'BTC' = BITCOINS,
'ETH' = ETHEREUMS,
stop('Unsupported coin')),
by = symbol]
str(balance)
## rolling join
setkey(balance, date)
setkey(usdhuf, date)
balance <- usdhuf[balance, roll = TRUE]
str(balance)
balance[, value := amount * usdt * usdhuf]
str(balance)
## ########################################################
## Report
ggplot(balance, aes(date, value, fill = symbol)) +
geom_col() +
xlab('') +
ylab('') +
scale_y_continuous(labels = forint) +
theme_bw() +
ggtitle(
'My crypto fortune',
subtitle = balance[date == max(date), paste(paste(amount, symbol), collapse = ' + ')])-
Create a new MySQL account and database at some free service provider (eg remotemysql.com or freemysqlhosting.net)
-
Log in and give a try to PhpMyAdmin
-
💪 Install
dbrfrom GitHub:library(devtools) install_github('daroczig/logger') install_github('daroczig/dbr')
-
Install
botoras well to be able to use encrypted credentials (note that this requires you to install Python first and thenpip install boto3as well):install_github('daroczig/botor') -
Set up a YAML file (menu: new file/text file, save as
databases.yml) for the database connection, something like:remotemysql: host: remotemysql.com port: 3306 dbname: ... user: ... drv: !expr RMySQL::MySQL() password: ...
-
Set up
dbrto use that YAML file:options('dbr.db_config_path' = '/path/to/databases.yml')
-
Create a table for the balances and insert some records:
library(dbr) db_config('remotemysql') db_query('CREATE TABLE coins (symbol VARCHAR(3) NOT NULL, amount DOUBLE NOT NULL DEFAULT 0)', 'remotemysql') db_query('TRUNCATE TABLE coins', 'remotemysql') db_query('INSERT INTO coins VALUES ("BTC", 0.42)', 'remotemysql') db_query('INSERT INTO coins VALUES ("ETH", 1.2)', 'remotemysql')
-
Write the reporting script, something like:
Click here for a potential solution ...
library(binancer) library(httr) library(data.table) library(logger) library(scales) library(ggplot2) library(mr) library(dbr) options('dbr.db_config_path' = '/path/to/databases.yml') options('dbr.output_format' = 'data.table') ## ######################################################## ## Loading data ## Read actual balances from the DB balance <- db_query('SELECT * FROM coins', 'remotemysql') ## Look up cryptocurrency prices in USD and merge balances balance <- rbindlist(lapply(balance$symbol, function(s) { binance_klines(paste0(s, 'USDT'), interval = '1d', limit = 30)[, .( date = as.Date(close_time), usdt = close, symbol = s, amount = balance[symbol == s, amount] )] })) ## USD in HUF exchange_rates <- content(GET( 'https://api.exchangeratesapi.io/history', query = list( start_at = Sys.Date() - 40, end_at = Sys.Date(), base = 'USD', symbols = 'HUF' )))$rates usdhufs <- data.table( date = as.Date(names(exchange_rates)), usdhuf = as.numeric(unlist(exchange_rates))) ## rolling join USD/HUF exchange rate to balances setkey(balance, date) setkey(usdhufs, date) balance <- usdhufs[balance, roll = TRUE] ## DT[i, j, by = ...] ## compute daily values in HUF balance[, value := amount * usdt * usdhuf] ## ######################################################## ## Report ggplot(balance, aes(date, value, fill = symbol)) + geom_col() + xlab('') + ylab('') + #scale_y_continuous(labels = forint) + theme_bw() + ggtitle( 'My crypto fortune', subtitle = balance[date == max(date), paste(paste(amount, symbol), collapse = ' + ')])
-
Rerun the above report after inserting two new records to the table:
db_query("INSERT INTO coins VALUES ('NEO', 100)", 'remotemysql') db_query("INSERT INTO coins VALUES ('LTC', 25)", 'remotemysql')
Let's prepare the transactions table:
library(dbr)
options('dbr.db_config_path' = '/path/to/database.yml')
options('dbr.output_format' = 'data.table')
db_query('
CREATE TABLE transactions (
date TIMESTAMP NOT NULL,
symbol VARCHAR(3) NOT NULL,
amount DOUBLE NOT NULL DEFAULT 0)',
db = 'remotemysql')
db_query('TRUNCATE TABLE transactions', 'remotemysql')
db_query('INSERT INTO transactions VALUES ("2020-01-01 10:42:02", "BTC", 1.42)', 'remotemysql')
db_query('INSERT INTO transactions VALUES ("2020-01-01 10:45:20", "ETH", 1.2)', 'remotemysql')
db_query('INSERT INTO transactions VALUES ("2020-02-28", "BTC", -1)', 'remotemysql')
db_query('INSERT INTO transactions VALUES ("2020-04-13", "NEO", 100)', 'remotemysql')
db_query('INSERT INTO transactions VALUES ("2020-04-20 12:12:21", "LTC", 25)', 'remotemysql')Click here for a potential solution for the report ...
library(binancer)
library(httr)
library(data.table)
library(logger)
library(scales)
library(ggplot2)
library(zoo)
library(mr)
## ########################################################
## Loading data
## Read transactions from the DB
transactions <- db_query('SELECT * FROM transactions', 'remotemysql')
## Prepare daily balance sheets
balance <- transactions[, .(date = as.Date(date), amount = cumsum(amount)), by = symbol]
balance
## Transform long table into wide
balance <- dcast(balance, date ~ symbol)
balance
## Add missing dates
dates <- data.table(date = seq(from = Sys.Date() - 30, to = Sys.Date(), by = '1 day'))
balance <- merge(balance, dates, by = 'date', all.x = TRUE, all.y = TRUE)
balance
## Fill in missing values between actual balances
balance <- na.locf(balance)
## Fill in remaining missing values with zero
balance[is.na(balance)] <- 0
## Transform wide table back to long format
balance <- melt(balance, id.vars = 'date', variable.name = 'symbol', value.name = 'amount')
balance
## Get crypt prices
prices <- rbindlist(lapply(as.character(unique(balance$symbol)), function(s) {
binance_klines(paste0(s, 'USDT'), interval = '1d', limit = 30)[
, .(date = as.Date(close_time), symbol = s, usdt = close)]
}))
balance <- merge(balance, prices, by = c('date', 'symbol'), all.x = TRUE, all.y = FALSE)
## Merge USD/HUF rate
response <- GET(
'https://api.exchangeratesapi.io/history',
query = list(start_at = Sys.Date() - 30, end_at = Sys.Date(),
base = 'USD', symbols = 'HUF'))
exchange_rates <- content(response)$rates
usdhufs <- data.table(
date = as.Date(names(exchange_rates)),
usdhuf = as.numeric(unlist(exchange_rates)))
setkey(balance, date)
setkey(usdhufs, date)
balance <- usdhufs[balance, roll = TRUE]
## compute daily values in HUF
balance[, value := amount * usdt * usdhuf]
## ########################################################
## Report
ggplot(balance, aes(date, value, fill = symbol)) +
geom_col() +
ylab('') + scale_y_continuous(labels = forint) +
xlab('') +
theme_bw() +
ggtitle(
'My crypto fortune',
subtitle = balance[date == max(date), paste(paste(amount, symbol), collapse = ' + ')])Breaking down the a single run of the get_bitcoin_price function to see which component is slow and taking up resources:
library(profvis)
profvis({
get_bitcoin_price()
})A more realistic example: is ggplot2 indeed slow when generating scatter plots on a dataset with larger number of observations?
profvis({
library(ggplot2)
x <- ggplot(diamonds, aes(price, carat)) + geom_point()
print(x)
})Pipe VS Bracket:
library(data.table)
library(dplyr)
dt <- data.table(diamonds)
profvis({
dt[, sum(carat), by = color][order(color)]
group_by(dt, color) %>% summarise(price = sum(carat))
})
## run too quickly for profiling ...
library(microbenchmark)
microbenchmark(
aggregate(dt$carat, by = list(dt$color), FUN = sum),
dt[, sum(carat), by = color][order(color)],
group_by(dt, color) %>% summarise(price = sum(carat)),
times = 100)Also check out dtplyr.
Download and extract the database file:
## download database file
download.file('http://bit.ly/CEU-R-ecommerce', 'ecommerce.zip', mode = 'wb')
unzip('ecommerce.zip')Install the SQLite client on your operating system and then use the sqlite3 ecommerce.sqlite3 command to enter the command-line SQLite client to browse the database:
-- list tables in the database
.tables
-- show the structure of the sales table
.schema sales
-- show the first 5 rows of the table
select * from sales limit 5
-- tweak how the rows are shown
.headers on
.mode column
select * from sales limit 5
-- count number of rows in the table
SELECT COUNT(*) FROM sales;
-- count number of rows in January 2011 (lack of proper date/time handling in SQLite)
SELECT COUNT(*)
FROM sales
WHERE SUBSTR(InvoiceDate, 7, 4) || SUBSTR(InvoiceDate, 1, 2) || SUBSTR(InvoiceDate, 4, 2)
BETWEEN '20110101' AND '20110131'
-- check on the date format
SELECT InvoiceDate FROM sales ORDER BY random() LIMIT 25;
-- count the number of rows per month
SELECT
SUBSTR(InvoiceDate, 7, 4) || SUBSTR(InvoiceDate, 1, 2) AS month,
COUNT(*)
FROM sales
GROUP BY month
ORDER BY month;Let's switch to R!
Create a database config file for the dbr package:
ecommerce:
drv: !expr RSQLite::SQLite()
dbname: /path/to/ecommerce.sqlite3Update your dbr settings to use the config file:
library(dbr)
options('dbr.db_config_path' = '/path/to/database.yml')
options('dbr.output_format' = 'data.table')
sales <- db_query('SELECT * FROM sales', 'ecommerce')
str(sales)
## explore and fix the invoice date column
sales[, sample(InvoiceDate, 25)]
sales[, InvoiceDate := as.POSIXct(InvoiceDate, format = '%m/%d/%Y %H:%M')]
## number of sales per month like in SQL
library(lubridate)
sales[, .N, by = month(InvoiceDate)]
sales[, .N, by = year(InvoiceDate)]
sales[, .N, by = paste(year(InvoiceDate), month(InvoiceDate))]
# slow
sales[, .N, by = as.character(InvoiceDate, format = '%Y %m')]
# smart
sales[, .N, by = floor_date(InvoiceDate, 'month')]
## number of items per country
sales[, .N, by = Country]
sales[, .N, by = Country][order(-N)]invoices <- sales[, .(date = min(as.Date(InvoiceDate)),
value = sum(Quantity * UnitPrice)),
by = .(invoice = InvoiceNo, customer = CustomerID, country = Country)]
db_insert(invoices, 'invoices', 'ecommerce')Check the structure of the newly (and automatically) created table using the command-line SQLite client:
.schema invoicesCheck the date column after reading back from the database:
invoices <- db_query('SELECT * FROM invoices', 'ecommerce')
str(invoices)
invoices[, date := as.Date(date, origin = '1970-01-01')]revenue <- invoices[, .(revenue = sum(value)), by = date]
library(openxlsx)
wb <- createWorkbook()
sheet <- 'Revenue'
addWorksheet(wb, sheet)
writeData(wb, sheet, revenue)
## open for quick check
openXL(wb)
## write to a file to be sent in an e-mail, uploaded to Slack or as a Google Spreasheet etc
filename <- tempfile(fileext = '.xlsx')
saveWorkbook(wb, filename)
unlink(filename)
## static file name
filename <- 'report.xlsx'
saveWorkbook(wb, filename)Tweak that spreadsheet:
freezePane(wb, sheet, firstRow = TRUE)
setColWidths(wb, sheet, 1:ncol(revenue), 'auto')
poundStyle <- createStyle(numFmt = '£0,000.00')
addStyle(wb, sheet = sheet, poundStyle,
gridExpand = TRUE, cols = 2, rows = (1:nrow(revenue)) + 1, stack = TRUE)
conditionalFormatting(wb, sheet, cols = 2,
rows = 2:(nrow(revenue) + 1),
rule = '$B2<66788.35', style = greenStyle)
standardStyle <- createStyle()
conditionalFormatting(wb, sheet, cols = 2,
rows = 2:(nrow(revenue) + 1),
rule = '$B2<=66788.35', style = standardStyle)Add a plot:
addWorksheet(wb, 'Plot')
library(ggplot2)
library(ggthemes)
ggplot(revenue, aes(date, revenue)) + geom_line() + theme_excel()
insertPlot(wb, 'Plot')
saveWorkbook(wb, filename)
saveWorkbook(wb, filename, overwrite = TRUE)library(lubridate)
monthly <- invoices[, .(value = sum(value)), by = .(month = floor_date(date, 'month'))]
library(openxlsx)
wb <- createWorkbook()
sheet <- 'Summary'
addWorksheet(wb, sheet)
writeData(wb, sheet, monthly)
for (month in monthly$month) {
revenue <- invoices[floor_date(date, 'month') == month,
.(revenue = sum(value)), by = date]
addWorksheet(wb, as.character(month))
writeData(wb, as.character(month), revenue)
}
saveWorkbook(wb, 'monthly-report.xlsx')top10 <- sales[!is.na(CustomerID),
.(revenue = sum(UnitPrice * Quantity)), by = CustomerID][order(-revenue)][1:10]
library(openxlsx)
wb <- createWorkbook()
sheet <- 'Top Customers'
addWorksheet(wb, sheet)
writeData(wb, sheet, top10)
t <- tempfile(fileext = '.xlsx')
saveWorkbook(wb, t)
## upload file
library(googledrive)
drive_auth()
drive_update(media = t, file = 'top customers')
## instead of top10, let's do top25 ... so appending a few rows to an already existing spreadsheet
top25 <- sales[!is.na(CustomerID),
.(revenue = sum(UnitPrice * Quantity)), by = CustomerID][order(-revenue)][1:25]
library(googlesheets4)
gs4_auth()
for (i in 11:25) {
sheet_append('your.spreadsheet.id', data = top25[i])
}Create the mr R package described above with the forint and get_bitcoin_price functions, and push to a new repo in your GitHub account, so that you can install the package on any computer via remotes::install_github. Submit the URL to your GitHub repo in Moodle.
Create a new git branch in your (above created) git repo for the mr package, and introduce a new function there that queries historical exchange rates for any currency (so configurable symbol and base currency) for the past number of days. Example run:
> convert_currency('EUR', 'USD', 2)
date rate
1: 2020-05-12 1.0858
2: 2020-05-13 1.0875Don't forget about documenting the function!
Then push your changes (either in one or multiple commits) as a new branch to GitHub and create a pull request to merge to your master branch. Share the URL to your pull request on Moodle!
For pass:
- Merge your PR from the second week's homework!
- Read the "Testing" chapter from Hadley's "R Packages" book at http://r-pkgs.had.co.nz/tests.html
- Create a new branch in your R package, write a unit test for the
forintfunction to make sure thatforint(42)returns42 HUF, open a pull request and share the PR URL on Moodle!
For grade:
- Set up CI to automatically run your unit tests when pushing to GitHub (see https://r-pkgs.org/r-cmd-check.html)
- Set up a webpage for your package using
pkgdownand GitHub Pages (see https://pkgdown.r-lib.org)
Deadline: June 14, 2020 (midnight by CET)
- AWS Console: https://ceu.signin.aws.amazon.com/console
- Binance (cryptocurrency exchange) API: https://github.com/binance-exchange/binance-official-api-docs/blob/master/rest-api.md (R implementation available at https://github.com/daroczig/binancer)
- Foreign exchange rates API, eg https://exchangeratesapi.io
- Free MySQL database: https://remotemysql.com
- "Writing R Extensions" docs: https://cran.r-project.org/doc/manuals/r-release/R-exts.html
- Hadley Wickham's "R packages" book: http://r-pkgs.had.co.nz
- Hadley Wickham's "Advanced R" book (1st edition): http://adv-r.had.co.nz/
- The
tidyversestyle guide: https://style.tidyverse.org/ pkgdownpackage: https://pkgdown.r-lib.org/index.htmldbrpackage: https://github.com/daroczig/dbr