Best Practice for R :: CHEAT SHEET
Software
• Write code in the RStudio IDE
• Use Quarto for literate programming
• Use git to version-control your code and analysis
• Use GitHub to collaborate with other people

Packages
• Load packages in one place, with successive calls to library()
• Use the tidyverse for everyday wrangling, plotting, etc.
• Use tidymodels for modelling and machine learning
• Use {shiny}, {bslib} and {bs4Dash} for app development
• Use r-lib packages like {rlang}, {cli} and {glue} for low-level programming
• Use {renv} in long-term projects to track package dependencies
• GitHub stars are a good proxy for a package's quality. Not sure whether to use a package? If it has more than 200 stars on GitHub, it's probably good!

Databases
• Use {DBI} and {odbc} to connect to SQL databases
• Use helper functions to create connections

    connect_to_db <- function(db) {
      DBI::dbConnect(
        odbc::odbc(),
        Database = db
        # Hard-code common options here
      )
    }

    # Connect using the helper
    con <- connect_to_db("DWH")

Learning More
• For common data science tasks, see R for Data Science (2e)
• For package development, see R Packages (2e)
• For advanced programming, see Advanced R (2e)
• For app development, see Mastering Shiny

Projects
PROJECT CREATION
• Create a new project in RStudio using File > New Project > New Directory
• Do put projects in a single, local folder like C:\Users\your-name\Documents
• Don't put projects in locations controlled by OneDrive / iCloud (these don't play well with Git)

PROJECT STRUCTURE
Most projects should be structured like this:

    my-project/
      .gitignore          Tells git which files not to track
      .Rprofile           R code to run on startup
      R/                  Scripts in R/ should define functions for use elsewhere
        01-import.R
        02-tidy.R
      data/               Use folders like SQL/ and data/ for other file types
      SQL/
        costs.sql
      run-all.R           Use a top-level R script to run everything
      renv/               Records of package versions;
      renv.lock           created using renv::init()
      my-project.Rproj    A .Rproj file makes this directory an RStudio project
      README.md           Write the main facts about the project here

NB: usethis::use_description() + usethis::use_namespace() will turn this structure into a package!

Functions
• Write functions to reduce repetition or to increase clarity
• Write many small functions that call each other
• Define functions in dedicated scripts with corresponding names

WRITING FUNCTIONS: WORKFLOW
1. Repetitive, complex code; purpose clarified by comments

    a <- complex operation on a
    b <- complex operation on b
    c <- complex operation on c
    d <- complex operation on d

2. Complex logic abstracted into functions

    operate_on <- function(x) {
      complex operation on x
    }

3. Repetition reduced; clearer code; less need for comments

    a <- operate_on(a)
    b <- operate_on(b)
    c <- operate_on(c)
    d <- operate_on(d)

NAMING CONVENTIONS
    ✗ Bad (noun-like)      ✓ Good (verb-like)
    totals_getter()        compute_totals()
    modeller_func()        fit_model()
    project_data()         import_datasets()

Getting Help
CREATE A REPREX
• A minimal, reproducible example (reprex) should demonstrate the issue as simply as possible
• Copy your example code and run reprex::reprex() to embed errors, messages and outputs as comments
• Use your reprex in a question on Teams or Stack Overflow

    print("Hello " + "world!")
    #> Error in "Hello " + "world!": non-numeric argument to binary operator

This reprex minimally demonstrates the error you get when you try to use + for Python-style string concatenation.

ETIQUETTE WHEN ASKING QUESTIONS
    Don't                           Do
    Post screenshots of your code   Use reprex::reprex() and paste your code as text
    Include big files               Use dput() or tibble::tribble() to include a data sample
    Ignore messages or warnings     Ensure your code only fails where you're expecting it to

Styling
For other styling guidance, refer to the tidyverse style guide.

NAMING THINGS
• Use lower_snake_case for most objects (functions, variables, etc.)
• Title_Snake_Case may be used for column names
• Use only syntactic names where possible (include only letters, numbers, underscores and periods, and don't start with a number)

    library(dplyr)  # provides slice_sample() and mutate(), used below

    # Good (lower_snake_case everywhere):
    add1 <- function(x) x + 1
    first_letters <- letters[1:3]
    iris_sample <- slice_sample(iris, n = 5)

    # Bad (non-syntactic, not lower_snake_case):
    `add 1` <- function(x) x + 1
    FirstLetters <- letters[1:3]
    iris.sample <- slice_sample(iris, n = 5)

WHITESPACE
• Add spaces after commas and around operators like |>, %>%, +, -, *, /, = and <-
• Indentation should always increase by exactly 2 spaces
• Add line breaks when lines get longer than 80 characters
• When there are many arguments in a call, give each argument its own line (including the first one!)

    # Good (lots of spaces, indents always by +2):
    df <- iris |>
      mutate(
        Sepal.Area = Sepal.Width * Sepal.Length,
        Petal.Area = Petal.Width * Petal.Length
      )

    # Bad (inconsistent spacing and indentation):
    df<-iris |>
    mutate(Sepal.Area=Sepal.Width*Sepal.Length,
          Petal.Area=Petal.Width*Petal.Length)
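The Databases section's helper pattern can be tried without an ODBC data source. The sketch below swaps odbc::odbc() for RSQLite::SQLite() with an in-memory database. That swap is an assumption: {RSQLite} is not mentioned on this sheet and must be installed. The helper name connect_to_sqlite() is hypothetical. The idea is the same as connect_to_db(): hard-code shared options once, then connect with a single call.

```r
# Hypothetical helper mirroring connect_to_db(), but using the RSQLite
# driver (an assumption) so it runs without an ODBC data source
connect_to_sqlite <- function(db = ":memory:") {
  DBI::dbConnect(
    RSQLite::SQLite(),
    dbname = db
    # Hard-code common options here
  )
}

# Connect, write a small table, query it, then disconnect
con <- connect_to_sqlite()
DBI::dbWriteTable(con, "mtcars", mtcars)
n_cars <- DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM mtcars")
DBI::dbDisconnect(con)
```

Because every connection goes through one function, changing a shared option (a driver, a timeout, a server name) means editing a single place.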
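The Projects section puts function definitions in R/ and uses a top-level run-all.R to run everything. One possible way to wire that up is sketched below. source_scripts() is a hypothetical helper, and relying on numbered file prefixes (01-, 02-) to set the run order is an assumption based on the example file names.

```r
# Hypothetical helper for run-all.R: source every .R file in a folder,
# in filename order (so 01-import.R runs before 02-tidy.R)
source_scripts <- function(dir = "R") {
  scripts <- sort(list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE))
  for (script in scripts) {
    source(script)  # defines the script's functions in the global environment
  }
  invisible(scripts)
}

# A run-all.R might then look like this (all names are hypothetical):
# library(dplyr)               # load all packages in one place
# source_scripts("R")
# raw <- import_datasets()
# tidy <- tidy_datasets(raw)
```

With this design the scripts in R/ only define functions. All the actual work is triggered from run-all.R, so re-running the whole analysis is a single source("run-all.R").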
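To make the Functions workflow concrete, here is a runnable version of steps 1 to 3 using base R only. The "complex operation" is assumed to be rescaling a numeric vector to the range 0 to 1. The data and the name rescale_01() are hypothetical, and the name is chosen to be verb-like per the naming conventions.

```r
# A small data frame to work on (illustrative data)
df <- data.frame(a = c(1, 5, 9), b = c(10, 20, 30), c = c(0, 2, 4))

# Step 1 (before): the same logic copied for each column, e.g.
# df$a <- (df$a - min(df$a)) / (max(df$a) - min(df$a))
# df$b <- (df$b - min(df$b)) / (max(df$b) - min(df$b))

# Step 2: abstract the logic into a small, verb-named function
rescale_01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Step 3: the repetition collapses to one readable call per column
df$a <- rescale_01(df$a)
df$b <- rescale_01(df$b)
df$c <- rescale_01(df$c)
```

Once the function exists, step 3 could be shortened further with df[] <- lapply(df, rescale_01). Note that this sketch returns NaN for a constant vector (0 / 0), which a real version might guard against.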
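For completeness, the error in the Getting Help reprex goes away once + is replaced by R's own string concatenation functions. This is standard base R; the sheet itself only shows the failing code.

```r
# In R, + only works on numbers; use paste0() (no separator) or
# paste() (space separator by default) to join strings
greeting <- paste0("Hello ", "world!")
greeting_2 <- paste("Hello", "world!")
print(greeting)
#> [1] "Hello world!"
```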
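The etiquette table suggests dput() for sharing a data sample. dput() prints R code that recreates an object, so you can paste a small sample into a question as text instead of attaching a big file. Using head(iris, 3) as the sample is purely illustrative.

```r
# Take a small sample rather than the whole dataset
sample_data <- head(iris, 3)

# dput() writes R code that rebuilds the object; paste its output into your question
dput(sample_data)

# Anyone can run that code to get an identical copy of the sample
sample_code <- paste(capture.output(dput(sample_data)), collapse = "\n")
rebuilt <- eval(parse(text = sample_code))
```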
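The Styling naming rules can also be applied mechanically when cleaning up inherited code or column names. The sketch below uses only base R regular expressions. to_snake_case() is a hypothetical helper, and it does not cover every case: for example, it doesn't fix names that start with a number.

```r
# Hypothetical helper: convert names like "FirstLetters" or "iris.sample"
# to lower_snake_case using POSIX character classes
to_snake_case <- function(x) {
  x <- gsub("([[:lower:][:digit:]])([[:upper:]])", "\\1_\\2", x)  # split camelCase
  x <- gsub("[^[:alnum:]]+", "_", x)                               # non-alphanumerics -> "_"
  x <- gsub("^_+|_+$", "", x)                                      # trim stray underscores
  tolower(x)
}

to_snake_case(c("FirstLetters", "iris.sample", "add 1"))
```

Check the results by hand before renaming anything. Automated conversion can't tell which capitals belong together: "HTTPServer", for instance, becomes "httpserver".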
CC BY SA Jacob Scott • github.com/wurli • Updated: 2023-11