---
title: "R course"
author: "Berry Boessenkool, <[email protected]>"
site: bookdown::bookdown_site
documentclass: book
output:
  bookdown::gitbook:
    number_sections: false
    toc_depth: 1
    split_by: chapter
    config:
      toc:
        collapse: section
editor_options:
  chunk_output_type: console
---
# welcome {#welcome}
**to programming with R!**\
I hope learning to code will change your life like it did mine :).
<img src="https://github.com/brry/course/raw/master/externalfig/Blogo.png" width="100"/>
I have been teaching R courses with great joy since 2012, see [brry.github.io](https://brry.github.io).\
This website is here to help you start your coding journey.
My free online courses are fairly well suited for learning to program on your own.
- Short [R intro](https://github.com/brry/hour) (ca 1-5 hours)
- **Fundamentals of Programming** [videos + autograded exercises](https://open.hpi.de/courses/hpi-dh-fprog2025), slides:
    - [Infrastructure](https://www.dropbox.com/scl/fi/f3ybm79lfecg51ywoag4l/FP_infrastructure.pdf?rlkey=sdqv7k1rxac1bk45hgxn54r2r&dl=0) (ca 1-3 hours)
    - [Python](https://www.dropbox.com/scl/fi/mfsj541kd2tw3pc99uvkq/PyCourse.pdf?rlkey=olwpsady0qqnzvt7v9xyorkve&dl=0) (ca 20-50 hours)
    - [R](https://www.dropbox.com/scl/fi/d07pma4zbnlmdto455s0g/Rcourse.pdf?rlkey=1oo6bi6k8u0u3850nki7zzmfj&dl=0) (ca 30-70 hours)
- [German R MOOC](https://open.hpi.de/courses/programmieren-r2022) (ca 20-50 hours)
- [German Python MOOC](https://open.hpi.de/courses/python2024) (ca 10-40 hours)
In addition, feel free to book me as a trainer.
**A few notes on this website:**\
The source code is available at [github.com/brry/course/docs](https://github.com/brry/course/blob/master/docs/index.Rmd).\
In case the table of contents on the left is not shown, click the four bars at the top.\
*Pro tip: the arrow left/right keys jump between chapters.*
# install {#install}
First install R itself and then RStudio. Follow the steps below depending on your operating system.\
Jump to section: [Windows](#windows), [Mac](#mac), [Linux](#linux), [Manjaro](#manjaro-linux)
### Windows {#windows}
- install [R](https://cloud.r-project.org/bin/windows/base/release.htm)
- install [RStudio](https://posit.co/download/rstudio-desktop/)
- change the .RData [settings](#settings) to enhance reproducibility
### Mac {#mac}
Either
- install [homebrew](https://brew.sh/)
- in the terminal, run (one after the other)\
`brew install --cask r`\
`brew install --cask rstudio`
- change the .RData [settings](#settings) to enhance reproducibility
or
- install [Xquartz](https://www.xquartz.org/)
- from [cran.r-project.org/bin/macosx](https://cran.r-project.org/bin/macosx), download the latest release, e.g. `R-4.5.2.pkg`
- open the file with 'Installer' and follow the instructions
- install [RStudio](https://posit.co/download/rstudio-desktop/)
- drag the app to the 'Applications' folder
- change the .RData [settings](#settings) to enhance reproducibility
### Linux {#linux}
- open a terminal (CTRL+ALT+T) and paste (CTRL+SHIFT+V) the lines from the [Ubuntu instructions](https://cloud.r-project.org/bin/linux/ubuntu/) one by one, but:
    - instead of the last line, use\
      `sudo apt install r-base r-base-dev`
    - on Linux **Mint**, replace `$(lsb_release -cs)` with e.g. `jammy` or `noble`
    - on **Debian**, use this [link](https://cloud.r-project.org/bin/linux/debian/#supported-branches) and
      [key](https://cloud.r-project.org/bin/linux/debian/#secure-apt)
- download the [RStudio](https://posit.co/download/rstudio-desktop/) deb file and run:\
`sudo apt install gdebi-core`\
`sudo gdebi rstudio-*.deb`
- change the .RData [settings](#settings) to enhance reproducibility
- if you expect to install many packages with non-R dependencies, check out [r2u](https://eddelbuettel.github.io/r2u/)
### Manjaro Linux {#manjaro-linux}
- enable [AUR](https://www.fosslinux.com/4278/what-is-aur-and-how-to-enable-it-in-manjaro.htm)
- search for `rstudio-desktop` and install it. R will be installed with it. This may take up to an hour.\
*(Instructions by Frank de Boer).*
- change the .RData [settings](#settings) to enhance reproducibility
# settings {#settings}
RStudio settings I strongly suggest for reproducibility:
- [Tools]{style="background-color:aquamarine"} - [Global Options]{style="background-color:aquamarine"} - General
    - **OFF**: Restore .Rdata into workspace at startup\
    - Save workspace to .RData on exit: **NEVER**\
    - [Rather restore the results of long computations [manually](#saveload) if needed.]{style="color:grey"}
Settings I use for compatibility and ease of use:
- [Tools]{style="background-color:aquamarine"} - [Global Options]{style="background-color:aquamarine"} - General - Advanced
    - set a project user data directory - *if you work in Dropbox / Google Drive, see [issue](https://github.com/rstudio/rstudio/issues/14778) & [solution](https://github.com/rstudio/rstudio/pull/14875)*
- [Tools]{style="background-color:aquamarine"} - [Global Options]{style="background-color:aquamarine"} - [Code]{style="background-color:palegreen"} - Editing
    - **ON**: Use native pipe operator
    - **OFF**: Auto-indent code after paste - *if you have a custom indenting scheme*
    - **OFF**: Enable Code snippets - *personal preference of mine*
- [Tools]{style="background-color:aquamarine"} - [Global Options]{style="background-color:aquamarine"} - [Code]{style="background-color:palegreen"} - Display
    - **ON**: Show margin (Margin column: 80) - *avoid horizontal scrolling!*
    - **ON**: Highlight R function calls
    - **ON**: Use rainbow parentheses
- [Tools]{style="background-color:aquamarine"} - [Global Options]{style="background-color:aquamarine"} - [Code]{style="background-color:palegreen"} - Saving
    - Line ending conversion: **Windows (CR/LF)** - *compatible across OS platforms*
    - Default Text Encoding: **UTF-8**
- [Tools]{style="background-color:aquamarine"} - [Global Options]{style="background-color:aquamarine"} - Appearance
    - [Editor font size: 10]{style="color:grey"}
    - Editor theme: **Cobalt**
- [Tools]{style="background-color:aquamarine"} - [Global Options]{style="background-color:aquamarine"} - Spelling
    - **OFF**: Use real time spell-checking
- [Tools]{style="background-color:aquamarine"} - Modify Keyboard Shortcuts
    - To set the following and more keyboard shortcuts, see [rskey](https://github.com/brry/rskey#rskey)`::setKeyboardBindings()`
    - Remove `CTRL+Y` from the command "paste last yank" (to mean "redo" as in other programs)
    - Set Working Directory to Current Document's Directory: `CTRL + H`
# good practice {#good-practice}
Apply best practices when coding!
- Write readable code that is easy to maintain (see below).
- Use [version control](#git) with a single source of truth (SSOT). Do not have duplicate code versions in both .qmd and .R/py scripts.
- For reports / presentations, use [quarto](#quarto) instead of separate scripts, Jupyter notebooks, images, code outputs and Word documents.
### good code
- clear
- simple (one job per function)
- documented
- performant
- bug-free
- well tested
The following points are written for R, but apply (in spirit) to Python and other languages as well.
See also the [RStudio best practice cheatsheet](https://rstudio.github.io/cheatsheets/R-best-practice.pdf).
### organisation / workflow
- Use RStudio projects (File -\> New project). They set the working directory and manage settings & opened scripts.
- Never use `setwd()`: others don't have that path and neither do you, after rearranging folders.
- Use relative path names, e.g. `read.table("datafolder/file.txt")` instead of `"C:/Users/berry/Desktop/Project/datafolder/file.txt"`.
- Put `source("functions.R")` in your main (quarto) script (see [project](#project)), or write your own [package](https://github.com/brry/course/blob/master/data/packdev.R).
- Use short informative folder + script names without spaces.
- Reference the source / authors of copied code (including chatbot model info).
### code format
- Follow a style guide consistently ([example](http://adv-r.had.co.nz/Style.html)).
- Choose short but descriptive object names. `df`, `data`, `X` are not!
- Use expressive verbs for function names. Functions *do* something.
- Functions should call each other, instead of being one big multi-purpose monster.
- Use RStudio script sections `# 1 clean data ----` for an outline (`CTRL`+`SHIFT`+`O`).
- Use line breaks (with indentation) to avoid horizontal scrolling (margin [settings](#settings)).
- In qmd text sections, use line breaks for nicer version control history.
- In qmd documents, use short code chunk names (labels) with no spaces.
### code quality
- Vectorize code whenever possible.
    - If not, use `lapply/sapply` instead of `for` loops (lesson 4.3 and 8.3).
- DRY: don't repeat yourself.
- Write [defensive code](https://www.r-bloggers.com/2018/07/the-ten-rules-of-defensive-programming-in-r/) that checks inputs (lesson 8.1).
- Use arrays for all-numeric data (lesson 4.4).
- Do not load \>2 packages from the library, instead use `pack::fun`.
- Install [packages](#packages) conditionally.
- Do not create more objects than needed, clean up with `rm`.
- Make sure your code runs in a clean session:
    - `CTRL`+`SHIFT`+`F10` to restart R with a clean workspace (Rdata [settings](#settings))
    - `source()` the entire script with `CTRL`+`SHIFT`+`S`.
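As a minimal sketch of the points on vectorized and defensive code: the function name `scale_to_unit` and its specific checks are made up for illustration, they are not part of the course material.

```r
# hypothetical helper: rescale a numeric vector to the range 0..1
scale_to_unit <- function(x) {
  # check inputs early and fail with an informative message
  if(!is.numeric(x)) stop("x must be numeric, not ", class(x)[1])
  if(length(x) < 2)  stop("x must have at least 2 values, not ", length(x))
  rng <- range(x, na.rm=TRUE)
  if(rng[1] == rng[2]) stop("all values in x are identical, cannot rescale")
  # vectorized computation on the whole vector, no loop needed
  (x - rng[1]) / (rng[2] - rng[1])
}

scale_to_unit(c(5, 10, 15)) # valid input
try(scale_to_unit("a"))     # invalid input gives a clear error message
```

Failing early with a specific message is usually easier to debug than a cryptic error from deep inside a computation.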
To practice writing good R code, improve the examples in [elegant code](#elegant-code).
# elegant code {#elegant-code}
#### get the average Sepal.Length per Species
■ Consider a better alternative:
```{r elegant_coding1a, eval=FALSE}
groups <- levels(iris$Species)
averages <- c()
for(g in groups) averages[g] <-
    mean(iris$Sepal.Length[iris$Species==g])
rm(g, groups)
averages
```
<details>
<summary>then click here to see the elegant approach:</summary>
```{r elegant_coding1b, eval=FALSE}
tapply(iris$Sepal.Length, iris$Species, mean)
```
</details>
<br>
#### conditionally set invalid records to NA
■ Consider a better alternative:
```{r elegant_coding2a, eval=FALSE}
for(i in c(1:nrow(DF))){
  if(DF[i, "column1"] != "valid"){ # Where column1 is not
    DF[i, "column2"] <- NA         # 'valid', set
  }                                # column2 to NA
}
```
<details>
<summary>then click here to see the elegant approach:</summary>
```{r elegant_coding2b, eval=FALSE}
DF$column2[DF$column1 != "valid"] <- NA
```
</details>
<br>
#### collect results
■ Consider a better alternative:
```{r elegant_coding3a, eval=FALSE}
results <- c()
for(i in 1:10000) {
  results <- c(results, some_calculation(i))
}
```
<details>
<summary>then click here to see the elegant approach:</summary>
```{r elegant_coding3b, eval=FALSE}
results <- sapply(1:10000, some_calculation)
```
</details>
<br>
#### scale numeric columns
■ Consider a better alternative:
```{r elegant_coding4a, eval=FALSE}
for(col in names(DF)) {
  if(is.numeric(DF[ ,col])) {
    mean_val <- mean(DF[ ,col], na.rm=TRUE)
    sd_val <- sd(DF[ ,col], na.rm=TRUE)
    for(i in 1:nrow(DF)) {
      DF[i, col] <- (DF[i, col] - mean_val) / sd_val
    }
  }
}
```
<details>
<summary>then click here to see the elegant approach:</summary>
```{r elegant_coding4b, eval=FALSE}
numcols <- sapply(DF, is.numeric)
DF[ ,numcols] <- scale(DF[ ,numcols])
rm(numcols)
```
</details>
<br>
#### read multiple csv files
■ Consider a better alternative:
```{r elegant_coding5a, eval=FALSE}
file1 <- read.csv("data1.csv")
file2 <- read.csv("data2.csv")
file3 <- read.csv("data3.csv")
# ... repeat for 50 files
combined <- rbind(file1, file2, file3) # ... and so on
```
<details>
<summary>then click here to see the elegant approach:</summary>
```{r elegant_coding5b, eval=FALSE}
files <- list.files(pattern="\\.csv$", full.names=TRUE) # pattern is a regular expression
combined <- do.call(rbind, lapply(files, read.csv))
```
</details>
<br>
#### count occurrences
■ Consider a better alternative:
```{r elegant_coding6a, eval=FALSE}
categories <- unique(DF$category)
counts <- numeric(length(categories))
names(counts) <- categories
for(i in 1:nrow(DF)) {
  cat <- DF$category[i]
  counts[cat] <- counts[cat] + 1
}
```
<details>
<summary>then click here to see the elegant approach:</summary>
```{r elegant_coding6b, eval=FALSE}
table(DF$category)
```
</details>
<br>
#### create age groups
■ Consider a better alternative:
```{r elegant_coding7a, eval=FALSE}
DF$age_group <- NA
for(i in 1:nrow(DF)) {
  if(DF$age[i] >= 0 & DF$age[i] < 18) DF$age_group[i] <- "0-17"
  else if(DF$age[i] >= 18 & DF$age[i] < 30) DF$age_group[i] <- "18-29"
  else if(DF$age[i] >= 30 & DF$age[i] < 50) DF$age_group[i] <- "30-49"
  else if(DF$age[i] >= 50 & DF$age[i] < 65) DF$age_group[i] <- "50-64"
  else if(DF$age[i] >= 65) DF$age_group[i] <- "65+"
  else DF$age_group[i] <- NA
}
```
<details>
<summary>then click here to see the elegant approach:</summary>
```{r elegant_coding7b, eval=FALSE}
DF$age_group <- cut(DF$age,
                    breaks = c(0, 18, 30, 50, 65, Inf),
                    labels = c("0-17", "18-29", "30-49", "50-64", "65+"),
                    right = FALSE)
```
</details>
<br>
More examples are shown in the [fundamentals of programming](#welcome) tutorial slides.
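When replacing a loop with a vectorized call, it may help to confirm that both versions give the same result. A small sketch using the built-in `iris` data from the first example; the object name `elegant` is only chosen for this illustration.

```r
# loop version
averages <- c()
for(g in levels(iris$Species))
  averages[g] <- mean(iris$Sepal.Length[iris$Species==g])

# vectorized version
elegant <- tapply(iris$Sepal.Length, iris$Species, mean)

# tapply returns a 1D array, hence compare values and names separately
all.equal(unname(averages), as.vector(elegant))
identical(names(averages), names(elegant))
```

Both comparisons should yield `TRUE` before the loop version is deleted.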
# packages {#packages}
### usage
Whenever possible, use `pack::fun()` instead of `library("pack") ; fun()`. It makes clear from which package `fun` is used.\
Otherwise, when multiple packages have `fun`, the one from the most recently loaded package will be used. And that might not be the one you expect.\
When you use multiple functions from a package, the second option is fine of course.
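The masking effect can be reproduced without installing anything. In this sketch, the clash is simulated by defining an object named `sd`, a made-up stand-in for a second package that also provides a function `sd`:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# simulated name clash (assumption for this demo, not a real package)
sd <- function(x) "not the standard deviation"

sd(x)        # finds the most recently defined 'sd'
stats::sd(x) # explicit namespace: always the function from package 'stats'

rm(sd)       # remove the masking object again
```

With the `pack::fun()` notation, the result does not depend on what else happens to be defined or loaded.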
### installation
Installing add-on R packages usually is easy from within R (and works without admin rights):
``` r
install.packages("ggplot2")
```
For potential installation [issues](#issues), see below.
At the top of a script, conditionally install all needed packages.
For a single package, you could use
```{r packinst, eval=FALSE}
# if package cannot be loaded, install it:
if(!requireNamespace("berryFunctions", quietly=TRUE))
  install.packages("berryFunctions")
```
To depend on a certain development version, use
```{r packver, eval=FALSE}
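# install if missing (packageVersion fails for uninstalled packages) or outdated:
if(!requireNamespace("berryFunctions", quietly=TRUE) ||
   packageVersion("berryFunctions") < "1.19.3")
  {
  if(!requireNamespace("remotes", quietly=TRUE)) install.packages("remotes")
  remotes::install_github("brry/berryFunctions")
  }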
```
When using several packages, use
```{r packman, eval=FALSE}
if(!requireNamespace("pacman", quietly=TRUE)) install.packages("pacman")
pacman::p_load("berryFunctions", "rdwd")
```
If you know of a short, elegant way to conditionally install packs without loading them, please let me know. (`pak::pkg_install` re-installs packages and creates local directories, no bueno.)
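One possible approach, offered as a sketch rather than a tested recommendation: compare the needed names to the installed packages and install only the difference. The helper name `missing_packages` is made up here. Note that `installed.packages()` can be slow with large libraries.

```r
# hypothetical helper: which of the needed packages are not installed yet?
missing_packages <- function(needed)
  setdiff(needed, rownames(installed.packages()))

# package names as in the example above
to_install <- missing_packages(c("berryFunctions", "rdwd"))
to_install

# install only what is missing; nothing gets loaded or attached:
# if(length(to_install) > 0) install.packages(to_install)
```

The installation line is commented out so the sketch can be run without side effects.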
### issues {#issues}
Here are solutions to some issues I have encountered in the past while installing packages with external (i.e. non-R) dependencies.
#### rJava on Windows
Check if Java is available. There should be no errors when running (in R):
```
install.packages("rJava") ; library(rJava)
```
If necessary, install [Java](http://www.java.com/de/download/manual.jsp) in the same bit-version as R (eg 64bit). The Java binary file must be on the [search path](http://www.java.com/en/download/help/path.xml), which will normally happen automatically.
In case you run into the 32/64 bits error: "JAVA_HOME cannot be determined from the Registry", try installing the package with no multiarchitecture support, e.g.: `remotes::install_github("brry/OSMscale", build_opts="--no-multiarch")`
#### rJava on Linux
Open a terminal (CTRL+ALT+T) and paste (CTRL+SHIFT+V) all lower-cased:
```
sudo apt-get install r-cran-rjava
```
Here's the [list of other supported packages](https://cran.r-project.org/bin/linux/ubuntu/fullREADME.html#supported-packages) using this mechanism.\
You might first have to run something like ([source](https://launchpad.net/~c2d4u.team/+archive/ubuntu/c2d4u4.0+?field.series_filter=focal)):
```
sudo add-apt-repository ppa:c2d4u.team/c2d4u4.0+
```
#### sf
If `install.packages("sf")` on Linux does not work, you can try the following:
```
sudo apt-get install libudunits2-dev
sudo add-apt-repository ppa:ubuntugis/ppa && sudo apt-get update
sudo apt-get install gdal-bin
sudo apt install libgdal-dev libproj-dev
```
See also [thinkr.fr](https://rtask.thinkr.fr/installation-of-r-4-0-on-ubuntu-20-04-lts-and-tips-for-spatial-packages/) on R4 on Ubuntu 20.04
#### gdal
Probably obsolete with the retirement of `rgdal`, but just for reference:
```
sudo apt update
sudo apt install libgdal-dev libproj-dev
```
#### source
If you cannot install a package, you might be able to `source` some functions.\
Download the package zip folder on github (see [git](#git)) and then run:
``` r
Vectorize(source)(dir("unzipped_package_path/R", full=TRUE))
```
This creates all R functions as objects in your globalenv workspace (and overwrites existing objects of the same name!).
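To avoid overwriting objects in the workspace, the files could instead be sourced into a separate environment. A sketch under assumptions: the temporary folder and the function `hello` are created here only to make the example self-contained, and `pack_env` is a made-up name.

```r
# stand-in for an unzipped package: a temporary folder with one R file
rdir <- file.path(tempdir(), "unzipped_package_path", "R")
dir.create(rdir, recursive=TRUE, showWarnings=FALSE)
writeLines('hello <- function(name) paste0("Hello, ", name, "!")',
           file.path(rdir, "hello.R"))

# source all files into their own environment instead of globalenv
pack_env <- new.env()
for(f in dir(rdir, full.names=TRUE)) source(f, local=pack_env)

ls(pack_env)            # objects now available in pack_env
pack_env$hello("world") # call the functions with the $ operator
```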
# project
**Start a research project under version control**
- install [R, RStudio](#install) and [git](#git)
- create a well-named github repository (<https://github.com/new>), initialize with Readme
- `Code` -\> Copy URL (SSH)
- RStudio -\> File -\> New Project -\> Version Control -\> Git: paste URL, set subdirectory, create project.
- RStudio -\> File -\> New File -\> R script / quarto document
- follow [good practices](#good-practice)
- work, then commit changes and push to github
**Organize a research project**
in the git folder you could have something like:
```
project/
├── raw_data_large/
├── reduce_data_size.R
├── raw_data_small/
│   ├── file1.csv
│   └── file2.csv
├── process_data.R
├── data_full.csv
├── functions.R
├── test_functions.R
└── main_file.qmd
```
- `raw_data_large/` (only locally, i.e. listed in `.gitignore`)
- `reduce_data_size.R`: read big files, select interesting bits, store in `raw_data_small/` with `write.csv`. If you have (many) text entries with commas but no tabstops, use `write.table(..., sep="\t", row.names=FALSE, fileEncoding="UTF-8")` instead.
- `process_data.R` with
```{r, proj_process, eval=FALSE}
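# assumption: all csv files in raw_data_small/ are to be combined
csvfiles <- list.files("raw_data_small", pattern="\\.csv$", full.names=TRUE)
data_csv <- lapply(csvfiles, read.csv)
data_full <- Reduce(merge, data_csv)
write.csv(data_full, "data_full.csv", row.names=FALSE)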
```
- `functions.R` with
```{r, proj_functions, eval=FALSE}
helper <- function(x) x
analyze <- function(df) sapply(df, helper)
visualize <- function(column) plot(analyze(full_data)[,column])
```
- `test_functions.R` with
```{r, proj_, eval=FALSE}
source("functions.R")
helper(input) == expected
checkmate::assert_number(helper(example))
testthat::expect_equal(analyze(example_df), expected)
res <- analyze(example_df)
if(res != expected) stop("analyze(example_df) should be ", expected, ", not ", res)
```
- `main_file.qmd` with code chunks for
```{r, proj_qmd, eval=FALSE}
full_data <- read.csv("data_full.csv")
source("functions.R")
visualize("columnname")
```
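How `Reduce(merge, ...)` in `process_data.R` combines several tables can be sketched with small made-up data frames (the column names `id`, `height`, `weight` and `age` are assumptions for illustration):

```r
data_csv <- list(
  data.frame(id=1:3,       height=c(170, 180, 165)),
  data.frame(id=1:3,       weight=c(65, 80, 55)),
  data.frame(id=c(1L, 3L), age=c(30, 40))
)

# merge joins two data frames on their common column(s), here 'id'.
# Reduce applies it successively: merge(merge(df1, df2), df3)
data_full <- Reduce(merge, data_csv)
data_full
```

With the default settings of `merge`, only the rows present in all tables are kept. Use `all=TRUE` in a wrapper function if incomplete rows should be retained.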
# git {#git}
git is version control software. With it, you can download repositories, track your changes and develop code collaboratively.
### install git
This can be tedious, but needs to be done only once.
- If not already done before: create a github account at <https://github.com/signup>. Use a short username!
- Download and install git, see <https://git-scm.com/downloads>
- Restart RStudio (if open)
- Connect git to RStudio: (instructions [with screenshots](https://www.r-bloggers.com/rstudio-and-github))
    - RStudio -\> Tools -\> Global Options -\> Git/SVN: Ensure the path to the Git executable is correct.
    - `Create SSH Key`, close window
    - `View public key`: copy the displayed public key
    - go to <https://github.com/settings/keys> and click `New SSH key`, paste the public key
- in the RStudio bottom Tab `Terminal`: type the following:
```
git version # just to see git works fine
git config --global user.email "[email protected]"
git config --global user.name "YourUserNameHere"
```
### use git
To clone a git repo, RStudio handles most of the work for you.
1. Go to a repo (for example [github.com/brry/fpsetup](https://github.com/brry/fpsetup)) and click on [Code]{style="background-color:lime"} - Copy URL
2. At RStudio - File - New Project - Version Control - Git,\
   paste the repository URL, set the subdirectory and create project.\
   *I recommend keeping the Project directory name so local and github folder names match exactly.*
3. From now on, get the latest version with a single click on `Pull`:
   <img src="https://github.com/brry/course/raw/master/externalfig/git_clone_6.PNG" width="400"/>
### git resources
Happy Git with R: <https://happygitwithr.com>\
Excellent tutorial on git in general (mostly without RStudio): <http://kbroman.org/github_tutorial>\
HPI course: <https://open.hpi.de/courses/git2020>\
Contribute to OS software: <https://egghead.io/courses/how-to-contribute-to-an-open-source-project-on-github>
# quarto {#quarto}
quarto documents allow you to mix code and markdown text to generate hassle-free and reproducible reports / presentations / websites. It's a full publishing ecosystem, see [quarto.org](https://quarto.org/). :)
In **RStudio**, click File -\> New File -\> Quarto Document.\
Click "Create" for a template, or "Create Empty Document" on the left for just the file.
In **VScode** under [extensions](https://marketplace.visualstudio.com/items?itemName=quarto.quarto), search for quarto and install quarto-vscode.\
Create a qmd file and for a template, copy content e.g. from the [quarto guide](https://quarto.org/docs/get-started/hello/vscode.html#render-and-preview).

# resources
Reference cards
- [RefCard](https://github.com/jonasstein/R-Reference-Card/raw/master/R-refcard.pdf) by Tom Short & Jonas Stein
- [base](https://rstudio.github.io/cheatsheets/base-r.pdf) and [advanced](https://rklopotek.blog.uksw.edu.pl/files/2017/09/advancedR.pdf) cheatsheets from [Posit](https://posit.co/resources/cheatsheets/)
Books
- Grolemund & Wickham (2017): [R for Data Science](https://r4ds.hadley.nz/)
- J. Adler (2010): R in a Nutshell
- U. Ligges (2008): Programmieren mit R (German)
- M. Crawley (2007): The R-book
- H. Wickham (2014): [Advanced R](https://adv-r.hadley.nz/)
- H. Wickham (2015): [R Packages](https://r-pkgs.org/)
- domain specific: [Chapman and Hall R Series](https://www.routledge.com/Chapman--HallCRC-The-R-Series/book-series/crctherser)
- Many more listed at [github.com/RomanTsegelskyi/rbooks](https://github.com/RomanTsegelskyi/rbooks)
- Review list at [ecotope.org](https://web.archive.org/web/20130619094650/http://ecotope.org/blogs/page/R-Book-Review.aspx) or [r4stats.com](http://r4stats.com/articles/book-reviews)
The internet
- [R-weekly](https://rweekly.org/) - weekly newsletter about all things R
- [Rbloggers](https://www.r-bloggers.com/) - blog aggregator about R
- [StackOverflow](https://stackoverflow.com/questions/tagged/r) - programming questions (main resource)
- [CrossValidated](https://stats.stackexchange.com) - statistical questions
- [rseek.org](https://rseek.org) - R focused internet search
- [R-Manuals](https://cran.r-project.org/manuals.html) - official introduction to the language
- [Mailing lists](https://www.r-project.org/mail.html)
- [stat545](https://stat545.com) - excellent online tutorial
- [Shiny](https://shiny.posit.co/m) - web application framework for R
- [quarto markdown](https://quarto.org/) - document generation and publishing framework
- [Github guides](https://docs.github.com/en) - Introduction to github
# AI
pro AI in coding
- quickly solves a problem
- works in unfamiliar coding areas, concepts, languages
- suggests unknown functions and approaches
contra AI in coding
- AI generated solutions often do not represent [best practices](#good-practice)
- false sense of competence may become problematic in debugging complex issues
- critical thinking is (and remains) important
- structured thinking is helpful
- actually learning to code yourself is rewarding and conducive to these goals
- make mistakes, debug, struggle, research, think, build problem solving skills
- develop good coding habits from the beginning
detailed problems
- Coders with AI assistance write more [security leaks](https://www.louisbouchard.ai/genai-coding-risks) (with less awareness of them)
- AI assistants actually [slow down software development](https://open.substack.com/pub/mikelovesrobots/p/wheres-the-shovelware-why-ai-coding) ([reproducible](https://arxiv.org/abs/2507.09089))
- AI-generated code contains [more bugs and errors](https://www.techradar.com/pro/security/ai-generated-code-contains-more-bugs-and-errors-than-human-output) than human output
- AI coding tools chase phantom bugs and [destroy real production databases](https://arstechnica.com/information-technology/2025/07/ai-coding-assistants-chase-phantoms-destroy-real-user-data/)
- Vibe coding creates [unmaintainable code](https://open.substack.com/pub/addyo/p/vibe-coding-is-not-the-same-as-ai) without engineering discipline
- AI coding collapses under real-world pressure, requires [expertise for final 30%](https://open.substack.com/pub/addyo/p/the-70-problem-hard-truths-about)
- Coding agents create [cybersecurity vulnerabilities](https://open.substack.com/pub/garymarcus/p/llms-coding-agents-security-nightmare) in numerous ways
- 95% of companies see [no return on AI investment](https://www.reddit.com/r/cscareerquestions/s/JHrH5pGxDU)
(reasons [why](https://open.substack.com/pub/garymarcus/p/why-is-the-roi-on-generative-ai-so))
- With increasing complexity, it gets [harder to check](https://open.substack.com/pub/oneusefulthing/p/on-working-with-wizards) if AI output is correct
- AI usage [degrades our ability to catch bugs](https://codebytom.blog/2025/07/the-hidden-cost-of-ai-reliance), creates a reliability problem
- AI output [lowers skepticism](https://open.substack.com/pub/addyo/p/treat-ai-generated-code-as-a-draft) during reviews, stunts junior devs' knowledge gain
- Gen-AI-assisted changes cause outages, [decrease maintainability](https://garymarcus.substack.com/p/a-spate-of-outages-including-incidents)
- AI-written code causes [security breaches](https://www.techradar.com/pro/security/one-in-five-security-breaches-now-thought-to-be-caused-by-ai-written-code), hard to track down vulnerabilities
- LLMs are [bad at conceptual programming](https://open.substack.com/pub/clauswilke/p/llms-excel-at-programminghow-can)
- Medical X-ray [benchmark scores are worthless](https://arxiv.org/abs/2603.21687) (top rank without using an image)
on learning
- Outsourcing work to AI hurts long-term growth of [learning skills](https://open.substack.com/pub/fitzyhistory/p/students-arent-obsessing-about-ai)
- AI stunts intellectual [development as engineer](https://open.substack.com/pub/theargument/p/chatgpt-and-the-end-of-learning) + reasoning about complex concepts
- Chatbots threaten [critical thinking](https://www.forkingpaths.co/p/the-death-of-the-student-essayand) and cognitive development skills
- Students using AI lose [critical skills](https://mileswilliams.substack.com/p/ai-data-and-international-relations) and become dependent on unreliable tools
- LLM-assisted writing [lowers cognitive activity](https://arxiv.org/abs/2506.08872), engagement, memory recall, ownership
# saveload
Store the results of long-running computations on disc.\
The next time a script is run, they are loaded quickly.
```{r saveload, eval=FALSE}
if( file.exists("objects.Rdata") )
  {
  load("objects.Rdata")                  # load previously saved objects
  } else
  {
  obj1 <- mean(rnorm(2e7))               # in the first run,
  obj2 <- median(rnorm(2e7))             # compute the objects
  save(obj1, obj2, file="objects.Rdata") # and write them to disc
  }
```
If an analysis needs to be rerun when the last run is older than 6 hours, this could be the condition:
```{r difftime, eval=FALSE}
difftime(Sys.time(), file.mtime("objects.Rdata"), units="h") > 6
```
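Both conditions (file missing or file too old) could be combined in a small helper. This is only a sketch: the function name `needs_rerun`, the argument `max_age` and the temporary file location are assumptions for this demo.

```r
# hypothetical helper: TRUE if the file is missing or older than max_age hours
needs_rerun <- function(file, max_age=6) {
  if(!file.exists(file)) return(TRUE)
  difftime(Sys.time(), file.mtime(file), units="hours") > max_age
}

cachefile <- file.path(tempdir(), "objects.Rdata") # temporary demo location
if(needs_rerun(cachefile)) {
  obj1 <- mean(rnorm(2e5))   # compute (smaller than above, for a quick demo)
  save(obj1, file=cachefile) # and write to disc
} else {
  load(cachefile)            # otherwise load the stored result
}
```

Checking `file.exists` first matters, because `file.mtime` returns `NA` for a missing file and the comparison would then not be usable in `if`.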
For a single object, a good alternative to `save` and `load` is:
```{r saverds, eval=FALSE}
saveRDS(one_single_object, "object.Rdata")
explicit_object_name <- readRDS("object.Rdata")
```
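The difference between the two approaches can be seen in a short round trip. This sketch uses temporary files and the built-in `iris` data; the file names (including the commonly used `.rds` extension) are assumptions for this demo.

```r
rdsfile <- file.path(tempdir(), "object.rds")   # temporary demo locations
rdafile <- file.path(tempdir(), "object.Rdata")
one_single_object <- head(iris)

# saveRDS stores the object without its name, readRDS returns it:
saveRDS(one_single_object, rdsfile)
explicit_object_name <- readRDS(rdsfile)
identical(one_single_object, explicit_object_name)

# save stores object and name, load recreates it under that name:
save(one_single_object, file=rdafile)
rm(one_single_object)
load(rdafile)
exists("one_single_object")
```

With `readRDS`, the name is chosen visibly in the script, while `load` can silently overwrite an existing object of the same name.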
More on this topic from [Rcrastinate](https://www.r-bloggers.com/2019/05/how-to-save-and-load-datasets-in-r-an-overview/)
# sumatra {#sumatra}
I like to use sumatra PDF viewer as the default viewer.\
It doesn't lock files from editing, hence currently opened files can be changed (e.g. by R).
It ships with RStudio in `RStudio/resources/app/bin/sumatra` and I like to change some [settings](https://www.sumatrapdfreader.org/settings.html).
### with R
```{r sumatrainit, eval=FALSE}
# install.packages("berryFunctions")
# remotes::install_github("brry/berryFunctions") # Version 1.22.1 (2023-11-17)
berryFunctions::sumatraInitialize()
```
### manually
In `C:/Program Files/`, set write permissions for the `RStudio` folder (or at least the sumatra folder, see next step) with right-click - Properties - Security - Edit.
Open `C:/Program Files/RStudio/resources/app/bin/sumatra/sumatrapdfrestrict.ini` and set
- `SavePreferences = 1`\
- `FullscreenAccess = 1`
Open and close a PDF, so that `C:/Users/berry/AppData/Roaming/SumatraPDF/SumatraPDF-settings.txt` will be created.\
There, change the following entries:
- `DefaultZoom = fit page` (probably already the default)
- `ShowToc = 0`
- `DefaultDisplayMode = single page`
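
If you prefer to script the manual edit, the entries can be changed with base R. This is only a sketch under assumptions: `set_entries` is a hypothetical helper, it assumes each setting sits on its own top-level `Key = value` line, and the `demo` lines are made up rather than copied from a real settings file.

```{r sumatraedit, eval=FALSE}
# Hypothetical helper: overwrite the value of top-level "Key = value" lines
set_entries <- function(lines, entries)
  {
  for(key in names(entries))
    {
    hit <- grepl(paste0("^", key, "\\s*="), lines) # only unindented entries
    lines[hit] <- paste(key, "=", entries[[key]])
    }
  lines
  }

wanted <- c(DefaultZoom="fit page", ShowToc="0", DefaultDisplayMode="single page")

# Made-up example lines to show the effect:
demo <- c("DefaultDisplayMode = automatic", "DefaultZoom = fit width", "ShowToc = 1")
changed <- set_entries(demo, wanted)
print(changed)

# For the real file (location assumed from the APPDATA environment variable):
# f <- file.path(Sys.getenv("APPDATA"), "SumatraPDF", "SumatraPDF-settings.txt")
# writeLines(set_entries(readLines(f), wanted), f)
```

Close Sumatra before writing the file, since the viewer may rewrite its settings on exit.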
# PATH
For VS Code, R needs to be on the PATH (the list of locations where the operating system looks for executables).
You can check this in the **Terminal**, also known as console, shell, bash, cmd.
For most other things (like `git add -A` or `git clean -fd`), you can use the terminal built into your IDE (RStudio, VS Code).
For this particular check, I suggest using a clean OS-provided terminal:
### Mac OS
- search (`CMD` + `SPACE`) for "Terminal"
- run the command `R`
- if it succeeds, it shows the R version and other info
- quit with `q("no")`
### Windows
- search (`Windows key`) for "Terminal"
- run the command `R.exe` (just `R` also works, except in Windows PowerShell, where `R` is a built-in alias for `Invoke-History`)
- if it succeeds, it shows the R version and other info
- quit with `q("no")`
- if R is not found / recognized, add it to the _system_ (not user) PATH:
- copy the path where you installed R - or - in RStudio -> Tools -> Global Options,
copy the path (e.g. `C:\Program Files\R\R-4.5.1`)
- search (`Windows key`) for "env", click "Edit the system environment variables",
then "Environment Variables" ([guide with images](https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/))
- under "**System** Variables" (in the bottom half) double click on "Path"
- click "New" and copy-paste your installation location (if from RStudio, add `\bin` at the end)
- close the variable windows + the terminal, open a new one and try `R`/`R.exe` again.
Potentially restart Windows in between.
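
If R itself already runs (e.g. in RStudio), it can tell you which folder to add. A minimal sketch with base R only; note that it inspects the PATH inherited by the current R session, which may differ from what a fresh OS terminal sees, so the terminal check above remains the decisive one.

```{r pathcheck, eval=FALSE}
# Folder with the R executables of the running installation:
r_bin <- R.home("bin")
message("R executables are in: ", r_bin)

# Is some R executable found on the PATH of this session? ("" if not)
r_found <- Sys.which("R")
if( nzchar(r_found) ) message("R found at: ", r_found) else
                      message("R is not on the PATH of this session")

# The individual PATH entries, one per element:
path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed=TRUE)[[1]]
head(path_dirs)
```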
# VS Code
Integrate RStudio-like features into Visual Studio Code. This includes syntax highlighting, script execution, access to the R help, and display of plots.
<img src="https://github.com/brry/course/raw/master/externalfig/vscode.png" width="600"/>
*Guide provided by Laura Spies*
### Step 1: installation
1. install Visual Studio Code from [code.visualstudio.com](https://code.visualstudio.com)
2. install R with the section [install](#install)
3. install Python from [python.org/downloads](https://www.python.org/downloads/) (hints for [Windows users](https://docs.python.org/using/windows.html)), preferably using the Anaconda distribution
4. optional: set up a dedicated Conda environment for R by running the following command (in the terminal)\
*This Conda environment is named `ProFun` after my project folder.* *You can change the name or use an existing environment.*
``` bash
conda create -n ProFun r-base=4.4 r-essentials radian -c conda-forge
```
5. optional: connect R to Jupyter (in the R console):
``` r
# install the kernel package (and its helpers) first, then register it with Jupyter
install.packages(c("repr", "IRdisplay", "IRkernel"))
IRkernel::installspec(name = 'ir', displayname = 'R (ProFun)')
```
6. install the necessary R packages (in the R console):
``` r
install.packages(c("languageserver", "httpgd", "jsonlite", "rmarkdown"))
```
7. install Radian, an improved R console REPL interface (in the terminal):
``` bash
pip install -U radian
# alternatively, if using Conda:
conda activate ProFun
conda install -c conda-forge radian
```
8. check that the following starts the R console within the terminal:
``` bash
radian
```
9. install the following extensions for VS Code:
- Python
- Jupyter
- R Debugger
- R Tools
- R Extension Pack
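
Before moving on, the R side of the installation can be checked from the R console. A minimal sketch, assuming the package list from step 6; it only reports what is missing and installs nothing.

``` r
# Packages from step 6
needed <- c("languageserver", "httpgd", "jsonlite", "rmarkdown")

# system.file() returns "" for packages that are not installed
installed <- vapply(needed, function(p) nzchar(system.file(package = p)), logical(1))
if (all(installed)) {
  message("All required packages are installed.")
} else {
  message("Still to install: ", paste(needed[!installed], collapse = ", "))
}

# Is radian visible on the PATH of this R session? ("" if not found)
radian_path <- Sys.which("radian")
message("radian: ", if (nzchar(radian_path)) radian_path else "not found on PATH")
```

If radian lives in a Conda environment, it is only found after `conda activate ProFun`, so a "not found" here is not necessarily an error.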
### Step 2: configuration
To open the `settings.json` file, press `Ctrl + Shift + P`, then type "Preferences: Open Settings (JSON)". Use the following JSON configuration, adjusting the paths to your own user name and R version:
``` json
{
"editor.fontSize": 15,
// R Settings
"r.rterm.windows": "C:\\Users\\YOUR_USERNAME\\miniconda3\\envs\\ProFun\\Scripts\\radian.exe",
"r.bracketedPaste": true,
"r.lsp.path": "C:\\Program Files\\R\\R-4.4.1\\bin\\x64\\R.exe",
// Terminal Profiles for R and Python
"terminal.integrated.profiles.windows": {
"ProFun (Radian)": {
"path": "C:\\Windows\\System32\\cmd.exe",
"args": ["/K", "conda activate ProFun && radian"]
},
"Python (Base)": {
"path": "C:\\Windows\\System32\\cmd.exe",
"args": ["/K", "conda activate base"]
}
},
// Set Default Terminal Profile to Radian
"terminal.integrated.defaultProfile.windows": "ProFun (Radian)",
// R terminal and plot behavior
"r.alwaysUseActiveTerminal": true,
"r.plot.useHttpgd": true
}
```
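Windows paths in `settings.json` need doubled backslashes, which is easy to get wrong by hand. The following sketch prints a ready-to-paste line from the R console; `to_json_path` is a hypothetical helper, and the location of `R.exe` inside `R.home("bin")` is an assumption that holds for a standard Windows installation.

``` r
# Hypothetical helper: write a path with JSON-escaped backslashes
to_json_path <- function(path) {
  path <- gsub("/", "\\", path, fixed = TRUE)  # forward slash -> one backslash
  gsub("\\", "\\\\", path, fixed = TRUE)       # one backslash -> two backslashes
}

# Assumed location of the executable in the running R installation (Windows)
r_exe <- normalizePath(file.path(R.home("bin"), "R.exe"), mustWork = FALSE)
cat('"r.lsp.path": "', to_json_path(r_exe), '",\n', sep = "")
```

The same helper can be applied to the radian path for `r.rterm.windows`.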
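Open your `keybindings.json` (`Ctrl + Shift + P`, then "Preferences: Open Keyboard Shortcuts (JSON)") and add keyboard shortcuts as desired inside the top-level `[ ]` array, e.g.: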
``` json
// R-Specific Keybindings
{ "key": "ctrl+shift+f10", "command": "r.restartSession" },
{ "key": "ctrl+shift+enter", "command": "r.runSource" },
{ "key": "ctrl+alt+i", "command": "r.createRmdChunk" },
{ "key": "ctrl+alt+h", "command": "r.help" },
{ "key": "ctrl+alt+o", "command": "r.viewObject" },
{ "key": "ctrl+alt+p", "command": "r.showPlotHistory" },
{ "key": "ctrl+alt+k", "command": "r.knit" },
{ "key": "ctrl+alt+f", "command": "r.changeWorkingDirectory" },
{ "key": "ctrl+alt+enter", "command": "r.executeCode" },
{ "key": "ctrl+enter", "command": "r.runSelectionAndMoveCursor", "when": "editorTextFocus && editorLangId == 'r'" },
{ "key": "ctrl+shift+m", "command": "type", "args": { "text": " %>% " }, "when": "editorTextFocus && editorLangId == 'r'" },
{ "key": "alt+-", "command": "type", "args": { "text": " <- " }, "when": "editorTextFocus && editorLangId == 'r'" },
{ "key": "ctrl+shift+l", "command": "workbench.action.terminal.clear", "when": "terminalFocus" },
{ "key": "ctrl+alt+w", "command": "r.openWebHelp", "when": "editorTextFocus && editorLangId == 'r'" },
{ "key": "f1", "command": "r.help", "when": "editorTextFocus && editorLangId == 'r'" },
```