Datatable Cheat Sheet R

data.table is an R package that provides a high-performance version of data.frame. It allows for fast aggregation, joining, and updating of large data sets. Key features include:
- Subsetting rows using i and calculating columns in j, grouped by the by argument, for fast aggregation
- Adding or updating columns by reference using := for efficient programming
- Using .SD to work with subsets of the data within each group for aggregation operations
- Chaining operations together for concise code
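The four features listed above can be tried on a small table. The following is a minimal sketch, not taken from the cheat sheet: the table `sales` and its columns `store`, `units`, `price`, and `revenue` are hypothetical names chosen for illustration.

```r
library(data.table)

# Hypothetical toy data (not from the cheat sheet): three stores, two sales each
sales <- data.table(
  store = c("a", "a", "b", "b", "c", "c"),
  units = c(3L, 5L, 2L, 8L, 4L, 6L),
  price = c(2.0, 2.0, 3.5, 3.5, 1.0, 1.0)
)

# i and by: keep rows with more than 2 units, then total the units per store
units_by_store <- sales[units > 2L, .(total_units = sum(units)), by = store]

# := adds the revenue column by reference, modifying sales in place
sales[, revenue := units * price]

# .SD holds the rows of each group; .SDcols limits which columns it contains
sums_by_store <- sales[, lapply(.SD, sum), by = store,
                       .SDcols = c("units", "revenue")]

# Chaining: aggregate first, then filter the aggregated result
big_stores <- sales[, .(revenue = sum(revenue)), by = store][revenue > 15]

print(units_by_store)
print(sums_by_store)
print(big_stores)
```

Because := works by reference, `sales` itself gains the `revenue` column without being reassigned, whereas the other three calls return new tables.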

Uploaded by

loshude


R For Data Science Cheat Sheet
data.table
Learn R for data science Interactively at www.DataCamp.com

data.table

data.table is an R package that provides a high-performance version of base R's data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.

Load the package:

> library(data.table)

Creating A data.table

> set.seed(45L)                          Create a data.table
> DT <- data.table(V1=c(1L,2L),          and call it DT
                   V2=LETTERS[1:3],
                   V3=round(rnorm(4),4),
                   V4=1:12)

General form: DT[i, j, by]

Take DT, subset rows using i, then calculate j grouped by by

Subsetting Rows Using i

> DT[3:5,]                               Select 3rd to 5th row
> DT[3:5]                                Select 3rd to 5th row
> DT[V2=="A"]                            Select all rows that have value A in column V2
> DT[V2 %in% c("A","C")]                 Select all rows that have value A or C in column V2

Manipulating on Columns in j

> DT[,V2]                                Return V2 as a vector
[1] "A" "B" "C" "A" "B" "C" ...
> DT[,.(V2,V3)]                          Return V2 and V3 as a data.table
> DT[,sum(V1)]                           Return the sum of all elements of V1 in a vector
[1] 18
> DT[,.(sum(V1),sd(V3))]                 Return the sum of all elements of V1 and the
                                         std. dev. of V3 in a data.table
   V1        V2
1: 18 0.4546055
> DT[,.(Aggregate=sum(V1),               The same as the above, with new names
        Sd.V3=sd(V3))]
   Aggregate     Sd.V3
1:        18 0.4546055
> DT[,.(V1,Sd.V3=sd(V3))]                Select column V1 and compute std. dev. of V3,
                                         which returns a single value and gets recycled
> DT[,.(print(V2),                       Print column V2 and plot V3
        plot(V3),
        NULL)]

Doing j by Group

> DT[,.(V4.Sum=sum(V4)),by=V1]           Calculate sum of V4 for every group in V1
   V1 V4.Sum
1:  1     36
2:  2     42
> DT[,.(V4.Sum=sum(V4)),                 Calculate sum of V4 for every group in V1
     by=.(V1,V2)]                        and V2
> DT[,.(V4.Sum=sum(V4)),                 Calculate sum of V4 for every group in
     by=sign(V1-1)]                      sign(V1-1)
   sign V4.Sum
1:    0     36
2:    1     42
> DT[,.(V4.Sum=sum(V4)),                 The same as the above, with new name
     by=.(V1.01=sign(V1-1))]             for the variable you're grouping by
> DT[1:5,.(V4.Sum=sum(V4)),              Calculate sum of V4 for every group in V1
     by=V1]                              after subsetting on the first 5 rows
> DT[,.N,by=V1]                          Count number of rows for every group in V1

Adding/Updating Columns By Reference in j Using :=

> DT[,V1:=round(exp(V1),2)]              V1 is updated by what is after :=
> DT                                     Return the result by calling DT
     V1 V2      V3 V4
1: 2.72  A -0.1107  1
2: 7.39  B -0.1427  2
3: 2.72  C -1.8893  3
4: 7.39  A -0.3571  4
...
> DT[,c("V1","V2"):=list(round(exp(V1),2),    Columns V1 and V2 are updated by
                         LETTERS[4:6])]       what is after :=
> DT[,`:=`(V1=round(exp(V1),2),          Alternative to the above one. With [],
           V2=LETTERS[4:6])][]           you print the result to the screen
        V1 V2      V3 V4
1:   15.18  D -0.1107  1
2: 1619.71  E -0.1427  2
3:   15.18  F -1.8893  3
4: 1619.71  D -0.3571  4
> DT[,V1:=NULL]                          Remove V1
> DT[,c("V1","V2"):=NULL]                Remove columns V1 and V2
> Cols.chosen=c("A","B")
> DT[,Cols.chosen:=NULL]                 Delete the column with column name Cols.chosen
> DT[,(Cols.chosen):=NULL]               Delete the columns specified in the
                                         variable Cols.chosen

Indexing And Keys

> setkey(DT,V2)                          A key is set on V2; output is returned invisibly
> DT["A"]                                Return all rows where the key column (set to V2) has
                                         the value A
   V1 V2      V3 V4
1:  1  A -0.2392  1
2:  2  A -1.6148  4
3:  1  A  1.0498  7
4:  2  A  0.3262 10
> DT[c("A","C")]                         Return all rows where the key column (V2) has value A or C
> DT["A",mult="first"]                   Return first row of all rows that match value A in key
                                         column V2
> DT["A",mult="last"]                    Return last row of all rows that match value A in key
                                         column V2
> DT[c("A","D")]                         Return all rows where key column V2 has value A or D
   V1 V2      V3 V4
1:  1  A -0.2392  1
2:  2  A -1.6148  4
3:  1  A  1.0498  7
4:  2  A  0.3262 10
5: NA  D      NA NA
> DT[c("A","D"),nomatch=0]               Return all rows where key column V2 has value A or D,
                                         dropping key values that have no match
   V1 V2      V3 V4
1:  1  A -0.2392  1
2:  2  A -1.6148  4
3:  1  A  1.0498  7
4:  2  A  0.3262 10
> DT[c("A","C"),sum(V4)]                 Return total sum of V4, for rows of key column V2 that
                                         have values A or C
> DT[c("A","C"),                         Return sum of column V4 for rows of V2 that have value A,
     sum(V4),                            and another sum for rows of V2 that have value C
     by=.EACHI]
   V2 V1
1:  A 22
2:  C 30
> setkey(DT,V1,V2)                       Sort by V1 and then by V2 within each group of V1 (invisible)
> DT[.(2,"C")]                           Select rows that have value 2 for the first key (V1) and the
                                         value C for the second key (V2)
   V1 V2      V3 V4
1:  2  C  0.3262  6
2:  2  C -1.6148 12
> DT[.(2,c("A","C"))]                    Select rows that have value 2 for the first key (V1) and within
                                         those rows the value A or C for the second key (V2)
   V1 V2      V3 V4
1:  2  A -1.6148  4
2:  2  A  0.3262 10
3:  2  C  0.3262  6
4:  2  C -1.6148 12

Advanced Data Table Operations

> DT[.N-1]                               Return the penultimate row of the DT
> DT[,.N]                                Return the number of rows
> DT[,.(V2,V3)]                          Return V2 and V3 as a data.table
> DT[,list(V2,V3)]                       Return V2 and V3 as a data.table
> DT[,mean(V3),by=.(V1,V2)]              Return the result of j, grouped by all possible
                                         combinations of groups specified in by
   V1 V2      V1
1:  1  A  0.4053
2:  1  B  0.4053
3:  1  C  0.4053
4:  2  A -0.6443
5:  2  B -0.6443
6:  2  C -0.6443

.SD & .SDcols

> DT[,print(.SD),by=V2]                  Look at what .SD contains
> DT[,.SD[c(1,.N)],by=V2]                Select the first and last row grouped by V2
> DT[,lapply(.SD,sum),by=V2]             Calculate sum of columns in .SD grouped by V2
> DT[,lapply(.SD,sum),by=V2,             Calculate sum of V3 and V4 in .SD grouped by V2
     .SDcols=c("V3","V4")]
   V2     V3 V4
1:  A -0.478 22
2:  B -0.478 26
3:  C -0.478 30
> DT[,lapply(.SD,sum),by=V2,             Calculate sum of V3 and V4 in .SD grouped by V2
     .SDcols=paste0("V",3:4)]

Chaining

> DT.sum <- DT[,.(V4.Sum=sum(V4)),       Calculate sum of V4, grouped by V1
               by=V1]
> DT.sum                                 Return the result by calling DT.sum
   V1 V4.Sum
1:  1     36
2:  2     42
> DT.sum[V4.Sum>40]                      Select that group of which the sum is >40
> DT[,.(V4.Sum=sum(V4)),                 Select that group of which the sum is >40
     by=V1][V4.Sum>40]                   (chaining)
   V1 V4.Sum
1:  2     42
> DT[,.(V4.Sum=sum(V4)),                 Calculate sum of V4, grouped by V1,
     by=V1][order(-V1)]                  ordered on V1
   V1 V4.Sum
1:  2     42
2:  1     36

set()-Family

set()

Syntax: for (i in from:to) set(DT, row, column, new value)

> rows <- list(3:4,5:6)
> cols <- 1:2
> for(i in seq_along(rows))              Sequence along the values of rows, and
    {set(DT,                             for the values of cols, set the values of
         i=rows[[i]],                    those elements equal to NA (invisible)
         j=cols[i],
         value=NA)}

setnames()

Syntax: setnames(DT,"old","new")[]

> setnames(DT,"V2","Rating")             Set name of V2 to Rating (invisible)
> setnames(DT,                           Change 2 column names (invisible)
           c("V2","V3"),
           c("V2.rating","V3.DC"))

setcolorder()

Syntax: setcolorder(DT,"neworder")

> setcolorder(DT,                        Change column ordering to contents
              c("V2","V1","V4","V3"))    of the specified vector (invisible)
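Several printed results in the sheet depend on rnorm(), so they change with the seed. The sketch below is an illustration under stated assumptions, not part of the original sheet. It rebuilds a table of the same shape as the cheat sheet's DT, but with fixed V3 values, so that the grouped sums, the keyed lookups, and the set() family calls can be checked reproducibly. The helper names `by_v1`, `each_key`, `n_with_na`, and `n_matched` are hypothetical.

```r
library(data.table)

# Same shape as the cheat sheet's DT. V3 uses fixed values (an assumption)
# instead of rnorm(), so the results do not depend on the random seed.
DT <- data.table(V1 = rep(c(1L, 2L), 6),
                 V2 = rep(LETTERS[1:3], 4),
                 V3 = rep(c(0.5, -0.5, 1, -1), 3),
                 V4 = 1:12)

# Grouped aggregation: V4 summed within each value of V1
by_v1 <- DT[, .(V4.Sum = sum(V4)), by = V1]

# Keyed lookup: by = .EACHI returns one sum per key value passed in i
setkey(DT, V2)
each_key <- DT[c("A", "C"), sum(V4), by = .EACHI]

# Unmatched key "D": the default keeps an NA row, nomatch = 0 drops it
n_with_na <- nrow(DT[c("A", "D")])
n_matched <- nrow(DT[c("A", "D"), nomatch = 0])

# set(): assign NA to rows 1 and 2 of column V3 by reference
set(DT, i = 1:2, j = "V3", value = NA_real_)

# setnames() and setcolorder(): rename and reorder columns by reference
setnames(DT, "V2", "Rating")
setcolorder(DT, c("Rating", "V1", "V4", "V3"))

print(by_v1)
print(each_key)
print(names(DT))
```

The unnamed j expression in the .EACHI call produces a result column called V1, which is why the sheet's output for that example shows the header V2 V1.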
