Data science for Engineers
Data frames
Dataframes-1 NPTEL NOC18-CS28 1
In this lecture
Dataframes:
• Create
• Access rows and columns
• Edit
• Add new rows and columns
Dataframes: Create dataframe
Data frames are generic data objects of R, used to store
tabular data
Code
# Introduction to data frames
vec1 = c(1,2,3)
vec2 = c("R","Scilab","Java")
vec3 = c("For prototyping",
"For prototyping","For Scaleup")
df = data.frame(vec1,vec2,vec3)
print(df)
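To check what data.frame() built, the result can be inspected. The following is a minimal sketch that reuses the vectors from the slide; dim(), names() and str() are standard base R functions, though they are not part of the original slide.

```r
# Rebuild the data frame from the slide
vec1 = c(1,2,3)
vec2 = c("R","Scilab","Java")
vec3 = c("For prototyping","For prototyping","For Scaleup")
df = data.frame(vec1, vec2, vec3)

print(dim(df))     # number of rows and number of columns
print(names(df))   # column names are taken from the vector names
str(df)            # type and first values of each column
```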
Create a dataframe using data from a file
• A dataframe can also be created by reading data from a
file using the following command
newDF = read.table(file="path of the file")
• In the path, use ‘/’ instead of ‘\’
Example: “C:/Users/hii/Documents/R/R-Workspace/”
• A separator can also be specified to distinguish between
entries. The default, sep="", splits on any whitespace (spaces, tabs, newlines)
newDF = read.table(file="path of the file", sep=",")
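The read.table() call above can be tried without an existing file. This is a sketch under stated assumptions: the file contents and the column names id and tool are made up for illustration, and tempfile() is used only so the example is self-contained.

```r
# Write a small comma-separated file to a temporary location
# (hypothetical contents, used only for this illustration)
path = tempfile(fileext = ".csv")
writeLines(c("id,tool", "1,R", "2,Scilab"), path)

# header = TRUE treats the first line as column names
newDF = read.table(file = path, sep = ",", header = TRUE)
print(newDF)
```

Without header = TRUE, read.table() would treat the first line as data rather than as column names.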
Accessing rows and columns
• df[val1,val2] refers to row “val1”, column “val2”. Each can be a number
(position) or a string (name)
• “val1” or “val2” can also be a vector of values like “1:2” or “c(1,3)”
• df[val2] (no comma) refers to column “val2” only
Code
# accessing first & second row:
print(df[1:2,])
# accessing first & second column:
print(df[,1:2])
# accessing 1st & 2nd column (alternate):
print(df[1:2])
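The indexing rules above also work with column names. The sketch below rebuilds the same df; the `$` and `[[ ]]` operators are standard R but do not appear on the original slide, so treat them as an addition.

```r
vec1 = c(1,2,3)
vec2 = c("R","Scilab","Java")
vec3 = c("For prototyping","For prototyping","For Scaleup")
df = data.frame(vec1, vec2, vec3)

print(df[c(1,3), "vec2"])   # rows 1 and 3 of column vec2, selected by name
print(df$vec1)              # a single column as a vector
print(df[["vec1"]])         # same column, using double brackets
print(df["vec1"])           # single brackets keep a one-column data frame
```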
Subset
subset() extracts a subset of the data based on conditions
Code
# Data frame example 2
pd=data.frame("Name"=c("Senthil","Senthil","Sam","Sam"),
"Month"=c("Jan","Feb","Jan","Feb"),
"BS" = c(141.2,139.3,135.2,160.1),
"BP" = c(90,78,80,81))
pd2 = subset(pd,Name=="Senthil" |
BS> 150 )
print("new subset pd2")
print(pd2)
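The same selection can be written as a logical row index, which shows what subset() evaluates. This is a sketch rather than the lecture's own code; the names keep, pd3 and pd4 are introduced here for illustration.

```r
pd = data.frame("Name" = c("Senthil","Senthil","Sam","Sam"),
                "Month" = c("Jan","Feb","Jan","Feb"),
                "BS" = c(141.2,139.3,135.2,160.1),
                "BP" = c(90,78,80,81))

# Same condition written as a logical row index
keep = pd$Name == "Senthil" | pd$BS > 150
pd3 = pd[keep, ]
print(pd3)

# subset() can also restrict the columns through its select argument
pd4 = subset(pd, BS > 150, select = c(Name, BS))
print(pd4)
```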
Editing dataframes
Dataframes can be edited by direct assignment
Code
# Introduction to dataframes
vec1 = c(1,2,3)
vec2 = c("R","Scilab","Java")
vec3 = c("For prototyping",
"For prototyping","For Scaleup")
df = data.frame(vec1,vec2,vec3)
print(df)
df[[2]][2] = "R"
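Direct assignment also works with row/column indexing and with `$`. The following is a minimal sketch on a numeric column; the replacement values 20 and 30 are arbitrary.

```r
vec1 = c(1,2,3)
vec2 = c("R","Scilab","Java")
df = data.frame(vec1, vec2)

df[2, "vec1"] = 20       # row 2 of column vec1
df$vec1[3] = 30          # third element of column vec1
print(df)
```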
Editing dataframes
• A dataframe can also be edited using the edit() command
• Create an instance of a data frame and use the edit command to open a
table editor, where changes can be made manually
Code
# Editing a data frame
myTable = data.frame()
myTable = edit(myTable)
Adding extra rows and columns
An extra row can be added with the “rbind” function and an extra column with “cbind”
Code
# continuing from previous example
# adding extra row and column:
df = rbind(df,data.frame(vec1=4,
vec2="C", vec3="For Scaleup"))
print("adding extra row")
print(df)
df = cbind(df,vec4=c(10,20,30,40))
print("adding extra col")
print(df)
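A self-contained sketch of the same steps follows. It assumes a two-column version of df and sets stringsAsFactors = FALSE so that it behaves the same across R versions; the column vec5 and the `$` assignment are additions for illustration.

```r
df = data.frame(vec1 = c(1,2,3), vec2 = c("R","Scilab","Java"),
                stringsAsFactors = FALSE)

# The new row must use the same column names as df
df = rbind(df, data.frame(vec1 = 4, vec2 = "C", stringsAsFactors = FALSE))

# The new column needs one value per row (4 rows now)
df = cbind(df, vec4 = c(10,20,30,40))

# Assigning to a new name with $ also adds a column
df$vec5 = df$vec1 * 2
print(df)
```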
Deleting rows and columns
There are several ways to delete a row or column; some cases are
shown below
• A ‘-’ sign before a value drops it: before the ‘,’ for rows and after the ‘,’ for columns
• ‘!’ means “not”: it drops the rows/columns which satisfy the condition
Code
# continuing from previous example
# Deleting rows and columns:
df2 = df[-3,-1]
print(df2)
# conditional deletion:
df3 = df[,!names(df) %in% c("vec3")]
print(df3)
df4 = df[!df$vec1==3,]
print(df4)
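One detail worth checking when deleting columns is what happens when only one column is left. The sketch below uses a small stand-in data frame (not the exact df from the slides). The drop argument and NULL assignment are standard R but are not covered on the slide.

```r
df = data.frame(vec1 = c(1,2,3,4),
                vec2 = c("R","Scilab","Java","C"),
                vec4 = c(10,20,30,40))

# Dropping down to a single column returns a plain vector by default
v = df[, -c(1,2)]
print(class(v))

# drop = FALSE keeps the result as a data frame
d = df[, -c(1,2), drop = FALSE]
print(class(d))

# Assigning NULL removes a column in place
df$vec4 = NULL
print(names(df))
```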
Manipulating rows – the factor issue
In R versions before 4.0.0, character columns become factors by default when a
data.frame is created (since 4.0.0 the default is stringsAsFactors = FALSE)
Factor variables are those where the character column is split into
categories or factor levels
Code
# Manipulating rows in data frame
# continued from previous page
df[3,1]= 3.1
df[3,3]= "Others"
print(df)
Notice that NA is displayed instead of the string “Others”.
R also issues a warning that mentions the word “factor”
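The behaviour can be reproduced on any R version by requesting factors explicitly. This is a minimal sketch with a one-column data frame; stringsAsFactors = TRUE is set here only to force the factor behaviour described above.

```r
vec3 = c("For prototyping","For prototyping","For Scaleup")
df = data.frame(vec3, stringsAsFactors = TRUE)  # force factor behaviour

print(levels(df$vec3))   # the only values the column accepts

# "Others" is not a level, so R warns and stores NA
df[3,1] = "Others"
print(df)
print(is.na(df$vec3[3]))
```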
Resolving factor issue
New entries need to be consistent with factor levels which are fixed
when the dataframe is first created
Code
vec1 = c(1,2,3)
vec2 = c("R","Scilab","Java")
vec3 = c("For prototyping",
"For prototyping","For Scaleup")
df = data.frame(vec1,vec2,vec3,
stringsAsFactors = FALSE)
# Now trying the same manipulation
df[3,3]= "Others"
print(df)
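If the data frame already contains factor columns, two other approaches may help. They are sketched below as alternatives, not as the lecture's method: the new level can be registered first, or the column can be converted to character. The value "Teaching" is a made-up entry used only for illustration.

```r
vec3 = c("For prototyping","For prototyping","For Scaleup")
df = data.frame(vec3, stringsAsFactors = TRUE)  # factor column

# Option 1: register the new level before assigning it
levels(df$vec3) = c(levels(df$vec3), "Others")
df[3,1] = "Others"
print(df)

# Option 2: convert the factor column to character
df$vec3 = as.character(df$vec3)
df[2,1] = "Teaching"   # hypothetical value, accepted because the column is character
print(df)
```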