Assignment - 4
Foundations for Data Analytics
Name: K.Venkat
Reg no: 22MIS7153
Slot: L51+52
1. Use the dataset “mtcars” and perform the following:
1. Print the structure of the dataset
2. Print first 10 observations
3. Print last 15 observations
4. Sort by mpg in increasing order
5. Sort by cyl in decreasing order
6. Sort by mpg and cyl in increasing order
7. Sort by mpg and cyl in decreasing order
8. Sort by mpg (increasing) and cyl (decreasing)
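One possible way to work through the eight steps above is sketched here. It relies only on base R and the built-in mtcars dataset; the result names (by_mpg, by_cyl_desc, and so on) are illustrative and not part of the assignment.

```r
# mtcars ships with base R (package 'datasets'), so nothing needs to be loaded.
str(mtcars)          # 1. structure of the dataset
head(mtcars, 10)     # 2. first 10 observations
tail(mtcars, 15)     # 3. last 15 observations

# order() returns the row permutation that sorts its arguments.
by_mpg       <- mtcars[order(mtcars$mpg), ]                                   # 4. mpg increasing
by_cyl_desc  <- mtcars[order(mtcars$cyl, decreasing = TRUE), ]                # 5. cyl decreasing
by_both_asc  <- mtcars[order(mtcars$mpg, mtcars$cyl), ]                       # 6. mpg, then cyl, increasing
by_both_desc <- mtcars[order(mtcars$mpg, mtcars$cyl, decreasing = TRUE), ]    # 7. mpg, then cyl, decreasing
by_mixed     <- mtcars[order(mtcars$mpg, -mtcars$cyl), ]                      # 8. mpg increasing, cyl decreasing
```

Negating a numeric key inside order(), as in step 8, is a common way to mix sort directions; it works here because cyl is numeric.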
2. Create the vector
Logical vector of duplicates
Logical vector of duplicates (from last)
Difference between duplicated(x) and duplicated(x, fromLast = TRUE)
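• duplicated(x) scans from the first element and flags every repeat of a value after its first occurrence.
• duplicated(x, fromLast = TRUE) scans from the last element and flags every repeat of a value before its last occurrence.
Combined with a logical OR, duplicated(x) | duplicated(x, fromLast = TRUE), they flag every element whose value occurs more than once.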
Extract duplicate elements
Extract unique elements
Duplicate elements in reverse order
Unique elements in reverse order
Indices of duplicate elements
Indices of unique elements
Count of unique elements
Count of duplicate elements
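The tasks above can be sketched as follows. The assignment does not state the vector, so the values of x below are a hypothetical example, and "reverse order" is read here as simply reversing the extracted elements.

```r
x <- c(1, 2, 2, 3, 4, 4, 4, 5)   # hypothetical vector; not given in the assignment

dup_first <- duplicated(x)                    # repeats after the first occurrence
dup_last  <- duplicated(x, fromLast = TRUE)   # repeats before the last occurrence
all_dups  <- dup_first | dup_last             # every element whose value occurs more than once

x[dup_first]             # duplicate elements
unique(x)                # unique elements
rev(x[dup_first])        # duplicate elements in reverse order
rev(unique(x))           # unique elements in reverse order
which(dup_first)         # indices of duplicate elements
which(!dup_first)        # indices of unique elements (first occurrences)
length(unique(x))        # count of unique elements
sum(dup_first)           # count of duplicate elements
```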
3. Create the dataframe
Logical vector of duplicates
Extract duplicate rows
Extract unique rows
Indices of duplicate rows
Indices of unique rows
Number of unique rows
Number of duplicate rows
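The same functions work row-wise on a dataframe. The sketch below uses a small made-up dataframe df, since the assignment does not specify its contents.

```r
# Hypothetical dataframe; the column names and values are illustrative only.
df <- data.frame(id  = c(1, 2, 2, 3, 3, 3),
                 grp = c("a", "b", "b", "c", "c", "d"))

dup_rows <- duplicated(df)   # logical vector: TRUE where a row repeats an earlier row

df[dup_rows, ]               # duplicate rows
unique(df)                   # unique rows (same as df[!dup_rows, ])
which(dup_rows)              # indices of duplicate rows
which(!dup_rows)             # indices of unique rows
nrow(unique(df))             # number of unique rows
sum(dup_rows)                # number of duplicate rows
```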
4. Use the dataset “iris” and perform the following:
1. Print the dataset iris
2. Structure of the dataset
3. Summary of all variables
4. Number of variables (columns)
5. Number of observations (rows)
6. Logical vector of duplicate rows
7. Extract duplicate rows
8. Extract unique rows
9. Indices of duplicate rows
10. Indices of unique rows
11. Number of unique rows
12. Number of duplicate rows
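A minimal sketch of the twelve steps above, using the built-in iris dataset. The variable name dup_rows is illustrative.

```r
print(iris)                    # 1. print the dataset
str(iris)                      # 2. structure
summary(iris)                  # 3. summary of all variables
ncol(iris)                     # 4. number of variables (columns)
nrow(iris)                     # 5. number of observations (rows)

dup_rows <- duplicated(iris)   # 6. logical vector of duplicate rows
iris[dup_rows, ]               # 7. duplicate rows
unique(iris)                   # 8. unique rows
which(dup_rows)                # 9. indices of duplicate rows
which(!dup_rows)               # 10. indices of unique rows
nrow(unique(iris))             # 11. number of unique rows
sum(dup_rows)                  # 12. number of duplicate rows
```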
5. Assuming 'airquality_data' is your dataframe, perform the following:
1. Print the dataset
2. Structure of the dataset
3. Summary of all variables
4. Number of variables (columns)
5. Number of observations (rows)
6. Check for missing values
7. Indices of missing values (column-major order)
8. Indices of missing values (row-major order)
9. Row and column indices of missing values
10. Total number of missing values
11. Variables with concentrated missing values
12. Omit all rows with missing values
13. Records without missing values using complete.cases()
14. Records without missing values using na.omit()
15. Records without missing values using na.exclude()
16. Records with missing values using complete.cases()
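The steps above could be approached as sketched below. It assumes that airquality_data is a copy of R's built-in airquality dataset; if your dataframe comes from elsewhere, skip the first line. The helper names (na_mask, no_na_cc, and so on) are illustrative.

```r
airquality_data <- airquality    # assumption: the built-in dataset stands in for your dataframe

print(airquality_data)           # 1. print the dataset
str(airquality_data)             # 2. structure
summary(airquality_data)         # 3. summary of all variables
ncol(airquality_data)            # 4. number of variables
nrow(airquality_data)            # 5. number of observations

na_mask <- is.na(airquality_data)    # 6. logical matrix, TRUE where a value is missing
which(na_mask)                       # 7. linear indices, column-major order
which(t(na_mask))                    # 8. linear indices, row-major order (via the transpose)
which(na_mask, arr.ind = TRUE)       # 9. row and column indices
sum(na_mask)                         # 10. total number of missing values
colSums(na_mask)                     # 11. missing values per variable

no_na_cc <- airquality_data[complete.cases(airquality_data), ]   # 12, 13. complete rows only
no_na_om <- na.omit(airquality_data)                             # 14. same rows via na.omit()
no_na_ex <- na.exclude(airquality_data)                          # 15. same rows via na.exclude()
with_na  <- airquality_data[!complete.cases(airquality_data), ]  # 16. rows with a missing value
```

On a dataframe, na.omit() and na.exclude() keep the same rows; they differ in the na.action attribute they record, which matters mainly when residuals or predictions are later computed from a fitted model.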
6. Consider a numeric vector x <- c(3,4,5,6,7,8)
Write a command to recode the values less than 6 with zero in the vector x
Write a command to recode the values between 4 and 8 with 100
Write a command to recode the values that are less than 5 or greater than 6 with 50
Write a command to recode the values less than 6 with NA in the vector x
Write a command to recode the values between 4 and 8 with NA
Write a command to recode the values that are less than 5 or greater than 6 with NA
Count number of NA values after each operation
Find mean of x (Hint: exclude NA values)
Find median of x (Hint: exclude NA values)
Write a command to recode the values less than 6 with “NA” (enclose with double
quotes) in the vector x
Write a command to recode the values between 4 and 8 with “NA”
Write a command to recode the values that are less than 5 or greater than 6 with “NA”
Count number of NA values after each operation
Find mean of x (Hint: exclude NA values)
Find median of x (Hint: exclude NA values)
What is the difference between NA and “NA”?
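The recoding tasks above can be sketched with replace(), which returns a modified copy so that each step starts from the original x. Two assumptions are made: "between 4 and 8" is read as excluding both endpoints, and only the first condition is shown for the NA and “NA” variants, since the other two follow the same pattern. The result names are illustrative.

```r
x <- c(3, 4, 5, 6, 7, 8)

# Recode to numbers; x itself is left unchanged.
x_zero <- replace(x, x < 6, 0)
x_100  <- replace(x, x > 4 & x < 8, 100)   # "between" read as exclusive (assumption)
x_50   <- replace(x, x < 5 | x > 6, 50)

# Recode to the missing-value marker NA; the vector stays numeric.
x_na <- replace(x, x < 6, NA)
sum(is.na(x_na))              # number of NA values
mean(x_na, na.rm = TRUE)      # mean of the remaining values
median(x_na, na.rm = TRUE)    # median of the remaining values

# Recode to the string "NA"; the whole vector is coerced to character.
x_str <- replace(x, x < 6, "NA")
sum(is.na(x_str))             # the string "NA" is not counted as missing
x_num <- suppressWarnings(as.numeric(x_str))   # converting back turns "NA" into a real NA
mean(x_num, na.rm = TRUE)
median(x_num, na.rm = TRUE)
```

In short, NA is R's marker for a missing value and is recognised by is.na() and the na.rm argument, while “NA” is an ordinary two-character string that also forces the rest of the vector to character type.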
7. Consider the given vectors:
A <- c(3, 2, NA, 5, 3, 7, NA, NA, 5, 2, 6)
B <- c(3, 2, NA, 5, 3, 7, NA, "NA", 5, 2, 6)
Find the length of the vector A
Find the length of the vector B
Sort the values in vector A and put it in p (Hint: use function sort())
Find the length of p
Sort the values in vector B and put it in q
Find the length of q
What did you infer from the above results?
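A short sketch of the comparison, using straight double quotes so the code runs as typed:

```r
A <- c(3, 2, NA, 5, 3, 7, NA, NA, 5, 2, 6)
B <- c(3, 2, NA, 5, 3, 7, NA, "NA", 5, 2, 6)   # the string "NA" makes B a character vector

length(A)      # all elements are counted, including NA
length(B)

p <- sort(A)   # sort() drops NA values by default
length(p)

q <- sort(B)   # the two real NAs are dropped, the string "NA" is kept and sorted as text
length(q)
```

The lengths of p and q differ because sort() removes only genuine missing values; the quoted “NA” in B is treated as data.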
8. Create the “buildings” and “surveydata” dataframes to merge:
buildings <- data.frame(location=c(1, 2, 3), name=c("building", "building2",
"building3"))
surveydata <- data.frame(survey=c(1,1,1,2,2,2), location=c(1,2,3,2,3,1),
efficiency=c(51,64,70,71,80,58))
The dataframes buildings and surveydata have a common key variable called
“location”.
Use the merge() function to merge the two dataframes by “location”, into a new
dataframe “buildingStats”.
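A minimal sketch of the merge, with the dataframes recreated as given in the task so the block runs on its own:

```r
# Values copied from the task statement.
buildings <- data.frame(location = c(1, 2, 3),
                        name = c("building", "building2", "building3"))
surveydata <- data.frame(survey = c(1, 1, 1, 2, 2, 2),
                         location = c(1, 2, 3, 2, 3, 1),
                         efficiency = c(51, 64, 70, 71, 80, 58))

# Join on the shared key column; each survey row picks up its building name.
buildingStats <- merge(buildings, surveydata, by = "location")
print(buildingStats)
```

Stating by = "location" explicitly is optional here, because merge() defaults to the columns the two dataframes have in common, but it makes the key visible to the reader.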
9. Give the dataframes different key variable names:
buildings <- data.frame(location=c(1, 2, 3), name=c("building1", "building2",
"building3"))
surveydata <- data.frame(survey=c(1,1,1,2,2,2), LocationID=c(1,2,3,2,3,1),
efficiency=c(51,64,70,71,80,58))
The dataframes buildings and surveydata now have corresponding variables called location
and LocationID.
Use the merge() function to merge the columns of the two dataframes by the
corresponding variables.
Perform an inner join, outer join, left outer join, right outer join, and cross join, and
write the output in each case.
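The join types map onto arguments of merge() roughly as sketched below. The result names (inner, outer, left, right, cross) are illustrative.

```r
buildings <- data.frame(location = c(1, 2, 3),
                        name = c("building1", "building2", "building3"))
surveydata <- data.frame(survey = c(1, 1, 1, 2, 2, 2),
                         LocationID = c(1, 2, 3, 2, 3, 1),
                         efficiency = c(51, 64, 70, 71, 80, 58))

# by.x / by.y name the key column in the first and second dataframe respectively.
inner <- merge(buildings, surveydata, by.x = "location", by.y = "LocationID")
outer <- merge(buildings, surveydata, by.x = "location", by.y = "LocationID", all = TRUE)
left  <- merge(buildings, surveydata, by.x = "location", by.y = "LocationID", all.x = TRUE)
right <- merge(buildings, surveydata, by.x = "location", by.y = "LocationID", all.y = TRUE)

# by = NULL requests the Cartesian product: every building paired with every survey row.
cross <- merge(buildings, surveydata, by = NULL)

print(inner); print(outer); print(left); print(right); print(cross)
```

Because every LocationID in surveydata has a matching location in buildings, the inner, outer, left, and right joins contain the same rows for this data; they would differ only if one side had unmatched keys.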
10. Merge the rows of the following two dataframes:
buildings <- data.frame(location=c(1, 2, 3), name=c("building1", "building2",
"building3"))
buildings2 <- data.frame(location=c(5, 4, 6), name=c("building5", "building4",
"building6"))
Also, specify a new dataframe, “allBuildings”.
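Stacking rows is usually done with rbind(), which requires both dataframes to have the same column names. A minimal sketch, storing the result as allBuildings:

```r
buildings <- data.frame(location = c(1, 2, 3),
                        name = c("building1", "building2", "building3"))
buildings2 <- data.frame(location = c(5, 4, 6),
                         name = c("building5", "building4", "building6"))

# Append the rows of buildings2 below those of buildings; row order is preserved.
allBuildings <- rbind(buildings, buildings2)
print(allBuildings)
```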
12. Read in the cars.txt dataset and call it car1. Make sure you use the “header=F”
option to specify that there are no column names associated with the dataset. Next,
assign “speed” and “dist” to be the first and second column names of the car1 dataset.
Find the dimension and structure of the dataset car1.
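The sketch below shows one way to do this. The cars.txt file itself is not included here, so as a stand-in the block writes a headerless copy of R's built-in cars data to a temporary file; the variable cars_path and the file contents are assumptions. With the real file, point read.table() at its actual path instead.

```r
# Stand-in for the real cars.txt (assumption): a headerless, space-separated copy of 'cars'.
cars_path <- file.path(tempdir(), "cars.txt")
write.table(cars, cars_path, row.names = FALSE, col.names = FALSE)

# header = FALSE (F is shorthand for FALSE) tells read.table the first line is data.
car1 <- read.table(cars_path, header = FALSE)

names(car1) <- c("speed", "dist")   # assign the two column names
dim(car1)                           # dimension: rows, columns
str(car1)                           # structure
```

Spelling out FALSE rather than F is slightly safer, since F is an ordinary variable that can be reassigned.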
14. Create a matrix of 4 X 5 containing duplicate elements and print unique elements
from it.
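A minimal sketch with made-up values; the matrix m below is only an example of a 4 x 5 matrix with repeated entries.

```r
# Hypothetical 4 x 5 matrix, filled row by row; row 3 repeats row 1.
m <- matrix(c(1, 2, 3, 4, 5,
              2, 3, 4, 5, 6,
              1, 2, 3, 4, 5,
              6, 7, 7, 8, 8),
            nrow = 4, ncol = 5, byrow = TRUE)
print(m)

unique_elements <- unique(as.vector(m))   # distinct values across the whole matrix
print(unique_elements)

unique_rows <- unique(m)                  # on a matrix, unique() removes duplicate rows instead
print(unique_rows)
```

Flattening with as.vector() matters: calling unique() directly on a matrix compares whole rows, not individual elements.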