
FDA Assignment 6

The assignment involves various data analytics tasks using datasets such as 'mtcars', 'iris', and 'airquality_data'. Students are required to perform operations like sorting, extracting duplicates, handling missing values, and merging dataframes. Additionally, it includes tasks related to vector manipulation and matrix creation.

Uploaded by venkatkollu678
Copyright © All Rights Reserved

Assignment 4

Foundations for Data Analytics


Name: K.Venkat
Reg no: 22MIS7153
Slot: L51+52
1. Use the dataset “mtcars” and perform the following:
1. Print the structure of the dataset

2. Print first 10 observations


3. Print last 15 observations

4. Sort by mpg in increasing order


5. Sort by cyl in decreasing order

6. Sort by mpg and cyl in increasing order


7. Sort by mpg and cyl in decreasing order
8. Sort by mpg (increasing) and cyl (decreasing)
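The eight mtcars tasks above can be sketched in R as follows. This is a minimal sketch, assuming R's built-in `mtcars` data and base-R `order()`; the object names (`by_mpg`, `by_mixed`, and so on) are illustrative and not part of the original assignment.

```r
# mtcars ships with base R (datasets package), so no library() call is needed.
str(mtcars)                                   # 1. structure of the dataset
print(head(mtcars, 10))                       # 2. first 10 observations
print(tail(mtcars, 15))                       # 3. last 15 observations

# order() returns row positions; indexing with them reorders the rows.
by_mpg       <- mtcars[order(mtcars$mpg), ]                                # 4. mpg increasing
by_cyl_desc  <- mtcars[order(mtcars$cyl, decreasing = TRUE), ]             # 5. cyl decreasing
by_mpg_cyl   <- mtcars[order(mtcars$mpg, mtcars$cyl), ]                    # 6. mpg, then cyl, increasing
by_both_desc <- mtcars[order(mtcars$mpg, mtcars$cyl, decreasing = TRUE), ] # 7. mpg, then cyl, decreasing
by_mixed     <- mtcars[order(mtcars$mpg, -mtcars$cyl), ]                   # 8. mpg up, cyl down
print(by_mixed)
```

Negating `cyl` works here only because it is numeric; for a mixed increasing/decreasing sort on other column types, `order(..., decreasing = c(FALSE, TRUE), method = "radix")` is an alternative.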

2. Create the vector

Logical vector of duplicates


Logical vector of duplicates (from last)

Difference between duplicated(x) and duplicated(x, fromLast = TRUE)


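• duplicated(x) scans from the start and flags every occurrence of a value after its first occurrence.
• duplicated(x, fromLast = TRUE) scans from the end and flags every occurrence of a value before its last occurrence.
Combined with | (duplicated(x) | duplicated(x, fromLast = TRUE)), they flag every element whose value occurs more than once, including the first and last copies.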
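The difference is easiest to see on a small example. The assignment's own vector is not reproduced in this copy, so the sketch below uses a hypothetical vector `x`; `first_kept`, `last_kept`, and `any_repeat` are illustrative names.

```r
# Hypothetical vector (the original vector is not shown in this copy).
x <- c(1, 2, 2, 3, 3, 3, 4)

first_kept <- duplicated(x)                    # FALSE FALSE TRUE FALSE TRUE TRUE FALSE
last_kept  <- duplicated(x, fromLast = TRUE)   # FALSE TRUE FALSE TRUE TRUE FALSE FALSE
any_repeat <- first_kept | last_kept           # TRUE for every copy of a repeated value
print(x[any_repeat])                           # 2 2 3 3 3
```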

Extract duplicate elements

Extract unique elements

Duplicate elements in reverse order

Unique elements in reverse order

Indices of duplicate elements

Indices of unique elements

Count of unique elements

Count of duplicate elements
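A sketch of the remaining vector tasks, again on a hypothetical `x` because the original vector is not shown. "Reverse order" is read here as applying `rev()` to the extracted elements; that interpretation is an assumption.

```r
# Hypothetical vector (the original vector is not shown in this copy).
x <- c(1, 2, 2, 3, 3, 3, 4)
dup <- duplicated(x)

dup_elems  <- x[dup]              # duplicate elements: 2 3 3
uniq_elems <- unique(x)           # unique elements: 1 2 3 4 (same as x[!dup])
dup_rev    <- rev(dup_elems)      # duplicates in reverse order: 3 3 2
uniq_rev   <- rev(uniq_elems)     # unique elements in reverse order: 4 3 2 1
dup_idx    <- which(dup)          # indices of duplicate elements: 3 5 6
uniq_idx   <- which(!dup)         # indices of first occurrences: 1 2 4 7
n_unique   <- length(uniq_elems)  # count of unique elements: 4
n_dup      <- sum(dup)            # count of duplicate elements: 3
```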


3. Create the dataframe

Logical vector of duplicates

Extract duplicate rows

Extract unique rows

Indices of duplicate rows

Indices of unique rows


Number of unique rows

Number of duplicate rows
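The dataframe from the assignment is not reproduced in this copy, so this sketch uses a small hypothetical data frame `df`. When given a data frame, `duplicated()` and `unique()` compare whole rows.

```r
# Hypothetical data frame: row 3 repeats row 2 exactly.
df <- data.frame(id  = c(1, 2, 2, 3, 3),
                 grp = c("a", "b", "b", "c", "d"))

dup_rows <- duplicated(df)          # FALSE FALSE TRUE FALSE FALSE
print(df[dup_rows, ])               # duplicate rows
df_unique <- unique(df)             # unique rows (same as df[!dup_rows, ])
print(df_unique)
print(which(dup_rows))              # indices of duplicate rows: 3
print(which(!dup_rows))             # indices of unique rows: 1 2 4 5
print(nrow(df_unique))              # number of unique rows: 4
print(sum(dup_rows))                # number of duplicate rows: 1
```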

4. Use the dataset iris and perform the following:

1. Print the dataset iris

2. Structure of the dataset

3. Summary of all variables


4. Number of variables (columns)

5. Number of observations (rows)

6. Logical vector of duplicate rows

7. Extract duplicate rows

8. Extract unique rows

9. Indices of duplicate rows

10. Indices of unique rows


11. Number of unique rows

12. Number of duplicate rows
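A minimal sketch for the iris tasks, using R's built-in `iris` data. The values in the comments are for that built-in copy, which has exactly one repeated row; `dup` and `iris_unique` are illustrative names.

```r
print(iris)                          # 1. the dataset
str(iris)                            # 2. structure
print(summary(iris))                 # 3. summary of all variables
print(ncol(iris))                    # 4. number of variables: 5
print(nrow(iris))                    # 5. number of observations: 150

dup <- duplicated(iris)              # 6. TRUE for a row that repeats an earlier row
print(iris[dup, ])                   # 7. duplicate rows
iris_unique <- unique(iris)          # 8. unique rows
print(which(dup))                    # 9. indices of duplicate rows: 143
print(which(!dup))                   # 10. indices of unique rows
print(nrow(iris_unique))             # 11. number of unique rows: 149
print(sum(dup))                      # 12. number of duplicate rows: 1
```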

5. Assuming 'airquality_data' is your dataframe, perform the following:

1. Print the dataset

2. Structure of the dataset

3. Summary of all variables


4. Number of variables (columns)

5. Number of observations (rows)

6. Check for missing values

7. Indices of missing values (column-major order)

8. Indices of missing values (row-major order)


9. Row and column indices of missing values

10. Total number of missing values

11. Variables in which the missing values are concentrated

12. Omit all rows with missing values
13. Records without missing values using complete.cases()

14. Records without missing values using na.omit()

15. Records without missing values using na.exclude()


16. Records with missing values using complete.cases()
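A sketch of the missing-value tasks, assuming `airquality_data` is a copy of R's built-in `airquality` data (the assignment does not say where it comes from). "Row-major order" is read here as `which()` applied to the transposed NA mask, and "concentrated missing values" as the per-column NA count; both readings are assumptions.

```r
# Assumption: airquality_data is a copy of R's built-in airquality data.
airquality_data <- airquality

print(airquality_data)                               # 1. the dataset
str(airquality_data)                                 # 2. structure
print(summary(airquality_data))                      # 3. summary of all variables
print(ncol(airquality_data))                         # 4. number of variables: 6
print(nrow(airquality_data))                         # 5. number of observations: 153

na_mask <- is.na(airquality_data)                    # logical matrix, TRUE where a value is missing
print(any(na_mask))                                  # 6. TRUE if anything is missing
idx_col_major <- which(na_mask)                      # 7. positions counted down each column
idx_row_major <- which(t(na_mask))                   # 8. positions counted across each row
rc <- which(na_mask, arr.ind = TRUE)                 # 9. (row, col) pairs of missing cells
total_na <- sum(na_mask)                             # 10. total missing values: 44
na_per_var <- colSums(na_mask)                       # 11. Ozone 37, Solar.R 7, others 0

omitted       <- na.omit(airquality_data)            # 12 and 14. drop incomplete rows
complete_rows <- airquality_data[complete.cases(airquality_data), ]   # 13
excluded      <- na.exclude(airquality_data)         # 15
rows_with_na  <- airquality_data[!complete.cases(airquality_data), ]  # 16
print(nrow(complete_rows))                           # 111
print(nrow(rows_with_na))                            # 42
```

`na.omit()` and `na.exclude()` drop the same rows; they differ only when used as the `na.action` of a model fit, where `na.exclude` pads residuals and predictions with `NA` back to the original length.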

6. Consider a numeric vector x <- c(3,4,5,6,7,8)

Write a command to recode the values less than 6 with zero in the vector x

Write a command to recode the values between 4 and 8 with 100

Write a command to recode the values that are less than 5 or greater than 6 with 50
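The three recodes above can be written with logical indexing. This sketch recodes a fresh copy of `x` each time so the results do not interfere, and it reads "between 4 and 8" as inclusive (`x >= 4 & x <= 8`); with strict bounds (`x > 4 & x < 8`) only 5, 6, and 7 would change. Both choices are assumptions.

```r
x <- c(3, 4, 5, 6, 7, 8)

x1 <- x
x1[x1 < 6] <- 0                   # 0 0 0 6 7 8
x2 <- x
x2[x2 >= 4 & x2 <= 8] <- 100      # 3 100 100 100 100 100 (inclusive bounds assumed)
x3 <- x
x3[x3 < 5 | x3 > 6] <- 50         # 50 50 5 6 50 50
print(x1); print(x2); print(x3)
```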

Write a command to recode the values less than 6 with NA in the vector x

Write a command to recode the values between 4 and 8 with NA

Write a command to recode the values that are less than 5 or greater than 6 with NA

Count the number of NA values after each operation

Find the mean of x (Hint: exclude NA values)

Find the median of x (Hint: exclude NA values)
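The same pattern with `NA` as the replacement value, followed by the NA counts, means, and medians. As before, each recode starts from a fresh copy of `x`, "between 4 and 8" is taken as inclusive, and "mean of x" is interpreted as the mean of each recoded vector; these are assumptions.

```r
x <- c(3, 4, 5, 6, 7, 8)

n1 <- x
n1[n1 < 6] <- NA                  # NA NA NA 6 7 8
n2 <- x
n2[n2 >= 4 & n2 <= 8] <- NA       # 3 NA NA NA NA NA
n3 <- x
n3[n3 < 5 | n3 > 6] <- NA         # NA NA 5 6 NA NA

recoded   <- list(n1, n2, n3)
na_counts <- sapply(recoded, function(v) sum(is.na(v)))  # 3 5 4
means     <- sapply(recoded, mean, na.rm = TRUE)         # 7.0 3.0 5.5
medians   <- sapply(recoded, median, na.rm = TRUE)       # 7.0 3.0 5.5
print(na_counts); print(means); print(medians)
```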

Write a command to recode the values less than 6 with “NA” (enclosed in double quotes) in the vector x

Write a command to recode the values between 4 and 8 with “NA”

Write a command to recode the values that are less than 5 or greater than 6 with “NA”

Count the number of NA values after each operation

Find the mean of x (Hint: exclude NA values)

Find the median of x (Hint: exclude NA values)

What is the difference between NA and “NA”?
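Recoding with the string `"NA"` behaves very differently, which is the point of the last question. A sketch under the same assumptions as the numeric version (fresh copy of `x` for each recode, inclusive bounds):

```r
x <- c(3, 4, 5, 6, 7, 8)

s1 <- x
s1[s1 < 6] <- "NA"                # "NA" "NA" "NA" "6" "7" "8" (whole vector is now character)
s2 <- x
s2[s2 >= 4 & s2 <= 8] <- "NA"     # "3" "NA" "NA" "NA" "NA" "NA"
s3 <- x
s3[s3 < 5 | s3 > 6] <- "NA"       # "NA" "NA" "5" "6" "NA" "NA"

recoded <- list(s1, s2, s3)
str_na_counts <- sapply(recoded, function(v) sum(is.na(v)))  # 0 0 0: "NA" is not missing
# mean() returns NA for a character vector; its "not numeric" warning is silenced here.
print(suppressWarnings(mean(s1, na.rm = TRUE)))

# Converting back to numeric turns the string "NA" into a real NA.
s1_num <- as.numeric(s1)
print(mean(s1_num, na.rm = TRUE))    # 7
print(median(s1_num, na.rm = TRUE))  # 7
```

`NA` is R's missing-value marker, so `is.na()` detects it and `na.rm = TRUE` drops it. `"NA"` is an ordinary two-character string; assigning it into a numeric vector coerces the whole vector to character, so nothing is counted as missing and the mean cannot be computed until the vector is converted back with `as.numeric()`.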

7. Consider the given vectors:

A <- c(3, 2, NA, 5, 3, 7, NA, NA, 5, 2, 6)

B <- c(3, 2, NA, 5, 3, 7, NA, "NA", 5, 2, 6)

Find the length of the vector A

Find the length of the vector B

Sort the values in vector A and put it in p (Hint: use function sort())

Find the length of p

Sort the values in vector B and put it in q

Find the length of q

What can you infer from the above results?
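A sketch of question 7. The key points: `B` contains a string, so the whole vector is character, and `sort()` drops true `NA` values by default (`na.last = NA`) but keeps the string `"NA"`.

```r
A <- c(3, 2, NA, 5, 3, 7, NA, NA, 5, 2, 6)
B <- c(3, 2, NA, 5, 3, 7, NA, "NA", 5, 2, 6)   # the string forces B to character

print(length(A))     # 11
print(length(B))     # 11
p <- sort(A)         # sort() drops NA by default (na.last = NA)
print(length(p))     # 8: the three NAs are removed
q <- sort(B)         # only the two real NAs are dropped; "NA" is an ordinary string
print(length(q))     # 9
print(sum(is.na(B))) # 2
```

The sorted order of `q` is alphabetical and can depend on the locale, but its length does not.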


8. Create the “buildings” and “surveydata” dataframes to merge:

buildings <- data.frame(location=c(1, 2, 3), name=c("building1", "building2", "building3"))

surveydata <- data.frame(survey=c(1,1,1,2,2,2), location=c(1,2,3,2,3,1), efficiency=c(51,64,70,71,80,58))

The dataframes buildings and surveydata have a common key variable called “location”.

Use the merge() function to merge the two dataframes by “location” into a new dataframe, “buildingStats”.
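A minimal sketch of the merge, taking the first building name as "building1" to match questions 9 and 10, and using straight quotes (R does not accept curly quotes as string delimiters).

```r
buildings <- data.frame(location = c(1, 2, 3),
                        name = c("building1", "building2", "building3"))
surveydata <- data.frame(survey = c(1, 1, 1, 2, 2, 2),
                         location = c(1, 2, 3, 2, 3, 1),
                         efficiency = c(51, 64, 70, 71, 80, 58))

# Inner join on the shared key; rows come back sorted by location.
buildingStats <- merge(buildings, surveydata, by = "location")
print(buildingStats)
```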

9. Give the dataframes different key variable names:

buildings <- data.frame(location=c(1, 2, 3), name=c("building1", "building2", "building3"))

surveydata <- data.frame(survey=c(1,1,1,2,2,2), LocationID=c(1,2,3,2,3,1), efficiency=c(51,64,70,71,80,58))
The dataframes buildings and surveydata now have corresponding key variables called location and LocationID.

Use the merge() function to merge the columns of the two dataframes by the
corresponding variables.

Perform an inner join, an outer join, a left outer join, a right outer join, and a cross join, and write the output in each case.

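A sketch of the five joins with `merge()`: `by.x`/`by.y` pair the differently named keys, `all`, `all.x`, and `all.y` select the outer, left, and right joins, and `by = NULL` gives the cross (Cartesian) join.

```r
buildings <- data.frame(location = c(1, 2, 3),
                        name = c("building1", "building2", "building3"))
surveydata <- data.frame(survey = c(1, 1, 1, 2, 2, 2),
                         LocationID = c(1, 2, 3, 2, 3, 1),
                         efficiency = c(51, 64, 70, 71, 80, 58))

inner <- merge(buildings, surveydata, by.x = "location", by.y = "LocationID")
outer <- merge(buildings, surveydata, by.x = "location", by.y = "LocationID", all = TRUE)
left  <- merge(buildings, surveydata, by.x = "location", by.y = "LocationID", all.x = TRUE)
right <- merge(buildings, surveydata, by.x = "location", by.y = "LocationID", all.y = TRUE)
cross <- merge(buildings, surveydata, by = NULL)   # every building paired with every survey row

for (res in list(inner, outer, left, right, cross)) print(res)
```

Because every location in `buildings` appears in `surveydata` and vice versa, the inner, outer, left, and right joins all return the same 6 rows here; they would differ only if some key were present in just one dataframe. The cross join pairs each of the 3 buildings with each of the 6 survey rows, giving 18 rows.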
10. Merge the rows of the following two dataframes:

buildings <- data.frame(location=c(1, 2, 3), name=c("building1", "building2", "building3"))

buildings2 <- data.frame(location=c(5, 4, 6), name=c("building5", "building4", "building6"))

Store the result in a new dataframe, “allBuildings”.
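Stacking rows uses `rbind()`, which requires both dataframes to have the same column names. A minimal sketch:

```r
buildings <- data.frame(location = c(1, 2, 3),
                        name = c("building1", "building2", "building3"))
buildings2 <- data.frame(location = c(5, 4, 6),
                         name = c("building5", "building4", "building6"))

# Rows of buildings2 are appended below the rows of buildings.
allBuildings <- rbind(buildings, buildings2)
print(allBuildings)
```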


12. Read in the cars.txt dataset and call it car1. Make sure you use the “header=F” option to specify that there are no column names associated with the dataset. Next, assign “speed” and “dist” as the first and second column names of the car1 dataset. Find the dimension and structure of the dataset car1.

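The real cars.txt is not included here, so to keep the sketch runnable it writes R's built-in `cars` data (50 rows, two columns) to a temporary file without a header and reads that instead. The file layout (whitespace-separated, no header) is an assumption; with the actual file, the `read.table()` call would point at "cars.txt".

```r
# Hypothetical stand-in for cars.txt: built-in cars data written without a header row.
cars_path <- tempfile(fileext = ".txt")
write.table(cars, cars_path, row.names = FALSE, col.names = FALSE)

# With the real file this would be read.table("cars.txt", header = FALSE).
car1 <- read.table(cars_path, header = FALSE)
colnames(car1) <- c("speed", "dist")
print(dim(car1))   # 50 2
str(car1)
```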
14. Create a 4 x 5 matrix containing duplicate elements and print the unique elements from it.
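One possible sketch. The matrix values are hypothetical (each of 1 to 10 appears twice). Note that `unique()` applied directly to a matrix removes duplicate rows, not duplicate elements, so the elements are flattened with `as.vector()` first.

```r
# Hypothetical values: rep(1:10, 2) fills the 4 x 5 matrix column by column.
m <- matrix(rep(1:10, 2), nrow = 4, ncol = 5)
print(m)

u <- unique(as.vector(m))   # unique elements: 1 2 3 4 5 6 7 8 9 10
print(u)

# unique(m) works on rows; all 4 rows here differ, so nothing is removed.
print(nrow(unique(m)))      # 4
```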
