DSR - R Programming

Uploaded by

misoxyash2205

DSR Exam Study Guide - R Programming

SHORT ANSWER QUESTIONS

1. Define R Programming
Answer: R is a free, open-source programming language and statistical computing environment
designed for data analysis, statistical modeling, and data visualization. It was developed by Ross Ihaka
and Robert Gentleman.

2. List out any five features of R


Answer:

Open Source: Free to use and modify


Statistical Analysis: Built-in statistical functions
Data Visualization: Excellent graphical capabilities

Cross-platform: Runs on Windows, Mac, Linux


Extensible: Thousands of packages available

3. What are the applications of R?


Answer:

Statistical analysis and modeling


Data mining and machine learning

Bioinformatics and genetics


Financial analysis

Market research and social sciences

4. What are the different data types in R?


Answer:

Numeric: Decimal numbers (3.14)


Integer: Whole numbers (5L)
Character: Text strings ("Hello")

Logical: TRUE/FALSE values


Complex: Complex numbers (3+2i)
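
The types above can be checked directly with class(). The following is a minimal sketch; the variable names are illustrative only.

```r
# One value of each basic type (variable names are illustrative)
num_val <- 3.14      # numeric (double)
int_val <- 5L        # integer: the L suffix forces integer storage
chr_val <- "Hello"   # character
lgl_val <- TRUE      # logical
cpx_val <- 3+2i      # complex

# Collect the class of each value into one character vector
types <- c(
  class(num_val), class(int_val), class(chr_val),
  class(lgl_val), class(cpx_val)
)
print(types)
```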

5. Demonstrate a simple 3x3 matrix


Code:
r

# Creating a 3x3 matrix


matrix1 <- matrix(1:9, nrow=3, ncol=3)
print(matrix1)

6. Define order of a Matrix


Answer: The order of a matrix is expressed as m×n, where m is the number of rows and n is the number
of columns. For example, a 3×4 matrix has 3 rows and 4 columns.
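
In R the order of a matrix can be read with dim(), nrow(), and ncol(). A short sketch with an illustrative 3×4 matrix:

```r
# A 3x4 matrix: 3 rows, 4 columns (values are illustrative)
m <- matrix(1:12, nrow = 3, ncol = 4)

# dim() returns c(number of rows, number of columns)
ord <- dim(m)
print(ord)
print(paste("Order:", nrow(m), "x", ncol(m)))
```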

7. What are the 7 measures of central tendency?


Answer:

Mean (average)

Median (middle value)

Mode (most frequent value)


Geometric mean

Harmonic mean
Weighted mean

Trimmed mean
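
The measures listed above can be computed in base R roughly as follows. This is a sketch on made-up data: the vectors x and w are assumptions, and because base R has no built-in function for the statistical mode, the most frequent value is taken from table().

```r
x <- c(2, 4, 4, 8, 16)   # sample data (illustrative, all positive)
w <- c(1, 1, 1, 2, 2)    # hypothetical weights for the weighted mean

mean_val   <- mean(x)     # arithmetic mean
median_val <- median(x)   # middle value

# Mode: most frequent value (first one wins if there is a tie)
freq     <- table(x)
mode_val <- as.numeric(names(freq)[which.max(freq)])

geometric_val <- exp(mean(log(x)))     # requires x > 0
harmonic_val  <- 1 / mean(1 / x)       # requires x != 0
weighted_val  <- weighted.mean(x, w)   # weights given in w
trimmed_val   <- mean(x, trim = 0.2)   # drops 20% of values from each end

print(c(mean_val, median_val, mode_val))
print(c(geometric_val, harmonic_val, weighted_val, trimmed_val))
```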

8. Define Transpose of a matrix


Answer: Matrix transpose is an operation that flips a matrix over its diagonal, switching rows and
columns. If A is m×n, then A^T is n×m.

Code:

mat <- matrix(1:6, nrow=2, ncol=3)


transpose_mat <- t(mat)
print(transpose_mat)

9. Explain factor variable


Answer: Factor is a data type used to store categorical data (both nominal and ordinal). It stores data as
levels and is memory efficient for repeated categorical values.

Code:

r
colors <- factor(c("red", "blue", "red", "green", "blue"))
print(colors)
levels(colors)
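
The example above is nominal. For ordinal data, a factor can be given an explicit level order. A minimal sketch, with illustrative size labels:

```r
# Ordinal data: the levels have a meaningful order (values are illustrative)
sizes <- factor(
  c("medium", "small", "large", "small"),
  levels  = c("small", "medium", "large"),
  ordered = TRUE
)
print(sizes)

# Comparisons are allowed on ordered factors
is_below_large <- sizes < "large"
print(is_below_large)

# Each value is stored as an integer code pointing into the levels
codes <- as.integer(sizes)
print(codes)
```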

10. List out the characteristics of a data frame


Answer:

Rectangular structure (rows and columns)


Different data types in different columns

Column names are required

Row names are optional

All columns must have same length

11. Define Vector with an example


Answer: Vector is a basic data structure in R that contains elements of the same data type arranged in a
sequence.

Code:

# Numeric vector
num_vec <- c(1, 2, 3, 4, 5)
# Character vector
char_vec <- c("apple", "banana", "cherry")
print(num_vec)

12. What are the different values that can be assigned to a numeric data type in R?
Answer:

Positive numbers (5, 3.14)

Negative numbers (-2, -7.5)

Zero (0)

Infinity (Inf, -Inf)

Not a Number (NaN)

Missing values (NA)
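
The special values in this list can be told apart with is.finite(), is.nan(), and is.na(). A short sketch using illustrative values:

```r
pos_inf  <- 1 / 0      # Inf
neg_inf  <- -1 / 0     # -Inf
not_num  <- 0 / 0      # NaN
miss_val <- NA_real_   # missing numeric value

vals <- c(5, -7.5, 0, pos_inf, neg_inf, not_num, miss_val)

finite_flags <- is.finite(vals)   # TRUE only for ordinary numbers
nan_flags    <- is.nan(vals)      # TRUE only for NaN
na_flags     <- is.na(vals)       # TRUE for both NaN and NA

print(finite_flags)
print(nan_flags)
print(na_flags)
```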

13. Explain RStudio


Answer: RStudio is an Integrated Development Environment (IDE) for R programming. It provides a
user-friendly interface with features like syntax highlighting, code completion, debugging tools, and
integrated graphics.

14. Write an R program to reverse the order of given vector


Code:

# Original vector
vec <- c(1, 2, 3, 4, 5)
print("Original vector:")
print(vec)

# Reversed vector
reversed_vec <- rev(vec)
print("Reversed vector:")
print(reversed_vec)

15. How to create a Matrix in R?


Answer: Use the matrix() function with data, nrow, and ncol parameters.

Code:

# Method 1: Using matrix() function


mat1 <- matrix(1:12, nrow=3, ncol=4)

# Method 2: Using rbind() or cbind()


mat2 <- rbind(c(1,2,3), c(4,5,6))
print(mat1)

16. Define R Array


Answer: Array is a multi-dimensional data structure that can store data in more than two dimensions. It's
an extension of matrices.

Code:

# Creating a 2x3x2 array


arr <- array(1:12, dim=c(2,3,2))
print(arr)

17. Difference between data frame and a matrix in R

Answer:

Data Frame                         | Matrix
Different data types in columns    | Same data type only
More flexible                      | Less flexible
Columns accessed with $ or [ ]     | Elements accessed with [ ] only
Always has column names            | Column names are optional
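
The type difference is easiest to see by putting the same mixed data into both structures. A minimal sketch, with illustrative column names and values:

```r
# A matrix holds a single type, so mixing numbers and text
# coerces every element to character
m <- cbind(id = c(1, 2), name = c("John", "Jane"))
matrix_id_class <- class(m[, "id"])
print(matrix_id_class)

# A data frame keeps each column's own type
df <- data.frame(id = c(1, 2), name = c("John", "Jane"),
                 stringsAsFactors = FALSE)
df_id_class   <- class(df$id)
df_name_class <- class(df$name)
print(c(df_id_class, df_name_class))
```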


 

18. Explain the use of length() function


Answer: The length() function returns the number of elements in a vector or list.

Code:

vec <- c(1, 2, 3, 4, 5)


len <- length(vec)
print(paste("Length of vector:", len))

19. Define the structure of a data frame using str() function


Answer: The str() function displays the internal structure of a data frame, showing data types,
dimensions, and sample values.

Code:

df <- data.frame(name=c("John", "Jane"), age=c(25, 30))


str(df)

20. Explain Argument matching


Answer: R uses three types of argument matching:

Exact matching: Arguments matched by exact name

Partial matching: Arguments matched by partial name

Positional matching: Arguments matched by position
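
The three rules can be seen by calling one function in three ways. The function describe() below is hypothetical and exists only for this sketch.

```r
# Hypothetical function used only to demonstrate argument matching
describe <- function(value, decimals = 2, prefix = "Result:") {
  paste(prefix, round(value, decimals))
}

# Exact matching: full argument names
exact <- describe(value = 3.14159, decimals = 3)

# Partial matching: unambiguous abbreviations of the names
partial <- describe(val = 3.14159, dec = 3)

# Positional matching: arguments taken in the order they are defined
positional <- describe(3.14159, 3)

print(c(exact, partial, positional))
```

Partial matching works but can make code harder to read, so full names are usually the safer choice in exam answers.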


LONG ANSWER QUESTIONS

1. Summarize the advantages and disadvantages of R


Theory: R is widely used for statistical computing and data analysis in academia and industry. Its
popularity stems from its comprehensive statistical capabilities and active community support.
However, like any programming language, R has both strengths and limitations that users should
understand. For most statistical and data analysis tasks the advantages outweigh the disadvantages,
which makes R a strong choice for data scientists and statisticians.

Advantages:

Free and Open Source: No licensing costs

Comprehensive: Vast statistical capabilities


Active Community: Large user base and support

Extensible: Thousands of packages


Platform Independent: Works on multiple OS

Disadvantages:

Memory Management: Holds all data in RAM, which limits the size of datasets


Learning Curve: Steep for beginners

Speed: Slower than compiled languages


Graphics: Basic plotting can be limited

Documentation: Inconsistent package documentation

2. Write an R program to find maximum and minimum value of a given vector


Theory: Finding maximum and minimum values is a fundamental operation in statistical analysis and
data exploration. The max() and min() functions are built-in R functions that scan a vector for its
extreme values. By default they return NA if the vector contains any NA; pass na.rm = TRUE to ignore
missing values. The range() function returns both the minimum and maximum values in a single call.

Code:

r
# Create a vector
numbers <- c(15, 8, 23, 4, 16, 42, 7)
print("Original vector:")
print(numbers)

# Find maximum value


max_val <- max(numbers)
print(paste("Maximum value:", max_val))

# Find minimum value


min_val <- min(numbers)
print(paste("Minimum value:", min_val))

# Using range() to get both


range_val <- range(numbers)
print("Range (min, max):")
print(range_val)
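
When a vector contains missing values, max(), min(), and range() return NA unless na.rm = TRUE is supplied. A short sketch with an illustrative vector:

```r
readings <- c(15, NA, 23, 4)   # illustrative data with one missing value

max_default <- max(readings)                  # NA: the missing value propagates
max_clean   <- max(readings, na.rm = TRUE)    # missing value ignored
min_clean   <- min(readings, na.rm = TRUE)
rng_clean   <- range(readings, na.rm = TRUE)  # c(min, max)

print(max_default)
print(c(min_clean, max_clean))
print(rng_clean)
```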

3. Illustrate R program to create two 2x3 matrices and perform operations


Theory: Matrix operations are fundamental in linear algebra and statistical computing, forming the
backbone of many machine learning algorithms. Element-wise operations (addition, subtraction,
multiplication, division) perform calculations between corresponding elements of matrices with the same
dimensions. These operations are vectorized in R, making them highly efficient for large datasets. Matrix
arithmetic is essential for data transformations, statistical calculations, and scientific computing
applications.

Code:

r
# Create two 2x3 matrices
mat1 <- matrix(c(1,2,3,4,5,6), nrow=2, ncol=3)
mat2 <- matrix(c(7,8,9,10,11,12), nrow=2, ncol=3)

print("Matrix 1:")
print(mat1)
print("Matrix 2:")
print(mat2)

# Addition
add_result <- mat1 + mat2
print("Addition:")
print(add_result)

# Subtraction
sub_result <- mat1 - mat2
print("Subtraction:")
print(sub_result)

# Element-wise multiplication
mult_result <- mat1 * mat2
print("Multiplication:")
print(mult_result)

# Division
div_result <- mat1 / mat2
print("Division:")
print(div_result)
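
Note that * multiplies element by element. The algebraic matrix product uses %*% and needs the inner dimensions to agree, so two 2x3 matrices cannot be multiplied directly. A sketch that transposes the second matrix first, reusing the values from the program above:

```r
mat1 <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
mat2 <- matrix(c(7,8,9,10,11,12), nrow = 2, ncol = 3)

# Element-wise product: same shape as the inputs (2x3)
elementwise <- mat1 * mat2

# Matrix product: (2x3) %*% (3x2) gives a 2x2 result
mat_product <- mat1 %*% t(mat2)

print(dim(elementwise))
print(mat_product)
```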

4. Write R program to check if a given number is Even or Odd


Theory: Even-odd checking is a fundamental programming concept that uses the modulo operator (%% in R)
to determine divisibility. This concept is widely used in algorithms, data filtering, and conditional
processing in statistical analysis. The modulo operation returns the remainder after division, making it
perfect for checking divisibility by any number. Understanding this logic is crucial for data manipulation
tasks like creating alternating patterns or grouping data based on numeric properties.

Code:

r
# Function to check even or odd
check_even_odd <- function(n) {
if (n %% 2 == 0) {
return(paste(n, "is Even"))
} else {
return(paste(n, "is Odd"))
}
}

# Test with different numbers


num1 <- 15
num2 <- 24
print(check_even_odd(num1))
print(check_even_odd(num2))

# Using ifelse for multiple numbers


numbers <- c(10, 15, 22, 7)
result <- ifelse(numbers %% 2 == 0, "Even", "Odd")
print(data.frame(Number = numbers, Type = result))

5. Write an R program to add 3 to each element of the first vector


Theory: Vectorization is one of R's most powerful features, allowing operations to be performed on
entire vectors without explicit loops. This concept makes R code more concise, readable, and
computationally efficient compared to traditional programming approaches. Vector arithmetic operations
are automatically applied element-wise, which is fundamental to R's design philosophy. Understanding
vectorization is crucial for efficient data manipulation and statistical computations in R programming.

Code:

# Create original vector


original_vec <- c(5, 10, 15, 20, 25)
print("Original vector:")
print(original_vec)

# Add 3 to each element


new_vec <- original_vec + 3
print("New vector (after adding 3):")
print(new_vec)

# Display both vectors


print("Comparison:")
print(data.frame(Original = original_vec, New = new_vec))

6. Explain in detail about data frame with example

Theory: Data frames are the most important data structure in R for real-world data analysis, representing
the equivalent of spreadsheets or database tables. They provide the flexibility to store different types of
data (numeric, character, logical) in different columns while maintaining structural integrity. Data frames
are the standard format for most statistical functions and are essential for data import/export operations.
Understanding data frames is crucial because most real-world datasets are naturally tabular and require
this flexible structure for effective analysis.

Key Features:

Rectangular structure

Mixed data types allowed

Column names mandatory

Functions: data.frame(), str(), summary()

Code:

r
# Create a data frame
students <- data.frame(
Name = c("Alice", "Bob", "Charlie", "Diana"),
Age = c(20, 22, 21, 23),
Grade = c("A", "B", "A", "B"),
Passed = c(TRUE, TRUE, TRUE, FALSE),
stringsAsFactors = FALSE
)

print("Student Data Frame:")


print(students)

# Structure of data frame


print("Structure:")
str(students)

# Access specific column


print("Names only:")
print(students$Name)

# Access specific row


print("First student:")
print(students[1, ])

# Summary statistics
print("Summary:")
summary(students)

7. Explain scalar and vector with example


Theory: Understanding the distinction between scalars and vectors is fundamental to R programming
and statistical computing. Scalars represent single values and are the building blocks of more complex
data structures, while vectors represent collections of related data points. In R, even a single number is
technically a vector of length 1, which demonstrates R's vector-oriented design philosophy. This
distinction is important for understanding how R functions operate and how data is stored and
manipulated in memory.

Code:

r
# Scalar examples
scalar_num <- 42
scalar_char <- "Hello"
scalar_logical <- TRUE

print("Scalars:")
print(paste("Number:", scalar_num))
print(paste("Character:", scalar_char))
print(paste("Logical:", scalar_logical))

# Vector examples
num_vector <- c(1, 2, 3, 4, 5)
char_vector <- c("apple", "banana", "cherry")
logical_vector <- c(TRUE, FALSE, TRUE)

print("Vectors:")
print(paste("Numeric vector:", toString(num_vector)))
print(paste("Character vector:", toString(char_vector)))
print(paste("Logical vector:", toString(logical_vector)))

# Check if scalar or vector


print(paste("scalar_num length:", length(scalar_num)))
print(paste("num_vector length:", length(num_vector)))

8. Write R program to check if vector elements are greater than 10


Theory: Logical operations and conditional checking are essential skills in data analysis for filtering,
subsetting, and data validation. The comparison operators in R return logical vectors, which can be used
for indexing and filtering operations. This approach is fundamental to data cleaning, exploratory data
analysis, and creating conditional summaries. Understanding logical indexing allows for efficient data
manipulation without explicit loops, leveraging R's vectorized operations for better performance.

Code:

r
# Create a vector
numbers <- c(5, 15, 8, 12, 3, 20, 7)
print("Original vector:")
print(numbers)

# Check which elements are greater than 10


result <- numbers > 10
print("Elements > 10 (TRUE/FALSE):")
print(result)

# Get actual values greater than 10


greater_than_10 <- numbers[numbers > 10]
print("Values greater than 10:")
print(greater_than_10)

# Create a data frame for better visualization


comparison <- data.frame(
Value = numbers,
Greater_than_10 = result
)
print("Comparison table:")
print(comparison)

9. What is the use of subset() function? With example


Theory: The subset() function filters a data frame, selecting rows with a logical condition and
columns with its select argument. It's particularly useful for exploratory data analysis where you
need to examine specific portions of your dataset. Rows for which the condition evaluates to NA are
dropped, and column names can be used directly in the condition, which gives cleaner syntax than
bracket notation for complex filtering operations. Mastering subset operations is essential for data
cleaning, analysis, and generating targeted insights from large datasets.

Code:

r
# Create a data frame
employees <- data.frame(
Name = c("John", "Jane", "Mike", "Sara", "Tom"),
Age = c(25, 30, 35, 28, 32),
Salary = c(50000, 60000, 70000, 55000, 65000),
Department = c("IT", "HR", "IT", "Finance", "HR")
)

print("Original data:")
print(employees)

# Subset employees with age > 30


older_employees <- subset(employees, Age > 30)
print("Employees with age > 30:")
print(older_employees)

# Subset IT department employees


it_employees <- subset(employees, Department == "IT")
print("IT Department employees:")
print(it_employees)

# Multiple conditions
high_earners <- subset(employees, Age > 25 & Salary > 55000)
print("Employees older than 25 earning more than 55000:")
print(high_earners)
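
subset() can also choose columns through its select argument. The sketch below rebuilds the same employee data so that it runs on its own; the result names hr_pay and no_salary are illustrative.

```r
employees <- data.frame(
  Name = c("John", "Jane", "Mike", "Sara", "Tom"),
  Age = c(25, 30, 35, 28, 32),
  Salary = c(50000, 60000, 70000, 55000, 65000),
  Department = c("IT", "HR", "IT", "Finance", "HR"),
  stringsAsFactors = FALSE
)

# Keep only HR rows, and only the Name and Salary columns
hr_pay <- subset(employees, Department == "HR", select = c(Name, Salary))
print(hr_pay)

# A negative selection drops a column and keeps the rest
no_salary <- subset(employees, select = -Salary)
print(names(no_salary))
```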

10. Write R code to remove empty rows and columns from a matrix

Theory: Data cleaning is a critical step in any data analysis workflow, and handling missing or empty data
is a common challenge. Removing empty rows and columns helps reduce data size, improve
computational efficiency, and prevent errors in statistical calculations. The apply() function combined
with is.na() and logical operations provides flexible ways to identify and remove incomplete data.
Understanding these techniques is essential for preprocessing real-world datasets that often contain
missing values or incomplete records.

Code:

r
# Create a matrix in which row 2 and column 2 are entirely NA
mat <- matrix(c(1, NA, 3, NA, NA, NA, NA, NA, 9), nrow=3, ncol=3)
print("Original matrix:")
print(mat)

# Remove rows that are completely NA (drop = FALSE keeps the result a matrix)
mat_no_empty_rows <- mat[!apply(is.na(mat), 1, all), , drop = FALSE]
print("After removing empty rows:")
print(mat_no_empty_rows)

# Remove columns that are completely NA
mat_clean <- mat_no_empty_rows[, !apply(is.na(mat_no_empty_rows), 2, all), drop = FALSE]
print("After removing empty columns:")
print(mat_clean)

# Alternative: keep only the rows that contain no NA at all
mat_complete <- mat_clean[complete.cases(mat_clean), , drop = FALSE]
print("Complete cases only:")
print(mat_complete)

11. Illustrate R code using seq(), paste(), print(), format(), mode(), order()

Theory: These six functions represent core R functionality for data generation, manipulation, and
inspection that every R programmer must master. The seq() function generates sequences for indexing
and creating regular patterns, while paste() handles string operations essential for data labeling and
reporting. The format() function controls data presentation, mode() helps with data type verification, and
order() provides sorting capabilities fundamental to data organization. Together, these functions form
the foundation for most data manipulation tasks in R programming.

Code:

r
# seq() - Generate sequences
seq1 <- seq(1, 10, by=2)
seq2 <- seq(0, 1, length.out=5)
print("seq() function:")
print(seq1)
print(seq2)

# paste() - Concatenate strings


names <- c("John", "Jane", "Mike")
ages <- c(25, 30, 35)
combined <- paste(names, "is", ages, "years old")
print("paste() function:")
print(combined)

# print() - Display objects


print("print() function:")
print("This is printed using print()")

# format() - Format numbers


numbers <- c(123.456, 78.9, 1234.5678)
formatted <- format(numbers, digits=3, nsmall=2)
print("format() function:")
print(formatted)

# mode() - Check data mode


vec <- c(1, 2, 3, 4, 5)
char_vec <- c("a", "b", "c")
print("mode() function:")
print(paste("Numeric vector mode:", mode(vec)))
print(paste("Character vector mode:", mode(char_vec)))

# order() - Get ordering indices


values <- c(30, 10, 40, 20)
order_indices <- order(values)
ordered_values <- values[order_indices]
print("order() function:")
print(paste("Original:", toString(values)))
print(paste("Order indices:", toString(order_indices)))
print(paste("Ordered values:", toString(ordered_values)))

12. Explain the structure of a data frame using str() function


Theory: The str() function provides a compact display of the structure of any R object, showing data
types, dimensions, and sample values.

Code:
r

# Create a comprehensive data frame


company_data <- data.frame(
Employee_ID = 1:5,
Name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
Age = c(25, 30, 35, 28, 32),
Salary = c(50000, 60000, 70000, 55000, 65000),
Department = factor(c("IT", "HR", "IT", "Finance", "HR")),
Full_Time = c(TRUE, TRUE, FALSE, TRUE, TRUE),
Start_Date = as.Date(c("2020-01-15", "2019-05-20", "2021-03-10", "2020-08-05", "2018-12-01"))
)

print("Data frame content:")


print(company_data)

cat("\nStructure using str():\n")


str(company_data)

cat("\nWhat str() shows us:\n")


cat("- 'data.frame': Object type\n")
cat("- '5 obs. of 7 variables': 5 rows, 7 columns\n")
cat("- Each variable shows: $ variable_name: data_type sample_values\n")
cat("- Factor variables show levels\n")
cat("- Date variables show format\n")

EXAM TIPS:
1. Time Management: Spend more time on long answers (they carry more marks)
2. Code Comments: Always add brief comments to your code
3. Output Display: Use print() statements to show results clearly

4. Error Handling: Mention is.na() and complete.cases() when dealing with missing data
5. Function Syntax: Remember to include function syntax when asked
