DSR Exam Study Guide - R Programming
SHORT ANSWER QUESTIONS
1. Define R Programming
Answer: R is a free, open-source programming language and statistical computing environment
designed for data analysis, statistical modeling, and data visualization. It was developed by Ross Ihaka
and Robert Gentleman.
2. List out any five features of R
Answer:
Open Source: Free to use and modify
Statistical Analysis: Built-in statistical functions
Data Visualization: Excellent graphical capabilities
Cross-platform: Runs on Windows, Mac, Linux
Extensible: Thousands of packages available
3. What are the applications of R?
Answer:
Statistical analysis and modeling
Data mining and machine learning
Bioinformatics and genetics
Financial analysis
Market research and social sciences
4. What are the different data types in R?
Answer:
Numeric: Decimal numbers (3.14)
Integer: Whole numbers (5L)
Character: Text strings ("Hello")
Logical: TRUE/FALSE values
Complex: Complex numbers (3+2i)
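As a quick check, class() can be used to confirm each of these types at the console. This is only a sketch; the variable names below are illustrative and not part of the original guide.

```r
# Illustrative values for each basic type (names are hypothetical)
num_val <- 3.14      # numeric (double)
int_val <- 5L        # integer (note the L suffix)
chr_val <- "Hello"   # character
lgl_val <- TRUE      # logical
cpx_val <- 3+2i      # complex

# class() reports the type of each value
types <- c(class(num_val), class(int_val), class(chr_val),
           class(lgl_val), class(cpx_val))
print(types)
```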
5. Demonstrate a simple 3x3 matrix
Code:
# Creating a 3x3 matrix
matrix1 <- matrix(1:9, nrow=3, ncol=3)
print(matrix1)
6. Define order of a Matrix
Answer: The order of a matrix is expressed as m×n, where m is the number of rows and n is the number
of columns. For example, a 3×4 matrix has 3 rows and 4 columns.
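In R, the order of a matrix can be read with dim(), nrow(), and ncol(). The short sketch below builds the 3×4 case mentioned above; the object names are illustrative.

```r
# A 3x4 matrix: 3 rows, 4 columns
m <- matrix(1:12, nrow = 3, ncol = 4)

# dim() returns c(rows, columns)
order_m <- dim(m)
print(order_m)

# nrow() and ncol() return each dimension separately
print(nrow(m))
print(ncol(m))
```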
7. What are the 7 measures of central tendency?
Answer:
Mean (average)
Median (middle value)
Mode (most frequent value)
Geometric mean
Harmonic mean
Weighted mean
Trimmed mean
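The seven measures can be computed roughly as follows. This is a sketch on made-up data: base R has no built-in statistical mode, geometric mean, or harmonic mean, so stat_mode() and the two formulas below are illustrative helpers, not standard functions.

```r
x <- c(2, 4, 4, 8, 16)   # sample data (made up for illustration)
w <- c(1, 1, 2, 1, 1)    # hypothetical weights, one per value

mean_x   <- mean(x)
median_x <- median(x)

# Hypothetical helper: most frequent value (first one wins on ties)
stat_mode <- function(v) {
  u <- unique(v)
  u[which.max(tabulate(match(v, u)))]
}
mode_x <- stat_mode(x)

geometric_x <- exp(mean(log(x)))     # requires all values > 0
harmonic_x  <- 1 / mean(1 / x)       # requires all values non-zero
weighted_x  <- weighted.mean(x, w)
trimmed_x   <- mean(x, trim = 0.2)   # trims the lowest and highest 20% first

print(c(mean = mean_x, median = median_x, mode = mode_x,
        geometric = geometric_x, harmonic = harmonic_x,
        weighted = weighted_x, trimmed = trimmed_x))
```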
8. Define Transpose of a matrix
Answer: Matrix transpose is an operation that flips a matrix over its diagonal, switching rows and
columns. If A is m×n, then A^T is n×m.
Code:
mat <- matrix(1:6, nrow=2, ncol=3)
transpose_mat <- t(mat)
print(transpose_mat)
9. Explain factor variable
Answer: Factor is a data type used to store categorical data (both nominal and ordinal). It stores data as
levels and is memory efficient for repeated categorical values.
Code:
colors <- factor(c("red", "blue", "red", "green", "blue"))
print(colors)
levels(colors)
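The example above is nominal. For ordinal data, a factor can be given an explicit level order, as in this sketch (the size labels are illustrative):

```r
# Ordinal data: the order of the levels is meaningful
sizes <- factor(c("medium", "small", "large", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
print(sizes)

# Factors store integer codes that index into the levels
codes <- as.integer(sizes)
print(codes)

# Comparisons are allowed on ordered factors
is_bigger <- sizes[1] > sizes[2]
print(is_bigger)

# table() counts occurrences per level
print(table(sizes))
```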
10. List out the characteristics of a data frame
Answer:
Rectangular structure (rows and columns)
Different data types in different columns
Every column has a name (R generates default names if none are supplied)
Row names are optional
All columns must have same length
11. Define Vector with an example
Answer: Vector is a basic data structure in R that contains elements of the same data type arranged in a
sequence.
Code:
# Numeric vector
num_vec <- c(1, 2, 3, 4, 5)
# Character vector
char_vec <- c("apple", "banana", "cherry")
print(num_vec)
12. What are the different values that can be assigned to a numeric data type in R?
Answer:
Positive numbers (5, 3.14)
Negative numbers (-2, -7.5)
Zero (0)
Infinity (Inf, -Inf)
Not a Number (NaN)
Missing values (NA)
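The special values can be produced and tested as in the sketch below. The variable names are illustrative; is.finite(), is.nan(), and is.na() are base R functions.

```r
pos_inf     <- 1 / 0      # Inf
neg_inf     <- -1 / 0     # -Inf
not_num     <- 0 / 0      # NaN
missing_val <- NA_real_   # numeric NA

vals <- c(5, -7.5, 0, pos_inf, neg_inf, not_num, missing_val)

print(is.finite(vals))  # TRUE only for ordinary numbers
print(is.nan(vals))     # TRUE only for NaN
print(is.na(vals))      # TRUE for both NaN and NA
```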
13. Explain RStudio
Answer: RStudio is an Integrated Development Environment (IDE) for R programming. It provides a user-
friendly interface with features like syntax highlighting, code completion, debugging tools, and integrated
graphics.
14. Write an R program to reverse the order of given vector
Code:
# Original vector
vec <- c(1, 2, 3, 4, 5)
print("Original vector:")
print(vec)
# Reversed vector
reversed_vec <- rev(vec)
print("Reversed vector:")
print(reversed_vec)
15. How to create a Matrix in R?
Answer: Use the matrix() function with data, nrow, and ncol parameters.
Code:
# Method 1: Using matrix() function
mat1 <- matrix(1:12, nrow=3, ncol=4)
# Method 2: Using rbind() or cbind()
mat2 <- rbind(c(1,2,3), c(4,5,6))
print(mat1)
16. Define R Array
Answer: Array is a multi-dimensional data structure that can store data in more than two dimensions. It's
an extension of matrices.
Code:
# Creating a 2x3x2 array
arr <- array(1:12, dim=c(2,3,2))
print(arr)
17. Difference between data frame and a matrix in R
Answer:
Data Frame | Matrix
Different data types in columns | Same data type only
More flexible | Less flexible
Uses $ or [ ] for column access | Uses [ ] for access
Column names always present | Column names optional
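The type difference is easy to see in a small sketch. The data below is made up, and stringsAsFactors = FALSE is passed explicitly so the result does not depend on the R version.

```r
# A matrix holds one type: mixing numbers and text coerces everything to character
m <- cbind(age = c(25, 30), name = c("John", "Jane"))
print(typeof(m))

# A data frame keeps each column's own type
df <- data.frame(age = c(25, 30), name = c("John", "Jane"),
                 stringsAsFactors = FALSE)
col_types <- sapply(df, class)
print(col_types)

# Column access: $ for data frames, [ , ] for matrices
print(df$age)
print(m[, "age"])
```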
18. Explain the use of length() function
Answer: The length() function returns the number of elements in a vector or list.
Code:
vec <- c(1, 2, 3, 4, 5)
len <- length(vec)
print(paste("Length of vector:", len))
19. Define the structure of a data frame using str() function
Answer: The str() function displays the internal structure of a data frame, showing data types,
dimensions, and sample values.
Code:
df <- data.frame(name=c("John", "Jane"), age=c(25, 30))
str(df)
20. Explain Argument matching
Answer: R matches function arguments in three ways, tried in this order:
Exact matching: Arguments matched by exact name
Partial matching: Arguments matched by partial name
Positional matching: Arguments matched by position
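The three kinds of matching can be seen on a small function. The describe() function here is hypothetical and exists only for illustration.

```r
# Hypothetical function used only to illustrate argument matching
describe <- function(value, digits = 2, prefix = "Result:") {
  paste(prefix, round(value, digits))
}

exact      <- describe(value = 3.14159, digits = 3)  # exact names
partial    <- describe(val = 3.14159, dig = 3)       # unambiguous partial names
positional <- describe(3.14159, 3)                   # matched by position
mixed      <- describe(3.14159, prefix = "Pi:")      # positional and named together

print(exact)
print(mixed)
```

Partial matching is convenient at the console but easy to misread, so exact names are usually the safer choice in exam answers and scripts.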
LONG ANSWER QUESTIONS
1. Summarize the advantages and disadvantages of R
Theory: R is one of the most widely used languages for statistical computing and data analysis in
academia and industry. Its popularity stems from its comprehensive statistical capabilities and active
community support. However, like any programming language, R has both strengths and limitations
that users should understand. For most statistical and data analysis tasks the advantages outweigh the
disadvantages, which makes R a strong choice for data scientists and statisticians.
Advantages:
Free and Open Source: No licensing costs
Comprehensive: Vast statistical capabilities
Active Community: Large user base and support
Extensible: Thousands of packages
Platform Independent: Works on multiple OS
Disadvantages:
Memory Management: Stores data in RAM
Learning Curve: Steep for beginners
Speed: Slower than compiled languages
Graphics: Basic plotting can be limited
Documentation: Inconsistent package documentation
2. Write an R program to find maximum and minimum value of a given vector
Theory: Finding maximum and minimum values is a fundamental operation in statistical analysis and
data exploration. The max() and min() functions are built-in R functions that scan a vector for its
extreme values. By default they return NA if any element is NA; pass na.rm = TRUE to ignore missing
values. The range() function returns both the minimum and the maximum in a single call, as a vector
of length two.
Code:
# Create a vector
numbers <- c(15, 8, 23, 4, 16, 42, 7)
print("Original vector:")
print(numbers)
# Find maximum value
max_val <- max(numbers)
print(paste("Maximum value:", max_val))
# Find minimum value
min_val <- min(numbers)
print(paste("Minimum value:", min_val))
# Using range() to get both
range_val <- range(numbers)
print("Range (min, max):")
print(range_val)
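When the vector contains missing values, max(), min(), and range() return NA unless na.rm = TRUE is supplied. A short sketch with made-up data:

```r
# Vector with a missing value (illustrative data)
scores <- c(15, NA, 23, 4)

max_default <- max(scores)                 # NA propagates by default
max_clean   <- max(scores, na.rm = TRUE)   # NA is dropped before comparing
min_clean   <- min(scores, na.rm = TRUE)
range_clean <- range(scores, na.rm = TRUE)

print(max_default)
print(max_clean)
print(min_clean)
print(range_clean)
```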
3. Illustrate R program to create two 2x3 matrices and perform operations
Theory: Matrix operations are fundamental in linear algebra and statistical computing, forming the
backbone of many machine learning algorithms. Element-wise operations (addition, subtraction,
multiplication, division) perform calculations between corresponding elements of matrices with the same
dimensions. These operations are vectorized in R, making them highly efficient for large datasets. Matrix
arithmetic is essential for data transformations, statistical calculations, and scientific computing
applications.
Code:
# Create two 2x3 matrices
mat1 <- matrix(c(1,2,3,4,5,6), nrow=2, ncol=3)
mat2 <- matrix(c(7,8,9,10,11,12), nrow=2, ncol=3)
print("Matrix 1:")
print(mat1)
print("Matrix 2:")
print(mat2)
# Addition
add_result <- mat1 + mat2
print("Addition:")
print(add_result)
# Subtraction
sub_result <- mat1 - mat2
print("Subtraction:")
print(sub_result)
# Element-wise multiplication
mult_result <- mat1 * mat2
print("Multiplication:")
print(mult_result)
# Division
div_result <- mat1 / mat2
print("Division:")
print(div_result)
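Note that * is element-wise. True matrix multiplication uses the %*% operator and needs conformable dimensions, so two 2x3 matrices cannot be multiplied directly; one of them has to be transposed first. A sketch with illustrative names:

```r
a <- matrix(1:6, nrow = 2, ncol = 3)    # 2x3
b <- matrix(7:12, nrow = 2, ncol = 3)   # 2x3

# Element-wise product: multiplies matching positions, result is 2x3
elementwise <- a * b

# Matrix product: (2x3) %*% (3x2) gives a 2x2 result
matrix_prod <- a %*% t(b)

print(dim(elementwise))
print(dim(matrix_prod))
print(matrix_prod)
```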
4. Write R program to check if a given number is Even or Odd
Theory: Even-odd checking is a fundamental programming concept that uses the modulo operator (%% in R)
to determine divisibility. This concept is widely used in algorithms, data filtering, and conditional
processing in statistical analysis. The modulo operation returns the remainder after division, so a
number n is even exactly when n %% 2 equals 0. The same test works for divisibility by any number,
which is useful for data manipulation tasks like creating alternating patterns or grouping data based
on numeric properties.
Code:
# Function to check even or odd
check_even_odd <- function(n) {
if (n %% 2 == 0) {
return(paste(n, "is Even"))
} else {
return(paste(n, "is Odd"))
}
}
# Test with different numbers
num1 <- 15
num2 <- 24
print(check_even_odd(num1))
print(check_even_odd(num2))
# Using ifelse for multiple numbers
numbers <- c(10, 15, 22, 7)
result <- ifelse(numbers %% 2 == 0, "Even", "Odd")
print(data.frame(Number = numbers, Type = result))
5. Write an R program to add 3 to each element of the first vector
Theory: Vectorization is one of R's most powerful features, allowing operations to be performed on
entire vectors without explicit loops. This concept makes R code more concise, readable, and
computationally efficient compared to traditional programming approaches. Vector arithmetic operations
are automatically applied element-wise, which is fundamental to R's design philosophy. Understanding
vectorization is crucial for efficient data manipulation and statistical computations in R programming.
Code:
# Create original vector
original_vec <- c(5, 10, 15, 20, 25)
print("Original vector:")
print(original_vec)
# Add 3 to each element
new_vec <- original_vec + 3
print("New vector (after adding 3):")
print(new_vec)
# Display both vectors
print("Comparison:")
print(data.frame(Original = original_vec, New = new_vec))
6. Explain in detail about data frame with example
Theory: Data frames are the most important data structure in R for real-world data analysis, representing
the equivalent of spreadsheets or database tables. They provide the flexibility to store different types of
data (numeric, character, logical) in different columns while maintaining structural integrity. Data frames
are the standard format for most statistical functions and are essential for data import/export operations.
Understanding data frames is crucial because most real-world datasets are naturally tabular and require
this flexible structure for effective analysis.
Key Features:
Rectangular structure
Mixed data types allowed
Column names mandatory
Functions: data.frame(), str(), summary()
Code:
# Create a data frame
students <- data.frame(
Name = c("Alice", "Bob", "Charlie", "Diana"),
Age = c(20, 22, 21, 23),
Grade = c("A", "B", "A", "B"),
Passed = c(TRUE, TRUE, TRUE, FALSE),
stringsAsFactors = FALSE
)
print("Student Data Frame:")
print(students)
# Structure of data frame
print("Structure:")
str(students)
# Access specific column
print("Names only:")
print(students$Name)
# Access specific row
print("First student:")
print(students[1, ])
# Summary statistics
print("Summary:")
summary(students)
7. Explain scalar and vector with example
Theory: Understanding the distinction between scalars and vectors is fundamental to R programming
and statistical computing. Scalars represent single values and are the building blocks of more complex
data structures, while vectors represent collections of related data points. In R, even a single number is
technically a vector of length 1, which demonstrates R's vector-oriented design philosophy. This
distinction is important for understanding how R functions operate and how data is stored and
manipulated in memory.
Code:
# Scalar examples
scalar_num <- 42
scalar_char <- "Hello"
scalar_logical <- TRUE
print("Scalars:")
print(paste("Number:", scalar_num))
print(paste("Character:", scalar_char))
print(paste("Logical:", scalar_logical))
# Vector examples
num_vector <- c(1, 2, 3, 4, 5)
char_vector <- c("apple", "banana", "cherry")
logical_vector <- c(TRUE, FALSE, TRUE)
print("Vectors:")
print(paste("Numeric vector:", toString(num_vector)))
print(paste("Character vector:", toString(char_vector)))
print(paste("Logical vector:", toString(logical_vector)))
# Check if scalar or vector
print(paste("scalar_num length:", length(scalar_num)))
print(paste("num_vector length:", length(num_vector)))
8. Write R program to check if vector elements are greater than 10
Theory: Logical operations and conditional checking are essential skills in data analysis for filtering,
subsetting, and data validation. The comparison operators in R return logical vectors, which can be used
for indexing and filtering operations. This approach is fundamental to data cleaning, exploratory data
analysis, and creating conditional summaries. Understanding logical indexing allows for efficient data
manipulation without explicit loops, leveraging R's vectorized operations for better performance.
Code:
# Create a vector
numbers <- c(5, 15, 8, 12, 3, 20, 7)
print("Original vector:")
print(numbers)
# Check which elements are greater than 10
result <- numbers > 10
print("Elements > 10 (TRUE/FALSE):")
print(result)
# Get actual values greater than 10
greater_than_10 <- numbers[numbers > 10]
print("Values greater than 10:")
print(greater_than_10)
# Create a data frame for better visualization
comparison <- data.frame(
Value = numbers,
Greater_than_10 = result
)
print("Comparison table:")
print(comparison)
9. What is the use of subset() function? With example
Theory: The subset() function filters and extracts data, providing an intuitive way to select rows
and columns based on logical conditions. It's particularly useful for exploratory data analysis where
you need to examine specific portions of your dataset. Rows for which the condition evaluates to NA
are dropped rather than returned as NA rows, and the syntax is cleaner than bracket notation for
complex filtering operations. Mastering subset operations is essential for data cleaning, analysis,
and generating targeted insights from large datasets.
Code:
# Create a data frame
employees <- data.frame(
Name = c("John", "Jane", "Mike", "Sara", "Tom"),
Age = c(25, 30, 35, 28, 32),
Salary = c(50000, 60000, 70000, 55000, 65000),
Department = c("IT", "HR", "IT", "Finance", "HR")
)
print("Original data:")
print(employees)
# Subset employees with age > 30
older_employees <- subset(employees, Age > 30)
print("Employees with age > 30:")
print(older_employees)
# Subset IT department employees
it_employees <- subset(employees, Department == "IT")
print("IT Department employees:")
print(it_employees)
# Multiple conditions
high_earners <- subset(employees, Age > 25 & Salary > 55000)
print("Employees older than 25 earning more than 55000:")
print(high_earners)
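subset() can also choose columns through its select argument. The sketch below uses a small made-up data frame (employees_demo is a hypothetical name) so that it runs on its own.

```r
# Small illustrative data frame
employees_demo <- data.frame(
  Name = c("John", "Jane", "Mike"),
  Age = c(25, 30, 35),
  Salary = c(50000, 60000, 70000)
)

# Keep rows where Age > 25, and only the Name and Salary columns
picked <- subset(employees_demo, Age > 25, select = c(Name, Salary))
print(picked)

# Drop a column by prefixing its name with a minus sign
no_salary <- subset(employees_demo, select = -Salary)
print(no_salary)
```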
10. Write R code to remove empty rows and columns from a matrix
Theory: Data cleaning is a critical step in any data analysis workflow, and handling missing or empty data
is a common challenge. Removing empty rows and columns helps reduce data size, improve
computational efficiency, and prevent errors in statistical calculations. The apply() function combined
with is.na() and all() identifies rows or columns that consist entirely of NA, and logical indexing then
removes them. Understanding these techniques is essential for preprocessing real-world datasets that
often contain missing values or incomplete records.
Code:
# Create a matrix with one empty row (row 2) and one empty column (column 2)
mat <- matrix(c(1, NA, 3, NA, NA, NA, 7, NA, NA), nrow=3, ncol=3)
print("Original matrix:")
print(mat)
# Remove rows that are completely NA (drop=FALSE keeps the result a matrix)
mat_no_empty_rows <- mat[!apply(is.na(mat), 1, all), , drop=FALSE]
print("After removing empty rows:")
print(mat_no_empty_rows)
# Remove columns that are completely NA
mat_clean <- mat_no_empty_rows[, !apply(is.na(mat_no_empty_rows), 2, all), drop=FALSE]
print("After removing empty columns:")
print(mat_clean)
# Alternative: keep only the rows that contain no NA at all
mat_complete <- mat_clean[complete.cases(mat_clean), , drop=FALSE]
print("Complete cases only:")
print(mat_complete)
11. Illustrate R code using seq(), paste(), print(), format(), mode(), order()
Theory: These six functions represent core R functionality for data generation, manipulation, and
inspection that every R programmer must master. The seq() function generates sequences for indexing
and creating regular patterns, while paste() handles string operations essential for data labeling and
reporting. The format() function controls data presentation, mode() helps with data type verification, and
order() provides sorting capabilities fundamental to data organization. Together, these functions form
the foundation for most data manipulation tasks in R programming.
Code:
# seq() - Generate sequences
seq1 <- seq(1, 10, by=2)
seq2 <- seq(0, 1, length.out=5)
print("seq() function:")
print(seq1)
print(seq2)
# paste() - Concatenate strings
names <- c("John", "Jane", "Mike")
ages <- c(25, 30, 35)
combined <- paste(names, "is", ages, "years old")
print("paste() function:")
print(combined)
# print() - Display objects
print("print() function:")
print("This is printed using print()")
# format() - Format numbers
numbers <- c(123.456, 78.9, 1234.5678)
formatted <- format(numbers, digits=3, nsmall=2)
print("format() function:")
print(formatted)
# mode() - Check data mode
vec <- c(1, 2, 3, 4, 5)
char_vec <- c("a", "b", "c")
print("mode() function:")
print(paste("Numeric vector mode:", mode(vec)))
print(paste("Character vector mode:", mode(char_vec)))
# order() - Get ordering indices
values <- c(30, 10, 40, 20)
order_indices <- order(values)
ordered_values <- values[order_indices]
print("order() function:")
print(paste("Original:", toString(values)))
print(paste("Order indices:", toString(order_indices)))
print(paste("Ordered values:", toString(ordered_values)))
12. Explain the structure of a data frame using str() function
Theory: The str() function provides a compact display of the structure of any R object, showing data
types, dimensions, and sample values.
Code:
# Create a comprehensive data frame
company_data <- data.frame(
Employee_ID = 1:5,
Name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
Age = c(25, 30, 35, 28, 32),
Salary = c(50000, 60000, 70000, 55000, 65000),
Department = factor(c("IT", "HR", "IT", "Finance", "HR")),
Full_Time = c(TRUE, TRUE, FALSE, TRUE, TRUE),
Start_Date = as.Date(c("2020-01-15", "2019-05-20", "2021-03-10", "2020-08-05", "2018-12-01"))
)
print("Data frame content:")
print(company_data)
cat("\nStructure using str():\n")
str(company_data)
cat("\nWhat str() shows us:\n")
cat("- 'data.frame': Object type\n")
cat("- '5 obs. of 7 variables': 5 rows, 7 columns\n")
cat("- Each variable shows: $ variable_name: data_type sample_values\n")
cat("- Factor variables show levels\n")
cat("- Date variables show format\n")
EXAM TIPS:
1. Time Management: Spend more time on long answers (they carry more marks)
2. Code Comments: Always add brief comments to your code
3. Output Display: Use print() statements to show results clearly
4. Error Handling: Mention is.na() and complete.cases() when dealing with missing data
5. Function Syntax: Remember to include function syntax when asked