WEEK 1: Download and install the R programming environment and
install basic packages using the install.packages() command in R
Installing R on Windows OS
To install R on Windows OS:
• Go to the CRAN website.
• Click on "Download R for Windows".
• Click on "install R for the first time" link to download the R executable (.exe) file.
• Run the R executable file to start installation, and allow the app to make changes to
your device.
• Select the installation language.
• Follow the installation instructions.
• Click on "Finish" to exit the installation setup.
R is now installed on your Windows system. Open the R GUI to start
writing R code.
Installing R Packages from the CRAN Repository:
The Comprehensive R Archive Network (CRAN) repository stores thousands of stable
R packages designed for a variety of data-related tasks. Most often, you'll use this
repository to install various R packages.
To install an R package from CRAN, we can use the install.packages() function:
install.packages('readr')
Here, we've installed the readr package, which reads data from files of
different types: comma-separated values (CSV), tab-separated values (TSV), fixed-width
files, etc. Make sure that the name of the package is in quotation marks. We can use the
same function to install several R packages at once. In this case, we first use
the c() function to create a character vector containing all the desired package
names:
install.packages(c('readr', 'ggplot2', 'tidyr'))
Above, we've installed three R packages: the already-familiar readr, ggplot2 (for data
visualization), and tidyr (for data cleaning).
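Before installing, it can be useful to check which packages are already present. A minimal sketch, assuming a hypothetical helper named find_missing (it is not part of R); it relies on installed.packages(), whose row names are the names of the installed packages:

```r
# Hypothetical helper: return the packages from `pkgs` that are not yet installed
find_missing <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# "stats" and "utils" ship with R, so nothing should be reported as missing
print(find_missing(c("stats", "utils")))

# Typical use: install only what is missing
# (commented out so that nothing is downloaded when this sketch is run)
# to_install <- find_missing(c("readr", "ggplot2", "tidyr"))
# if (length(to_install) > 0) install.packages(to_install)
```

After installation, a package still has to be loaded with library() in each new session before its functions can be used.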
2. Learn the basics of R programming (data types, variables,
operators, etc.)
Solution :
Datatypes in R
In general, data types specify what type of data will be stored in variables. In other
words, the variables can hold values of different data types.
In R, there is no need to declare the type of a variable: a variable takes the data type
of whatever value is currently assigned to it.
R provides the class() function, which enables us to check the data type of a variable.
R has several basic data types, which include:
numeric
integer
complex
character (a.k.a. string)
logical (a.k.a. boolean)
raw
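The point about types following the assigned value can be seen by reusing one variable name. A small sketch; the variable name x and its values are arbitrary:

```r
x <- 42            # a number
print(class(x))    # "numeric"

x <- "forty-two"   # the same name now holds text
print(class(x))    # "character"

x <- TRUE          # and now a logical value
print(class(x))    # "logical"
```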
👉 Numeric Data Type :
The numeric data type in R is used to represent all real numbers, whether they have
decimal points or not. Examples include: 12, 15.6, 456, -78, -56.3.
Example:
Filename: numeric_d.R
# without decimals
age <- 23
print(age)
# with decimals
weight <- 48.5
print(weight)
# print data type of variables
print(class(age))
print(class(weight))
Output:
[1] 23
[1] 48.5
[1] "numeric"
[1] "numeric"
👉 Integer Data Type
The integer data type is used to represent whole numbers (values without a fractional
part). We use the suffix L to specify the integer type; without it, R stores the number
as numeric. Examples: 45L, 123L, 78L, -45L.
Example
Filename: integer_d.R
# without decimals
age <- 23L
print(age)
# print data type of variables
print(class(age))
Output
[1] 23
[1] "integer"
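The difference between a plain number and one written with the L suffix can be checked directly. A short sketch with arbitrary variable names:

```r
a <- 23    # no suffix: stored as numeric (double)
b <- 23L   # L suffix: stored as integer

print(class(a))        # "numeric"
print(class(b))        # "integer"
print(is.integer(a))   # FALSE
print(a == b)          # TRUE: the values compare equal although the types differ

# as.integer() drops the fractional part (truncates toward zero)
c_int <- as.integer(48.9)
print(c_int)           # 48
```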
👉 Complex Data Type
The complex data type is used to represent complex numbers, which have a real part and an
imaginary part. We use the suffix i to mark the imaginary part. Examples: 3 + 2i, -2 + 5i.
Example
Filename: complex_d.R
val <- 5 + 6i
print(val)
# print data type of variables
print(class(val))
Output
[1] 5+6i
[1] "complex"
👉 Character Data Type
The character data type is used to store text. In programming, a string is a sequence
of characters. R has no separate type for a single character: both 'A' and "Apple" are
character values.
Single quotes ('') and double quotes ("") are interchangeable in R; double quotes
are the more common convention.
Example
Filename: character_d.R
# create a string variable
fname <- "Apple"
print(class(fname))
# create a character variable
ch <- 'A'
print(class(ch))
Output
[1] "character"
[1] "character"
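As the identical class() results suggest, R treats single-quoted and double-quoted text the same way. A brief sketch; the variable names and the sample sentence are arbitrary:

```r
s1 <- 'Apple'
s2 <- "Apple"
print(identical(s1, s2))   # TRUE: the quote style does not affect the value

print(nchar("Apple"))      # 5 characters
print(nchar('A'))          # 1 character, still of class "character"

# Using one quote style outside lets the text contain the other style
msg <- 'She said "hello"'
cat(msg, "\n")
```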
👉 Logical Data Type
The logical data type in R is also known as boolean data type. It can only have two
values: TRUE and FALSE.
Example
Filename: logical_d.R
b_val1 <- TRUE
print(b_val1)
print(class(b_val1))
b_val2 <- FALSE
print(b_val2)
print(class(b_val2))
Output
[1] TRUE
[1] "logical"
[1] FALSE
[1] "logical"
👉 Raw Data Type
A raw data type specifies values as raw bytes. You can use the following methods to
convert character data types to a raw data type and vice-versa:
charToRaw() - converts character data to raw data
rawToChar() - converts raw data to character data
For example,
# convert character to raw
raw_variable <- charToRaw("Welcome ")
print(raw_variable)
print(class(raw_variable))
# convert raw to character
char_variable <- rawToChar(raw_variable)
print(char_variable)
print(class(char_variable))
Output:
[1] 57 65 6c 63 6f 6d 65 20
[1] "raw"
[1] "Welcome "
[1] "character"
We first used the charToRaw() function to convert the string "Welcome " to raw bytes,
which print() displays in hexadecimal (57 is W, 65 is e, and 20 is the trailing space).
This is why we get "raw" as output when we print the class of raw_variable.
Then, we used the rawToChar() function to convert the data in raw_variable back
to character form.
Basic Programs
How to take user input in R
There are two common methods in R.
Using the readline() method: readline() always returns the input as a string.
Example: if the user types 255, the value is read as "255", a string.
To convert the input to the desired data type, R provides conversion functions:
as.integer(n): converts to integer
as.numeric(n): converts to numeric (double)
as.complex(n): converts to a complex number (e.g. 3+2i)
as.Date(n): converts to a date, etc.
Syntax:
var = readline()
var = as.integer(var)
Note that you can use "<-" instead of "=".
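Because readline() waits for keyboard input, the conversion step is easier to try with strings that stand in for what the user would type. A sketch under that assumption; the values are made up:

```r
# Simulated input: readline() would return text such as this
n <- "255"
print(class(n))              # "character"

n_int <- as.integer(n)
print(n_int + 1L)            # 256

n_num <- as.numeric("3.14")
print(n_num * 2)             # 6.28

z <- as.complex("3+2i")
print(z)                     # 3+2i

d <- as.Date("2024-01-15")   # text in yyyy-mm-dd form is assumed
print(format(d, "%Y"))       # "2024"
```

In an interactive session, a prompt can be shown with readline(prompt = "Enter a number: ").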
Using the scan() method: scan() reads values directly from the R console.
# Read numeric values from the console
my_numbers <- scan()
# Enter values like: 10 20 30
# Then press Enter twice (the empty line ends the input)
print(my_numbers)
Output:
[1] 10 20 30
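scan() can also read from a string through its text argument, which makes it possible to try the function without typing at the console. A small sketch; the input strings are arbitrary:

```r
# Read numbers from a string instead of the console
my_numbers <- scan(text = "10 20 30", quiet = TRUE)
print(my_numbers)        # [1] 10 20 30
print(sum(my_numbers))   # [1] 60

# what = character() reads words instead of numbers
my_words <- scan(text = "red green blue", what = character(), quiet = TRUE)
print(my_words)
```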
Variables in R
A variable is a named memory location where a program can store a value.
In simpler terms, a variable is a name that refers to a value held in memory.
The name of a variable is called an identifier.
👉 Creating Variables in R
In R, you do not need to declare a variable explicitly. When a value is assigned to a
variable, it is automatically declared. To assign a value to a variable, use the <- symbol.
To print the variable value, just type the variable name.
Syntax:
variable_name <- value
Example:
Filename: variables.R
# creating variables
sname <- "Naveen"
sage <- 20
# printing variables
sname
sage
Output:
[1] "Naveen"
[1] 20
In the above example, sname and sage are variables, while "Naveen" and 20 are values.
In R, unlike many other programming languages, you don't need to call a function to display
a variable at the console. Simply typing the variable's name displays its value.
Using print() Function
R also provides the print() function, which might feel more familiar if you're used to
languages like Python.
Example:
Filename: variables_p.R
# creating variables
sname <- "Naveen"
sage <- 20
# printing variables
print(sname)
print(sage)
Output:
[1] "Naveen"
[1] 20
Reminders
In many programming languages, = is used for assignment. In R, you can use
both = and <-.
It is generally better to use <-, because inside a function call = names an argument
instead of assigning to a variable.
At the top level, print() is optional. Inside loops and functions, however, values are
not displayed automatically, so print() is needed there.
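Both points can be tried out in a few lines. A minimal sketch; the loop variable and the numbers are arbitrary:

```r
# A bare name inside a loop is evaluated, but nothing is displayed
for (i in 1:3) {
  i
}

# With print() the values are displayed: 1, then 2, then 3
for (i in 1:3) {
  print(i)
}

# Inside a function call, = names an argument of the function;
# here it passes the vector as the argument x of mean()
m <- mean(x = c(2, 4, 6))
print(m)   # 4
```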
Rules for Naming Variables in R
Variable names in R must follow certain rules:
Must begin with a letter or a period (.), and can be followed by letters, numbers, ., or _.
If it starts with a ., it cannot be followed by a digit.
Cannot start with a number or an underscore (_).
Variable names are case-sensitive (a and A are different).
Reserved words (like if, TRUE, NULL) cannot be used.
Valid Variable Names:
firstname <- "Naveen"
first_name <- "Naveen"
firstName <- "Madhu"
FIRSTNAME <- "Madhu"
name1 <- "Durga"
.fname <- "Durga"
Invalid Variable Names:
first name <- "Naveen"
first-name <- "Naveen"
first@Name <- "Madhu"
_FNAME <- "Madhu"
1name <- "Durga"
.1name <- "Durga"
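The naming rules can be checked programmatically with base R's make.names(), which rewrites any string into a syntactically valid name. The sketch below assumes that a string left unchanged by make.names() was already valid; the candidate names are taken from the lists above, plus the reserved word if:

```r
candidates <- c("firstname", "first_name", ".fname",
                "first name", "first-name", "_FNAME", "1name", ".1name", "if")

# make.names() repairs invalid names; an unchanged string was already valid
is_valid <- make.names(candidates) == candidates

print(data.frame(name = candidates, valid = is_valid))
```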
Multiple Variables
R allows assigning a single value to multiple variables in a single line.
Syntax:
var1 <- var2 <- var3 <- value
Example:
Filename: variables_m.R
# Assign one value to multiple variables in single line
a <- b <- c <- 10
# Print variable values
print(a)
print(b)
print(c)
Output:
[1] 10
[1] 10
[1] 10
Operators in R
In programming, an operator is a symbol that represents an action. In other words, it is
used to perform operations on variables and values.
R supports the following operators:
Arithmetic operators
Assignment operators
Comparison operators
Logical operators
Miscellaneous operators
Arithmetic Operators
Operator   Name               Example    Output
+          Addition           x + y      10 + 5 → 15
-          Subtraction        x - y      10 - 5 → 5
*          Multiplication     x * y      10 * 5 → 50
/          Division           x / y      10 / 4 → 2.5
^          Exponentiation     x ^ y      2 ^ 3 → 8
%%         Modulus            x %% y     15 %% 2 → 1
%/%        Integer Division   x %/% y    15 %/% 2 → 7
Example:
Filename: arith_op.R
# Define two numbers
a <- 15
b <- 2
add_result <- a + b
cat("Addition (a + b):", add_result, "\n")
sub_result <- a - b
cat("Subtraction (a - b):", sub_result, "\n")
mul_result <- a * b
cat("Multiplication (a * b):", mul_result, "\n")
div_result <- a / b
cat("Division (a / b):", div_result, "\n")
exp_result <- a ^ b
cat("Exponentiation (a ^ b):", exp_result, "\n")
mod_result <- a %% b
cat("Modulus (a %% b):", mod_result, "\n")
int_div_result <- a %/% b
cat("Integer Division (a %/% b):", int_div_result, "\n")
Output:
Addition (a + b): 17
Subtraction (a - b): 13
Multiplication (a * b): 30
Division (a / b): 7.5
Exponentiation (a ^ b): 225
Modulus (a %% b): 1
Integer Division (a %/% b): 7
👉 Assignment Operators
Used to assign values to variables:
Operator   Name               Example
<-         Left assignment    x <- 10
->         Right assignment   10 -> x
Example:
Filename: assign_op.R
x <- 30
print(x)
40 -> y
print(y)
Output:
[1] 30
[1] 40
👉 Comparison / Relational Operators
Used to compare two values:
Operator   Name                    Example
==         Equal                   x == y
!=         Not equal               x != y
>          Greater than            x > y
<          Less than               x < y
>=         Greater than or equal   x >= y
<=         Less than or equal      x <= y
Example:
Filename: relational_op.R
a <- 10
b <- 20
cat("a = ", a, "\n")
cat("b = ", b, "\n")
cat("Is a equal to b? :", (a == b), "\n")
cat("Is a not equal to b? :", (a != b), "\n")
cat("Is a greater than b? :", (a > b), "\n")
cat("Is a less than b? :", (a < b), "\n")
cat("Is a greater than or equal to b? :", (a >= b), "\n")
cat("Is a less than or equal to b? :", (a <= b), "\n")
Output:
a = 10
b = 20
Is a equal to b? : FALSE
Is a not equal to b? : TRUE
Is a greater than b? : FALSE
Is a less than b? : TRUE
Is a greater than or equal to b? : FALSE
Is a less than or equal to b? : TRUE
👉 Logical Operators
Used for combining multiple conditions:
Operator   Name                          Example
&          Element-wise Logical AND      x & y
&&         Logical AND (short-circuit)   x && y
|          Element-wise Logical OR       x | y
||         Logical OR (short-circuit)    x || y
!          Logical NOT                   !x
Example:
Filename: logical_op.R
x <- TRUE
y <- FALSE
cat("Logical AND (x & y):", x & y, "\n")
cat("Logical OR (x | y):", x | y, "\n")
cat("Logical NOT (!x):", !x, "\n")
cat("Short-circuit AND (x && y):", x && y, "\n")
cat("Short-circuit OR (x || y):", x || y, "\n")
Output:
Logical AND (x & y): FALSE
Logical OR (x | y): TRUE
Logical NOT (!x): FALSE
Short-circuit AND (x && y): FALSE
Short-circuit OR (x || y): TRUE
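With single values, & and && give the same answer, so the difference only shows with vectors and with expressions that should not be evaluated. A short sketch; the vectors and the message text are arbitrary:

```r
p <- c(TRUE, FALSE, TRUE)
q <- c(TRUE, TRUE, FALSE)

# & and | work element by element on whole vectors
print(p & q)   # TRUE FALSE FALSE
print(p | q)   # TRUE TRUE TRUE

# && stops as soon as the result is known, so the right side is never run here
result <- FALSE && stop("this is never evaluated")
print(result)  # FALSE

# Typical use: guard a test that only makes sense for non-empty input
v <- numeric(0)
if (length(v) > 0 && v[1] > 5) print("big") else print("empty or small")
```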
👉 Miscellaneous Operators
Used for specific data manipulation:
Operator   Name                    Example
:          Sequence creation       x <- 1:10
%in%       Element belongs to      x %in% y
%*%        Matrix multiplication   Matrix1 %*% Matrix2
Example:
Filename: miscellaneous_op.R
x <- 1:10
print(x)
print(3 %in% x)
print(12 %in% x)
Output:
[1] 1 2 3 4 5 6 7 8 9 10
[1] TRUE
[1] FALSE
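The %*% operator from the table is not exercised by the program above. A small sketch contrasting it with the element-wise * operator; the two 2x2 matrices are arbitrary:

```r
A <- matrix(1:4, nrow = 2)   # filled column by column: rows are (1, 3) and (2, 4)
B <- matrix(1:4, nrow = 2)

# Element-wise product: each entry is multiplied by the entry in the same position
print(A * B)

# Matrix product: rows of A combined with columns of B
print(A %*% B)
```

For %*% the number of columns of the first matrix must equal the number of rows of the second.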
3. Write R commands to
i) Illustrate summation, subtraction, multiplication, and division
operations on vectors using vectors.
ii) Enumerate multiplication and division operations between
matrices and vectors in R console
i) Illustrate summation, subtraction, multiplication, and division
operations on vectors using vectors.
In R, you can perform element-wise summation, subtraction, multiplication, and division
operations directly on vectors using standard arithmetic operators.
Program:
Filename: vectoroperations.R
# Define two numeric vectors
vector1 <- c(10, 20, 30, 40)
vector2 <- c(2, 4, 6, 8)
# Perform operations and display results using cat
cat("Vector 1: ", vector1, "\n")
cat("Vector 2: ", vector2, "\n\n")
# Summation
sum_result <- vector1 + vector2
cat("Summation: ", sum_result, "\n")
# Subtraction
sub_result <- vector1 - vector2
cat("Subtraction: ", sub_result, "\n")
# Multiplication
mul_result <- vector1 * vector2
cat("Multiplication: ", mul_result, "\n")
# Division
div_result <- vector1 / vector2
cat("Division: ", div_result, "\n")
Output:
Vector 1: 10 20 30 40
Vector 2: 2 4 6 8
Summation: 12 24 36 48
Subtraction: 8 16 24 32
Multiplication: 20 80 180 320
Division: 5 5 5 5
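The vectors above have the same length. When the lengths differ, R recycles the shorter vector, which a short sketch with arbitrary values can show:

```r
long  <- c(1, 2, 3, 4)
short <- c(10, 20)

# short is reused as 10, 20, 10, 20
print(long + short)   # 11 22 13 24

# A single number is recycled across every element
print(long * 2)       # 2 4 6 8
```

If the longer length is not a multiple of the shorter one, R still computes a result but issues a warning.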
ii) Enumerate multiplication and division operations between
matrices and vectors in R console
# Define matrix and vectors
mat <- matrix(c(2, 4, 6, 8, 10, 12), nrow = 3, ncol = 2)
vec_col <- c(1, 2)     # One value per column (length = number of columns)
vec_row <- c(1, 2, 3)  # One value per row (length = number of rows)
vec_mul <- c(1, 2)     # For matrix multiplication (length = number of columns)
cat("Matrix (3x2):\n")
print(mat)
cat("\nVector for column-wise ops:\n")
print(vec_col)
cat("\nVector for row-wise ops:\n")
print(vec_row)
# ---------------------------
# R recycles a vector down the columns of a matrix. A vector with one value
# per row can therefore be applied directly, while a vector with one value
# per column has to be applied to the transposed matrix.
# 1. Element-wise Multiplication (Column-wise): column j is multiplied by vec_col[j]
cat("\n1. Element-wise Multiplication (Column-wise): \n")
print(t(t(mat) * vec_col))
# 2. Element-wise Multiplication (Row-wise): row i is multiplied by vec_row[i]
cat("\n2. Element-wise Multiplication (Row-wise): \n")
print(mat * vec_row)
# 3. Element-wise Division (Column-wise): column j is divided by vec_col[j]
cat("\n3. Element-wise Division (Column-wise): \n")
print(t(t(mat) / vec_col))
# 4. Element-wise Division (Row-wise): row i is divided by vec_row[i]
cat("\n4. Element-wise Division (Row-wise): \n")
print(mat / vec_row)
# 5. Matrix Multiplication (%*%)
cat("\n5. Matrix Multiplication :\n")
print(mat %*% vec_mul)
Output:
Matrix (3x2):
     [,1] [,2]
[1,]    2    8
[2,]    4   10
[3,]    6   12
Vector for column-wise ops:
[1] 1 2
Vector for row-wise ops:
[1] 1 2 3
1. Element-wise Multiplication (Column-wise):
     [,1] [,2]
[1,]    2   16
[2,]    4   20
[3,]    6   24
2. Element-wise Multiplication (Row-wise):
     [,1] [,2]
[1,]    2    8
[2,]    8   20
[3,]   18   36
3. Element-wise Division (Column-wise):
     [,1] [,2]
[1,]    2    4
[2,]    4    5
[3,]    6    6
4. Element-wise Division (Row-wise):
     [,1] [,2]
[1,]    2    8
[2,]    2    5
[3,]    2    4
5. Matrix Multiplication :
     [,1]
[1,]   18
[2,]   24
[3,]   30
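Base R's sweep() function offers another way to apply a vector along the rows or columns of a matrix, with the direction stated explicitly through its MARGIN argument. A minimal sketch using the same 3x2 matrix; the names by_col and by_row are arbitrary:

```r
mat <- matrix(c(2, 4, 6, 8, 10, 12), nrow = 3, ncol = 2)

# MARGIN = 2: one value per column, so column 1 is times 1 and column 2 is times 2
by_col <- sweep(mat, 2, c(1, 2), "*")
print(by_col)

# MARGIN = 1: one value per row, so rows are divided by 1, 2 and 3
by_row <- sweep(mat, 1, c(1, 2, 3), "/")
print(by_row)
```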
4. Write R commands to
i) Illustrate the usage of vector subsetting and matrix subsetting
ii) Write a program to create an array of 3×3 matrices with 3
rows and 3 columns.
i) Illustrate the usage of vector subsetting and matrix subsetting
Vector Subsetting in R :
Program:
> # Define a vector
> vec <- c(10, 20, 30, 40, 50)
> # Subset by position
> print(vec[1]) # First element
[1] 10
> print(vec[2:4]) # Elements from position 2 to 4
[1] 20 30 40
> # Subset by negative index (exclude elements)
> print(vec[-1]) # All except the first element
[1] 20 30 40 50
> print(vec[-(2:3)]) # Exclude 2nd and 3rd elements
[1] 10 40 50
> # Subset by logical vector
> print(vec[c(TRUE, FALSE, TRUE, FALSE, TRUE)]) # Select 1st, 3rd, and 5th
[1] 10 30 50
> # Subset by condition
> print(vec[vec > 25]) # Elements greater than 25
[1] 30 40 50
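Vectors can also be subset by name, and which() reports the positions that satisfy a condition. A short sketch with a made-up named vector:

```r
# A named vector: each element carries a label
marks <- c(Math = 85, Science = 90, English = 75)

# Subset by name
print(marks["Science"])
print(marks[c("Math", "English")])

# which() returns the positions where the condition is TRUE
print(which(marks > 80))

# The names of the elements that satisfy the condition
print(names(marks)[marks > 80])
```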
Matrix Subsetting in R :
Program:
> # Define a matrix
> mat <- matrix(1:9, nrow = 3, byrow = TRUE)
> # mat =
> #      [,1] [,2] [,3]
> # [1,]    1    2    3
> # [2,]    4    5    6
> # [3,]    7    8    9
> # Subset by element
> mat[1, 2] # Element at 1st row, 2nd column
[1] 2
> # Subset entire row or column
> mat[2, ] # Entire 2nd row
[1] 4 5 6
> mat[, 3] # Entire 3rd column
[1] 3 6 9
> # Subset a submatrix
> mat[1:2, 2:3] # Rows 1-2, Columns 2-3
     [,1] [,2]
[1,]    2    3
[2,]    5    6
> # Subset with condition
> mat[mat > 5] # All elements > 5 (returns as a vector)
[1] 7 8 6 9
> # Subset with drop = FALSE to preserve matrix structure
> mat[1, , drop = FALSE] # 1st row as a matrix
     [,1] [,2] [,3]
[1,]    1    2    3
ii) Write a program to create an array of 3×3 matrices with 3
rows and 3 columns.
Program:
Filename: ArrayMatrix.R
# Create an array of 3x3 matrices (3 rows, 3 columns, 2 layers)
my_array <- array(1:18, dim = c(3, 3, 2))
# Print the array
print("Array of 3x3 matrices (2 layers):")
print(my_array)
# Access individual matrix (layer)
print("First 3x3 matrix:")
print(my_array[,,1])
print("Second 3x3 matrix:")
print(my_array[,,2])
Output:
[1] "Array of 3x3 matrices (2 layers):"
, , 1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

, , 2

     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18

[1] "First 3x3 matrix:"
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
[1] "Second 3x3 matrix:"
     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18
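Beyond printing whole layers, single elements can be reached with three indices, and apply() can summarise each layer. A brief sketch on the same array:

```r
my_array <- array(1:18, dim = c(3, 3, 2))

# Dimensions: rows, columns, layers
print(dim(my_array))             # 3 3 2

# Element in row 2, column 3 of the first layer
print(my_array[2, 3, 1])         # 8

# Sum of each layer (MARGIN = 3 means "per layer")
print(apply(my_array, 3, sum))   # 45 126
```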
5. Write an R program to draw i) Pie chart ii) 3D Pie Chart, iii) Bar
Chart along with chart legend by considering suitable CSV file
Solution :
CSV file : "scores.csv"
Subject,Score
Math,85
Science,90
English,75
History,60
Computer,95
R Program: charts.R
# Install (if needed) and load the plotrix package for the 3D pie chart
if (!require(plotrix)) {
  install.packages("plotrix")
  library(plotrix)
}
# Read the CSV file
data <- read.csv("scores.csv")
# Extract subjects and scores
subjects <- data$Subject
scores <- data$Score
# Set colors for charts
colors <- rainbow(length(scores))
# Pie Chart with Legend
pie(scores, labels = subjects, col = colors, main = "Pie Chart - Subject Scores")
legend("topright", legend = paste(subjects, scores), fill = colors)
# To display multiple windows
windows()
# 3D Pie Chart with Legend
pie3D(scores, labels = subjects, col = colors, explode = 0.1,
      main = "3D Pie Chart - Subject Scores")
legend("topright", legend = paste(subjects, scores), fill = colors)
# To display multiple windows
windows()
# Bar Chart with Legend
barplot(scores, names.arg = subjects, col = colors, main = "Bar Chart - Subject Scores",
ylab = "Scores")
legend("topright", legend = paste(subjects, scores), fill = colors)
Output:
PIE CHART
6. Create a CSV file having Speed and Distance attributes with 1000
records. Write R program to draw
i) Box plots
ii) Histogram
iii) Line Graph
iv) Multiple line graphs
v) Scatter plot
to demonstrate the relation between the car's speed and the distance.
Solution :
The following CSV file has Speed and Distance attributes; only some sample records are shown.
CSV file : "speed_distance.csv"
Speed,Distance
90,427
49,155
43,159
75,297
92,430
62,244
....
46,158
89,432
55,188
20,37
49,178
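The task asks for 1000 records, so the file is more practical to generate than to type. A sketch of one way to do that; the seed, the speed range, and the quadratic speed-distance formula are illustrative assumptions (they are not the source of the sample rows shown above), and the file is written to a temporary directory so that no existing file is overwritten:

```r
set.seed(42)   # arbitrary seed so the same file can be regenerated
n <- 1000

# Assumed speed range of 20 to 100
Speed <- sample(20:100, n, replace = TRUE)

# Assumed relationship: distance grows with the square of speed, plus random noise
Distance <- round(0.05 * Speed^2 + rnorm(n, mean = 0, sd = 15))
Distance <- pmax(Distance, 1)   # keep every distance positive

cars_data <- data.frame(Speed = Speed, Distance = Distance)

# Write without row names so the file has exactly two columns
out_file <- file.path(tempdir(), "speed_distance.csv")
write.csv(cars_data, out_file, row.names = FALSE)

print(nrow(read.csv(out_file)))   # 1000
print(head(cars_data))
```

To use the generated file with the program below, write it to the working directory instead, for example with write.csv(cars_data, "speed_distance.csv", row.names = FALSE).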
R Program: SD_charts.R
# Read the dataset
data <- read.csv("speed_distance.csv")
# Attach variables
attach(data)
# Set up colors
color_speed <- "steelblue"
color_distance <- "tomato"
# i) Box Plots
boxplot(Speed, Distance,
names = c("Speed", "Distance"),
main = "Boxplot of Speed and Distance",
col = c(color_speed, color_distance))
windows()
# ii) Histogram
par(mfrow = c(1, 2))
hist(Speed, col = color_speed, main = "Histogram of Speed", xlab = "Speed", breaks = 20)
hist(Distance, col = color_distance, main = "Histogram of Distance", xlab = "Distance", breaks =
20)
par(mfrow = c(1, 1))
windows()
# iii) Line Graph
plot(Speed, type = "l", col = color_speed, main = "Line Graph - Speed over Records", ylab =
"Speed", xlab = "Record Index")
windows()
# iv) Multiple Line Graphs
plot(Speed, type = "l", col = color_speed, ylim = range(c(Speed, Distance)),
main = "Multiple Line Graphs: Speed & Distance",
xlab = "Record Index", ylab = "Value")
lines(Distance, type = "l", col = color_distance)
legend("topright", legend = c("Speed", "Distance"), col = c(color_speed, color_distance), lty = 1)
windows()
# v) Scatter Plot
plot(Speed, Distance,
main = "Scatter Plot: Speed vs Distance",
xlab = "Speed", ylab = "Distance",
col = "darkgreen", pch = 19)
abline(lm(Distance ~ Speed), col = "red", lwd = 2)
Output:
Box Plots
7. Implement different data structures in R (Vectors, Lists, Data
Frames)
Solution :
Vectors in R:
In R, a vector is a basic data structure used to store multiple values of the same type. It is a
one-dimensional data structure and can hold numeric, character, logical, or other atomic types.
To create a vector, we use the c() (combine) function, with the elements
separated by commas.
Syntax:
vect_name<- c(e1,e2,e3,..)
Example:
Filename: VectorEx.R
# Numeric vector
numbers <- c(10, 20, 30, 40)
print("Numeric Vector:")
print(numbers)
# Character vector
subjects <- c("Math", "Science", "History")
print("Character Vector:")
print(subjects)
# Logical vector
flags <- c(TRUE, FALSE, TRUE)
print("Logical Vector:")
print(flags)
Output:
[1] "Numeric Vector:"
[1] 10 20 30 40
[1] "Character Vector:"
[1] "Math" "Science" "History"
[1] "Logical Vector:"
[1] TRUE FALSE TRUE
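Because a vector holds a single type, mixing types inside c() makes R convert every element to a common type. A short sketch with arbitrary values:

```r
# Number, text and logical together: everything becomes character
mixed <- c(1, "a", TRUE)
print(mixed)           # "1" "a" "TRUE"
print(class(mixed))    # "character"

# Numbers and logicals together: TRUE becomes 1 and FALSE becomes 0
num_lgl <- c(1.5, TRUE, FALSE)
print(num_lgl)         # 1.5 1.0 0.0
print(class(num_lgl))  # "numeric"

print(length(mixed))   # 3
```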
Lists in R
A list is a collection of elements that may have different types, which makes lists
particularly useful for storing heterogeneous data.
A list can hold numbers, characters, vectors, matrices, other lists, and even functions.
To create a list, we use the list() function, with the elements separated by
commas.
Syntax:
list_name<- list(e1,e2,e3,..)
Example:
Filename: ListEx.R
# Creating a list with different types
student <- list(
  name = "John",
  age = 21,
  scores = c(85, 90, 95),
  pass = TRUE
)
print("List Example:")
print(student)
# Accessing list elements
print(paste("Student name is", student$name))
Output:
[1] "List Example:"
$name
[1] "John"
$age
[1] 21
$scores
[1] 85 90 95
$pass
[1] TRUE
[1] "Student name is John"
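Besides the $ operator, list elements can be reached with [[ ]], and elements can be added or removed after creation. A sketch on the same student list; the city value is a made-up extra field:

```r
student <- list(name = "John", age = 21, scores = c(85, 90, 95), pass = TRUE)

# [[ ]] extracts the element itself, so it can be indexed further
print(student[["scores"]][2])    # 90

# [ ] keeps the list wrapper, [[ ]] does not
print(class(student["age"]))     # "list"
print(class(student[["age"]]))   # "numeric"

# Add a new element (hypothetical field) and remove an existing one
student$city <- "Pune"
student$age <- NULL
print(names(student))            # "name" "scores" "pass" "city"
```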
Data Frames in R
A data frame is a two-dimensional, table-like structure in which each column holds the values
of one variable and each row holds one observation across all columns. Columns may have
different types, but every value within a column has the same type.
To create a data frame, we use the data.frame() function.
Syntax:
df <- data.frame(vector1, vector2, ..)
Example:
Filename: DataFrameEx.R
# Creating a data frame
students <- data.frame(
  Names = c("Madhu", "Durga", "Naveen"),
  Ages = c(22, 23, 21),
  Scores = c(85.5, 90.0, 78.5)
)
print("Data Frame Example:")
print(students)
# Accessing a column
print("Names of students:")
print(students$Names)
Output:
[1] "Data Frame Example:"
Names Ages Scores
1 Madhu 22 85.5
2 Durga 23 90.0
3 Naveen 21 78.5
[1] "Names of students:"
[1] "Madhu" "Durga" "Naveen"
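Data frames are usually filtered, extended and sorted rather than only printed. A minimal sketch on the same data; the pass mark of 80 is an assumption made for illustration:

```r
students <- data.frame(
  Names = c("Madhu", "Durga", "Naveen"),
  Ages = c(22, 23, 21),
  Scores = c(85.5, 90.0, 78.5)
)

# Number of rows and columns
print(dim(students))   # 3 3

# Keep only the rows where the condition holds (note the trailing comma)
high <- students[students$Scores > 80, ]
print(high)

# Add a new column computed from an existing one (assumed pass mark: 80)
students$Passed <- students$Scores >= 80

# Sort by score, highest first, and show two columns
print(students[order(-students$Scores), c("Names", "Scores")])
```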
8. Write an R program to read a CSV file and analyze the data in the file
using EDA (Exploratory Data Analysis) techniques.
Solution :
CSV file : "students.csv"
Name,Age,Gender,Marks
Sona,21,Female,85
Madhu,18,Male,90
Naveen,31,Male,48
Meena,43,Female,92
Mohan,20,Male,
Rina,23,Female,81
Kiran,26,Male,92
Durga,,Male,72
Leena,24,Female,93
Madan,21,Male,25
R Program: EDA.R
# Load necessary packages
if(!require(ggplot2)) install.packages("ggplot2")
library(ggplot2)
# Read the CSV file
data <- read.csv("students.csv")
# View first few rows
cat("---- Head of Dataset ----\n")
print(head(data))
# Summary statistics
cat("\n---- Summary Statistics ----\n")
print(summary(data))
# Structure of data
cat("\n---- Structure of Data ----\n")
print(str(data))
# Check for missing values
cat("\n---- Missing Values ----\n")
print(colSums(is.na(data)))
# Frequency of categorical variable
cat("\n---- Gender Count ----\n")
print(table(data$Gender))
# Visualizations
# 1. Histogram of Marks
hist(data$Marks, col="skyblue", main="Histogram of Marks", xlab="Marks")
# To run multiple windows
windows()
# 2. Boxplot of Marks by Gender
boxplot(Marks ~ Gender, data = data, col=c("pink", "lightblue"),
main = "Boxplot of Marks by Gender", ylab = "Marks")
# To run multiple windows
windows()
# 3. Scatter Plot: Age vs Marks
plot(data$Age, data$Marks, col="darkgreen", pch=19,
main="Scatter Plot: Age vs Marks",
xlab="Age", ylab="Marks")
Output:
---- Head of Dataset ----
Name Age Gender Marks
1 Sona 21 Female 85
2 Madhu 18 Male 90
3 Naveen 31 Male 48
4 Meena 43 Female 92
5 Mohan 20 Male NA
6 Rina 23 Female 81
---- Summary Statistics ----
Name Age Gender Marks
Length:10 Min. :18.00 Length:10 Min. :25.00
Class :character 1st Qu.:21.00 Class :character 1st Qu.:72.00
Mode :character Median :23.00 Mode :character Median :85.00
Mean :25.22 Mean :75.33
3rd Qu.:26.00 3rd Qu.:92.00
Max. :43.00 Max. :93.00
NA's :1 NA's :1
---- Structure of Data ----
'data.frame': 10 obs. of 4 variables:
$ Name : chr "Sona" "Madhu" "Naveen" "Meena" ...
$ Age : int 21 18 31 43 20 23 26 NA 24 21
$ Gender: chr "Female" "Male" "Male" "Female" ...
$ Marks : int 85 90 48 92 NA 81 92 72 93 25
NULL
---- Missing Values ----
Name Age Gender Marks
0 1 0 1
---- Gender Count ----
Female Male
4 6
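The missing values reported above affect later calculations, since most summary functions return NA unless told to skip missing entries. A sketch of two common ways to handle this; the CSV content is embedded as text here, purely so the example runs without the file:

```r
# Same records as students.csv, embedded as text
csv_text <- "Name,Age,Gender,Marks
Sona,21,Female,85
Madhu,18,Male,90
Naveen,31,Male,48
Meena,43,Female,92
Mohan,20,Male,
Rina,23,Female,81
Kiran,26,Male,92
Durga,,Male,72
Leena,24,Female,93
Madan,21,Male,25"
data <- read.csv(text = csv_text)

# A single NA makes the plain mean NA
print(mean(data$Marks))

# na.rm = TRUE skips the missing entries
print(mean(data$Marks, na.rm = TRUE))

# na.omit() drops every row that contains at least one NA
complete <- na.omit(data)
print(nrow(complete))   # 8

# Mean marks per gender (rows with missing Marks are dropped by the formula interface)
print(aggregate(Marks ~ Gender, data = data, FUN = mean))
```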
9. Write an R program to illustrate Linear Regression and Multiple Linear
Regression considering a suitable CSV file
Solution :
CSV file : "student_scores.csv"
Hours,Preparation,IQ,Score
2,3,110,50
4,4,105,60
6,5,115,65
8,6,120,80
10,8,125,90
12,9,130,95
14,10,100,90
In the above CSV file,
Hours: Study hours
Preparation: Days of preparation
IQ: Intelligence score
Score: Final exam score (target variable)
Simple Linear Regression
R Program: Linear_Regression.R
# Load required library
if(!require(ggplot2)) install.packages("ggplot2")
library(ggplot2)
# Read the CSV file
data <- read.csv("student_scores.csv")
# View the data
cat("Dataset:\n")
print(data)
# Simple Linear Regression (Score ~ Hours)
model_linear <- lm(Score ~ Hours, data = data)
cat("\nSimple Linear Regression Summary:\n")
print(summary(model_linear))
# Plotting the regression line
plot(data$Hours, data$Score, main = "Simple Linear Regression",
xlab = "Study Hours", ylab = "Score", pch = 16, col = "blue")
abline(model_linear, col = "red", lwd = 2)
Output:
Dataset:
Hours Preparation IQ Score
1 2 3 110 50
2 4 4 105 60
3 6 5 115 65
4 8 6 120 80
5 10 8 125 90
6 12 9 130 95
7 14 10 100 90
Simple Linear Regression Summary:
Call:
lm(formula = Score ~ Hours, data = data)
Residuals:
1 2 3 4 5 6
0.2381 0.8095 -3.6190 1.9524 2.5238 -1.9048
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.3333 2.4462 16.49 7.92e-05 ***
Hours 4.7143 0.3141 15.01 0.000115 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.628 on 4 degrees of freedom
Multiple R-squared: 0.9826, Adjusted R-squared: 0.9782
F-statistic: 225.3 on 1 and 4 DF, p-value: 0.0001148
Linear Regression
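To see what lm() estimates, the fitted coefficients can be compared with the closed-form least-squares formulas on a tiny data set. The sketch below uses hypothetical data that lies exactly on the line y = 1 + 2x (it is not taken from student_scores.csv), so the expected intercept and slope are known in advance:

```r
# Hypothetical data lying exactly on the line y = 1 + 2x
toy <- data.frame(x = 1:5, y = 1 + 2 * (1:5))

fit <- lm(y ~ x, data = toy)

# Closed-form least-squares estimates for a single predictor
slope <- cov(toy$x, toy$y) / var(toy$x)
intercept <- mean(toy$y) - slope * mean(toy$x)

print(c(intercept = intercept, slope = slope))
print(coef(fit))

# Predict for new x values; the column name must match the predictor name
new_points <- data.frame(x = c(6, 10))
print(predict(fit, newdata = new_points))
```

The same predict() call works for a model fitted on real data, provided newdata contains a column for every predictor in the formula.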
Multiple Linear Regression
R Program: M_Linear_Regression.R
# Load required libraries
if(!require(scatterplot3d)) install.packages("scatterplot3d")
library(scatterplot3d)
# Read the dataset
data <- read.csv("student_scores.csv")
# Multiple Linear Regression model
model_multi <- lm(Score ~ Hours + Preparation + IQ, data = data)
cat("Multiple Linear Regression Summary:\n")
print(summary(model_multi))
# Predict the fitted values
predicted_scores <- predict(model_multi)
# 3D Scatter Plot: using Hours and Preparation as predictors
s3d <- scatterplot3d(data$Hours, data$Preparation, data$Score,
pch = 19, color = "blue",
xlab = "Hours", ylab = "Preparation", zlab = "Score",
main = "3D Plot: Hours & Preparation vs Score",
highlight.3d = TRUE, angle = 50)
# Add predicted values as a regression line
s3d$points3d(data$Hours, data$Preparation, predicted_scores,
col = "red", type = "l", lwd = 2)
Output:
Multiple Linear Regression Summary:
Call:
lm(formula = Score ~ Hours + Preparation + IQ, data = data)
Residuals:
1 2 3 4 5 6 7
-2.0056 3.0877 -3.3955 2.3134 2.5093 -1.7817 -0.7276
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.3470 16.9883 -0.256 0.8146
Hours 3.2929 3.4852 0.945 0.4145
Preparation 0.5131 5.7375 0.089 0.9344
IQ 0.4384 0.1453 3.017 0.0569 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.671 on 3 degrees of freedom
Multiple R-squared: 0.9778, Adjusted R-squared: 0.9556
F-statistic: 44.04 on 3 and 3 DF, p-value: 0.005578
Multiple Linear Regression