R Programming

The document outlines a series of exercises related to data manipulation and visualization using R, including importing data, performing matrix and dataframe operations, creating various types of graphs, and applying min-max normalization. Each exercise includes a clear aim, step-by-step procedures, and R code examples. The document concludes with successful execution results for each exercise.

Uploaded by

systemev206hql

CONTENTS

EX. NO   DATE   PROGRAM NAME                                  SIGNATURE
  01            Import Data from Other Formats
  02            Matrix Operations
  03            Dataframe Operations
  04            Different Types of Graphs
  05            Min-Max Normalization for a Dataset
  06            Mean, Median, Standard Deviation and t-test
  07            Handling Missing Values
  08            Statistical Correlation Test
  09            Data Transformation Operations
  10            Explore the Distribution of a Dataset
EX. NO: 01
DATE: IMPORT DATA FROM docx, xls, txt AND OTHER FORMATS

Aim:
To import data from docx, xls, txt and other formats.
Procedure:
Step 1 : Start the process
Step 2 : Determine the format of the data file that needs to be imported
(Excel, SPSS, text, or CSV).
Step 3 : Install and load the necessary R packages based on the format of the
data file (readxl, XLConnect, foreign, or readr).
Step 4 : Open the data file in its respective software and save it with
an appropriate file name and extension.
Step 5 : Use the respective R function (read_excel, read.spss, read_table2,
or read_csv) to import the data into R as a data frame.
Step 6 : Assign the imported data to a variable for further analysis.
Step 7 : Check the imported data using the print or head functions to ensure it
is correctly imported.
Step 8 : If required, clean and preprocess the imported data for further
analysis.
Step 9: Stop the process.
CODE:
Importing Excel File
library(readxl) # load the readxl package
help(read_excel) # documentation
mydata = read_excel("[Link]") # read from first sheet

Importing SPSS File


library(foreign) # load the foreign package
help(read.spss) # documentation
mydata = read.spss("myfile", to.data.frame=TRUE)

Importing Text File with Tables


library(readr)
mydata = read_table2("[Link]") # read text file
mydata # print data frame

Importing CSV File


mydata = read_csv("[Link]") # read csv file
mydata
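If the readr or readxl packages are unavailable, base R covers the CSV case with no extra dependencies. A minimal sketch (the temporary file and its columns are illustrative, not from the exercise):

```r
# Write a small CSV to a temporary file, then read it back with base R.
# read.csv() needs no packages and returns a data frame.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, name = c("a", "b", "c")), path, row.names = FALSE)

mydata <- read.csv(path)   # base-R counterpart of readr::read_csv
head(mydata)               # inspect the first rows, as in Step 7
```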

Result:
Thus the above program has been executed successfully.
EX. NO: 02
DATE: MATRIX OPERATIONS

Aim:
To perform matrix operations using R.
Procedure:
Step 1: Create three matrices P with different dimensions and values, each with
specified row and column names.
Step 2: Print each matrix P.
Step 3: Create matrices A and B with specific dimensions and values.
Step 4: Print matrices A and B.
Step 5: Perform arithmetic operations (addition, subtraction, multiplication,
division) between matrices A and B, and print the results.
Step 6: Create a matrix P with specific dimensions and values.
Step 7: Print matrix P.
Step 8: Generate random matrix data, find the indices of its maximum and
minimum values, and print them.
CODE:
2a.
> rownames = c("row1", "row2", "row3", "row4","row5")
> colnames = c("col1", "col2", "col3","col4")
> P= matrix(c(3:22), nrow = 5, byrow = TRUE, dimnames = list(rownames,
+ colnames))
> print(P)
col1 col2 col3 col4
row1 3 4 5 6
row2 7 8 9 10
row3 11 12 13 14
row4 15 16 17 18
row5 19 20 21 22
> rownames = c("row1", "row2", "row3")
> colnames = c("col1", "col2", "col3")
> P= matrix(c(3:11), nrow = 3, byrow = TRUE, dimnames = list(rownames,
+ colnames))
> print(P)
col1 col2 col3
row1 3 4 5
row2 6 7 8
row3 9 10 11
> rownames = c("row1", "row2")
> colnames = c("col1", "col2")
> P= matrix(c(3:6), nrow = 2, byrow = FALSE, dimnames = list(rownames,
+ colnames))
> print(P)
col1 col2
row1 3 5
row2 4 6
>

2b.
A= matrix(c(3:8), nrow = 2, byrow = TRUE)
> print(A)
[,1] [,2] [,3]
[1,] 3 4 5
[2,] 6 7 8
> B= matrix(c(3:8), nrow = 2, byrow = FALSE)
> print(B)
[,1] [,2] [,3]
[1,] 3 5 7
[2,] 4 6 8
> result=A+B
> result
[,1] [,2] [,3]
[1,] 6 9 12
[2,] 10 13 16
> result=A-B
> result
[,1] [,2] [,3]
[1,] 0 -1 -2
[2,] 2 1 0
> result=A*B
> result
[,1] [,2] [,3]
[1,] 9 20 35
[2,] 24 42 64
> result=A/B
> result
[,1] [,2] [,3]
[1,] 1.0 0.800000 0.7142857
[2,] 1.5 1.166667 1.0000000
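Note that `*` above multiplies the matrices element by element; true matrix multiplication in R is the `%*%` operator, which requires the inner dimensions to agree. A short sketch using the same A and B:

```r
A <- matrix(3:8, nrow = 2, byrow = TRUE)   # 2 x 3
B <- matrix(3:8, nrow = 2, byrow = FALSE)  # 2 x 3

A * B        # element-wise: both matrices must have the same shape
A %*% t(B)   # matrix product: (2 x 3) %*% (3 x 2) gives a 2 x 2 result
```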

2c.
P= matrix(c(3:18), nrow = 4, byrow = FALSE)
> print(P)
[,1] [,2] [,3] [,4]
[1,] 3 7 11 15
[2,] 4 8 12 16
[3,] 5 9 13 17
[4,] 6 10 14 18
> P[2,3]
[1] 12
> P[3,]
[1] 5 9 13 17
> P[,4]
[1] 15 16 17 18

2d.
x = matrix(1:6,nrow=2, ncol=3)
> y = matrix(13:21,nrow=3, ncol=3)
> print(x)
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> y
[,1] [,2] [,3]
[1,] 13 16 19
[2,] 14 17 20
[3,] 15 18 21
> mat=rbind(x,y)
> mat
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[3,] 13 16 19
[4,] 14 17 20
[5,] 15 18 21
2e.
set.seed(123)
> matrix_data= matrix(sample(1:100, 25,replace=TRUE), nrow = 5, ncol = 5)
> matrix_data
[,1] [,2] [,3] [,4] [,5]
[1,] 31 42 90 92 26
[2,] 79 50 91 9 7
[3,] 51 43 69 93 42
[4,] 14 14 91 99 9
[5,] 67 25 57 72 83
> max_index =which(matrix_data == max(matrix_data), arr.ind = TRUE)
> min_index =which(matrix_data == min(matrix_data), arr.ind = TRUE)
> max_index
row col
[1,] 4 4
> min_index
row col
[1,] 2 5
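For a single maximum, which.max() is a common alternative: it returns a linear (column-major) index, and arrayInd() converts that back to a row/column pair. A sketch on a matrix built the same way:

```r
set.seed(123)  # same seeding idea as above, for reproducibility
m <- matrix(sample(1:100, 25, replace = TRUE), nrow = 5, ncol = 5)

lin <- which.max(m)      # position counting down the columns
arrayInd(lin, dim(m))    # convert to a [row, col] pair
```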


Result:
Thus the above program has been executed successfully.
EX. NO: 03
DATE: DATAFRAME OPERATIONS

Aim:
To perform dataframe operations using R.

Procedure:
Step 1: Create a dataframe emp.data with employee details including ID, name,
salary, and start date.
Step 2: Print the dataframe emp.data and its structure using print() and str()
functions.
Step 3: Print summary statistics of the dataframe using summary() and check
the class of the dataframe using class().
Step 4: Extract the column 'emp_name' from the dataframe and store it in a new
dataframe result. Print result.
Step 5: Add a new column 'dept' to the dataframe emp.data with department
names. Print the updated dataframe.
Step 6: Create a new row with employee details and append it to the dataframe
emp.data. Print the updated dataframe with the new row added.
CODE:
3a.
emp.data = data.frame(
+ emp_id = c (1:5),
+ emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
+ salary = c(623.3,515.2,611.0,729.0,843.25),
+ start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
+ "2015-03-27")),
+ stringsAsFactors = FALSE)
> print(emp.data)
emp_id emp_name salary start_date
1 1 Rick 623.30 2012-01-01
2 2 Dan 515.20 2013-09-23
3 3 Michelle 611.00 2014-11-15
4 4 Ryan 729.00 2014-05-11
5 5 Gary 843.25 2015-03-27
> str(emp.data)
'data.frame': 5 obs. of 4 variables:
$ emp_id : int 1 2 3 4 5
$ emp_name : chr "Rick" "Dan" "Michelle" "Ryan" ...
$ salary : num 623 515 611 729 843
$ start_date: Date, format: "2012-01-01" "2013-09-23" ...

3b.
print("summary")
[1] "summary"
> summary(emp.data)
emp_id emp_name salary start_date
Min. :1 Length:5 Min. :515.2 Min. :2012-01-01
1st Qu.:2 Class :character 1st Qu.:611.0 1st Qu.:2013-09-23
Median :3 Mode :character Median :623.3 Median :2014-05-11
Mean :3 Mean :664.4 Mean :2014-01-14
3rd Qu.:4 3rd Qu.:729.0 3rd Qu.:2014-11-15
Max. :5 Max. :843.2 Max. :2015-03-27
dept
Length:5
Class :character
Mode :character

> print("nature of df")


[1] "nature of df"
> class(df)
[1] "function"
Note: class(df) returns "function" here because no object named df exists; the name df matches base R's F-distribution density function. class(emp.data) would return "data.frame".
3c.
result <- data.frame(emp.data$emp_name)
> print(result)
emp.data.emp_name
1 Rick
2 Dan
3 Michelle
4 Ryan
5 Gary

3d.
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
> v <- emp.data
> print(v)
emp_id emp_name salary start_date dept
1 1 Rick 623.30 2012-01-01 IT
2 2 Dan 515.20 2013-09-23 Operations
3 3 Michelle 611.00 2014-11-15 IT
4 4 Ryan 729.00 2014-05-11 HR
5 5 Gary 843.25 2015-03-27 Finance
> #####adding new row#########
> v = c(6,"Anand",999.00,"2024-01-05","Business")
> new_df=rbind(emp.data,v)
> print(new_df)
emp_id emp_name salary start_date dept
1 1 Rick 623.3 2012-01-01 IT
2 2 Dan 515.2 2013-09-23 Operations
3 3 Michelle 611 2014-11-15 IT
4 4 Ryan 729 2014-05-11 HR
5 5 Gary 843.25 2015-03-27 Finance
6 6 Anand 999 2024-01-05 Business
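One caveat with the rbind() call above: v is a character vector, so every column of new_df, including emp_id and salary, is silently coerced to character. Appending a one-row data frame instead preserves the column types. A sketch with a reduced version of the same data:

```r
# Two-row stand-in for emp.data, numeric salary column
emp.data <- data.frame(
  emp_id = 1:2,
  emp_name = c("Rick", "Dan"),
  salary = c(623.30, 515.20),
  stringsAsFactors = FALSE)

# Append a one-row data frame rather than a character vector
new_row <- data.frame(emp_id = 3, emp_name = "Anand", salary = 999.00)
new_df <- rbind(emp.data, new_row)

str(new_df$salary)   # still numeric, not coerced to character
```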

Result:
Thus the above program has been executed successfully.
EX. NO: 04
DATE: DIFFERENT TYPES OF GRAPHS

Aim:
To demonstrate different types of graphs using ggplot for a dataset.
Procedure:
Scatter Plot:
Plot Sepal Width against Sepal Length, color points by Species, and use a
minimalistic theme.

Box Plot:
Create a box plot of Petal Length by Species, color boxes by Species, and use a
minimalistic theme (optionally remove legend).

Histogram:
Generate a histogram of Sepal Length, fill bars by Species, and use a
minimalistic theme.

Heatmap:
Construct a heatmap of Sepal Length against Petal Length, fill cells by Species,
and use a minimalistic theme.

Bar Chart:
Plot the mean Petal Length by Species as a bar chart, color bars by Species, and
use a minimalistic theme.
CODE:
Scatter Plot:
ggplot(data=iris,aes(x=Sepal.Length, y=Sepal.Width,color=Species)) +
geom_point() +
theme_minimal()

Box Plot:
ggplot(data=iris,aes(x=Species, y=Petal.Length,color=Species)) +
geom_boxplot() +
theme_minimal() +
theme(legend.position="none")

Histogram:
ggplot(data=iris,aes(x=Sepal.Length,fill=Species)) +
geom_histogram() +
theme_minimal()
Heat Map:
ggplot(data=iris,aes(x=Sepal.Length,y=Petal.Length,fill=Species)) +
geom_bin2d() +
theme_minimal()
Bar Chart:
ggplot(data=iris,aes(x=Species,y=Petal.Length,fill=Species)) +
geom_bar(stat="summary", fun="mean") +
theme_minimal()

Result:
Thus the above program has been executed successfully.
EX. NO: 05
DATE: MIN-MAX NORMALIZATION FOR A DATASET

Aim:
To perform min-max normalization on a dataset and show the result using
ggplot.
Procedure:
Step 1: Load the ggplot2 library and the iris dataset.
Step 2: Examine the structure of the iris dataset.
Step 3: Create a scatter plot using Petal Width and Petal Length, color points by
species.
Step 4: Normalize the numerical columns of the iris dataset to a range of [0,1]
and retain the species information.
Step 5: Examine the structure and summary statistics of the normalized iris
dataset.
Step 6: Create another scatter plot using the normalized Petal Width and Petal
Length, color points by species.
CODE:
> library(ggplot2)
> data("iris")
> str(iris)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> ggplot(data = iris, aes(x = Petal.Width, y = Petal.Length, color = Species)) +
+ geom_point()
> iris_norm <- as.data.frame(apply(iris[, 1:4], 2, function(x) (x - min(x))/(max(x) - min(x))))
> iris_norm$Species <- iris$Species
> str(iris_norm)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 0.2222 0.1667 0.1111 0.0833 0.1944 ...
 $ Sepal.Width : num 0.625 0.417 0.5 0.458 0.667 ...
 $ Petal.Length: num 0.0678 0.0678 0.0508 0.0847 0.0678 ...
 $ Petal.Width : num 0.0417 0.0417 0.0417 0.0417 0.0417 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> summary(iris_norm)
  Sepal.Length     Sepal.Width     Petal.Length     Petal.Width
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.00000
1st Qu.:0.2222 1st Qu.:0.3333 1st Qu.:0.1017 1st Qu.:0.08333
Median :0.4167 Median :0.4167 Median :0.5678 Median :0.50000
Mean :0.4287 Mean :0.4406 Mean :0.4675 Mean :0.45806
3rd Qu.:0.5833 3rd Qu.:0.5417 3rd Qu.:0.6949 3rd Qu.:0.70833
Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.00000
Species
setosa :50
versicolor:50
virginica :50

> ggplot(data = iris_norm, aes(x = Petal.Width, y = Petal.Length, color = Species)) +
+ geom_point()
Before Normalization:

After Normalization:
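As a quick sanity check on the normalization above, every numeric column of a min-max normalized dataset should span exactly [0, 1]. A sketch using the same formula:

```r
# Min-max normalization: maps the smallest value to 0 and the largest to 1
minmax <- function(x) (x - min(x)) / (max(x) - min(x))
iris_norm <- as.data.frame(apply(iris[, 1:4], 2, minmax))

sapply(iris_norm, range)   # each column should report 0 and 1
```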
Result:
Thus the above program has been executed successfully.
EX. NO: 06
DATE: MEAN, MEDIAN, STANDARD DEVIATION AND t-TEST

Aim:
To calculate mean, median, standard deviation and t-test in a dataset.
Procedure:
Step 1: Load the dplyr library and the iris dataset.
Step 2: Calculate summary statistics (mean, median, and standard deviation) for
Sepal Length across all species using dplyr's summarise() function with across()
and list() functions.
Step 3: Perform a t-test comparing the Sepal Length between the "setosa" and
"versicolor" species.
Step 4: Store Sepal Length data for "setosa" and "versicolor" species in separate
vectors, x and y.
Step 5: Conduct the t-test between vectors x and y using the t.test() function.
Step 6: Print the results of the t-test.
CODE:

library(dplyr)
data("iris")
> iris %>%
+ summarise(across(Sepal.Length, list(mean = mean, median = median, sd = sd)))
Sepal.Length_mean Sepal.Length_median Sepal.Length_sd
1 5.843333 5.8 0.8280661
> #t test
> x <- iris[iris$Species == "setosa", ]$Sepal.Length
> y <- iris[iris$Species == "versicolor", ]$Sepal.Length
> tt <- t.test(x, y)
> tt

Welch Two Sample t-test

data: x and y
t = -10.521, df = 86.538, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.1057074 -0.7542926
sample estimates:
mean of x mean of y
5.006 5.936
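The Welch t statistic reported above can be reproduced by hand from its formula, t = (mean(x) - mean(y)) / sqrt(var(x)/n_x + var(y)/n_y), which makes a useful cross-check on the t.test() output:

```r
x <- iris[iris$Species == "setosa", ]$Sepal.Length
y <- iris[iris$Species == "versicolor", ]$Sepal.Length

# Welch t statistic computed directly from its definition
t_manual <- (mean(x) - mean(y)) / sqrt(var(x)/length(x) + var(y)/length(y))
t_manual   # matches the t value reported by t.test(x, y)
```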

Result:
Thus the above program has been executed successfully.
EX. NO: 07
DATE: HANDLING MISSING VALUES

Aim:
To handle the missing values in a dataset.
Procedure:
Step 1: Load the "airquality" dataset.
Step 2: Convert the "airquality" dataset into a dataframe called "df".
Step 3: Print a summary of the dataframe "df".
Step 4: Calculate the number of missing values in the "Ozone" column of the
dataframe.
Step 5: Compute the mean of the "Ozone" column, excluding missing values.
Step 6: Replace missing values in the "Ozone" column with the computed mean.
Step 7: Recalculate the number of missing values in the "Ozone" column to
confirm replacement.
CODE:
> df=as.data.frame(airquality)
> summary(df)
     Ozone           Solar.R           Wind             Temp           Month
 Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00   Min.   :5.000
 1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00   1st Qu.:6.000
 Median : 31.50   Median :205.0   Median : 9.700   Median :79.00   Median :7.000
 Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88   Mean   :6.993
 3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00   3rd Qu.:8.000
 Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00   Max.   :9.000
 NA's   :37       NA's   :7
      Day
 Min.   : 1.0
 1st Qu.: 8.0
 Median :16.0
 Mean   :15.8
 3rd Qu.:23.0
 Max.   :31.0

> sum(is.na(df$Ozone))
[1] 37
> mean(df$Ozone, na.rm=TRUE)
[1] 42.12931
> df$Ozone[is.na(df$Ozone)] = mean(df$Ozone, na.rm=TRUE)
> sum(is.na(df$Ozone))
[1] 0
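Mean imputation is sensitive to outliers; replacing missing values with the median is a common, more robust alternative. The same replacement pattern, sketched with the median instead:

```r
df <- as.data.frame(airquality)         # fresh copy with the 37 missing Ozone values

med <- median(df$Ozone, na.rm = TRUE)   # median, ignoring NAs
df$Ozone[is.na(df$Ozone)] <- med        # fill the gaps

sum(is.na(df$Ozone))                    # no missing values remain
```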

Result:
Thus the above program has been executed successfully.
EX. NO: 08
DATE: STATISTICAL CORRELATION TEST

Aim:
To perform statistical correlation test for comparing two variables.

Procedure:
Step 1: Load the iris dataset using data(iris).
Step 2: Assign the "Sepal.Length" column of the iris dataset to the variable
variable1.
Step 3: Assign the "Petal.Length" column of the iris dataset to the variable
variable2.
Step 4: Perform a correlation test between variable1 and variable2 using the
cor.test() function.
Step 5: Store the results of the correlation test in the variable correlation_test.
Step 6: Print the results of the correlation test using the print() function.
CODE:
> data(iris)
> variable1 <- iris$Sepal.Length
> variable2 <- iris$Petal.Length
> correlation_test <- cor.test(variable1, variable2)
> print(correlation_test)

Pearson's product-moment correlation

data: variable1 and variable2


t = 21.646, df = 148, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.8270363 0.9055080
sample estimates:
cor
0.8717538
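The sample estimate above is simply the Pearson correlation coefficient; cor() returns the same number without the test machinery, which makes for an easy cross-check:

```r
# Pearson correlation (the default method for cor)
r <- cor(iris$Sepal.Length, iris$Petal.Length)
r   # about 0.872, matching the cor.test estimate
```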

Result:
Thus the above program has been executed successfully.
EX. NO: 09
DATE: DATA TRANSFORMATION OPERATIONS

Aim:
To perform various data transformation operations using filter, arrange,
select, mutate, summarize functions.
Procedure:

Step 1: Load the dplyr library and the iris dataset.
Step 2: Filter the iris dataset to include only rows where the species is 'virginica'
and Sepal Length is greater than 7.
Step 3: Filter the 'versicolor' rows and arrange them in descending order of
Sepal Length.
Step 4: Select the "Species" column from the iris dataset and list its distinct
values.
Step 5: Add a new column "SepalRatio" to the iris dataset, calculated as the
ratio of Sepal Length to Sepal Width.
Step 6: Summarize the Sepal Length column in the iris dataset, calculating mean,
median, and standard deviation.
CODE:
library(dplyr)
> data("iris")
> iris %>% filter(Species == 'virginica' & Sepal.Length>7)
   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1 7.1 3.0 5.9 2.1 virginica
2 7.6 3.0 6.6 2.1 virginica
3 7.3 2.9 6.3 1.8 virginica
4 7.2 3.6 6.1 2.5 virginica
5 7.7 3.8 6.7 2.2 virginica
6 7.7 2.6 6.9 2.3 virginica
7 7.7 2.8 6.7 2.0 virginica
8 7.2 3.2 6.0 1.8 virginica
9 7.2 3.0 5.8 1.6 virginica
10 7.4 2.8 6.1 1.9 virginica
11 7.9 3.8 6.4 2.0 virginica
12 7.7 3.0 6.1 2.3 virginica
> iris %>% filter(Species == "versicolor") %>% arrange(desc(Sepal.Length))
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1 7.0 3.2 4.7 1.4 versicolor
2 6.9 3.1 4.9 1.5 versicolor
3 6.8 2.8 4.8 1.4 versicolor
4 6.7 3.1 4.4 1.4 versicolor
5 6.7 3.0 5.0 1.7 versicolor
6 6.7 3.1 4.7 1.5 versicolor
7 6.6 2.9 4.6 1.3 versicolor
8 6.6 3.0 4.4 1.4 versicolor
9 6.5 2.8 4.6 1.5 versicolor
10 6.4 3.2 4.5 1.5 versicolor
11 6.4 2.9 4.3 1.3 versicolor
12 6.3 3.3 4.7 1.6 versicolor
13 6.3 2.5 4.9 1.5 versicolor
14 6.3 2.3 4.4 1.3 versicolor
15 6.2 2.2 4.5 1.5 versicolor
16 6.2 2.9 4.3 1.3 versicolor
17 6.1 2.9 4.7 1.4 versicolor
18 6.1 2.8 4.0 1.3 versicolor
19 6.1 2.8 4.7 1.2 versicolor
20 6.1 3.0 4.6 1.4 versicolor
21 6.0 2.2 4.0 1.0 versicolor
22 6.0 2.9 4.5 1.5 versicolor
23 6.0 2.7 5.1 1.6 versicolor
24 6.0 3.4 4.5 1.6 versicolor
25 5.9 3.0 4.2 1.5 versicolor
26 5.9 3.2 4.8 1.8 versicolor
27 5.8 2.7 4.1 1.0 versicolor
28 5.8 2.7 3.9 1.2 versicolor
29 5.8 2.6 4.0 1.2 versicolor
30 5.7 2.8 4.5 1.3 versicolor
31 5.7 2.6 3.5 1.0 versicolor
32 5.7 3.0 4.2 1.2 versicolor
33 5.7 2.9 4.2 1.3 versicolor
34 5.7 2.8 4.1 1.3 versicolor
35 5.6 2.9 3.6 1.3 versicolor
36 5.6 3.0 4.5 1.5 versicolor
37 5.6 2.5 3.9 1.1 versicolor
38 5.6 3.0 4.1 1.3 versicolor
39 5.6 2.7 4.2 1.3 versicolor
40 5.5 2.3 4.0 1.3 versicolor
41 5.5 2.4 3.8 1.1 versicolor
42 5.5 2.4 3.7 1.0 versicolor
43 5.5 2.5 4.0 1.3 versicolor
44 5.5 2.6 4.4 1.2 versicolor
45 5.4 3.0 4.5 1.5 versicolor
46 5.2 2.7 3.9 1.4 versicolor
47 5.1 2.5 3.0 1.1 versicolor
48 5.0 2.0 3.5 1.0 versicolor
49 5.0 2.3 3.3 1.0 versicolor
50 4.9 2.4 3.3 1.0 versicolor
> iris %>% select(Species) %>% distinct()
Species
1 setosa
2 versicolor
3 virginica
> iris_with_new_column <- iris %>% mutate(SepalRatio = Sepal.Length / Sepal.Width)
> head(iris_with_new_column)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species SepalRatio
1 5.1 3.5 1.4 0.2 setosa 1.457143
2 4.9 3.0 1.4 0.2 setosa 1.633333
3 4.7 3.2 1.3 0.2 setosa 1.468750
4 4.6 3.1 1.5 0.2 setosa 1.483871
5 5.0 3.6 1.4 0.2 setosa 1.388889
6 5.4 3.9 1.7 0.4 setosa 1.384615
> iris %>% summarise(across(Sepal.Length, list(mean = mean, median = median, sd = sd)))
Sepal.Length_mean Sepal.Length_median Sepal.Length_sd
1 5.843333 5.8 0.8280661
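The summarise() call above collapses the whole dataset to one row; per-group statistics are also common. The same group-wise mean can be computed in base R with aggregate(), sketched here to avoid extra dependencies:

```r
# Mean Sepal.Length for each species, base R only
agg <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
agg   # one row per species
```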

Result:
Thus the above program has been executed successfully.
EX. NO: 10
DATE: EXPLORE THE DISTRIBUTION OF DATASET

Aim:
To explore the distribution of variables in a dataset.
Procedure:
Step 1: Load the ggplot2 library.
Step 2: Load the diamonds dataset.
Step 3: Plot histograms for x, y, and z variables:
- Use ggplot2 to create histograms for each variable.
- Set the binwidth to 0.5 for x, y, and z histograms.
- Customize colors for better visualization.
- Add titles to indicate the variables.
Step 4: Calculate summary statistics for x, y, and z variables:
- Use the summary() function to get mean, median, min, max, etc.
Step 5: Plot a histogram for the price variable:
- Use ggplot2 to create a histogram for the price variable.
- Set the binwidth to 500 for the price histogram.
- Customize colors for better visualization.
- Add a title to indicate the variable.
Step 6: Calculate summary statistics for the price variable:
- Use the summary() function to get mean, median, min, max, etc.
CODE:
library(ggplot2)
>
> # Load the diamonds dataset
> data("diamonds")
>
> # Explore the distribution of x, y, z variables
> # Plot histograms for each variable
> ggplot(diamonds, aes(x = x)) + geom_histogram(binwidth = 0.5, fill = "skyblue", color = "black") + ggtitle("Distribution of x Variable")
> ggplot(diamonds, aes(x = y)) + geom_histogram(binwidth = 0.5, fill = "lightgreen", color = "black") + ggtitle("Distribution of y Variable")
> ggplot(diamonds, aes(x = z)) + geom_histogram(binwidth = 0.5, fill = "salmon", color = "black") + ggtitle("Distribution of z Variable")
>
> # Summary statistics for x, y, z variables
> summary(diamonds[c("x", "y", "z")])
x y z
Min. : 0.000 Min. : 0.000 Min. : 0.000
1st Qu.: 4.710 1st Qu.: 4.720 1st Qu.: 2.910
Median : 5.700 Median : 5.710 Median : 3.530
Mean : 5.731 Mean : 5.735 Mean : 3.539
3rd Qu.: 6.540 3rd Qu.: 6.540 3rd Qu.: 4.040
Max. :10.740 Max. :58.900 Max. :31.800
>
> # Explore the distribution of the price variable
> # Plot a histogram for the price variable
> ggplot(diamonds, aes(x = price)) + geom_histogram(binwidth = 500, fill = "orange", color = "black") + ggtitle("Distribution of Price")
>
> # Summary statistics for price variable
> summary(diamonds$price)
Min. 1st Qu. Median Mean 3rd Qu. Max.
326 950 2401 3933 5324 18823
Explore the distribution of price. Do you discover anything unusual or
surprising? (Hint: carefully think about the binwidth and make sure you try a
wide range of values.)

The price data has many spikes, but it is hard to tell what each spike
corresponds to. The following plot does not show much difference in the
distributions in the last one or two digits. There are no diamonds with a price
of $1,500 (between $1,455 and $1,545, inclusive), and there is a bulge in the
distribution around $750.

library(dplyr)  # for filter()
ggplot(filter(diamonds, price < 2500), aes(x = price, fill = color)) +
geom_histogram(binwidth = 10, center = 0)
Result:
Thus the above program has been executed successfully.
