CREATING A PROJECT IN RSTUDIO
This exercise requires that you create and use a project in RStudio.
Steps
1. Download the data set Diamonds.csv from the Canvas module.
2. Open up RStudio.
3. Click on File > New Project
a. The following dialog appears:
b. Select New Directory
c. Select New Project
d. In the following dialog, enter Diamonds in the Directory Name box, then click
Browse and choose the folder in which you want to create the project. Leave the
checkboxes blank.
e. Click Create Project.
f. You should see the following in the Files pane in RStudio:
g. Click on New folder; create the following five folders in the Files pane:
Data
Code
Documents
Output
Plots
h. Copy the Diamonds.csv data file into the Data folder of the project.
i. Open a new script file: File > New File > R Script.
j. Copy and paste the following code into the script pane of RStudio (usually the
pane in the upper left).
getwd() # Check the current working directory.
diamonds <- read.csv("./Data/Diamonds.csv") # Read Diamonds.csv into diamonds.
head(diamonds, 10) # Print the first 10 rows.
tail(diamonds, 10) # Print the last 10 rows.
#
summary(diamonds) # Summarize the data in diamonds.
#
#### Subset the data to include only observations where carat <= 2.5.
diamonds <- diamonds[which(diamonds$carat <= 2.5), ]
#
#
summary(diamonds) # Run the summary again.
#
hist(diamonds$carat) # Produce a histogram of the carat sizes.
# Save the plot to your project's Plots folder.
dev.copy(jpeg,'./Plots/carat_hist.jpg', width = 800, height = 600)
dev.off() # Turn off the output to files.
#
table(diamonds$clarity) # Produce a table of the clarity values.
#
#### Install the ggplot2 package on your computer (needed only once).
#### (You may be asked to select a repository; any repository will work,
#### but one that is geographically close will be faster.)
# install.packages("ggplot2") # Note: the quote marks are necessary.
library(ggplot2) # Load ggplot2; note that the quote marks are optional here.
# Generate a scatterplot of price versus carat using ggplot.
# aes stands for "aesthetics"; geom_point() plots a point for each observation.
# Each "+" must end a line; a line that begins with "+" causes an error.
#
ggplot(data = diamonds, aes(x = carat, y = price, color = clarity)) +
geom_point() +
geom_smooth(se = FALSE)
# Save the plot to your project's Plots folder.
dev.copy(jpeg,"./Plots/scatterplot.jpg", width = 800, height = 600)
dev.off() # Turn off the output to files.
#
#
ggplot(data = diamonds, aes(x = carat, y = price, color = clarity)) +
geom_smooth(se = FALSE)
# Save the plot to your project's Plots folder.
dev.copy(jpeg,'./Plots/lines.jpg', width = 800, height = 600)
dev.off() # Turn off the output to files.
k. Select the first few lines and click Run.
l. Fix any errors.
m. Run the rest of the code. Check for and fix any errors.
n. Determine the following statistics from the diamonds data frame:
a. Mean of price; use the code mean(diamonds$price).
b. Mean of carat.
c. Number of observations; use the code nrow(diamonds).
d. Standard deviation of price; use the code sd(diamonds$price).
e. Standard deviation of carat.
f. Record the statistics on paper, and then complete the quiz exercise, which asks
you to enter each of the statistics.
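
To check your understanding of the functions used in step n before running them on the real data, you can try them on a small data frame first. The sketch below is only an illustration: the data frame toy and its values are made up and do not come from Diamonds.csv, and the names n_obs, mean_price, and sd_price are hypothetical.

```r
# Hypothetical toy data; the real values come from Diamonds.csv.
toy <- data.frame(
  carat = c(0.5, 1.0, 1.5, 3.0),
  price = c(1000, 4000, 9000, 20000)
)

# Same kind of subset as in the script: keep rows where carat <= 2.5.
toy <- toy[which(toy$carat <= 2.5), ]

n_obs      <- nrow(toy)        # Number of observations left after subsetting.
mean_price <- mean(toy$price)  # Mean of the price column.
sd_price   <- sd(toy$price)    # Sample standard deviation (n - 1 denominator).

print(c(n = n_obs, mean = mean_price, sd = sd_price))
```

Note that sd() computes the sample standard deviation, so it divides by n - 1 rather than n. Also note that the statistics are computed after the subset, so rows with carat above 2.5 are not counted.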