Tutorial 3: Manually Entering Data in R - A Better Way
Manually entering data into R is time-consuming and error-prone. Throughout this course, the
datasets will also become larger, making manual entry even more difficult.
The code excerpt below shows how much work manually entering some data would entail and
how much screen space the resulting data frame would take up.
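As a rough illustration of manual entry, the sketch below types ten measurements directly into a data frame. The values and the names `manual_data`, `sample`, and `width` are made up for this example and are not taken from the textbook.

```r
# Manually typed data: every value is entered by hand (illustrative values only).
manual_data <- data.frame(
  sample = 1:10,
  width  = c(1.32, 1.41, 1.43, 1.42, 1.49,
             1.46, 1.50, 1.38, 1.55, 1.47)
)

# Printing the data frame shows one row per observation.
print(manual_data)
```

Even with only ten observations, a single mistyped digit is easy to miss, and the effort grows with every additional row.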
Two ways to make this process easier are described below.
Textbook to R
One method is to copy the data from the textbook and follow the guide below to create a .csv and
load that .csv into R.
In the Quality Control textbook, data often comes in a table that looks like this.
A faster way to transfer this into R is to copy it onto the clipboard (Ctrl + C), and then have R
read the data from the clipboard.
Step 1: Textbook to Clipboard
Select the data in the table and copy it to the clipboard (Ctrl + C).
Step 2: Clipboard to R
Use the read.delim function to get the data from the clipboard into R. In this case, the
delimiter that separates the numbers is a space, so the sep parameter is set to " ". If it were a comma
instead, sep would be set to ",".
There is no header in our data, so we set the header parameter to FALSE (F is a shorthand for FALSE). We then use
the colnames function to give names to the columns of the resulting data frame.
The data in the textbook is now in the R environment, assigned to variable x. Perfect!
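The clipboard steps above might look like the following sketch. It assumes a Windows machine, where the string "clipboard" can be passed as the file argument; other platforms access the clipboard differently. So that the example runs without anything on the clipboard, the same call is shown reading from a text string instead. The values and the column names `sample` and `width` are hypothetical.

```r
# On Windows, after copying the table (Ctrl + C), the call would be:
# x <- read.delim("clipboard", sep = " ", header = FALSE)

# Stand-in for the clipboard contents (hypothetical, space-delimited values).
copied_text <- "1 1.32\n2 1.41\n3 1.43"
x <- read.delim(text = copied_text, sep = " ", header = FALSE)

# Without a header, R names the columns V1 and V2; replace them.
colnames(x) <- c("sample", "width")
print(x)
```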
Step 3: R to .csv
The issue with this method is that the clipboard is temporary: if you copy something else,
rerunning the code reads different data. In the interest of reproducibility, it is necessary to save your data in a safe place that
you can retrieve it from at a later time.
We can use the write.csv() function to save a data frame to a .csv file (comma-separated
values file).
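A minimal sketch of this step is shown here. The data frame contents and the file name `flow_width.csv` are assumptions for illustration, and a temporary folder is used so that no existing file is overwritten.

```r
# Hypothetical two-column data frame standing in for x.
x <- data.frame(sample = 1:3, width = c(1.32, 1.41, 1.43))

# Save to a .csv file in a temporary folder.
csv_path <- file.path(tempdir(), "flow_width.csv")
write.csv(x, csv_path)

# Inspect the raw file: the first field on each line holds the row names.
print(readLines(csv_path))
```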
Note that write.csv() generated an additional column containing the auto-numbered row names. If
this is not desired, the row.names parameter should be set to FALSE as shown below:
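One way to write this, again with a hypothetical data frame and a temporary file path chosen only for the example:

```r
# Hypothetical two-column data frame standing in for x.
x <- data.frame(sample = 1:3, width = c(1.32, 1.41, 1.43))

# row.names = FALSE suppresses the extra row-name column.
csv_path <- file.path(tempdir(), "flow_width_no_rownames.csv")
write.csv(x, csv_path, row.names = FALSE)

# The raw file now starts directly with the data columns.
print(readLines(csv_path))
```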
Now, the resulting .csv file contains only two columns:
Step 4: .csv to R
We can use the read.csv() function to load data saved in a .csv file into our R code.
The data frame y now contains the following data loaded from the .csv file:
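The round trip can be sketched as follows. The file is first written from a hypothetical data frame so that the example is self-contained; in practice the .csv from Step 3 would already exist.

```r
# Create a small .csv to read back (hypothetical values and file name).
x <- data.frame(sample = 1:3, width = c(1.32, 1.41, 1.43))
csv_path <- file.path(tempdir(), "flow_width_roundtrip.csv")
write.csv(x, csv_path, row.names = FALSE)

# Load the saved file into a new data frame.
y <- read.csv(csv_path)

# Check the contents and the column types.
print(y)
str(y)
```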
The drawback of this method is that the data for each problem must first be copied to the clipboard,
and if the environment is cleared before the data is saved, it must be copied again.
Loading data from a .csv file
An even better solution is to load the data from a .csv file if one exists. If a .csv file does not
exist, we can create one in Excel.
Creating a .csv
Any Excel worksheet can be saved as a .csv file. To save as a .csv file, you can change the file
type in the Save File Dialog (shown below).
From this point, we can load the .csv with the same code as in the previous method. Since the .csv
was not written using R, the path to the .csv file needs to be identified.
To find the path, you can right-click on the .csv file and select Properties. The Location
of the file is highlighted below.
As you can see from the line of code below, listing the entire path location can be very long (not
all of the path is seen here).
Note: If you copy the location from Windows, you will need to switch the backslashes to forward slashes, as also
shown below.
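The slash conversion can also be done in code, as in this sketch. The folder `C:\Users\student\Documents\QC` and the file name `flow_width.csv` are hypothetical placeholders, not the actual location used in the course.

```r
# Hypothetical location copied from the Properties dialog.
# Inside an R string a backslash starts an escape sequence, so each one is doubled.
copied_location <- "C:\\Users\\student\\Documents\\QC"

# Replace every backslash with a forward slash.
r_location <- gsub("\\", "/", copied_location, fixed = TRUE)

# Append the file name to get the full path for read.csv().
full_path <- file.path(r_location, "flow_width.csv")
print(full_path)

# flow_width <- read.csv(full_path)  # uncomment once the file exists at this path
```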
When dealing with files (saving and loading data), it is convenient to identify a folder where all
the files are located. We do this with the setwd (set working directory) function. The call to
setwd sets the working directory, which is where the subsequent read.csv call looks for the
file to read from. This is why the file name passed to read.csv does not include the entire
file path.
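A sketch of this pattern follows. A temporary directory stands in for the real project folder, and the folder name `qc_data`, the file name `flow_width.csv`, and the data values are assumptions made so the example can run on its own.

```r
# Hypothetical project folder; a temporary directory stands in for it here.
data_dir <- file.path(tempdir(), "qc_data")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(sample = 1:3, width = c(1.32, 1.41, 1.43)),
          file.path(data_dir, "flow_width.csv"), row.names = FALSE)

# Set the working directory; setwd() returns the previous one.
old_wd <- setwd(data_dir)

# Only the file name is needed because R looks in the working directory.
flow_width <- read.csv("flow_width.csv")
print(flow_width)

# Restore the previous working directory.
setwd(old_wd)
```

Saving the value returned by setwd and restoring it afterwards keeps the example from changing the working directory of the rest of the session.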
The code above loads the flow width data from a .csv file into a data frame named flow_width.
To use a column of data from a data frame, use the $ operator to address a column by name.
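For instance, assuming the data frame has a column named `width` (a hypothetical name, with made-up values), the column can be extracted and summarized like this:

```r
# Hypothetical stand-in for the flow_width data frame loaded from the .csv file.
flow_width <- data.frame(sample = 1:3, width = c(1.32, 1.41, 1.43))

# The $ operator returns the named column as a vector.
widths <- flow_width$width
print(widths)

# The vector can then be passed to other functions, such as mean().
print(mean(widths))
```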