Dplyr

The document outlines tasks to be performed using the dplyr library in R, specifically focusing on the hflights dataset. It includes loading the dataset, checking its dimensions, viewing the first and last few records, and using the glimpse function to get an overview of the data structure. The expected outputs for each task are provided to guide the implementation.

Uploaded by

xavo_27
Copyright
© All Rights Reserved

Perform the following tasks using the dplyr package:

1. Load the dplyr package, read the hflights dataset from the CSV file [Link], and save it
as a data frame named hflights. Then use the dim() function (part of base R, not dplyr)
to find the dimensions of the dataset: it returns the number of records (rows) and
columns. Print the result.
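A minimal sketch of task 1 in R. The CSV behind [Link] is not available here, so the block first writes a hypothetical three-row stand-in (values copied from the expected outputs) to a temporary file; replace csv_path with the path to the real file. The comments point out two details that affect the expected outputs: where the X column most likely comes from, and why stringsAsFactors = TRUE is passed.

```r
library(dplyr)

# Hypothetical stand-in for the assignment's CSV (the real path is not given):
# three rows of four columns, with values copied from the expected output.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(Year = 2011L, Month = 1L, DayofMonth = 1:3, UniqueCarrier = "AA"),
  csv_path
)

# write.csv() stores the row names as an unnamed first column, which
# read.csv() reads back as "X". stringsAsFactors = TRUE reproduces the factor
# columns shown in the expected outputs (the default is FALSE since R 4.0.0).
hflights <- read.csv(csv_path, stringsAsFactors = TRUE)

# dim() is base R: it returns c(number of rows, number of columns).
print(dim(hflights))
```

On the assignment's file, the Rows: 100 and Columns: 22 lines of the glimpse() output in task 4 indicate that dim() should print 100 and 22.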

2. Use the functions head() and tail() to look at the first and last few rows of the
data. Run each function on the hflights dataset and print the results separately.
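A sketch of task 2. To run on its own it builds a hypothetical stand-in: hflights reduced to its first 20 rows and five columns, with values taken from the expected outputs. In the assignment, use the hflights data frame read in task 1 instead.

```r
# Hypothetical stand-in: first 20 rows and five of the 22 columns, with
# values taken from the expected outputs.
hflights <- data.frame(
  X = 5424:5443, Year = 2011L, Month = 1L,
  DayofMonth = 1:20, UniqueCarrier = factor("AA")
)

print(head(hflights))  # first 6 rows (n = 6 by default)
print(tail(hflights))  # last 6 rows; row names keep their original positions
```

tail() keeps the original row names, which is why the tail() output below is labelled 95 to 100 on the 100-row file.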

Expected Output:
     X Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
1 5424 2011     1          1         6    1400    1500            AA       428
2 5425 2011     1          2         7    1401    1501            AA       428
3 5426 2011     1          3         1    1352    1502            AA       428
4 5427 2011     1          4         2    1403    1513            AA       428
5 5428 2011     1          5         3    1405    1507            AA       428
6 5429 2011     1          6         4    1359    1503            AA       428
  TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance
1  N576AA                60      40      -10        0    IAH  DFW      224
2  N557AA                60      45       -9        1    IAH  DFW      224
3  N541AA                70      48       -8       -8    IAH  DFW      224
4  N403AA                70      39        3        3    IAH  DFW      224
5  N492AA                62      44       -3        5    IAH  DFW      224
6  N262AA                64      45       -7       -1    IAH  DFW      224
  TaxiIn TaxiOut Cancelled CancellationCode Diverted
1      7      13         0               NA        0
2      6       9         0               NA        0
3      5      17         0               NA        0
4      9      22         0               NA        0
5      9       9         0               NA        0
6      6      13         0               NA        0

        X Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier
95  19272 2011     1          7         5    1630    1733            AA
96  19273 2011     1          8         6    1627    1736            AA
97  19274 2011     1          9         7    1835    1951            AA
98  19275 2011     1         10         1    1639    1740            AA
99  19276 2011     1         11         2    1752    1855            AA
100 19277 2011     1         12         3    1631    1739            AA
    FlightNum TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin Dest
95       1121  N525AA                63      43      -12        0    IAH  DFW
96       1121  N583AA                69      45       -9       -3    IAH  DFW
97       1121  N574AA                76      50      126      125    IAH  DFW
98       1121  N531AA                61      41       -5        9    IAH  DFW
99       1121  N586AA                63      41       70       82    IAH  DFW
100      1121  N468AA                68      44       -6        1    IAH  DFW
    Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted
95       224      6      14         0               NA        0
96       224     13      11         0               NA        0
97       224      9      17         0               NA        0
98       224      8      12         0               NA        0
99       224      8      14         0               NA        0
100      224      5      19         0               NA        0

3. head() returns the first n rows of a data frame, where n is its second argument (6 by
default). Run it on the hflights dataset to print the first 20 records.
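A sketch of task 3, using the same hypothetical 20-row stand-in. head() takes the number of rows as its n argument; dplyr's slice_head() (available since dplyr 1.0.0) is shown as an alternative that returns the same rows.

```r
library(dplyr)

# Hypothetical stand-in with the first 20 rows of five columns
# (values taken from the expected output).
hflights <- data.frame(
  X = 5424:5443, Year = 2011L, Month = 1L,
  DayofMonth = 1:20, UniqueCarrier = factor("AA")
)

first20 <- head(hflights, n = 20)  # base R: the first 20 rows
print(first20)

# dplyr equivalent of the same selection
first20_dplyr <- slice_head(hflights, n = 20)
```

Because the stand-in has exactly 20 rows, both calls return all of it here; on the 100-row file they return rows 1 to 20.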
Expected Output:
      X Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
1  5424 2011     1          1         6    1400    1500            AA       428
2  5425 2011     1          2         7    1401    1501            AA       428
3  5426 2011     1          3         1    1352    1502            AA       428
4  5427 2011     1          4         2    1403    1513            AA       428
5  5428 2011     1          5         3    1405    1507            AA       428
6  5429 2011     1          6         4    1359    1503            AA       428
7  5430 2011     1          7         5    1359    1509            AA       428
8  5431 2011     1          8         6    1355    1454            AA       428
9  5432 2011     1          9         7    1443    1554            AA       428
10 5433 2011     1         10         1    1443    1553            AA       428
11 5434 2011     1         11         2    1429    1539            AA       428
12 5435 2011     1         12         3    1419    1515            AA       428
13 5436 2011     1         13         4    1358    1501            AA       428
14 5437 2011     1         14         5    1357    1504            AA       428
15 5438 2011     1         15         6    1359    1459            AA       428
16 5439 2011     1         16         7    1359    1509            AA       428
17 5440 2011     1         17         1    1530    1634            AA       428
18 5441 2011     1         18         2    1408    1508            AA       428
19 5442 2011     1         19         3    1356    1503            AA       428
20 5443 2011     1         20         4    1507    1622            AA       428
   TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance
1   N576AA                60      40      -10        0    IAH  DFW      224
2   N557AA                60      45       -9        1    IAH  DFW      224
3   N541AA                70      48       -8       -8    IAH  DFW      224
4   N403AA                70      39        3        3    IAH  DFW      224
5   N492AA                62      44       -3        5    IAH  DFW      224
6   N262AA                64      45       -7       -1    IAH  DFW      224
7   N493AA                70      43       -1       -1    IAH  DFW      224
8   N477AA                59      40      -16       -5    IAH  DFW      224
9   N476AA                71      41       44       43    IAH  DFW      224
10  N504AA                70      45       43       43    IAH  DFW      224
11  N565AA                70      42       29       29    IAH  DFW      224
12  N577AA                56      41        5       19    IAH  DFW      224
13  N476AA                63      44       -9       -2    IAH  DFW      224
14  N552AA                67      47       -6       -3    IAH  DFW      224
15  N462AA                60      44      -11       -1    IAH  DFW      224
16  N555AA                70      41       -1       -1    IAH  DFW      224
17  N518AA                64      48       84       90    IAH  DFW      224
18  N507AA                60      42       -2        8    IAH  DFW      224
19  N523AA                67      46       -7       -4    IAH  DFW      224
20  N425AA                75      42       72       67    IAH  DFW      224
   TaxiIn TaxiOut Cancelled CancellationCode Diverted
1       7      13         0               NA        0
2       6       9         0               NA        0
3       5      17         0               NA        0
4       9      22         0               NA        0
5       9       9         0               NA        0
6       6      13         0               NA        0
7      12      15         0               NA        0
8       7      12         0               NA        0
9       8      22         0               NA        0
10      6      19         0               NA        0
11      8      20         0               NA        0
12      4      11         0               NA        0
13      6      13         0               NA        0
14      5      15         0               NA        0
15      6      10         0               NA        0
16     12      17         0               NA        0
17      8       8         0               NA        0
18      7      11         0               NA        0
19     10      11         0               NA        0
20      9      24         0               NA        0

4. glimpse() is like a transposed version of print(): columns run down the page and the
data runs across. This makes it possible to see every column of a data frame, even a
wide one. Run this function on the hflights dataset. Note: do not wrap the call in
print(); run glimpse() on the dataset by itself.
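A sketch of task 4 on a hypothetical stand-in (the first 20 rows of five columns, values taken from the expected outputs). glimpse() prints its summary and returns the data invisibly, which is why it is called on its own rather than inside print().

```r
library(dplyr)

# Hypothetical stand-in (values taken from the expected outputs).
hflights <- data.frame(
  X = 5424:5443, Year = 2011L, Month = 1L,
  DayofMonth = 1:20, UniqueCarrier = factor("AA")
)

# glimpse() prints one line per column (name, type, leading values) and
# returns its input invisibly, so it needs no print() around it.
glimpse(hflights)
```

The type tags reflect the column classes: <int> for integer, <fct> for factor, <lgl> for logical. CancellationCode shows up as <lgl> in the expected output because read.csv() reads a column that holds only missing values as logical.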

Expected Output:
Rows: 100
Columns: 22
$ X                 <int> 5424, 5425, 5426, 5427, 5428, 5429, 5430, 5431, 5432…
$ Year              <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011…
$ Month             <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ DayofMonth        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1…
$ DayOfWeek         <int> 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2…
$ DepTime           <int> 1400, 1401, 1352, 1403, 1405, 1359, 1359, 1355, 1443…
$ ArrTime           <int> 1500, 1501, 1502, 1513, 1507, 1503, 1509, 1454, 1554…
$ UniqueCarrier     <fct> AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, …
$ FlightNum         <int> 428, 428, 428, 428, 428, 428, 428, 428, 428, 428, 42…
$ TailNum           <fct> N576AA, N557AA, N541AA, N403AA, N492AA, N262AA, N493…
$ ActualElapsedTime <int> 60, 60, 70, 70, 62, 64, 70, 59, 71, 70, 70, 56, 63, …
$ AirTime           <int> 40, 45, 48, 39, 44, 45, 43, 40, 41, 45, 42, 41, 44, …
$ ArrDelay          <int> -10, -9, -8, 3, -3, -7, -1, -16, 44, 43, 29, 5, -9, …
$ DepDelay          <int> 0, 1, -8, 3, 5, -1, -1, -5, 43, 43, 29, 19, -2, -3, …
$ Origin            <fct> IAH, IAH, IAH, IAH, IAH, IAH, IAH, IAH, IAH, IAH, IA…
$ Dest              <fct> DFW, DFW, DFW, DFW, DFW, DFW, DFW, DFW, DFW, DFW, DF…
$ Distance          <int> 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 22…
$ TaxiIn            <int> 7, 6, 5, 9, 9, 6, 12, 7, 8, 6, 8, 4, 6, 5, 6, 12, 8,…
$ TaxiOut           <int> 13, 9, 17, 22, 9, 13, 15, 12, 22, 19, 20, 11, 13, 15…
$ Cancelled         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ CancellationCode  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ Diverted          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…

5. Perform the following tasks:


*Create a data frame hflights1 containing the first 50 rows of the hflights dataset.
*Convert hflights1 into a tbl.
*To see that a tbl behaves like a data frame, save the UniqueCarrier column of the
hflights1 tbl as an object named carriers using standard R syntax, and print it.
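A sketch of task 5. The stand-in is hypothetical: a single UniqueCarrier column of 100 "AA" values, which matches the Levels: AA line in the expected output. as_tibble() does the conversion; older tutorials use tbl_df(), which dplyr has deprecated.

```r
library(dplyr)

# Hypothetical stand-in: the UniqueCarrier column only, 100 rows of "AA"
# (consistent with the "Levels: AA" line in the expected output).
hflights <- data.frame(UniqueCarrier = factor(rep("AA", 100)))

hflights1 <- head(hflights, 50)    # data frame with the first 50 rows
hflights1 <- as_tibble(hflights1)  # convert it to a tbl (tibble)

# `$` (or [[ ]]) pulls a column out as a plain vector, as on a data frame
carriers <- hflights1$UniqueCarrier
print(carriers)
```

One difference from a plain data frame: on a tibble, hflights1[, "UniqueCarrier"] returns a one-column tibble instead of dropping to a vector, so $ or [[ ]] is the way to get the vector itself.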

Expected Output:
 [1] AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA
[26] AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA
Levels: AA

6. Perform the following tasks:


*Create a named character vector abrCarrier (a lookup table) that maps each code in
UniqueCarrier to the carrier's full name:

abrCarrier <- c(
"AA" = "American", "AS" = "Alaska", "B6" = "JetBlue", "CO" = "Continental",
"DL" = "Delta", "OO" = "SkyWest", "UA" = "United", "US" = "US_Airways",
"WN" = "Southwest", "EV" = "Atlantic_Southeast", "F9" = "Frontier",
"FL" = "AirTran", "MQ" = "American_Eagle", "XE" = "ExpressJet", "YV" = "Mesa"
)
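*Add a new column Carrier to hflights that holds the full carrier name, looked up in
abrCarrier by the code in the UniqueCarrier column. If UniqueCarrier was read as a factor
(as in the expected outputs), convert it to character first, because indexing by a factor
uses its integer codes rather than its labels:
hflights$Carrier <- abrCarrier[as.character(hflights$UniqueCarrier)]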
*Print the first 10 rows of the dataset to view the values in the newly added
column.
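A sketch of task 6 on a hypothetical three-row stand-in. In the assignment's file every row is AA, and AA is both the first factor level and the first entry of abrCarrier, so indexing directly by the factor and indexing by as.character() happen to give the same result there; the stand-in uses three carriers to show where they diverge. unname() is an optional extra that drops the codes indexing leaves behind as names.

```r
# Lookup table from the assignment: carrier code -> carrier name
abrCarrier <- c(
  "AA" = "American", "AS" = "Alaska", "B6" = "JetBlue", "CO" = "Continental",
  "DL" = "Delta", "OO" = "SkyWest", "UA" = "United", "US" = "US_Airways",
  "WN" = "Southwest", "EV" = "Atlantic_Southeast", "F9" = "Frontier",
  "FL" = "AirTran", "MQ" = "American_Eagle", "XE" = "ExpressJet", "YV" = "Mesa"
)

# Hypothetical stand-in with a factor column, as read.csv(...,
# stringsAsFactors = TRUE) would produce. Its levels sort to AA, EV, WN.
hflights <- data.frame(UniqueCarrier = factor(c("AA", "WN", "EV")))

# Indexing by a factor uses its integer codes (1, 3, 2), not its labels,
# so this picks the 1st, 3rd and 2nd entries of abrCarrier: wrong for WN and EV.
wrong <- unname(abrCarrier[hflights$UniqueCarrier])

# Converting to character first makes the lookup go by name.
hflights$Carrier <- unname(abrCarrier[as.character(hflights$UniqueCarrier)])
print(hflights)
```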

Expected Output:
      X Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
1  5424 2011     1          1         6    1400    1500            AA       428
2  5425 2011     1          2         7    1401    1501            AA       428
3  5426 2011     1          3         1    1352    1502            AA       428
4  5427 2011     1          4         2    1403    1513            AA       428
5  5428 2011     1          5         3    1405    1507            AA       428
6  5429 2011     1          6         4    1359    1503            AA       428
7  5430 2011     1          7         5    1359    1509            AA       428
8  5431 2011     1          8         6    1355    1454            AA       428
9  5432 2011     1          9         7    1443    1554            AA       428
10 5433 2011     1         10         1    1443    1553            AA       428
   TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance
1   N576AA                60      40      -10        0    IAH  DFW      224
2   N557AA                60      45       -9        1    IAH  DFW      224
3   N541AA                70      48       -8       -8    IAH  DFW      224
4   N403AA                70      39        3        3    IAH  DFW      224
5   N492AA                62      44       -3        5    IAH  DFW      224
6   N262AA                64      45       -7       -1    IAH  DFW      224
7   N493AA                70      43       -1       -1    IAH  DFW      224
8   N477AA                59      40      -16       -5    IAH  DFW      224
9   N476AA                71      41       44       43    IAH  DFW      224
10  N504AA                70      45       43       43    IAH  DFW      224
   TaxiIn TaxiOut Cancelled CancellationCode Diverted  Carrier
1       7      13         0               NA        0 American
2       6       9         0               NA        0 American
3       5      17         0               NA        0 American
4       9      22         0               NA        0 American
5       9       9         0               NA        0 American
6       6      13         0               NA        0 American
7      12      15         0               NA        0 American
8       7      12         0               NA        0 American
9       8      22         0               NA        0 American
10      6      19         0               NA        0 American
