Perform the following tasks using the dplyr package:
1. Load the dplyr library, read the hflights dataset from the CSV file [Link],
and save it as a data frame in a variable named hflights. Use the dim() function
(part of base R, not dplyr) to discover the dimensionality of the dataset. Running
this function on a dataset returns the number of records and the number of columns;
print the result.
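A minimal sketch of the load-and-inspect step. The CSV location is not given here, so the example writes a tiny stand-in file to a temporary path; the path, the name hflights_demo, and the three rows are hypothetical and are not the real hflights data.

```r
suppressPackageStartupMessages(library(dplyr))

# Stand-in for the real CSV: a tiny file written to a temporary path.
# The columns below are a hypothetical subset of the hflights columns.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Year,Month,UniqueCarrier,ArrDelay",
  "2011,1,AA,-10",
  "2011,1,AA,-9",
  "2011,1,AA,-8"
), csv_path)

# read.csv() returns a plain data frame.
hflights_demo <- read.csv(csv_path)

# dim() returns an integer vector: c(number of rows, number of columns).
dims <- dim(hflights_demo)
print(dims)
```

For the real exercise, the same two calls would be pointed at the downloaded CSV file instead of the temporary one.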
2. Use the functions head() and tail() to take a look at some instances of the
data. Run each function on the hflights dataset and print the two results separately.
Expected Output:
X Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
1 5424 2011 1 1 6 1400 1500 AA 428
2 5425 2011 1 2 7 1401 1501 AA 428
3 5426 2011 1 3 1 1352 1502 AA 428
4 5427 2011 1 4 2 1403 1513 AA 428
5 5428 2011 1 5 3 1405 1507 AA 428
6 5429 2011 1 6 4 1359 1503 AA 428
TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance
1 N576AA 60 40 -10 0 IAH DFW 224
2 N557AA 60 45 -9 1 IAH DFW 224
3 N541AA 70 48 -8 -8 IAH DFW 224
4 N403AA 70 39 3 3 IAH DFW 224
5 N492AA 62 44 -3 5 IAH DFW 224
6 N262AA 64 45 -7 -1 IAH DFW 224
TaxiIn TaxiOut Cancelled CancellationCode Diverted
1 7 13 0 NA 0
2 6 9 0 NA 0
3 5 17 0 NA 0
4 9 22 0 NA 0
5 9 9 0 NA 0
6 6 13 0 NA 0
X Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier
95 19272 2011 1 7 5 1630 1733 AA
96 19273 2011 1 8 6 1627 1736 AA
97 19274 2011 1 9 7 1835 1951 AA
98 19275 2011 1 10 1 1639 1740 AA
99 19276 2011 1 11 2 1752 1855 AA
100 19277 2011 1 12 3 1631 1739 AA
FlightNum TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin Dest
95 1121 N525AA 63 43 -12 0 IAH DFW
96 1121 N583AA 69 45 -9 -3 IAH DFW
97 1121 N574AA 76 50 126 125 IAH DFW
98 1121 N531AA 61 41 -5 9 IAH DFW
99 1121 N586AA 63 41 70 82 IAH DFW
100 1121 N468AA 68 44 -6 1 IAH DFW
Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted
95 224 6 14 0 NA 0
96 224 13 11 0 NA 0
97 224 9 17 0 NA 0
98 224 8 12 0 NA 0
99 224 8 14 0 NA 0
100 224 5 19 0 NA 0
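As a rough illustration of what the two calls return, here is a sketch on a hypothetical 100-row stand-in data frame (the name demo and its values are made up for the example). With no second argument, both functions return six rows, and tail() keeps the original row names, which is why the second table above is labelled 95 to 100.

```r
# Hypothetical 100-row stand-in for hflights.
demo <- data.frame(X = 1:100, DepDelay = rep(c(0, 5, -3, 12), 25))

first_rows <- head(demo)  # first 6 rows by default
last_rows  <- tail(demo)  # last 6 rows by default

# Print each result separately.
print(first_rows)
print(last_rows)
```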
3. Running head() on a local data frame with a row count as the second argument
prints that many rows from the top of the data frame. Run it on the hflights dataset
to print 20 records.
Expected Output:
X Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
1 5424 2011 1 1 6 1400 1500 AA 428
2 5425 2011 1 2 7 1401 1501 AA 428
3 5426 2011 1 3 1 1352 1502 AA 428
4 5427 2011 1 4 2 1403 1513 AA 428
5 5428 2011 1 5 3 1405 1507 AA 428
6 5429 2011 1 6 4 1359 1503 AA 428
7 5430 2011 1 7 5 1359 1509 AA 428
8 5431 2011 1 8 6 1355 1454 AA 428
9 5432 2011 1 9 7 1443 1554 AA 428
10 5433 2011 1 10 1 1443 1553 AA 428
11 5434 2011 1 11 2 1429 1539 AA 428
12 5435 2011 1 12 3 1419 1515 AA 428
13 5436 2011 1 13 4 1358 1501 AA 428
14 5437 2011 1 14 5 1357 1504 AA 428
15 5438 2011 1 15 6 1359 1459 AA 428
16 5439 2011 1 16 7 1359 1509 AA 428
17 5440 2011 1 17 1 1530 1634 AA 428
18 5441 2011 1 18 2 1408 1508 AA 428
19 5442 2011 1 19 3 1356 1503 AA 428
20 5443 2011 1 20 4 1507 1622 AA 428
TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance
1 N576AA 60 40 -10 0 IAH DFW 224
2 N557AA 60 45 -9 1 IAH DFW 224
3 N541AA 70 48 -8 -8 IAH DFW 224
4 N403AA 70 39 3 3 IAH DFW 224
5 N492AA 62 44 -3 5 IAH DFW 224
6 N262AA 64 45 -7 -1 IAH DFW 224
7 N493AA 70 43 -1 -1 IAH DFW 224
8 N477AA 59 40 -16 -5 IAH DFW 224
9 N476AA 71 41 44 43 IAH DFW 224
10 N504AA 70 45 43 43 IAH DFW 224
11 N565AA 70 42 29 29 IAH DFW 224
12 N577AA 56 41 5 19 IAH DFW 224
13 N476AA 63 44 -9 -2 IAH DFW 224
14 N552AA 67 47 -6 -3 IAH DFW 224
15 N462AA 60 44 -11 -1 IAH DFW 224
16 N555AA 70 41 -1 -1 IAH DFW 224
17 N518AA 64 48 84 90 IAH DFW 224
18 N507AA 60 42 -2 8 IAH DFW 224
19 N523AA 67 46 -7 -4 IAH DFW 224
20 N425AA 75 42 72 67 IAH DFW 224
TaxiIn TaxiOut Cancelled CancellationCode Diverted
1 7 13 0 NA 0
2 6 9 0 NA 0
3 5 17 0 NA 0
4 9 22 0 NA 0
5 9 9 0 NA 0
6 6 13 0 NA 0
7 12 15 0 NA 0
8 7 12 0 NA 0
9 8 22 0 NA 0
10 6 19 0 NA 0
11 8 20 0 NA 0
12 4 11 0 NA 0
13 6 13 0 NA 0
14 5 15 0 NA 0
15 6 10 0 NA 0
16 12 17 0 NA 0
17 8 8 0 NA 0
18 7 11 0 NA 0
19 10 11 0 NA 0
20 9 24 0 NA 0
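A short sketch of the row-count argument, again on a hypothetical stand-in data frame named demo. It also shows that asking for more rows than exist returns the whole data frame rather than raising an error.

```r
# Hypothetical 100-row stand-in for hflights.
demo <- data.frame(X = 1:100, ArrDelay = rep(c(-10, -9, -8, 3), 25))

# The second argument (n) sets how many rows are returned.
first_20 <- head(demo, n = 20)
print(first_20)

# Asking for more rows than the data frame has returns all of them.
all_rows <- head(demo, n = 200)
```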
4. glimpse() is like a transposed version of print(): columns run down the page, and
data runs across. This makes it possible to view every column in a data frame.
Run this function on the hflights dataset. Note: do not wrap the call in print();
call glimpse() on the dataset directly.
Expected Output:
Rows: 100
Columns: 22
$ X <int> 5424, 5425, 5426, 5427, 5428, 5429, 5430, 5431, 5432…
$ Year <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011…
$ Month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ DayofMonth <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1…
$ DayOfWeek <int> 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2…
$ DepTime <int> 1400, 1401, 1352, 1403, 1405, 1359, 1359, 1355, 1443…
$ ArrTime <int> 1500, 1501, 1502, 1513, 1507, 1503, 1509, 1454, 1554…
$ UniqueCarrier <fct> AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, …
$ FlightNum <int> 428, 428, 428, 428, 428, 428, 428, 428, 428, 428, 42…
$ TailNum <fct> N576AA, N557AA, N541AA, N403AA, N492AA, N262AA, N493…
$ ActualElapsedTime <int> 60, 60, 70, 70, 62, 64, 70, 59, 71, 70, 70, 56, 63, …
$ AirTime <int> 40, 45, 48, 39, 44, 45, 43, 40, 41, 45, 42, 41, 44, …
$ ArrDelay <int> -10, -9, -8, 3, -3, -7, -1, -16, 44, 43, 29, 5, -9, …
$ DepDelay <int> 0, 1, -8, 3, 5, -1, -1, -5, 43, 43, 29, 19, -2, -3, …
$ Origin <fct> IAH, IAH, IAH, IAH, IAH, IAH, IAH, IAH, IAH, IAH, IA…
$ Dest <fct> DFW, DFW, DFW, DFW, DFW, DFW, DFW, DFW, DFW, DFW, DF…
$ Distance <int> 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 22…
$ TaxiIn <int> 7, 6, 5, 9, 9, 6, 12, 7, 8, 6, 8, 4, 6, 5, 6, 12, 8,…
$ TaxiOut <int> 13, 9, 17, 22, 9, 13, 15, 12, 22, 19, 20, 11, 13, 15…
$ Cancelled <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ CancellationCode <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ Diverted <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
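The note about not printing can be illustrated with a small sketch. glimpse() writes its summary to the console as a side effect and returns its input invisibly, so wrapping it in print() would show the data frame a second time. The data frame demo below is a hypothetical stand-in.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical stand-in with the same column types seen above.
demo <- data.frame(
  Year          = rep(2011L, 5),
  UniqueCarrier = factor(rep("AA", 5)),
  ArrDelay      = c(-10L, -9L, -8L, 3L, -3L)
)

# glimpse() prints one line per column and invisibly returns its input.
returned <- glimpse(demo)
```

Because the input comes back unchanged, glimpse() can also sit in the middle of a pipeline without altering the data that flows through it.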
5. Perform the following tasks:
*Create a data frame hflights1 that holds the first 50 rows of the hflights
dataset.
*Convert hflights1 into a tbl.
*To see that a tbl behaves like a data frame, save the UniqueCarrier column of the
hflights1 tbl as an object named carriers, using standard R syntax, and print it.
Expected Output:
[1] AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA
[26] AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA
Levels: AA
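One way these steps could look, sketched on a hypothetical stand-in data frame. The exercise does not name the conversion function; as_tibble() is an assumption here (older tutorials use tbl_df(), which is deprecated in current dplyr). The names demo, demo1, and demo1_tbl are illustrative.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical 100-row stand-in for hflights.
demo <- data.frame(X = 1:100, UniqueCarrier = factor(rep("AA", 100)))

# First 50 rows, then convert to a tbl (tibble).
demo1     <- head(demo, 50)
demo1_tbl <- as_tibble(demo1)

# Standard R column extraction with $ works on a tbl as on a data frame.
carriers <- demo1_tbl$UniqueCarrier
print(carriers)
```

The tbl is still a data frame underneath, so base R idioms such as $ and [[ ]] keep working on it.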
6. Perform the following tasks:
*Create a named character vector abrCarrier (a lookup table) that maps each code in
the UniqueCarrier variable to the actual carrier name:
abrCarrier <- c(
"AA" = "American", "AS" = "Alaska", "B6" = "JetBlue", "CO" = "Continental",
"DL" = "Delta", "OO" = "SkyWest", "UA" = "United", "US" = "US_Airways",
"WN" = "Southwest", "EV" = "Atlantic_Southeast", "F9" = "Frontier",
"FL" = "AirTran", "MQ" = "American_Eagle", "XE" = "ExpressJet", "YV" = "Mesa"
)
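*Add a new column Carrier to hflights that contains the actual carrier name, looked
up in abrCarrier by the value in the UniqueCarrier column of hflights. UniqueCarrier
is a factor here, so convert it to character first; otherwise the lookup uses the
factor's integer codes instead of the carrier codes:
hflights$Carrier <- abrCarrier[as.character(hflights$UniqueCarrier)]
*Print the first 10 rows of the dataset to view the values in the newly added
column.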
Expected Output:
X Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
1 5424 2011 1 1 6 1400 1500 AA 428
2 5425 2011 1 2 7 1401 1501 AA 428
3 5426 2011 1 3 1 1352 1502 AA 428
4 5427 2011 1 4 2 1403 1513 AA 428
5 5428 2011 1 5 3 1405 1507 AA 428
6 5429 2011 1 6 4 1359 1503 AA 428
7 5430 2011 1 7 5 1359 1509 AA 428
8 5431 2011 1 8 6 1355 1454 AA 428
9 5432 2011 1 9 7 1443 1554 AA 428
10 5433 2011 1 10 1 1443 1553 AA 428
TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance
1 N576AA 60 40 -10 0 IAH DFW 224
2 N557AA 60 45 -9 1 IAH DFW 224
3 N541AA 70 48 -8 -8 IAH DFW 224
4 N403AA 70 39 3 3 IAH DFW 224
5 N492AA 62 44 -3 5 IAH DFW 224
6 N262AA 64 45 -7 -1 IAH DFW 224
7 N493AA 70 43 -1 -1 IAH DFW 224
8 N477AA 59 40 -16 -5 IAH DFW 224
9 N476AA 71 41 44 43 IAH DFW 224
10 N504AA 70 45 43 43 IAH DFW 224
TaxiIn TaxiOut Cancelled CancellationCode Diverted Carrier
1 7 13 0 NA 0 American
2 6 9 0 NA 0 American
3 5 17 0 NA 0 American
4 9 22 0 NA 0 American
5 9 9 0 NA 0 American
6 6 13 0 NA 0 American
7 12 15 0 NA 0 American
8 7 12 0 NA 0 American
9 8 22 0 NA 0 American
10 6 19 0 NA 0 American
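The lookup in task 6 indexes a named vector by the UniqueCarrier column. The sketch below, on a hypothetical three-entry lookup table, shows how that indexing behaves when the index is a factor versus a character vector: a factor is treated as its integer codes, so the result depends on level order rather than on the names. With a single level such as AA the two forms happen to agree, which is why the output above looks correct either way.

```r
# Hypothetical lookup table; deliberately not in alphabetical order.
abr <- c("AA" = "American", "WN" = "Southwest", "AS" = "Alaska")

# Factor levels are sorted alphabetically: AA = 1, AS = 2, WN = 3.
codes <- factor(c("WN", "AA", "AS"))

# Indexing by the factor uses the integer codes 3, 1, 2 (positions in abr).
by_factor <- unname(abr[codes])

# Indexing by character uses the names, which is the intended lookup.
by_name <- unname(abr[as.character(codes)])

print(by_factor)  # "Alaska" "American" "Southwest"
print(by_name)    # "Southwest" "American" "Alaska"
```

Converting to character before indexing keeps the lookup correct regardless of how many carriers the data contains or how the factor levels are ordered.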