Advanced R programming
Computational Complexity
Loops <- O(N)
Exam 20/10/2017
1 - Provide a command to empty your workspace.
rm(list=ls())
2 - Explain in your own words the directories and files created by package.skeleton():
DESCRIPTION -> File that stores the important metadata of our package, including the
other packages that it needs. These dependencies are declared in the Imports, Suggests and
Depends fields.
NAMESPACE -> File that controls which names the package makes available and which names it
takes from other packages, so that names do not clash. Contains importFrom() (what is used
from other packages) and export() (which functions we make available for users).
Read-and-delete-me -> Text file that contains some important instructions about
customizing the package, so we need to make sure to read it before deleting it.
R -> Directory that contains R code files
data -> Directory that contains data files
man -> Directory that contains documentation files (.Rd)
3 - Tests:
Always run this command in order to use the testthat functions
library(testthat)
Exam 20/10/2017
a) (1p) What are the potential risks if you start working in R with a non–empty workspace?
If we start working in R with a non-empty workspace, variables or functions from previous
sessions are still defined. Our code can then silently use these leftover objects instead of
failing, which can lead to wrong outputs that are hard to reproduce.
c) (3p) The function package.skeleton() creates a number of files and directories. However,
other directories are possible. List at least three other directories that can be present in an R
package and write IN YOUR OWN WORDS a few (not more than three) sentences on what
each one's role is in an R package.
Other directories can be:
R: Directory that contains the code of the package (e.g. one .R file per function). Already
created by package.skeleton(), so it does not count as "other".
vignettes: Contains the .Rmd files that generate the package vignettes.
tests: Contains the tests for each function (inside the testthat folder). Usually unit tests:
input and output checks, expected-error checks, etc.
man: Directory with the documentation (.Rd files). Also already created by package.skeleton().
doc: Contains the package vignettes (.Rmd and .R files) and their rendered versions
(HTML, for example, which is how users read the vignette).
inst: Files here are copied as they are to the top level of the installed package (e.g. CITATION).
b) (2p) Using the function package.skeleton() create an R package that contains only one
function. This function should take two numbers and return their sum.
sum_2_number <- function(x, y) {
  output <- sum(x, y)
  return(output)
}
sum_2_number(2, 5)
package.skeleton(list = c("sum_2_number"), name = "mypkg")
Exam 24/10/2019
a) (3p) Briefly discuss, in a few sentences, some good coding practices like writing style,
commenting, variable and function naming, etc.
Good coding practices make our code easier to understand for other people who will read it
and perhaps change it, especially in a company environment. We should always comment our
code (at least a brief explanation of what we are doing) and keep a consistent writing style.
Functions and variables should have names that match their purpose, so that it is clear from
the name alone. Ex: Function that calculates the profit of some company:
Profit_calculation(company).
b) (2p) What information should be provided in a function’s documentation?
A function's documentation should provide: a description of the function (its purpose); how
to use it, i.e. which arguments it takes and which types (character, numeric, vectors, etc.)
they accept; what value it returns; some examples of its usage; and the source that was
used (a Wikipedia link, for example).
Exam 22/10/2020
a) (2p) Describe how one makes S3 methods implemented in a package for some class
available for the user of the package.
To make an S3 method available to the user of the package we need to register it in the
NAMESPACE file: with roxygen2 we add @export above the method; otherwise we add
S3method(generic, class) to NAMESPACE manually.
What is the Depends field for in the DESCRIPTION file of an R package? Packages listed there
are loaded and attached together with our package.
Why does CRAN not allow much to be listed there? Attaching many packages is heavy on the
user's environment: it changes the search path and can mask other functions.
Depends can also be used to specify the required version of R.
3 fields inside DESCRIPTION (Imports, Suggests, Depends):
Imports: These packages must be present (or installed) when your package is installed.
Attaching your package with library(yourpackage) will load them (not attach them). To use
functions from them it is recommended to call them in your package
as packagename::function().
Suggests: Your package can use these packages, but does not require them, e.g. datasets, for
tests, vignettes. Before using functions from them you need to check if they are available
with requireNamespace().
Depends: Packages here will be attached. NOT recommended, heavy on the environment,
CRAN has a limit on the number of packages in Depends. Can be used to specify the version of R.
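The two calling conventions described for Imports and Suggests can be sketched as follows. This is only an illustration: the helper name summarise_values is hypothetical, and stats merely stands in for a package that would be listed in the DESCRIPTION file.

```r
# Hypothetical helper; `stats` stands in for a package listed in Suggests/Imports.
summarise_values <- function(x) {
  # Suggests style: check that the package is available before using it.
  # requireNamespace() loads (does not attach) and returns TRUE or FALSE.
  if (!requireNamespace("stats", quietly = TRUE)) {
    stop("Package 'stats' is needed for this function.")
  }
  # Imports style: call the function with an explicit package prefix.
  stats::median(x)
}

result <- summarise_values(c(1, 3, 10))
```

Because the call is written as stats::median(), it works whether or not the package is attached in the user's session.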
NAMESPACE (LABS!!) What is used from other packages and provided to others!
importFrom(package, function): package has to be listed in DESCRIPTION; do not
import(package) (i.e. everything) but only what you need.
No need to call the function as package::function now, but it is still RECOMMENDED.
export(function): what you make available for your users.
exportPattern(regular expression): make available functions
with names matching a pattern, e.g. names that do not start with a dot.
Exam 2018/02/28
a) (2p) Both library() and require() are used to load packages. Explain the difference
between them.
Both library() and require() load and attach add-on packages which are
already installed.
library() by default returns an error if the requested package does not exist.
require() is designed to be used inside functions: it gives a warning message and
returns a logical value, FALSE if the requested package is not found and TRUE if the
package is loaded.
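A small sketch of this difference, assuming that no package with the made-up name below is installed:

```r
# A package name that is assumed NOT to be installed (made up for this sketch).
missing_pkg <- "surelyNotInstalledPkg123"

# require() warns and returns FALSE, so the result can be tested in code.
has_pkg <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)

# library() signals an error instead; tryCatch() lets us observe it.
lib_result <- tryCatch(
  {
    library(missing_pkg, character.only = TRUE)
    "attached"
  },
  error = function(e) "error"
)
```

Here has_pkg ends up FALSE and lib_result ends up "error", which is why require() is the one that can be used in an if() condition.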
b) (2p) When writing your own package it will nearly always depend on functions from
other packages. Is it a good idea to use either library() or require() to load these
packages inside your package? Why or why not? How should your package load the
necessary dependent packages?
We should never use library() or require() in a package, because they change the user's
search path, possibly causing errors for the user.
Instead, list the other package in the Imports: field in
the DESCRIPTION file and specify the imports in the NAMESPACE file (importFrom()), or call
its functions as pkg::function().
c) (1p) You need to use two functions that are provided by two different packages but
they have the same name, i.e. function f1() from package pkg1 and function f1() from
package pkg2. How do you solve this problem?
We can solve this problem by specifying the package before the function like the
following example: pkg1::f1() and pkg2::f1()
Exam 2018/10/23
a) (2p) What is the difference between <- and <<-?
<- Assigns to an object in its current environment
<<- Looks for an existing variable with that name in the parent environments and modifies it;
if none is found, the variable is created in the global environment
b) (1p) If you don’t supply an explicit environment, where do ls() and rm() look?
They look in the current environment, i.e. the one from which ls() or rm() is called.
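The difference between the two assignment operators is easiest to see with a closure. The sketch below uses hypothetical names (make_counter, shadow):

```r
make_counter <- function() {
  count <- 0              # local to this call of make_counter()
  function() {
    count <<- count + 1   # modifies `count` in the enclosing environment
    count
  }
}

counter <- make_counter()
first  <- counter()
second <- counter()       # the enclosing `count` was really updated

shadow <- function() {
  count <- 99             # `<-` creates a new local variable only
  invisible(NULL)
}
shadow()
third <- counter()        # unaffected by shadow()
```

After running this, no variable called count exists in the global environment: <- inside shadow() stayed local, and <<- found the count that belongs to make_counter().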
c) (2p) What is the difference between the fields Depends and Imports in the
DESCRIPTION file in an R package? What needs to be provided in the importFrom()
statements in the NAMESPACE file?
Depends: packages listed here are loaded and attached when our package is attached. Too heavy
for the environment. CRAN has a limit on the packages that can be here.
Imports: packages listed here must be installed for our package to be installed; they are
loaded but not attached.
NAMESPACE: importFrom(package, function). The package has to be listed in DESCRIPTION; do not
import(package) (i.e. everything) but only what you need.
Exam 2020/02/28
a) (3p) When using the S3 system of OO programming how can one implement different
functions with the same name that exhibit different behavior depending on the class of the
input parameter? For example how to construct separate implementations of the function
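my_fun(x) so that different (appropriate) behavior will be exhibited when x is of class
my_class1 and my_class2?
We have to implement a generic function and methods. The generic my_fun() calls
UseMethod("my_fun"), which looks at the class of x and selects the method named
generic.class: my_fun.my_class1() or my_fun.my_class2(). Methods are functions
that behave differently depending on the class of their input.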
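A minimal sketch of this dispatch, using the names from the question (the returned strings and the default method are only for illustration):

```r
# Generic: UseMethod() dispatches on the class attribute of x.
my_fun <- function(x, ...) UseMethod("my_fun")

# One method per class, named generic.class.
my_fun.my_class1 <- function(x, ...) "method for my_class1"
my_fun.my_class2 <- function(x, ...) "method for my_class2"

# Fallback used when no class-specific method exists.
my_fun.default <- function(x, ...) "default method"

obj1 <- structure(list(), class = "my_class1")
obj2 <- structure(list(), class = "my_class2")

res1 <- my_fun(obj1)
res2 <- my_fun(obj2)
res3 <- my_fun(42)
```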
b) (2p) In an R package what is the correct way to make an S3 method associated with a
particular class available to the user?
Using roxygen2 for documenting, we add @export above the method function.class (e.g.
print.foo), and roxygen2 then writes the registration to the automatically generated
NAMESPACE file in the format below:
S3method(function name, class name)    e.g. S3method(print, foo)
If the package also defines the generic itself, the generic is exported with
export(function name).
Exam 2020/10/22
(2p) Describe how one makes S3 methods implemented in a package for some class available
for the user of the package.
Using roxygen2 for documenting, we add @export above the method function.class (e.g.
print.foo), and roxygen2 then writes the registration to the automatically generated
NAMESPACE file in the format below:
S3method(function name, class name)    e.g. S3method(print, foo)
If the package also defines the generic itself, the generic is exported with
export(function name).
b) (3p) What is the Depends field for in the DESCRIPTION file of a R package? Why does
CRAN not allow much to be listed there?
Depends in the DESCRIPTION file will load and attach the listed packages when our package is
attached. CRAN has a limit for the packages inside Depends because attaching them is too
heavy for the user's environment.
Exam 2020/12/02
a) (2p) Describe briefly R’s coercion mechanism for different variable types
When you call a function with an argument of the wrong type, R will try to coerce values to a
different type so that the function will work. There are two types of coercion in R:
explicit and implicit. Only implicit coercion occurs automatically.
Explicit coercion: we change one data type to another ourselves by applying a function,
e.g. as.numeric() or as.character().
Implicit coercion: the conversion is done by R itself.
We combine numeric and character data in one vector. R converts the numeric data to
character data by itself.
We combine logical and numeric data in one vector. The logical data are converted to numeric
data implicitly.
Summary:
Logical values are converted to numbers: TRUE is converted to 1 and FALSE to 0.
Values are converted to the simplest type required to represent all information
(logical -> integer -> double -> character).
b) (3p) What will be the results of running 1+"a", 1+TRUE and 1+1L. Provide and explain the
output. What is R doing?
1+"a": Error ("non-numeric argument to binary operator"). Arithmetic operators only work on
numeric (or logical) values, and R does not convert "a" to a number or 1 to "1" here.
Coercion to character happens when combining values, e.g. c(1, "a") gives "1" "a".
1+TRUE: R converts TRUE to 1, so the output is 2.
1+1L: R converts 1L (integer) to double (same type as 1), and then applies the sum. Output
is 2 (a double).
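The three expressions, together with the combining case, can be checked directly. The variable names below are only for this sketch, and tryCatch() is used so that the error from 1 + "a" does not stop the script:

```r
# Implicit coercion when combining values into one vector
combined_chr <- c(1, "a")       # numeric is coerced to character
combined_num <- c(TRUE, 2)      # logical is coerced to double

# Implicit coercion in arithmetic
r_true <- 1 + TRUE              # TRUE becomes 1
r_int  <- 1 + 1L                # integer 1L becomes double

# Arithmetic on a character value is an error, not a coercion
# (in an English locale: "non-numeric argument to binary operator")
r_chr <- tryCatch(1 + "a", error = function(e) conditionMessage(e))

# Explicit coercion: we ask for the conversion ourselves
explicit <- as.numeric("3.5")
```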
a) (2p) Linear regression, where a response depends linearly on a set of predictors, is a key
tool in Statistics. Implement a function that takes as its input the model matrix (matrix of
predictors’ measurements), X, and the response vector y and returns the vector of estimates
of the regression coefficients. These coefficients have to be calculated directly from the least
squares formula
data(iris)
formula <- Petal.Length ~ Species
X <- model.matrix(object = formula, data = iris)
# response: the variable on the left-hand side of the formula (here iris$Petal.Length)
y <- iris[[all.vars(formula)[[1]]]]
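The notes set up X and y but do not show the function itself. A minimal sketch, assuming the least squares formula meant is the usual beta_hat = (X'X)^(-1) X'y, could look like this (the name ls_coef is hypothetical):

```r
# Hypothetical name; computes beta_hat = (X'X)^(-1) X'y directly.
ls_coef <- function(X, y) {
  XtX <- crossprod(X)        # t(X) %*% X
  Xty <- crossprod(X, y)     # t(X) %*% y
  beta <- solve(XtX) %*% Xty # explicit inverse, as the question asks
  drop(beta)                 # return a plain (named) vector
}

X <- model.matrix(Petal.Length ~ Species, data = iris)
y <- iris$Petal.Length
beta_hat <- ls_coef(X, y)

# Sanity check against R's built-in fit
beta_lm <- coef(lm(Petal.Length ~ Species, data = iris))
```

In practice solve(XtX, Xty) or lm() is numerically preferable to forming the inverse, but the explicit inverse matches the formula the question refers to.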
b) (1p) What is the computational complexity of your implementation in terms of the
dimensions of X? Assume that for an m × m matrix inverting it takes O(m^3) time
Inverse -> O(m^3)
Matrix multiplication -> O(m^3)
Matrix transpose -> O(m^2) (m*n = m*m = m^2 in this case)
Multiply matrix by vector -> O(m^2) (m*n = m*m = m^2 in this case)
The steps run one after another, so their costs are added, not multiplied:
O(m^2) + O(m^3) + O(m^3) + O(m^2) = O(m^3)
For a general n × p matrix X the same steps give O(n*p^2 + p^3).
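How to call a function inside RC
build_mgb <- function(name, speed) {
  # MGB is assumed to be the RC generator returned by setRefClass()
  MGB_5044 <- MGB$new(name = name, speed = speed)
  # call the object's own method with $
  MGB_5044$build_mgb()
}
MGB_504 <- build_mgb(name = "Hopewell", speed = 46)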
How to call print:
How to call plot:
The London RC has to be an object that returns the data frame for the input
Exam 2018/12/10
a) (2p) Provide the names of R’s object–oriented systems. Very briefly write how they differ
between each other.
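S3 :
Simple and informal (a class is just an attribute)
Methods belong to generic functions
S4 :
More formal (classes are defined with setClass())
Methods belong to generic functions
Has slots, accessed with @
Can have parents
RC :
Methods belong to objects
Objects can have fields and methods, accessed with $
Objects are mutable (modified in place)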
b) (3p) The behaviour of many functions, e.g. print() or plot() is different when applied to
different types of input. How is this achieved?
print() and plot() are generic functions. A generic calls UseMethod(), which looks at the
class of its input and dispatches to the method written for that class (e.g.
print.data.frame()); if there is none, the default method (e.g. print.default()) is used.
So functions are set to behave differently depending on the class of their input.
This also means that new printing methods can be easily added
for new classes.
Note: this is not the same as R's coercion mechanism. Coercion converts values to the
simplest type that can represent all the information, c("a", 1) -> c("a", "1"), and converts
booleans to 1 and 0 (TRUE and FALSE respectively): sum(TRUE, 1) = 2.
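As an illustration of adding a printing method for a new class, here is a small sketch. The class name foo and its value element are made up for the example:

```r
# A print method for the hypothetical class "foo".
print.foo <- function(x, ...) {
  cat(sprintf("<foo: value = %s>\n", x$value))
  invisible(x)   # print methods conventionally return their input invisibly
}

obj <- structure(list(value = 5), class = "foo")

# print() is generic, so it dispatches to print.foo() for this object.
printed <- capture.output(print(obj))
```

The same print() call on a data frame or a numeric vector would pick a different method, without print() itself having to know about foo.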
a) (3p) What R package is required to run code in parallel? Provide an example of a function
from that package that is used to run parallel code and discuss its most important
parameters.
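"parallel" package. The parallel package in R can perform tasks in parallel by
distributing them over several cores. The workflow is to find the number of cores in
the system and allocate all of them or a subset to make a cluster.
# Use the detectCores() function to find the number of cores in the system
no_cores <- detectCores()
makeCluster(no_cores) -> creates the cluster; main parameter: the number of workers
parLapply(cl, X, fun) -> parallel version of lapply(); parameters: the cluster cl, the input X
and the function fun that is applied to each element of X
stopCluster(cl) -> shuts the cluster down when we are done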
b) (2p) What sort of R functions are directly parallelizable, that is with minimum code
changes?
Apply-family functions, like lapply() for example, can be replaced by their counterparts in
the package "parallel", like parLapply(), which gives a parallel version with
minimal code changes. This works when the individual calls are independent of each other.
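The replacement of lapply() by parLapply() can be sketched as below. The choice of at most two workers is an assumption made only to keep the example light:

```r
library(parallel)

# detectCores() can return NA on some systems, so fall back to 1.
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1
n_workers <- max(1, min(2, n_cores - 1))   # assumption: use at most 2 workers

cl <- makeCluster(n_workers)               # start the worker processes

# Sequential version
seq_res <- lapply(1:8, function(i) i^2)

# Parallel version: same call, with the cluster as the first argument
par_res <- parLapply(cl, 1:8, function(i) i^2)

stopCluster(cl)                            # always release the workers
```

The function passed to parLapply() runs in separate R processes, so any objects or packages it needs beyond its arguments have to be sent to the workers (for example with clusterExport()).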
Doubts
Exam 20/10/2017:
b) (1p) What is the complexity of your solution in terms of the number of elements of x?
What does this mean?
The computational complexity of the first function is O(n), since it has a loop that
goes through every element from 1 to n.
Assuming that the min/max functions also have to go through every element of the vector, the
computational complexity of the second function must also be O(n).
d) (2p) Implement unit tests that check both find_max_value() and find_max_value_2(). Check
if the functions return the correct values and identify wrong input.
Should this be implemented with the normal test_that functions? But if we are not creating a
package, is that possible? How should we do it?
Just run library(testthat); test_that() also works outside a package.
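A sketch of such tests run outside a package. The exam's actual functions are not in these notes, so the two implementations below are hypothetical stand-ins (one with a loop, one with max()), and the requireNamespace() guard only makes the script safe to run where testthat is not installed:

```r
# Hypothetical stand-in: loop version, O(n).
find_max_value <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0)
  max_value <- x[1]
  for (i in seq_along(x)) {
    if (x[i] > max_value) max_value <- x[i]
  }
  max_value
}

# Hypothetical stand-in: version using the built-in max().
find_max_value_2 <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0)
  max(x)
}

if (requireNamespace("testthat", quietly = TRUE)) {
  library(testthat)

  test_that("both functions return the maximum", {
    expect_equal(find_max_value(c(3, 9, 2)), 9)
    expect_equal(find_max_value_2(c(3, 9, 2)), 9)
  })

  test_that("wrong input is rejected", {
    expect_error(find_max_value("a"))
    expect_error(find_max_value_2(NULL))
  })
}
```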
General Question
2) How can I call my RC function through
## S3 and RC call to create_household() function
household_1 <- create_household()(address="Circle Drive 3",number_of_devices=5)
??
Mine is called this way:
household_1 <- household(address="Circle Drive 3", number_of_devices=5)
household_1$create_household()
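One possible way to get a call of the form create_household(address = ..., number_of_devices = ...) is to wrap the RC generator in an ordinary function. This is only a sketch: the field types and the describe() method are assumptions, since the actual class definition is not shown in these notes.

```r
library(methods)

# RC generator; field types and the describe() method are assumed for this sketch.
household <- setRefClass(
  "household",
  fields = list(address = "character", number_of_devices = "numeric"),
  methods = list(
    describe = function() {
      paste0(address, " (", number_of_devices, " devices)")
    }
  )
)

# Ordinary wrapper function with the name the exam call uses.
create_household <- function(address, number_of_devices) {
  household$new(address = address, number_of_devices = number_of_devices)
}

household_1 <- create_household(address = "Circle Drive 3", number_of_devices = 5)
description <- household_1$describe()   # RC methods are called with $
```

With this layout the constructor is a plain function call, while methods of the resulting object are still called as household_1$method().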