In this vignette, we compare the computation time and memory usage of dense matrix and sparse Matrix objects.

Allocation and length

We begin with an analysis of the time/memory it takes to create these objects. In the atime code below, we allocate a vector for comparison, and we specify a result function which computes the length of the object x created by each expression. This means atime will save length as a function of data size N (in addition to time and memory).

library(Matrix)
N_seq <- unique(as.integer(10^seq(0,7,by=0.25)))
vec.mat.result <- atime::atime(
  N=N_seq,
  vector=numeric(N),
  matrix=matrix(0, N, N),
  Matrix=Matrix(0, N, N),
  result=function(x)data.frame(length=length(x)))
plot(vec.mat.result)
#> log-10 transformation introduced infinite values.
#> log-10 transformation introduced infinite values.
#> log-10 transformation introduced infinite values.

The plot above shows three panels, one for each measured unit: kilobytes, length, and seconds.
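As a quick sanity check on the length and memory measurements, the sketch below evaluates the three expressions at one small size and compares them using base R's length() and object.size(). The size N = 1000 is an arbitrary assumption for illustration, not a value taken from the benchmark, and this check is not part of the atime analysis. It only confirms that the vector has N elements, that both matrices have N^2 elements, and that the all-zero sparse Matrix needs far less memory than the dense matrix.

```r
library(Matrix)

# Illustrative size, chosen arbitrarily (not taken from the benchmark above).
N <- 1000

vec    <- numeric(N)        # dense vector: N elements
dense  <- matrix(0, N, N)   # dense matrix: N^2 stored doubles
sparse <- Matrix(0, N, N)   # all-zero Matrix: stored in a compact format

# Number of elements reported for each object.
lengths <- c(
  vector = length(vec),
  matrix = length(dense),
  Matrix = length(sparse))

# Approximate memory footprint in kilobytes.
kilobytes <- c(
  vector = as.numeric(object.size(vec)),
  matrix = as.numeric(object.size(dense)),
  Matrix = as.numeric(object.size(sparse))) / 1024

print(data.frame(length = lengths, kilobytes = kilobytes))
```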

Comparison with bench::press

An alternative method to compute asymptotic timings is via bench::press, which provides functionality for parameterized benchmarking (similar to atime_grid). Because atime() has special treatment of the N parameter, the code required for asymptotic measurement is relatively simple; compare the atime code above to the bench::press code below, which measures the same asymptotic quantities (seconds, kilobytes, length).

seconds.limit <- 0.01
done.vec <- NULL
measure.vars <- c("seconds","kilobytes","length")
press_result <- bench::press(N = N_seq, {
  exprs <- function(...)as.list(match.call()[-1])
  elist <- exprs(
    vector=numeric(N),
    matrix=matrix(0, N, N),
    Matrix=Matrix(0, N, N))
  elist[names(done.vec)] <- NA #Don't run exprs which already exceeded limit.
  mark.args <- c(elist, list(iterations=10, check=FALSE))
  mark.result <- do.call(bench::mark, mark.args)
  ## Rename some columns for easier interpretation.
  desc.vec <- attr(mark.result$expression, "description")
  mark.result$description <- desc.vec
  mark.result$seconds <- as.numeric(mark.result$median)
  mark.result$kilobytes <- as.numeric(mark.result$mem_alloc/1024)
  ## Compute length column to measure in addition to time/memory.
  mark.result$length <- NA
  for(desc.i in seq_along(desc.vec)){
    description <- desc.vec[[desc.i]]
    result <- eval(elist[[description]])
    mark.result$length[desc.i] <- length(result)
  }
  ## Set NA time/memory/length for exprs which were not run.
  mark.result[desc.vec %in% names(done.vec), measure.vars] <- NA
  ## If expr went over time limit, indicate it is done.
  over.limit <- mark.result$seconds > seconds.limit
  over.desc <- desc.vec[is.finite(mark.result$seconds) & over.limit]
  done.vec[over.desc] <<- TRUE
  mark.result
})
#> Running with:
#>           N
#>  1        1
#>  2        3
#>  3        5
#>  4       10
#>  5       17
#>  6       31
#>  7       56
#>  8      100
#>  9      177
#> 10      316
#> 11      562
#> 12     1000
#> 13     1778
#> 14     3162
#> 15     5623
#> 16    10000
#> 17    17782
#> 18    31622
#> 19    56234
#> 20   100000
#> 21   177827
#> 22   316227
#> 23   562341
#> 24  1000000
#> 25  1778279
#> 26  3162277
#> 27  5623413
#> 28 10000000
#> Some expressions had a GC in every iteration; so filtering is disabled.

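The bench::press code above is relatively complicated, because it re-implements two features that are provided by atime: (1) once an expression exceeds the time limit (0.01 seconds here), it is not run for any larger N; and (2) data other than time and memory (here, length) are stored as a function of N, which atime handles via its result argument.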
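To make the early-stopping and result-saving ideas concrete, here is a minimal base R sketch that uses neither bench nor atime. The helper time_until_limit is a hypothetical name introduced only for this illustration, it times each expression once with system.time (much cruder than bench::mark), and the N values are arbitrary small sizes.

```r
# Hypothetical helper (not part of atime or bench): time each expression
# for increasing N, and stop running an expression once it exceeds the limit.
time_until_limit <- function(expr_list, N_values, seconds_limit = 0.01) {
  done <- character()  # names of expressions that already went over the limit
  rows <- list()
  for (N in N_values) {
    for (name in setdiff(names(expr_list), done)) {
      # Evaluate the quoted expression with N bound to the current size.
      elapsed <- system.time(
        value <- eval(expr_list[[name]], list(N = N))
      )[["elapsed"]]
      # Save length in addition to time, like the result function above.
      rows[[length(rows) + 1]] <- data.frame(
        N = N, description = name,
        seconds = elapsed, length = length(value))
      if (elapsed > seconds_limit) done <- c(done, name)
    }
  }
  do.call(rbind, rows)
}

timings <- time_until_limit(
  list(vector = quote(numeric(N)), matrix = quote(matrix(0, N, N))),
  N_values = c(1, 10, 100))
print(timings)
```

Single-shot timings like these are noisy, so this sketch only illustrates the control flow, not a reliable measurement.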

Below we visualize the results from bench::press:

library(data.table)
(press_long <- melt(
  data.table(press_result),
  measure.vars=measure.vars,
  id.vars=c("N","description"),
  na.rm=TRUE))
#>        N description variable        value
#>        1      vector  seconds 0.000000e+00
#>        1      matrix  seconds 0.000000e+00
#>        1      Matrix  seconds 0.000000e+00
#>        3      vector  seconds 0.000000e+00
#>        3      matrix  seconds 0.000000e+00
#>        ⋮           ⋮        ⋮            ⋮
#>  3162277      Matrix   length 9.999996e+12
#>  5623413      vector   length 5.623413e+06
#>  5623413      Matrix   length 3.162277e+13
#> 10000000      vector   length 1.000000e+07
#> 10000000      Matrix   length 1.000000e+14

if(require(ggplot2)){
  gg <- ggplot()+
    ggtitle("bench::press results for comparison")+
    facet_grid(variable ~ ., labeller=label_both, scales="free")+
    geom_line(aes(
      N, value,
      color=description),
      data=press_long)+
    scale_x_log10(limits=c(NA, max(press_long$N*2)))+
    scale_y_log10("")
  if(requireNamespace("directlabels")){
    directlabels::direct.label(gg,"right.polygons")
  }else gg
}
#> log-10 transformation introduced infinite values.
#> log-10 transformation introduced infinite values.
#> log-10 transformation introduced infinite values.

We can see that the plots from atime and bench::press are consistent.

Complexity class estimation with atime

Below we estimate the best asymptotic complexity classes:

vec.mat.best <- atime::references_best(vec.mat.result)
plot(vec.mat.best)
#> log-10 transformation introduced infinite values.
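
One simple way to build intuition for what a complexity class means on a log-log plot is to fit a straight line to log10(value) versus log10(N): the slope approximates the polynomial exponent. The sketch below does this with base R's lm() on synthetic, noise-free lengths (N for a vector, N^2 for a matrix). This is a swapped-in illustration, not a description of how atime::references_best works, and estimate_exponent is a hypothetical helper name.

```r
# Synthetic data sizes (illustrative, not the N_seq used above).
N <- 10^seq(1, 4, by = 0.5)

# Hypothetical helper: slope of log10(y) versus log10(N),
# which approximates the exponent k when y grows like N^k.
estimate_exponent <- function(y, N) {
  fit <- lm(log10(y) ~ log10(N))
  unname(coef(fit)[2])
}

vector_exponent <- estimate_exponent(N, N)    # length of numeric(N) is N
matrix_exponent <- estimate_exponent(N^2, N)  # length of matrix(0, N, N) is N^2
print(c(vector = vector_exponent, matrix = matrix_exponent))
```

With real timings the slope is only meaningful for the larger N values, where constant overhead no longer dominates.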