fast cast to matrix using Rfast #4134
Closed

After waiting several hours for `as.matrix.data.table` to convert a 3000-row, 1.7-million-column data.table to a matrix, I had a look around to see if there were faster ways of converting a data.frame/data.table to a matrix. This led me to the `data.frame.to_matrix()` function in the `Rfast` package, which was able to perform the above task in under a minute.

I've set about incorporating this into the `as.matrix` function in the `data.table` package, but this has proved less straightforward than first anticipated. A few things I need to follow up on:

- `data.frame.to_matrix` returns an error when columns in the data.table are of different types. This is intentional: to enhance speed, the code does no type conversion or pre-checks. We would need to do the type conversion ourselves, but I'm not sure how best to detect the type that all columns should be converted to.
- `data.frame.to_matrix` itself is quite janky: it throws a few warnings when it shouldn't, due to some bad if-statement checks on their end (hence the somewhat odd choice I've currently made of setting the rownames to NULL post-conversion if there are no rownames).
- When supplied integer columns, `data.frame.to_matrix` converts these to numerics, causing many tests to fail. This may need to be fixed upstream, and I also need to check how this function works on other non-numeric types (e.g. character vectors, factors, dates, etc.) to make sure it returns the equivalent of `as.matrix`.

For now, I've simply added an `Rfast=FALSE` argument to `as.matrix`. If the above points can be solved we can simply remove this; otherwise we could potentially give the user the option of explicitly setting `Rfast=TRUE` to get a numeric matrix (this would be useful purely for the `rownames` functionality of `as.matrix.data.frame`).
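
On the open question of how to detect the type to convert to across all columns, one possible approach is sketched below in base R. It is only an illustration, not what this PR implements: `common_type()` and `coerce_columns()` are hypothetical helper names, and the sketch assumes the usual coercion order logical < integer < double < character. As a simplification, any classed column (factor, Date, etc.) is treated as character, and unsupported types such as list or complex columns raise an error so the caller could fall back to the existing `as.matrix` path.

```r
# Hypothetical helper: find the single type that every column can be
# coerced to, using the order logical < integer < double < character.
common_type <- function(df) {
  hierarchy <- c("logical", "integer", "double", "character")
  types <- vapply(df, function(col) {
    # Simplification: classed columns (factor, Date, ...) become character.
    if (is.object(col)) "character" else typeof(col)
  }, character(1))
  if (length(types) == 0L) return("logical")
  if (!all(types %in% hierarchy)) {
    stop("unsupported column type; fall back to the regular as.matrix path")
  }
  # The highest-ranked type present is the one all columns must share.
  hierarchy[max(match(types, hierarchy))]
}

# Hypothetical helper: coerce every column to the common type, so that a
# converter requiring homogeneous columns can be called afterwards.
coerce_columns <- function(df) {
  convert <- switch(common_type(df),
    logical   = as.logical,
    integer   = as.integer,
    double    = as.numeric,
    character = as.character
  )
  df[] <- lapply(df, convert)
  df
}

df <- data.frame(a = 1:3, b = c(1.5, 2.5, 3.5))
common_type(df)              # "double"
str(coerce_columns(df))      # both columns are now numeric
```

The pre-pass costs one scan over the column types plus a copy of each column that needs converting, so whether it preserves the speed advantage on very wide tables would need to be benchmarked.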