Closed
Description
Is your feature request related to a problem? Please describe.
It does not seem to be possible to run openxlsx::getSheetNames in parallel. Here's my code:
```r
getSheetNames <- function(excel_files, parallel = FALSE, BPPARAM = BiocParallel::bpparam()) {
  if (parallel) {
    # Read the sheet names of each file on a BiocParallel worker.
    sheet_list <- BiocParallel::bplapply(excel_files,
      FUN = function(excel_file) {
        if (!file.exists(excel_file)) {
          return(NULL)
        }
        openxlsx::getSheetNames(excel_file)
      },
      BPPARAM = BPPARAM
    )
  } else {
    # Sequential fallback: warn about missing files and return NULL for them.
    sheet_list <- lapply(excel_files,
      FUN = function(excel_file) {
        if (!file.exists(excel_file)) {
          warning("File '", excel_file, "' could not be found.")
          return(NULL)
        }
        openxlsx::getSheetNames(excel_file)
      }
    )
  }
  # Name each list element after its file path.
  names(sheet_list) <- excel_files
  sheet_list
}
```

Now, `getSheetNames(excel_files, parallel = TRUE)` sometimes throws one of the following errors:
```
Error: BiocParallel errors
element index: 1
first error: cannot open file '/tmp/Rtmpxttndw/_excelXMLRead/[Content_Types].xml': No such file or directory
```

or

```
Error: BiocParallel errors
element index: 1
first error: cannot open the connection
```

Describe the solution you'd like
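My guess is that the parallel tasks all unzip into, and then delete, the same temporary directory (`_excelXMLRead` inside the session's `tempdir()`, as the path in the first error shows), so one task can remove files that another task is still reading.
Using a per-call directory or file based on `tempfile()` might solve this.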
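A minimal base-R sketch of that idea, assuming the failures really come from the shared `_excelXMLRead` directory. `sheet_names_isolated()` and `parse_sheet_names()` are hypothetical helpers, not openxlsx functions: each call unzips only `xl/workbook.xml` into its own `tempfile()` directory and removes just that directory on exit. The regex parsing is a simplification (it assumes unprefixed `<sheet ...>` elements) and is not how openxlsx itself reads the workbook.

```r
# Hypothetical helpers (not part of openxlsx) sketching the tempfile() idea.

# Simplified parser: returns the name="..." attribute of every <sheet ...>
# element found in the text of xl/workbook.xml.
parse_sheet_names <- function(workbook_xml) {
  tags <- regmatches(workbook_xml,
                     gregexpr("<sheet\\s[^>]*>", workbook_xml, perl = TRUE))[[1]]
  nms <- sub('.*\\bname="([^"]*)".*', "\\1", tags, perl = TRUE)
  # Decode the predefined XML entities, handling &amp; last.
  entities <- c("&lt;" = "<", "&gt;" = ">", "&quot;" = "\"", "&apos;" = "'", "&amp;" = "&")
  for (ent in names(entities)) {
    nms <- gsub(ent, entities[[ent]], nms, fixed = TRUE)
  }
  nms
}

sheet_names_isolated <- function(file) {
  if (!file.exists(file)) stop("File '", file, "' does not exist.")
  # A fresh directory per call; R puts the process id into tempfile() names,
  # so forked workers that share one tempdir() still get distinct directories.
  xml_dir <- tempfile(pattern = "excelXMLRead_")
  dir.create(xml_dir)
  # Remove only this call's directory, whatever happens below.
  on.exit(unlink(xml_dir, recursive = TRUE), add = TRUE)
  # Listing the sheets only needs the workbook part of the .xlsx archive.
  wb_xml <- utils::unzip(file, files = "xl/workbook.xml", exdir = xml_dir)
  if (length(wb_xml) == 0) stop("No xl/workbook.xml found in '", file, "'.")
  txt <- paste(readLines(wb_xml, warn = FALSE, encoding = "UTF-8"), collapse = " ")
  parse_sheet_names(txt)
}
```

Because no temporary files are shared between calls, a helper like `sheet_names_isolated()` could in principle replace `openxlsx::getSheetNames` inside the `bplapply()` call without the workers interfering with each other.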
Describe alternatives you've considered
Parallelization does work with readxl::excel_sheets, but that is much slower than openxlsx::getSheetNames.
Another alternative is to not run it in parallel, but that becomes increasingly slow as the number of Excel files grows.
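A possible workaround with the current openxlsx, assuming the collision is caused by workers sharing one `tempdir()`: use workers that are separate R processes instead of forks. Forked workers (for example `BiocParallel::MulticoreParam`, usually the default on Linux and macOS) inherit the parent session's `tempdir()`, while PSOCK/SNOW workers each start their own R session with its own temporary directory. The sketch below uses base R's `parallel` package; `sheet_names_psock()` is a hypothetical name, and `BiocParallel::SnowParam()` passed as `BPPARAM` should give the same isolation with `bplapply()`.

```r
library(parallel)

# Hypothetical wrapper: runs openxlsx::getSheetNames in PSOCK workers. Each
# worker is a separate R process with its own tempdir(), so the workers do not
# share an _excelXMLRead directory.
sheet_names_psock <- function(excel_files, workers = 2) {
  cl <- makeCluster(workers)  # PSOCK cluster by default
  on.exit(stopCluster(cl), add = TRUE)
  sheet_list <- parLapply(cl, excel_files, function(excel_file) {
    if (!file.exists(excel_file)) return(NULL)
    openxlsx::getSheetNames(excel_file)
  })
  names(sheet_list) <- excel_files
  sheet_list
}

# Check that each PSOCK worker reports its own temporary directory.
cl <- makeCluster(2)
worker_tmp <- unlist(clusterEvalQ(cl, tempdir()))
stopCluster(cl)
print(worker_tmp)
```

Starting separate processes costs more than forking, so this mainly pays off when there are many files per worker.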