How do I perform parallel processing in R?
Monday 21 August 2023
We recommend you have administrative rights and internet access, as Windows users will need to install Rtools and grant R networking access in order to assign tasks to individual cores (parallel processing on Windows requires the use of ‘sockets’). Parallel processing can take time to set up, and unless you are doing tasks well suited to it (e.g. analysing multiple very large files), it is probably better (faster) to run your analyses as normal. Nevertheless, instructions for parallel processing are below:
- (Windows only): Install Rtools from https://cran.r-project.org/bin/windows/Rtools/
- Load the ‘parallel’ package (part of base R since version 2.14.0, i.e. no need to install it):
library(parallel)
- Install and load the ‘doParallel’ package.
- This requires additional dependencies, i.e. ‘foreach’ and ‘iterators’. If R is connected to the internet, these should be installed automatically; otherwise you will need to install them manually first (a sketch follows the code below).
install.packages('doParallel')
library(doParallel)
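If the dependencies do not install automatically (for example, ‘foreach’ or ‘iterators’ is reported as missing), a minimal sketch of installing them explicitly before loading ‘doParallel’:
#install the dependency packages explicitly (requires internet access or a local repository)
install.packages(c('foreach', 'iterators'))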
- Detect the number of cores:
clusters <- detectCores()
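As an aside, detectCores() counts logical cores (i.e. including hyper-threading) by default; if you would rather base your decision on physical cores, the same function takes a logical argument:
#logical cores (the default)
detectCores()
#physical cores only
detectCores(logical = FALSE)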
- Specify the number of cores you want to use.
- This will depend on how much of your computer’s processing ability you want to devote to the task.
- Though you CAN use all cores, it can be useful to keep one core free to avoid crashes and to leave yourself some processing capability to check whether the process is still running, etc. In the example below we keep one core free for this reason.
cl <- makeCluster(detectCores()-1)
registerDoParallel(cl)
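On Windows, makeCluster() defaults to a socket (“PSOCK”) cluster, which is the ‘sockets’ approach mentioned in the introduction. To confirm the backend has registered correctly, the ‘foreach’/‘doParallel’ helpers below provide a quick sanity check:
#check the registered backend and the number of workers
getDoParRegistered() #TRUE once registerDoParallel(cl) has run
getDoParWorkers() #should equal detectCores() - 1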
- Load variables and packages to each core.
- By default, variables and packages that are already loaded into your R environment will not exist or be recognised by the individual cores. You will therefore need to export variables and packages to each core using the functions below.
#create an example variable
x <- 99
#export the variable to the cluster
clusterExport(cl, "x")
#load a required R package on each core, e.g. tidyr. Note, if you do not have tidyr installed, you will first need to call install.packages('tidyr')
clusterEvalQ(cl, library(tidyr))
#Note that you can also export multiple variables by passing clusterExport a character vector of names, and load multiple libraries by wrapping the clusterEvalQ call in { }:
dummyArray <- c(42,36,21)
n <- length(dummyArray)
clusterExport(cl, c("dummyArray", "n"))
clusterEvalQ(cl, {
library(tidyr) #Should be installed (as above)
library(ggplot2) #if not installed, will require installing
})
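To check that the exports worked, you can ask the workers to evaluate a quick expression (a small sanity check using the same clusterEvalQ function; it is not required for the analysis):
#each core should now see the exported objects
clusterEvalQ(cl, exists("dummyArray")) #a list of TRUEs, one per core
clusterEvalQ(cl, x) #each core returns 99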
- You can now tell R to process the data. It will automatically assign the next task to the next free core.
#Example parallel processing - e.g. add x to each number in dummyArray. We combine each result as a new column using cbind. We could instead add them as new rows using rbind.
result <- foreach(a=1:n, .combine = cbind) %dopar% {
dummyArray[a]+x
}
result
#equivalent simple 'for' loop
result2 <- NULL
for(a in 1:n){
result2 <- cbind(result2, dummyArray[a]+x)
}
result2
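For comparison, the same result can also be produced with the ‘parallel’ package’s own parSapply(), which uses the cluster directly and relies on the variables exported earlier (shown purely as an alternative sketch; result3 is a new name introduced here):
#apply the calculation over 1:n across the cluster; dummyArray and x were exported above
result3 <- parSapply(cl, 1:n, function(a) dummyArray[a] + x)
result3 #141 135 120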
The above is an overly simplistic example. Using parallel processing on such a simple process is not efficient, as there would be a negligible difference in processing time (in fact, compiling the data from the different cores can lead to longer processing times when the actual analytical processing is not the bottleneck). Parallel processing should therefore be kept to CPU-intensive processing tasks (e.g. within machine learning).
Though demonstrating this with a machine-learning example is beyond the scope of this article, we can instead demonstrate the difference in timing when creating a very large matrix of 144 columns, with each column consisting of 1 million random points, with some quick calculations on each value thrown in for good measure:
#Sequentially create each column and combine
system.time(m <- foreach(i=1:144, .combine=cbind) %do% { (100/2.3)^3.17*matrix(rnorm(1000*1000)) } )
#my total elapsed time was 4.74
#create each column in parallel processing and combine
system.time(m <- foreach(i=1:144, .combine=cbind) %dopar% { (100/2.3)^3.17*matrix(rnorm(1000*1000))} )
#my total elapsed time was 2.75
The calculation was 1.72x faster via parallel processing.
Note that before running, I had intentionally set R to use all 12 of my laptop’s cores, and had somewhat optimised the process so that no core should really be idle at any point (hence the deliberate choice of 144 repetitions, a multiple of 12). To be fair, I also checked with 200 columns, which still performed 1.28x faster via parallel processing. Despite the faster speed, the above example is again not particularly well suited to parallel processing (especially given the time and difficulty of setting it up).
Data intensive machine learning tasks can be many multiples faster (approaching multiples of the number of cores of your CPU), particularly when analysing large datasets where other processors would otherwise go unused whilst the initial process is completed.
After finishing processing, you may choose to stop the cluster to return R back to its default (single-core) state:
stopCluster(cl)
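If your parallel code lives inside a function, a common pattern is to register the clean-up with on.exit() so the cluster is stopped even if an error occurs part-way through. A minimal sketch (runInParallel is a hypothetical wrapper introduced here for illustration):
#hypothetical wrapper: create the cluster, then guarantee it is stopped on exit
runInParallel <- function(nCores = detectCores() - 1) {
  cl <- makeCluster(nCores)
  registerDoParallel(cl)
  on.exit(stopCluster(cl)) #runs even if the loop below errors
  foreach(i = 1:nCores, .combine = c) %dopar% { i^2 }
}
runInParallel()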