How do I perform parallel processing in R?

Monday 21 August 2023
* We recommend you have administrative rights and internet access, as Windows users will need to install Rtools and grant R networking access in order to assign tasks to individual cores (parallel processing on Windows requires the use of ‘sockets’). Parallel processing can take time to set up, and unless you are doing tasks well suited to it (e.g. analysing multiple very large files), it is probably better (faster) to run your analyses as normal. Nevertheless, instructions for parallel processing are given below:

    1. (Windows only): Install Rtools from https://cran.r-project.org/bin/windows/Rtools/
    2. Load the ‘parallel’ package (included with base R since version 2.14.0, i.e. no need to install):
library(parallel)
    3. Install and load the ‘doParallel’ package.
        • This requires the additional dependencies ‘foreach’ and ‘iterators’. If R is connected to the internet, these should be installed automatically; otherwise you will need to install them manually.
install.packages('doParallel')
library(doParallel)
    4. Detect the number of cores:
clusters <- detectCores()
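Note that detectCores() usually counts logical CPUs (e.g. hyper-threads). If you only want physical cores, it accepts a ‘logical’ argument, although this distinction is not honoured on every platform; a quick sketch:

detectCores()                 #number of logical CPUs (the default)
detectCores(logical = FALSE)  #number of physical cores, where the platform reports it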
    5. Specify the number of cores you want to use.
        • This will depend on how much of your computer’s processing capacity you want to devote to the task.
        • Though you CAN use all cores, it is often worth keeping one core free: this reduces the risk of crashes and leaves you some capacity to check whether the process is still running. The example below keeps one core free for this reason.
cl <- makeCluster(detectCores()-1)
registerDoParallel(cl)
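As a quick sanity check, you can ask foreach which backend is registered and how many workers it has (a small sketch, assuming the cluster was registered as above):

getDoParName()     #name of the registered backend
getDoParWorkers()  #number of workers; should match detectCores()-1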
    6. Load variables and packages onto each core.
        • By default, variables and packages already loaded in your main R environment will not exist on, or be recognised by, the individual cores. You will therefore need to export variables and load packages on each core using the functions below.
#create an example variable
x <- 99

#export the variable to the cluster
clusterExport(cl, "x")

#load a required R package on each core, e.g. tidyr. Note, if you do not have tidyr installed, you will first need to call install.packages('tidyr')
clusterEvalQ(cl, library(tidyr))

#Note that you can also export multiple variables at once by passing a character vector of their names, and load multiple packages by wrapping the library() calls in { }:
dummyArray <- c(42,36,21)
n <- length(dummyArray)

clusterExport(cl, c("dummyArray", "n"))

clusterEvalQ(cl, {
  library(tidyr)   #Should be installed (as above)
  library(ggplot2) #if not installed, will require installing
})
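To check that the export worked, you can ask each core what it can now see; a small sketch of such a check:

#list the objects now present in each core's global environment
clusterEvalQ(cl, ls())

#evaluate the exported variable on each core - each element of the returned list should be 99
clusterEvalQ(cl, x)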
    7. You can now tell R to process the data. It will automatically assign the next task to the next free core.
#Example parallel processing - add x to each number in dummyArray. We combine each result as a new column using cbind (we could instead add them as new rows using rbind).
result <- foreach(a=1:n, .combine = cbind) %dopar% {
  dummyArray[a]+x
}
result

#equivalent simple 'for' loop
result2 <- NULL
for(a in 1:n){
  result2 <- cbind(result2, dummyArray[a]+x)
}
result2
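The same result can also be obtained without foreach by using the apply-style helpers in the ‘parallel’ package directly; the sketch below relies on the variables exported to the cluster earlier:

#parSapply splits the iterations across the cluster's cores and returns the combined results as a vector
result3 <- parSapply(cl, 1:n, function(a) dummyArray[a] + x)
result3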
The above is an overly simplistic example. Using parallel processing on such a simple task is not efficient, as there would be a negligible difference in processing time (in fact, compiling the results from different cores can lead to longer processing times when the analytical work itself is not the bottleneck). Parallel processing should therefore be kept for CPU-intensive tasks (e.g. within machine learning). Though demonstrating this with an ML example is beyond the scope of this article, we can instead demonstrate the difference in timing when creating a very large matrix of 144 columns, each consisting of 1 million random points, with some quick calculations on each value thrown in for good measure:
#Sequentially create each column and combine
system.time(m <- foreach(i=1:144, .combine=cbind) %do% { (100/2.3)^3.17*matrix(rnorm(1000*1000)) } )
#my total elapsed time was 4.74 seconds

#create each column in parallel processing and combine
system.time(m <- foreach(i=1:144, .combine=cbind) %dopar% { (100/2.3)^3.17*matrix(rnorm(1000*1000))} )
#my total elapsed time was 2.75 seconds
The calculation was 1.72x faster via parallel processing. Note that before running, I had intentionally set R to use all 12 of my laptop’s cores, and had somewhat optimised the process so that no core should be idle at any point (hence the deliberate choice of 144 repetitions). To be fair, I also checked with 200 columns, which still ran 1.28x faster via parallel processing. Despite the faster speed, this example is again not especially well suited to parallel processing (on top of the time and difficulty of setting it up). Data-intensive machine learning tasks can be many multiples faster (approaching a multiple equal to the number of cores of your CPU), particularly when analysing large datasets where other processors would otherwise sit unused while the initial process completes.

After finishing processing, you may choose to stop the cluster to return R to its default (single-core) state:
stopCluster(cl)
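If you intend to keep using foreach afterwards, you can also register the sequential backend so that later %dopar% calls fall back to ordinary, single-core execution without warnings; a one-line sketch:

#register the sequential backend for any subsequent %dopar% calls
registerDoSEQ()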

Disadvantages and caveats

Note that some processes are inherently sequential and cannot be run in parallel. As demonstrated above, there is no direct way to perform parallel processing on a single connected object or variable. Similarly, passing results from the parallel workers back to the main R environment is not very intuitive, and output from the print command inside parallel tasks is not shown by default. Parallel processing in R is therefore not for the faint of heart: you ultimately have to trust that your analysis is not behaving unexpectedly, since minor error messages may never be displayed.
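One partial workaround for the hidden output is the outfile argument of makeCluster, which redirects the workers’ output instead of discarding it; a sketch (exact behaviour varies between terminals and IDEs such as RStudio):

#outfile = "" means worker output is not redirected to a file, so print()/message() calls from the workers can appear in the master R session's console
cl <- makeCluster(detectCores() - 1, outfile = "")
registerDoParallel(cl)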

Run example code snippet

The example below is a stripped-down version of the final code above.
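A minimal sketch of such a stripped-down version, based on the code above, could be (assuming the ‘doParallel’ package and its dependencies are already installed):

library(doParallel)

#set up a cluster that leaves one core free, and register it with foreach
cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)

#build a 144-column matrix in parallel, one column per iteration
system.time(
  m <- foreach(i = 1:144, .combine = cbind) %dopar% {
    (100/2.3)^3.17 * matrix(rnorm(1000*1000))
  }
)

#return R to its default single-core state
stopCluster(cl)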