A series of events: that is how the asynchronous event loop in Node.js works. Thanks to it, we benefit from non-blocking I/O operations and can wait for many scheduled events at once. The event loop allows Node.js to handle many long-running operations in a single thread. That is enough for a simple backend server, but it works only if we stick to one rule: do not block the event loop. How can it be blocked? By heavy computations, infinite loops, synchronous I/O, or any other operation that keeps the engine busy. This is where another solution comes in: threads.
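To make the rule concrete, here is a minimal sketch of a blocked event loop. The helper name blockFor is made up for this illustration. A timer is scheduled for 10 ms, but it cannot fire until the synchronous busy loop gives the thread back:

```javascript
// Hypothetical helper: spins synchronously for roughly `ms` milliseconds
const blockFor = (ms) => {
  const start = Date.now();
  while (Date.now() - start < ms) {
    // Busy-wait: nothing else can run on this thread in the meantime
  }
  return Date.now() - start;
};

const scheduledAt = Date.now();
// We ask for the callback after 10 ms...
setTimeout(() => {
  // ...but it can only run once the event loop is free again
  console.log(`Timer fired after ${Date.now() - scheduledAt} ms`);
}, 10);

// This call blocks the event loop for about 200 ms
blockFor(200);
```

On a typical machine the reported delay should be close to 200 ms rather than 10 ms, because the callback has to wait for the busy loop to finish.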
Writing multi-threaded programs is considered to be difficult. We need to take care of shared memory, race conditions, and many other issues that can cause unpredictable errors. Often these errors cannot be easily detected or reproduced for debugging. Of course, programming languages provide features for preventing such situations: mutexes, atomic types and operations, ways of communicating between threads, and so on. But that makes these languages complicated.
Worker threads
In this post, I am writing about Node.js, which is a JavaScript environment, and JavaScript was always meant to be easy. So how do the authors of Node deal with threads? In 2018 they released Node v10.5.0 with an experimental feature, worker threads, which could be used without the experimental flag from v11.7.0, released in January 2019. Let’s see how it works!
As I mentioned before, Node usually runs in one thread. We have one process, one thread, one piece of executed code, one memory heap, one event loop, and one Node instance. When we use workers, we still have one process, but many threads. Each thread has its own code, memory heap, event loop, and JS engine instance.
We can share some memory between Worker threads using, for example, SharedArrayBuffer. But there is another way of communicating: threads can send messages using ports. When a Worker thread sends a message with data to its parent, the data is copied, and the copy is accessible to the parent. The same holds in the other direction.
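The difference between copying and sharing can be seen in a small sketch. For brevity, both ends of the port stay in a single thread here and the message is read synchronously with receiveMessageOnPort (available in recent Node versions); I am assuming the same copy and share semantics apply when the other end of the port lives in a Worker.

```javascript
const { MessageChannel, receiveMessageOnPort } = require('worker_threads');

const { port1, port2 } = new MessageChannel();

// 1. A plain object is copied when it is posted
const original = { values: [1, 2, 3] };
port1.postMessage(original);
const copy = receiveMessageOnPort(port2).message;
original.values[0] = 99;      // Changing the original...
console.log(copy.values[0]);  // ...does not affect the received copy

// 2. A SharedArrayBuffer is not copied: both sides see the same memory
const shared = new Int32Array(new SharedArrayBuffer(4));
port1.postMessage(shared);
const sharedView = receiveMessageOnPort(port2).message;
Atomics.store(shared, 0, 42);              // Write on the "sender" side
console.log(Atomics.load(sharedView, 0));  // Read on the "receiver" side

port1.close();
```

The first log shows the old value, while the second one shows the value written after the message was sent.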
Worker threads seem to be great. We avoid problems with race conditions and shared memory, they are easy to use (they just execute the code we provide, e.g. by passing the filename of their script), and overall they look useful. Unfortunately, they have some disadvantages. They are expensive: it takes some time to start a new Worker because it needs to run a new Node instance. Copying large amounts of data between threads is not efficient either. Of course, there are ways of handling these problems. One of them is called a thread pool. It is a design pattern that avoids starting new threads all the time by keeping some threads alive and waiting for tasks. I am going to show you a simple example of a solution along these lines.
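Before the bigger example, here is a minimal sketch of a single round trip to a Worker. The helper name squareInWorker is made up, and the Worker's code is passed as a string with the eval option only to keep the sketch in one file; normally we would pass a filename. The measured time includes starting the thread, which gives a rough feeling for the cost mentioned above.

```javascript
const { Worker } = require('worker_threads');

// Worker source passed as a string for brevity (the `eval: true` option);
// normally this would live in its own file and we would pass its filename
const workerSource = `
  const { parentPort } = require('worker_threads');
  // Square every number we receive and send the result back
  parentPort.on('message', (n) => parentPort.postMessage(n * n));
`;

// Hypothetical helper: starts a Worker, sends it one number, resolves with the reply
const squareInWorker = (n) => new Promise((resolve, reject) => {
  const startedAt = Date.now();
  const worker = new Worker(workerSource, { eval: true });
  worker.once('message', (result) => {
    // Includes the cost of starting the thread and the round trip
    console.log(`Reply received after ${Date.now() - startedAt} ms`);
    worker.terminate();
    resolve(result);
  });
  worker.once('error', reject);
  worker.postMessage(n);
});

squareInWorker(7).then((result) => console.log(result));
```

Starting a fresh Worker for every small task like this is exactly what a thread pool tries to avoid.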
Let’s draw!
I was wondering what would be a nice example of using threads, and I thought about plotting the Mandelbrot set. It is shown at the beginning of the post, and that particular plot was generated using the script I present below.
In our code, we will use two libraries: mathjs for dealing with complex numbers and canvas for drawing. You can install them with this command:
npm install mathjs canvas
To understand what we are going to obtain, and to measure the efficiency of Worker threads, we will first write the synchronous version of the program. Here is the function that calculates the value of every pixel in our plot.
const math = require('mathjs');

// Calculates values of pixels
const plotMandelbrot = (size, boundaries) => {
  // Canvas data is Uint8ClampedArray
  const imgData = new Uint8ClampedArray(size.w * size.h * 4);
  const samples = 4; // Number of samples per pixel
  const maxIter = 256; // Limit of iterations per sample
  const color = {r: 256, g: 64, b: 0}; // Color of our plot (values above 255 are clamped by the array)
  const setColor = (x, y, {r, g, b}) => {
    // Put a color in the array of pixels (4 bytes per pixel: RGBA)
    const index = 4 * (size.w * y + x);
    imgData[index + 0] = r;
    imgData[index + 1] = g;
    imgData[index + 2] = b;
    imgData[index + 3] = 255;
  };
  for (let y = 0; y < size.h; y++) {
    for (let x = 0; x < size.w; x++) {
      let sum = 0;
      for (let t = 0; t < samples; t++) {
        // We take a random sample within the pixel and calculate its point in space
        const rx = x + Math.random() - 0.5;
        const ry = y + Math.random() - 0.5;
        const c = math.complex(
          boundaries.x + boundaries.w * rx / size.w,
          boundaries.y + boundaries.h * ry / size.h
        );
        let z = 0;
        let n = 0;
        // The Mandelbrot set iteration z = z^2 + c
        while (n++ < maxIter && math.abs(z) < 2) {
          z = math.add(math.pow(z, 2), c);
        }
        sum += n;
      }
      // Average over the samples, scaled to roughly the range 0..1
      const avg = sum / samples / maxIter;
      // Put the pixel in the array
      setColor(x, y, {r: avg * color.r, g: avg * color.g, b: avg * color.b});
    }
  }
  // Returns array of pixels
  return imgData;
};
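The inner loop above relies on mathjs for complex arithmetic. To show what the iteration z = z^2 + c actually computes, here is a dependency-free sketch of the same escape-time idea, writing z as zx + i*zy. The function name escapeTime is made up, and it counts iterations slightly differently from the function above (it returns at most maxIter), so it is an illustration rather than a drop-in replacement.

```javascript
// Hypothetical helper: number of iterations before |z| reaches 2 (at most maxIter)
const escapeTime = (cx, cy, maxIter) => {
  let zx = 0;
  let zy = 0;
  let n = 0;
  // |z| < 2 is the same as zx^2 + zy^2 < 4, which avoids a square root
  while (n < maxIter && zx * zx + zy * zy < 4) {
    // (zx + i*zy)^2 + c = (zx^2 - zy^2 + cx) + i*(2*zx*zy + cy)
    const nextX = zx * zx - zy * zy + cx;
    zy = 2 * zx * zy + cy;
    zx = nextX;
    n++;
  }
  return n;
};

console.log(escapeTime(0, 0, 256)); // 256: the origin never escapes
console.log(escapeTime(2, 2, 256)); // 1: this point escapes after one step
```

Points that reach the iteration limit are treated as belonging to the set, and the others are colored by how quickly they escaped.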
Let’s see what the synchronous version looks like:
const fs = require('fs');
const { createCanvas } = require('canvas');

// Saves the canvas as a PNG file next to the script
const writeImage = (canvas, filename) => {
  const out = fs.createWriteStream(__dirname + '/' + filename);
  const stream = canvas.createPNGStream();
  stream.pipe(out);
  out.on('finish', () => console.log('The PNG file was created.'));
};

// Prints some info about arguments
if (process.argv.length !== 9) {
  console.log(`Usage: ${process.argv[1]} [IMAGE_WIDTH] [IMAGE_HEIGHT] [PLOT_CORNER_X] [PLOT_CORNER_Y] [PLOT_WIDTH] [PLOT_HEIGHT] [FILENAME]`);
  process.exit(0);
}

// Size of the generated image
const size = {
  w: +process.argv[2],
  h: +process.argv[3]
};

// The part of space we want to draw
const boundaries = {
  x: parseFloat(process.argv[4]),
  y: parseFloat(process.argv[5]),
  w: parseFloat(process.argv[6]),
  h: parseFloat(process.argv[7])
};

console.time("total");

// Our canvas
const canvas = createCanvas(size.w, size.h);
const ctx = canvas.getContext('2d');
const img = ctx.getImageData(0, 0, size.w, size.h);

// Here we do the plot
img.data.set(plotMandelbrot(size, boundaries));
ctx.putImageData(img, 0, 0);

console.timeEnd("total");
writeImage(canvas, process.argv[8]);
Our program takes some arguments from the terminal, so we need to run it like this:
node mandelbrot2.js 1000 1000 -0.65 -0.72 0.1875 0.1875 test35.png
If your Node version is older than 11.7.0, you need to add the flag --experimental-worker after node.
Here is a result of calling our script with the parameters above:
It took 317 seconds on my laptop to draw this picture. Of course, we can do it much faster, but the program I wrote makes it very detailed and computationally heavy to show you the difference between the single-threaded version and the version with Workers. So now let’s write it with them!
Drawing in parallel
When working with many threads, it is very important to design the program properly. The first idea for a multi-threaded drawing program could be to subdivide the picture into as many pieces as we have Workers, so that each Worker computes one part. But it is worth mentioning that the computations might not be equal for every part of the image. In our case, the brighter parts take much more time than the dark ones, so if we had two threads and divided the image horizontally into halves, the second thread would have much more work to do, and we would be waiting for it long after the first thread had finished.
The second option, which I consider better, is to start some threads at the beginning, subdivide the image into many pieces (e.g. strips with a height of 25 pixels), and then give them to threads one by one as soon as they finish their previous job. The work is spread evenly across the threads, and there is little waiting because they all finish their overall job at almost the same time.
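The difference between the two strategies can be checked with a small simulation. This is only a sketch: the function names and the task costs are made up, and a cost here is an abstract amount of time needed for one strip.

```javascript
// Total time when each worker gets one contiguous, equally sized block of tasks
const staticMakespan = (costs, workerCount) => {
  const perWorker = Math.ceil(costs.length / workerCount);
  let longest = 0;
  for (let i = 0; i < costs.length; i += perWorker) {
    const blockCost = costs.slice(i, i + perWorker).reduce((a, b) => a + b, 0);
    longest = Math.max(longest, blockCost);
  }
  return longest;
};

// Total time when the next task always goes to the worker that becomes free first
const dynamicMakespan = (costs, workerCount) => {
  const busyUntil = new Array(workerCount).fill(0);
  for (const cost of costs) {
    const next = busyUntil.indexOf(Math.min(...busyUntil));
    busyUntil[next] += cost;
  }
  return Math.max(...busyUntil);
};

// Made-up costs: four cheap (dark) strips followed by four expensive (bright) ones
const costs = [1, 1, 1, 1, 9, 9, 9, 9];
console.log(staticMakespan(costs, 2));  // 36
console.log(dynamicMakespan(costs, 2)); // 20
```

With the image split into halves, one worker ends up with all the expensive strips, while handing out strips one by one keeps both workers busy until the end.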
Let’s see how it could be implemented with Workers:
const { Worker, isMainThread, parentPort } = require('worker_threads');

// Checks which thread we are in: the main thread or a worker thread
if (isMainThread) {
  // canvas is loaded only in the main thread
  const { createCanvas } = require('canvas');

  if (process.argv.length !== 10) {
    console.log(`Usage: ${process.argv[1]} [IMAGE_WIDTH] [IMAGE_HEIGHT] [PLOT_CORNER_X] [PLOT_CORNER_Y] [PLOT_WIDTH] [PLOT_HEIGHT] [FILENAME] [THREADS]`);
    process.exit(0);
  }

  // Here would be reading parameters (size, boundaries) from argv

  console.time("total");
  const canvas = createCanvas(size.w, size.h);
  const ctx = canvas.getContext('2d');

  // Place for our workers
  const workers = new Set();
  const workersCount = +process.argv[9];

  // Counter of lines already scheduled for workers, and the height of one strip
  let scheduledLines = 0;
  const chunk = 25;

  // When the plot is done, we want to save it
  const finish = () => {
    writeImage(canvas, process.argv[8]);
    console.timeEnd("total");
  };

  // Gives a task to a worker
  const runTask = (worker) => {
    // Is there still something to do?
    if (scheduledLines < size.h) {
      // Sends data about the region that the worker will calculate
      // We divide the whole area into strips of 25 pixels
      worker.postMessage({
        size: {
          h: chunk,
          w: size.w
        },
        pos: scheduledLines,
        boundaries: {
          x: boundaries.x,
          y: boundaries.y + scheduledLines / size.h * boundaries.h,
          w: boundaries.w,
          h: chunk / size.h * boundaries.h
        }
      });
      scheduledLines += chunk;
    }
    else {
      // If there is nothing else to calculate, stop the worker
      worker.terminate();
      workers.delete(worker);
      // If this was the last worker, finish the plot
      if (workers.size === 0) {
        finish();
      }
    }
  };

  for (let i = 0; i < workersCount; i++) {
    // Creates a worker, which will execute the current file
    const worker = new Worker(__filename);
    workers.add(worker);

    // What to do when the worker has done its job
    worker.on('message', (data) => {
      // We put the calculated pixels onto the main canvas
      const img = ctx.getImageData(0, 0, size.w, chunk);
      img.data.set(data.imgData);
      ctx.putImageData(img, 0, data.pos);
      // We give it the next job
      runTask(worker);
    });

    // Worker starts calculating its first piece of the image
    runTask(worker);
  }
}
// Here starts the worker
else {
  // When it receives data from the main thread, it starts calculations
  parentPort.on('message', (data) => {
    const imgData = plotMandelbrot(data.size, data.boundaries);
    // Sends results to the main thread
    parentPort.postMessage({
      imgData,
      pos: data.pos
    });
  });
}
Note that I have added another parameter read from process.argv: the number of threads. The functions writeImage and plotMandelbrot, as well as the argument parsing, stay the same.
We have more code, but it was worth writing. Calling our script with the same parameters as above and 2 threads yields the same image in only 164 seconds, compared to 317 seconds for the synchronous version. It is almost 2 times faster! But if you think that adding more threads would always make the process faster, you would be mistaken. In my experiments, the times with 3 threads were very similar, and with more than 3 they got even worse. That is probably because of the issue I stated above: threads in Node are expensive. Managing a thread takes time even before counting the code it executes, and our computers do not usually have many cores.
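Since the number of cores matters here, it can help to check how many the machine reports before choosing the thread count. Below is a small sketch using the built-in os module. The helper name suggestWorkerCount and the idea of leaving one core for the main thread are my assumptions, not a rule, so it is worth measuring on your own machine.

```javascript
const os = require('os');

// Hypothetical helper: suggest a worker count, leaving one core for the main thread
const suggestWorkerCount = () => {
  const cores = os.cpus().length; // Logical cores reported by the OS
  return Math.max(1, cores - 1);
};

console.log(`Logical cores: ${os.cpus().length}`);
console.log(`Suggested workers: ${suggestWorkerCount()}`);
```

The result could be used as a default for the last command-line argument when none is given.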
There is also another issue, which took me over an hour to solve. Not every Node package (for example, from npm) works well with Workers. I had a problem with canvas. When using it within the Worker’s part of the code, I got a strange error:
Error: Module did not self-register
In the beginning, I did not know what was going on, but through trial-and-error debugging I found out that this module cannot be used within Workers. I did a little research, analyzed the source code of Node.js, browsed the Node docs, and finally found that there can be problems with modules written in C/C++, as they do not always work well with multiple contexts, such as Workers. The issue seems pretty complicated, but if you want to know more, you can read the docs about it.
More nice examples!
I really like the results of the program. Here are some results of playing with it:
I also ran into an interesting failure when writing the parallel version.
And finally, something that looks to me like a sun: