I spent a good amount of time over the last few weeks trying to understand the internals of the Gatsby build process. I encountered many interesting approaches there, like using a Redux store on the server side to communicate relevant information to different actors of the build process. I’m going to write a series of articles soon exploring this technique and others in depth.

One thing that gave me some trouble during debugging was Gatsby's reliance on Node’s child process functionality. It is used to run HTML builds in parallel for each static page. In languages like Java or C++, parallelization is commonly achieved using threads, but since JavaScript code runs on a single thread by default, spawning multiple child processes is a way for Node to parallelize work. You can read more about it here.

Node.js includes two main modules for running work in parallel: child_process, which spawns separate operating system processes, and the newer worker_threads, which runs threads inside the same process and lets them share memory, much like threads in other languages.
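To make the difference between the two modules concrete, here is a minimal sketch of my own (it is not taken from Gatsby or jest-worker). It asks a child process and a worker thread for their process id. The function names are hypothetical. The only Node APIs used are child_process.execFileSync and the Worker class from worker_threads with the eval option.

```javascript
const { execFileSync } = require('child_process');
const { Worker } = require('worker_threads');

// A child process is a separate OS process, so it reports its own pid.
function pidSeenByChildProcess() {
  const output = execFileSync(process.execPath, ['-e', 'console.log(process.pid)']);
  return Number(output.toString().trim());
}

// A worker thread lives inside the current process, so it reports the parent's pid.
function pidSeenByWorkerThread() {
  const source = `
    const { parentPort } = require('worker_threads');
    parentPort.postMessage(process.pid);
  `;
  return new Promise((resolve, reject) => {
    const worker = new Worker(source, { eval: true });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

async function comparePids() {
  return {
    parent: process.pid,
    childProcess: pidSeenByChildProcess(),
    workerThread: await pidSeenByWorkerThread(),
  };
}

comparePids().then(pids => console.log(pids));
```

The worker thread prints the same pid as the parent, while the child process prints a different one. That separate process is exactly what a debugger attached to the parent never sees.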

Under the hood, Gatsby uses the jest-worker package to run the build in parallel. It’s easy to discover if you inspect the implementation of the WorkerPool:

const Worker = require(`jest-worker`).default
const { cpuCoreCount } = require(`gatsby-core-utils`)

const create = () =>
  new Worker(require.resolve(`./child`), {
    numWorkers: cpuCoreCount(),
    forkOptions: {
      silent: false,
    },
  })

module.exports = {
  create,
}

The package by default uses Node’s child_process module, but can be switched to worker_threads by passing enableWorkerThreads: true when instantiating the worker.

This article should be useful for both front-end and back-end engineers. If you’re not particularly interested in Gatsby.js, skip the next section and go directly to the section on patching jest-worker to debug a child process in Node.

Finding a way to debug child processes in Gatsby.js

I think over the years of exploring sources of frontend frameworks and libraries I’ve learned quite a lot about debugging and reverse-engineering from sources. I’ve shared this information in the article Level Up Your Reverse Engineering Skills on the inDepth.dev platform.

To start debugging Gatsby I generated a project using Gatsby’s CLI with the gatsby new gatsby-site command. To run the application with Node’s debug inspector enabled I use the following command:

$ node --inspect-brk node_modules/gatsby/dist/bin/gatsby.js build

Then I find the process in chrome://inspect and click inspect:

When Chrome DevTools opens with the debugger paused on the first line, I use the standard controls to continue the execution:

One thing that I hadn’t encountered before was debugging spawned child processes. As mentioned in the beginning, Gatsby uses this technique to run the rendering part of the build stage.

As part of my exploration, I put a debugger statement inside the Header component like this:

I ran the process in debug mode, opened Chrome DevTools with the debugger paused on the first line and pressed “Resume” (F8). Surprisingly, the build finished without hitting the breakpoint inside the component. It wasn’t the first time a breakpoint of mine didn’t hit, so I assumed that somehow this code wasn’t being executed.

It took me a while to realize that the code inside the Header component was running in a separate child process and the Chrome debugger didn’t attach itself to these child processes.

This was very bad. If I can’t debug the script, I can’t understand the details. So I started googling and found a few issues on the web suggesting that Chrome doesn’t connect to child processes.

My first solution was to simply copy an entire render-html.js file with the renderHTML function that Gatsby runs in a spawned child process:

and trigger it manually with the Node debugger, passing the required parameters. But it wasn’t really convenient or productive because I constantly had to figure out which environment variables to pass to this process to avoid errors.

At about this time I published my article on debugging Webpack builds, where I explained how to find the JavaScript file in node_modules to use with the Node debugger. In a comment, somebody suggested using ndb as the debugger instead, so that you don’t have to figure out the required file location. For example, instead of this:

$ node --inspect-brk node_modules/webpack/bin/webpack.js

you could simply write:

$ ndb webpack

I started exploring this tool and found the following interesting feature:

which was exactly what I needed. So I ran the debugger with Gatsby like this to test it:

$ ndb gatsby build

ndb launched a debugging environment that looked like Chrome DevTools, only in dark mode:

This time the breakpoint in my component paused the execution:

And it even showed me all the spawned processes, which was very convenient:

However, the problem with ndb is that it’s a bit buggy. Encountering something like this is very common:

It was still a very useful tool nevertheless. But I needed something better.

Patching jest-worker to debug a child process in Node

As I was discussing my struggle with Victor, he suggested patching the code that forks a child process so that it passes the inspect-brk option to the child processes.

I decided to first explore the possibility in a very basic application that uses jest-worker. I usually try to separate technologies and work with each as a separate unit. This later helps me understand how they are combined and know which part of the system causes a problem.

I went to the jest-worker docs page and used the first example demonstrated there. I just needed to replace the ECMAScript module syntax with CommonJS in the examples:

const Worker = require('jest-worker').default;

async function main() {
    const worker = new Worker(require.resolve('./worker'), {numWorkers: 1});
    const result = await worker.hello('Alice'); // "Hello, Alice"
    console.log(result);
}

main();
parent.js

and

exports.hello = function hello(param) {
    debugger;
    return 'Hello, ' + param;
};
worker.js

I put a debugger statement inside the hello function exported by worker.js and ran the script in debug mode:

$ node --inspect-brk parent.js

Just as I was expecting, the debugger didn’t stop at the breakpoint because the hello function was running in a child process.

I needed to figure out how exactly to patch the code. My thinking was that when I run the current process in debug mode, I pass the --inspect-brk option to Node. So I assumed I basically needed to pass the same option to all child processes.

Running Node with --inspect-brk makes the inspector agent bind to the default host 127.0.0.1 and listen on the default port 9229. Since multiple inspectors can’t listen on the same port, I also needed to assign a different port to each child process. And that’s exactly what I did.

I navigated to node_modules\jest-worker\build\workers\ChildProcessWorker.js and added the following code:

Here’s a gist for you to copy if you want to try it yourself:

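// Drop the parent's inspect-brk flag so the child doesn't try to reuse the parent's port
const execArgv = process.execArgv.filter(value => !value.includes('inspect-brk'));
// Pick a random last digit from 1 to 9, giving a port in the range 7001-7009
const randomNumber = Math.floor(Math.random() * 9 + 1);
execArgv.push("--inspect-brk=:700" + randomNumber);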

Don't forget to actually pass the updated execArgv to the fork method, as shown in the picture above.

You can also see here that in the original implementation jest removes all inspect and debug flags:

The small patch I added simply passes the inspect-brk option, with a port whose last digit is randomly generated, to a child process through execArgv. I assumed that because I pass inspect-brk, Node would pause the child process on the first line, so it would wait until I attached a debugger and wouldn’t miss any execution logic.
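The same idea can be tried without jest-worker at all. The sketch below is my own illustration, not the code from ChildProcessWorker.js. The helper names buildChildExecArgv and forkWithInspector are hypothetical, and the only Node API it relies on is child_process.fork with its execArgv option. It also logs the port, so you know which one to add in Chrome.

```javascript
const { fork } = require('child_process');

// Hypothetical helper: copy the parent's Node flags, drop any inspector flag,
// and add an --inspect-brk flag that points at the given port.
function buildChildExecArgv(parentExecArgv, port) {
  const withoutInspect = parentExecArgv.filter(flag => !flag.startsWith('--inspect'));
  return [...withoutInspect, `--inspect-brk=${port}`];
}

// Hypothetical wrapper around fork(): the child pauses on its first line
// and waits for a debugger to attach on the given port.
function forkWithInspector(modulePath, port) {
  const execArgv = buildChildExecArgv(process.execArgv, port);
  console.log(`Child inspector expected on port ${port}`);
  return fork(modulePath, [], { execArgv });
}

// Not executed here, because the child would wait for a debugger:
// forkWithInspector(require.resolve('./worker'), 7001);

console.log(buildChildExecArgv(['--inspect-brk=:7000', '--max-old-space-size=4096'], 7001));
```

Keeping the flag handling in a pure function makes it easy to check what the child will receive before anything is forked.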

I decided to use the port 7000 for the parent process instead of the default one. So after patching the code I ran the following command:

$ node --inspect-brk=:7000 parent.js

Configuring Chrome

Once that was done, I needed to configure Chrome to attach itself to the range of ports I passed in the options, which in my case was 7001-7009. Here’s how to do it. Go to chrome://inspect and click on the Configure button:

Then just add the ports:

You’ll need to re-open the dialog after every addition to be able to add a new one. Alternatively, you can simply log the port number from the patch. Then you’ll only need to add one port to the Chrome debugger configuration.

Once I added the port 7000, I could see the parent process:

Once I attached the debugger to it and continued execution, the new child process appeared in the list just as I expected:

Clicking inspect brought up Chrome DevTools with the code paused inside the child_process module:

After resuming the execution, my breakpoint inside the hello function was hit:

I was expecting that and felt a rush of dopamine flowing into my brain. I always feel happy when I manage to find something interesting.

Patching `jest-worker` inside Gatsby.js

After successfully patching the standalone application, I could easily apply the same technique to patch jest-worker inside my Gatsby application.

So, to debug a child process in Gatsby, navigate to node_modules\jest-worker\build\workers\ChildProcessWorker.js inside the Gatsby project and add the same code I showed above, particularly:

Gatsby configures jest-worker to spawn as many child processes as the CPU cores it detects on the machine. A quick way to check the number of cores is the os module:

const os = require('os')
console.log(os.cpus().length)

For me the number of CPU cores is 8.

So when I went through the process and opened chrome://inspect I expected to see eight child processes spawned, but there were only four:

I figured this was because the last digit of the port was picked at random from the range 1 to 9, so several workers most likely ended up with the same port and only one of them could claim it. If you need to see all instances, expand the range of ports to reduce the likelihood of a collision. It might not be feasible in that case to add all the ports to Chrome, so an easy solution is to log the port number and add only the ports actually used by the Node inspector.
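To get a feel for how likely a collision is, here is a rough back-of-the-envelope sketch. It assumes each of the eight workers picks one of nine ports independently and uniformly, which is what the random patch does. The createPortAllocator helper is a hypothetical alternative of my own that hands out consecutive ports instead of random ones. It would only work if all workers are forked from the same parent process, which I assume is the case here.

```javascript
// Probability that `workers` random picks from `ports` equally likely ports
// are all different (the classic birthday problem).
function probabilityAllDistinct(workers, ports) {
  let probability = 1;
  for (let i = 0; i < workers; i++) {
    probability *= Math.max(ports - i, 0) / ports;
  }
  return probability;
}

// Hypothetical alternative to random ports: hand out consecutive ports,
// so two workers forked by the same parent never share one.
function createPortAllocator(basePort) {
  let next = basePort;
  return () => next++;
}

const allDistinct = probabilityAllDistinct(8, 9);
console.log(`Chance of no collision: ${(allDistinct * 100).toFixed(2)}%`); // 0.84%

const nextPort = createPortAllocator(7001);
console.log([nextPort(), nextPort(), nextPort()]); // [ 7001, 7002, 7003 ]
```

With eight workers and nine ports, the chance that every worker gets its own port is under 1%, so seeing fewer than eight processes in chrome://inspect is the expected outcome rather than bad luck.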

However, when debugging child processes in Gatsby, I recommend spawning just one child process to avoid the hassle of configuring Chrome to attach to a range of ports. To do that, go to node_modules\gatsby\dist\utils\worker\pool.js and change the following:

const create = () => new Worker(require.resolve(`./child`), {
  // numWorkers: cpuCoreCount(true),
  numWorkers: 1, // spawn a single child worker
  forkOptions: {
    silent: false
  }
});

Another interesting detail is that you need to click on the process that is spawned to execute the build for the page that your component is part of. Otherwise the breakpoint won’t be hit even if a child process is spawned. You might need to click through some of them before you find the right one. I clicked on the first two before I found the one that rendered the page with my component. That’s when I finally had my breakpoint hit and execution paused:

As a small hint, you can check the paths variable inside the renderHTML function in node_modules\gatsby\dist\utils\worker\render-html.js to find out which page is being rendered:

To pause on every page, simply put a breakpoint there. To stop only on a certain page, add a conditional statement like this:

Or a conditional breakpoint.
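As a rough sketch of such a condition, the snippet below assumes that paths is an array of page path strings. I haven't verified that shape against every Gatsby version. The shouldPause helper and the '/page-2/' path are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: decide whether the current batch of pages contains
// the page we care about. `paths` is assumed to be an array of page paths.
function shouldPause(paths, targetPath) {
  return Array.isArray(paths) && paths.includes(targetPath);
}

// Inside renderHTML this could be used as:
//   if (shouldPause(paths, '/page-2/')) { debugger; }
console.log(shouldPause(['/', '/page-2/', '/404/'], '/page-2/')); // true
console.log(shouldPause(['/', '/404/'], '/page-2/')); // false
```

The same expression, paths.includes('/page-2/'), can be pasted into the condition field of a conditional breakpoint in Chrome DevTools, which avoids editing the file at all.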

That’s it folks, happy debugging! Ask questions or share your thoughts in the respective topic on indepth.community.