Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loaders that use childProcess.fork lead to endless recursion of processes #47615

Open
Jamesernator opened this issue Apr 19, 2023 · 19 comments
Open
Labels
loaders Issues and PRs related to ES module loaders

Comments

@Jamesernator
Copy link

Jamesernator commented Apr 19, 2023

Version

v20.0.0

Platform

Linux 5.19.0-38-generic #39~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 17 21:16:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

loaders

What steps will reproduce the bug?

Use childProcess.fork within a loader, for example in the following loader:

// loader.mjs
import childProcess from "node:child_process";

const cp = childProcess.fork("./worker.mjs");
// worker.mjs
// Nothing needs to be in here for the behaviour to occur
// test.mjs
import readline from "node:readline";

function input(prompt = "") {
    return new Promise((resolve) => {
        const rl = readline.createInterface({
            input: process.stdin,
            output: process.stdout,
        });
        rl.question(prompt, (answer) => {
            resolve(answer);
            rl.close();
        });
    });
}

// This is included so we can control when the process ends
await input(`[Continue] `);

And run node --loader ./loader.mjs ./test.mjs.

How often does it reproduce? Is there a required condition?

This happens consistently.

What is the expected behavior? Why is that the expected behavior?

The short answer is I'm not sure what should be done here. But it seems problematic that libraries (such as typescript) that uses childProcess.fork cannot be used within a loader without endless recursion of processes.

Perhaps the solution is simply not to carry over loaders onto child processes that are forked within loaders.

What do you see instead?

Loaders get continually created spamming the console with:

Loaded loader
(node:414589) ExperimentalWarning: Custom ESM Loaders is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
Loaded loader
(node:414589) ExperimentalWarning: Custom ESM Loaders is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
Loaded loader
(node:414589) ExperimentalWarning: Custom ESM Loaders is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)

while also rapidly growing memory usage from the endless chain of processes.

Additional information

I noticed this because ts-node rapidly memory leaks in Node 20. I experimented it a bit and found that TypeScript is calling childProcess.fork which causes this error. i.e. A minimal loader that exhibits this behaviour would just be:

// loader.mjs
import ts from "typescript";

// Haven't even written any hooks yet

Using this loader with node --loader ./loader.mjs ./test.mjs will rapidly consume all memory on the machine until the OS kills the process (if not terminated sooner).

@Jamesernator
Copy link
Author

Jamesernator commented Apr 19, 2023

Perhaps the solution is simply not to carry over loaders onto child processes that are forked within loaders.

Oh also, if such a solution were to be used, for chained loaders this only needs to be the last loader in the current chain that is removed. i.e. If we have node --loader ./loader1.mjs --loader ./loader2.mjs ./some-file.mjs then if loader2.mjs uses childProcess.fork, the forked process can have --loader ./loader1.mjs just fine (as long as this continues inductively).

@bnoordhuis
Copy link
Member

This seems to me firmly a case of "just don't do that."

@Jamesernator
Copy link
Author

This seems to me firmly a case of "just don't do that."

Third party libraries, like typescript, are out of my control. This change has actively broken the ability to import typescript within a loader at all (because TS uses child processes to split up work).

@bnoordhuis
Copy link
Member

Then you should work with the typescript people to resolve that. I don't think it's node's job to protect against fork bombs.

@JALabba
Copy link

JALabba commented Apr 20, 2023

Running your test.mjs through esbuild-kit/tsx (run ts files, similar to ts-node ) straight up without forking a child process is crashing.
doing npx tsx test.mjs and waiting will crash with a warning about custom ESM loaders. Similar to this issue.

I tried adding process.on('warning', e => console.warn(e.stack)); to the program, and this (small part of the whole) error ->
FATAL ERROR: MarkCompactCollector: young object promotion failed Allocation failed - JavaScript heap out of memory
is created.

This is like the error I came to the issues to find. This is only v20.0.0. I was using a similar readline prompt, and creating a listener on stdin. I thought that creating the listener was going infinite and creating the memory leak, but my problem might be the same as this as tsx uses esbuild-kit/esm-loader to transform typescript to ESM.

Given that I can do node test.mjs without crashing, it points to loaders.

@Jamesernator
Copy link
Author

Jamesernator commented Apr 20, 2023

I don't think it's node's job to protect against fork bombs.

Well in this case I don't think the behaviour even makes sense, the threaded implementation of loaders still has execArgv set within the worker to include the loader:

// loader.mjs

// ["--loader", "./loader.mjs"]
console.log(process.execArgv);

This means forked processes receive this by default in their execArgv.

But this doesn't make sense, the loader thread isn't loaded using loader.mjs (otherwise any loader would be an immediate fork bomb), it's loaded using the next loader in the chain.

@bnoordhuis
Copy link
Member

That's changing the subject / moving the goal posts, isn't it? Your original bug report is about child_process.fork() and I'm of the opinion it isn't a bug.

@Jamesernator
Copy link
Author

That's changing the subject / moving the goal posts, isn't it?

Well the problem comes from the fact that childProcess.fork defaults to using process.execArgv.

Ultimately I don't really care about what solutions are applied, but the fact that existing libraries are broken within a loader compared to previously is to me the problem here.

The reason I think changing process.execArgv is a particularly good solution though is because from the perspective of the loader thread having ["--loader", "./loader.mjs"] is simply wrong, the loader thread does not have such a loader applied. From it's point of view no loaders are applied yet process.execArgv contains one.

@targos
Copy link
Member

targos commented Apr 20, 2023

@nodejs/loaders

@targos targos added the loaders Issues and PRs related to ES module loaders label Apr 20, 2023
@jacobq
Copy link
Contributor

jacobq commented Apr 20, 2023

Could this be a duplicate of #47566?

@JakobJingleheimer
Copy link
Contributor

Could this be a duplicate of #47566?

That's what I was thinking. Let's land the fix for #47566 and see.

@aduh95
Copy link
Contributor

aduh95 commented Apr 20, 2023

I can confirm it's not a duplicate of #47566, #47615 (comment) seems to be correct. I agree that not supporting this use case is tempting, but not very manageable indeed.

FWIW, as a workaround, you might be interested in sucrase as an alternative for transpile TS to JS that doesn't use forks.

@arcanis
Copy link
Contributor

arcanis commented Apr 20, 2023

I imagine that the problem would also happen if the loaders are set by NODE_OPTIONS, not necessarily the CLI itself?

@GeoffreyBooth
Copy link
Member

So the obvious solution seems to be to eliminate loader info from process.execArgv (and likewise NODE_OPTIONS) passed into loaders, but surely that has side effects, I would assume? Like loaders would have no way of discovering what other loaders might have been registered? Which maybe isn’t such a big deal, as we could provide a specific API for that if there’s a use case for such.

@koshic
Copy link

koshic commented Apr 20, 2023

@GeoffreyBooth there is no problem to communicate with another loader in complex setup - nextResolve / nextLoad can be used as bi-directional channel (as an example, esmock client detects loader via this way, not by argv / NODE_OPTIONS parsing).

@Jamesernator
Copy link
Author

Jamesernator commented Apr 20, 2023

Like loaders would have no way of discovering what other loaders might have been registered?

Just to be clear the suggestion I am making here is that the active loader is removed from the list.

So if we have the command node --loader ./loader1.mjs --loader ./loader2.mjs file.mjs then the process.execArgv for each would be as follows:

  • file.mjs -> ["--loader", "./loader1.mjs", "--loader", "./loader2.mjs"]
  • loader2.mjs -> ["--loader", "./loader1.mjs"]
  • loader1.mjs -> []

Essentially from the perspective of each loader thread, their process.execArgv is set such that they know the next loaders in the chain.

Put another way, from the perspective of each thread here the list of loaders is simply the loaders the apply to the current thread.

@aduh95
Copy link
Contributor

aduh95 commented Apr 21, 2023

Just to be clear the suggestion I am making here is that the active loader is removed from the list.

I think that assumes that the fork call would be made at the top-level of the loader, but that's not necessarily the case, and there are no real way of detecting which loader made the fork call AFAIK.

@Jamesernator
Copy link
Author

I think that assumes that the fork call would be made at the top-level of the loader, but that's not necessarily the case, and there are no real way of detecting which loader made the fork call AFAIK.

Are all loaders in one thread? How does this work if loader2.mjs needs resolution from loader1.mjs and so forth?

@aduh95
Copy link
Contributor

aduh95 commented Apr 21, 2023

Loaders are loaded serially in the loader thread, so when loader2.mjs is loading, loader1.mjs is already loaded and will intercept all the resolve and load requests. Note that if loader2.mjs calls import() outside of its top-level, it will go through its own resolve and load hooks, but that's not as bad as a fork bomb.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
loaders Issues and PRs related to ES module loaders
Projects
None yet
Development

No branches or pull requests

10 participants