Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: add releaseLoadedModule #50618

Closed
wants to merge 5 commits into from
Closed

module: add releaseLoadedModule #50618

wants to merge 5 commits into from

Conversation

laverdet
Copy link
Contributor

@laverdet laverdet commented Nov 7, 2023

This is a new utility as discussed in #49442 which provides the ability to release a loaded module. Releasing a module means that it can be garbage collected (if it is abandoned) and reloaded (by importing it again with the same specifier).

This is a new utility as discussed in nodejs#49442 which provides the ability
to release a loaded module. Releasing a module means that it can be
garbage collected (if it is abandoned) and reloaded (by importing it
again with the same specifier).
@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/loaders

@nodejs-github-bot nodejs-github-bot added esm Issues and PRs related to the ECMAScript Modules implementation. module Issues and PRs related to the module subsystem. needs-ci PRs that need a full CI run. labels Nov 7, 2023
@laverdet
Copy link
Contributor Author

laverdet commented Nov 7, 2023

@GeoffreyBooth as discussed. This also clears the resolve cache, and therefore the parameters must be (specifier, parent) instead of a resolved URL. The name releaseLoadedModule isn't entirely accurate since there's a chance it could only clear a resolve cache (i.e. if you invoke the function twice in a row with different specifier+parent pairs which resolve to the same module URL).

@aduh95
Copy link
Contributor

aduh95 commented Nov 8, 2023

The name releaseLoadedModule isn't entirely accurate since there's a chance it could only clear a resolve cache (i.e. if you invoke the function twice in a row with different specifier+parent pairs which resolve to the same module URL).

Or if you call import.meta.resolve with that specifier.

Copy link
Member

@GeoffreyBooth GeoffreyBooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs documentation.

/**
* Evict saved result of `resolve` and `load` for the given import parameters.
*/
evict(specifier, parentURL, importAttributes = {}) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
evict(specifier, parentURL, importAttributes = {}) {
evict(specifier, parentURL, importAttributes = kEmptyObject) {

Comment on lines 236 to 242
const requestKey = this.#resolveCache.serializeKey(specifier, importAttributes);
let didEvict = this.#resolveCache.delete(requestKey, parentURL);
if (this.loadCache.delete(resolved.url, importAttributes.type)) {
// nb: Careful with short-circuits here, we want to run both deletes unconditionally.
didEvict = true;
}
return didEvict;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we perhaps evict from the load cache first? I know these calls are sync, but the load hook is not; if evict is called while a module is being loaded (so after it’s resolved but before load has finished and it’s been added to the load cache) then a module could end up deleted from the resolve cache but not from the load cache. Whereas if you delete from the load cache first, and only on success of that do you continue on to delete from the resolve cache, we avoid leaving the caches in a “broken” state where the resolution stays cached by the load got uncached.

This raises a broader question: do we need to delete from the resolve cache? What are the use cases for this API, and could they be fulfilled by deleting only from the load cache? As in, when would we need to redo the resolution? Perhaps after new module customization hooks were registered, that would affect resolution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleting the resolution is nice because otherwise updates to package.json cannot be hot loaded and would force an app restart. But deleting the resolve cache in the manner in my diff is definitely a half measure. I think a more reliable feature would need to record all the resolutions to a given loaded module, so that when the loaded module is evicted all resolutions to it are deleted as well. The abstract specification method HostLoadImportedModule is decomposed into our resolve and load stages-- each with their own cache. To actually clear that abstract method requires more machinery which I'm not sure is worth it.

lib/module.js Outdated
const { SourceMap } = require('internal/source_map/source_map');

Module.findSourceMap = findSourceMap;
Module.register = register;
Module.releaseLoadedModule = releaseLoadedModule;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cannot be available to the main thread as its effects are forbidden by ECMAScript spec. The best we can do is to have it available only in the loader thread.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain? Is this just a way of saying that hooks need to be registered before users are allowed to use this? Why?

How would this work from the other thread since the caches are on the main thread? Cross thread communication?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, the relevant specification is https://tc39.es/ecma262/#sec-HostLoadImportedModule:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we're aware that this API explicitly violates spec if used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that the spec gives some guarantees that await import("a") will always return the same value from the same module (as long as the module resolution has succeeded at least once)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance, as an alternative, we could choose an approach where an explicitly opt-in command line flag is required to enable this kind of non-standard ESM behaviors.

node --esm-module-ext

This makes it absolutely clear that the individual running the node.js process is deciding on the non-standard behavior as opposed to some arbitrary library they are depending on.

Copy link
Member

@GeoffreyBooth GeoffreyBooth Nov 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it absolutely clear that the individual running the node.js process is deciding on the non-standard behavior as opposed to some arbitrary library they are depending on.

I don’t see the need for this. You might as well call the flag --allow-me-to-violate-spec. Why should a user need to opt into this via flag when they’re already opting into it by using the API, either explicitly or via a dependency that they knowingly added? If I were the author of Vite, I would be annoyed; I would need to choose between the following:

  1. Leak memory.
  2. Tell users to always run Vite with --allow-me-to-violate-spec with an explanation that doing so prevents a memory leak. (Sounds like a bad spec, doesn’t it?)
  3. Run Node in a child process with this flag set, so that it’s hidden from the user.

The third option seems like the obvious choice, and now we’re back to the user not having explicitly opted into this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps that decision should be revisited, then - otherwise an onslaught of security vulnerabilities is likely to be forthcoming where random packages use the loader api, and this one, to do literally anything they want to an application.

The built-in node:* module namespace is chocked full of security vectors and npm is already a huge hazard. Tinkering with module resolution is hardly a major concern given that node:fs and node:child_process exist. The real risk is ecosystem fragmentation, like what we have with CommonJS and ESM. This is comparable to delete require.cache[key] under CommonJS which wasn't even widely used in that ecosystem.

HMR is already possible; that’s what Vite is doing

Vite's HMR implementation is a client (browser)-side feature. For SSR you have to use await vite.ssrLoadModule('/main.js') which operates outside of nodejs's module graph as I understand it. I might be wrong here because I personally don't use Vite.

dynohot implements import.meta.hot on the server-side as a nodejs loader. To my knowledge it's the only implementation that truly operates under nodejs's module graph. I implemented the es262 linking algorithm to specification and use some code transformations to tie it all together. I'd love to see such a feature in nodejs but without support from v8 you need a code transformation, otherwise there's no way to update the linking of an already resolved and imported name. It seems like code transpilation is something that is outside the scope of node's runtime (and thus we have the loader API).

Copy link
Member

@jasnell jasnell Nov 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should a user need to opt into this via flag when they’re already opting into it by using the API, either explicitly or via a dependency that they knowingly added?

Based on experience, I would wager very few users know what most of their dependencies are doing.

More importantly, I'm not saying no to this, I'm saying I'd like us to make sure we're we're being careful to not introduce new headaches we might regret later. I guess what I'm missing is a higher level description of the use cases, how these various bits fit in with that, and a example that ties it all together. Has such a thing been written up?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#49442 has pieces of that but yes, a better writeup would be good. One of my notes on this PR is that it needs documentation, so that seems like a good place to start. @laverdet Could you please make the example written for the docs one that shows the most common use case?

@laverdet
Copy link
Contributor Author

The more I think on it, I'm feeling that it's tough to justify the feature.

  • For the "import -> fail -> retry" workflow there is the cache-busting technique of adding a query parameter to an async import. This will have more predictable behavior for most users.
  • For the HMR case we can also cache bust but there is a memory leak. That upsets me a little bit but for all practical purposes I've worked around it with the code transformation.
  • For unit test frameworks they should probably be using SourceTextModule of node:vm.

Anyway I pushed a change for consideration. By walking the resolution graph we can ensure that all resolved references to the file are evicted, though it may be more expensive on larger applications. It's probably not that bad though since it shouldn't be a common case. There may also still be a race condition after a module is resolved but before it's been loaded. There is also another inconsistency since we can't clear the package.json cache, so updates to exports won't be seen.

@pygy
Copy link

pygy commented Jan 13, 2024

I'd like to use this to test a library that branches at load-time based on NODE_ENV, without restarting Node.

A CLI flag (e.g. --unsafe-module-extensions) seems good here in that it gives the power to deviate from the spec to the person that launches node, preventing dependencies from breaking invariants unexpectedly.

Edit: This could be further limited by putting the API on an object that can only be instantiated once, ensuring that deps can't use it even if the flag is set, provided the user instantiates the unloader before loading the deps beside its control.

@giltayar
Copy link
Contributor

giltayar commented Jan 13, 2024

Idea: what if this api is available to loaders only, and alao the loaders guarantee that no one will ever load this url again ( the url post resolve). In this case, we're still somewhat following the spec. We can even have nodejs check this by adding a tombstone to the released module in the cache.

I would hazard a guess that mocking loaders and hmr loaders need to anyway change the module url (by adding some query parameter) to enable the functionality, so it would work in most cases.

@GeoffreyBooth
Copy link
Member

Idea: what if this api is available to loaders only, and alao the loaders guarantee that no one will ever load this url again ( the url post resolve).

If it’s available to hooks (on the hooks thread) then it’s available everywhere, because the register communications port can make it available to be indirectly called by application code. And I don’t know how we make loaders “guarantee” anything; we rename the API to be releaseLoadedModuleAndIfYouEverReloadItAgainYouWillBeFired?

That said, I’ve never been concerned about users being able to violate spec if they choose to do so. Hooks can already enable spec violations, and native add-ons certainly can. As long as Node never violates spec by default, I think it’s fine to provide APIs for users to do so if they wish. I don’t know what “adding a tombstone” means, but if there’s some way to make the API less likely to enable spec violations, sure, we might as well do so.

@laverdet
Copy link
Contributor Author

Creating a tombstone doesn't really fix anything because you are still leaking memory, though much less. In an HMR case you'll be reloading modules over and over again and the list of tombstones will grow forever.

@pygy can you better explain what you're trying to do? The only use-case for this feature I can think of is for hot module reloading. It sounds like your use case is better served with a query parameter but I can't say for sure without knowing more.

@pygy
Copy link

pygy commented Jan 15, 2024

@laverdet I want to evict a file and its dependencies. AFAIK using query params only reload the target file, not its deps.

@laverdet
Copy link
Contributor Author

This PR also only evicts a single module. Unless you want to parse the file to acquire all its imports and evict those too. You're only increasing the surface area of spooky double-imports that way.

OR you could write a 4 line resolve loader that appends reload=${...} to the resolved string of all imports from modules which also have reload=${...}.

@pygy
Copy link

pygy commented Jan 16, 2024

I'm currently evicting all dependencies manually (using the POC you posted in the sibling issue). It is indeed brittle, but the module is small enough for this to be manageable.

I'm not familiar with the loader API... What you suggest sounds appealing, I'll look into it, thanks.

@laverdet
Copy link
Contributor Author

marcel@marcel [16:03:06] [~/tmp] 
-> % cat loader.mjs 
import { register } from "node:module";

if (!import.meta.url.includes("?loader")) {
	register(`${import.meta.url}?loader`);
}

export async function resolve(specifier, context, nextResolve) {
	const result = await nextResolve(specifier, context);
	if (context.parentURL) {
		const parentUrl = new URL(context.parentURL);
		const reload = parentUrl.searchParams.get("reload");
		if (reload != null) {
			const url = new URL(result.url);
			url.searchParams.set("reload", reload);
			return {
				...result,
				url: `${url}`,
			};
		}
	}
	return result;
}

marcel@marcel [16:03:11] [~/tmp] 
-> % cat main.mjs 
import {} from "./dep.mjs";

marcel@marcel [16:03:13] [~/tmp] 
-> % cat dep.mjs 
console.log("in dep", import.meta.url);

marcel@marcel [16:03:18] [~/tmp] 
-> % cat start.mjs 
await import("./main.mjs");
await import("./main.mjs?reload=1");

marcel@marcel [16:03:20] [~/tmp] 
-> % node --import ./loader.mjs start.mjs
in dep file:///Users/marcel/tmp/dep.mjs
in dep file:///Users/marcel/tmp/dep.mjs?reload=1

I think what we're seeing in #49442 is an XY Problem. The question users are asking is "how do I delete module cache entries" but the question that they should be asking is "how do I hot reload" or "how do I write unit tests" or "how do I retry a failed module".

Actually I think this PR will end up harming the ecosystem because people don't understand the utility and will inevitably misuse it. The only valid use of this utility is to relieve a memory issue in hot reloaders. Furthermore I think the memory issue can be resolved within dynohot with nodejs/loaders#116

@pygy
Copy link

pygy commented Jan 16, 2024

Sweet, thanks!

With this tweaks it also supports import("./path?reload") and adds an auto-incremented value:

let id = 0;
export async function resolve(specifier, context, nextResolve) {
	const result = await nextResolve(specifier, context);
	if (context.parentURL) {
		const url = new URL(result.url);
		const parentUrl = new URL(context.parentURL);
		const reload = url.searchParams.get("reload") === ""
			? String(id++)
			: parentUrl.searchParams.get("reload");
		if (reload !== null) {
			url.searchParams.set("reload", reload);
			return {
				...result,
				url: `${url}`,
			};
		}
	}
	return result;
}
// index.js
const y = await import("./foo.js?reload")
const z = await import("./foo.js?reload")
const w = await import("./foo.js?reload=34")
// foo.js
export * from './dep'
// dep.js
console.log(import.meta.url)

Output:

file:///Users/pygy/tmp/dep.js?reload=0
file:///Users/pygy/tmp/dep.js?reload=1
file:///Users/pygy/tmp/dep.js?reload=34

This looks sufficiently useful to exist as an npm module, if you don't mind I'll publish it tomorrow (tagging you as the author).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. module Issues and PRs related to the module subsystem. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants