Skip to content
This repository has been archived by the owner on Jan 13, 2024. It is now read-only.

fix(boootstrap): prevent to override existing node addon file #1611

Merged
merged 4 commits into from
May 9, 2022

Conversation

renkei
Copy link
Contributor

@renkei renkei commented May 1, 2022

This PR fixes #1589. The fix has two effects:

  1. .node files that must exist on the real filesystem instead of the virtual snapshot filesystem (because of process.dlopen) are only copied if they do not exist. If they already exist then it doesn't matter anymore whether they are currently opened by an running instance of the pkg-packaged app or not, because existing files are untouched now.
  2. The start of the pkg app a second, third, etc. time is as fast as with pkg 5.5.2 again since (in best case) no copy action takes place

prelude/bootstrap.js Outdated Show resolved Hide resolved
@robertsLando robertsLando changed the title fix https:/vercel/pkg/issues/1589 fix(boootstrap): prevent to override existing node addon file May 2, 2022
@robertsLando robertsLando requested a review from jesec May 2, 2022 15:23
@jesec
Copy link
Contributor

jesec commented May 2, 2022

Hmm. In some cases, we may want to overwrite the file:

  1. File extracted becomes corrupted, caused by forced termination of the executable, a power loss or other reasons.
  2. File extracted becomes outdated.

I think we should implement an isSameFile check so we overwrite it if necessary, or just try catch do nothing the statement so we overwrite the file if possible.

@renkei
Copy link
Contributor Author

renkei commented May 2, 2022

  1. File extracted becomes corrupted, caused by forced termination of the executable, a power loss or other reasons.

Agree, good point.

  1. File extracted becomes outdated.

Hmm, wouldn't an updated pkg-packaged app with a newer .node file result in a new hash folder in tmp? Anyway, with the isSameFile check this should be covered as well.

For security reasons, I would prefer to fail / stop the app start if the isSameFile check results in false and the copy action fails for some reason (not sufficient acces rights, file already opened by another process, etc.). In other words: Once we have detected that the .node file is not the expected one and we are not able to overwrite it we should not trust the existing one.

What do you think? I can update the PR, just let me know if e.g. a SHA-256 check is sufficient for you for the isSameFile check or if you prefer another way to implement such a method.

A disadvantage could be the slower app start in general, because the previous best case (all files already exist, so no copy action takes place) is the worst case now, we never trust an existing file anymore because each file will be checked with isSameFile now.

@luckyyyyy
Copy link

My application will start multiple workers, they will read the .node file at the same time, and if it does not exist, will they write to disk at the same time?
A lot of questions will happen if we need to consider this scenario.

@robertsLando
Copy link
Contributor

I think we should implement an isSameFile check so we overwrite it if necessary, or just try catch do nothing the statement so we overwrite the file if possible.

We already calculate the sha of the file, isn't that enough?

@jesec
Copy link
Contributor

jesec commented May 5, 2022

I think we should implement an isSameFile check so we overwrite it if necessary, or just try catch do nothing the statement so we overwrite the file if possible.

We already calculate the sha of the file, isn't that enough?

We calculated the hash of the file to be dlopen-ed, but not the hashes of all contents under the modulePkgFolder directory.

Additionally, the hash only determines the folder name in the temporary directory, not if the file on the disk is the same as the file in the snapshot.

@robertsLando
Copy link
Contributor

I think we should implement an isSameFile check so we overwrite it if necessary, or just try catch do nothing the statement so we overwrite the file if possible.

Ok so @renkei I think you could try with the second approach for now, it's easier

@renkei
Copy link
Contributor Author

renkei commented May 5, 2022

Which method you would prefer for a isSameFile check? A SHA-256 check? Will this slow down the app start or is this irrelevant regarding start time if we will do this on each start for each file?

Copy link

@AlejandroSotoCastro AlejandroSotoCastro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixed a mayor bug for us

Copy link

@alaminopu alaminopu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works as expected and fixed the bug in current version

@robertsLando
Copy link
Contributor

robertsLando commented May 5, 2022

Which method you would prefer for a isSameFile check? A SHA-256 check? Will this slow down the app start or is this irrelevant regarding start time if we will do this on each start for each file?

I let @jesec answer to this

@jesec
Copy link
Contributor

jesec commented May 5, 2022

Which method you would prefer for a isSameFile check? A SHA-256 check? Will this slow down the app start or is this irrelevant regarding start time if we will do this on each start for each file?

Correctness is more essential than performance. Let's ensure correctness first, then try to optimize it.

Actually, with either method (isSameFile or try catch), there is a risk of race condition when multiple same processes are launched. We may want to have a temporary directory for every process instead, but the cleanup and disk usage would be an issue.

@luckyyyyy
Copy link

luckyyyyy commented May 6, 2022

there is a risk of race condition when multiple same processes are launched.

@jesec This is what I worry about. My application will start multiple worker_threads and use a lot of private C/Rust modules,. I temporarily move all node files to the main process and require it, which is very helpless

@luckyyyyy
Copy link

@jesec Is it possible to ask the user to tag node modules to load them early in a single process?

@robertsLando
Copy link
Contributor

robertsLando commented May 6, 2022

What about to use process pid to identify the processe that is using a specific module and store this info somewhere in the temporary directory ?

We could also make the checks needed to verify that the file is the same and if not create a new one in another directory and/or use symlinks

@renkei
Copy link
Contributor Author

renkei commented May 8, 2022

I've update the PR to also perform the copy even if the file already exists but is outdated.

There is still no solution for the use case that multiple instances of the app might start simultaneously. I see only two options:

  1. The start of instance 2, 3, 4, etc. is delayed, so that instance 1 can finish the copy
  2. Each instance don't care about other instances because each instance has its own directory

Solution 1 must be implemented by the user that tries to start multiple instances because we cannot implement a delayed check&copy (with some retries to give the other instance time to finish the copy) in bootstrap.js because everything inside process.dlopen must be synchronous. To be honest, I would prefer this solution, it is much easier for the user to control the start of multiple instances than for pkg.

Solution 2 sounds interesting but has so much disadvantages. It makes things more complicated and can produce a lot of spam on disk since we copy the same files again and again, they cannot be shared across multiple instances anymore. I'm not sure if it's worth the effort, maybe we're leaving the KISS pattern too much here...

@luckyyyyy
Copy link

luckyyyyy commented May 8, 2022

I loaded all node files in the main process. for me this is the only solution so far.

image
image

image

Copy link
Contributor

@jesec jesec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more correct and less error-prone to let each process have its own temporary directory. I have implemented the approach: jesec@71f697d, but I am still sorting out issues with Node 12 on Windows at the moment. Duplication would be annoying but correctness is more essential.

Nonetheless, this PR does provide some improvements over the status quo and we can get it merged first.

@renkei
Copy link
Contributor Author

renkei commented May 9, 2022

I have implemented the approach: jesec@71f697d

Nice, I will try this out. Thanks!

Duplication would be annoying but correctness is more essential.

Beside the duplication (which is not really an issue, I guess, as long as everything is removed before the app exits) this has another effect: Native modules cannot be shared across instances, no matter whether multiple instances are running simultaneously or not because the temporary folder name depends now on the process ID and randomFillSync() which makes the name practically unique. This avoids write conflicts but results in copy actions from virtual to real file system whenever an instance starts. Therefore the instance will start slower, which could be irrelevant or clearly noticeable. It depends on the number and size of native modules that are copied. But this is probably the price for correctness. I don't know if this is really an issue, I just want to mention this.

@luckyyyyy
Copy link

It would be more correct and less error-prone to let each process have its own temporary directory. I have implemented the approach: jesec@71f697d, but I am still sorting out issues with Node 12 on Windows at the moment. Duplication would be annoying but correctness is more essential.

Nonetheless, this PR does provide some improvements over the status quo and we can get it merged first.

Verified, has run successfully, but affects startup speed. But there is no other better solution

@jesec jesec merged commit f0c4e8c into vercel:main May 9, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error: EBUSY: resource busy or locked on native modules
6 participants