RFC: Entry bundle hashing and manifest file #4227

Open
wbinnssmith opened this issue Feb 26, 2020 · 17 comments

@wbinnssmith
Contributor

wbinnssmith commented Feb 26, 2020

💬 RFC

Certain entry bundles can benefit from content digests in their names and shared bundle splitting. Let's make it possible to do this, perhaps by outputting a manifest file.

🔦 Context

Currently, entry bundles are expected to have predictable names so that they can be referenced manually, or referenced in a way that is stable across deployments. This also means entry bundles do not have common assets extracted into a shared bundle, since the presence and names of shared bundles are not predictable.

For certain types of bundles, this is an avoidable drawback: it prevents these bundles from being served with long-term/immutable HTTP cache headers, and from sharing common dependencies with other bundles. Both problems could be avoided by writing a manifest file with a predictable name that maps each entry asset (the files passed to parcel build) to the bundles in the bundle group it created, including shared bundles and bundles of other types such as CSS.

Some bundles should never include hashes, though, as they benefit from or require stable names:

  • Service workers
  • manifest.json for web app manifests and web extensions
  • HTML pages

💻 Examples

For instance, if Parcel built the following assets:

a.js

import 'large-js-dependency';
import './styles.css';

b.js

import 'large-js-dependency';

The following manifest file could be produced:

{
  "a.js": [
    "large-js-dependency.[hash].js", 
    "styles.[hash].css", 
    "a.[hash].js"
  ],
  "b.js": [
    "large-js-dependency.[hash].js",
    "b.[hash].js"
  ]
}

...allowing every bundle to include a content digest in its name for long-term/immutable caching, and allowing common dependencies to be extracted into a shared bundle used by both a.js and b.js. This manifest file, which would have a predictable name, could be read by a server, or by a build process that creates HTML pages, to insert the appropriate <script> and <link> tags.
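As a rough illustration of the consuming side, here is a minimal sketch of how a server might turn one manifest entry into tags. It assumes the manifest shape shown above (entry name mapped to a list of bundle file names, already in load order) and a hypothetical `publicUrl` prefix; `renderTagsForEntry` is an illustrative name, not part of Parcel. Attributes such as `type="module"` are deliberately left out.

```javascript
// Hypothetical helper: renders <link>/<script> tags for one entry of a
// manifest shaped like { "a.js": ["large-js-dependency.1a2b.js", ...] }.
// Assumes the bundle list is already in the order it should be loaded.
function renderTagsForEntry(manifest, entryName, publicUrl = "/") {
  const bundles = manifest[entryName];
  if (!bundles) {
    throw new Error(`No manifest entry for ${entryName}`);
  }
  const tags = [];
  for (const file of bundles) {
    const url = publicUrl + file;
    if (file.endsWith(".css")) {
      tags.push(`<link rel="stylesheet" href="${url}">`);
    } else if (file.endsWith(".js")) {
      tags.push(`<script src="${url}"></script>`);
    }
    // Other bundle types (images, fonts, ...) are skipped in this sketch.
  }
  return tags.join("\n");
}

// Example manifest; the short hex strings stand in for real content hashes.
const manifest = {
  "a.js": ["large-js-dependency.1a2b.js", "styles.3c4d.css", "a.5e6f.js"],
  "b.js": ["large-js-dependency.1a2b.js", "b.7a8b.js"],
};

console.log(renderTagsForEntry(manifest, "b.js"));
```

Because both entries list the same `large-js-dependency` bundle, a page that loads a.js and another that loads b.js would share one cached file.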

🛠 Possible Implementations

User points Parcel to a real manifest file entry

Currently, isEntry is used to enforce predictability, and non-entries are hashed. If the user pointed Parcel at an input file representing the manifest, any assets referenced from it would be non-entry dependencies of the manifest itself. The manifest would be "transformed" from an input into an output manifest, just like HTML files.

Pros:

  • Implementable entirely outside of core
  • Benefits from the existing assumptions about isEntry
  • Analogous to the current plugins for web app manifests and web extensions, and to how we build HTML files
  • The user explicitly lists the resources which should be hashed inside the manifest, and any other entries alongside the manifest won't be.

Cons:

  • Creating a file representing the manifest is potentially awkward. Could we use package.json?

An asset representing a manifest is inserted by Parcel into its graph and becomes the entry

Like the above, but instead of requiring the user to point Parcel at an input manifest file, Parcel core could create an entry asset in its graph representing one.

Pros:

  • Removes the potentially awkward creation of an input manifest file
  • Benefits from the existing assumptions about isEntry

Cons:

  • Requires changes to core
  • Creates an asset in the graph that isn't backed by a real file
  • Parcel may need to make assumptions about which files should be dependencies of the manifest and which should not be (service workers, etc.)

Related: #4200, #4203

Seeking feedback on the above two proposals as well as any others from the community 🙂

cc @jamiebuilds @devongovett @mischnic @padmaia @kevincox @DeMoorJasper

wbinnssmith added the 💬 RFC (Request For Comments) and ✨ Parcel 2 labels on Feb 26, 2020
@kevincox
Contributor

kevincox commented Feb 27, 2020

I think I am moving away from the idea of having an explicit manifest input file for two main reasons.

  1. It isn't really "transformed" the same way that other files are, so it isn't a semantic match.
  2. As described in Generated Files Manifest #4200, there are use cases for a manifest that contains information on all files. I would rather solve these together.

That being said, we need a way to specify "non-entry roots" (I need to find a better name for this). I would be okay with a package.json entry or command line arguments (matching the current "entry root" style).

@mekhami

mekhami commented Feb 27, 2020

Forgive my ignorance, but would something like this replicate what https://github.com/owais/webpack-bundle-tracker does for webpack? I'm very interested in this because it enables https://github.com/owais/django-webpack-loader, a critical part of Django's local dev toolchain for SPA integration.

@devongovett
Member

devongovett commented Feb 27, 2020

I think this could be split into two parts:

  1. Including content hash in entry bundle names
  2. Outputting a manifest

I don't think it's unreasonable to always hash entry bundle names, except in a few cases:

  1. HTML files. These should have predictable URLs.
  2. Service workers, <link rel="manifest">, etc. where the URLs need to be consistent between builds.
  3. An explicit target filename has been specified, e.g. in package.json#main or other targets. In this case, the package.json is an explicit manifest. When building a library, you're already going to have these fields set.
  4. You're building for a non-browser target.

So, if you just point Parcel at some JS files, haven't specified an output filename, and are building for a browser target, we can assume that you don't need predictable filenames and should include a content hash.
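The four exceptions above can be restated as a single decision function. This is a minimal sketch over a hypothetical bundle description; the field names (`type`, `isServiceWorker`, `isWebManifest`, `explicitName`, `targetEnv`) are illustrative assumptions and do not correspond to Parcel's actual bundle API.

```javascript
// Hypothetical model of an entry bundle (field names are illustrative only):
//   type:            bundle type, e.g. "js", "css", "html"
//   isServiceWorker: true if the bundle is registered as a service worker
//   isWebManifest:   true if referenced via <link rel="manifest">
//   explicitName:    filename from package.json#main or another target, if any
//   targetEnv:       "browser", "node", ...
function shouldHashEntryBundle(bundle) {
  // 1. HTML pages need predictable URLs.
  if (bundle.type === "html") return false;
  // 2. Service workers and web app manifests must keep the same URL across builds.
  if (bundle.isServiceWorker || bundle.isWebManifest) return false;
  // 3. An explicit target filename acts as the manifest, so keep it as-is.
  if (bundle.explicitName) return false;
  // 4. Only browser targets get hashed entry names.
  if (bundle.targetEnv !== "browser") return false;
  // Nothing depends on the exact name, so a content hash is safe.
  return true;
}

console.log(shouldHashEntryBundle({ type: "js", targetEnv: "browser" }));
console.log(shouldHashEntryBundle({ type: "js", targetEnv: "node" }));
```

Making "hash" the default and listing the exceptions explicitly keeps the rule easy to extend if more cases where the assumptions don't hold turn up.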

Outputting a manifest can be a reporter plugin that outputs a simple JSON map from entries to bundles. We could run it by default (just one extra file), or make it opt-in.

Are there cases where these assumptions don't hold and we need to be more explicit about it?
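The reporter idea above (a simple JSON map from entries to bundles) boils down to a small transformation. In Parcel 2, a reporter plugin receives a `buildSuccess` event carrying the bundle graph; the sketch below does not use that API and instead assumes a hypothetical, already-flattened list of bundle groups, so only the mapping step is shown. `buildManifest` and `stripPrefix` are illustrative names.

```javascript
// Simplified, hypothetical input: one record per bundle group created by an
// entry asset. This is NOT Parcel's BundleGraph API; it only mirrors the
// information such a reporter would need.
function buildManifest(bundleGroups, { stripPrefix = "" } = {}) {
  const manifest = {};
  for (const group of bundleGroups) {
    // Optionally drop a source directory prefix such as "src/" from keys.
    const key = group.entry.startsWith(stripPrefix)
      ? group.entry.slice(stripPrefix.length)
      : group.entry;
    // Keep the bundle order exactly as given in the input.
    manifest[key] = group.bundles.map((b) => b.name);
  }
  return manifest;
}

const groups = [
  {
    entry: "src/a.js",
    bundles: [
      { name: "large-js-dependency.1a2b.js", type: "js" },
      { name: "styles.3c4d.css", type: "css" },
      { name: "a.5e6f.js", type: "js" },
    ],
  },
  {
    entry: "src/b.js",
    bundles: [
      { name: "large-js-dependency.1a2b.js", type: "js" },
      { name: "b.7a8b.js", type: "js" },
    ],
  },
];

// A reporter would write this string to a file with a predictable name.
console.log(JSON.stringify(buildManifest(groups, { stripPrefix: "src/" }), null, 2));
```

The output has the same shape as the example manifest in the RFC, which is why it could be a standalone reporter rather than a change to core.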

@rynpsc

rynpsc commented Feb 27, 2020

@devongovett Just to clarify, would this solve #3307 (see #3307 (comment))?

@devongovett
Member

Yes I believe so?

@wbinnssmith
Contributor Author

HTML files. These should have predictable URLs.
Service workers, <link rel="manifest">, etc. where the URLs need to be consistent between builds.

Should the transformers for these types signal that they don't want their bundle names to include hashes? This would have to be something beyond isEntry, as other entry bundles would include them.

@kevincox
Contributor

I don't think it's unreasonable to always hash entry bundle names, except in a few cases:

What is an entry then? I thought the idea of an entry is "this is a point where a user will enter the app". And if a user needs to find the entry then it needs a stable name.

@mischnic
Member

What is an entry then? I thought the idea of an entry is "this is a point where a user will enter the app". And if a user needs to find the entry then it needs a stable name.

If you can find the entry via a manifest file mapping, it would rather be "the point where a bundler enters the app"

@kevincox
Contributor

Ok, I guess we need a better definition of bundler entry points and "production" entry points then.

Bundler entry points: Places where parcel starts spidering your app.
Production entry points: Places that require a stable URL when deployed.

Name proposals welcome.

@dapicester

I found parcel-plugin-bundle-manifest, which more or less does what I need.

My use case is the following: I need to build a JS bundle to be used on different websites. I build my bundle, which is named foo.[hash].js. The manifest.json looks like this:

{"foo.js": "./foo.[hash].js"}

I host both foo.[hash].js and manifest.json on a CDN, and I have an API endpoint https://example.com/foo that reads the manifest and responds with a redirect to https://example.com/foo.[hash].js.
When I update my bundle, say to foo.[hash2].js, the manifest gets updated with the entry "foo.js": "./foo.[hash2].js". Calling the endpoint then returns the new foo.[hash2].js content, because it now redirects to https://example.com/foo.[hash2].js.

While the actual API endpoint and redirect logic is outside the bundling logic, I still see a clear need to:

  1. allow hashing the JS bundle name,
  2. produce a manifest.

The current problem with Parcel is that when you bundle with a JS entry point, e.g. parcel build src/foo.js, you get a dist/foo.js bundle without a hash. To work around this, I have a dummy HTML entry point that includes my foo.js. So when I bundle with parcel build src/dummy.html, the output files are dist/dummy.html and dist/foo.[hash].js. I also use the plugin mentioned above to get the manifest.

I'd love to be able to get the hashed bundle without using the dummy entry point. Maybe we could have a CLI flag to stay backward compatible, for example parcel src/foo.js --force-hash, to get dist/foo.[hash].js.
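The redirect endpoint described above only needs a lookup and a URL resolution. Here is a minimal sketch of that step, assuming the manifest has already been fetched and parsed; `resolveRedirect` and the `cdnBase` parameter are hypothetical names, and the HTTP server itself is left out. It relies only on the standard `URL` class.

```javascript
// Hypothetical resolver for the redirect endpoint.
// manifest: parsed manifest.json, e.g. { "foo.js": "./foo.9f8e.js" }
// cdnBase:  base URL the bundles are hosted under (assumed)
function resolveRedirect(manifest, entryName, cdnBase) {
  const target = manifest[entryName];
  if (typeof target !== "string") {
    return null; // unknown entry: the endpoint would answer with a 404
  }
  // Resolve "./foo.9f8e.js" relative to the CDN base URL.
  return new URL(target, cdnBase).href;
}

const manifest = { "foo.js": "./foo.9f8e.js" };
console.log(resolveRedirect(manifest, "foo.js", "https://example.com/"));
// prints https://example.com/foo.9f8e.js
```

With this split, the hashed file never changes and can presumably be served with immutable cache headers, while only the small redirect response (and the manifest) needs a short cache lifetime.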

@dapicester

I just found out that what I described can already be done in Parcel 2 with plugins:

  1. hash the JS bundle name: use parcel-namer-rewrite
  2. produce manifest: use parcel-reporter-bundle-manifest

@wbinnssmith
Contributor Author

wbinnssmith commented May 5, 2021

@dapicester Those are helpful, thanks! The bundler also has the opportunity to do things like extract shared bundles from entries, but chooses not to in order to keep the one-to-one relationship between entry asset and entry bundle. We'd need a custom bundler to do this, or agree on something in the official bundler and namer to opt into this behavior.

@lenovouser

I (and by extension the company I work for) really need this feature.

To be more specific, I'd like to be able to create a few entries (TypeScript, SCSS), and then:

  • Get a result where a shared bundle is created when many different entries import the same module or file inside the project
  • Be able to figure out the dependencies / newly created shared bundles of each original entry file (so that I can include them in my dynamically created HTML)
  • Possibly even control how bundles are named and created (by combining specific modules / files). The simplest way of solving this would probably be to create something like commons.ts that imports and re-exports certain modules from node_modules, make that its own entry, and have Parcel recognize this and add commons.ts as a dependency in the manifest for the other files importing it.
  • Do all of this without needing to create and maintain things like a dummy.html

This proposal seems to solve essentially what I just described, so I got the company I work for to fund this issue, but apparently BountySource doesn't post a notification in the issue when that happens.

app.bountysource.com/parcel-bundler/parcel/issues/4227
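The second requirement above (knowing which shared bundles each entry depends on) can be answered from a manifest of the shape proposed in the RFC. A minimal sketch, assuming such a manifest exists; `findSharedBundles` is a hypothetical helper and the file names are made up for illustration.

```javascript
// Hypothetical helper: given a manifest mapping entries to bundle names,
// report which bundles are loaded by more than one entry.
function findSharedBundles(manifest) {
  const usedBy = new Map(); // bundle name -> entries that load it
  for (const [entry, bundles] of Object.entries(manifest)) {
    for (const bundle of bundles) {
      if (!usedBy.has(bundle)) usedBy.set(bundle, []);
      usedBy.get(bundle).push(entry);
    }
  }
  const shared = {};
  for (const [bundle, entries] of usedBy) {
    if (entries.length > 1) shared[bundle] = entries;
  }
  return shared;
}

// Made-up manifest for three TypeScript entries with an SCSS-derived CSS bundle.
const manifest = {
  "page-1.ts": ["vendor.1a2b.js", "page-1.3c4d.js"],
  "page-2.ts": ["vendor.1a2b.js", "common.5e6f.css", "page-2.7a8b.js"],
  "page-3.ts": ["common.5e6f.css", "page-3.9c0d.js"],
};
console.log(findSharedBundles(manifest));
```

For rendering a page, the per-entry list in the manifest is already enough; the inverse map above is mainly useful for checking what the bundler decided to share.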

@lenovouser

Related: #8106

@devongovett
Member

Let me try to explain why this hasn't been implemented, and maybe it will help you. We realized that in order to implement this we'd basically need to reinvent HTML. The script/stylesheet URLs alone are not enough for a server to generate valid HTML. <script> tags have a bunch of other attributes as well, e.g. type="module" or nomodule, async, defer, etc. Stylesheets also might have attributes like media.

The HTML transformer and packager already handle all of this, so either we could recreate all of that but using JSON rather than HTML, or you could just create an HTML file with your entries, let Parcel do its thing, and then parse the resulting HTML to get the information you need (or insert it as a fragment server side). I believe @wbinnssmith and team ended up doing the latter.

I guess we would be open to supporting a manifest file as an entry, with a transformer and packager that outputs a transformed manifest with all of that information, but we wondered if it was worth inventing our own format for that vs just using standard HTML. Do you have any thoughts about that?

@lenovouser

Yeah, what you are saying makes sense.

Wouldn't it be possible to just include the metadata you mentioned in the manifest? Something like

{
    "type": "module",
    "async": false,
    "defer": false
}

for each output file?

I think the HTML workaround either won't fully work for us or will be very complicated, but I might be mistaken.

First of all, we would have to write the HTML with the entries to disk, run Parcel, and then read and parse the resulting dummy.html and somehow convert it to JSON for our existing backend. That already feels very error-prone to me, but if @wbinnssmith made it work, it might not be at all.

Even if this can be implemented reliably, I'd still not know how to smartly parse what each outputted entry depends on. Like if I have three pages page-1.ts, page-2.ts and page-3.ts, I might be able to parse them from the resulting HTML, but how do I know which of the other script tags are dependencies of each page? Same goes for SCSS obviously.

I could maybe create dummy-page-1.html, dummy-page-2.html and dummy-page-3.html, and then get the sources from that, so yeah, there are maybe ways to work around this, but in the end I would very much prefer the manifest for the simplicity and safety it provides.

It would also make Parcel usable and integrate-able with a lot of already existing backends in my opinion, e.g. I could replace two completely different frontend build systems we are currently using (Webpack and Gulp), which are both producing manifests for us currently, with Parcel's manifest support.
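To make the trade-off discussed above concrete, here is a minimal sketch of what a server would need to do if the manifest carried per-file metadata as proposed. The field names (`kind`, `url`, `type`, `async`, `defer`, `nomodule`, `media`) are hypothetical, not an existing Parcel format, and `renderTag` is an illustrative helper.

```javascript
// Hypothetical per-file metadata format, following the proposal above.
// Field names are illustrative assumptions, not an existing Parcel format.
function escapeAttr(value) {
  return String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

function renderTag(file) {
  const attrs = [];
  if (file.kind === "stylesheet") {
    attrs.push('rel="stylesheet"', `href="${escapeAttr(file.url)}"`);
    if (file.media) attrs.push(`media="${escapeAttr(file.media)}"`);
    return `<link ${attrs.join(" ")}>`;
  }
  attrs.push(`src="${escapeAttr(file.url)}"`);
  if (file.type) attrs.push(`type="${escapeAttr(file.type)}"`);
  // Boolean attributes are emitted bare, and only when true.
  if (file.nomodule) attrs.push("nomodule");
  if (file.async) attrs.push("async");
  if (file.defer) attrs.push("defer");
  return `<script ${attrs.join(" ")}></script>`;
}

const entryFiles = [
  { kind: "stylesheet", url: "/page-1.3c4d.css", media: "screen" },
  { kind: "script", url: "/page-1.5e6f.js", type: "module", async: false, defer: false },
  { kind: "script", url: "/page-1.legacy.7a8b.js", nomodule: true, defer: true },
];
console.log(entryFiles.map(renderTag).join("\n"));
```

Every attribute that HTML already expresses needs its own JSON field and its own rendering rule here, which is the "reinventing HTML" cost described earlier; in exchange, backends that already consume JSON manifests get a format they can read directly.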

@itsdouges

With React 18's renderToPipeableStream encouraging apps to render the entire HTML document, there's going to be a need for a manifest file.

For now, I've written a custom reporter, which is good enough with some hard-coded assumptions.

9 participants