RFC: Entry bundle hashing and manifest file #4227

wbinnssmith · 2020-02-26T19:49:49Z

💬 RFC

Certain entry bundles can benefit from content digests in their names and shared bundle splitting. Let's make it possible to do this, perhaps by outputting a manifest file.

🔦 Context

Currently entry bundles are expected to have a predictable name so that they can be either manually referenced or referenced in a way that is stable across deployments. This also causes entry bundles not to have common assets extracted into a shared bundle, as the presence and name(s) of shared bundle(s) is not predictable.

For certain types of bundles, this is a negative that can be avoided. It prevents these bundles from being served with long-term/immutable HTTP cache headers and prevents them from sharing common dependencies with other bundles. Both limitations can be lifted by writing a manifest file with a predictable name that maps each entry asset (a file passed to parcel build) to the bundles in the bundle group it created, including shared bundles and bundles of other types such as CSS.

There are bundles that should never include hashes though as they benefit from or require stable names:

Service workers
manifest.json for web app manifests and web extensions
html pages

💻 Examples

For instance, if parcel built the following assets:

a.js

import 'large-js-dependency';
import 'styles.css';

b.js

import 'large-js-dependency';

The following manifest file could be produced:

{
  "a.js": [
    "large-js-dependency.[hash].js", 
    "styles.[hash].css", 
    "a.[hash].js"
  ],
  "b.js": [
    "large-js-dependency.[hash].js",
    "b.[hash].js"
  ]
}

...allowing every bundle to include a content digest in its name for long-term/immutable cache expiry, and for shared common bundles to be extracted from both a.js and b.js. This manifest file, which would have a predictable name, could be read by a server or build process creating html pages to insert the appropriate <script> and <link> tags.
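
A minimal sketch of how a server or build step might consume such a manifest, assuming the shape shown above (entry name mapped to an ordered list of bundle file names). The helper name `tagsForEntry`, the `publicPath` parameter, and the hash values are hypothetical and only serve the illustration; HTML escaping is omitted.

```javascript
// Hypothetical helper: turn one manifest entry into HTML tags.
// `manifest` maps an entry name to an ordered list of bundle file names.
function tagsForEntry(manifest, entry, publicPath = "/") {
  const bundles = manifest[entry];
  if (!bundles) {
    throw new Error(`No manifest entry for ${entry}`);
  }
  return bundles.map((file) => {
    const url = publicPath + file;
    if (file.endsWith(".css")) {
      return `<link rel="stylesheet" href="${url}">`;
    }
    if (file.endsWith(".js")) {
      return `<script src="${url}"></script>`;
    }
    throw new Error(`Unsupported bundle type: ${file}`);
  });
}

// Made-up hashes standing in for [hash] in the example manifest.
const manifest = {
  "a.js": ["large-js-dependency.1a2b3c.js", "styles.4d5e6f.css", "a.7a8b9c.js"],
  "b.js": ["large-js-dependency.1a2b3c.js", "b.0d1e2f.js"],
};

console.log(tagsForEntry(manifest, "a.js").join("\n"));
```

Because the list is ordered, shared bundles are emitted before the entry bundle that depends on them.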

🛠 Possible Implementations

User points Parcel to a real manifest file entry

Currently, isEntry is used to enforce predictability, and non-entries are hashed. If the user pointed Parcel at an input file representing the manifest, any referenced assets from there would be non-entry dependencies of the manifest itself. The manifest would be "transformed" from an input and result in an output manifest, just like html files.

Pros:

Implementable entirely outside of core
Benefits from the existing assumptions about isEntry
Analogous to the current plugins for web app manifests and web extensions, as well as with how we build html files
The user explicitly lists the resources which should be hashed inside the manifest, and any other entries alongside the manifest won't be.

Cons:

Creating a file representing the manifest is potentially awkward. Could we use package.json?

An asset representing a manifest is inserted by Parcel into its graph and becomes the entry

Like the above, but instead of requiring the user to point Parcel at an input manifest file, Parcel core could create an entry asset in its graph representing one.

Pros:

Removes the potentially awkward creation of an input manifest file
Benefits from the existing assumptions about isEntry

Cons:

Requires changes to core
Creates an asset in the graph that isn't backed by a real file
Parcel may need to make assumptions about which files should be dependencies of the manifest and which should not be (service workers, etc.)

Related: #4200, #4203

Seeking feedback on the above two proposals as well as any others from the community 🙂

cc @jamiebuilds @devongovett @mischnic @padmaia @kevincox @DeMoorJasper


kevincox · 2020-02-27T10:46:04Z

I think I am moving away from the idea of having an explicit manifest input file for two main reasons.

It isn't really "transformed" the same way that other files are so it isn't a semantic match.
As described in Generated Files Manifest #4200 there are use cases for a manifest which contains information on all files. I would rather solve these together.

That being said we need a way to specify "non-entry roots" (I need to find a better name for this). I would be okay with a package.json entry or command line arguments (matching current "entry root" style).

mekhami · 2020-02-27T18:30:28Z

Forgive my ignorance, but would something like this replicate what webpack-bundle-tracker (owais/webpack-bundle-tracker) does for webpack? I'm very interested in this because it enables django-webpack-loader (owais/django-webpack-loader), a critical part of Django's local dev toolchain for SPA integration.

devongovett · 2020-02-27T19:21:58Z

I think this could be split into two parts:

Including content hash in entry bundle names
Outputting a manifest

I don't think it's unreasonable to always hash entry bundle names, except in a few cases:

HTML files. These should have predictable URLs.
Service workers, <link rel="manifest">, etc. where the URLs need to be consistent between builds.
An explicit target filename has been specified, e.g. in package.json#main or other targets. In this case, the package.json is an explicit manifest. When building a library, you're already going to have these fields set.
You're building for a non-browser target.

So, if you just point parcel at some JS files, haven't specified an output filename, and are building for a browser target, we can assume that you don't need predictable filenames and should include a content hash.
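
The rule above can be sketched as a single predicate. This is only an illustration of the reasoning, not Parcel's API: the function name and the fields on `entry` (`type`, `needsStableUrl`, `hasExplicitTargetFilename`, `isBrowserTarget`) are all hypothetical.

```javascript
// Hypothetical predicate; the field names are illustrative, not Parcel's API.
function shouldHashEntryBundle(entry) {
  if (entry.type === "html") return false; // HTML files need predictable URLs
  if (entry.needsStableUrl) return false; // service workers, <link rel="manifest">, etc.
  if (entry.hasExplicitTargetFilename) return false; // e.g. package.json#main
  if (!entry.isBrowserTarget) return false; // non-browser targets keep plain names
  return true;
}

// A plain JS entry for a browser target with no explicit output filename.
console.log(
  shouldHashEntryBundle({
    type: "js",
    needsStableUrl: false,
    hasExplicitTargetFilename: false,
    isBrowserTarget: true,
  })
);
```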

Outputting a manifest can be a reporter plugin that outputs a simple JSON map from entries to bundles. We could run it by default (just one extra file), or make it opt-in.
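
As a rough sketch of the "simple JSON map from entries to bundles", assuming the reporter has already collected one record per entry. The input shape (`entry`, `entryBundle`, `sharedBundles`) and the function name are hypothetical; deriving these records from Parcel's bundle graph and writing the file to disk are left out.

```javascript
// Hypothetical input shape: one record per entry, listing the bundles in its group.
// A real reporter would derive these records from the bundle graph; that wiring is omitted.
function buildManifest(entryGroups) {
  const manifest = {};
  for (const group of entryGroups) {
    // Shared bundles first, then the entry bundle itself.
    manifest[group.entry] = [...group.sharedBundles, group.entryBundle];
  }
  return manifest;
}

// Made-up hashes for illustration.
const groups = [
  {
    entry: "a.js",
    entryBundle: "a.7a8b9c.js",
    sharedBundles: ["large-js-dependency.1a2b3c.js", "styles.4d5e6f.css"],
  },
  {
    entry: "b.js",
    entryBundle: "b.0d1e2f.js",
    sharedBundles: ["large-js-dependency.1a2b3c.js"],
  },
];

console.log(JSON.stringify(buildManifest(groups), null, 2));
```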

Are there cases where these assumptions don't hold and we need to be more explicit about it?

rynpsc · 2020-02-27T19:49:28Z

@devongovett Just for clarification would this solve #3307 (see #3307 (comment) for clarification).

devongovett · 2020-02-27T20:02:21Z

Yes I believe so?

wbinnssmith · 2020-02-27T21:30:36Z

HTML files. These should have predictable URLs.
Service workers, <link rel="manifest">, etc. where the URLs need to be consistent between builds.

Should the transformers for these types signal that they don't want their bundle names to include hashes? This would have to be something beyond isEntry, as other entry bundles would include them.

kevincox · 2020-02-28T09:28:54Z

I don't think it's unreasonable to always hash entry bundle names, except in a few cases:

What is an entry then? I thought the idea of an entry is "this is a point where a user will enter the app". And if a user needs to find the entry then it needs a stable name.

mischnic · 2020-03-19T17:07:13Z

What is an entry then? I thought the idea of an entry is "this is a point where a user will enter the app". And if a user needs to find the entry then it needs a stable name.

If you can find the entry via a manifest file mapping, it would rather be "the point where a bundler enters the app"

kevincox · 2020-03-20T16:10:55Z

Ok, I guess we need a better definition of bundler entry points and "production" entry points then.

Bundler entry points: Places where parcel starts spidering your app.
Production entry points: Places that require a stable URL when deployed.

Name proposals welcome.

dapicester · 2021-04-24T03:48:37Z

I found this parcel-plugin-bundle-manifest that more or less does what I need.

My use case is the following: I need to build a JS bundle to be used on different websites. I build my bundle which has the name foo.[hash].js. The manifest.json looks like this:

{"foo.js": "./foo.[hash].js"}

I host both foo.[hash].js and manifest.json on a CDN, and I have an API endpoint https://example.com/foo that reads the manifest and responds with a redirect to https://example.com/foo.[hash].js.
When I update my bundle, say foo.[hash2].js the manifest gets updated with the entry "foo.js": "./foo.[hash2].js". When calling the entrypoint the new foo.[hash2].js content is returned because it responds with a different redirect to https://example.com/foo.[hash2].js.
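
The redirect lookup described above could look roughly like this. It is a sketch under assumptions: the helper name `resolveRedirect` is hypothetical, the hash values are made up, and the HTTP handling around it (reading manifest.json, sending the redirect response) is omitted.

```javascript
// Hypothetical helper for the redirect endpoint: map a stable name to the
// current hashed URL by looking it up in the manifest.
function resolveRedirect(manifest, name, baseUrl) {
  const target = manifest[name];
  if (target === undefined) {
    return null; // the caller would answer with a 404 here
  }
  // Resolve the relative manifest value ("./foo.<hash>.js") against the base URL.
  return new URL(target, baseUrl).href;
}

// Two versions of the manifest, before and after a new deployment.
const beforeDeploy = { "foo.js": "./foo.aaa111.js" };
const afterDeploy = { "foo.js": "./foo.bbb222.js" };

console.log(resolveRedirect(beforeDeploy, "foo.js", "https://example.com/"));
console.log(resolveRedirect(afterDeploy, "foo.js", "https://example.com/"));
```

Only the manifest changes between deployments; the stable endpoint and the lookup code stay the same.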

While the actual API endpoint and redirect logic is outside the bundling logic, I still see a clear need to:

allow to hash the JS bundle name,
produce a manifest.

The current problem with parcel is that when one bundles with a JS entrypoint, e.g. parcel bundle src/foo.js, one gets the dist/foo.js bundle, without a hash. To work around this I have a dummy HTML entrypoint that includes my foo.js. So when I bundle with parcel bundle src/dummy.html the output files are dist/dummy.html and dist/foo.[hash].js. I also use the plugin mentioned above and get the manifest.

I'd love to be able to get the hashed bundle without using the dummy entrypoint. Maybe we can have a CLI flag to stay backward compatible, for example parcel src/foo.js --force-hash and get the dist/foo.[hash].js.

dapicester · 2021-04-25T01:14:22Z

I just found out that what I described can already be done in Parcel 2 with plugins:

allow to hash the JS bundle name: use parcel-namer-rewrite
produce manifest: use parcel-reporter-bundle-manifest

wbinnssmith · 2021-05-05T16:49:40Z

@dapicester Those are helpful, thanks! The bundler also has the opportunity to do things like extract shared bundles from entries, but chooses not to in order to keep the one-to-one relationship of entry asset to entry bundle. We'd need a custom bundler to do this, or agree on something in the official bundler and namer to opt-into this behavior.

lenovouser · 2022-05-03T15:15:18Z

I, and so also the company I work for, really need this feature.

To be more specific, I'd like to be able to create a few entries (TypeScript, SCSS), and then:

Get a result where when a lot of different entries import the same module or file inside the project, a shared bundle is created
I would be able to figure out the dependencies / new created shared bundles of each original entry file (so that I can include it inside my dynamically created HTML)
Possibly even control how bundles are named and created (by combining specific modules / files). The easiest and least complicated way of solving this would probably be creating something like commons.ts that imports and re-exports certain modules from node_modules, making that its own entry, and then Parcel recognizing all that and just adding the commons.ts file as a dependency in the manifest to the other files importing it.
All without needing to create and maintain things like a dummy.html

This proposal seems to solve basically what I just described, so I got the company I work for to fund this issue - but apparently BountySource doesn't notify inside the issue when that happens.

app.bountysource.com/parcel-bundler/parcel/issues/4227

lenovouser · 2022-05-20T11:36:18Z

Related: #8106

devongovett · 2022-05-23T02:20:45Z

Let me try to explain why this hasn't been implemented, and maybe it will help you. We realized that in order to implement this we'd basically need to reinvent HTML. The script/stylesheet URLs alone are not enough for a server to generate valid HTML. <script> tags have a bunch of other attributes as well, e.g. type="module" or nomodule, async, defer, etc. Stylesheets also might have attributes like media.

The HTML transformer and packager already handle all of this, so either we could recreate all of that but using JSON rather than HTML, or you could just create an HTML file with your entries, let Parcel do its thing, and then parse the resulting HTML to get the information you need (or insert it as a fragment server side). I believe @wbinnssmith and team ended up doing the latter.
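
A rough sketch of the "parse the resulting HTML" route, assuming the built HTML is already available as a string. The function name `extractAssets` is hypothetical, and the regular expressions are a deliberate simplification: they ignore comments, inline scripts and unusual attribute syntax, so a real HTML parser would be the more robust choice.

```javascript
// Hypothetical helper: collect <script> and <link> tags with all their attributes
// from built HTML. Regex-based on purpose to stay dependency-free; not a full parser.
function extractAssets(html) {
  const assets = [];
  const tagPattern = /<(script|link)\b([^>]*)>/gi;
  const attrPattern = /([^\s=\/]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"']+)))?/g;
  for (const [, tag, rawAttrs] of html.matchAll(tagPattern)) {
    const attrs = {};
    for (const [, name, dq, sq, bare] of rawAttrs.matchAll(attrPattern)) {
      // Boolean attributes such as `nomodule` or `async` have no value.
      attrs[name.toLowerCase()] = dq ?? sq ?? bare ?? true;
    }
    assets.push({ tag: tag.toLowerCase(), attrs });
  }
  return assets;
}

// Made-up build output for illustration.
const builtHtml =
  '<link rel="stylesheet" href="/styles.4d5e6f.css">' +
  '<script type="module" src="/a.7a8b9c.js"></script>' +
  '<script nomodule defer src="/a.legacy.0d1e2f.js"></script>';

console.log(JSON.stringify(extractAssets(builtHtml), null, 2));
```

Keeping every attribute, rather than just the URL, is what lets a server re-emit tags such as type="module" or nomodule faithfully.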

I guess we would be open to supporting a manifest file as an entry, with a transformer and packager that outputs a transformed manifest with all of that information, but we wondered if it was worth inventing our own format for that vs just using standard HTML. Do you have any thoughts about that?

lenovouser · 2022-05-24T11:15:35Z

Yeah, what you are saying makes sense.

Wouldn't it be possible to just include that metadata information you mentioned in the manifest? So something like

{
    "type": "module",
    "async": false,
    "defer": false
}

per outputted file?
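
To make the idea concrete, here is a sketch of how a backend might turn one such record into a tag. The record shape extends the metadata above with a hypothetical `url` field, and the helper name `scriptTag` is also an assumption; neither is an existing Parcel format.

```javascript
// Hypothetical manifest record: { url, type, async, defer }.
function scriptTag(file) {
  const attrs = [`src="${file.url}"`];
  if (file.type) attrs.push(`type="${file.type}"`);
  if (file.async) attrs.push("async");
  if (file.defer) attrs.push("defer");
  return `<script ${attrs.join(" ")}></script>`;
}

console.log(
  scriptTag({ url: "/page-1.1a2b3c.js", type: "module", async: false, defer: false })
);
```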

I think the HTML workaround will not completely work for us, or be very complicated. But I might be mistaken, not sure.

First of all, we would have to write the HTML with the entries to disk, run parcel, and then read, parse and somehow generate the results in JSON from the dummy.html to be used in our already existing backend. To me that already feels like it will be very error-prone, but if @wbinnssmith made it work it might not be at all.

Even if this can be implemented reliably, I'd still not know how to smartly parse what each outputted entry depends on. Like if I have three pages page-1.ts, page-2.ts and page-3.ts, I might be able to parse them from the resulting HTML, but how do I know which of the other script tags are dependencies of each page? Same goes for SCSS obviously.

I could maybe create dummy-page-1.html, dummy-page-2.html and dummy-page-3.html, and then get the sources from that, so yeah, there are maybe ways to work around this, but in the end I would very much prefer the manifest for the simplicity and safety it provides.

It would also make Parcel usable and integrate-able with a lot of already existing backends in my opinion, e.g. I could replace two completely different frontend build systems we are currently using (Webpack and Gulp), which are both producing manifests for us currently, with Parcel's manifest support.

itsdouges · 2023-09-20T00:45:27Z

With React 18's renderToPipeableStream encouraging apps to render the entire HTML document, there's going to be a need for a manifest file to be available.

For now what I've done is write a custom reporter, which is good enough with some hard-coded assumptions.

wbinnssmith added 💬 RFC Request For Comments ✨ Parcel 2 labels Feb 26, 2020

mischnic mentioned this issue Mar 11, 2020

Getting access to the final file names in my ServiceWorkre #4315

Closed

mischnic mentioned this issue Feb 17, 2021

[Web Extension Transformer] doesn't support code splitting #5859

Closed

wbinnssmith mentioned this issue Jun 9, 2021

API audit: Refactor dependency options #6420

Merged

devongovett added the namer label Jul 19, 2021
