-
Notifications
You must be signed in to change notification settings - Fork 10.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(gatsby-core-utils): make createContentDigest deterministic #19832
fix(gatsby-core-utils): make createContentDigest deterministic #19832
Conversation
When would we want to actually sort arrays before generating hash?
Did you do perf comparisons for |
Jobs API would favor this as mistakes can easily be made where arguments have different order but the same work will be done. If we look at gatsby-plugin-sharp, the transformations needed on an input file could be in a different order, which doesn't mean the result would be different.
I haven't yet, I was going to run it against .org and see what the difference was and also do an isolated micro benchmark. |
I feel like this is potentially dangerous: for image transformation order of transformations doesn't matter, but it might matter for other jobs in the future. To me it seems like we should guarantee stable transformations order in sharp plugin instead (so that hash will always be the same, without resorting to createContentDigest sorting arrays) |
We can let the plugin authors decide. They could sort it upfront. I'll revert that. |
I did benchmarks on .org and microbenchmarks. It looks like hash-object is slower, I don't think it is slow enough to not do it. Technically I could only use it for jobs but I feel proper hashing is not a waste. Summary:
Microbench:
When running strings only so no objects So for non-objects I could use the old createContentDigest and for objects go for node-object-hash. Run 1 - node-hash-object$ time gatsby build --no-uglify
success open and validate gatsby-configs - 0.165s
success load plugins - 0.851s
success onPreInit - 0.005s
success delete html and css files from previous builds - 0.008s
success initialize cache - 0.007s
success copy gatsby files - 0.016s
success onPreBootstrap - 0.007s
success createSchemaCustomization - 0.153s
success source and transform nodes - 108.856s
warn There are conflicting field types in your data.
If you have explicitly defined a type for those fields, you can safely ignore this warning message.
Otherwise, Gatsby will omit those fields from the GraphQL schema.
If you know all field types in advance, the best strategy is to explicitly define them with the `createTypes` action, and skip inference with the `@dontInfer` directive.
See https://www.gatsbyjs.org/docs/actions/#createTypes
NPMPackage.types.ts:
- type: boolean
value: false
- type: string
value: 'included'
success building schema - 5.275s
warn code block or inline code language not specified in markdown. applying generic code block
warn unable to find prism language 'sh' for highlighting. applying generic code block
warn unable to find prism language 'console' for highlighting. applying generic code block
warn unable to find prism language ' for highlighting. applying generic code block
warn unable to find prism language '.gitignore' for highlighting. applying generic code block
warn unable to find prism language '.json' for highlighting. applying generic code block
warn unable to find prism language 'grahpql' for highlighting. applying generic code block
warn unable to find prism language 'gatsby-config.js' for highlighting. applying generic code block
warn unable to find prism language 'plain' for highlighting. applying generic code block
success createPages - 27.180s
success createPagesStatefully - 1.001s
success onPreExtractQueries - 0.004s
success update schema - 0.093s
success extract queries from components - 0.791s
success write out requires - 0.037s
success write out redirect data - 0.002s
success Build manifest and related icons - 0.098s
success onPostBootstrap - 0.165s
�
info bootstrap finished - 146.793 s
�
success Building production JavaScript and CSS bundles - 26.710s
success Rewriting compilation hashes - 0.003s
success run queries - 282.961s - 3449/3449 12.19/s
success Building static HTML for pages - 32.678s - 3444/3444 105.39/s
info Generated public/sw.js, which will precache 7 files, totaling 1440680 bytes.
The following pages will be precached:
/offline-plugin-app-shell-fallback/index.html
info Done building in 485.215770107 sec
gatsby build --no-uglify 620.22s user 18.88s system 131% cpu 8:05.74 total Run 2 - node-hash-object$ time gatsby build --no-uglify
success open and validate gatsby-configs - 0.154s
success load plugins - 0.838s
success onPreInit - 0.005s
success delete html and css files from previous builds - 0.008s
success initialize cache - 0.008s
success copy gatsby files - 0.015s
success onPreBootstrap - 0.012s
success createSchemaCustomization - 0.149s
success source and transform nodes - 128.456s
warn There are conflicting field types in your data.
If you have explicitly defined a type for those fields, you can safely ignore this warning message.
Otherwise, Gatsby will omit those fields from the GraphQL schema.
If you know all field types in advance, the best strategy is to explicitly define them with the `createTypes` action, and skip inference with the `@dontInfer` directive.
See https://www.gatsbyjs.org/docs/actions/#createTypes
NPMPackage.types.ts:
- type: boolean
value: false
- type: string
value: 'included'
success building schema - 5.113s
warn code block or inline code language not specified in markdown. applying generic code block
warn unable to find prism language 'sh' for highlighting. applying generic code block
warn unable to find prism language 'console' for highlighting. applying generic code block
warn unable to find prism language ' for highlighting. applying generic code block
warn unable to find prism language '.gitignore' for highlighting. applying generic code block
warn unable to find prism language '.json' for highlighting. applying generic code block
warn unable to find prism language 'grahpql' for highlighting. applying generic code block
warn unable to find prism language 'gatsby-config.js' for highlighting. applying generic code block
warn unable to find prism language 'plain' for highlighting. applying generic code block
success createPages - 25.406s
success createPagesStatefully - 0.907s
success onPreExtractQueries - 0.004s
success update schema - 0.096s
success extract queries from components - 0.763s
success write out requires - 0.037s
success write out redirect data - 0.002s
success Build manifest and related icons - 0.108s
success onPostBootstrap - 0.174s
�
info bootstrap finished - 164.296 s
�
success Building production JavaScript and CSS bundles - 26.500s
success Rewriting compilation hashes - 0.003s
success run queries - 283.734s - 3449/3449 12.16/s
success Building static HTML for pages - 32.600s - 3444/3444 105.64/s
info Generated public/sw.js, which will precache 7 files, totaling 1440680 bytes.
The following pages will be precached:
/offline-plugin-app-shell-fallback/index.html
info Done building in 503.376172368 sec
gatsby build --no-uglify 617.69s user 18.63s system 126% cpu 8:23.85 total Run 3 - node-hash-object$ time gatsby build --no-uglify
success open and validate gatsby-configs - 0.157s
success load plugins - 0.837s
success onPreInit - 0.005s
success delete html and css files from previous builds - 0.008s
success initialize cache - 0.008s
success copy gatsby files - 0.016s
success onPreBootstrap - 0.007s
success createSchemaCustomization - 0.152s
success source and transform nodes - 127.890s
warn There are conflicting field types in your data.
If you have explicitly defined a type for those fields, you can safely ignore this warning message.
Otherwise, Gatsby will omit those fields from the GraphQL schema.
If you know all field types in advance, the best strategy is to explicitly define them with the `createTypes` action, and skip inference with the `@dontInfer` directive.
See https://www.gatsbyjs.org/docs/actions/#createTypes
NPMPackage.types.ts:
- type: boolean
value: false
- type: string
value: 'included'
success building schema - 4.970s
warn code block or inline code language not specified in markdown. applying generic code block
warn unable to find prism language 'sh' for highlighting. applying generic code block
warn unable to find prism language 'console' for highlighting. applying generic code block
warn unable to find prism language ' for highlighting. applying generic code block
warn unable to find prism language '.gitignore' for highlighting. applying generic code block
warn unable to find prism language '.json' for highlighting. applying generic code block
warn unable to find prism language 'grahpql' for highlighting. applying generic code block
warn unable to find prism language 'gatsby-config.js' for highlighting. applying generic code block
warn unable to find prism language 'plain' for highlighting. applying generic code block
success createPages - 24.542s
success createPagesStatefully - 1.010s
success onPreExtractQueries - 0.004s
success update schema - 0.093s
success extract queries from components - 0.749s
success write out requires - 0.037s
success write out redirect data - 0.002s
success Build manifest and related icons - 0.096s
success onPostBootstrap - 0.161s
�
info bootstrap finished - 162.822 s
�
success Building production JavaScript and CSS bundles - 26.153s
success Rewriting compilation hashes - 0.003s
success run queries - 282.775s - 3449/3449 12.20/s
success Building static HTML for pages - 32.417s - 3444/3444 106.24/s
info Generated public/sw.js, which will precache 7 files, totaling 1440680 bytes.
The following pages will be precached:
/offline-plugin-app-shell-fallback/index.html
info Done building in 500.548101177 sec
gatsby build --no-uglify 615.29s user 18.94s system 126% cpu 8:21.04 total Run 1 - createContentDigest$ time gatsby build --no-uglify
success open and validate gatsby-configs - 0.154s
success load plugins - 0.886s
success onPreInit - 0.006s
success delete html and css files from previous builds - 0.008s
success initialize cache - 0.007s
success copy gatsby files - 0.016s
success onPreBootstrap - 0.008s
success createSchemaCustomization - 0.158s
success source and transform nodes - 123.055s
warn There are conflicting field types in your data.
If you have explicitly defined a type for those fields, you can safely ignore this warning message.
Otherwise, Gatsby will omit those fields from the GraphQL schema.
If you know all field types in advance, the best strategy is to explicitly define them with the `createTypes` action, and skip inference with the `@dontInfer` directive.
See https://www.gatsbyjs.org/docs/actions/#createTypes
NPMPackage.types.ts:
- type: boolean
value: false
- type: string
value: 'included'
success building schema - 5.246s
warn code block or inline code language not specified in markdown. applying generic code block
warn unable to find prism language 'sh' for highlighting. applying generic code block
warn unable to find prism language 'console' for highlighting. applying generic code block
warn unable to find prism language ' for highlighting. applying generic code block
warn unable to find prism language '.gitignore' for highlighting. applying generic code block
warn unable to find prism language '.json' for highlighting. applying generic code block
warn unable to find prism language 'grahpql' for highlighting. applying generic code block
warn unable to find prism language 'gatsby-config.js' for highlighting. applying generic code block
warn unable to find prism language 'plain' for highlighting. applying generic code block
success createPages - 25.544s
success createPagesStatefully - 0.237s
success onPreExtractQueries - 0.005s
success update schema - 0.091s
success extract queries from components - 0.748s
success write out requires - 0.036s
success write out redirect data - 0.002s
success Build manifest and related icons - 0.098s
success onPostBootstrap - 0.164s
�
info bootstrap finished - 158.578 s
�
success Building production JavaScript and CSS bundles - 27.632s
success Rewriting compilation hashes - 0.003s
success run queries - 273.994s - 3450/3450 12.59/s
success Building static HTML for pages - 32.656s - 3445/3445 105.49/s
info Generated public/sw.js, which will precache 7 files, totaling 1445083 bytes.
The following pages will be precached:
/offline-plugin-app-shell-fallback/index.html
info Done building in 491.666674589 sec
gatsby build --no-uglify 621.83s user 19.22s system 130% cpu 8:12.21 total Run 2 - createContentDigest$ time gatsby build --no-uglify
success open and validate gatsby-configs - 0.156s
success load plugins - 0.873s
success onPreInit - 0.006s
success delete html and css files from previous builds - 0.008s
success initialize cache - 0.007s
success copy gatsby files - 0.016s
success onPreBootstrap - 0.008s
success createSchemaCustomization - 0.149s
success source and transform nodes - 123.039s
warn There are conflicting field types in your data.
If you have explicitly defined a type for those fields, you can safely ignore this warning message.
Otherwise, Gatsby will omit those fields from the GraphQL schema.
If you know all field types in advance, the best strategy is to explicitly define them with the `createTypes` action, and skip inference with the `@dontInfer` directive.
See https://www.gatsbyjs.org/docs/actions/#createTypes
NPMPackage.types.ts:
- type: boolean
value: false
- type: string
value: 'included'
success building schema - 5.270s
warn code block or inline code language not specified in markdown. applying generic code block
warn unable to find prism language 'sh' for highlighting. applying generic code block
warn unable to find prism language 'console' for highlighting. applying generic code block
warn unable to find prism language ' for highlighting. applying generic code block
warn unable to find prism language '.gitignore' for highlighting. applying generic code block
warn unable to find prism language '.json' for highlighting. applying generic code block
warn unable to find prism language 'grahpql' for highlighting. applying generic code block
warn unable to find prism language 'gatsby-config.js' for highlighting. applying generic code block
warn unable to find prism language 'plain' for highlighting. applying generic code block
success createPages - 25.942s
success createPagesStatefully - 0.252s
success onPreExtractQueries - 0.005s
success update schema - 0.088s
success extract queries from components - 0.752s
success write out requires - 0.032s
success write out redirect data - 0.002s
success Build manifest and related icons - 0.092s
success onPostBootstrap - 0.156s
�
info bootstrap finished - 158.972 s
�
success Building production JavaScript and CSS bundles - 27.425s
success Rewriting compilation hashes - 0.003s
success run queries - 276.823s - 3450/3450 12.46/s
success Building static HTML for pages - 33.062s - 3445/3445 104.20/s
info Generated public/sw.js, which will precache 7 files, totaling 1445083 bytes.
The following pages will be precached:
/offline-plugin-app-shell-fallback/index.html
info Done building in 495.69511604 sec
gatsby build --no-uglify 628.87s user 18.73s system 130% cpu 8:16.20 total Run 3 - createContentDigest$ time gatsby build --no-uglify
success open and validate gatsby-configs - 0.160s
success load plugins - 0.893s
success onPreInit - 0.006s
success delete html and css files from previous builds - 0.008s
success initialize cache - 0.013s
success copy gatsby files - 0.016s
success onPreBootstrap - 0.009s
success createSchemaCustomization - 0.151s
success source and transform nodes - 116.077s
warn There are conflicting field types in your data.
If you have explicitly defined a type for those fields, you can safely ignore this warning message.
Otherwise, Gatsby will omit those fields from the GraphQL schema.
If you know all field types in advance, the best strategy is to explicitly define them with the `createTypes` action, and skip inference with the `@dontInfer` directive.
See https://www.gatsbyjs.org/docs/actions/#createTypes
NPMPackage.types.ts:
- type: boolean
value: false
- type: string
value: 'included'
success building schema - 5.263s
warn code block or inline code language not specified in markdown. applying generic code block
warn unable to find prism language 'sh' for highlighting. applying generic code block
warn unable to find prism language 'console' for highlighting. applying generic code block
warn unable to find prism language ' for highlighting. applying generic code block
warn unable to find prism language '.gitignore' for highlighting. applying generic code block
warn unable to find prism language '.json' for highlighting. applying generic code block
warn unable to find prism language 'grahpql' for highlighting. applying generic code block
warn unable to find prism language 'gatsby-config.js' for highlighting. applying generic code block
warn unable to find prism language 'plain' for highlighting. applying generic code block
success createPages - 25.951s
success createPagesStatefully - 0.236s
success onPreExtractQueries - 0.004s
success update schema - 0.085s
success extract queries from components - 0.744s
success write out requires - 0.035s
success write out redirect data - 0.002s
success Build manifest and related icons - 0.093s
success onPostBootstrap - 0.159s
�
info bootstrap finished - 152.103 s
�
success Building production JavaScript and CSS bundles - 27.888s
success Rewriting compilation hashes - 0.003s
success run queries - 273.915s - 3450/3450 12.60/s
success Building static HTML for pages - 33.649s - 3445/3445 102.38/s
info Generated public/sw.js, which will precache 7 files, totaling 1445083 bytes.
The following pages will be precached:
/offline-plugin-app-shell-fallback/index.html
info Done building in 483.561198363 sec
gatsby build --no-uglify 622.61s user 19.46s system 132% cpu 8:04.09 total |
I think this is good to merge (only snapshot need updating) |
Did you check for performance differences at scale? Slightly concerned about the perf impact of manually walking and reordering a lot / big objects like this. I understand the need, totally support it, and just want to make sure it scales properly :) |
I did run .org, I can run on govbook and check for another large site in your issue. |
A govbook smoke check would be nice :) Thanks! |
Wait, what kind of objects actually go through this digest generator? Like, how is that data impacted at scale? (Does it require many pages, many queries, many images?) govbook is a particular subset so I just want to make sure it's actually the relevant benchmark to run here :) |
Every node needs a contentDigest and mostly passes down a node object so it's used pretty heavily for anything data related. |
Then govbook will serve you well here (: |
Running govbook didn't give me any slow down. Both were averaging around 146s with latest gatsby sources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like its good to go in! Thanks as always @wardpeet ❤️
Published in |
Description
Make createContentDigest deterministic. Currently, we use JSON.stringify to convert an object to a string which is fine but not great. Whenever the order of the arguments change the hash differs which bust caches when it shouldn't.
example
{ arg1: 'tst', arg2: 'x' }
is not equal to{arg2: 'x', arg1: 'tst'}
{ arg1: ['tst', 'tst2'], arg2: 'x' }
is not equal to{arg2: 'x', arg1: ['tst2', 'tst']}
The first example should be equal and the second one shouldn't as we can't make sure if arrays are deterministic. Honestly, I feel like it should for our use cases but I didn't want to break anything so I've added an option to make it so.
Jobs (#19831) will be 100% deterministic so I want to sort arrays too.
I've compared hash-object & node-object-hash. It looked like node-object-hash was faster.
Related Issues
#19831