Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slim down the manifest.json artifact #2169

Closed
drewbanin opened this issue Feb 28, 2020 · 3 comments · Fixed by #2237
Closed

Slim down the manifest.json artifact #2169

drewbanin opened this issue Feb 28, 2020 · 3 comments · Fixed by #2237
Labels
enhancement New feature or request

Comments

@drewbanin
Copy link
Contributor

Describe the feature

We've seen some pretty big manifests in the wild. The size of this file is necessarily going to scale linearly with the number of objects in your project + lines of code, but if there are obvious things we should do to slim down this file size, we should do them.

Ideas:

  • remove duplicated/unused sql statements (raw/injected/wrapped - do we always need all of these? or are some no longer sensible?)
  • remove file_contents from docs nodes
  • remove raw_sql from macro nodes

Who will this benefit?

This should make websites (like the docs site) more responsive and faster to load

@drewbanin drewbanin added the enhancement New feature or request label Feb 28, 2020
@drewbanin drewbanin added this to the Octavius Catto milestone Feb 28, 2020
@beckjake
Copy link
Contributor

remove duplicated/unused sql statements (raw/injected/wrapped - do we always need all of these? or are some no longer sensible?)

#2220 removes wrapped_sql. Given that PR, I think removing injected_sql should be straightforward - we can just overwrite the old compiled_sql instead of setting injected_sql. Whenever that merges I'll rebase on it and get rid of injected as well.

remove file_contents from docs nodes

This was as straightforward as it seemed.

remove raw_sql from macro nodes

Currently we use this in the template caching code. I've remove it from parsed macros by making the cache operate on the macro_sql instead of the full file. I'm not sure what impact that will have on performance in the average dbt project (positive or negative). I think it might hurt error messages, too.

Some other things I thought of:

  • We should also consider if we want to flip the omit_none flag on the manifest.
  • We should find a way to minimize repetitive config blocks/node fields by omitting empty dicts/lists/objects. post-hooks: [] should just result in an omitted post-hooks config item, etc. Same goes for empty metadata, depends_on, column info, refs, ...
  • Seeds don't need sql-related fields
  • trim out path vs original_file_path vs root_path vs seed_file_path
    • root_path probably belongs somewhere else entirely in the manifest, in a project-name-to-project-metadata mapping. That would save some repetition!

@beckjake
Copy link
Contributor

beckjake commented Mar 24, 2020

Some preliminary work:

|-------------------------------|----------------------|----------------------|
|            desc               |     size (bytes)     | % of octavius-catto  |
|-------------------------------|----------------------|----------------------|
|   dev/octavius-catto          |        867056        |         100          |
|-------------------------------|----------------------|----------------------|
|  No file_contents or raw_sql  |        174359        |         20.1         |
|-------------------------------|----------------------|----------------------|
|   above, plus omit_none=True  |        170837        |         19.7         |
|-------------------------------|----------------------|----------------------|
| no file_contents or raw_sql,  |        172927        |         19.9         |
| rebased to #2220              |                      |                      |
|-------------------------------|----------------------|----------------------|
|   above, plus omit_none=True  |        169531        |         19.6         |
|-------------------------------|----------------------|----------------------|

The size comes from rm -rf target; dbt run && dbt docs generate && ls -l target/manifest.json

I spent a bit of time on the omitting defaults code - it was less helpful than I hoped and involved some breaking changes to how hologram serializes things, so that's probably a dead end.

Significant further changes are probably going to have really diminishing returns.

@drewbanin
Copy link
Contributor Author

@beckjake awesome! An 80% reduction from removing file_contents and raw_sql sounds like a really awesome start. I have some reservations about using omit_none=True on principal, so it's convenient that it doesn't provide a meaningful lift here :)

Can you open up a PR for this change?

@beckjake beckjake mentioned this issue Mar 24, 2020
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants