Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write out manifest.json file when the RPC server is HUP'd #2168

Closed
drewbanin opened this issue Feb 28, 2020 · 6 comments · Fixed by #2232
Closed

Write out manifest.json file when the RPC server is HUP'd #2168

drewbanin opened this issue Feb 28, 2020 · 6 comments · Fixed by #2232
Labels
enhancement New feature or request rpc Issues related to dbt's RPC server

Comments

@drewbanin
Copy link
Contributor

Describe the feature

Use case: view the documentation site rendered from the state in the rpc server. Desired flow:

  • edit some model sql, documentation, tests, etc
  • HUP the server
  • reload the index.html file from the target/ dir and see the new docs

Challenges:

  • serializing the manifest can be quite slow. Is this an unavoidable cost, or is there something we can do to support this feature without adding a significant delay to server HUP time?
  • this should work well with partial parsing
  • starting or HUP'ing the server does not actually invoke a compile -- in order to achieve the flow stated above, we would in fact need to invoke a whole dbt compile!

Tagging @beckjake to confirm these challenges above in case I got anything wrong.

Describe alternatives you've considered

I think that the use case described above is absolutely something we should work to support in the future, but it might require us to make server reloading work meaningfully differently. A shorter path to supporting this type of use case could just be to make dbt invocations (like dbt compile, dbt run) write out the manifest (and other artifacts) at the end of the run. This is already supported for dbt docs generate.

Who will this benefit

This will allow us to support docs viewing in the dbt Cloud IDE, as well as a similar flow for anyone else hosting the RPC server in a dev environment.

@drewbanin drewbanin added enhancement New feature or request rpc Issues related to dbt's RPC server labels Feb 28, 2020
@drewbanin drewbanin added this to the Octavius Catto milestone Feb 28, 2020
@beckjake
Copy link
Contributor

serializing the manifest can be quite slow. Is this an unavoidable cost, or is there something we can do to support this feature without adding a significant delay to server HUP time?

On reload, the manifest is built in a background thread. We could just serialize it there too. I'm not sure what we'd do to make serializing not slow - I guess we could replace our current json.dumps with something faster: orjson looks like it's the current state of the art. I think a lot of the time and effort is spent on jinja-rendering (required) and converting data into json (obviously required...). We could write less data to the manifest file, or present an alternative manifest-slim.json, etc. We could even just try calling the to_dict method with the flag that tells it to skip serializing None values!

starting or HUP'ing the server does not actually invoke a compile -- in order to achieve the flow stated above, we would in fact need to invoke a whole dbt compile!

If you want the compiled SQL: Yeah.
If you don't care, we could serialize the parsed manifest before compiling (that's what dbt compile does).

@cmcarthur
Copy link
Member

cmcarthur commented Feb 28, 2020

can we try out orjson (specifically https:/ijl/orjson#dataclass), and see how it performs for manifest serialization before we make any big decisions here?

@beckjake if i remember correctly, the last time we tried to do this, it was specifically json serialization that was slow, and why we decided to use pickles for partial parsing. do you still have any JSON vs. pickle benchmark data we can look at?

if it works as advertised (40-50x speedup on dataclasses) then it seems like it could get us very far.

EDIT: I realize I may be confusing parseresult with compiled manifest... anyway still interested to see how orjson compares

@beckjake
Copy link
Contributor

Well, it says it's 40-50x faster than other json libraries that serialize dataclasses, who knows what those or or what they're up to. I'm not sure it supports all the stuff we use hologram for.

I don't have any benchmark data, I didn't save it. I'm pretty sure it was the json part is what was slow, though of course if you pickle you also get to bypass the hologram to_dict stuff, which is extra bonus. If memory serves, it was not even close for large-ish manifests. I think parse results will be comparable to manifests - the differences are pretty minor, really: parse results are just sort of proto-manifests.

@drewbanin
Copy link
Contributor Author

Let's make this an RPC method that returns the manifest from memory. TODO:

@beckjake
Copy link
Contributor

The manifest that the RPC server has in memory is a manifest of parsed nodes, not compiled - we want the compiled form, right?

@drewbanin
Copy link
Contributor Author

drewbanin commented Mar 20, 2020

Yep - we're going to want the compiled manifest for sure. It's ok if this is slow at first, I bet there will be compelling ways for us to make it faster in the future. For the moment, we should just make the functionality available in the dbt Server and then proceed from there in a future issue if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request rpc Issues related to dbt's RPC server
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants