Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BLOCKING] Define what values end up in the rendering context, and where #1255

Closed
beckjake opened this issue Jan 23, 2019 · 8 comments · Fixed by #2085
Closed

[BLOCKING] Define what values end up in the rendering context, and where #1255

beckjake opened this issue Jan 23, 2019 · 8 comments · Fixed by #2085
Assignees

Comments

@beckjake
Copy link
Contributor

Feature

Currently, rendering in dbt is pretty ad-hoc. We should define what is rendered, where, and with what context very explicitly. It might end up being complex or layered, but at least having it defined and somewhat consistent should be possible.

Feature description

The status quo:

  • description fields in schema.yml are rendered in a context that only supports doc
  • The dbt_project.yml rendering supports var and env_var, but does not render any hooks or vars fields (they're used elsewhere)
  • Fields in the sources section of schema.yml are rendered with var and env_var available.
  • Fields in the models section of schema.yml are generally not rendered at all, except descriptions as described above.
  • Models/analyses get a full context for rendering/execution

We should define some rules for these things and formally document them. I think a good start would be:

  • var and env_var are availble in all rendering contexts
  • doc is additionally available in description field rendering
  • models gets treated the same as sources in schema.yml
  • target is available in schema.yml - in dbt_project.yml it's kind of a dependency ordering issue, since we potentially use information in there to pick the target!

@drewbanin any feedback on other variables you'd expect to be available in schema.yml? It would be nice if we had some sort of user-facing documentation on what you can use and where. I've implemented a lot of this stuff and I find it pretty confusing myself...

Who will this benefit?

developers, end users, anyone who wants to understand what dbt is doing with their config.

@beckjake beckjake added this to the Stephen Girard milestone Jan 24, 2019
@cmcarthur
Copy link
Member

cmcarthur commented Jan 24, 2019

var and env_var are availble in all rendering contexts

I'm on board, but we need to document how var precedence and scoping works.

@beckjake
Copy link
Contributor Author

Yeah! A great example is: should vars defined or overridden in dbt_project.yml's models section influence the behavior of var statements in schema.yml's models section? What about for tests defiend in that models section? On the one hand, that could be really useful and has obvious applications. On the other hand, I think the results will be extremely unintuitive: Tests would not get vars from their target models, but description fields, etc would.

I would prefer to only support cli-supplied vars in most rendering contexts, and allow scoped var declarations in models and such only.

@drewbanin
Copy link
Contributor

@beckjake strong agree that we should document these contexts more clearly. I'm all for documenting the current state of things (even if it's dumb, unintuitive) but I don't think we should actually mutate any contexts right now.

If anything, I want to bias in favor of restricting the available context as much as possible (for now). We ran into this with var() usage in generate_schema_name(). That's one are where there was indeed a var variable in the context, but it was pretty useless! I suspect the same is true in other places throughout the codebase.

When we do document this, I want to document only the set of context variables that we're sure we can support well going forwards. That might mean things like not documenting ref inside of macros, which works today, but might not in the future. vars really throw me for a loop - their scoping makes no sense for nonstandard applications (what does var('abc') resolve to in an Analysis or a Custom Data Test?).

I'm very on board with documenting the craziness, but let's produce two sorts of descriptions:

  1. for dbt developers, indicating the state of the world as-is right now
  2. for dbt users, indicating how they should write code in their dbt projects

The differences between the two will point out where things are inconsistent/poorly defined. We should then queue up work to resolve these inconsistencies. @beckjake if you can do the first part, I can do the second part :)

@beckjake
Copy link
Contributor Author

beckjake commented Jan 24, 2019

That might mean things like not documenting ref inside of macros, which works today, but might not in the future.

refs work in macros because it's pure text substitution, not because we interpret them. I don't imagine that changing!

vars really throw me for a loop - their scoping makes no sense for nonstandard applications (what does var('abc') resolve to in an Analysis or a Custom Data Test?).

Without checking, pretty sure that if you've defined abc on the cli it will work and otherwise it'll complain that the variable doesn't exist (which is why I think a global-scoped vars in dbt_project.yml would not be a bad idea).

But I agree, that's why I think we should only support cli-defined vars in most of these places (it is super useful for tests, at a minimum) and only do the SourceConfig stuff in rendered models that support {{ config(...) }} calls.

I'll do my best to document the status quo over the next few days. As you indicate, var scoping is arcane at best!

@cmcarthur cmcarthur changed the title Define what values end up in the rendering context, and where [BLOCKING] Define what values end up in the rendering context, and where Feb 6, 2019
@drewbanin
Copy link
Contributor

drewbanin commented Mar 3, 2019

I just kicked this into Wilt Chamberlain. We should do it, but I just need to find some time to write everything down (and figure out a sensible format, since it's a lot of data!)

Would it be reasonable to unit test the context? I'm thinking we could write a contract, then use that contract to generate (even manually!) the docs. Otherwise, it's kind of a mess to keep everything straight everywhere.

This doesn't really handle things like the difference in var scoping across dbt_project.yml and models, but I think we can figure something out there...

@beckjake
Copy link
Contributor Author

beckjake commented Mar 4, 2019

Would it be reasonable to unit test the context? I'm thinking we could write a contract, then use that contract to generate (even manually!) the docs. Otherwise, it's kind of a mess to keep everything straight everywhere.

Yes, I think that's a fantastic idea!

@kconvey
Copy link
Contributor

kconvey commented Oct 29, 2019

Just a quick thought:

But I agree, that's why I think we should only support cli-defined vars in most of these places (it is super useful for tests, at a minimum) and only do the SourceConfig stuff in rendered models that support {{ config(...) }} calls.

I think this would be a really useful jumping off point for consistently giving at least a minimal scope to these edge cases you've mentioned, but is there any reason why anything cli-defined should not be included in this minimal scope? A real case where this could be useful is sources that are dependent on target (which may be redundant to mirror as a var to ensure schema.yml has this context).

Taking the sources example a step farther, it could be that a target or var(s) (specified at the command line) determine some logic / several other vars that in turn influence how sources should be handled. At present, doing something like this would require some pretty ugly Jinja embedded in schema.yml and / or more CLI variables whose logic you have to enforce elsewhere.

It would be extra nice if there was a way to package logic like this (macros?) that would depend only on what is cli-defined, and be in the scope of some of these edge cases. I've heard mentioned that there may be some work towards "pure" macros which don't have dependency on .yml, and could then be used in .yml. Is there an open issue / any update for something like this? Also curious to hear any update on this issue / any thoughts on broadening it to include anything CLI defined.

@beckjake
Copy link
Contributor Author

I think this would be a really useful jumping off point for consistently giving at least a minimal scope to these edge cases you've mentioned, but is there any reason why anything cli-defined should not be included in this minimal scope? A real case where this could be useful is sources that are dependent on target (which may be redundant to mirror as a var to ensure schema.yml has this context).

Yeah! I think a lot about this, target should be available almost everywhere. One nit: the target is not purely cli-defined: The dbt_project.yml specifies a profile that is overridden by the CLI. So we can't render all of dbt_project.yml with a simple rule that includes the target, since we need the profile field to determine what our target even is! This is usually where we get bogged down carving out exclusions and decide to deal with this later. Also, currently target in any context includes a config attribute, which we can't have until we've fully parsed+rendered dbt_project.yml, but people want the target to decide how dbt renders dbt_project.yml - so that probably has to change too!

Taking the sources example a step farther, it could be that a target or var(s) (specified at the command line) determine some logic / several other vars that in turn influence how sources should be handled. At present, doing something like this would require some pretty ugly Jinja embedded in schema.yml and / or more CLI variables whose logic you have to enforce elsewhere.

It would be extra nice if there was a way to package logic like this (macros?) that would depend only on what is cli-defined, and be in the scope of some of these edge cases. I've heard mentioned that there may be some work towards "pure" macros which don't have dependency on .yml, and could then be used in .yml. Is there an open issue / any update for something like this? Also curious to hear any update on this issue / any thoughts on broadening it to include anything CLI defined.

"pure macros" are an idea I bring up periodically during planning meetings, usually in the context of this issue and stuff like it... There's no issue for them because it's kind of a half-baked idea with a lot of tiny annoying aspects to consider. Off the top of my head:

  • Where do they live?
  • Do we distinguish them from non-pure macros? If so, how? If not, how do we decide what macros are available in what context?
  • What would be allowed in a "pure macro" beyond var, env_var, and target? The bigger your answer to this, the fewer fields you can specify using pure macros.
  • dbt's initial config parsing step is already quite complicated (don't even get me started on the hoops dbt debug has to jump through!) - how much worse will this make it?
  • How early in dbt's init process can we start calling pure macros, given the various answers we pick here?
  • Are there multiple tiers of pure-ness in macros, depending upon what they need? Is this really a capabilities-based system?
  • if pure macros can be used in dbt_project.yml (which seems like the point!), can you specify where to look for pure macros in dbt_project.yml?
  • What if I want to write a pure macro that tells dbt where to find pure macros based on the target? Obviously that can't be allowed (the same issue comes up with the profile field) - so how far does that go? Do we only allow pure macros in certain fields, and the *-path fields + the profile field are "extra-pure"? Can you render anything at all in these special fields, or not even CLI vars?

So yeah, there's my partial brain-dump on why this is so complicated! If you wrote up an issue for "pure macros" and specify what you think the answers to some of these questions should be, I'd love it! I can't guarantee we'd implement it, but user feedback would definitely help us make progress on this front.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants