[CT-344] New context + RefResolver for compile_sql + execute_sql methods #4851
Comments
From this week's edition of "DMing with @dave-connors-3":

> It's going to be very, very common that folks want models/snapshots built off one or more metrics queries. The current […]
>
> Is there any chance this problem is just solved by... [waving hands furiously] ...making it possible to […]

That might solve half of the problem, but leave another half unsolved. Compiling metric code would need to work a bit more like ephemeral model compilation (gross!). And ephemeral models (as explained above) aren't exactly supported by the macro-like use case: "I'm just a visitor to this DAG, not a member."
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.
Motivation
#4641 (comment)
It should be possible to run this code as arbitrary SQL, via the `compile_sql` or `run_sql` methods of `dbt.lib`. That should run successfully, without needing to manually specify `-- depends on: {{ ref('dim_customers') }}`, even though this macro does indeed include a nested reference to `dim_customers`, which cannot be known at parse time.

If the same code were to be included in a model, it should require the manual reference, since it now has implications for DAG ordering: `dbt was unable to infer all dependencies...`
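For concreteness, the shape of code in question is a freestanding dbt-SQL block that calls a macro whose body nests a `ref`. A minimal hypothetical sketch (the macro name and body are illustrative, not the snippet from #4641):

```sql
-- macros/customers_subset.sql (hypothetical macro)
{% macro customers_subset() %}
    -- the nested ref is only visible once the macro renders
    select * from {{ ref('dim_customers') }}
{% endmacro %}
```

```sql
-- arbitrary SQL block passed to compile_sql / run_sql
select count(*) from ({{ customers_subset() }}) as customers
```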
Background
The `compile_sql` + `execute_sql` methods of `dbt.lib` render arbitrary dbt-SQL using the context produced by `generate_runtime_model_context`, which uses `RuntimeRefResolver` to resolve `ref()`. (The basic codepath: `task.sql.GenericSqlRunner.compile` → `compile_node` → `_compile_node` → `_create_node_context`.)
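For orientation, invoking these entry points looks roughly like this. The helper names and signatures shown are assumptions for illustration; the real interface lives in `core/dbt/lib.py` and may differ:

```python
# Hypothetical usage sketch: exact names/signatures in core/dbt/lib.py may differ.
from dbt import lib

config = lib.get_dbt_config("/path/to/project")  # assumed helper
manifest = lib.parse_to_manifest(config)         # assumed helper

# Compile a block of arbitrary dbt-SQL outside of any DAG node:
result = lib.compile_sql(manifest, config, "select * from {{ ref('dim_customers') }}")
```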
If `RuntimeRefResolver` finds a `ref` at runtime that the current node didn't capture as a dependency at parse time, it raises an error. When defined inside a DAG node, that makes sense! We ask users to explicitly specify refs, to ensure that the dependency is captured at parse time and the DAG runs in the right order (docs). This all works well.

When compiling/running a block of arbitrary SQL, outside the context of a DAG-running task, it makes a lot less sense. The SQL block simply isn't a DAG node. To that end, we support `OperationRefResolver`, for `dbt run-operation`, which does not care about finding a `ref` it didn't expect to be there (by skipping `validate`).

The catch: `OperationRefResolver` does not support `ref()` of ephemeral models:

dbt-core/core/dbt/context/providers.py, lines 474 to 481 in c251dae
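Paraphrased, the referenced lines look approximately like this (a sketch of that revision, not an exact copy):

```python
class OperationRefResolver(RuntimeRefResolver):
    def validate(self, resolved, target_name, target_package):
        # Operations don't register refs at parse time, so skip validation.
        pass

    def create_relation(self, target_model, name):
        if target_model.is_ephemeral_model:
            # Macros cannot attach injected CTEs, so ephemeral refs must error.
            raise_compiler_error(
                f"Operations can not ref() ephemeral nodes, "
                f"but {target_model.name} is ephemeral",
                self.model,
            )
        return super().create_relation(target_model, name)
```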
Operations haven't been able to `ref()` ephemeral models for a long time (the comment goes back to #2085). This has to do with the compilation behavior of ephemeral models: they're really pointers that get stuck onto the referencing model, to be compiled/run when that model is compiled/run. That doesn't work with `ParsedMacro` today.

That's an ongoing bummer for `dbt run-operation`, but it's actually okay in this case! The `compile_sql` + `run_sql` methods create `CompiledSqlNode` (node type: `SqlOperation`), which inherits from `CompiledNode` and does understand how to add CTEs.
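To illustrate what "adding CTEs" means here: when a node refs an ephemeral model, the compiler inlines the ephemeral model's SQL into the referencing node's compiled SQL as a CTE, roughly like this (the `__dbt__cte__` prefix is a dbt internal whose exact spelling varies by version):

```sql
-- illustrative compiled output of a node that refs the ephemeral model `ephemeral`
with __dbt__cte__ephemeral as (
    -- body of the ephemeral model, injected by the compiler
    select 1 as id
)
select * from __dbt__cte__ephemeral
```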
Implementation details

I think we could pick one of two paths here:

1. `OperationRefResolver` should check the class/node type of `self.model`, or better yet `hasattr(self.model, "set_cte")`, before assuming that ephemeral models cannot be referenced (see the first sketch below).
2. The `SqlOperation` node type ("compile/run this arbitrary SQL") deserves its own provider context, which serves as a halfway point between the standard runtime for DAG nodes and the operation runtime for macros (see the second sketch below).

In either case, `_create_node_context` needs to know when and how to create the alternate context. Plumbing that through will be the challenge (and testing this, of course, given that we don't currently have any test coverage in `dbt-core` for these `lib` methods). By way of demonstration, here's a janky first stab with minimal code changes: feb37c5. I made the change for both `CompiledSqlNode` + `CompiledRPCNode`, since they're very, very similar, and `dbt-rpc` was easier for me to test locally.
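A minimal sketch of the first path, assuming the check lands in `OperationRefResolver.create_relation` (this paraphrases the idea, not the actual diff in feb37c5):

```python
def create_relation(self, target_model, name):
    # Gate the error on capability rather than node type: anything that can
    # accept an injected CTE (e.g. a SqlOperation node, via CompiledNode.set_cte)
    # may ref ephemeral models; macros still cannot.
    can_take_ctes = hasattr(self.model, "set_cte")
    if target_model.is_ephemeral_model and not can_take_ctes:
        raise_compiler_error(
            f"Operations can not ref() ephemeral nodes, "
            f"but {target_model.name} is ephemeral",
            self.model,
        )
    return super().create_relation(target_model, name)
```

The second path might instead register a dedicated provider for the `SqlOperation` node type, in the style of the existing `Provider` classes in `providers.py` (the class and resolver names here are hypothetical):

```python
class SqlOperationProvider(RuntimeProvider):
    # Resolve refs leniently, like an operation, but in a context whose model
    # is a CompiledNode, so ephemeral CTEs can still be attached.
    ref = SqlOperationRefResolver  # hypothetical resolver
```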
Example

With an ephemeral model named `ephemeral`, a macro that refs it, and a SQL block that calls the macro:
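For instance (names are illustrative, not the exact snippets from the issue):

```sql
-- macros/ref_ephemeral.sql (illustrative)
{% macro ref_ephemeral() %}
    select * from {{ ref('ephemeral') }}
{% endmacro %}
```

```sql
-- arbitrary SQL block
select count(*) from ({{ ref_ephemeral() }}) as subq
```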
This works in neither a model nor a macro operation.
Using the RPC `run_sql` method, this errors before the code change, and succeeds after the code change in feb37c5.
Questions
Should the rule of thumb be: Any code you can run as arbitrary SQL, you could also stick verbatim in a model (DAG node)? Or any code you can run as arbitrary SQL, you could also run as a macro operation?
The proposed answer in this issue is neither: you can `ref` things without worrying about DAG implications (no dice for models), and you can `ref` ephemeral models (no dice for macro operations... until we fix that, someday).

Note: there are cases where this may feel awkward. The way the dbt Cloud IDE works today, when code is written in a model and passed to compile SQL / preview data, it would work, and then would quickly fail, with the exact same code, in a subsequent `dbt run`. (That feels like an issue worth solving, by compiling that SQL as a model node rather than a chunk of arbitrary SQL, for lots of other reasons too: dbt-labs/dbt-rpc#46, #3931)