
[CT-344] New context + RefResolver for compile_sql + execute_sql methods #4851

Closed
jtcohen6 opened this issue Mar 10, 2022 · 3 comments
Labels
enhancement New feature or request stale Issues that have gone stale

Comments

@jtcohen6
Contributor

jtcohen6 commented Mar 10, 2022

Motivation

#4641 (comment)

It should be possible to run this code as arbitrary SQL, via the compile_sql or run_sql methods of dbt.lib:

select * 
from {{ metrics.metric(
    metric_name='new_customers',
    grain='week',
    dimensions=['plan', 'country']
) }}

That should run successfully, without needing to manually specify -- depends_on: {{ ref('dim_customers') }}, even though this macro does indeed include a nested reference to dim_customers, which cannot be known at parse time.

If the same code were included in a model, it should still require the manual hint, since it now has implications for DAG ordering; otherwise it fails with "dbt was unable to infer all dependencies..."

Background

The compile_sql + execute_sql methods of dbt.lib render arbitrary dbt-SQL using the context produced by generate_runtime_model_context, which uses RuntimeRefResolver to resolve ref(). The basic codepath: task.sql.GenericSqlRunner.compile → compile_node → _compile_node → _create_node_context.

If RuntimeRefResolver finds a ref at runtime that the current node didn't capture as a dependency at parse time, it raises an error. When defined inside a DAG node, that makes sense! We ask users to explicitly specify refs, to ensure that the dependency is captured at parse time, and the DAG runs in the right order (docs). This all works well.

When compiling/running a block of arbitrary SQL, outside the context of a DAG-running task, it makes a lot less sense. The SQL block simply isn't a DAG node. To that end, we support OperationRefResolver, for dbt run-operation, which does not care about finding a ref it didn't expect to be there (by skipping validate).
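The distinction between the two resolvers can be sketched in miniature. Everything below is a hypothetical stand-in, not dbt-core's actual code: RuntimeStyleResolver plays the role of RuntimeRefResolver (validate every ref against parse-time dependencies), and OperationStyleResolver plays the role of OperationRefResolver (same lookup, validation skipped).

```python
# Toy illustration of the two ref-resolution policies described above.
# Class names and the "analytics" schema are illustrative assumptions.

class MissingDependencyError(Exception):
    pass


class RuntimeStyleResolver:
    """Validates that every ref() was captured at parse time."""

    def __init__(self, parse_time_deps):
        # Dependencies recorded for this node during parsing.
        self.parse_time_deps = set(parse_time_deps)

    def validate(self, ref_name):
        if ref_name not in self.parse_time_deps:
            raise MissingDependencyError(
                f"dbt was unable to infer all dependencies: missing {ref_name}"
            )

    def resolve(self, ref_name):
        self.validate(ref_name)
        return f"analytics.{ref_name}"


class OperationStyleResolver(RuntimeStyleResolver):
    """Same lookup, but validation is skipped entirely."""

    def validate(self, ref_name):
        # Arbitrary SQL isn't a DAG node, so an unexpected ref is fine.
        pass
```

Under this sketch, a ref that was never captured at parse time raises for the runtime-style resolver but resolves cleanly for the operation-style one.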

The catch: OperationRefResolver does not support ref() of ephemeral models:

# In operations, we can't ref() ephemeral nodes, because
# ParsedMacros do not support set_cte
raise_compiler_error(
    "Operations can not ref() ephemeral nodes, but {} is ephemeral".format(
        target_model.name
    ),
    self.model,
)

Operations haven't been able to ref() ephemeral models for a long time (the comment goes back to #2085). This has to do with the compilation behavior of ephemeral models—they're really pointers that get stuck onto the referencing model, to be compiled/run when that model is compiled/run. That doesn't work with ParsedMacro today.

That's an ongoing bummer for dbt run-operation, but it's actually okay in this case! The compile_sql + run_sql methods create CompiledSqlNode (node type: SqlOperation), which inherits from CompiledNode and does understand how to add CTEs.
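To make the CTE behavior concrete, here is a toy sketch of how an ephemeral model's SQL gets "stuck onto" the referencing query at compile time: the ephemeral SQL becomes a CTE and the ref is rewritten to the CTE's name. This assumes nothing about dbt-core's real compiler beyond the __dbt__cte__ naming visible in the compiled output later in this issue; the function is purely illustrative.

```python
# Illustrative sketch (not dbt-core internals) of ephemeral-model
# compilation: inject the ephemeral SQL as a CTE and point the
# referencing query at it.

def inject_ephemeral_cte(referencing_sql, eph_name, eph_sql):
    cte_id = f"__dbt__cte__{eph_name}"
    # Rewrite the literal ref() call to the CTE name (toy substitution).
    pattern = f"{{{{ ref('{eph_name}') }}}}"
    body = referencing_sql.replace(pattern, cte_id)
    return f"with {cte_id} as (\n{eph_sql}\n) {body}"
```

A node that "understands how to add CTEs" is one that can accumulate such injections; a ParsedMacro has nowhere to put them, which is exactly the gap described above.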

Implementation details

I think we could pick one of two paths here:

  1. OperationRefResolver should check the class/node type of self.model (or better yet, hasattr(self.model, "set_cte")) before assuming that ephemeral models cannot be referenced.
  2. The SqlOperation node type ("compile/run this arbitrary SQL") deserves its own provider context, which serves as a halfway point between the standard runtime for DAG nodes and the operation runtime for macros.
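Option 1 might look like the following sketch. All names here (MacroNode, SqlOperationNode, resolve_ephemeral_ref) are hypothetical stand-ins for ParsedMacro, CompiledSqlNode, and the resolver logic; the point is only that the gate becomes a capability check rather than a resolver-type check.

```python
# Hypothetical sketch of option 1: gate the "can't ref ephemeral" error
# on whether the current node can accept a CTE, via hasattr.

class MacroNode:
    """Stand-in for ParsedMacro: no set_cte support."""
    pass


class SqlOperationNode:
    """Stand-in for CompiledSqlNode: knows how to accumulate CTEs."""

    def __init__(self):
        self.extra_ctes = []

    def set_cte(self, cte_id, sql):
        self.extra_ctes.append((cte_id, sql))


def resolve_ephemeral_ref(model, target_name, target_sql):
    # Check the capability, not the node type.
    if not hasattr(model, "set_cte"):
        raise RuntimeError(
            f"Operations can not ref() ephemeral nodes, "
            f"but {target_name} is ephemeral"
        )
    cte_id = f"__dbt__cte__{target_name}"
    model.set_cte(cte_id, target_sql)
    return cte_id
```

With this shape, a SqlOperation node would silently pick up the CTE while a macro operation still gets the existing compiler error.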

In either case, _create_node_context needs to know when and how to create the alternate context. Plumbing that through will be the challenge (and testing this, of course, given that we don't currently have any test coverage in dbt-core for these lib methods). By way of demonstration, here's a janky first stab with minimal code changes: feb37c5. I made the change for both CompiledSqlNode + CompiledRPCNode, since they're very very similar, and dbt-rpc was easier for me to test locally.

Example

With an ephemeral model named ephemeral, and a macro like:

{% macro nested_ref() %}

  {% if execute %}
    select * from {{ ref('ephemeral') }}
  {% endif %}

{% endmacro %}

SQL block:

{{ nested_ref() }}

This works in neither a model nor a macro operation.

Using the RPC run_sql method, before the code change:

Compilation Error in rpc my_first_query (from remote system)
  dbt was unable to infer all dependencies for the model "my_first_query".
  This typically happens when ref() is placed within a conditional block.

  To fix this, add the following hint to the top of the model "my_first_query":

  -- depends_on: {{ ref('ephemeral') }}

  > in macro nested_ref (macros/select_from_eph.sql)
  > called by rpc my_first_query (from remote system)
  > called by rpc my_first_query (from remote system)

After the code change in feb37c5:

with __dbt__cte__ephemeral as (
    select 1 as id
) select * from __dbt__cte__ephemeral

Questions

Should the rule of thumb be: Any code you can run as arbitrary SQL, you could also stick verbatim in a model (DAG node)? Or any code you can run as arbitrary SQL, you could also run as a macro operation?

The proposed answer in this issue is neither: you could ref things without worrying about DAG implications (no dice for models), and you can ref ephemeral models (no dice for macro operations... until we fix that, someday).

Note: there are cases where this may feel awkward. The way the dbt Cloud IDE works today, when code is written in a model, and passed to compile SQL / preview data, it would work—and then would quickly fail, with the exact same code, in a subsequent dbt run. (That feels like an issue worth solving, by compiling that SQL as a model node, rather than a chunk of arbitrary SQL, for lots of other reasons too: dbt-labs/dbt-rpc#46, #3931)

@jtcohen6 jtcohen6 added enhancement New feature or request Team:Language labels Mar 10, 2022
@github-actions github-actions bot changed the title New context + RefResolver for compile_sql + execute_sql methods [CT-344] New context + RefResolver for compile_sql + execute_sql methods Mar 10, 2022
@jtcohen6
Contributor Author

From this week's edition of "DMing with @dave-connors-3":

It's going to be very very common that folks want models/snapshots built off one or more metrics queries. The current metrics package macros will break DAG lineage; we should seek to mend it, following the thrust of this proposal.

Is there any chance this problem is just solved by... [waving hands furiously] ...making it possible to ref metrics, as real DAG nodes? Then the dependency on the underlying model would be passed along, like any parent's parent?

That might solve half of the problem, but leave another half unsolved. Compiling metric code would need to work a bit more like ephemeral model compilation (gross!). And ephemeral models (as explained above) aren't exactly supported by the macro-like use case: "I'm just a visitor to this DAG, not a member."

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Sep 12, 2022
@github-actions
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.
