Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slim CI runs, via state comparison and deferred refs #2641

Closed
jtcohen6 opened this issue Jul 22, 2020 · 5 comments · Fixed by #2695
Closed

Slim CI runs, via state comparison and deferred refs #2641

jtcohen6 opened this issue Jul 22, 2020 · 5 comments · Fixed by #2695
Labels
enhancement New feature or request state Stateful selection (state:modified, defer)

Comments

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jul 22, 2020

Slim CI: MVP

For 0.18.0, we want to prioritize making the following use case possible:

dbt run -m state:modified+ --defer --state prod-target/

What does this do?

  1. --state prod-target/: Compare the current project against the manifest specified in the --state arg (Add pseudo selectors that select models based on artifact states #2465)
  2. state:modified+: Select the set of model nodes that are not present in the state manifest, or differ from the same-named model nodes in state manifest, and their children
  3. --defer: Compile any ref() in a selected model that points to a node not in the selection group by using the state manifest, instead of the default (target-informed) namespace (Defer to prod manifest for partial runs #2527)
  4. Run all selected models

In terms of the user-side setup required for the V1 of this, we expect:

  1. dbt Cloud will have a behind-the-scenes mechanism to accept user input (job_id) and surface up-to-date artifacts, either:
    • to a relative path that is predictable, stable, and git-ignored (e.g. .dbt-cloud/)
    • to an absolute path that is encoded in an env var
  2. In Airflow/CircleCI/etc, in the steps before running dbt, users write bash commands to grab the artifact files from wherever they are and copy them to wherever they need to be

The underlying constructs open a pandora's box of questions that we'll need to continue thinking through:

  • Deferring refs in ephemeral models? Is this a performance killer during compilation?
  • How do we reflect "deferral" in dbt artifacts? The run results should reflect that the model selected from the "deferred" namespace. What should the manifest do?
  • What about models that should never/always be deferred? We want to be wary of global flags to control behavior which users really want to control on a per-model basis (see: --full-refresh).
  • Are there cases where someone wants to compare state against one manifest and defer references to a different one?
  • Should dbt offer handy mechanisms to move artifacts around?

There's more thinking to do. We're not going to block a first version of this by worrying about all the cascading implications.

Related art

As a selector (#2640):

selectors:
  - name: delta_plus
     definition:
       method: state   # this would return an error if `--state` is not passed
       value: modified
       children: true

As a workflow (#1842):

workflows:
  - name: slim_ci
    steps:
      # assuming the prod manifest is already in a folder called `prod-target`
      - name: run_changed
        description: Run changed models, parents, children
        command:
          subcommand: run
          selector: delta_plus
          defer: true
          state: 'prod-target'
      - name: good_tests
        description: Run my good tests
        command:
          subcommand: test
          selector: delta_plus
          on_error: continue
      - name: generate_docs
        description: Make my docs
        command:
          subcommand: docs generate

This is effectively a Makefile, with a few added benefits:

  • Define dbt subcommands as dicts instead of CLI arg syntax
  • Control on_error behavior: continue to next workflow step, or stop the workflow altogether

I feel good about adding this functionality, and the syntax above feels reasonable. The big unknowns are (a) how people will use these, and (b) how dbt Cloud could compellingly hook into these. If possible, I'd like to add very initial support for workflows in 0.18.0 and leave it undocumented / call it out as liable to change.

@jtcohen6 jtcohen6 added the enhancement New feature or request label Jul 22, 2020
@jtcohen6 jtcohen6 added this to the Marian Anderson milestone Jul 22, 2020
@beckjake
Copy link
Contributor

beckjake commented Jul 22, 2020

For --defer and --state, does defer require state? Does --state just default to target/manifest.json if you provide a --defer?

How do we reflect "deferral" in dbt artifacts? The run results should reflect that the model selected from the "deferred" namespace. What should the manifest do?

If we don't mind destructive updates:
We can add a deferred flag to all nodes that defaults to False. The deferral managing code can update the manifest to clobber all the old nodes with the deferred ones, and flip the flag in the process.

What about models that should never/always be deferred? We want to be wary of global flags to control behavior which users really want to control on a per-model basis (see: --full-refresh).

I think full-refresh at least is fine. dbt will do the right thing in the models that are actually running, and will use good data from models that aren't (because presumably it ran successfully). Right? I don't see any reason why a model should never be deferred.

Control on_error behavior: continue to next workflow step, or stop the workflow altogether

we should offer resume next as an alias to continue so I can get that nice warm visual basic feeling

@jtcohen6
Copy link
Contributor Author

Thanks for taking a look so quickly @beckjake!

For --defer and --state, does defer require state? Does --state just default to target/manifest.json if you provide a --defer?

In the first version, let's require that --state have an explicit argument for any uses of the --defer flag or state: selectors. I'm mostly sure that it's desirable behavior for --state to point to target/ by default, but @drewbanin and I went in some mental circles trying to think through this.

If we don't mind destructive updates:
We can add a deferred flag to all nodes that defaults to False. The deferral managing code can update the manifest to clobber all the old nodes with the deferred ones, and flip the flag in the process.

Flipping a deferred flag is what I had in mind. The question is whether to clobber the old nodes, or to keep the original node and append the deferred node's path as an additional field. What if you deferred-run off a previous deferred run's manifest? I think clobbering is fine; I think things would work as expected. Since we're not supporting --defer or --state for dbt docs generate, none of this should impact documentation.

I don't see any reason why a model should never be deferred.

At this moment, I don't either. A member of the community may come up with a great reason, at which point we can adapt accordingly.

@beckjake
Copy link
Contributor

Here's a neat wrinkle that will require a manifest change!

Currently, dbt doesn't store what the database form of a node should look like as a string when it runs. That means that given an existing manifest, the best we can do with the existing manifest is to tell you what the stored node would have been, if you loaded it with the current project's config - in particular, quoting.

That's probably too restrictive! Instead, we should add a field to compiled nodes that indicates what the node's "resolved name" is. I'm pretty sure this can be filled in at compile-time, with something like str(adapter.Relation.create_from(config, node)).

Should that be part of this issue, or should we handle it separately?

@jtcohen6
Copy link
Contributor Author

Let's open a new issue! It sounds like that change could be quite self-contained, and that it would simplify #2643 significantly.

@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Aug 5, 2020

Custom error code if --state/state env var missing, to pick up in dbt Cloud. Let's do the same with --defer as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request state Stateful selection (state:modified, defer)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants