DBT v0.18 introduced data processed. Can DBT_ML do this too? #10

Closed
switzer opened this issue Oct 9, 2020 · 4 comments
Labels
question Further information is requested

Comments

@switzer

switzer commented Oct 9, 2020

Now in DBT 0.18, when executing dbt run, you get a message stating the amount of data processed, as follows:

09:27:32 | 4 of 19 OK created incremental model dbt_prod.my_table [MERGE (16.6m rows, 354.1 GB processed) in 48.58s]

When running a DBT_ML model, the message is similar to the following:

14:01:16 | 1 of 1 OK created model model dbt_prod.mdl_my_model................... [OK in 496.66s]

Can you add the amount of data processed as well, as is done in DBT?

@rbjerrum rbjerrum added the enhancement New feature or request label Oct 12, 2020
@rbjerrum
Collaborator

Thank you for opening this issue @switzer! Unfortunately, the bytes-processed figure is read from the google.cloud.bigquery.QueryJob object, which is not available in the Jinja context that packages have access to. To support this, the BigQuery adapter would need a condition on the CREATE_MODEL statement type when handling the response from BigQuery, as it already has for other statement types.
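To illustrate the kind of condition meant here, a minimal sketch follows. It is not the dbt-bigquery source: `FakeQueryJob`, `format_bytes`, and `build_status` are hypothetical names, and the stand-in class only mirrors the `statement_type`, `total_bytes_processed`, and `num_dml_affected_rows` attributes that a real `QueryJob` exposes, so the example runs without BigQuery credentials.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeQueryJob:
    """Hypothetical stand-in for google.cloud.bigquery.QueryJob (assumption)."""
    statement_type: Optional[str]
    total_bytes_processed: Optional[int] = None
    num_dml_affected_rows: Optional[int] = None


def format_bytes(num_bytes: int) -> str:
    """Render a byte count with 1024-based units (the unit labels are an assumption)."""
    value = float(num_bytes)
    for unit in ("Bytes", "KB", "MB", "GB"):
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} TB"


def build_status(job: FakeQueryJob) -> str:
    """Build a status string, branching on the statement type of the job."""
    if job.total_bytes_processed is None:
        return "OK"
    processed = f"{format_bytes(job.total_bytes_processed)} processed"
    if job.statement_type == "MERGE":
        # Existing style of branch: DML statements also report affected rows.
        return f"MERGE ({job.num_dml_affected_rows} rows, {processed})"
    if job.statement_type == "CREATE_MODEL":
        # The extra branch this issue asks for: BigQuery ML model creation.
        return f"CREATE MODEL ({processed})"
    return "OK"
```

With this sketch, `build_status(FakeQueryJob("CREATE_MODEL", 2 * 1024**3))` yields a status that includes the data processed instead of a bare `OK`.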

@jtcohen6 What do you think about making a change like this to the BigQuery adapter?

@rbjerrum rbjerrum added question Further information is requested and removed enhancement New feature or request labels Oct 12, 2020
@rbjerrum
Collaborator

Related to dbt-labs/dbt-core#2747, currently planned for dbt v0.19.

@jtcohen6
Contributor

the BigQuery adapter would need to have a condition on the CREATE_MODEL-statement type when handling the response from BigQuery like it is done for other statement types.

@rbjerrum I'd welcome this change to the dbt-bigquery plugin!

As you noted, in v0.19 we're seeking to implement a more generalizable solution for storing adapter-specific structured data in run_results.json. Right now, bytes processed is just stored as part of the status string. We still need to figure out the exact mechanism through which that information will be shared both to the artifact and CLI output.
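Because the figure currently lives only inside the status string, anyone who needs it as data has to parse it back out. A rough sketch of that workaround, assuming the status text looks like the log lines quoted in this thread (the function name `bytes_processed` and the regular expression are hypothetical, not part of dbt):

```python
import re
from typing import Optional, Tuple

# Matches fragments such as "354.1 GB processed" or "2.2 GiB processed".
_PROCESSED = re.compile(r"(\d+(?:\.\d+)?)\s*([KMGT]i?B|Bytes)\s+processed")


def bytes_processed(status: str) -> Optional[Tuple[float, str]]:
    """Return (amount, unit) from a status string, or None if it is absent."""
    match = _PROCESSED.search(status)
    if match is None:
        return None
    amount, unit = match.groups()
    # The unit is returned as written; no conversion to raw bytes is attempted.
    return float(amount), unit
```

For example, `bytes_processed("MERGE (16.6m rows, 354.1 GB processed)")` returns `(354.1, "GB")`, while a plain `"OK"` status returns `None`. Parsing free text like this is brittle, which is the motivation for storing the value as structured data.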

@switzer
Author

switzer commented Sep 8, 2024

Note: as you say, this has changed in the latest version of DBT. My model build now responds as follows:

1 of 1 OK created sql model model dbt_dev_ml.my_ml_model .... [None (2.2 GiB processed) in 160.37s]

Closing this issue.

@switzer switzer closed this as completed Sep 8, 2024