Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add include_data_types argument to generate_model_yaml macro #122

Merged
merged 26 commits into from
Sep 25, 2023

Conversation

linbug
Copy link
Contributor

@linbug linbug commented Apr 7, 2023

resolves dbt-labs/dbt-core#120

Also:

  • updates the default include_data_types value for generate_source to True
  • lowers returned data types for generate_source and generate_model_yaml

This is a:

  • documentation update
  • bug fix with no breaking changes
  • new functionality
  • a breaking change

All pull requests from community contributors should target the main branch (default).

Description & motivation

Checklist

  • This code is associated with an issue which has been triaged and accepted for development.
  • I have verified that these changes work locally
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

@linbug linbug changed the title Add column data types Add include_data_types argument to generate_model_yaml macro Apr 8, 2023
@linbug linbug marked this pull request as ready for review April 14, 2023 16:58
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@linbug Thank you for picking this up! This will be a great addition for folks who start using model contracts in v1.5

{% if parent_column_name %}
{% set column_name = parent_column_name ~ "." ~ column.name %}
{% else %}
{% set column_name = column.name %}
{% endif %}

{% do model_yaml.append(' - name: ' ~ column_name | lower ) %}
{% if include_data_types %}
{% do model_yaml.append(' data_type: ' ~ column.data_type | lower) %}
Copy link
Contributor

@jtcohen6 jtcohen6 Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the adapter / data platform, we may want this to be column.dtype instead of column.data_type — I believe that will enable this to return text or varchar instead of character varying(1234).

We could use the dbt.format_column macro, which is also what's used when doing the column type assertion for contracted models.

{% set formatted = adapter.dispatch('format_column', 'dbt')(column) %}
{% do model_yaml.append('        data_type: ' ~ formatted['data_type'] | lower) %}

If we do make this change, the format_column macro is new in v1.5, so this would require a version bump to codegen and to its require-dbt-version

Copy link
Contributor

@dbeatty10 dbeatty10 Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtcohen6 good point about column.dtype vs. column.data_type!

Using dbt.format_column macro

I like your idea of using the logic within the dbt.format_column macro. I think we should aim to preserve the wide range of require-dbt-version: [">=1.0.0", "<2.0.0"] though.

There's multiple approaches we could take to use this logic into codegen without forcing an upgrade to 1.5.0. The quick'n'dirty option is to "vendor" it by just copy-pasting it into codegen. Another option is to inspect the dbt version, and fall back to a vendored version of the macro if it is less than 1.5.0.

Difference between dtype and data_type

My understanding of the difference between the two is dtype gives the database-specific data type with no size, scale, or precision (like varchar or decimal) whereas data_type is intended to give the database-specific data type with those included (like varchar(80) or decimal(18, 2)).

Although dbt model contracts can operate with either input, it will effectively ignore any size/precision/scale that is supplied.

So best would be to exclude size/precision/scale to avoid any false impressions that it will be verified as part of the dbt model contract.

💡 We should fix column.data_type so its value is always valid. It feels very doable to make it work across adapters, and relevant issues opened here, here, and here.

💡 Once column.data_type is fixed, we should consider expanding dbt.format_column() to include it as well.

Copy link
Contributor Author

@linbug linbug Apr 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so I added this which I feel should work, but for some reason my tests fail locally because formatted is empty. My dbt version is 1.5.0-b5 which is maybe pre-release and doesn't have dbt.format_columns yet? If you have suggestions for getting around this please let me know, otherwise I can use the vendored version for everything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbeatty10 @jtcohen6 any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dave-connors-3 @benmosher we may want to get this across the line now that generate model is live in the IDE. would be a good improvement to the functionality.

README.md Show resolved Hide resolved
@@ -62,7 +62,7 @@
{% for column in columns %}
{% do sources_yaml.append(' - name: ' ~ column.name | lower ) %}
{% if include_data_types %}
{% do sources_yaml.append(' data_type: ' ~ (column.data_type | upper ) ) %}
{% do sources_yaml.append(' data_type: ' ~ (column.data_type | lower ) ) %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, we may want to use .dtype instead of .data_type, depending on the data platform — and the format_column macro (new in v1.5) could be our friend here.

@linbug linbug requested review from jtcohen6 and dbeatty10 May 3, 2023 22:00
@xrpza
Copy link

xrpza commented Jul 20, 2023

Are you looking for additional contributors on this to help keep it moving? Happy to help however I can but obviously don't want to inject myself where I shouldn't.

@dave-connors-3
Copy link
Contributor

@linbug -- this looks great -- if you are able to resolve the conflicts, we can do a final review and release on this!

@jenna-jordan
Copy link

@linbug -- this looks great -- if you are able to resolve the conflicts, we can do a final review and release on this!

Jumping in as someone who has been watching the PR for over a month now - I'm so excited that this is finally going to be merged in! This is a feature I've been waiting for before our team implements contracts, thank you all so much for working on this!

@thomas-vl
Copy link

@linbug do you need any help getting this PR over the finish line? If you don't have time I would love to help!

@dbeatty10
Copy link
Contributor

I'm going to take a shot at resolving the merge conflicts and then do final review.

@gwenwindflower
Copy link
Contributor

@dbeatty10 you are a champion of life

@thomas-vl
Copy link

I'm going to take a shot at resolving the merge conflicts and then do final review.

Let me know if you need help.

Copy link
Contributor

@dbeatty10 dbeatty10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for contributing this @linbug!

My apologies to you and everyone else that have been waiting for this to be merged 🙏

I pushed two main changes:

  1. b2f463c - rely only on the "vendored" version of format_column (rather than using dbt.default__format_column)
    • comparing dbt versions robustly is more trouble than it was worth in this case, so this change greatly simplified things
  2. 6f6aea6 - allow end users to configure their own data type formatting by using multiple dispatch to override the data_type_format_source and/or data_type_format_model macros.

@dbeatty10 dbeatty10 merged commit da3731b into dbt-labs:main Sep 25, 2023
1 check passed
@linbug
Copy link
Contributor Author

linbug commented Oct 3, 2023

Apologies for missing the pings about resolving conflicts. Thank you for merging this!

@dbeatty10
Copy link
Contributor

No prob @linbug -- really appreciate the great work you did here! 🏆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants