Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Dynamic Schemas / Columns #1374

Closed
jars opened this issue Mar 25, 2019 · 2 comments
Closed

Feature: Dynamic Schemas / Columns #1374

jars opened this issue Mar 25, 2019 · 2 comments
Labels
more_information_needed stale Issues that have gone stale

Comments

@jars
Copy link

jars commented Mar 25, 2019

Feature: Dynamic Schemas / Columns

Problem:

To illustrate, here's a artificial example. Assume I have a models already called dog and cat.

Also assume that the SQL required to query any animal type is very complex and is worthy of needing a macro for DRY reasons. Then, we would have a macro like:

macros/pet.sql:
{% macro pet(type) %}
  SELECT * FROM {{ref(type)}} 
  WHERE im_actually_a_super_complex_macro_needing_sql_statement IS TRUE
{% endmacro %}

And we would express our models like:

models/dog.sql:
{{ pet("dog") }}
models/cat.sql:
{{ pet("cat")

And finally, our schemas like:

models/dog.yml:
version: 2
models:
  - name: dog
    columns:
        - name: name
          description: 'the name of the dog'
        - name: age
          description: 'the age of the dog'
        - name: breed
          description: 'the breed of the dog'
models/cat.yml:
version: 2
models:
  - name: cat
    columns:
        - name: name
          description: 'the name of the cat'
        - name: age
          description: 'the age of the cat'
        - name: breed
          description: 'the breed of the cat'

Again, this is a contrived example, but the problem is magnified in larger projects. i.e. Both dog.yml and cat.yml files are nearly identical except the animal type. Leading to WET code.


Some Possible Solutions:

  1. Allow schema.yml files to be macroable:
models/dog.yml:
{{ schema_pet("dog") }}
models/cat.yml:
{{ schema_pet("cat") }}
macros/schema_pet.yml
{% macro schema_pet(type) %}
version: 2
models:
  - name: {{ type }}
    columns:
        - name: name
          description: 'the name of the {{ type }}'
        - name: age
          description: 'the age of the {{ type }}'
        - name: breed
          description: 'the breed of the {{ type }}'
{% endmacro %}

In this situation, the model may not know which schema belonged to it (barring file name parsing), so the models would also need to specify, I believe:

models/dog.sql:
{{ config(schema = 'dog') }}
{{ pet("dog") }}
models/cat.sql:
{{ config(schema = 'cat') }}
{{ pet("cat") }}

  1. Actually, the majority of the duplication occurs in the columns section of the yml file. I believe introducing a dynamic models.name parameter inside the schema.yml will cause a lot of complications, so even if this was static, but the rest of the file could be dynamic, this would be great.
macros/pet_columns.yml
{% macro pet_columns(type) %}
- name: name
  description: 'the name of the {{ type }}'
- name: age
  description: 'the age of the {{ type }}'
- name: breed
  description: 'the breed of the {{ type }}'
{% endmacro %}
models/dog.yml
version: 2
models:
  - name: dog
    columns: {{ pet_columns('dog') }}
models/cat.yml
version: 2
models:
  - name: cat
    columns: {{ pet_columns('cat') }}

The doc macro would be great for this too, however it's not possible to provide args to the docs/doc macro.

Forgive me if some of my suggestions are incompatible with the current dbt structure. I am new, and not familiar with how the internals work. These are only suggestions, and I welcome the feedback from the dbt masters!

@jars jars changed the title Feature Request: Dynamic Schemas / Columns Feature: Dynamic Schemas / Columns Mar 26, 2019
@drewbanin
Copy link
Contributor

drewbanin commented Mar 26, 2019

Hey @jordanarseno - cool ideas!

I'm picturing something that looks a lot like these suggestions, but may be a more tractable given how dbt parses schema.yml files. Some options:

  1. Provide a method to register model/column definitions

We can provide a method in the manifest that accepts a schema spec and incorporates it into the manifest. You could call this method from model code (or via a macro):

-- models/base/cat.sql

{% do add_pet_schema_columns('cat') %}

select ...

coupled with:

{% macro add_pet_schema_columns(type) %}

{% do manifest.set_column_schema(this, column='name', description='the name of the ' ~ type) %}
{% do manifest.set_column_schema(this, column='age', description='the age of the ' ~ type) %}
{% do manifest.set_column_schema(this, column='breed', description='the breed of the ' ~ type) %}

{% endmacro %}

We could also provide a higher-level method that configures many columns at once, eg:

{% macro add_pet_schema_columns(type) %}

{% do manifest.update_column_schemas(this,
  {
    "name": {"description": "...."},
    "age": {"description": "...."},
    "breed": {"description": "...."}
  }) %}

{% endmacro %}
  1. Provide arguments to docs blocks

This is a really good idea, and something that I think we're going to tackle implicitly via #1334. While you'll still need to enumerate all of the columns, you'll be able to pull in descriptions from templates docs blocks for sure.


So, this is a solvable problem, and the question becomes: do we want to do this? I think a feature like this would be valuable, but it will also add a bunch of complexity to the parsing code in dbt. Further, it will become more difficult to understand where certain model attributes are being configured from. This would be problematic if we eg. wanted to link to a schema.yml spec from dbt docs. While some descriptions (or tests) may be set in schema.yml files, others might be contributed from macros! I think the lack of provenance for these specifications is something that we should think really hard about when considering to prioritize a change like this.

Let me know what you think!

@drewbanin drewbanin added more_information_needed stale Issues that have gone stale labels Nov 26, 2019
@drewbanin
Copy link
Contributor

I agree that schema.yml files can be verbose to create and information is often repeated across schema specs. I don't think we'll want to leverage jinja to solve this problem, but it is very much an issue worth solving.

Closing this in favor of a more actionable issue, but I'm certainly very happy to re-open it for discussion if anyone feels strongly about this :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
more_information_needed stale Issues that have gone stale
Projects
None yet
Development

No branches or pull requests

2 participants