model blocks #184

jthandy · 2016-10-18T22:20:30Z

This issue presents a suggested alternative to the way that dbt code is currently organized.

Currently, every .sql file in the models directory is its own model. This proposal is to break that 1:1 relationship between files and models, and instead allow models to be defined similarly to how functions are defined in a programming language. For example:

def [model_1_name]

SELECT a, b
from table_c
where d is not null

def [model_2_name]

...

By itself, this change doesn't accomplish anything except for adding some light organization to a models directory. But the relaxation of the constraint of 1 file : 1 model allows users to create models very differently.

Currently, the 1:1 constraint encourages users to write models that are fairly long and complex, frequently 50-100 lines long or more. These models often contain several CTEs, each of which represents a separate data transformation. These CTEs, because they are wrapped within a larger model, cannot be tested and are not a part of the dependency graph. Plus, long code with multiple CTEs is harder to read, write, and maintain.

This file structure would allow model authors to decompose their models into multiple components, and then control each component as a part of the DAG. Each def is a model, but the default for all models is to be ephemeral, unless instructed otherwise. Ephemeral models can still be tested but are not materialized directly into the database. We still need to consider how this default interacts with model configuration scoping.

Additionally, authors would be able to provide common aliases for queries within that file using an import directive. The import directive is a convenient way for model authors to create a single alias for a table that is then used throughout the entire file. Imports would happen at the beginning of a file:

import {{ref('a_long_model_name')}} as table_c

def [model_1_name]

SELECT a, b
from table_c
where d is not null

def [model_2_name]

...

This import functionality will increase code quality and reduce lines of code, as currently model authors are writing verbose CTEs at the beginning of every model to provide this aliasing functionality.

The text was updated successfully, but these errors were encountered:

jthandy · 2016-10-18T23:05:03Z

/cc @drewbanin

jthandy · 2016-10-24T17:15:31Z

here are some notes on syntax after discussing this extensively over the past few days.

--how users create things:

create_test(args)
create_model(args)
create_macro(args)


--where args is:

args = [
  sql = "",
  materialized = "",
  ...
]

--syntax options follow


--HCL-like example

model "quickbooks_bill_payments" {
  name = "something_else"
  config {
  materialized = true
  }
  sql = """
    select *
    from ${model.quickbooks_billpayment_lines.table_name}
    where whatever
  """
}


--python-like example

def quickbooks_bill_payments (config=(args)) {

  with bill_payments as (

    select * from {{ref('quickbooks_bill_payments')}}

  ), billpayment_lines as (

    select * from {{ref('quickbooks_billpayment_lines')}}

  ), d1 as (

    select
      bp.id, bp.txn_date, bpl.amount, payment_account_id, ap.id as ap_id
    from bill_payments bp
      inner join billpayment_lines bpl on bp.id = bpl.bill_payment_id
      join (select id from {{ref('quickbooks_accounts')}} where type = 'Accounts Payable') ap
        on 1 = 1

  )

  select
    id, txn_date, amount, payment_account_id as account_id, 'credit' as transaction_type,
    'bill payment' as source
    {% if var('uses_classes') == "true" %}
      , null::bigint as class_id
    {% endif %}
  from
    d1

  union all

  select
    id, txn_date, amount, ap_id, 'debit', 'bill payment'

    {% if var('uses_classes') == "true" %}
      , null::bigint
    {% endif %}
  from
    d1

}

drewbanin · 2016-10-24T17:48:59Z

Sweet writeup -- thanks @jthandy

jthandy · 2016-12-02T20:06:54Z

Reasons we might want this:

type safety when passing data between packages
specific functions (like import)
managing scope
managing file count
more idiomatic override

jthandy · 2017-10-12T18:30:02Z

A quick update here: this is still very relevant. We've doubled down much harder on jinja since this issue was originally raised, so my guess is the most likely syntax would look something closer to this:

--jinja-like example

{% model quickbooks_bill_payments() %}

  with bill_payments as (

    select * from {{ref('quickbooks_bill_payments')}}

  ), billpayment_lines as (

    select * from {{ref('quickbooks_billpayment_lines')}}

  ), d1 as (

    select
      bp.id, bp.txn_date, bpl.amount, payment_account_id, ap.id as ap_id
    from bill_payments bp
      inner join billpayment_lines bpl on bp.id = bpl.bill_payment_id
      join (select id from {{ref('quickbooks_accounts')}} where type = 'Accounts Payable') ap
        on 1 = 1

  )

  select
    id, txn_date, amount, payment_account_id as account_id, 'credit' as transaction_type,
    'bill payment' as source
    {% if var('uses_classes') == "true" %}
      , null::bigint as class_id
    {% endif %}
  from
    d1

  union all

  select
    id, txn_date, amount, ap_id, 'debit', 'bill payment'

    {% if var('uses_classes') == "true" %}
      , null::bigint
    {% endif %}
  from
    d1

{% endmodel %}

clrcrl · 2019-01-16T20:57:39Z

A use case that came up today that I think is a good example of where the one-to-one paradigm breaks down:
A dbt user wants to create sanitized views over their fct_ tables, that excludes PII information (assume we have a list of known PII column names).
At the moment this would be achieved one of two ways:

creating an extra model for each fct_ table, leading to lots of redundant code
creating the view via a post-hook, which means that the objects don't show up in the DAG.

Just thought it worth adding this as a use-case to think through :)

drewbanin · 2019-07-25T03:33:16Z

cooked this up to explore what sub-tables (like @clrcrl is describing) could look like as custom materializations: https://gist.github.com/drewbanin/ea256260dcb8474414660d1c90d39e3f

github-actions · 2022-01-10T01:51:25Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

jthandy added the enhancement New feature or request label Jan 11, 2017

cmcarthur modified the milestone: Rethink DBT Syntax Jan 17, 2017

drewbanin removed this from the Rethink DBT Syntax milestone May 10, 2017

drewbanin changed the title ~~change the dbt file structure~~ model blocks Apr 26, 2018

drewbanin mentioned this issue Jul 12, 2019

dbt should throw a compilation error for shared resource paths #1606

Closed

drewbanin mentioned this issue Sep 26, 2019

partial parsing: support overlapping sets of files by parser #1778

Closed

drewbanin mentioned this issue Mar 31, 2020

Milestone: 1.0 #2277

Closed

11 tasks

drewbanin mentioned this issue Apr 8, 2020

Seed blocks #2276

Closed

jtcohen6 mentioned this issue Jun 24, 2021

Make default test-paths and analysis-paths configs consistent #2659

Closed

jtcohen6 mentioned this issue Jul 3, 2021

Make it possible for DBT to generate models from a piece of JINJA code #3469

Closed

yu-iskw pushed a commit to yu-iskw/dbt that referenced this issue Aug 17, 2021

Update readme with postgres note (dbt-labs#184)

01c525b

jtcohen6 mentioned this issue Oct 14, 2021

Er/3250 generic tests #4052

Merged

4 tasks

github-actions bot added the stale Issues that have gone stale label Jan 10, 2022

ismailsimsek mentioned this issue Jan 12, 2022

[Feature] Multi step materialization/ supporting Update and Delete #4313

Closed

1 task

github-actions bot closed this as completed Jan 17, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

model blocks #184

model blocks #184

jthandy commented Oct 18, 2016 •

edited

Loading

jthandy commented Oct 18, 2016

jthandy commented Oct 24, 2016 •

edited

Loading

drewbanin commented Oct 24, 2016

jthandy commented Dec 2, 2016

jthandy commented Oct 12, 2017 •

edited

Loading

clrcrl commented Jan 16, 2019

drewbanin commented Jul 25, 2019

github-actions bot commented Jan 10, 2022

model blocks #184

model blocks #184

Comments

jthandy commented Oct 18, 2016 • edited Loading

jthandy commented Oct 18, 2016

jthandy commented Oct 24, 2016 • edited Loading

drewbanin commented Oct 24, 2016

jthandy commented Dec 2, 2016

jthandy commented Oct 12, 2017 • edited Loading

clrcrl commented Jan 16, 2019

drewbanin commented Jul 25, 2019

github-actions bot commented Jan 10, 2022

jthandy commented Oct 18, 2016 •

edited

Loading

jthandy commented Oct 24, 2016 •

edited

Loading

jthandy commented Oct 12, 2017 •

edited

Loading