Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize dbt-generated archival column names #251

Closed
jthandy opened this issue Dec 13, 2016 · 5 comments · Fixed by #1361
Closed

Standardize dbt-generated archival column names #251

jthandy opened this issue Dec 13, 2016 · 5 comments · Fixed by #1361
Labels
enhancement New feature or request snapshots Issues related to dbt's snapshot functionality

Comments

@jthandy
Copy link
Member

jthandy commented Dec 13, 2016

Enhancement

When looking at the output schema from dbt archive, it's difficult to tell dbt-generated fields from other fields. Most tools solve this by appending a suffix to table/fields names. I think we should do the same:

Archive column mapping:

  • dbt_updated_at (unchanged)
  • dbt_pk (unchanged)
  • valid_from --> dbt_valid_from
  • valid_to --> dbt_valid_to
  • scd_id --> dbt_scd_id

When we make this change, we should also unquote these identifiers (see #1167)

Migration

Hoo boy. Unlike models, we don't have the luxury of asking users to drop and recreate their archives. Here's a general strategy for migration:

create table database.schema.archive_table__dbt_upgrade as (

  select *,
    "valid_from" as dbt_valid_from,
    "valid_to" as dbt_valid_to,
    "scd_id" as dbt_scd_id

  from database.schema.archive_table

);

alter table database.schema.archive_table__dbt_upgrade drop column "valid_from";
alter table database.schema.archive_table__dbt_upgrade drop column "valid_to";
alter table database.schema.archive_table__dbt_upgrade drop column "scd_id";

-- fix incorrect quoting (snowflake only?)
alter table database.schema.archive_table__dbt_upgrade rename column "dbt_updated_at" to dbt_updated_at;
alter table database.schema.archive_table__dbt_upgrade rename column "dbt_pk" to dbt_pk;

-- swap old and new
alter table database.schema.archive_table rename to archive_table__dbt_backup;
alter table database.schema.archive_table__dbt_upgrade rename to archive_table;

I don't believe that this migration code is sensible to bake into the materialization itself. Instead, we may be able to leverage run-operations to implement database-aware migration code.

@jthandy jthandy added the enhancement New feature or request label Dec 13, 2016
@jthandy
Copy link
Member Author

jthandy commented Dec 13, 2016

@drewbanin agree?

@jthandy jthandy modified the milestone: Archival Updates Dec 13, 2016
@drewbanin
Copy link
Contributor

@jthandy yeah, totally agree. We do things like this in other places too. Mainly, in generating temp tables during the run process (eg. visitors__dbt_tmp).

It would be really good to differentiate between dbt-land and user-land in a consistent way

@drewbanin drewbanin self-assigned this Mar 10, 2017
@drewbanin
Copy link
Contributor

drewbanin commented Mar 13, 2017

@jthandy we can change this pretty easily, but we'd have to think through the migration process for existing dbt archive users

@drewbanin drewbanin modified the milestone: 0.8.2 Release May 19, 2017
@drewbanin drewbanin added the snapshots Issues related to dbt's snapshot functionality label Jul 11, 2018
@drewbanin drewbanin added this to the Wilt Chamberlain milestone Nov 28, 2018
@drewbanin drewbanin changed the title archive schema is somewhat confusing Standardize dbt-generated column names Mar 23, 2019
@drewbanin drewbanin changed the title Standardize dbt-generated column names Standardize dbt-generated archival column names Mar 23, 2019
@drewbanin drewbanin removed their assignment Mar 23, 2019
@clausherther
Copy link
Contributor

This is great, definitely makes sense to namespace the dbt-added columns. I don't think it'd be too much to ask users to migrate to the new format, although I can see how this could scare some folks since this might be the only source of this archived data for them. Is there any way to gracefully use the old schema and column names if someone hasn't upgraded yet? E.g. if you don't detect dbt_scd_id assume it's the old schema and the old (and quoted) column names?

@drewbanin
Copy link
Contributor

@clausherther we're changing a whole lot around archival in the Wilt Chamberlain release, so I'm inclined to make it a clean break i think. Really good docs + migration instructions coupled with some intelligence in dbt will set us up for success here long-term I think!

This is currently undocumented, but we added a command called dbt run-operation in 0.13.0. This task will run an arbitrary macro with arbitrary options. I imagine we can create a macro that performs this migration, and users can call it with:

dbt run-operation migrations.migrate_archive_v2 --args '{schema: archive_schema, table: archive_table}'

We can be smart about making a backup table, using a transaction, etc. I'd like to implement this in a way that will fail immediately if you try to use the new archive logic on an unmigrated archive table. All told, this scheme should help users leverage the new and compelling functionality coming to archival

yu-iskw pushed a commit to yu-iskw/dbt that referenced this issue Aug 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request snapshots Issues related to dbt's snapshot functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants