Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding logic to unpivot to handle boolean columns. #208

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -556,6 +556,7 @@ Arguments:

#### unpivot ([source](macros/sql/unpivot.sql))
This macro "un-pivots" a table from wide format to long format. Functionality is similar to pandas [melt](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html) function.
Boolean values are replaced with the strings 'true'|'false'

Usage:
```
Expand Down
4 changes: 4 additions & 0 deletions integration_tests/data/sql/data_unpivot_bool.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
customer_id,created_at,status,segment,is_updated
123,2017-01-01,active,tier 1,TRUE
234,2017-02-01,active,tier 3,FALSE
567,2017-03-01,churned,tier 2,TRUE
10 changes: 10 additions & 0 deletions integration_tests/data/sql/data_unpivot_bool_expected.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
customer_id,created_at,prop,val
123,"2017-01-01","segment","tier 1"
123,"2017-01-01","status","active"
123,"2017-01-01","is_updated","true"
234,"2017-02-01","segment","tier 3"
234,"2017-02-01","status","active"
234,"2017-02-01","is_updated","false"
567,"2017-03-01","status","churned"
567,"2017-03-01","is_updated","true"
567,"2017-03-01","segment","tier 2"
5 changes: 5 additions & 0 deletions integration_tests/models/sql/schema.yml
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,11 @@ models:
- dbt_utils.equality:
compare_model: ref('data_unpivot_expected')

- name: test_unpivot_bool
tests:
- dbt_utils.equality:
compare_model: ref('data_unpivot_bool_expected')

- name: test_star
tests:
- dbt_utils.equality:
Expand Down
32 changes: 32 additions & 0 deletions integration_tests/models/sql/test_unpivot_bool.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@

-- snowflake messes with these tests pretty badly since the
-- output of the macro considers the casing of the source
-- table columns. Using some hacks here to get this to work,
-- but we should consider lowercasing the unpivot macro output
-- at some point in the future for consistency

{% if target.name == 'snowflake' %}
{% set exclude = ['CUSTOMER_ID', 'CREATED_AT'] %}
{% else %}
{% set exclude = ['customer_id', 'created_at'] %}
{% endif %}


select
customer_id,
created_at,
case
when '{{ target.name }}' = 'snowflake' then lower(prop)
else prop
end as prop,
val

from (
{{ dbt_utils.unpivot(
relation=ref('data_unpivot_bool'),
cast_to=dbt_utils.type_string(),
exclude=exclude,
field_name='prop',
value_name='val'
) }}
) as sbq
7 changes: 6 additions & 1 deletion macros/sql/unpivot.sql
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,12 @@ Arguments:
{%- endfor %}

cast('{{ col.column }}' as {{ dbt_utils.type_string() }}) as {{ field_name }},
cast({{ col.column }} as {{ cast_to }}) as {{ value_name }}
cast( {% if col.data_type == 'boolean' %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might want to make a cross-database adapter macro that renders this case when statement. That will let us use this logic in other macros. Additionally, I think that databases like BigQuery will use the type BOOL, not boolean (though I'd need to double-check this to be sure!). The adapter macro pattern will let us define custom macro implementations on a per-adapter basis here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I wrap this in a targtt=redshift test?
(I'm too lazy to connect to databases that I don't normally use to test. maybe other members have them ready.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope - I'd just add a file in this dir using a file like this one as a template. You can call this file something like bool_cast.sql. The default__ implementation can just return

cast({{ col }} as {{ type }})

and the Redshift implementation can use the case when ... statement you've shown here.

I actually did some more digging, and it looks like Redshift is the only database dbt supports which lacks the ability to cast a boolean to a string. Go figure.

Regardless, I'd rather encapsulate this logic in an adapter-macro than hardcode any if target == redshift logic in here. Code like that becomes a big mess real fast :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I wasn't familiar with that pattern.

oh, it looks like safe_cast
already exists, though I'm not 100% sure what the behavior is,
can you override an sql function with a macro?
or do you need to invoke it as .<macro_name>() ?
within the namespace of the package dbt_utils line 19 is ambiguous with a macro named the same as a function.
the fact that it isn't an error probably means that with no package qualifier the native function takes precedence.

case when {{ col.column }} then 'true' else 'false' end
{% else %}
{{ col.column }}
{% endif %}
as {{ cast_to }}) as {{ value_name }}

from {{ relation }}

Expand Down