Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

override seed column types when inferred to be of type agate_helper.ISODateTime #4282

Closed
1 task done
dataders opened this issue Nov 16, 2021 · 3 comments
Closed
1 task done
Labels
bug Something isn't working stale Issues that have gone stale Team:Adapters Issues designated for the adapter area of the code

Comments

@dataders
Copy link
Contributor

dataders commented Nov 16, 2021

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

this issue came up with dbt-firebolt because the JDBC driver doesn't like non-string bindings for prepared statements. My goal is to cast them all to strings.

the issue came up when using dbt-adapter-tests's base.csv. IIRC, this issue did not happen in dbt-core 0.20.0

line 49 of agate_helper.build_type_tester() (snippet below) results in an agate_table where some_date to be of type : ISODateTime. commenting out line 49 fixes this problem, as it allows the values in agate table to be overridden.

types = [
agate.data_types.Number(null_values=('null', '')),
agate.data_types.Date(null_values=('null', ''),
date_format='%Y-%m-%d'),
agate.data_types.DateTime(null_values=('null', ''),
datetime_format='%Y-%m-%d %H:%M:%S'),
ISODateTime(null_values=('null', '')),
agate.data_types.Boolean(true_values=('true',),
false_values=('false',),
null_values=('null', '')),
agate.data_types.Text(null_values=string_null_values)
]

it is insufficient to modify adapter.convert_datetime_type because that will only change the column's type, not the actual values. though, in my head, I thought that this would work.

@classmethod
def convert_datetime_type(cls, agate_table: agate.Table, col_idx: int) -> str:
    # there's an issue with timestamp currently
    return 'TEXT'

logging/diagnosis

logging set up

here's the log statements I added to seed.sql

{% macro firebolt__reset_csv_table(model,
                                   full_refresh,
                                   old_relation,
                                   agate_table) %}
    {{ log('agate_table col types: ' ~ agate_table.column_types, info=True) }}
    {{ log('agate_table schema: ' ~ agate_table.print_structure(), info=True) }}
    {{ log('agate_table vals: ' ~ agate_table.print_table(), info=True) }}
....
{% macro firebolt__load_csv_rows(model, agate_table) %}

  {% set batch_size = get_batch_size() %}

  {% set cols_sql = get_seed_column_quoted_csv(model, agate_table.column_names) %}
  {% set bindings = [] %}

  {% set statements = [] %}

  {% for chunk in agate_table.rows | batch(batch_size) %}
      {% set bindings = [] %}

      {% for row in chunk %}
            {{ log('row: ' ~ row, info=True) }}
          {% do bindings.extend(row) %}
      {% endfor %}

logging output

col_name: some_date
inferred_type: TEXT
type: TEXT
row: <agate.Row: (Decimal('1'), 'Easton', datetime.datetime(1981, 5, 20, 6, 46, 51))>

however, if I comment out line 49 from above, then not only is the column type changed to be of type TEXT but also the parameters in the INSERT INTO ... VALUES statement also become strings.

Expected Behavior

I expect that adapter.convert_datetime_type() would allow any seed column that's datetime-like to be overrided to the returned type. However, this does not work if the column type is inferred to be agate_helper.ISODateTime.

Environment

- OS: macOS `11.6.1`
- Python: `3.9.5`
- dbt: `0.21.1rc1`

What database are you using dbt with?

firebolt==0.20.2

@dataders dataders added bug Something isn't working triage labels Nov 16, 2021
@dataders dataders changed the title allow overriding of seed columns inferred to be of type agate_helper.ISODateTime override seed column types when inferred to be of type agate_helper.ISODateTime Nov 16, 2021
@jtcohen6
Copy link
Contributor

@swanderz Thanks for opening!

The code for ISODateTime dates back to #1920, which attacked some of the gnarliest earliest issues around agate type inheritance. I'm really hesitant to merge potentially breaking changes after we've released v1.0.0rc1.

dbt is very used to the idea that different adapters may have a different data type name corresponding to each agate/python data type. This is implemented via the convert_x_type adapter methods, which all feed into the convert_type adapter method. That then gets used in create_csv_table, to populate the empty table with the right inferred database types, based on the actual content that's about to be loaded. It can also be used to adjust the SQL in load_csv_rows, so that w explicitly cast each value, as it's being inserted, to match its specified/inferred type. dbt-spark has to do this, now that Spark 3 strictly enforces table schemas.

What's new and tricky about this issue is that the client/driver requires all binding parameters to be string type. That's much harder to work around, and the agate convert_x_type won't do much for us. My question is, wouldn't you see this issue for numeric types as well?

I do think there may be an entirely adapter-side workaround for this, which requires some clever materialization code:

{% macro firebolt__load_csv_rows(model, agate_table) %}
    ...
    {% for row in chunk %}
          {% set all_text_row = [] %}
          {% for value in row.values() %}
              {% do all_text_row.append(value | string) %}
          {% endfor %}
          {% do bindings.extend(all_text_row) %}
          
          {# here for debugging purposes only #}
          {{ log('row: ' ~ row, info=True) }}
          {{ log('all_text_row: ' ~ all_text_row, info=True) }}
      
      {% endfor %}
      ...

Then, here's what I see:

13:15:37 | [ info  ] | row: <agate.Row: (Decimal('1'), 'Easton', datetime.datetime(1981, 5, 20, 6, 46, 51))>
13:15:37 | [ info  ] | all_text_row: ['1', 'Easton', '1981-05-20 06:46:51']
13:15:37 | [ info  ] | row: <agate.Row: (Decimal('2'), 'Lillian', datetime.datetime(1978, 9, 3, 18, 10, 33))>
13:15:37 | [ info  ] | all_text_row: ['2', 'Lillian', '1978-09-03 18:10:33']
13:15:37 | [ info  ] | row: <agate.Row: (Decimal('3'), 'Jeremiah', datetime.datetime(1982, 3, 11, 3, 59, 51))>
13:15:37 | [ info  ] | all_text_row: ['3', 'Jeremiah', '1982-03-11 03:59:51']
13:15:37 | [ info  ] | row: <agate.Row: (Decimal('4'), 'Nolan', datetime.datetime(1976, 5, 6, 20, 21, 35))>
13:15:37 | [ info  ] | all_text_row: ['4', 'Nolan', '1976-05-06 20:21:35']
13:15:37 | [ info  ] | row: <agate.Row: (Decimal('5'), 'Hannah', datetime.datetime(1982, 6, 23, 5, 41, 26))>
13:15:37 | [ info  ] | all_text_row: ['5', 'Hannah', '1982-06-23 05:41:26']
13:15:37 | [ info  ] | row: <agate.Row: (Decimal('6'), 'Eleanor', datetime.datetime(1991, 8, 10, 23, 12, 21))>
13:15:37 | [ info  ] | all_text_row: ['6', 'Eleanor', '1991-08-10 23:12:21']
13:15:37 | [ info  ] | row: <agate.Row: (Decimal('7'), 'Lily', datetime.datetime(1971, 3, 29, 14, 58, 2))>
13:15:37 | [ info  ] | all_text_row: ['7', 'Lily', '1971-03-29 14:58:02']
13:15:37 | [ info  ] | row: <agate.Row: (Decimal('8'), 'Jonathan', datetime.datetime(1988, 2, 26, 2, 55, 24))>
13:15:37 | [ info  ] | all_text_row: ['8', 'Jonathan', '1988-02-26 02:55:24']
13:15:37 | [ info  ] | row: <agate.Row: (Decimal('9'), 'Adrian', datetime.datetime(1994, 2, 9, 13, 14, 23))>
13:15:37 | [ info  ] | all_text_row: ['9', 'Adrian', '1994-02-09 13:14:23']
13:15:37 | [ info  ] | row: <agate.Row: (Decimal('10'), 'Nora', datetime.datetime(1976, 3, 1, 16, 51, 39))>
13:15:37 | [ info  ] | all_text_row: ['10', 'Nora', '1976-03-01 16:51:39']

By feeding the text-only values into bindings, instead of the typed python objects, I'm hopeful that will make the JDBC driver. Could you give it a try, and let me know if it gets you what you need?


IIRC, this issue did not happen in dbt-core 0.20.0

It's possible that the changes to prevent agate from type-coercing query results played a role here. I'd be surprised, though; we didn't touch agate's type inference for seeds at all.

@jtcohen6 jtcohen6 added Team:Adapters Issues designated for the adapter area of the code and removed triage labels Nov 16, 2021
@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label May 18, 2022
@github-actions
Copy link
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working stale Issues that have gone stale Team:Adapters Issues designated for the adapter area of the code
Projects
None yet
Development

No branches or pull requests

2 participants