Display BigQuery error stream when a load fails during dbt seed. #1079

Merged

joshtemple
Contributor

@joshtemple joshtemple commented Oct 22, 2018

Creates and raises a new exception, augmenting the errors attribute of the exception with the detailed error stream from the job object. This errors attribute is unpacked downstream by the handle_error method.

I tested this out with a toy CSV file, adding a leading comma in the header row to induce a BigQuery load API error.

Before this change, the error is displayed as follows:

Database Error in seed test (data/test.csv)
  Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the error stream for more details.

After this change, the full error details are included:

Runtime Error in seed test (data/test.csv)
  Runtime Error
    Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the error stream for more details.
    Error while reading data, error message: CSV table references column position 2, but line starting at position:11 contains only 2 columns.

Fixes #1076

@drewbanin
Contributor

drewbanin commented Oct 22, 2018

@joshtemple nice! I just tried to kick off tests for this PR, but I think GitHub is still working its way through webhooks. This provisionally looks good to me :)

Contributor

@beckjake beckjake left a comment

I like the general idea, but I do have concerns about the type(e)(...) pattern.

@@ -278,7 +278,8 @@ def poll_until_job_completes(cls, job, timeout):
             raise dbt.exceptions.RuntimeException("BigQuery Timeout Exceeded")

         elif job.error_result:
-            raise job.exception()
+            e = job.exception()
+            raise type(e)(message=e.message, errors=job.errors)
Contributor

I'm not sure about calling type to get a class object and just assuming that it works to call as a constructor. I mean, I know it's ok here, but job and job.exception() come from google, not us.

Is this interface (in particular, the fact that the __init__ of the exception class returned by job.exception() accepts an errors keyword argument) considered stable in any way?

I think I would prefer something like:

msg = '{}\n{}'.format(e.message, '\n'.join(str(err) for err in job.errors)).strip()
raise dbt.exceptions.RuntimeException(msg)

I haven't tested it, and I'm not 100% sure on the type of job.errors, but I assume something like that would work.
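
A minimal sketch of the suggestion above, assuming `job.errors` is a list of dicts that each carry a `message` key (this shape is an assumption and is worth checking against the installed BigQuery client). `RuntimeException`, `FakeJob`, `build_error_message`, and `raise_if_failed` are hypothetical stand-ins for illustration, not dbt or Google code.

```python
class RuntimeException(Exception):
    """Hypothetical stand-in for dbt.exceptions.RuntimeException."""


class FakeJob:
    """Hypothetical stand-in for a failed BigQuery load job."""

    def __init__(self, errors):
        # Assumed shape: a list of dicts, each with a 'message' key.
        self.errors = errors
        self.error_result = errors[0] if errors else None


def build_error_message(job):
    """Join every message in the job's error stream, one per line."""
    lines = []
    for err in job.errors or []:
        # Fall back to str() if an entry is not a dict with 'message'.
        text = err.get("message") if isinstance(err, dict) else None
        lines.append((text or str(err)).strip())
    return "\n".join(lines)


def raise_if_failed(job):
    """Raise a dbt-style exception carrying the full error stream."""
    if job.error_result:
        raise RuntimeException(build_error_message(job))


job = FakeJob([
    {"reason": "invalid", "message": "CSV table encountered too many errors."},
    {"reason": "invalid", "message": "CSV table references column position 2."},
])

try:
    raise_if_failed(job)
except RuntimeException as exc:
    print(exc)
```

Pulling out only the `message` field, rather than calling `str()` on each entry, keeps the output readable if the entries turn out to be dicts; the `str()` fallback covers the case where they are not.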

Contributor Author

@joshtemple joshtemple Oct 23, 2018

Yeah, totally fair, I went back and forth on that myself. In the end I decided not to hardcode a dbt exception since I wasn't sure about the implications of that downstream for logging. If you're more comfortable with raising a RuntimeException as you outlined, I'll change it.

Google API Errors inherit from a base class (GoogleAPICallError) that accepts errors and message as keyword arguments, so it should be safe to assume we can pass those args. Alternatively, we could hardcode a generic GoogleAPICallError exception (see here) or BadRequest (which is what is actually raised in this case) which would ensure we can pass those args, rather than using type.

What do you think?
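
To make the trade-off in this thread concrete, here is a small sketch built on hypothetical stand-in classes (`BaseAPIError`, `BadRequestLike`, `CustomError`) rather than the real Google exception hierarchy. It illustrates that `type(e)(message=..., errors=...)` works only while every subclass keeps the base constructor's keyword arguments, which is the assumption being debated.

```python
class BaseAPIError(Exception):
    """Hypothetical base class accepting message and errors keywords."""

    def __init__(self, message, errors=()):
        super().__init__(message)
        self.message = message
        self.errors = list(errors)


class BadRequestLike(BaseAPIError):
    """Subclass that inherits the base constructor unchanged."""


class CustomError(BaseAPIError):
    """Subclass whose constructor takes different arguments."""

    def __init__(self, code):
        super().__init__("failed with code {}".format(code))
        self.code = code


def rebuild_with_errors(exc, errors):
    """Re-create exc as the same type, attaching a detailed error list."""
    return type(exc)(message=exc.message, errors=errors)


details = [{"message": "line 2 contains only 2 columns"}]

# Works: the subclass still accepts message= and errors=.
ok = rebuild_with_errors(BadRequestLike("load failed"), details)
print(type(ok).__name__, ok.errors)

try:
    rebuild_with_errors(CustomError(400), details)
except TypeError as err:
    # CustomError.__init__ does not accept message/errors keywords.
    print("rebuild failed:", err)
```

Hardcoding a known class, or raising a project-owned exception, avoids depending on the constructor signature of whatever type the third-party call happens to return.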

Contributor

Unless it has a negative downstream impact (triggering the exception handler in the wrong way comes to mind), I would prefer to raise a dbt-native exception. At some point we'll convert it anyway for display, so we might as well get it done early.

@joshtemple
Contributor Author

Made the change. The only slight difference now is that the error message displays "Runtime Error" twice (see below) due to the way exception_handler works, but I see this happening in other places in the code anyway.

Runtime Error in seed test (data/test.csv)
  Runtime Error
    Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the error stream for more details.
    Error while reading data, error message: CSV table references column position 2, but line starting at position:11 contains only 2 columns.

@drewbanin
Contributor

woop woop! Nice work @joshtemple :) I'm going to let the tests run, and then will merge this in. This will go out in our 0.12.0 release!

@drewbanin drewbanin added this to the Guion Bluford milestone Oct 24, 2018
@drewbanin drewbanin merged commit 61af974 into dbt-labs:dev/guion-bluford Oct 24, 2018
@joshtemple joshtemple deleted the hotfix/bq-load-errormsg branch October 24, 2018 15:46