#1031 Add macro to add table description from schema.yml for BQ #1285

Merged 17 commits into dbt-labs:dev/wilt-chamberlain on Jun 12, 2019

Conversation

darrenhaken
Contributor

This is a WIP. @drewbanin, please take a look.

@darrenhaken
Contributor Author

This relates to #1031

@drewbanin (Contributor) left a comment

Left some comments, happy to discuss

Two review threads on plugins/bigquery/dbt/include/bigquery/macros/adapters.sql (outdated, resolved)
@darrenhaken
Contributor Author

@drewbanin I've made the changes you asked for, and I'll start testing it.

Also make changes to table_options macro based on testing against a
real project.
@darrenhaken
Contributor Author

@drewbanin I got this working now. I also added the table_options to apply to views as well as table relations.

Can you take a look and tell me what you think?

@drewbanin
Contributor

@darrenhaken just gave this a look and tried it out locally -- it's really great!!

I have a couple of small recommendations, but overall, I really like the way this is taking shape.

  1. Let's add persist_docs as a "known" config so that it can be specified in the models: section of dbt_project.yml. I think you can do that by adding it to this list. By adding it to ExtendDictFields, users can set these configs hierarchically (see the sketch after this comment), e.g.:
models:
  persist_docs:
    relation: true
  specific_package:
    columns: true # future

Adding descriptions is effectively "free" on BQ, but it will require extra queries in databases like Snowflake and Redshift. This is a pretty powerful thing to be able to do with a one line code change!

  2. Let's add a little love to dbt's error handling. Right now, if you provide an invalid value like
{{ config(persist_docs=true) }}

dbt will fail with an error like: 'bool object' has no attribute 'get'

I think instead, we can add some code to the table_options macro to handle invalid input. Maybe:

  {%- set raw_persist_docs = config.get('persist_docs', {}) -%}

  {%- if raw_persist_docs is mapping -%}
    {%- set raw_relation = raw_persist_docs.get('relation', false) -%}
    OPTIONS(
      {%- if raw_relation -%}
      description={{ model.description | tojson }}
      {% endif %}
    )
  {%- else -%}
    {{ exceptions.raise_compiler_error("Invalid value provided for 'persist_docs'. Expected dict but got value: " ~ raw_persist_docs) }}
  {% endif %}
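
For reference, with persist_docs={'relation': true} on a model whose schema.yml description is "Customer orders", the snippet above would render roughly the DDL below (a sketch only; the project, dataset, and model names are placeholders, and the exact statement depends on dbt's BigQuery create_table_as template):

    create or replace table `my-project`.`my_dataset`.`my_model`
    OPTIONS(
      description="Customer orders"
    )
    as (
      select 1 as id
    );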

So, this is looking really close! Happy to discuss :)
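
On point 1, a minimal sketch of what registering persist_docs as a known config might look like, assuming the known-config lists live on the SourceConfig class in core/dbt/parser/source_config.py (the other list members are elided here rather than guessed):

    class SourceConfig:
        ...
        ExtendDictFields = [
            # ... the existing dict-typed configs ...
            'persist_docs',
        ]

As noted above, configs in ExtendDictFields are merged as dicts (see the current.update(value) branch quoted later in this thread), which is what allows them to be set hierarchically in the models: section.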

@darrenhaken
Contributor Author

@drewbanin I've pushed changes based on your comments and now you can set persist_docs at a project level.

However, I've encountered an issue when testing out the error handling logic.
When I put {{ config(persist_docs=true) }} I get the following error:

 File "/Users/darren.haken/workarea/github/dbt/core/dbt/main.py", line 78, in main
    results, succeeded = handle_and_check(args)
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/main.py", line 152, in handle_and_check
    task, res = run_from_args(parsed)
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/main.py", line 208, in run_from_args
    results = run_from_task(task, cfg, parsed)
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/main.py", line 216, in run_from_task
    result = task.run()
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/task/runnable.py", line 242, in run
    self._runtime_initialize()
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/task/runnable.py", line 50, in _runtime_initialize
    self.manifest = load_manifest(self.config)
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/task/runnable.py", line 29, in load_manifest
    internal_manifest=internal_manifest)
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/loader.py", line 173, in load_all
    internal_manifest)
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/loader.py", line 166, in _load_from_projects
    loader.load(internal_manifest=internal_manifest)
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/loader.py", line 139, in load
    self._load_nodes()
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/loader.py", line 78, in _load_nodes
    self._load_sql_nodes(ModelParser, NodeType.Model, 'source_paths')
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/loader.py", line 43, in _load_sql_nodes
    **kwargs
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/parser/base_sql.py", line 63, in load_and_parse
    return self.parse_sql_nodes(result, tags)
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/parser/base_sql.py", line 82, in parse_sql_nodes
    node_parsed = self.parse_node(node, node_path, project, tags=tags)
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/parser/base.py", line 224, in parse_node
    self._render_with_context(parsed_node, config)
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/parser/base.py", line 160, in _render_with_context
    capture_macros=True)
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/clients/jinja.py", line 279, in get_rendered
    return render_template(template, ctx, node)
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/clients/jinja.py", line 266, in render_template
    return template.render(ctx)
  File "/Users/darren.haken/.virtualenvs/dbt/lib/python3.7/site-packages/jinja2/asyncsupport.py", line 76, in render
    return original_render(self, *args, **kwargs)
  File "/Users/darren.haken/.virtualenvs/dbt/lib/python3.7/site-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/Users/darren.haken/.virtualenvs/dbt/lib/python3.7/site-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/darren.haken/.virtualenvs/dbt/lib/python3.7/site-packages/jinja2/_compat.py", line 37, in reraise
    raise value.with_traceback(tb)
  File "dbt-7eb4c5e97b7bfc0d0e7e2cab", line 2, in top-level template code
    from jinja2.runtime import LoopContext, TemplateReference, Macro, Markup, TemplateRuntimeError, missing, concat, escape, markup_join, unicode_join, to_string, identity, TemplateNotFound, Namespace
  File "/Users/darren.haken/.virtualenvs/dbt/lib/python3.7/site-packages/jinja2/sandbox.py", line 427, in call
    return __context.call(__obj, *args, **kwargs)
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/context/parser.py", line 86, in __call__
    self.source_config.update_in_model_config(opts)
  File "/Users/darren.haken/workarea/github/dbt/core/dbt/parser/source_config.py", line 102, in update_in_model_config
    current.update(value)
TypeError: 'bool' object is not iterable

It looks like the error is coming from https://github.com/fishtown-analytics/dbt/blob/61c345955ee6f6a80504d8e9abef9e7267e38b0e/core/dbt/parser/source_config.py#L97

@drewbanin
Contributor

@darrenhaken good catch! This change might be a tiny bit outside the scope of your PR, but it would be a good thing to fix for sure.

This code confirms that the config collector is a dict. We could add another assertion here which ensures that the value var is also a dict. That might look like:

                if not isinstance(current, dict) or not isinstance(value, dict):
                    dbt.exceptions.raise_compiler_error(
                        'Invalid config field: "{}" must be a dict'.format(key)
                    )

That would result in an error message like:

$ dbt run
Running with dbt=0.13.0-a2
Encountered an error:
Compilation Error
  Invalid config field: "persist_docs" must be a dict

This isn't the best error message in the world, but it's certainly better than TypeError: 'bool' object is not iterable :)

@beckjake
Contributor

beckjake commented Feb 19, 2019

Quick thought here, I agree totally on fixing this, but it should almost certainly be more like:

try:
    current.update(value)
except (ValueError, TypeError):
    dbt.exceptions.raise_compiler_error(
        'Invalid config field: "{}" must be a dict'.format(key)
    )

There are a lot of values we might end up wanting to insert there that are not dicts but are still valid and convenient for update() calls, such as iterators of key/value pairs.

edit: I do agree that the initial isinstance(config, dict) check is useful/important though. I don't want tricky types there.

@darrenhaken
Contributor Author

darrenhaken commented Feb 19, 2019

@beckjake like this?

    def update_in_model_config(self, config):
        config = self._translate_adapter_aliases(config)
        for key, value in config.items():
            if key in self.AppendListFields:
                current = self.in_model_config.get(key, [])
                if not isinstance(value, (list, tuple)):
                    value = [value]
                current.extend(value)
                self.in_model_config[key] = current
            elif key in self.ExtendDictFields:
                current = self.in_model_config.get(key, {})
                if not isinstance(current, dict):
                    dbt.exceptions.raise_compiler_error(
                        'Invalid config field: "{}" must be a dict'.format(key)
                    )
                try:
                    current.update(value)
                except (ValueError, TypeError):
                    dbt.exceptions.raise_compiler_error(
                        'Invalid config field: "{}" must be a dict'.format(key)
                    )
                self.in_model_config[key] = current
            else:  # key in self.ClobberFields or self.AdapterSpecificConfigs
                self.in_model_config[key] = value

or will the try replace the if statement?

@beckjake
Contributor

@darrenhaken Yup! Assuming I'm not missing anything and it works ok, of course

@darrenhaken
Contributor Author

@beckjake I think it makes sense to just do the try without the if

This resolves the issue that `{{ config(persist_docs=true) }}`
would not raise a useful exception.
@darrenhaken
Contributor Author

OK, I've pushed; can you take a look, @drewbanin / @beckjake? I did some testing and it seems to work.

@beckjake
Contributor

If you skip the if block, I think you'll also want to add an AttributeError to the except block, to handle the case of config=None.

@darrenhaken
Contributor Author

@beckjake so this?

try:
    current.update(value)
except (ValueError, TypeError, AttributeError):
    dbt.exceptions.raise_compiler_error(
        'Invalid config field: "{}" must be a dict'.format(key)
    )

@darrenhaken
Contributor Author

@beckjake pushed, please review when you can

@beckjake
Contributor

It looks like you'll have to update the docs generate tests. I think you'll need to change this bit:
https://github.com/fishtown-analytics/dbt/blob/dev/stephen-girard/test/integration/029_docs_generate_tests/test_docs_generate.py#L785-L794 to include 'persist_docs': {},

Otherwise, this PR looks fine to me.
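
As a sketch of the kind of change that comment describes (the surrounding keys below are illustrative, not copied from the real test file), the expected model config in test_docs_generate.py would gain an empty persist_docs entry:

    expected_config = {
        'enabled': True,          # illustrative existing keys
        'materialized': 'view',
        'persist_docs': {},       # newly expected now that persist_docs is a known config
    }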

@darrenhaken
Contributor Author

darrenhaken commented Feb 19, 2019 via email

@drewbanin
Contributor

@darrenhaken I'd recommend just updating some of the BQ test models to persist their docs. Some good candidates are:

You'll also need to provide documentation for these models in a schema.yml file to make sure the docs are persisted. As long as the models run without error, I'd be ok with merging this.
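
As an illustration of the shape of that change (the model and file names here are hypothetical, not the actual test models in the repo), a model opting into persist_docs plus a matching schema.yml entry might look like:

    -- models/my_documented_model.sql  (hypothetical example model)
    {{ config(persist_docs={"relation": true}) }}

    select 1 as id

    # models/schema.yml
    version: 2
    models:
      - name: my_documented_model
        description: "This description should end up on the BigQuery table."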

@darrenhaken
Contributor Author

darrenhaken commented Feb 28, 2019

@drewbanin I've made the changes, but I wasn't sure how to run the test suite locally. Will the CI build pick these up?

Also if you can review that would be great

@drewbanin
Contributor

@darrenhaken looks like these tests are failing in AppVeyor: https://ci.appveyor.com/project/DrewBanin/dbt/builds/22720326

I think the big problem is just that persist_docs is now being supplied as a model config, but the test suite isn't expecting it. This is a little tricky, as I think there are a bunch of different places we'll need to update the model config dicts.

Here's one example of a place where the integration test isn't expecting to see a persist_docs dict, but gets one anyway: https://github.com/fishtown-analytics/dbt/blob/dev/stephen-girard/test/integration/029_docs_generate_tests/test_docs_generate.py#L864

I think the best way to proceed is to get the test suite running on BQ for you locally. Let's chat on Slack, happy to help you get set up! This should also be a helpful resource: https://github.com/fishtown-analytics/dbt/blob/dev/stephen-girard/CONTRIBUTING.md#testing

So, this is super close! Let me know how i can help get it over the line :)

@darrenhaken
Contributor Author

@drewbanin to test locally do I need to fill in all the BigQuery fields:

BIGQUERY_TYPE=
BIGQUERY_PROJECT_ID=
BIGQUERY_PRIVATE_KEY_ID=
BIGQUERY_PRIVATE_KEY=
BIGQUERY_CLIENT_EMAIL=
BIGQUERY_CLIENT_ID=
BIGQUERY_AUTH_URI=
BIGQUERY_TOKEN_URI=
BIGQUERY_AUTH_PROVIDER_X509_CERT_URL=
BIGQUERY_CLIENT_X509_CERT_URL=

@drewbanin
Contributor

@darrenhaken yep! We should fix this, but it's a function of how we need to set up our CI tools. Circle/AppVeyor et al let you put in Environment Variables, but I don't think we can just paste in a whole Service Account JSON file (I can check on this).

Pasting in the service account file keys one-by-one is kind of annoying, but once it's done you never have to think about it again!
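
A sketch of one way to avoid the copy/paste, assuming you have the downloaded key file locally (the service-account.json path and the idea of sourcing the generated export lines are hypothetical, not part of dbt's documented setup; the JSON field names are the standard GCP service account key fields):

    import json
    import shlex

    # Map the GCP service account key file onto the BIGQUERY_* variables
    # listed above, printing shell `export` lines you can eval or redirect
    # into an env file.
    with open("service-account.json") as f:  # hypothetical path to your key file
        creds = json.load(f)

    fields = [
        "type", "project_id", "private_key_id", "private_key",
        "client_email", "client_id", "auth_uri", "token_uri",
        "auth_provider_x509_cert_url", "client_x509_cert_url",
    ]

    for field in fields:
        # e.g. project_id -> export BIGQUERY_PROJECT_ID='...'
        print("export BIGQUERY_{}={}".format(field.upper(), shlex.quote(creds[field])))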

@darrenhaken
Contributor Author

@drewbanin Alright I'm giving it a shot now, I'll keep you posted 👍

@drewbanin changed the base branch from dev/stephen-girard to dev/wilt-chamberlain on March 27, 2019 at 14:31
@drewbanin
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@brandfocus

Can I do anything to help get this merged?

@darrenhaken
Contributor Author

@drewbanin?

@drewbanin
Contributor

drewbanin commented Jun 12, 2019

re-opened in #1532 to bring this up-to-date with recent dbt changes + test fixes

@beckjake merged commit 8270ef7 into dbt-labs:dev/wilt-chamberlain on Jun 12, 2019