Bigquery catalog generation (#830) #857

beckjake · 2018-07-17T12:58:40Z

Add catalog/manifest generation and tests for BigQuery, fixes #830. This one was easy, Drew already did the hard work in the issue comments!

Note that the commit history looks terrible because of some ugly branching and merging, but the diff is right.


cmcarthur · 2018-07-17T15:06:35Z

test/integration/029_docs_generate_tests/test_docs_generate.py

@@ -287,3 +287,63 @@ def test__snowflake__run_and_generate(self):

 self.verify_catalog(expected_catalog)
 self.verify_manifest()
+
+ @attr(type='bigquery')
+ def test__bigquery__run_and_generate(self):


you should add another test for a table with nested records. i'm not sure how to do that in the model SQL but i think this is potentially a difference in how BQ works vs. other warehouses.

see:

https://cloud.google.com/bigquery/docs/nested-repeated

https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#insert_examples

they don't have an example of inserting nested records, but they do have an example of an update with nested records. idk how this works with seed and/or dbt SELECT-based models

Really great point @cmcarthur!

I usually test with an array of structs, eg:

select [
    struct('Drew' as first_name, 'Banin' as last_name),
    struct('Connor' as first_name, 'McArthur' as last_name)
] as people

I don't think we'll be able to use a CSV file to construct nested or repeated records.

Regarding behavior, I think there are two options:

BigQuery exposes nested fields (ie. the structs above) as:

people RECORD (REPEATED)
people.first_name STRING
people.last_name STRING

We have a function to flatten the nested fields here: https://github.com/fishtown-analytics/dbt/blob/development/dbt/schema.py#L103
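To make the flattened shape concrete, here is a minimal sketch of the idea. It is not the code in dbt/schema.py: the `Field` class and this `flatten` function are hypothetical stand-ins written only for illustration, assuming each schema field carries a name, a type, a mode, and a list of child fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Hypothetical stand-in for a BigQuery schema field (not dbt's class)."""
    name: str
    field_type: str             # e.g. "STRING" or "RECORD"
    mode: str = "NULLABLE"      # "NULLABLE", "REQUIRED" or "REPEATED"
    fields: List["Field"] = field(default_factory=list)


def flatten(fields, prefix=""):
    """Return (dotted_name, type, mode) rows: each parent, then its children."""
    rows = []
    for f in fields:
        name = prefix + f.name
        rows.append((name, f.field_type, f.mode))
        if f.field_type == "RECORD":
            # Recurse into nested fields, prefixing them with the parent name.
            rows.extend(flatten(f.fields, prefix=name + "."))
    return rows


people = Field("people", "RECORD", "REPEATED", [
    Field("first_name", "STRING"),
    Field("last_name", "STRING"),
])

for name, field_type, mode in flatten([people]):
    suffix = " (REPEATED)" if mode == "REPEATED" else ""
    print(f"{name} {field_type}{suffix}")
```

With the `people` column above, this prints one row for the record itself and one row per nested field, which is the layout the catalog would show under this option.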

The other option is to just show a single field, but render out a complete type. For people above, that type would be:

ARRAY<STRUCT<first_name STRING, last_name STRING>>

There is new logic to build this type name in my bq-incremental-and-archive branch, over here: https://github.com/fishtown-analytics/dbt/blob/feature/bq-incremental-and-archive/dbt/schema.py#L170
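For comparison, a rough sketch of how a single complete type name could be built. This is not the logic from the branch: `render_type` and the `(name, type, mode, children)` tuples are hypothetical, chosen only to illustrate the recursion.

```python
def render_type(field_type, mode="NULLABLE", fields=()):
    """Build a type string from a hypothetical (name, type, mode, children) layout."""
    if field_type == "RECORD":
        # A record becomes a STRUCT listing each child as "name TYPE".
        inner = ", ".join(
            f"{name} {render_type(ftype, fmode, children)}"
            for name, ftype, fmode, children in fields
        )
        rendered = f"STRUCT<{inner}>"
    else:
        rendered = field_type
    if mode == "REPEATED":
        # A repeated field is an array of its element type.
        rendered = f"ARRAY<{rendered}>"
    return rendered


people_fields = [
    ("first_name", "STRING", "NULLABLE", ()),
    ("last_name", "STRING", "NULLABLE", ()),
]

print(render_type("RECORD", "REPEATED", people_fields))
# prints ARRAY<STRUCT<first_name STRING, last_name STRING>>
```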

I think the first option is probably preferable, but I'm open to discussion!

Ok, I think I've implemented this test and fixed the code successfully - good thing you asked, it definitely did not work and I fixed it. I went with @drewbanin's flatten() suggestion.

drewbanin · 2018-07-18T20:35:33Z

@beckjake

jwerderits added 19 commits July 11, 2018 09:13

Write out some SQL

046942b

Merge branch 'development' into snowflake-get-catalog

4921294

Add snowflake tests, rework existing postgres to use dbt seed + dbt run + dbt docs generate

22d8ad6

Add tests for dbt run generating a manifest

5d97937

Implement get_catalog for snowflake, move adapter-side logic into the default one

7832322

I never remember to run pep8

b1ab19f

Fix up paths so they make work in CI as well asl ocally

3bfa6bb

Merge branch 'development' into snowflake-get-catalog

85605dd

Remove misleading comment, explicity order postgres results like I did for snowflake

461da2f

Merge branch 'dev/isaac-asimov' into snowflake-get-catalog

0e3edf1

PR feedback: union all and dbt schemas only

85a4413

bigquery catalog/manifest support

d4e2cfd

pep8

1454572

Don't need to union anything here

d4597df

Merge branch 'snowflake-get-catalog' into bigquery-catalog-generation

94a4102

Windows path nonsense

26df721

Merge branch 'snowflake-get-catalog' into bigquery-catalog-generation

5d27b2b

Merge branch 'dev/isaac-asimov' into bigquery-catalog-generation

f5d5bbc

Remove extra line

7fb6e95

beckjake requested a review from drewbanin July 17, 2018 14:54

cmcarthur reviewed Jul 17, 2018


jwerderits added 6 commits July 17, 2018 09:19

Merge branch 'dev/isaac-asimov' into bigquery-catalog-generation

ad55c4c

Handle nested records properly in bigquery

694c508

Add a little decorator to do attr + use_profile

75f0a22

Add new BQ test for nested records, manifest not complete

e7a4641

ok, tests now work again

ae9ee71

update a docstring to be true again

251cb19

drewbanin approved these changes Jul 18, 2018


beckjake merged commit e5bc9c0 into dev/isaac-asimov Jul 18, 2018

beckjake deleted the bigquery-catalog-generation branch July 18, 2018 20:36

drewbanin mentioned this pull request Jul 19, 2018

catalog generation for bigquery #830

Closed
