
Improve DX with just #165

Closed
wants to merge 6 commits into from

Conversation

@gwenwindflower (Contributor) commented Mar 8, 2024

resolves #144

This is a:

  • documentation update
  • bug fix with no breaking changes
  • new functionality
  • a breaking change

All pull requests from community contributors should target the main branch (default).

Description & motivation

codegen is awesome, but the developer experience is not the best. Because it aims to use dbt itself to template the code, and doesn't compromise with any sketchy hacks to write files, you're left with command line output that you have to copy and paste into files by hand. Many people over the years (myself included) have wrapped codegen with shell scripts, Python scripts, etc. to solve this, but those efforts have largely remained isolated and haven't made their way back into this repo.
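For anyone who hasn't felt this pain: the typical workaround is redirecting the macro's logged output into a file by hand (the source and table names below are just examples), then trimming dbt's log lines off the top:

dbt run-operation codegen.generate_base_model \
  --args '{"source_name": "jaffle_shop", "table_name": "customers"}' \
  > models/staging/stg_customers.sql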

Thanks to some excellent work and thinking from @wjhrdy, we're aiming to change that, starting with adding a robust set of just commands to the repo.

Using just, we can wrap and chain the macros with all the power of shell scripts (and even other scripting languages!).
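As a rough sketch of the flavor (the recipe and file names here are illustrative rather than the exact recipes in this branch), a recipe can run the macro and write the result straight to a file:

dbt := "dbt"

# Generate a staging model for a source table and write it straight to a file.
dbt-generate-base-model source table:
    @{{dbt}} run-operation codegen.generate_base_model \
        --args '{"source_name": "{{source}}", "table_name": "{{table}}"}' \
        > generated/stg_{{table}}.sql
    @echo "Model {{table}} generated in generated/stg_{{table}}.sql"

In practice the recipes in the branch also pipe the output through awk/sed to strip dbt's log preamble, as you can see in the diff excerpts below.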

Checklist

  • This code is associated with an issue which has been triaged and accepted for development.
  • I have verified that these changes work locally
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

@gwenwindflower (Contributor, Author) commented Mar 8, 2024

@wjhrdy here's our branch to collaborate on. I've left the core recipes as they are, but I've expanded the file with heavy commenting/documentation and added the ability to choose your package manager and the related config. We don't offer any official documentation from dbt Labs on using poetry with dbt Core, so I don't feel comfortable relying on that as the only path for these recipes to work, hence the added logic. I'm totally good with keeping poetry as an option, though, and I made sure to highlight in the comments why it's a good choice for this purpose. I haven't fully tested that this system works (it's just a draft for now), but it should let people use pip by default while still optionally going with poetry or uv (my personal preference).

Let me know if this works for you, and feel free to push commits from here! Thanks again for the amazing work on this already, looking forward to seeing where we can go with this!

As with all things, I'm very open to discussing and changing my mind about how to approach this 😸. We're trying to push people towards per-project installs of dbt in a virtual environment, so this is in keeping with that.
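The package-manager switch is roughly in this spirit (the variable, recipe, and file names here are illustrative, not necessarily what ends up in the branch):

# Pick pip (default), poetry, or uv via an environment variable.
pkg_manager := env_var_or_default("PKG_MANAGER", "pip")

# Install dbt and the dev requirements with whichever tool was chosen.
install:
    @if [ "{{pkg_manager}}" = "poetry" ]; then \
        poetry install; \
    elif [ "{{pkg_manager}}" = "uv" ]; then \
        uv pip install -r dev-requirements.txt; \
    else \
        python -m pip install -r dev-requirements.txt; \
    fi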

@gwenwindflower (Contributor, Author) commented:

Also @dbeatty10 if you want to get in on this, since you've also got the just bug!


alias gmy := dbt-generate-model-yaml

# Generate YAML config for any unconfigured SQL models.

This should be a one-line comment and should move down to just above dbt-generate-missing-yaml; otherwise it doesn't show up when you list recipes.
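For context: just uses the comment line directly above a recipe as that recipe's doc string, so once the comment is moved, the listing picks it up. Roughly (output format approximate):

$ just --list
Available recipes:
    dbt-generate-missing-yaml folder=default_folder # Generate YAML config for any unconfigured SQL models.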

# Optionally accepts a parameter for folder to search in.
default_folder := 'models'
dbt-generate-missing-yaml folder=default_folder:
@for sql_file in $(find {{folder}} -type f -name '*.sql'); do \

Suggested change
@for sql_file in $(find {{folder}} -type f -name '*.sql'); do \
@for sql_file in $(find {{folder}} -type f -name '*.sql' ! -name '*_v[0-9]*.sql' ! -name '*_V[0-9]*.sql'); do \

This should be changed to accommodate the pattern of having versioned SQL files with a single yml definition.
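A hypothetical layout to make the intent concrete:

models/staging/stg_orders_v1.sql    # skipped by ! -name '*_v[0-9]*.sql'
models/staging/stg_orders_v2.sql    # skipped as well
models/staging/stg_orders.yml       # the single yml definition covering both versions
models/staging/stg_payments.sql     # unversioned model, still picked up by the find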

@{{dbt}} run-operation codegen.generate_model_import_ctes --args '{"model_name": "{{model_name}}"}' > /tmp/{{model_name}}.tmpsql
@awk '/with .* as \($/{p=1} p' /tmp/{{model_name}}.tmpsql | sed 's/^.*with/with/' > /tmp/temp{{model_name}} && mv /tmp/temp{{model_name}} {{generated_folder}}/{{model_name}}.sql
@echo "SQL {{model_name}} updated in {{generated_folder}}/{{model_name}}.sql"

@wjhrdy commented Mar 22, 2024

This is more likely a feature request for codegen itself, but it's a feature I added to our justfile:

Suggested change
# Generate docs blocks markdown for all unique columns in a folder. For making DRY documentation. Inspired by https://docs.getdbt.com/blog/generating-dynamic-docs-dbt
dbt-generate-column-docs-md folder=default_folder:
    #!/usr/bin/env bash
    models=$({{dbt}} ls -m {{folder}} -q --output name | xargs -I{} echo -n ' "{}",' | sed 's/,#$//')
    models="[$models]"
    model_name="_$(echo {{folder}} | cut -d'/' -f2- | tr '/' '_')"
    {{dbt}} run-operation generate_model_yaml --args "{ \"include_data_types\": false, \"model_names\": $models}" > /tmp/${model_name}.tmpyml
    awk '/models:/{p=1} p' /tmp/${model_name}.tmpyml > /tmp/temp${model_name} && mv /tmp/temp${model_name} {{folder}}/${model_name}.yml.tmp
    grep ' \- name:' {{folder}}/${model_name}.yml.tmp | cut -c 15- | sort -u > {{folder}}/${model_name}.md.tmp
    if [ ! -d "generated" ]; then
        mkdir -p generated
    fi
    if [ -f {{folder}}/${model_name}.md ]; then
        merge=true
        output_folder="generated"
    else
        output_folder="{{folder}}"
    fi
    while read line; do
        echo "{% docs column_${model_name}__${line} %}" >> ${output_folder}/${model_name}.md
        echo "" >> ${output_folder}/${model_name}.md
        echo "{% enddocs %}" >> ${output_folder}/${model_name}.md
        echo "" >> ${output_folder}/${model_name}.md
    done < {{folder}}/${model_name}.md.tmp
    rm {{folder}}/${model_name}.md.tmp
    rm {{folder}}/${model_name}.yml.tmp
    echo "Docs blocks ${model_name} generated in ${output_folder}/${model_name}.md"
    if [ "$merge" = true ]; then
        file1="${output_folder}/${model_name}.md"
        file2="{{folder}}/${model_name}.md"
        output="{{folder}}/${model_name}.md"
        if [[ "$OSTYPE" == "darwin"* ]]; then
            echo "Warning: If you are running this script on a Mac, please make sure you have gawk installed. You can install it using 'brew install gawk'"
            awk_cmd="gawk"
        else
            awk_cmd="awk"
        fi
        $awk_cmd '
            /^{% docs / {tag=$0; next}
            /^{% enddocs %}/ {a[tag]=doc; doc=""; next}
            {doc=(doc==""?"":doc "\n") $0}
            END {n = asorti(a, b); for (i=1; i<=n; i++) print b[i] "\n" a[b[i]] "\n{% enddocs %}\n"}
        ' "$file1" "$file2" > /tmp/${model_name}.md.tmp && mv /tmp/${model_name}.md.tmp $output
        echo "Docs blocks ${model_name} merged in ${output}"
    fi
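A hypothetical run, to show the shape of what the recipe emits (folder and column names are illustrative):

$ just dbt-generate-column-docs-md models/staging
Docs blocks _staging generated in models/staging/_staging.md

$ cat models/staging/_staging.md
{% docs column__staging__customer_id %}

{% enddocs %}

{% docs column__staging__order_id %}

{% enddocs %}

The blank line between each pair of tags is where the actual description goes; the merge branch at the end folds newly generated blocks into an existing .md rather than clobbering it.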

@echo "Model {{table}} generated in generated/stg_{{table}}.sql"

alias gbase := dbt-generate-base-model


This is probably more useful for established projects, but I use this all the time:

Suggested change
dbt-merge-model-yaml new_yaml old_yaml output:
    #!/usr/bin/env python3
    import yaml
    from collections import OrderedDict
    import sys
    import re

    def selective_quote_presenter(dumper, data):
        # Quote strings that contain characters that are hard for YAML to parse
        if re.search(r"[\'\{\}\[\],:]", data) or data == '':
            return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style=None)
    yaml.add_representer(str, selective_quote_presenter)

    # Add a constructor for OrderedDict to the yaml module
    def dict_constructor(loader, node):
        return OrderedDict(loader.construct_pairs(node))
    yaml.add_constructor(yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, dict_constructor)

    # Add a representer for OrderedDict to the Dumper class
    def dict_representer(dumper, data):
        return dumper.represent_mapping(yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, data.items())
    yaml.Dumper.add_representer(OrderedDict, dict_representer)

    # Get the file names from the recipe's parameters (quoted so the substituted paths are Python strings)
    generated = "{{new_yaml}}"
    base = "{{old_yaml}}"
    output_file = "{{old_yaml}}"

    # Load the contents of the first file
    with open(generated, 'r') as file:
        data1 = yaml.load(file, Loader=yaml.Loader)
    # Load the contents of the second file
    with open(base, 'r') as file:
        data2 = yaml.load(file, Loader=yaml.Loader)

    # Merge the two dictionaries
    for model2 in data2['models']:
        model1 = next((item for item in data1['models'] if item['name'] == model2['name']), None)
        if model1:
            merged_model = {**model1, **{k: model2[k] for k in model2.keys() - {'columns'}}}
            model1.update(merged_model)
            for column2 in model2.get('columns', []):
                column1 = next((item for item in model1.get('columns', []) if item['name'] == column2['name']), None)
                if column1:
                    merged_column = {**column1, **column2}
                    column1.update(merged_column)
                else:
                    model1.get('columns', []).append(column2)
        else:
            data1['models'].append(model2)

    model_keys_order = ['name', 'tags', 'description', 'docs', 'latest_version', 'deprecation_date', 'access', 'config', 'constraints', 'tests', 'columns', 'versions']
    column_keys_order = ['name', 'data_type', 'description', 'meta', 'quote', 'constraints', 'tests', 'tags']
    # After merging, sort the keys
    for model in data1['models']:
        model1 = OrderedDict(sorted(model.items(), key=lambda i: model_keys_order.index(i[0])))
        for column in model1.get('columns', []):
            column1 = OrderedDict(sorted(column.items(), key=lambda i: column_keys_order.index(i[0])))
            column.clear()
            column.update(column1)
        model.clear()
        model.update(model1)

    # Dump the merged data, then add a blank line before each '- name:' entry
    # that doesn't immediately follow a 'columns:' line
    output = yaml.dump(data1, Dumper=yaml.Dumper)
    output = re.sub(r"(?<!columns:\n)( - name:.*\n)", r"\n\1", output)

    # Write the merged data back to the output file
    with open(output_file, 'w') as file:
        file.write(output)

dbt-update-column-yaml folder=default_folder target=target_default:
    #!/usr/bin/env bash
    for yml_file in $(find {{folder}} -type f -name '*.yml' ! -name '_*'); do
        model_name=${yml_file##*/}
        model_name=${model_name%.yml}
        just dbt-generate-model-yaml $model_name generated {{target}}
        just dbt-merge-model-yaml generated/$model_name.yml $yml_file $yml_file
    done
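A sketch of how the pair would be driven day to day (paths illustrative): dbt-update-column-yaml walks a folder, regenerates YAML for each model into generated/, then merges it back into the existing yml so hand-written descriptions and tests survive while newly added columns get picked up.

$ just dbt-update-column-yaml models/marts
# which, per model, amounts to roughly:
#   just dbt-generate-model-yaml orders generated <target>
#   just dbt-merge-model-yaml generated/orders.yml models/marts/orders.yml models/marts/orders.yml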

@wjhrdy left a comment:

I added a few features and fixes. If you want me to do this in a different way, please let me know.

@jasnonaz commented Apr 4, 2024

Hi Willy,

I’m Jason - manager of the DX team here at dbt Labs. I want to thank you for all of the time and effort you’ve put into thinking through this PR and how it could improve dbt-codegen.

I was running through the upcoming release with Winnie and seeing this PR made me realize two things:

  1. This is an extremely interesting angle to investigate
  2. This is a relatively large change to how dbt-codegen is used, one that most likely falls outside the scope of the upcoming minor release, which is mainly aimed at fixes and improvements that fit squarely within existing functionality

For that reason, we’re going to close this as won’t do for now. It’s possible that in the future we might be looking to more holistically revamp the codegen package, at which time we’d be happy to pick this up with you.

I want to apologize for having this happen at this stage in the process. Typically the best time for us to mark something as "won't do" is when the initial issue is created, not after you've put in a bunch of work. That's on me for not setting the bounds correctly on this codegen release. I hope you know that we really appreciate all the care and thought you've put into this.

If I might make a suggestion, you’ve done a bunch of cool investigation here and this might make for an interesting #show-and-tell post on the dbt Community Forum - I think the folks there would really appreciate seeing the thinking you've done.

Thank you again,

Jason

Successfully merging this pull request may close these issues.

Improving interface within a dbt repo using just