-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve DX with just
#165
Improve DX with just
#165
Conversation
@wjhrdy here's our branch to collaborate on. I've left the core recipes as they are, but have expanded with heavy commenting/documentation in the file and allowing for people to choose their package manager and related config for that. We don't offer any official documentation from dbt Labs on using Let me know if this works for you, and feel free to push commits from here! Thanks again for the amazing work on this already, looking forward to seeing where we can go with this! Like all things, I'm very open to discussing and changing my mind about how to approach this 😸. We're trying to push people towards having per-project installs of dbt in a virtual environment, so this is in-keeping with that. |
Also @dbeatty10 if you want to get in on this, since you've also got the |
|
||
alias gmy := dbt-generate-model-yaml | ||
|
||
# Generate YAML config for any unconfigured SQL models. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a one line comment and move down to just above
dbt-generate-missing-yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise it doesn't show up when you list recipes
# Optionally accepts a parameter for folder to search in. | ||
default_folder := 'models' | ||
dbt-generate-missing-yaml folder=default_folder: | ||
@for sql_file in $(find {{folder}} -type f -name '*.sql'); do \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@for sql_file in $(find {{folder}} -type f -name '*.sql'); do \ | |
@for sql_file in $(find {{folder}} -type f -name '\*.sql' ! -name '\*\_v\[0-9\]\*.sql' ! -name '\*\_V\[0-9\]\*.sql'); do \\ |
this should be changed to acomidate the pattern of having versioned sql files with a singular yml definition.
@{{dbt}} run-operation codegen.generate_model_import_ctes --args '{"model_name": "{{model_name}}"}' > /tmp/{{model_name}}.tmpsql | ||
@awk '/with .* as \($/{p=1} p' /tmp/{{model_name}}.tmpsql | sed 's/^.*with/with/' > /tmp/temp{{model_name}} && mv /tmp/temp{{model_name}} {{generated_folder}}/{{model_name}}.sql | ||
@echo "SQL {{model_name}} updated in {{generated_folder}}/{{model_name}}.sql" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more likely a feature request for codegen its self but this is a feature I added to our justfile
# Generate docs blocks markdown for all unique columns in a folder. For making DRY documentation. Inspired by https://docs.getdbt.com/blog/generating-dynamic-docs-dbt | |
dbt-generate-column-docs-md folder=default_folder: | |
#!/usr/bin/env bash | |
models=$({{dbt}} ls -m {{folder}} -q --output name | xargs -I{} echo -n ' "{}",' | sed 's/,#$//') | |
models="[$models]" | |
model_name="_$(echo {{folder}} | cut -d'/' -f2- | tr '/' '_')" | |
{{dbt}} run-operation generate_model_yaml --args "{ \"include_data_types\": false, \"model_names\": $models}" > /tmp/${model_name}.tmpyml | |
awk '/models:/{p=1} p' /tmp/${model_name}.tmpyml > /tmp/temp${model_name} && mv /tmp/temp${model_name} {{folder}}/${model_name}.yml.tmp | |
grep ' \- name:' {{folder}}/${model_name}.yml.tmp | cut -c 15- | sort -u > {{folder}}/${model_name}.md.tmp | |
if [ ! -d "generated" ]; then \ | |
mkdir -p generated; \ | |
fi | |
if [ -f {{folder}}/${model_name}.md ]; then \ | |
merge=true; \ | |
output_folder="generated"; \ | |
else \ | |
output_folder="{{folder}}"; \ | |
fi | |
while read line; do | |
echo "{% docs column_${model_name}__${line} %}" >> ${output_folder}/${model_name}.md; | |
echo "" >> ${output_folder}/${model_name}.md; | |
echo "{% enddocs %}" >> ${output_folder}/${model_name}.md; | |
echo "" >> ${output_folder}/${model_name}.md; | |
done < {{folder}}/${model_name}.md.tmp | |
rm {{folder}}/${model_name}.md.tmp | |
rm {{folder}}/${model_name}.yml.tmp | |
echo "Docs blocks ${model_name} generated in ${output_folder}/${model_name}.md" | |
if [ "$merge" = true ] ; then | |
file1="${output_folder}/${model_name}.md" | |
file2="{{folder}}/${model_name}.md" | |
output="{{folder}}/${model_name}.md" | |
if [[ "$OSTYPE" == "darwin"* ]]; then | |
echo "Warning: If you are running this script on a Mac, please make sure you have gawk installed. You can install it using 'brew install gawk'" | |
awk_cmd="gawk" | |
else | |
awk_cmd="awk" | |
fi | |
$awk_cmd ' | |
/^{% docs / {tag=$0; next} | |
/^{% enddocs %}/ {a[tag]=doc; doc=""; next} | |
{doc=(doc==""?"":doc "\n") $0} | |
END {n = asorti(a, b); for (i=1; i<=n; i++) print b[i] "\n" a[b[i]] "\n{% enddocs %}\n"} | |
' "$file1" "$file2" > /tmp/${model_name}.md.tmp && mv /tmp/${model_name}.md.tmp $output | |
echo "Docs blocks ${model_name} merged in ${output}" | |
fi |
@echo "Model {{table}} generated in generated/stg_{{table}}.sql" | ||
|
||
alias gbase := dbt-generate-base-model | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably more useful for established projects but I use this all the time:
dbt-merge-model-yaml new_yaml old_yaml output: | |
#!/usr/bin/env python3 | |
import yaml | |
from collections import OrderedDict | |
import sys | |
import re | |
def selective_quote_presenter(dumper, data): | |
if re.search(r"[\'\{\}\[\],:]", data) or data == '': # Check if the string contains characters that are hard for yaml to parse | |
return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"') | |
return dumper.represent_scalar('tag:yaml.org,2002:str', data, style=None) | |
yaml.add_representer(str, selective_quote_presenter) | |
# Add a constructor for OrderedDict to the yaml module | |
def dict_constructor(loader, node): | |
return OrderedDict(loader.construct_pairs(node)) | |
yaml.add_constructor(yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, dict_constructor) | |
# Add a representer for OrderedDict to the Dumper class | |
def dict_representer(dumper, data): | |
return dumper.represent_mapping(yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, data.items()) | |
yaml.Dumper.add_representer(OrderedDict, dict_representer) | |
# Get the file names from command line arguments | |
generated = {{new_yaml}} | |
base = {{old_yaml}} | |
output_file = {{old_yaml}} | |
# Load the contents of the first file | |
with open(generated, 'r') as file: | |
data1 = yaml.load(file, Loader=yaml.Loader) | |
# Load the contents of the second file | |
with open(base, 'r') as file: | |
data2 = yaml.load(file, Loader=yaml.Loader) | |
# merge the two dictionaries | |
for model2 in data2['models']: | |
model1 = next((item for item in data1['models'] if item['name'] == model2['name']), None) | |
if model1: | |
merged_model = {**model1, **{k: model2[k] for k in model2.keys() - {'columns'}}} | |
model1.update(merged_model) | |
for column2 in model2.get('columns', []): | |
column1 = next((item for item in model1.get('columns', []) if item['name'] == column2['name']), None) | |
if column1: | |
merged_column = {**column1, **column2} | |
column1.update(merged_column) | |
else: | |
model1.get('columns', []).append(column2) | |
else: | |
data1['models'].append(model2) | |
model_keys_order = ['name', 'tags', 'description', 'docs', 'latest_version', 'deprecation_date', 'access', 'config', 'constraints', 'tests', 'columns', 'versions'] | |
column_keys_order = ['name', 'data_type', 'description', 'meta', 'quote', 'constraints', 'tests', 'tags'] | |
# After merging, sort the keys | |
for model in data1['models']: | |
model1 = OrderedDict(sorted(model.items(), key=lambda i: model_keys_order.index(i[0]))) | |
for column in model1.get('columns', []): | |
column1 = OrderedDict(sorted(column.items(), key=lambda i: column_keys_order.index(i[0]))) | |
column.clear() | |
column.update(column1) | |
model.clear() | |
model.update(model1) | |
# Assume `output` is the string containing your YAML output | |
output = yaml.dump(data1, Dumper=yaml.Dumper) | |
# Add a newline after each line starting with " description:" | |
output = re.sub(r"(?<!columns:\n)( - name:.*\n)", r"\n\1", output) | |
# Now `output` contains the modified YAML | |
# Write the merged data back to the output file | |
with open(output_file, 'w') as file: | |
file.write(output) | |
dbt-update-column-yaml folder=default_folder target=target_default: | |
#!/usr/bin/env bash | |
for yml_file in $(find {{folder}} -type f -name '*.yml' ! -name '_*'); do | |
model_name=${yml_file##*/} | |
model_name=${model_name%.yml} | |
just dbt-generate-model-yaml $model_name generated {{target}} | |
just dbt-merge-model-yaml generated/$model_name.yml $yml_file $yml_file | |
done | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few features and fixes. If you want me to do this in a different way please let me know
Hi Willy, I’m Jason - manager of the DX team here at dbt Labs. I want to thank you for all of the time and effort you’ve put into thinking through this PR and how it could improve dbt-codegen. I was running through the upcoming release with Winnie and seeing this PR made me realize two things:
For that reason, we’re going to close this as won’t do for now. It’s possible that in the future we might be looking to more holistically revamp the codegen package, at which time we’d be happy to pick this up with you. I want to apologize for having this happen at this stage in the process - typically the best time for us to mark something as won’t do as when the initial issue is created, not after you put in a bunch of work. That’s on me for not setting the bounds correctly on this codegen release - I hope you know that we really appreciate all the care and thought you’ve put into this. If I might make a suggestion, you’ve done a bunch of cool investigation here and this might make for an interesting #show-and-tell post on the dbt Community Forum - I think the folks there would really appreciate seeing the thinking you've done. Thank you again, Jason |
resolves #144
This is a:
All pull requests from community contributors should target the
main
branch (default).Description & motivation
codegen
is awesome, but the developer experience is not the best. Due to its aim of using dbt itself to template the code, and not comprising on any sketchy hacks to write files, you're left getting command line output that you need to copy and paste into files. Many people over the years (myself included) have wrapped codegen with shell scripts, Python scripts etc to solve for this, but largely these efforts have remained isolated and haven't made their way back into this repo.Thanks to some excellent work and thinking from @wjhrdy we're aiming to change that, starting with adding a robust set of
just
commands to the repo.Using
just
we can wrap and chain the macros with all the power of shell scripts (and even other scripting languages!)Checklist