[CT-850] [Feature] Customizable and centralized paths for all dbt commands #5485

Closed
1 task done
yashbhatianyt opened this issue Jul 16, 2022 · 3 comments
Labels
cli enhancement New feature or request

Comments

@yashbhatianyt

Is this your first time opening an issue?

Describe the Feature

I am currently hosting dbt on Google Cloud Functions using Python. Although the implementation works well, Google Cloud Functions does not allow source files to be overwritten: they become read-only once the function is deployed. The issue I was facing was with the log files and the compiled SQL in the target directory that dbt writes during runs. GCF allows writes to only one path, /tmp.
This made me change the target, log, and package-install paths to /tmp in dbt_project.yml. However, I realized this only works for the dbt run command. All the other commands I have tested (e.g. debug, deps) still try to write to the original destinations (e.g. logs, target), and therefore fail.
I think having a centralized, customizable path for all such files would benefit the community and expand the places where dbt can be deployed.
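
For illustration, here is roughly what that change looks like in dbt_project.yml (a minimal sketch; key names as in dbt 1.x, project name is a placeholder):

name: 'my_project'
config-version: 2
version: '1.0.0'
# send everything dbt writes at run time to the only writable path on GCF
target-path: '/tmp/target'
log-path: '/tmp/dbt_logs'
packages-install-path: '/tmp/dbt_packages'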

Describe alternatives you've considered

The alternative I considered is packaging my dbt code in a Docker container and hosting it on Google Cloud Run. However, since my use case is not big enough yet, I can get by with the run command alone for now. I'm not sure whether AWS Lambda has the same restriction, but if it does, this feature could seriously expand where dbt code can be deployed.

Who will this benefit?

This will be extremely useful for lightweight dbt deployments that don't need containers.
When the workload is small and the SQL models are not heavy, this feature would allow a full, lightweight deployment of dbt on Google Cloud Functions. GCP adoption keeps growing, so having this feature would help a lot of companies.

Are you interested in contributing this feature?

Yes, I would love to help in any way I can

Anything else?

No response

@yashbhatianyt yashbhatianyt added enhancement New feature or request triage labels Jul 16, 2022
@github-actions github-actions bot changed the title [Feature] Customizable and centralized paths for all dbt commands [CT-850] [Feature] Customizable and centralized paths for all dbt commands Jul 16, 2022
@amirbtb

amirbtb commented Jul 20, 2022

Hey, I'm also using Google Cloud Functions to run dbt.
I had the same issue and, same as you, decided to set /tmp as the root folder for all files written by dbt.

Here is a subset of the settings in dbt_project.yml:

name: 'demo_dbt'
config-version: 2
version: '1.0.0'
# … other settings here ...
log-path: '/tmp/dbt_logs'
target-path: '/tmp/target' 
# … other settings here ...

I am sorry, but I don't understand what you mean by "All the other commands I have tested (e.g. debug, deps) still try to write to the original destinations (e.g. logs, target), and therefore fail."

I run dbt deps before I build and deploy the Cloud Function artifact (the .zip file deployed to GCS), so all dependencies are already present in the Cloud Function at run time. That reduces the execution time to the duration of the dbt build command, and it would also fix the issue you are having with dbt deps in the Cloud Function.
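
Roughly, the build-and-deploy flow looks like this (a sketch only; the gcloud flags and names here are illustrative, not my exact setup):

# install packages into the project before packaging it, so dbt deps
# never has to run inside the read-only Cloud Function filesystem
dbt deps --project-dir ./demo_dbt

# deploy the project (including dbt_packages/) as the function source
gcloud functions deploy run-dbt \
  --runtime python39 \
  --trigger-http \
  --source ./demo_dbt \
  --entry-point main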

About dbt debug: from my understanding it does not produce any artifacts, so you should not run into write-permission issues there.
I guess my current setup works well only because I use nothing but the dbt build command. I may run into issues similar to yours if I start using other dbt commands in the Cloud Function.
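
For context, the function entry point is roughly this shape (a minimal sketch, assuming an HTTP-triggered Python function that shells out to the dbt CLI installed via requirements.txt; names and paths are illustrative):

import subprocess

def main(request):
    # dbt_project.yml points log-path and target-path at /tmp, the only
    # writable location in the Cloud Functions runtime
    result = subprocess.run(
        ["dbt", "build", "--project-dir", ".", "--profiles-dir", "."],
        capture_output=True,
        text=True,
    )
    status = 200 if result.returncode == 0 else 500
    return result.stdout + result.stderr, status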

@yashbhatianyt
Author

I have done the same, setting '/tmp/dbt_logs' and so on, but these configurations only work for the dbt run command. When you run dbt debug, it still writes the logs to the logs folder inside the source code. So I wanted to know if there's any way to change that, because just setting log-path and target-path in dbt_project.yml does not work for the other commands.

@jtcohen6 jtcohen6 self-assigned this Aug 9, 2022
@jtcohen6 jtcohen6 added cli and removed triage labels Aug 20, 2022
@jtcohen6 jtcohen6 removed their assignment Aug 20, 2022
@jtcohen6
Contributor

@yashbhatianyt Thanks for opening, and sorry for the delay getting back to you!

I can't speak to the specific implementation you're pursuing around Google Cloud Functions / Google Cloud Run. But I definitely agree with the intent here.

We should:

  • Support TARGET_PATH, LOG_PATH, etc. as runtime configurations that can be set via CLI flag, env var, or user config
  • Support those configs for all commands, including dbt deps and dbt debug (see the sketch below for what that could look like)
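
For illustration, the end state might look something like this (hypothetical invocations; the exact flag and env-var names are not settled):

# as CLI flags, respected by every command
dbt debug --target-path /tmp/target --log-path /tmp/dbt_logs
dbt deps --target-path /tmp/target --log-path /tmp/dbt_logs

# or as environment variables
export DBT_TARGET_PATH=/tmp/target
export DBT_LOG_PATH=/tmp/dbt_logs
dbt build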

The good news is that we're currently underway with an initiative to refactor our CLI (#5527), part of which will be to rationalize which flags/configs are supported by which commands. I think ensuring that these universal path configs are universally respected is something that would fall into scope.

I'm going to close this issue in favor of the in-progress initiative for now. If we don't manage to resolve this over the next several months, we can revisit accordingly.
