To run this example you will need
- docker
- kubectl (to manage kubernetes)
- kind (Kubernetes in docker)
- helm (package manager for Kubernetes)
Make sure that docker works:
docker version
You will need the SYNQ_TOKEN to send dbt output to Synq.io
If you don't have a kind cluster running locally, create one:
kind create cluster
kubectl cluster-info --context kind-kind
In the next step we will install Airflow into our local Kubernetes cluster. We will do that with Helm, using the airflow-helm chart: https://github.com/airflow-helm/charts/tree/main/charts/airflow .
As part of the installation, 2 additional Python packages are installed:
- airflow-dbt, which provides the Dbt* operators
- dbt-postgres, which provides the dbt command and support for Postgres
Before you install it, you might want to edit the Helm values.yml file. The installation is configured so that it periodically pulls this repository via git from https://github.com/getsynq/synq-dbt-airflow.git and adds it to the dags folder.
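The relevant part of values.yml might look roughly like the sketch below. This assumes the airflow-helm chart's git-sync options; the branch name and sync interval here are illustrative, so check the chart's documentation for the exact keys and defaults.

```yaml
dags:
  gitSync:
    enabled: true
    repo: "https://github.com/getsynq/synq-dbt-airflow.git"
    branch: "main"      # assumed branch name
    revision: "HEAD"
    syncWait: 60        # seconds between syncs (assumed)
```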
# Add the airflow helm repository
helm repo add airflow-stable https://airflow-helm.github.io/charts
# Install/upgrade Airflow with helm using the values from values.yml
helm upgrade --install \
"$AIRFLOW_NAME" \
airflow-stable/airflow \
--namespace "airflow-dbt" \
--version "8.6.1" \
--values ./values.yml \
--create-namespace \
--wait
This can take a while (5 minutes).
To integrate with Synq, you have to install the synq-dbt wrapper from Synq: https://github.com/getsynq/synq-dbt
For production we recommend adding the synq-dbt binary to the Airflow image together with the other dependencies you need to run your DAGs. In the basic DbtPlugin example we use an Airflow image that has synq-dbt preinstalled.
In the advanced DbtPlugin example we have created a DAG that installs synq-dbt onto the worker, but the installation does not persist if the worker is restarted; it is triggered on every DAG run.
The synq-dbt wrapper needs the SYNQ_TOKEN environment variable to be set. The Airflow dbt plugin currently does not support passing environment variables via the Dbt* operators, so we have to set SYNQ_TOKEN in 2 places:
Firstly, set the variable SYNQ_TOKEN in Airflow. In the top navbar go to Admin -> Variables and add a new variable:
Secondly, set the token as an environment variable for the pods. This is only needed until the dbt Airflow plugin releases a new version; currently the DbtOperator does not pass through environment variables like SYNQ_TOKEN.
Edit the Helm values.yml file and update the token. Then upgrade the airflow release.
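The values.yml change might look roughly like this, assuming the airflow-helm chart's airflow.extraEnv list (key names per the chart's documentation; the token value is a placeholder):

```yaml
airflow:
  extraEnv:
    - name: SYNQ_TOKEN
      value: "<your-synq-token>"
```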
# Upgrade Airflow with helm using the values from values.yml
helm upgrade --install \
"$AIRFLOW_NAME" \
airflow-stable/airflow \
--namespace "airflow-dbt" \
--version "8.6.1" \
--values ./values.yml \
--create-namespace \
--wait
If you want to use synq-dbt with the Kubernetes operator, you have to add 2 things to the image/container that will be used in the Kubernetes Job/Pod:
- synq-dbt
- the dbt project
You also have to pass the SYNQ_TOKEN to the Kubernetes operator.
In the basic Kubernetes example we use a Docker image that contains both synq-dbt and the dbt project.
In the advanced example we install synq-dbt and git clone our dbt project in separate Kubernetes init containers.
To connect to Airflow in Kubernetes we have to port forward.
kubectl -n airflow-dbt port-forward service/airflow-web 8080
Open http://localhost:8080 in your browser.
The dbt project has 2 simple models that will create one table and one view in the dbt_example schema of the airflow database.
Click on the "play" button of the DAG you want to start and select Trigger DAG.
After a few seconds the DAG should complete successfully.
kubectl -n airflow-dbt port-forward service/airflow-postgresql 5432
You can now use your database client to inspect the database.
If you want to delete the whole setup, you just need to delete the kind cluster with:
kind delete cluster