Use relative_path to determine whether big seeds are modified #2927
Comments
But then I confused myself again by reading
Update: I was right all along, except for when I was wrong about being wrong.
Describe the bug
It is possible for state:modified to produce different behavior on dbt CLI vs. dbt Cloud. Why?
When we compare big (>1 MB) seed files, which are too big to hash by contents efficiently, we instead store and compare a hash of the file path. (The operating principle: if it's a massive seed file, unless it's been renamed or moved around, we're just going to assume it's unchanged!) Today, that looks like:
https://github.com/fishtown-analytics/dbt/blob/34869fc2a2a354a18a232e21315c3901aafab0b6/core/dbt/contracts/files.py#L156-L161
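As a rough illustration of the mechanism described above (not the actual dbt code at that link), the sketch below picks a checksum strategy by file size: small seeds are hashed by contents, big seeds by their absolute path. The function name `seed_checksum`, the constant `MAX_HASHED_SEED_BYTES`, and the use of SHA-256 are assumptions made for the example; the issue only states that the cutoff is about 1 MB.

```python
import hashlib
import os
from typing import Tuple

# Assumed cutoff for illustration; the issue only says "big (>1 MB)".
MAX_HASHED_SEED_BYTES = 1 * 1024 * 1024


def seed_checksum(absolute_path: str) -> Tuple[str, str]:
    """Return (kind, hexdigest) for a seed file (hypothetical helper).

    Big files are identified by a hash of their absolute path,
    small files by a hash of their contents.
    """
    if os.path.getsize(absolute_path) > MAX_HASHED_SEED_BYTES:
        # Too big to hash efficiently: hash the path string instead.
        digest = hashlib.sha256(absolute_path.encode("utf-8")).hexdigest()
        return "path", digest
    with open(absolute_path, "rb") as handle:
        # Small enough: hash the file contents.
        return "sha256", hashlib.sha256(handle.read()).hexdigest()
```

Because the path branch hashes the absolute location, the same unchanged file yields a different checksum whenever the project is checked out somewhere else.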
Instead, we should use the relative_path, which handles the fact that, in deployment, files are regularly copied/cloned around and ultimately mounted from who-knows-where in S3.
This should be a one-line change, and it will require updating some tests.
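To make the difference concrete, here is a minimal sketch comparing the two approaches for one seed file seen from two checkouts. The directory names, the seed file name, and the helper `path_checksum` are all hypothetical, and SHA-256 is assumed only for the example; this is not dbt's implementation.

```python
import hashlib
from pathlib import PurePosixPath


def path_checksum(path: str) -> str:
    """Hash a path string (hypothetical helper, SHA-256 assumed)."""
    return hashlib.sha256(path.encode("utf-8")).hexdigest()


# Two hypothetical locations of the same project: a local clone and a
# deployment mount whose location differs from run to run.
checkout_a = PurePosixPath("/home/alice/my_project")
checkout_b = PurePosixPath("/mnt/run-1234/my_project")
seed = PurePosixPath("data/big_seed.csv")

# Hashing the absolute path ties the checksum to the checkout location.
absolute_a = path_checksum(str(checkout_a / seed))
absolute_b = path_checksum(str(checkout_b / seed))

# Hashing the path relative to the project root does not.
relative_a = path_checksum(str((checkout_a / seed).relative_to(checkout_a)))
relative_b = path_checksum(str((checkout_b / seed).relative_to(checkout_b)))

print(absolute_a == absolute_b)  # False: the seed looks "modified"
print(relative_a == relative_b)  # True: the seed looks unchanged
```

With the relative path, the checksum changes only when the seed is renamed or moved within the project, which matches the operating principle stated above.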
Steps To Reproduce
Create a big seed file. Run dbt seed -s state:modified from dbt Cloud. The seed will run every time, despite being unchanged and unmoved.
Expected behavior
Frankly, we don't recommend folks use dbt seed to load anything larger than Very Small Data, but we should still do our best to produce consistent behavior when they do.
The output of dbt --version:
v0.18.0 or v0.18.1