Full refresh of incremental tables locks the table during build #2426
I believe the relevant bit of code is here: I think the cause of the delay you're seeing is that dbt doesn't take the same approach in the incremental materialization that it does in the table materialization. Here's the table materialization's order of operations:
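Roughly, and with illustrative relation names (the exact DDL varies by adapter), that flow looks like:

```sql
-- Sketch of the table materialization flow; names illustrative.
begin;
-- Build the new data into a scratch relation; the live table is not locked yet.
create table analytics.my_model__dbt_tmp as (
    select * from analytics.stg_events  -- illustrative model SQL
);
-- The live table is locked only for these two quick renames.
alter table analytics.my_model rename to my_model__dbt_backup;
alter table analytics.my_model__dbt_tmp rename to my_model;
commit;
drop table analytics.my_model__dbt_backup cascade;
```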
The full-refresh mode of the incremental materialization instead does:
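If I have this right, it looks something like (again, names illustrative):

```sql
-- Sketch of the incremental full-refresh flow; names illustrative.
begin;
drop table analytics.my_model cascade;  -- exclusive lock on the target begins here...
create table analytics.my_model as (
    select * from analytics.stg_events  -- illustrative model SQL
);
commit;  -- ...and is held until the entire rebuild commits
```

The difference matters because concurrent readers block on the exclusive lock for the whole build in the second flow, but only for the renames in the first.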
It's worth saying that, in much older versions of dbt, the way the incremental materialization executed full-refresh runs was by... dropping the preexisting table before even opening the transaction to recreate it. So this is definitely better than that, if still not as good as it could be. @drewbanin Do you know a good reason for the difference between the materializations? Maybe this is something we could think about for broader materialization work in 0.18.0?
hey @jtcohen6 - I can't think of a good reason why incrementals would work differently than tables on redshift. I'd be very supportive of changing the atomic swap flow for 0.18.0!
That's great! I am currently working on a custom materialization for our project using the pattern described in @jtcohen6's response. I would be happy to contribute if that works!
As discussed above, changing to the table materialization's order of operations removed the lock on the table during the build and resolved our issue. My change involved a custom materialization, replacing the original full-refresh lines of code with something like:
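A minimal sketch of the swap half, using dbt's built-in adapter macros (variable names like target_relation and tmp_relation are assumed from the materialization context, so treat this as the shape of the change rather than the exact diff):

```sql
{# Sketch only: target_relation / tmp_relation / sql come from the
   materialization context and are assumed here. #}
{% set backup_relation = make_temp_relation(target_relation, '__dbt_backup') %}
{% do adapter.drop_relation(backup_relation) %}

{# Build the new data into a scratch relation; the target stays queryable. #}
{% call statement('main') %}
  {{ create_table_as(False, tmp_relation, sql) }}
{% endcall %}

{# Swap: the target is locked only for the renames. #}
{% do adapter.rename_relation(target_relation, backup_relation) %}
{% do adapter.rename_relation(tmp_relation, target_relation) %}
{% do adapter.drop_relation(backup_relation) %}
```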
Let me know if it would be helpful to contribute this in a PR, or if it would be better for the team to make the change since it touches important functionality. There are also definitely improvements to be made, like removing the dummy SQL statements.
@drkarthi sure thing - please do feel free to PR a change for this against |
Has this made it into a release yet? I'm experiencing this issue with Postgres. |
@TAJD This issue is still open and is not slated for a release (or 0.19 specifically) at the moment. Drew's comment above still applies though! We are happy to take a look at a PR contribution. |
Thanks for clarifying! I'll take a look. |
Just saw that a commit has already been pushed, closing my PR and will follow the changes there :) |
One additional thing I tried was removing the dummy SQL statement. This is the code change related to that:
Thanks @drkarthi. The existing test suite passes with the modification, but I'm not sure how to write a test to confirm this behaviour. My thoughts are either to confirm that various statements exist in the compiled SQL or to run a check mid-build. I'm going to look through the existing dbt test suite for ideas - happy to receive input in the meantime! A workaround that solved the use case in my situation was to write a macro that concurrently refreshes a materialized view, allowing for zero downtime (our production, testing, and analytics databases are the same thing ;).
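For reference, a minimal sketch of that kind of macro on Postgres (the macro name and argument are made up, and REFRESH MATERIALIZED VIEW CONCURRENTLY requires a unique index on the materialized view):

```sql
{# Hypothetical macro; the name and argument are illustrative. #}
{% macro refresh_matview_concurrently(matview_name) %}
  {% call statement('refresh_matview') %}
    refresh materialized view concurrently {{ matview_name }}
  {% endcall %}
{% endmacro %}
```

It can then be invoked with dbt run-operation, e.g. `dbt run-operation refresh_matview_concurrently --args '{matview_name: analytics.my_matview}'`.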
@TAJD I tested the changes using the approach of checking mid-build and it was working. I don't know of a way to formally test the changes, though.
@TAJD have you had the chance to look at this? Would it be useful if I reopened my PR so we can wait for feedback from the maintainers?
@drkarthi please do - I'm sorry I lost focus on this after I got a fix in production. I ended up writing a macro to persist data in a separate table. Thanks for your effort! |
Hey @drkarthi, it looks like your PR had a small amount of feedback but was close to being ready to be merged. Any chance you could take a look? This issue is affecting us fairly regularly and it'd be great to see it resolved. |
Sorry @Limess, I had lost track of this. I will address the PR comments this weekend.
Describe the bug
When running a full refresh of incremental tables, the table is locked while the new table is being built. For large tables in production, this locks the table for several minutes.
Steps To Reproduce
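Roughly (model and schema names assumed):

```shell
# Build an incremental model, then force a full refresh (model name assumed)
dbt run --models my_incremental_model
dbt run --models my_incremental_model --full-refresh

# While the full refresh runs, query the table from another session;
# the query blocks until the rebuild commits:
psql -c 'select count(*) from analytics.my_incremental_model;'
```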
Expected behavior
Expected the original table to be available for querying while the temporary table is being built. The table should be locked only when moving the contents from the temporary table to the original table.
Below is a prototype where only the delete from the target table and the insert into the target table are part of the transaction. This locks the table for a much shorter time:
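Something like the following conveys the idea (a sketch rather than the exact prototype; relation names are illustrative):

```sql
-- Build the new data outside the transaction; the target stays queryable.
create table analytics.my_model__dbt_tmp as (
    select * from analytics.stg_events  -- illustrative model SQL
);

-- Only the delete + insert hold the lock on the target.
begin;
delete from analytics.my_model;
insert into analytics.my_model
select * from analytics.my_model__dbt_tmp;
commit;

drop table analytics.my_model__dbt_tmp;
```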
System information
Which database are you using dbt with?
The output of dbt --version:
The operating system you're using: macOS Catalina 10.15.4
The output of python --version: Python 3.6.3
Additional context
When I checked in Slack, I was told this may be a solved problem. However, I do notice a significant difference in query time between the two approaches. Let me know if I am missing something.