-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor DNS away from flows, deal with errors #419
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the big change! I left a few comments on my review.
app/queues/dns/index.server.ts
Outdated
attempts: 3, | ||
backoff: { | ||
type: 'exponential', | ||
delay: 60_000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know DNS record update will take maximum 60 seconds. So I assumed most cases it will be less than 60 seconds and used much less number to finish the job as early as possible. I guess this 60 second is intended to avoid any too early retry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with 10s, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 10s with 6 attempts sounds good to me as it can hit 60 seconds at max.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: the way exponential backoff works, it's 2^retry * delay
, so that means 10s, 20s, 80s, 160s, etc. If we did 10s and 6, it would get to 640s = 10.6 hours.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 10s with 3 retry reasonable
New commit up with fixes. |
app/queues/dns/index.server.ts
Outdated
attempts: 3, | ||
backoff: { | ||
type: 'exponential', | ||
delay: 60_000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 10s with 6 attempts sounds good to me as it can hit 60 seconds at max.
break; | ||
default: | ||
return updateDnsRecordById(id, { | ||
status: dnsStatus === 'INSYNC' ? DnsRecordStatus.active : DnsRecordStatus.error, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is expected. I returned undefined
when creating a record, it updates the DB as active
. I know we are going to have reconciler, but the user will see my record is available right away. Probably we should add switch
by job type in waitOnChange
to handle create
differently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly what we want. When we create
the DB record will be available right away, but pending
. When it finishes, we update to be either active
or error
and the UI changes to match.
I've made the review fixes, thank you both for helping me improve this! |
@SerpentBytes @Genne23v this is ready for another look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me!
Fixes #368
This refactors our DNS queue/workers to properly deal with the different stages of our DNS "pipeline." Previously we used a Flow in BullMQ, but the way that errors work between child and parent jobs meant that we couldn't properly process error cases.
This switches the architecture to use a Step Job pattern. The main reason I'm going this route is that I want to group all of the Route53 calls into a single queue, which I can limit globally (i.e., 5 per second max). By having everything in one queue, it makes it easier.
Here is what I've done:
app/queues/dns/queue.server.ts
to manage the newdns-queue
. It usesQueueEvents
to allow listening forfailed
jobs. When a job in the queue fails, we have a chance to update the database.app/queues/dns/worker.server.ts
to define the worker. It uses a series of steps instead of our Flow, and the code progresses through these steps. Data is stored on the Job'sdata
vs. being passed between workers. The step logic is basically the same thing @Genne23v did before, just moved to different functions.app/queues/dns/index.server.ts
as the main API people use to work with this. It exposes methods for adding jobs to the queue. These are also using the code @Genne23v wrote before, just moving things around.Elsewhere, I've updated all the call sites to use the new API names. I've also made some other changes that were necessary for testing:
app/routes/__index/dns-records/index.tsx
so that you can Delete a record when it's in the Error state. Currently you can'tdeleteDnsRecord
to returnundefined
vs.null
when no Change ID is returned (i.e., record already deleted in Route53). This makes the types easier to write.To test this, try creating, update, deleting records in the UI.