Refactor DNS away from flows, deal with errors #419

humphd · 2023-03-22T21:26:59Z

Fixes #368

This refactors our DNS queue/workers to properly deal with the different stages of our DNS "pipeline." Previously we used a Flow in BullMQ, but the way that errors work between child and parent jobs meant that we couldn't properly process error cases.

This switches the architecture to use a Step Job pattern. The main reason I'm going this route is that I want to group all of the Route53 calls into a single queue, which I can limit globally (i.e., 5 per second max). Having everything in one queue makes that limit easy to enforce.
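As a rough illustration of the global limit described above: BullMQ workers accept a `limiter` option with `max` and `duration` fields. The sketch below only builds such an options object. The names `RateLimiterOptions` and `dnsWorkerOptions` are hypothetical and not taken from this PR; in a real setup the object would be passed as the options argument to `new Worker(...)`.

```typescript
// Hypothetical options object, shaped like the `limiter` option that
// BullMQ's Worker accepts. Nothing here talks to Redis.
interface RateLimiterOptions {
  max: number; // maximum number of jobs processed...
  duration: number; // ...within this many milliseconds
}

const dnsWorkerOptions: { limiter: RateLimiterOptions } = {
  // At most 5 jobs per second for the whole queue
  limiter: { max: 5, duration: 1000 },
};
```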

Here is what I've done:

Created app/queues/dns/queue.server.ts to manage the new dns-queue. It uses QueueEvents to allow listening for failed jobs. When a job in the queue fails, we have a chance to update the database.
Created app/queues/dns/worker.server.ts to define the worker. It uses a series of steps instead of our Flow, and the code progresses through these steps. Data is stored on the Job's data vs. being passed between workers. The step logic is basically the same thing @Genne23v did before, just moved to different functions.
Created app/queues/dns/index.server.ts as the main API people use to work with this. It exposes methods for adding jobs to the queue. These are also using the code @Genne23v wrote before, just moving things around.
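The step-based worker described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the step names, `JobLike`, `DnsApi`, and `processDnsJob` are all hypothetical, and the Route53 calls are replaced by injected stand-ins so the example runs on its own.

```typescript
// Hypothetical step names; the PR's actual steps are not shown here.
type Step = "requestChange" | "waitOnChange" | "finished";

interface DnsJobData {
  step: Step;
  changeId?: string; // kept on the job's data between steps
}

// Minimal stand-in for a queue job: its data plus a way to persist it.
interface JobLike {
  data: DnsJobData;
  updateData(data: DnsJobData): Promise<void>;
}

// Hypothetical stand-ins for the Route53 calls.
interface DnsApi {
  requestChange(): Promise<string>;
  isInSync(changeId: string): Promise<boolean>;
}

async function processDnsJob(job: JobLike, api: DnsApi): Promise<void> {
  for (;;) {
    const step = job.data.step;
    if (step === "finished") {
      return;
    }
    switch (step) {
      case "requestChange": {
        const changeId = await api.requestChange();
        // Save progress so a retry does not repeat this step.
        await job.updateData({ step: "waitOnChange", changeId });
        break;
      }
      case "waitOnChange": {
        const changeId = job.data.changeId;
        if (!changeId || !(await api.isInSync(changeId))) {
          // Throwing fails this attempt; a retry resumes from the saved step.
          throw new Error("DNS change is not in sync yet");
        }
        await job.updateData({ ...job.data, step: "finished" });
        break;
      }
    }
  }
}
```

Because progress lives on the job's data, a failed attempt can be retried without redoing earlier steps, and a failure listener on the queue still has the job's data available when it updates the database.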

Elsewhere, I've updated all the call sites to use the new API names. I've also made some other changes that were necessary for testing:

I've updated app/routes/__index/dns-records/index.tsx so that you can Delete a record when it's in the Error state. Currently you can't.
I've updated deleteDnsRecord to return undefined vs. null when no Change ID is returned (i.e., record already deleted in Route53). This makes the types easier to write.
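To illustrate why `undefined` is easier to type than `null` here, consider a minimal sketch. The names `DeleteJobData`, `deleteDnsRecordSketch`, and `buildJobData` are hypothetical, as is the `"change-123"` value; the real `deleteDnsRecord` signature is not shown in this PR description.

```typescript
// Hypothetical shape of the data stored on a delete job.
interface DeleteJobData {
  recordId: number;
  changeId?: string; // optional means `string | undefined`, never `null`
}

// Stand-in for deleteDnsRecord: resolves to a Change ID, or undefined when
// there is nothing to change (e.g. the record is already gone in Route53).
async function deleteDnsRecordSketch(alreadyDeleted: boolean): Promise<string | undefined> {
  return alreadyDeleted ? undefined : "change-123";
}

async function buildJobData(recordId: number, alreadyDeleted: boolean): Promise<DeleteJobData> {
  const changeId = await deleteDnsRecordSketch(alreadyDeleted);
  // `string | undefined` fits the optional field directly; a `null` result
  // would need a conversion such as `changeId ?? undefined`.
  return { recordId, changeId };
}
```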

To test this, try creating, updating, and deleting records in the UI.

Genne23v

Thank you for the big change! I left a few comments in my review.

app/models/dns-record.server.ts

app/queues/dns/dns-worker.server.ts

Genne23v · 2023-03-23T00:35:37Z

app/queues/dns/index.server.ts

+    attempts: 3,
+    backoff: {
+      type: 'exponential',
+      delay: 60_000,

I know a DNS record update takes at most 60 seconds. I assumed that in most cases it would take less than that, so I used a much smaller delay to finish the job as early as possible. I guess this 60-second delay is intended to avoid retrying too early.

I went with 10s, what do you think?

I think 10s with 6 attempts sounds good, since it can reach 60 seconds at most.

NOTE: the way exponential backoff works, it's 2^retry * delay, so that means 10s, 20s, 40s, 80s, 160s, etc. If we did 10s and 6, it would get to 640s, which is about 10.7 minutes.

I think 10s with 3 retries is reasonable.
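The backoff arithmetic in this thread can be checked with a small sketch. It assumes the formula quoted above, 2^retry * delay, with `retry` counted from 0; whether BullMQ numbers the first retry as 0 or 1 is not verified here, and `backoffDelayMs` is a hypothetical helper rather than a library function.

```typescript
// Delay before a given retry, assuming retry n (counting from 0) waits
// 2^n * baseDelayMs, as in the formula quoted in this thread.
function backoffDelayMs(baseDelayMs: number, retry: number): number {
  return Math.pow(2, retry) * baseDelayMs;
}

const baseDelayMs = 10_000; // the 10s delay discussed above

// Delays for retries 0..4: 10s, 20s, 40s, 80s, 160s
const firstDelays = [0, 1, 2, 3, 4].map((retry) => backoffDelayMs(baseDelayMs, retry));

// Retry 6 waits 640s, which is minutes rather than hours.
const retrySixMs = backoffDelayMs(baseDelayMs, 6);
const retrySixMinutes = retrySixMs / 60_000;
```

Under these assumptions the delay doubles on every retry, which is why a small base delay with only a few attempts keeps the total wait short.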

app/routes/dev.tsx

humphd · 2023-03-23T01:38:35Z

New commit up with fixes.

app/queues/dns/dns-worker.server.ts

Genne23v · 2023-03-23T01:49:58Z

Genne23v · 2023-03-23T02:11:48Z

app/queues/dns/dns-worker.server.ts

+      break;
+    default:
+      return updateDnsRecordById(id, {
+        status: dnsStatus === 'INSYNC' ? DnsRecordStatus.active : DnsRecordStatus.error,

I'm not sure if this is expected. When I returned undefined while creating a record, it updated the DB record as active. I know we are going to have a reconciler, but the user will see that the record is available right away. Should we add a switch on job type in waitOnChange to handle create differently?

This is exactly what we want. When we create, the DB record will be available right away, but pending. When the job finishes, we update it to be either active or error, and the UI changes to match.
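The lifecycle described in this reply can be sketched as a small mapping. The enum below is a hypothetical mirror of the `DnsRecordStatus` values named in this thread (its string values are assumptions), and `finalStatus` is an illustrative helper, not a function from the PR.

```typescript
// Hypothetical mirror of the DnsRecordStatus values mentioned in this thread.
enum DnsRecordStatus {
  pending = "pending",
  active = "active",
  error = "error",
}

// A new record starts out pending and is visible right away.
const initialStatus = DnsRecordStatus.pending;

// Once the change finishes, map the reported status to a final record
// status. As in the diff above, anything other than 'INSYNC' becomes error.
function finalStatus(dnsStatus: string): DnsRecordStatus {
  return dnsStatus === "INSYNC" ? DnsRecordStatus.active : DnsRecordStatus.error;
}
```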

app/routes/__index/dns-records/index.tsx

app/queues/dns/dns-worker.server.ts

humphd · 2023-03-23T17:39:39Z

I've made the review fixes, thank you both for helping me improve this!

humphd · 2023-03-23T19:23:14Z

@SerpentBytes @Genne23v this is ready for another look

Genne23v

It looks good to me!

humphd added the "category: DNS" (A service about hosting domains) label Mar 22, 2023

humphd requested review from Myrfion, SerpentBytes, Genne23v and a user March 22, 2023 21:26

humphd self-assigned this Mar 22, 2023

SerpentBytes mentioned this pull request Mar 22, 2023

Notify user when DNS record status change from pending to active/error #412

Merged

Genne23v requested changes Mar 23, 2023

humphd requested a review from Genne23v March 23, 2023 01:38

Genne23v requested changes Mar 23, 2023

SerpentBytes reviewed Mar 23, 2023

app/routes/__index/dns-records/index.tsx

SerpentBytes reviewed Mar 23, 2023

app/queues/dns/dns-worker.server.ts (Outdated)

Refactor DNS away from flows, deal with errors

cb83af5

humphd force-pushed the dns-refactor branch from bf46d95 to cb83af5 Compare March 23, 2023 17:38

humphd requested review from SerpentBytes and Genne23v March 23, 2023 17:39

humphd mentioned this pull request Mar 23, 2023

Unable to update optional DNS Record values #425

Closed

Myrfion approved these changes Mar 23, 2023

Genne23v approved these changes Mar 23, 2023

humphd merged commit b1cac3d into DevelopingSpace:main Mar 23, 2023

SerpentBytes approved these changes Mar 23, 2023

humphd mentioned this pull request Apr 4, 2023

Add flow job failure handling for certificate workers #511

Closed
