Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor DNS away from flows, deal with errors #419

Merged
merged 1 commit into from
Mar 23, 2023

Conversation

humphd
Copy link
Contributor

@humphd humphd commented Mar 22, 2023

Fixes #368

This refactors our DNS queue/workers to properly deal with the different stages of our DNS "pipeline." Previously we used a Flow in BullMQ, but the way that errors work between child and parent jobs meant that we couldn't properly process error cases.

This switches the architecture to use a Step Job pattern. The main reason I'm going this route is that I want to group all of the Route53 calls into a single queue, which I can limit globally (i.e., 5 per second max). By having everything in one queue, it makes it easier.

Here is what I've done:

  1. Created app/queues/dns/queue.server.ts to manage the new dns-queue. It uses QueueEvents to allow listening for failed jobs. When a job in the queue fails, we have a chance to update the database.
  2. Created app/queues/dns/worker.server.ts to define the worker. It uses a series of steps instead of our Flow, and the code progresses through these steps. Data is stored on the Job's data vs. being passed between workers. The step logic is basically the same thing @Genne23v did before, just moved to different functions.
  3. Created app/queues/dns/index.server.ts as the main API people use to work with this. It exposes methods for adding jobs to the queue. These are also using the code @Genne23v wrote before, just moving things around.

Elsewhere, I've updated all the call sites to use the new API names. I've also made some other changes that were necessary for testing:

  1. I've updated app/routes/__index/dns-records/index.tsx so that you can Delete a record when it's in the Error state. Currently you can't
  2. I've updated deleteDnsRecord to return undefined vs. null when no Change ID is returned (i.e., record already deleted in Route53). This makes the types easier to write.

To test this, try creating, update, deleting records in the UI.

@humphd humphd added the category: DNS A service about hosting domains label Mar 22, 2023
@humphd humphd requested review from Myrfion, SerpentBytes, Genne23v and a user March 22, 2023 21:26
@humphd humphd self-assigned this Mar 22, 2023
Copy link
Contributor

@Genne23v Genne23v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the big change! I left a few comments on my review.

app/models/dns-record.server.ts Outdated Show resolved Hide resolved
app/queues/dns/dns-worker.server.ts Show resolved Hide resolved
app/queues/dns/dns-worker.server.ts Outdated Show resolved Hide resolved
attempts: 3,
backoff: {
type: 'exponential',
delay: 60_000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know DNS record update will take maximum 60 seconds. So I assumed most cases it will be less than 60 seconds and used much less number to finish the job as early as possible. I guess this 60 second is intended to avoid any too early retry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with 10s, what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 10s with 6 attempts sounds good to me as it can hit 60 seconds at max.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: the way exponential backoff works, it's 2^retry * delay, so that means 10s, 20s, 80s, 160s, etc. If we did 10s and 6, it would get to 640s = 10.6 hours.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 10s with 3 retry reasonable

app/routes/dev.tsx Show resolved Hide resolved
@humphd humphd requested a review from Genne23v March 23, 2023 01:38
@humphd
Copy link
Contributor Author

humphd commented Mar 23, 2023

New commit up with fixes.

app/queues/dns/dns-worker.server.ts Show resolved Hide resolved
attempts: 3,
backoff: {
type: 'exponential',
delay: 60_000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 10s with 6 attempts sounds good to me as it can hit 60 seconds at max.

break;
default:
return updateDnsRecordById(id, {
status: dnsStatus === 'INSYNC' ? DnsRecordStatus.active : DnsRecordStatus.error,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is expected. I returned undefined when creating a record, it updates the DB as active. I know we are going to have reconciler, but the user will see my record is available right away. Probably we should add switch by job type in waitOnChange to handle create differently?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly what we want. When we create the DB record will be available right away, but pending. When it finishes, we update to be either active or error and the UI changes to match.

@humphd
Copy link
Contributor Author

humphd commented Mar 23, 2023

I've made the review fixes, thank you both for helping me improve this!

@humphd
Copy link
Contributor Author

humphd commented Mar 23, 2023

@SerpentBytes @Genne23v this is ready for another look

Copy link
Contributor

@Genne23v Genne23v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me!

@humphd humphd merged commit b1cac3d into DevelopingSpace:main Mar 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: DNS A service about hosting domains
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fail earlier with DNS issues
4 participants