Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Courier foreground worker with "kratos courier watch" #1033

Merged

Conversation

mbonnell-wish
Copy link
Contributor

@mbonnell-wish mbonnell-wish commented Feb 3, 2021

Related issue

#652 #732

Proposed changes

Following the discussion in #1024, this PR defines a new kratos courier watch command which runs the message courier in the foreground, as well as disabling the background courier by default when kratos serve is executed.
The --watch-courier flag can be passed to the serve command to execute the courier in the background as is the current behavior.

Checklist

  • I have read the contributing guidelines.
  • I have read the security policy.
  • I confirm that this pull request does not address a security
    vulnerability. If this pull request addresses a security. vulnerability, I
    confirm that I got green light (please contact
    [email protected]) from the maintainers to push
    the changes.
  • I have added tests that prove my fix is effective or that my feature
    works.
  • I have added or changed the documentation.

Further comments

@mbonnell-wish mbonnell-wish marked this pull request as draft February 3, 2021 01:35
@mattbonnell
Copy link
Contributor

Hey @aeneasr, would you happen to know the source of this error?

Copy link
Member

@zepatrik zepatrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error you get is caused by https:/ory/x/blob/87359d96f1cf9eb327949f693077968a50ecc283/configx/provider.go#L162-L166
Because of that, you have to add the flag to the config schema, just like it is done with e.g. the dev or help flag:

"dev": {
"type": "boolean"
},
"help": {
"type": "boolean"
},
"sqa-opt-out": {
"type": "boolean",
"default": false
},
"config": {
"type": "array",
"items": {
"type": "string"
}
}
},

We should probably adjust the flag loading, but it will be possible to put that value in the config file as well 😉

Looks good otherwise 👍

@mattbonnell
Copy link
Contributor

Thanks!

@mbonnell-wish mbonnell-wish marked this pull request as ready for review February 3, 2021 16:34
@mbonnell-wish
Copy link
Contributor Author

What would we be looking for in terms of added material in the user guide docs?

@mbonnell-wish
Copy link
Contributor Author

mbonnell-wish commented Feb 3, 2021

Hmm, I'm failing thevalidate CI job because the generated cli docs don't pass the formatting check. Is there some command i should be running other than make docs/cli?

@zepatrik
Copy link
Member

zepatrik commented Feb 4, 2021

make format will format everything, cd docs; npm run format only the docs 😉

@aeneasr
Copy link
Member

aeneasr commented Feb 5, 2021

I believe you need to update the e2e config to reflect the new changes as the mail tests are probably failing!

https:/ory/kratos/blob/master/test/e2e/run.sh#L141

Copy link
Member

@aeneasr aeneasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

I think it would make sense to explain this in the docs a bit:

Thank you!

Run: func(cmd *cobra.Command, args []string) {
d := driver.New(cmd.Context(), configx.WithFlags(cmd.Flags()))

go ServeHealth(d, cmd, args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having health and metrics servers is honorable :) But we don't really need them IMO. It would make sense if this would handle HTTP API requests. Then, the LB needs to know if the service is up (health) and prometheus would want to know how much latency we have (for example).

So in my opinion, this can be removed!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

Yeah, after doing some more reading, I think we can do without liveness probes in this case, and just rely on k8s restarting the pod if the courier crashes. I'll remove this

On the metrics front, you're quite right that latency metrics wouldn't be very relevant here since it isn't handling HTTP requests, but we would like to be able to monitor the go runtime related metrics, particularly since we'd like to be able to measure how well it's performing as we add load and potentially tune things like the number of concurrent goroutines and batch fetch size in response to our observations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will work on these additional docs today

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok draft of doc updates are in, take a read and don't hesitate to suggest improvements

Copy link
Member

@aeneasr aeneasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you for your contribution! This looks pretty good and I have some ideas how to improve it further :)

cmd/courier/watch.go Outdated Show resolved Hide resolved
Copy link
Member

@aeneasr aeneasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Now we need a bit of testing and we're good to go! :)

driver/config/config.go Outdated Show resolved Hide resolved
@@ -0,0 +1,82 @@
package courier
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a test to see if this works (e.g. health port) as expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, much needed. Where would the best place be to add it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two options: you extract as much from the cobra handler as possible (e.g. as a function) and then set it up, or you use the cobra test framework to run the command.

I would opt for the first option. Regarding courier tests and the SMTP integration, check out:

Cobra commands can be controlled from tests using:

import "github.com/ory/x/cmdx"

// ...
ce := &cmdx.CommandExecuter{New: func() *cobra.Command {
		return cmd.RootCmd // might need to be exported in kratos
	}, Ctx: ctx}

ce.ExecNoErr(t, "your", "command")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some tests in cmd/courier/watch_test.go, let me know what you think!

@mbonnell-wish
Copy link
Contributor Author

hm, the GO111MODULE=off go get github.com/mattn/goveralls github.com/ory/go-acc test job is failing with

cannot find package "github.com/hashicorp/hcl/hcl/printer" in any of:
	/usr/local/go/src/github.com/hashicorp/hcl/hcl/printer (from $GOROOT)
	/go/src/github.com/hashicorp/hcl/hcl/printer (from $GOPATH)

Exited with code exit status 1

Can't imagine it's related to my PR

}
}()
<-ctx.Done()
return server.Shutdown(ctx)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this so the server can be shutdown by cancelling the context (eg in unit tests).
Let me know if there's a better way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks ok

@aeneasr
Copy link
Member

aeneasr commented Feb 12, 2021

@mattbonnell , great work! The current problem for the tests is not your fault. I'll be trying to fix that now!

@aeneasr aeneasr changed the base branch from master to courier-foreground-worker February 12, 2021 13:58
@aeneasr aeneasr merged commit b52296c into ory:courier-foreground-worker Feb 12, 2021
aeneasr added a commit that referenced this pull request Feb 12, 2021
BREACKING CHANGES: This patch moves the courier watcher (responsible for sending mail) to its own foreground worker, which can be executed as a, for example, Kubernetes job.

It is still possible to have the previous behaviour which would run the worker as a background task when running `kratos serve` by using the `--watch-courier` flag.

To run the foreground worker, use `kratos courier watch -c your/config.yaml`.

Closes #1033
Closes #1024

Co-authored-by: Matt Bonnell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants