
[RFC] Implement ACME RFC 8555, challenge retries #181

Closed
wants to merge 7 commits

Conversation


@wesgraham wesgraham commented Feb 7, 2020

Summary

(Fixes #168)

Implements section 8.2 of the ACME protocol spec (RFC 8555), which describes how client and server retries should be handled during challenge validation.

Specifically, this automates server-side validation retries, throttles retry attempts, includes error information about the result of each attempted validation, and sets an appropriate Retry-After header on the challenge resource response.
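
For illustration only, a minimal sketch of how a handler might set such a Retry-After header on the challenge resource response; the helper name and the duration parameter are assumptions, not code from this PR.

package acme

import (
    "net/http"
    "strconv"
    "time"
)

// writeRetryAfter is a hypothetical helper: while a challenge is still being
// retried, tell the client how long to wait before polling again.
// Retry-After takes an integer number of seconds (RFC 7231, section 7.1.3).
func writeRetryAfter(w http.ResponseWriter, wait time.Duration) {
    w.Header().Set("Retry-After", strconv.Itoa(int(wait.Seconds())))
}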


wesgraham commented Feb 7, 2020

Hi all,
Note that the code written in this PR thus far only satisfies the MUSTs of the retry protocol. While implementing this, a couple of thoughts came to mind. Briefly leaving some concerns that still need to be resolved before merging - would appreciate any thoughts:

  1. We can make GetChallenge() or ValidateChallenge() implement retries server-side automatically. I can add retry state to the challenge object (a sketch of such state follows this comment), but I’m not clear what the intended flow would be.
    I.e. if GetChallenge() is called on a challenge whose state is “retrying”, will the challenge object reset its retry state, or will it block that request until the retry resolves?

  2. Updating the state to “invalid” after a predetermined number of retry attempts has been omitted thus far. I was hoping for clarity on the team's desired policy; the protocol simply says to mark a challenge invalid once the server has “given up”.

Will also write unit tests after cementing desired steps :)
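
As flagged in point 1 above, here is a rough sketch of retry state that could be attached to a challenge object. The field names mirror the ones quoted later in this review (Active, Called, Backoffs), but the struct itself is an assumption, not the PR's actual definition.

package acme

// Retry is a hypothetical sketch of retry state carried on a challenge.
type Retry struct {
    Active   bool // a retry cycle is currently in progress
    Called   int  // validation attempts made during the current cycle
    Backoffs int  // how many times the backoff interval has doubled
}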


codecov-io commented Feb 7, 2020

Codecov Report

Merging #181 into master will decrease coverage by 0.28%.
The diff coverage is 73.07%.


@@            Coverage Diff             @@
##           master     #181      +/-   ##
==========================================
- Coverage   73.85%   73.56%   -0.29%     
==========================================
  Files          75       69       -6     
  Lines        8501     8041     -460     
==========================================
- Hits         6278     5915     -363     
+ Misses       1899     1814      -85     
+ Partials      324      312      -12
Impacted Files           Coverage Δ
acme/api/handler.go      80% <50%> (-3.04%) ⬇️
acme/challenge.go        81.77% <83.33%> (+3.16%) ⬆️
authority/authority.go   53.48% <0%> (-4.49%) ⬇️
authority/tls.go         71.42% <0%> (-1.75%) ⬇️
acme/authz.go            88.95% <0%> (-0.5%) ⬇️
ca/provisioner.go        86.77% <0%> (ø) ⬆️
kms/cloudkms/signer.go
kms/kms.go
kms/apiv1/requests.go
kms/apiv1/options.go
... and 6 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8e882fa...4575049.


dcow commented Feb 10, 2020

  1. We can make GetChallenge() or ValidateChallenge() implement retries server-side automatically. I can add retry state to the challenge object, but I’m not clear what the intended flow would be.
    I.e. if GetChallenge() is called on a challenge whose state is “retrying”, will the challenge object reset its retry state, or will it block that request until the retry resolves?

If the client calls retry, rate limit permitting, the server should reset its retry state, clear the invalid challenge state, and begin the challenge process anew.

  2. Updating the state to “invalid” after a predetermined number of retry attempts has been omitted thus far. I was hoping for clarity on the team's desired policy; the protocol simply says to mark a challenge invalid once the server has “given up”.

I'd say a 5-minute exponential-backoff retry mechanism is probably fine. After the timeout, the state becomes "invalid", yes.
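
A rough sketch of that suggestion, for illustration; the function names and the 5-second starting interval are assumptions, and this is not the code in the PR:

package acme

import "time"

// retryUntilDeadline is a hypothetical sketch of the suggested policy:
// retry with exponential backoff for about five minutes, then give up.
// attempt performs one validation; markInvalid flips the challenge to
// the "invalid" status once the server has given up.
func retryUntilDeadline(attempt func() error, markInvalid func()) {
    deadline := time.Now().Add(5 * time.Minute)
    backoff := 5 * time.Second
    for time.Now().Before(deadline) {
        if attempt() == nil {
            return // validation succeeded
        }
        time.Sleep(backoff)
        backoff *= 2 // double the wait between attempts
    }
    markInvalid()
}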

@wesgraham wesgraham changed the title Implement ACME RFC 8555, challenge retries [RFC] Implement ACME RFC 8555, challenge retries Feb 14, 2020
@wesgraham wesgraham (Author) commented

Committed an approach to automated retries; would appreciate any feedback.
Note the approach is failing some test cases due to a segfault in challenge.go's validate() method (specifically when the status is already invalid and we attempt to call upd.save()). If anyone has ideas as to why this happens, that would be appreciated.

@dcow dcow (Contributor) left a comment

Looking good so far. Few questions (:

acme/api/handler_test.go (resolved)
acme/api/handler_test.go (outdated, resolved)
acme/api/handler_test.go (resolved)
acme/challenge.go (outdated, resolved)
acme/challenge.go (outdated, resolved)
acme/challenge.go (resolved)
if err != nil {
return nil, Wrap(err, "error attempting challenge validation")
}

for i := 0; i < 10; i++ {
Contributor commented

Do we want to loop here until the challenge's status lands on a terminal value, or while the retry.Active field is true, rather than counting to 10? The challenge's respective validate funcs already count the number of retries and then declare the challenge invalid once the attempts are exhausted. Not using a fixed 10 would also allow certain types of challenges to define longer retry periods; however, it means a faulty challenge implementation would hang this loop forever. Another option would be to pull the logic out of the challenge validation funcs and centralize it here, updating the status to invalid only after 10 tries. What were your thoughts while implementing this?

Author replied

My initial thought was similar - I was trying to avoid a scenario where multiple calls to ValidateChallenge() would loop forever. I think "while retry.active" should get the job done, especially with the lock on that loop. The fixed count of 10 was implemented just in case the retry object's state was being modified by any other objects, but I'm seeing that is highly unlikely.

@wesgraham wesgraham (Author) commented Feb 19, 2020

Implemented with while retry.Active
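
For reference, the loop shape that was settled on might look roughly like the following; the interface and function here are invented for the example (a sketch, not the PR's actual challenge type or validate signature).

package acme

// retryable is a hypothetical stand-in for the PR's challenge type.
type retryable interface {
    retryActive() bool // true while a retry cycle is in progress
    validate() error   // performs one validation attempt
}

// validateWithRetries keeps validating while the challenge reports an
// active retry, instead of counting to a fixed 10.
func validateWithRetries(ch retryable) error {
    for ch.retryActive() {
        if err := ch.validate(); err != nil {
            return err
        }
    }
    return nil
}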

upd.Status = StatusInvalid
upd.Retry.Backoffs *= 2
upd.Retry.Active = false
upd.Retry.Called = 0
Contributor commented

What's the point of zeroing this?

Author replied

"Called" to me represented how many times the challenge validation retry was performed on a specific client call. Resetting to 0 was designed to signal that the current process retrying validation had terminated, and the next retry process should start. Called is also used in computing the retry-after header (alongside the backoffs count).

@wesgraham wesgraham (Author) commented Feb 19, 2020

Implemented an updated version of this in the latest commit - let me know if you have any thoughts.

acme/challenge.go (outdated, resolved)
acme/authority.go (outdated, resolved)
@dcow dcow (Contributor) left a comment

This is looking really good. What's up with all the additional/unrelated removals, though? Would rebasing the branch help ensure we're not reverting anything with a merge?

return hc, nil
}
if hc.getStatus() == StatusInvalid {
// TODO: Resolve segfault on upd.save
Contributor commented

Is this still happening?

@@ -615,3 +678,20 @@ func getChallenge(db nosql.DB, id string) (challenge, error) {
}
return ch, nil
}

// iterateRetry iterates a challenge's retry and error objects upon a failed validation attempt
func (bc *baseChallenge) iterateRetry(db nosql.DB, error *Error) error {
Contributor commented

Do you mean increment retry? (:
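
For readers following along, a minimal sketch of what such an increment-and-record helper might do; the struct, its fields, and the save callback are invented for this example and are not the PR's implementation:

package acme

// challengeRecord is a hypothetical stand-in for the PR's baseChallenge.
type challengeRecord struct {
    RetryCalled int   // attempts made during the current retry cycle
    LastError   error // most recent validation error
}

// incrementRetry records one failed validation attempt and hands the updated
// record to the caller's persistence function.
func incrementRetry(ch *challengeRecord, validationErr error, save func(*challengeRecord) error) error {
    ch.RetryCalled++
    ch.LastError = validationErr
    return save(ch)
}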


dcow commented Apr 30, 2020

I'm picking this up in #242.

@dcow dcow closed this Apr 30, 2020
Merging this pull request may close: Implement RFC8555 (ACME spec) § 8.2