
[RFC] Implement ACME RFC 8555, challenge retries #181

Closed
wants to merge 7 commits

Conversation


@wesgraham wesgraham commented Feb 7, 2020

Summary

(Fixes #168)

Implements section 8.2 of the ACME protocol spec (RFC 8555), which describes how client and server retries should be handled during challenge validation.

Specifically, this automates server-side validation retries, throttles retry attempts, includes error information about the result of each attempted validation, and sets an appropriate Retry-After header on the challenge resource response.
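
For illustration only, a minimal sketch of how a handler might set such a Retry-After header on the challenge resource response; the helper name and the duration parameter are assumptions, not code from this PR.

package acme

import (
    "net/http"
    "strconv"
    "time"
)

// writeRetryAfter is a hypothetical helper: while a challenge is still being
// retried, tell the client how long to wait before polling again.
// Retry-After takes an integer number of seconds (RFC 7231, section 7.1.3).
func writeRetryAfter(w http.ResponseWriter, wait time.Duration) {
    w.Header().Set("Retry-After", strconv.Itoa(int(wait.Seconds())))
}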


wesgraham commented Feb 7, 2020

Hi all,
Note that the code written in this PR thus far only satisfies the MUSTs of the retry protocol. While implementing this, a couple of thoughts came to mind. Briefly leaving some concerns that still need to be resolved before merging - would appreciate any thoughts:

  1. We can make GetChallenge() or ValidateChallenge() implement retries server-side automatically. I can add retry state to the challenge object (a sketch of such state follows this comment), but I’m not clear what the intended flow would be.
    I.e. if GetChallenge() is called on a challenge whose state is “retrying”, will the challenge object reset its retry state, or will it block that request until the retry resolves?

  2. Updating the state to “invalid” after a predetermined number of retry attempts has been omitted thus far. I was hoping for clarity on the team's desired policy; the protocol simply says to mark a challenge invalid once the server has “given up”.

Will also write unit tests after cementing desired steps :)
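
As flagged in point 1 above, here is a rough sketch of retry state that could be attached to a challenge object. The field names mirror the ones quoted later in this review (Active, Called, Backoffs), but the struct itself is an assumption, not the PR's actual definition.

package acme

// Retry is a hypothetical sketch of retry state carried on a challenge.
type Retry struct {
    Active   bool // a retry cycle is currently in progress
    Called   int  // validation attempts made during the current cycle
    Backoffs int  // how many times the backoff interval has doubled
}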


codecov-io commented Feb 7, 2020

Codecov Report

Merging #181 into master will decrease coverage by 0.28%.
The diff coverage is 73.07%.


@@            Coverage Diff             @@
##           master     #181      +/-   ##
==========================================
- Coverage   73.85%   73.56%   -0.29%     
==========================================
  Files          75       69       -6     
  Lines        8501     8041     -460     
==========================================
- Hits         6278     5915     -363     
+ Misses       1899     1814      -85     
+ Partials      324      312      -12
Impacted Files           Coverage Δ
acme/api/handler.go      80% <50%> (-3.04%) ⬇️
acme/challenge.go        81.77% <83.33%> (+3.16%) ⬆️
authority/authority.go   53.48% <0%> (-4.49%) ⬇️
authority/tls.go         71.42% <0%> (-1.75%) ⬇️
acme/authz.go            88.95% <0%> (-0.5%) ⬇️
ca/provisioner.go        86.77% <0%> (ø) ⬆️
kms/cloudkms/signer.go
kms/kms.go
kms/apiv1/requests.go
kms/apiv1/options.go
... and 6 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8e882fa...4575049.


dcow commented Feb 10, 2020

  1. We can make GetChallenge() or ValidateChallenge() implement retries server-side automatically. I can add retry state to the challenge object, but I’m not clear what the intended flow would be.
    I.e. if GetChallenge() is called on a challenge whose state is “retrying”, will the challenge object reset its retry state, or will it block that request until the retry resolves?

If the client calls retry, rate limit permitting, the server should reset its retry state, clear the invalid challenge state, and begin the challenge process anew.

  2. Updating the state to “invalid” after a predetermined number of retry attempts has been omitted thus far. I was hoping for clarity on the team's desired policy; the protocol simply says to mark a challenge invalid once the server has “given up”.

I'd say a 5-minute exponential-backoff retry mechanism is probably fine. After the timeout, the state becomes "invalid", yes.
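
A rough sketch of that suggestion, for illustration; the function names and the 5-second starting interval are assumptions, and this is not the code in the PR:

package acme

import "time"

// retryUntilDeadline is a hypothetical sketch of the suggested policy:
// retry with exponential backoff for about five minutes, then give up.
// attempt performs one validation; markInvalid flips the challenge to
// the "invalid" status once the server has given up.
func retryUntilDeadline(attempt func() error, markInvalid func()) {
    deadline := time.Now().Add(5 * time.Minute)
    backoff := 5 * time.Second
    for time.Now().Before(deadline) {
        if attempt() == nil {
            return // validation succeeded
        }
        time.Sleep(backoff)
        backoff *= 2 // double the wait between attempts
    }
    markInvalid()
}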

@wesgraham wesgraham changed the title Implement ACME RFC 8555, challenge retries [RFC] Implement ACME RFC 8555, challenge retries Feb 14, 2020
@wesgraham wesgraham (Author) commented

Committed an approach to automated retries; would appreciate any feedback.
Note the approach is failing some test cases due to a segfault in challenge.go's validate() method (specifically when the status is already invalid and we attempt to call upd.save()). If anyone has ideas as to why this happens, that would be appreciated.

@dcow dcow (Contributor) left a comment

Looking good so far. Few questions (:

acme/api/handler_test.go (resolved)
acme/api/handler_test.go (outdated, resolved)
acme/api/handler_test.go (resolved)
acme/challenge.go (outdated, resolved)
acme/challenge.go (outdated, resolved)
acme/challenge.go (resolved)
if err != nil {
return nil, Wrap(err, "error attempting challenge validation")
}

for i := 0; i < 10; i++ {
Contributor commented

Do we want to loop here until the challenge's status lands on a terminal value, or while the retry.Active field is true, rather than counting to 10? The challenge's respective validate funcs already count the number of retries and then declare the challenge invalid once the attempts are exhausted. Not using a fixed 10 would also allow certain types of challenges to define longer retry periods; however, it means a faulty challenge implementation would hang this loop forever. Another option would be to pull the logic out of the challenge validation funcs and centralize it here, updating the status to invalid only after 10 tries. What were your thoughts while implementing this?

Author replied

My initial thought was similar - I was trying to avoid a scenario where multiple calls to ValidateChallenge() would loop forever. I think "while retry.active" should get the job done, especially with the lock on that loop. The fixed count of 10 was implemented just in case the retry object's state was being modified by any other objects, but I'm seeing that is highly unlikely.

@wesgraham wesgraham (Author) commented Feb 19, 2020

Implemented with while retry.Active
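
For reference, the loop shape that was settled on might look roughly like the following; the interface and function here are invented for the example (a sketch, not the PR's actual challenge type or validate signature).

package acme

// retryable is a hypothetical stand-in for the PR's challenge type.
type retryable interface {
    retryActive() bool // true while a retry cycle is in progress
    validate() error   // performs one validation attempt
}

// validateWithRetries keeps validating while the challenge reports an
// active retry, instead of counting to a fixed 10.
func validateWithRetries(ch retryable) error {
    for ch.retryActive() {
        if err := ch.validate(); err != nil {
            return err
        }
    }
    return nil
}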

upd.Status = StatusInvalid
upd.Retry.Backoffs *= 2
upd.Retry.Active = false
upd.Retry.Called = 0
Contributor commented

What's the point of zeroing this?

Author replied

"Called" to me represented how many times the challenge validation retry was performed on a specific client call. Resetting to 0 was designed to signal that the current process retrying validation had terminated, and the next retry process should start. Called is also used in computing the retry-after header (alongside the backoffs count).

@wesgraham wesgraham (Author) commented Feb 19, 2020

Implemented an updated version of this in the latest commit - let me know if you have any thoughts.

acme/challenge.go (outdated, resolved)
acme/authority.go (outdated, resolved)
@dcow dcow (Contributor) left a comment

This is looking really good. What's up with all the additional/unrelated removals, though? Would rebasing the branch help ensure we're not reverting anything with a merge?

return hc, nil
}
if hc.getStatus() == StatusInvalid {
// TODO: Resolve segfault on upd.save
Contributor commented

Is this still happening?

@@ -615,3 +678,20 @@ func getChallenge(db nosql.DB, id string) (challenge, error) {
}
return ch, nil
}

// iterateRetry iterates a challenge's retry and error objects upon a failed validation attempt
func (bc *baseChallenge) iterateRetry(db nosql.DB, error *Error) error {
Contributor commented

Do you mean increment retry? (:
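
For readers following along, a minimal sketch of what such an increment-and-record helper might do; the struct, its fields, and the save callback are invented for this example and are not the PR's implementation:

package acme

// challengeRecord is a hypothetical stand-in for the PR's baseChallenge.
type challengeRecord struct {
    RetryCalled int   // attempts made during the current retry cycle
    LastError   error // most recent validation error
}

// incrementRetry records one failed validation attempt and hands the updated
// record to the caller's persistence function.
func incrementRetry(ch *challengeRecord, validationErr error, save func(*challengeRecord) error) error {
    ch.RetryCalled++
    ch.LastError = validationErr
    return save(ch)
}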


dcow commented Apr 30, 2020

I'm picking this up in #242.

@dcow dcow closed this Apr 30, 2020
Merging this pull request may close: Implement RFC8555 (ACME spec) § 8.2