Skip to content

Commit

Permalink
Reduce flakiness on workflow-ID-specific ratelimit test (#5986)
Browse files Browse the repository at this point in the history
This test had three issues:
1. it's *very* time-sensitive, spending >=200ms across all StartWorkflowExecution calls will allow one (or more) of the "should be limited" calls to succeed.  This is now more permissive.
2. it seems to misunderstand / be misleading about how ratelimits work.  The reason the first 5 calls can be done "immediately" is due to the burst value, not the RPS itself.  We just init the burst with that value.
3. if `assert.ErrorAs` failed, the next line would panic because the error value was nil.  `require` would work too, but switching to `if assert.ErrorAs(...) {...}` lets the rest of the checks continue, and it's relatively simple.

These are now fixed.
  • Loading branch information
Groxx authored May 8, 2024
1 parent 4db60d9 commit da5107b
Showing 1 changed file with 13 additions and 5 deletions.
18 changes: 13 additions & 5 deletions host/workflowidratelimit_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -127,21 +127,29 @@ func (s *WorkflowIDRateLimitIntegrationSuite) TestWorkflowIDSpecificRateLimits()
ctx, cancel := createContext()
defer cancel()

// The ratelimit is 5 per second, so we should be able to start 5 workflows without any error
// The ratelimit is 5 per second with a burst of 5, so we should be able to start 5 workflows without any error
for i := 0; i < 5; i++ {
_, err := s.engine.StartWorkflowExecution(ctx, request)
assert.NoError(s.T(), err)
}

// Now we should get a rate limit error
// Now we should get a rate limit error (with some fuzziness for time passing)
limited := 0
for i := 0; i < 5; i++ {
_, err := s.engine.StartWorkflowExecution(ctx, request)
var busyErr *types.ServiceBusyError
assert.ErrorAs(s.T(), err, &busyErr)
assert.Equal(s.T(), common.WorkflowIDRateLimitReason, busyErr.Reason)
if err != nil {
if assert.ErrorAs(s.T(), err, &busyErr) {
limited++
assert.Equal(s.T(), common.WorkflowIDRateLimitReason, busyErr.Reason)
}
}
}
// 5 fails occasionally, trying 4. If needed, reduce to 3 or find a way to
// make this test less sensitive to latency, as test-runner hosts vary a lot.
assert.GreaterOrEqual(s.T(), limited, 4, "should have encountered some rate-limit errors after the burst was exhausted")

// After 1 second we should be able to start another workflow
// After 1 second (200ms at a minimum) we should be able to start more workflows without being limited
time.Sleep(1 * time.Second)
_, err := s.engine.StartWorkflowExecution(ctx, request)
assert.NoError(s.T(), err)
Expand Down

0 comments on commit da5107b

Please sign in to comment.