Reduce flakiness on workflow-ID-specific ratelimit test #5986

Groxx · 2024-05-08T02:32:51Z

This test had three issues:

1. It's *very* time-sensitive: spending >=200ms across all StartWorkflowExecution calls allows one (or more) of the "should be limited" calls to succeed. The test is now more permissive.
2. It seems to misunderstand, or at least be misleading about, how ratelimits work. The first 5 calls can be done "immediately" because of the burst value, not the RPS itself. We just init the burst with the RPS value.
3. If `assert.ErrorAs` failed, the next line would panic because the error value was nil. `require` would work too, but switching to `if assert.ErrorAs(...) {...}` lets the rest of the checks continue, and it's relatively simple.

These are now fixed.
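To make the burst-vs-RPS point concrete, here is a minimal sketch of a token bucket driven by a fake clock. It is not the limiter Cadence uses; `bucket`, `newBucket`, `allow` and `countAllowed` are hypothetical names made up for this illustration, and the 30ms spacing is an arbitrary choice. It only shows that the first 5 calls pass because the bucket starts full (burst), and that once roughly 200ms have elapsed at 5 RPS, one more token has been refilled.

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a deliberately simplified token bucket (illustration only,
// not the real limiter).
type bucket struct {
	rps    float64   // tokens refilled per second
	burst  float64   // maximum tokens the bucket can hold
	tokens float64   // tokens currently available
	last   time.Time // when tokens were last refilled
}

// newBucket starts with a full bucket, which is what makes the first
// `burst` calls succeed immediately.
func newBucket(rps float64, burst int, now time.Time) *bucket {
	return &bucket{rps: rps, burst: float64(burst), tokens: float64(burst), last: now}
}

// allow refills based on elapsed time, then spends one token if one is available.
func (b *bucket) allow(now time.Time) bool {
	b.tokens += now.Sub(b.last).Seconds() * b.rps
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// countAllowed makes n calls spaced gap apart on a fake clock and
// reports how many of them were allowed.
func countAllowed(rps float64, burst, n int, gap time.Duration) int {
	now := time.Unix(0, 0)
	b := newBucket(rps, burst, now)
	allowed := 0
	for i := 0; i < n; i++ {
		if b.allow(now) {
			allowed++
		}
		now = now.Add(gap)
	}
	return allowed
}

func main() {
	// All 10 calls at the same instant: only the initial burst of 5 passes.
	fmt.Println(countAllowed(5, 5, 10, 0)) // 5
	// 30ms per call: the 8th call lands at 210ms, after one token has refilled.
	fmt.Println(countAllowed(5, 5, 10, 30*time.Millisecond)) // 6
}
```

The second case is the flaky scenario in miniature: nothing is wrong with the limiter, the calls simply took long enough for the refill rate to matter.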

Groxx · 2024-05-08T02:40:49Z

Specifically, some tests are failing with the output below. Note both:

- a nil panic
- the test ran for 0.23s, which is over 200ms. With an RPS of 5, that is enough elapsed time to allow another request.

    --- FAIL: TestWorkflowIDRateLimitIntegrationSuite/TestWorkflowIDSpecificRateLimits (0.23s)
        workflowidratelimit_test.go:140:
            	Error Trace:	/cadence/host/workflowidratelimit_test.go:140
            	Error:      	Should be in error chain:
            	            	expected: %!q(**types.ServiceBusyError=0xc000e44b88)
            	            	in chain:
            	Test:       	TestWorkflowIDRateLimitIntegrationSuite/TestWorkflowIDSpecificRateLimits
        suite.go:87: test panicked: runtime error: invalid memory address or nil pointer dereference
            goroutine 87550 [running]:
            runtime/debug.Stack()
            	/usr/local/go/src/runtime/debug/stack.go:24 +0x72
            github.com/stretchr/testify/suite.failOnPanic(0xc00673aea0, {0x456b360, 0x7c675e0})
            	/go/pkg/mod/github.com/stretchr/[email protected]/suite/suite.go:87 +0x45
            github.com/stretchr/testify/suite.Run.func1.1()
            	/go/pkg/mod/github.com/stretchr/[email protected]/suite/suite.go:183 +0x34c
            panic({0x456b360, 0x7c675e0})
            	/usr/local/go/src/runtime/panic.go:890 +0x263
            github.com/uber/cadence/host.(*WorkflowIDRateLimitIntegrationSuite).TestWorkflowIDSpecificRateLimits(0xc004c882c0)
            	/cadence/host/workflowidratelimit_test.go:141 +0x8d0
            reflect.Value.call({0xc003fe96c0?, 0xc004ebb100?, 0x13?}, {0x49cb58c, 0x4}, {0xc0033cddb8, 0x1, 0x1?})
            	/usr/local/go/src/reflect/value.go:586 +0x13aa
            reflect.Value.Call({0xc003fe96c0?, 0xc004ebb100?, 0xc004c882c0?}, {0xc0031dfdb8, 0x1, 0x1})
            	/usr/local/go/src/reflect/value.go:370 +0xc8
            github.com/stretchr/testify/suite.Run.func1(0xc00673aea0)
            	/go/pkg/mod/github.com/stretchr/[email protected]/suite/suite.go:197 +0x70e
            testing.tRunner(0xc00673aea0, 0xc002a8a870)
            	/usr/local/go/src/testing/testing.go:1576 +0x217
            created by testing.(*T).Run
            	/usr/local/go/src/testing/testing.go:1629 +0x806

codecov · 2024-05-08T02:46:08Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.66%. Comparing base (4db60d9) to head (99782f7).

see 4 files with indirect coverage changes

coveralls · 2024-05-08T03:03:15Z

Pull Request Test Coverage Report for Build 018f5614-bc7b-44eb-a2cf-dea6e7a9db0a

Details

0 of 0 changed or added relevant lines in 0 files are covered.
30 unchanged lines in 10 files lost coverage.
Overall coverage decreased (-0.001%) to 68.461%

| Files with Coverage Reduction | New Missed Lines | Coverage % |
| --- | --- | --- |
| common/task/weighted_round_robin_task_scheduler.go | 1 | 89.05% |
| service/matching/taskReader.go | 2 | 83.55% |
| common/persistence/execution_manager.go | 2 | 83.74% |
| service/matching/taskListManager.go | 2 | 80.65% |
| service/matching/matcher.go | 2 | 90.72% |
| common/membership/hashring.go | 2 | 84.69% |
| common/persistence/historyManager.go | 2 | 66.67% |
| common/task/fifo_task_scheduler.go | 3 | 84.54% |
| common/persistence/statsComputer.go | 5 | 96.07% |
| service/history/shard/context.go | 9 | 69.08% |

Totals
Change from base Build 018f5556-26ed-4210-a4ed-49299450aae8:	-0.001%
Covered Lines:	100555
Relevant Lines:	146880

💛 - Coveralls

Reduce flakiness on workflow-ID-specific ratelimit test

f338796

Groxx requested review from Shaddoll, neil-xie, davidporter-id-au, shijiesheng, agautam478, jakobht, 3vilhamster, sankari165, dkrotx, taylanisikdemir and demirkayaender as code owners May 8, 2024 02:32

raising to 4, seems stable enough that it might be sufficient

45a9aee

more specific comment

99782f7

jakobht approved these changes May 8, 2024

Groxx merged commit da5107b into uber:master May 8, 2024
20 checks passed

Groxx deleted the ratelimit-test-fix branch May 8, 2024 04:39
