Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce flakiness on workflow-ID-specific ratelimit test #5986

Merged
merged 3 commits into from
May 8, 2024

Conversation

Groxx
Copy link
Contributor

@Groxx Groxx commented May 8, 2024

This test had three issues:

  1. it's very time-sensitive, spending >=200ms across all StartWorkflowExecution calls will allow one (or more) of the "should be limited" calls to succeed. This is now more permissive.
  2. it seems to misunderstand / be misleading about how ratelimits work. The reason the first 5 calls can be done "immediately" is due to the burst value, not the RPS itself. We just init the burst with that value.
  3. if assert.ErrorAs failed, the next line would panic because the error value was nil. require would work too, but switching to if assert.ErrorAs(...) {...} lets the rest of the checks continue, and it's relatively simple.

These are now fixed.

@Groxx
Copy link
Contributor Author

Groxx commented May 8, 2024

Specifically, some tests are failing with the following.
Note both:

  • a nil panic
  • 0.23s is over 200ms, and the RPS is 5, meaning enough time elapsed to allow another request.
    --- FAIL: TestWorkflowIDRateLimitIntegrationSuite/TestWorkflowIDSpecificRateLimits (0.23s)
        workflowidratelimit_test.go:140:
            	Error Trace:	/cadence/host/workflowidratelimit_test.go:140
            	Error:      	Should be in error chain:
            	            	expected: %!q(**types.ServiceBusyError=0xc000e44b88)
            	            	in chain:
            	Test:       	TestWorkflowIDRateLimitIntegrationSuite/TestWorkflowIDSpecificRateLimits
        suite.go:87: test panicked: runtime error: invalid memory address or nil pointer dereference
            goroutine 87550 [running]:
            runtime/debug.Stack()
            	/usr/local/go/src/runtime/debug/stack.go:24 +0x72
            github.com/stretchr/testify/suite.failOnPanic(0xc00673aea0, {0x456b360, 0x7c675e0})
            	/go/pkg/mod/github.com/stretchr/[email protected]/suite/suite.go:87 +0x45
            github.com/stretchr/testify/suite.Run.func1.1()
            	/go/pkg/mod/github.com/stretchr/[email protected]/suite/suite.go:183 +0x34c
            panic({0x456b360, 0x7c675e0})
            	/usr/local/go/src/runtime/panic.go:890 +0x263
            github.com/uber/cadence/host.(*WorkflowIDRateLimitIntegrationSuite).TestWorkflowIDSpecificRateLimits(0xc004c882c0)
            	/cadence/host/workflowidratelimit_test.go:141 +0x8d0
            reflect.Value.call({0xc003fe96c0?, 0xc004ebb100?, 0x13?}, {0x49cb58c, 0x4}, {0xc0033cddb8, 0x1, 0x1?})
            	/usr/local/go/src/reflect/value.go:586 +0x13aa
            reflect.Value.Call({0xc003fe96c0?, 0xc004ebb100?, 0xc004c882c0?}, {0xc0031dfdb8, 0x1, 0x1})
            	/usr/local/go/src/reflect/value.go:370 +0xc8
            github.com/stretchr/testify/suite.Run.func1(0xc00673aea0)
            	/go/pkg/mod/github.com/stretchr/[email protected]/suite/suite.go:197 +0x70e
            testing.tRunner(0xc00673aea0, 0xc002a8a870)
            	/usr/local/go/src/testing/testing.go:1576 +0x217
            created by testing.(*T).Run
            	/usr/local/go/src/testing/testing.go:1629 +0x806

Copy link

codecov bot commented May 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.66%. Comparing base (4db60d9) to head (99782f7).

Additional details and impacted files

see 4 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4db60d9...99782f7. Read the comment docs.

@coveralls
Copy link

Pull Request Test Coverage Report for Build 018f5614-bc7b-44eb-a2cf-dea6e7a9db0a

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 30 unchanged lines in 10 files lost coverage.
  • Overall coverage decreased (-0.001%) to 68.461%

Files with Coverage Reduction New Missed Lines %
common/task/weighted_round_robin_task_scheduler.go 1 89.05%
service/matching/taskReader.go 2 83.55%
common/persistence/execution_manager.go 2 83.74%
service/matching/taskListManager.go 2 80.65%
service/matching/matcher.go 2 90.72%
common/membership/hashring.go 2 84.69%
common/persistence/historyManager.go 2 66.67%
common/task/fifo_task_scheduler.go 3 84.54%
common/persistence/statsComputer.go 5 96.07%
service/history/shard/context.go 9 69.08%
Totals Coverage Status
Change from base Build 018f5556-26ed-4210-a4ed-49299450aae8: -0.001%
Covered Lines: 100555
Relevant Lines: 146880

💛 - Coveralls

@Groxx Groxx merged commit da5107b into uber:master May 8, 2024
20 checks passed
@Groxx Groxx deleted the ratelimit-test-fix branch May 8, 2024 04:39
agautam478 pushed a commit to agautam478/cadence that referenced this pull request May 8, 2024
This test had three issues:
1. it's *very* time-sensitive, spending >=200ms across all StartWorkflowExecution calls will allow one (or more) of the "should be limited" calls to succeed.  This is now more permissive.
2. it seems to misunderstand / be misleading about how ratelimits work.  The reason the first 5 calls can be done "immediately" is due to the burst value, not the RPS itself.  We just init the burst with that value.
3. if `assert.ErrorAs` failed, the next line would panic because the error value was nil.  `require` would work too, but switching to `if assert.ErrorAs(...) {...}` lets the rest of the checks continue, and it's relatively simple.

These are now fixed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants