20%+ throughput regression in goldilocks scenarios #98021

Closed
MichalStrehovsky opened this issue Feb 6, 2024 · 2 comments · Fixed by #98876

@MichalStrehovsky

See https://aka.ms/aspnet/nativeaot/benchmarks.

Stage2Aot scenario:

(throughput chart)

Stage1GrpcAot scenario:

(throughput chart)

I narrowed this down using crank to:

Last good:

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/goldilocks.benchmarks.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/steadystate.profile.yml --scenario todosapipublishaot --profile intel-lin-app --profile intel-load-load --profile intel-db-db --application.packageReferences Microsoft.DotNet.ILCompiler=9.0.0-preview.2.24074.3 --application.environmentVariables DOTNET_GCDynamicAdaptationMode=1 --application.framework net9.0 --application.options.collectCounters true --application.aspNetCoreVersion 9.0.0-preview.2.24073.3 --application.runtimeVersion 9.0.0-preview.2.24074.3 --application.sdkVersion 9.0.100-preview.2.24074.1

First bad:

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/goldilocks.benchmarks.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/steadystate.profile.yml --scenario todosapipublishaot --profile intel-lin-app --profile intel-load-load --profile intel-db-db --application.packageReferences Microsoft.DotNet.ILCompiler=9.0.0-preview.2.24074.10 --application.environmentVariables DOTNET_GCDynamicAdaptationMode=1 --application.framework net9.0 --application.options.collectCounters true --application.aspNetCoreVersion 9.0.0-preview.2.24073.3 --application.runtimeVersion 9.0.0-preview.2.24074.10 --application.sdkVersion 9.0.100-preview.2.24074.1

Here are the RPS numbers I got for the various builds, with three attempts each:

| 24074.1 | 24074.3 | 24074.10 |
| ------- | ------- | -------- |
| 222,358 | 214,822 | 187,955  |
| 215,877 | 216,408 | 157,982  |
| 223,201 | 215,822 | 215,307  |

This puts the regression between builds 24074.3 and 24074.10.

That corresponds to this commit range:

96dd3d1...b47fdea

My suspect is #97227, which made changes to various waits on native AOT. Cc @kouvel @VSadov @sebastienros


ghost commented Feb 6, 2024

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Author: MichalStrehovsky
Assignees: -
Labels: area-NativeAOT-coreclr

Milestone: 9.0.0


jkotas commented Feb 23, 2024

Fixed by #98867

jkotas closed this as completed Feb 23, 2024
kouvel added a commit to kouvel/runtime that referenced this issue Feb 23, 2024
PR dotnet#97227 introduced a tick count masking issue where the stored waiter start time excludes the upper bit from the ushort tick count, but comparisons with it were not doing the appropriate masking. This was leading to a lock convoy on some heavily contended locks once in a while due to waiters incorrectly appearing to have waited for a long time.

Fixes dotnet#98021
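
For readers unfamiliar with the failure mode, here is a minimal, self-contained sketch of the kind of bug the commit message describes. The constant and method names are invented for illustration and are not taken from the actual System.Threading.Lock sources; the sketch only assumes that the stored start time keeps the low 15 bits of the 16-bit tick count while the comparison side does not apply the same mask.

```csharp
// Hypothetical sketch of the masking bug, not the real Lock implementation.
using System;

static class WaiterTimeSketch
{
    // Assumption for this sketch: only the low 15 bits of the 16-bit tick count
    // are stored; the top bit is reserved for another purpose.
    const int StartTimeMask = 0x7FFF;

    static ushort RecordWaiterStart(uint tickCountMs) =>
        (ushort)(tickCountMs & (uint)StartTimeMask);

    // Buggy comparison: the current tick count keeps its upper bit while the
    // stored start time had it masked off, so whenever that bit is set the
    // waiter appears to have waited up to ~32 extra seconds.
    static ushort ElapsedMsBuggy(uint nowMs, ushort storedStart) =>
        (ushort)((ushort)nowMs - storedStart);

    // Masked comparison: both sides live in the same 15-bit domain, so the
    // subtraction wraps correctly.
    static ushort ElapsedMsMasked(uint nowMs, ushort storedStart) =>
        (ushort)((((int)(ushort)nowMs & StartTimeMask) - storedStart) & StartTimeMask);

    static void Main()
    {
        uint start = 0x8003; // waiter parked just after bit 15 of the tick count flipped on
        uint now   = 0x8005; // 2 ms later
        ushort stored = RecordWaiterStart(start);

        Console.WriteLine(ElapsedMsBuggy(now, stored));  // 32770 -> looks like a very long wait
        Console.WriteLine(ElapsedMsMasked(now, stored)); // 2     -> actual wait
    }
}
```

A misreported elapsed time like the 32,770 ms above can make the lock believe waiters have been starved, changing its wake-up behavior on heavily contended locks and producing the lock convoy mentioned in the commit message.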
ghost added the in-pr label Feb 23, 2024
kouvel added a commit to kouvel/runtime that referenced this issue Mar 1, 2024 (same commit message as above)

kouvel added a commit to kouvel/runtime that referenced this issue Mar 1, 2024 (same commit message as above)
kouvel added a commit that referenced this issue Mar 7, 2024

* Reapply "Add an internal mode to `Lock` to have it use non-alertable waits (#97227)" (#98867)

This reverts commit f129701.

* Fix Lock's waiter duration computation

PR #97227 introduced a tick count masking issue where the stored waiter start time excludes the upper bit from the ushort tick count, but comparisons with it were not doing the appropriate masking. This was leading to a lock convoy on some heavily contended locks once in a while due to waiters incorrectly appearing to have waited for a long time.

Fixes #98021

* Fix wraparound issue

* Fix recording waiter start time

* Use a bit in the _state field instead
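
The bullets above only name the follow-up fixes. Purely as an illustration of the last one ("Use a bit in the _state field instead"), here is one way a spare state bit could preserve the dropped upper bit of the start time so the plain 16-bit subtraction stays correct; the field names and layout are invented for this sketch and are not taken from the actual Lock code.

```csharp
// Illustrative guess only; not the real System.Threading.Lock layout.
sealed class StateBitSketch
{
    const uint WaiterStartUpperBitFlag = 1u << 31; // assumed spare bit in a state word
    uint _state;
    ushort _waiterStartLow15;                      // low 15 bits of the start tick count

    public void RecordWaiterStart(uint tickCountMs)
    {
        _waiterStartLow15 = (ushort)(tickCountMs & 0x7FFF);
        if ((tickCountMs & 0x8000) != 0)
            _state |= WaiterStartUpperBitFlag;     // remember the dropped bit
        else
            _state &= ~WaiterStartUpperBitFlag;
    }

    public ushort GetElapsedMs(uint nowMs)
    {
        // Reassemble the full 16-bit start time before subtracting, so the
        // ushort wraparound behaves as expected.
        int start = _waiterStartLow15;
        if ((_state & WaiterStartUpperBitFlag) != 0)
            start |= 0x8000;
        return (ushort)((ushort)nowMs - start);
    }
}
```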
ghost removed the in-pr label Mar 7, 2024
github-actions bot locked and limited conversation to collaborators Apr 7, 2024