20%+ throughput regression in goldilocks scenarios #98021

Closed
MichalStrehovsky opened this issue Feb 6, 2024 · 2 comments · Fixed by #98876

@MichalStrehovsky

See https://aka.ms/aspnet/nativeaot/benchmarks.

Stage2Aot scenario:

(throughput chart)

Stage1GrpcAot scenario:

(throughput chart)

I narrowed this down using crank to:

Last good:

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/goldilocks.benchmarks.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/steadystate.profile.yml --scenario todosapipublishaot --profile intel-lin-app --profile intel-load-load --profile intel-db-db --application.packageReferences Microsoft.DotNet.ILCompiler=9.0.0-preview.2.24074.3 --application.environmentVariables DOTNET_GCDynamicAdaptationMode=1 --application.framework net9.0 --application.options.collectCounters true --application.aspNetCoreVersion 9.0.0-preview.2.24073.3 --application.runtimeVersion 9.0.0-preview.2.24074.3 --application.sdkVersion 9.0.100-preview.2.24074.1

First bad:

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/goldilocks.benchmarks.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/steadystate.profile.yml --scenario todosapipublishaot --profile intel-lin-app --profile intel-load-load --profile intel-db-db --application.packageReferences Microsoft.DotNet.ILCompiler=9.0.0-preview.2.24074.10 --application.environmentVariables DOTNET_GCDynamicAdaptationMode=1 --application.framework net9.0 --application.options.collectCounters true --application.aspNetCoreVersion 9.0.0-preview.2.24073.3 --application.runtimeVersion 9.0.0-preview.2.24074.10 --application.sdkVersion 9.0.100-preview.2.24074.1

Here are the RPS numbers I got for the various builds, with three attempts each:

| 24074.1 | 24074.3 | 24074.10 |
| ------- | ------- | -------- |
| 222,358 | 214,822 | 187,955  |
| 215,877 | 216,408 | 157,982  |
| 223,201 | 215,822 | 215,307  |

This puts the regression between builds 24074.3 and 24074.10.

That corresponds to this commit range:

96dd3d1...b47fdea

My suspect is #97227, which made changes to various waits on native AOT. Cc @kouvel @VSadov @sebastienros


ghost commented Feb 6, 2024

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Author: MichalStrehovsky
Assignees: -
Labels: area-NativeAOT-coreclr

Milestone: 9.0.0


jkotas commented Feb 23, 2024

Fixed by #98867

jkotas closed this as completed Feb 23, 2024
kouvel added a commit to kouvel/runtime that referenced this issue Feb 23, 2024
PR dotnet#97227 introduced a tick count masking issue where the stored waiter start time excludes the upper bit from the ushort tick count, but comparisons with it were not doing the appropriate masking. This was leading to a lock convoy on some heavily contended locks once in a while due to waiters incorrectly appearing to have waited for a long time.

Fixes dotnet#98021
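
For readers unfamiliar with the failure mode, here is a minimal, self-contained sketch of the kind of bug the commit message describes. The constant and method names are invented for illustration and are not taken from the actual System.Threading.Lock sources; the sketch only assumes that the stored start time keeps the low 15 bits of the 16-bit tick count while the comparison side does not apply the same mask.

```csharp
// Hypothetical sketch of the masking bug, not the real Lock implementation.
using System;

static class WaiterTimeSketch
{
    // Assumption for this sketch: only the low 15 bits of the 16-bit tick count
    // are stored; the top bit is reserved for another purpose.
    const int StartTimeMask = 0x7FFF;

    static ushort RecordWaiterStart(uint tickCountMs) =>
        (ushort)(tickCountMs & (uint)StartTimeMask);

    // Buggy comparison: the current tick count keeps its upper bit while the
    // stored start time had it masked off, so whenever that bit is set the
    // waiter appears to have waited up to ~32 extra seconds.
    static ushort ElapsedMsBuggy(uint nowMs, ushort storedStart) =>
        (ushort)((ushort)nowMs - storedStart);

    // Masked comparison: both sides live in the same 15-bit domain, so the
    // subtraction wraps correctly.
    static ushort ElapsedMsMasked(uint nowMs, ushort storedStart) =>
        (ushort)((((int)(ushort)nowMs & StartTimeMask) - storedStart) & StartTimeMask);

    static void Main()
    {
        uint start = 0x8003; // waiter parked just after bit 15 of the tick count flipped on
        uint now   = 0x8005; // 2 ms later
        ushort stored = RecordWaiterStart(start);

        Console.WriteLine(ElapsedMsBuggy(now, stored));  // 32770 -> looks like a very long wait
        Console.WriteLine(ElapsedMsMasked(now, stored)); // 2     -> actual wait
    }
}
```

A misreported elapsed time like the 32,770 ms above can make the lock believe waiters have been starved, changing its wake-up behavior on heavily contended locks and producing the lock convoy mentioned in the commit message.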
ghost added the in-pr label Feb 23, 2024
kouvel added a commit to kouvel/runtime that referenced this issue Mar 1, 2024 (same commit message as above)

kouvel added a commit to kouvel/runtime that referenced this issue Mar 1, 2024 (same commit message as above)
kouvel added a commit that referenced this issue Mar 7, 2024

* Reapply "Add an internal mode to `Lock` to have it use non-alertable waits (#97227)" (#98867)

This reverts commit f129701.

* Fix Lock's waiter duration computation

PR #97227 introduced a tick count masking issue where the stored waiter start time excludes the upper bit from the ushort tick count, but comparisons with it were not doing the appropriate masking. This was leading to a lock convoy on some heavily contended locks once in a while due to waiters incorrectly appearing to have waited for a long time.

Fixes #98021

* Fix wraparound issue

* Fix recording waiter start time

* Use a bit in the _state field instead
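
The bullets above only name the follow-up fixes. Purely as an illustration of the last one ("Use a bit in the _state field instead"), here is one way a spare state bit could preserve the dropped upper bit of the start time so the plain 16-bit subtraction stays correct; the field names and layout are invented for this sketch and are not taken from the actual Lock code.

```csharp
// Illustrative guess only; not the real System.Threading.Lock layout.
sealed class StateBitSketch
{
    const uint WaiterStartUpperBitFlag = 1u << 31; // assumed spare bit in a state word
    uint _state;
    ushort _waiterStartLow15;                      // low 15 bits of the start tick count

    public void RecordWaiterStart(uint tickCountMs)
    {
        _waiterStartLow15 = (ushort)(tickCountMs & 0x7FFF);
        if ((tickCountMs & 0x8000) != 0)
            _state |= WaiterStartUpperBitFlag;     // remember the dropped bit
        else
            _state &= ~WaiterStartUpperBitFlag;
    }

    public ushort GetElapsedMs(uint nowMs)
    {
        // Reassemble the full 16-bit start time before subtracting, so the
        // ushort wraparound behaves as expected.
        int start = _waiterStartLow15;
        if ((_state & WaiterStartUpperBitFlag) != 0)
            start |= 0x8000;
        return (ushort)((ushort)nowMs - start);
    }
}
```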
ghost removed the in-pr label Mar 7, 2024
github-actions bot locked and limited conversation to collaborators Apr 7, 2024