Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove usage of [AggressiveOptimization] #58209

Merged
merged 1 commit into from
Aug 27, 2021

Conversation

steveharter
Copy link
Member

@steveharter steveharter commented Aug 26, 2021

Revert the 4 usages of [AggressiveOptimization].

The 2 in ObjectDefaultConverter that were reverted have loops which cause the jitter to re-jit them anyways, so the use of [AggressiveOptimization] really shouldn't do anything.

The other 2 in JsonConverter<T> that remain do seem get up to a 3% benefit but only some of the time, and only measured against a local release build that is a non-R2R (non-crossgen2'd) STJ.dll image. In theory there shouldn't be a perf improvement since a R2R image automatically did the equivalent of [AggressiveOptimization] for all methods. When obtaining the fastest time from several runs, the benchmark (which measures serializer overhead) is faster with [AO], however the variability of these benchmarks overlap and are difficult to measure consistently. Perhaps with [AO] there is sometimes closer\faster memory locality with other methods.

Removing [AO] also has the benefit of supporting a "DynamicPGO" explicit mode that offers "best perf but slow start".

See #57327 (comment)

I don't believe the changes to remove [AO] meet the bar to port to V6 at this time, since R2R performance should be the same in the vast majority because a "DynamicPGO" mode would need to be turned on explicitly.

@steveharter steveharter added area-System.Text.Json tenet-build-performance Impacts build time: official, developer or CI labels Aug 26, 2021
@steveharter steveharter added this to the 7.0.0 milestone Aug 26, 2021
@steveharter steveharter self-assigned this Aug 26, 2021
@ghost
Copy link

ghost commented Aug 26, 2021

Tagging subscribers to this area: @eiriktsarpalis, @layomia
See info in area-owners.md if you want to be subscribed.

Issue Details

Revert the 4 usages of [AggressiveOptimization].

The 2 in ObjectDefaultConverter that were reverted have loops which cause the jitter to re-jit them anyways, so the use of [AggressiveOptimization] really shouldn't do anything.

The other 2 in JsonConverter<T> that remain do seem get up to a 3% benefit but only some of the time, and only measured against a local release build that is a non-R2R (non-crossgen2'd) STJ.dll image. In theory there shouldn't be a perf improvement since a R2R image automatically did the equivalent of [AggressiveOptimization] for all methods. When obtaining the fastest time from several runes, the benchmark (which measures serializer overhead) is faster with [AO], however the variability of these benchmarks overlap and are difficult to measure consistently. Perhaps with [AO] there is sometimes closer\faster memory locality with other methods.

Removing [AO] also has the benefit of supporting a "DynamicPGO" explicit mode that offers "best perf but slow start".

See #57327 (comment)

I don't believe the changes to remove [AO] meet the bar to port to V6 at this time, since R2R performance should be the same in the vast majority because a "DynamicPGO" mode would need to be turned on explicitly.

Author: steveharter
Assignees: steveharter
Labels:

area-System.Text.Json, tenet-build-performance

Milestone: 7.0.0

Copy link
Member

@eiriktsarpalis eiriktsarpalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to backport to 6.0?

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes on your comments above:

  • AO prevents methods from being compiled in R2R. It also causes methods to bypass tiering and always be jitted with optimization. We generally advocate using it only if running an unoptimized or R2R version of the method is unacceptable. Because AO methods bypass tiering, we cannot gather any PGO data for these methods (neither static or dynamic).
  • By default, methods with loops will also bypass tiering and are always jitted with optimization. But this default behavior is overridden when gathering static PGO data. So it's possible once you remove AO from these loop methods and we get static PGO data for them, the non-AO versions will be faster than the AO versions. You should also see slightly faster startup behavior because of less jitting.
  • Removing AO from the non-loop methods should likewise enable static PGO data collection and potentially better perf once the static PGO collection process catches up. And also will provide a small startup benefit.
  • Generally speaking it is rare to be benchmarking R2R code; typically this will only happen if you use some custom benchmark harness or disable tiered compilation. If you benchmark the R2R code, it is typically 20% or so slower than the jitted counterpart.

@stephentoub stephentoub merged commit ba5dedd into dotnet:main Aug 27, 2021
@stephentoub
Copy link
Member

/backport to release/6.0

@github-actions
Copy link
Contributor

Started backporting to release/6.0: https:/dotnet/runtime/actions/runs/1174017757

@stephentoub
Copy link
Member

@AndyAyersMS, do you know why all of these are AggressiveOptimization?
image
Only one (the async one) has a comment indicating its purpose. Maybe they should be revisited (or at least commented to say why they're necessary)?

@EgorBo
Copy link
Member

EgorBo commented Aug 28, 2021

@AndyAyersMS, do you know why all of these are AggressiveOptimization?
image
Only one (the async one) has a comment indicating its purpose. Maybe they should be revisited (or at least commented to say why they're necessary)?

My 5 cents: From my understanding there two major excuses for AggressiveOptimization to exist in BCL:

  1. Sometimes tier1 optimizations are able to eliminate allocations and we don't want some specific code to allocate even in tier0.
    (some code might stuck in a hot path in tier0 forever with DOTNET_TC_QuickJitForLoops=1 - we promote this variable for faster startup and DynamicPGO cases)

  2. The code is already heavily optimized by hands and we don't want JIT to re-order it with PGO (because PGO sometimes can optimize a method for some specific use-case during startup, but all other call-sites will regress - a good example is all functions in CastHelpers.

@EgorBo
Copy link
Member

EgorBo commented Aug 28, 2021

A good example for 1) is this code:
image

codegen for tier0 is terrible. Two allocations (we box both values) 🙂

@AndyAyersMS
Copy link
Member

@AndyAyersMS, do you know why all of these are AggressiveOptimization?

I don't, no. Probably worth reviewing.

IIRC we mostly wanted to use AO to eliminate box allocation from the async machinery, since we otherwise could see 100MB's of boxing from Tier0 code before things got rejitted.

@ghost ghost locked as resolved and limited conversation to collaborators Oct 2, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Text.Json tenet-build-performance Impacts build time: official, developer or CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants