If compReturnBB is unreachable we should remove it #48740

Open
AndyAyersMS opened this issue Feb 25, 2021 · 4 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone
Future

Comments

AndyAyersMS (Member) commented Feb 25, 2021

We create compReturnBB anticipating the redirection of existing returns, but we do that before tail calls are taken into account, so we may not actually end up redirecting any returns there. Because the block is marked with BBF_DONT_REMOVE, it hangs around and ends up producing a dead epilog. Here's an example; note that IG06 is unreachable.

Probably not super common, but we should consider some other method of keeping the block around until morph (like an artificial ref count that we retract), or perhaps not creating it until it is needed.

; Assembly listing for method System.Type:get_Attributes():int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; optimized using profile data
; rbp based frame
; fully interruptible
; with IBC profile data, edge weights are valid, and fgCalledCount is 89
; Final local variable assignments
;
;  V00 this         [V00,T00] (  6,  4   )     ref  ->  rax         this class-hnd
;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;* V02 tmp1         [V02    ] (  0,  0   )     int  ->  zero-ref    "guarded devirt return temp"
;* V03 tmp2         [V03    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "guarded devirt this exact temp"
;  V04 tmp3         [V04,T01] (  1,  0   )     int  ->  rax         "Single return block return value"
;
; Lcl frame size = 32

G_M30881_IG01:              ;; offset=0000H
       55                   push     rbp
       4883EC20             sub      rsp, 32
       488D6C2420           lea      rbp, [rsp+20H]
       48894D10             mov      gword ptr [rbp+10H], rcx
       488D0D00000000       lea      rcx, [(reloc)]
       488D5510             lea      rdx, [rbp+10H]
       FF1500000000         call     [CORINFO_HELP_PROF_FCN_ENTER]
       488B4D10             mov      rcx, gword ptr [rbp+10H]
       488BC1               mov      rax, rcx
                                                ;; bbWeight=1    PerfScore 8.00
G_M30881_IG02:              ;; offset=0026H
       48BA48A7649DFC7F0000 mov      rdx, 0x7FFC9D64A748
       483910               cmp      qword ptr [rax], rdx
       751E                 jne      SHORT G_M30881_IG04
       488D0D00000000       lea      rcx, [(reloc)]
       488D5510             lea      rdx, [rbp+10H]
       FF1500000000         call     [CORINFO_HELP_PROF_FCN_TAILCALL]
       488BC8               mov      rcx, rax
                                                ;; bbWeight=1    PerfScore 7.50
G_M30881_IG03:              ;; offset=0049H
       488D6500             lea      rsp, [rbp]
       5D                   pop      rbp
       E900000000           jmp      hackishModuleName:hackishMethodName()
                                                ;; bbWeight=1    PerfScore 3.00
G_M30881_IG04:              ;; offset=0053H
       488D0D00000000       lea      rcx, [(reloc)]
       488D5510             lea      rdx, [rbp+10H]
       FF1500000000         call     [CORINFO_HELP_PROF_FCN_TAILCALL]
       488BC8               mov      rcx, rax
       488B00               mov      rax, qword ptr [rax]
       488B4078             mov      rax, qword ptr [rax+120]
       488B00               mov      rax, qword ptr [rax]
                                                ;; bbWeight=0    PerfScore 0.00
G_M30881_IG05:              ;; offset=0071H
       488D6500             lea      rsp, [rbp]
       5D                   pop      rbp
       48FFE0               rex.jmp  rax
                                                ;; bbWeight=0    PerfScore 0.00
G_M30881_IG06:              ;; offset=0079H
       488D0D00000000       lea      rcx, [(reloc)]
       488D5510             lea      rdx, [rbp+10H]
       FF1500000000         call     [CORINFO_HELP_PROF_FCN_LEAVE]
       90                   nop
                                                ;; bbWeight=0    PerfScore 0.00
G_M30881_IG07:              ;; offset=008BH
       488D6500             lea      rsp, [rbp]
       5D                   pop      rbp
       C3                   ret
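
For reference, a minimal C# shape that can produce this pattern (illustrative only; the listing above is System.Type:get_Attributes, and the type and member names below are made up):

// A property that just forwards to a virtual call. In the listing above a
// single merged return block was set up (see V04, "Single return block return
// value"), but guarded devirtualization turned both the fast path (IG03) and
// the fallback path (IG05) into tail calls, so no return is ever redirected
// to the merged return block and its epilog (IG06/IG07) is dead.
public abstract class Widget
{
    protected abstract int GetFlagsCore();

    public int Flags => GetFlagsCore();
}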

category:cq
theme:block-opts

dotnet-issue-labeler bot added the area-CodeGen-coreclr and untriaged labels on Feb 25, 2021
AndyAyersMS added this to the Future milestone on Feb 25, 2021
AndyAyersMS removed the untriaged label on Feb 25, 2021

SingleAccretion (Contributor) commented Feb 25, 2021

I have been looking at code related to this recently, so I'm posting my findings here; they might be worth fixing at the same time.

There is a bug in how fgReturnCount is calculated in morph:

fgReturnCount--;

Return merging already calculates fgReturnCount "accurately", so morph doesn't need to decrement it here. The extra decrement leads to an overflow in methods like these:

private static int Loop(int a)
{
    for (int i = 0; i < 10; i++)
    {
        if (a is 1)
            return a;
        if (a is 2)
            return a;
        if (a is 3)
            return a;
        if (a is 4)
            return a;
    }

    return 5;
}
Considering loop 0 to clone for optimizations.
Loop cloning: rejecting loop because it has 0 returns; if added to previously-existing -2 returns, would exceed the limit of 4.

(Edit: to be clear, I believe the fix for this is simply deleting the decrement in morph. The reason I haven't done it yet is that adding debug checking for fgReturnCount led me to discover all the other issues with it and to consider a "proper" fix.)

We are safe here from the correctness perspective because nothing important uses fgReturnCount, but it'd still be nice to maintain it properly (and print it in the dump as the unsigned number it actually is). Unfortunately, it looks to be a somewhat complex undertaking with unclear benefits, as the return count is only used by cloning and unrolling to not go over the x86 encoder's hard limit on the number of epilogues.

I also know that fgReturnCount becomes incorrect today in cases where we "merge" returns from tail calls without actually merging them in morph, and when we remove dead return blocks.

One idea I had for how we could have our cake and eat it too was to just delete fgReturnCount and always move the return blocks out of loops. This is already the case today for the vast majority of loops; the exceptions come from how block compaction accounts for EH, which can cause blocks that could be moved out to be missed. Presumably, BBJ_RETURNs can always be moved out, as they should not be participating in EH, but I don't actually know whether that's always the case in the compiler (it is the case in IL). It is also unclear to me whether we have code elsewhere that could be duplicating returns while forgetting to check the limit (probably not, but such code could be added in the future).

AndyAyersMS (Member Author)

Thanks for bringing this up, @SingleAccretion.

It seems like we could either (a) decide to count properly at all times and start checking this post-phase, or (b) decide to (re)count only when we actually need to act based on the number.

(a) may be tricky, as there's a delay between when we figure out our merging strategy and when we do the actual merging, and, as we've seen above, an interaction with tail calls (which can't be accounted for early; when we merge we haven't yet committed to which tail call candidates will become tail calls). So we may be temporarily over the limit, and post-merge we may have fewer actual returns than we thought (or more, given the bug in morph's accounting).
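
To make the tail call interaction concrete, a hedged C# sketch (hypothetical names; it assumes conditions that force return merging, such as the profiler hooks in the example at the top of the issue):

static int Helper(string s) => s.Length;

static int Mixed(bool flag, string s)
{
    if (flag)
        return 0;          // a plain return that gets redirected to the merged return block
    return Helper(s);      // a tail call candidate: when merging is set up we don't yet know
                           // whether morph will turn this into a real tail call, and if it
                           // does, this return never reaches the merged return block
}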

(b) may be tricky, as we can't be over the limit once we get to the point where we can no longer fix things, so at some point we have to settle on the number, stick with it, and ensure we're at or under any runtime constraint.

I think your idea of (almost) always moving BBJ_RETURN out of loops is a good one -- moving does not alter the number of returns, but as you note it can unblock cloning. There are potential EH constraints on moving, but they should only hamper movement out of loops when there's a return inside a try inside a loop, and that's not something we're likely to be able to clone anyway. And you can always split off the tail part of the BBJ_RETURN (past the point where any exceptions can happen) and move that.
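
A hedged C# sketch of that constrained shape (illustrative names): at the source level the return sits inside a try inside a loop, though in IL it compiles to a leave out of the try followed by a separate return block, which is why BBJ_RETURNs themselves should normally sit outside EH regions:

static int FirstParsable(string[] items)
{
    for (int i = 0; i < items.Length; i++)
    {
        try
        {
            // Source-level return inside a try inside a loop; such a loop is
            // unlikely to be cloned anyway.
            return int.Parse(items[i]);
        }
        catch (System.FormatException)
        {
            // Keep scanning on parse failure.
        }
    }
    return -1;
}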

Back when we did the return merging work I actually thought we should be more aggressive about merging and perhaps adopt the default stance that we always try to merge returns. It would be interesting to study the impact of something like this.

AndyAyersMS (Member Author)

Loop cloning: rejecting loop because it has 0 returns; if added to previously-existing -2 returns,

cc @BruceForstall for the issue with poor tracking of returns impacting cloning.

AndyAyersMS (Member Author)

Another thing we might consider is creating a true "final BB" that the return blocks all branch to. This block would post-dominate all the normal flow in the method, and its pred list would allow easy enumeration of all the returns. It wouldn't hold any code.

Going a bit further, we could have a second "truly final BB" that is targeted by the final BB above and by any exceptional exits and flow dead ends (no-return calls, possibly heads of no-exit loops). This block would post-dominate everything in the method and could (with suitable fake IR like a keep-alive) be used to force things to remain alive throughout the method without doing anything "artificial". It would also be useful if we ever decide to build post-dominance.
