If compReturnBB is unreachable we should remove it #48740

Open
AndyAyersMS opened this issue Feb 25, 2021 · 4 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone
Future

Comments

AndyAyersMS (Member) commented Feb 25, 2021

We create compReturnBB anticipating the redirection of existing returns, but we do that before tail calls are taken into account, so we may not actually end up redirecting any returns there. Because the block is marked with BBF_DONT_REMOVE, it hangs around and ends up producing a dead epilog. Here's an example; note that IG06 is unreachable.

Probably not super common, but we should consider some other method of keeping the block around until morph (like an artificial ref count that we retract), or perhaps not creating it until it is needed.

; Assembly listing for method System.Type:get_Attributes():int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; optimized using profile data
; rbp based frame
; fully interruptible
; with IBC profile data, edge weights are valid, and fgCalledCount is 89
; Final local variable assignments
;
;  V00 this         [V00,T00] (  6,  4   )     ref  ->  rax         this class-hnd
;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;* V02 tmp1         [V02    ] (  0,  0   )     int  ->  zero-ref    "guarded devirt return temp"
;* V03 tmp2         [V03    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "guarded devirt this exact temp"
;  V04 tmp3         [V04,T01] (  1,  0   )     int  ->  rax         "Single return block return value"
;
; Lcl frame size = 32

G_M30881_IG01:              ;; offset=0000H
       55                   push     rbp
       4883EC20             sub      rsp, 32
       488D6C2420           lea      rbp, [rsp+20H]
       48894D10             mov      gword ptr [rbp+10H], rcx
       488D0D00000000       lea      rcx, [(reloc)]
       488D5510             lea      rdx, [rbp+10H]
       FF1500000000         call     [CORINFO_HELP_PROF_FCN_ENTER]
       488B4D10             mov      rcx, gword ptr [rbp+10H]
       488BC1               mov      rax, rcx
                                                ;; bbWeight=1    PerfScore 8.00
G_M30881_IG02:              ;; offset=0026H
       48BA48A7649DFC7F0000 mov      rdx, 0x7FFC9D64A748
       483910               cmp      qword ptr [rax], rdx
       751E                 jne      SHORT G_M30881_IG04
       488D0D00000000       lea      rcx, [(reloc)]
       488D5510             lea      rdx, [rbp+10H]
       FF1500000000         call     [CORINFO_HELP_PROF_FCN_TAILCALL]
       488BC8               mov      rcx, rax
                                                ;; bbWeight=1    PerfScore 7.50
G_M30881_IG03:              ;; offset=0049H
       488D6500             lea      rsp, [rbp]
       5D                   pop      rbp
       E900000000           jmp      hackishModuleName:hackishMethodName()
                                                ;; bbWeight=1    PerfScore 3.00
G_M30881_IG04:              ;; offset=0053H
       488D0D00000000       lea      rcx, [(reloc)]
       488D5510             lea      rdx, [rbp+10H]
       FF1500000000         call     [CORINFO_HELP_PROF_FCN_TAILCALL]
       488BC8               mov      rcx, rax
       488B00               mov      rax, qword ptr [rax]
       488B4078             mov      rax, qword ptr [rax+120]
       488B00               mov      rax, qword ptr [rax]
                                                ;; bbWeight=0    PerfScore 0.00
G_M30881_IG05:              ;; offset=0071H
       488D6500             lea      rsp, [rbp]
       5D                   pop      rbp
       48FFE0               rex.jmp  rax
                                                ;; bbWeight=0    PerfScore 0.00
G_M30881_IG06:              ;; offset=0079H
       488D0D00000000       lea      rcx, [(reloc)]
       488D5510             lea      rdx, [rbp+10H]
       FF1500000000         call     [CORINFO_HELP_PROF_FCN_LEAVE]
       90                   nop
                                                ;; bbWeight=0    PerfScore 0.00
G_M30881_IG07:              ;; offset=008BH
       488D6500             lea      rsp, [rbp]
       5D                   pop      rbp
       C3                   ret
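
For reference, a minimal C# shape that can produce this pattern (illustrative only; the listing above is System.Type:get_Attributes, and the type and member names below are made up):

// A property that just forwards to a virtual call. In the listing above a
// single merged return block was set up (see V04, "Single return block return
// value"), but guarded devirtualization turned both the fast path (IG03) and
// the fallback path (IG05) into tail calls, so no return is ever redirected
// to the merged return block and its epilog (IG06/IG07) is dead.
public abstract class Widget
{
    protected abstract int GetFlagsCore();

    public int Flags => GetFlagsCore();
}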

category:cq
theme:block-opts

dotnet-issue-labeler bot added the area-CodeGen-coreclr and untriaged labels on Feb 25, 2021
AndyAyersMS added this to the Future milestone on Feb 25, 2021
AndyAyersMS removed the untriaged label on Feb 25, 2021

SingleAccretion (Contributor) commented Feb 25, 2021

I have been looking at code related to this recently, so I'm posting my findings here; they might be worth fixing at the same time.

There is a bug in how fgReturnCount is calculated in morph:

fgReturnCount--;

Return merging already calculates fgReturnCount "accurately", so morph doesn't need to decrement it here. The extra decrement leads to an overflow in methods like these:

private static int Loop(int a)
{
    for (int i = 0; i < 10; i++)
    {
        if (a is 1)
            return a;
        if (a is 2)
            return a;
        if (a is 3)
            return a;
        if (a is 4)
            return a;
    }

    return 5;
}
Considering loop 0 to clone for optimizations.
Loop cloning: rejecting loop because it has 0 returns; if added to previously-existing -2 returns, would exceed the limit of 4.

(Edit: to be clear, I believe the fix for this is simply deleting the decrement in morph. The reason I haven't done it yet is that adding debug checking for fgReturnCount led me to discover all the other issues with it and to consider a "proper" fix.)

We are safe here from the correctness perspective because nothing important uses fgReturnCount, but it'd still be nice to maintain it properly (and print it in the dump as the unsigned number it actually is). Unfortunately, it looks to be a somewhat complex undertaking with unclear benefits, as the return count is only used by cloning and unrolling to not go over the x86 encoder's hard limit on the number of epilogues.

I also know that fgReturnCount becomes incorrect today in cases where we "merge" returns from tail calls without actually merging them in morph, and when we remove dead return blocks.

One idea I had for how we could have our cake and eat it too was to just delete fgReturnCount and always move the return blocks out of loops. This is already the case today for the vast majority of loops; the exceptions come from how block compaction accounts for EH, which can cause blocks that could be moved out to be missed. Presumably, BBJ_RETURNs can always be moved out, as they should not be participating in EH, but I don't actually know whether that's always the case in the compiler (it is the case in IL). It is also unclear to me whether we have code elsewhere that could be duplicating returns while forgetting to check the limit (probably not, but such code could be added in the future).

AndyAyersMS (Member Author)

Thanks for bringing this up, @SingleAccretion.

It seems like we could either (a) decide to count properly at all times and start checking this post-phase, or (b) decide to (re)count only when we actually need to act based on the number.

(a) may be tricky, as there's a delay between when we figure out our merging strategy and when we do the actual merging, and, as we've seen above, an interaction with tail calls (which can't be accounted for early; when we merge we haven't yet committed to which tail call candidates will become tail calls). So we may be temporarily over the limit, and post-merge we may have fewer actual returns than we thought (or more, given the bug in morph's accounting).
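
To make the tail call interaction concrete, a hedged C# sketch (hypothetical names; it assumes conditions that force return merging, such as the profiler hooks in the example at the top of the issue):

static int Helper(string s) => s.Length;

static int Mixed(bool flag, string s)
{
    if (flag)
        return 0;          // a plain return that gets redirected to the merged return block
    return Helper(s);      // a tail call candidate: when merging is set up we don't yet know
                           // whether morph will turn this into a real tail call, and if it
                           // does, this return never reaches the merged return block
}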

(b) may be tricky, as we can't be over the limit once we get to the point where we can no longer fix things, so at some point we have to settle on the number, stick with it, and ensure we're at or under any runtime constraint.

I think your idea of (almost) always moving BBJ_RETURN out of loops is a good one -- moving does not alter the number of returns, but as you note it can unblock cloning. There are potential EH constraints on moving, but they should only hamper movement out of loops when there's a return inside a try inside a loop, and that's not something we're likely to be able to clone anyway. And you can always split off the tail part of the BBJ_RETURN (past the point where any exceptions can happen) and move that.
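
A hedged C# sketch of that constrained shape (illustrative names): at the source level the return sits inside a try inside a loop, though in IL it compiles to a leave out of the try followed by a separate return block, which is why BBJ_RETURNs themselves should normally sit outside EH regions:

static int FirstParsable(string[] items)
{
    for (int i = 0; i < items.Length; i++)
    {
        try
        {
            // Source-level return inside a try inside a loop; such a loop is
            // unlikely to be cloned anyway.
            return int.Parse(items[i]);
        }
        catch (System.FormatException)
        {
            // Keep scanning on parse failure.
        }
    }
    return -1;
}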

Back when we did the return merging work I actually thought we should be more aggressive about merging and perhaps adopt the default stance that we always try to merge returns. It would be interesting to study the impact of something like this.

AndyAyersMS (Member Author)

Loop cloning: rejecting loop because it has 0 returns; if added to previously-existing -2 returns,

cc @BruceForstall for the issue with poor tracking of returns impacting cloning.

AndyAyersMS (Member Author)

Another thing we might consider is creating a true "final BB" that the return blocks all branch to. This block would post-dominate all the normal flow in the method, and its pred list would allow easy enumeration of all the returns. It wouldn't hold any code.

Going a bit further, we could have a second "truly final BB" that is targeted by the final BB above and by any exceptional exits and flow dead ends (no-return calls, possibly heads of no-exit loops). This block would post-dominate everything in the method and could (with suitable fake IR like a keep-alive) be used to force things to remain alive throughout the method without doing anything "artificial". It would also be useful if we ever decide to build post-dominance.
