
System.AccessViolationException while formatting stacktrace #10986

Closed
jkotas opened this issue Aug 27, 2018 · 26 comments

@jkotas
Member

jkotas commented Aug 27, 2018

Reported by @BrennanConroy

We’re getting this exception in a test occasionally.

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
    at System.Reflection.Internal.MemoryBlock.PeekCompressedInteger(Int32 offset, Int32& numberOfBytesRead)
    at System.Reflection.Metadata.Ecma335.BlobHeap.GetMemoryBlock(BlobHandle handle)
    at System.Reflection.Metadata.MethodDebugInformation.GetSequencePoints()
    at System.Diagnostics.StackTraceSymbols.GetSourceLineInfoWithoutCasAssert(String assemblyPath, IntPtr loadedPeAddress, Int32 loadedPeSize, IntPtr inMemoryPdbAddress, Int32 inMemoryPdbSize, Int32 methodToken, Int32 ilOffset, String& sourceFile, Int32& sourceLine, Int32& sourceColumn)
    at System.Diagnostics.StackFrameHelper.InitializeSourceInfo(Int32 iSkip, Boolean fNeedFileInfo, Exception exception)
    at System.Diagnostics.StackTrace.CaptureStackTrace(Int32 iSkip, Boolean fNeedFileInfo, Thread targetThread, Exception e)
    at System.Diagnostics.StackTrace..ctor(Exception e, Boolean fNeedFileInfo)
    at System.Environment.GetStackTrace(Exception e, Boolean needFileInfo)
    at System.Exception.GetStackTrace(Boolean needFileInfo)
    at System.Exception.ToString(Boolean needFileLineInfo, Boolean needMessage)
    at System.Exception.ToString()
@jkotas
Member Author

jkotas commented Aug 27, 2018

From @jkotas: If you lose the race here:

            // This may fail as another thread might have beaten us to it, but it doesn't matter
            _metadataCache.TryAdd(cacheKey, provider);

            if (provider == null)
            {
                return null;
            }

            // The reader has already been opened, so this doesn't throw:
            return provider.GetMetadataReader();
        }

There is nothing to keep the provider alive.

From @tmat: I think you are right. We should return the provider that’s already in the cache, not the new provider.
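
For context, here is a minimal sketch of the corrected pattern suggested above. It is illustrative only, not the actual corefx change: everything except `_metadataCache` and `GetMetadataReader()` is a made-up name, and the real PRs should be consulted for the exact fix. The invariant it illustrates is that the returned reader must be backed by the provider the cache keeps alive, so the provider that loses the race is disposed rather than handed out.

    using System.Collections.Concurrent;
    using System.Reflection.Metadata;

    // Illustrative sketch only; not the actual corefx implementation.
    sealed class SymbolProviderCache
    {
        private readonly ConcurrentDictionary<string, MetadataReaderProvider> _metadataCache =
            new ConcurrentDictionary<string, MetadataReaderProvider>();

        // 'provider' is a freshly opened provider for 'cacheKey', or null if no PDB was found.
        public MetadataReader TryGetReader(string cacheKey, MetadataReaderProvider provider)
        {
            if (provider == null)
            {
                return null;
            }

            // GetOrAdd makes the race explicit: exactly one provider ends up rooted by the cache.
            MetadataReaderProvider cached = _metadataCache.GetOrAdd(cacheKey, provider);
            if (!ReferenceEquals(cached, provider))
            {
                // Another thread won the race. Its provider is the one the cache keeps alive,
                // so dispose the redundant one instead of handing out a reader it backs.
                provider.Dispose();
            }

            // The reader has already been opened, so this doesn't throw.
            return cached.GetMetadataReader();
        }
    }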

@mikem8361 mikem8361 self-assigned this Aug 27, 2018
@mikem8361
Member

mikem8361 commented Aug 29, 2018

Fixed in master (PR dotnet/corefx#31991) and in the 2.2 branch (PR dotnet/corefx#32016).

@tmat
Member

tmat commented Aug 29, 2018

@mikem8361 Do you plan to fix in .NET Framework 4.8 as well?

@mikem8361
Member

We will see if ship room thinks this is important enough to get into desktop. Personally I hope to avoid any desktop work; maybe we can get someone else more familiar with the process.

/cc: @tommcdon

@mikem8361
Member

Jan opened https://devdiv.visualstudio.com/DevDiv/_workitems/edit/672340 for the desktop.

@BrennanConroy
Member

Well we only actually ever saw this issue on desktop... so it would be nice to get the fix there.

@danmoseley
Member

Reactivating for 2.2 consideration.

@danmoseley danmoseley reopened this Oct 2, 2018
@danmoseley
Member

cc @tommcdon ?

Shiproom template

Description

There is a race condition in the portable PDB metadata provider cache that leaked providers and caused crashes in the diagnostic StackTrace API.
To fix the race, detect the case where the provider wasn't being disposed and dispose it.

Customer Impact

Without this fix, the StackTrace API can crash.

Regression?

No.

Risk

Low.

@tommcdon
Member

tommcdon commented Oct 2, 2018

@danmosemsft The Shiproom template LGTM. Yes we agree that this race condition should be fixed in 2.2.

@danmoseley
Member

Shiproom okayed this for 2.2 but deferred it for 2.1.

@jnm2
Contributor

jnm2 commented Jun 30, 2019

@mikem8361 @danmosemsft FYI, I believe this to be the underlying bug behind a longtime problem with intermittent CI hangs in our NUnit tests. The AccessViolationException takes out an NUnit worker thread, which causes NUnit to hang while waiting for it.

I have spent roughly 17 hours digging at this so far. Here's where it gets interesting: nunit/nunit#3295 (comment)
The best repro I have had time to get so far: https://github.com/jnm2/NUnitHangRepro#how-to-use
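
For anyone wanting a feel for what such a repro exercises, below is a rough, hypothetical stress loop. It is not the contents of the NUnitHangRepro repository linked above; it only illustrates the kind of load involved: many threads formatting exception stack traces at once, which (when a portable PDB is present) goes through the symbol provider cache discussed earlier.

    using System;
    using System.Threading.Tasks;

    // Hypothetical stress loop, not the linked repro project.
    class StressRepro
    {
        static void Main()
        {
            // Many concurrent workers throw, catch, and format exceptions.
            Parallel.For(0, 64, _ =>
            {
                for (int i = 0; i < 10_000; i++)
                {
                    try
                    {
                        throw new InvalidOperationException("boom");
                    }
                    catch (Exception ex)
                    {
                        // ToString() resolves file/line info when a portable PDB is
                        // available, exercising the provider cache.
                        _ = ex.ToString();
                    }
                }
            });
        }
    }

Because the race only exists while the cache entry for an assembly is first being created, a loop like this typically has to run in a fresh process to have any chance of hitting it, which fits the intermittent nature of the hangs.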

@jnm2
Contributor

jnm2 commented Jun 30, 2019

Ah, I should have asked explicitly: would you please port this fix to .NET Framework 4.7.2 and 4.8? I'm going to do my best to harden NUnit against the bug, but even so it's not going to be a good user experience to sporadically not have stack traces in the test results.

@danmoseley
Member

@mikem8361 @tommcdon for the question about backporting.

@tommcdon
Member

tommcdon commented Jul 3, 2019

@jnm2 Thanks for taking the time to report this issue and to work on hardening NUnit against it. After careful consideration, we do not plan to action this particular item in the .NET Framework 4.7.2 and 4.8 servicing releases. If this issue is blocking you in production, there are alternative assisted support options available from Microsoft.

@jnm2
Contributor

jnm2 commented Jul 3, 2019

@tommcdon That's disappointing and I have to admit it leaves me curious, but thank you for considering. There are a number of other folks using NUnit and coming at this same .NET use-after-free bug from various directions. Is the rationale that .NET Framework 4.8 is intentionally less supported to nudge folks to .NET Core/.NET 5? That would be helpful to know as NUnit plans around this bug and as something we can pass on to folks who are affected by this.

@jkotas
Member Author

jkotas commented Jul 3, 2019

@jnm2 We continue to both service and support .NET Framework, which includes bug, reliability, and security fixes. https://devblogs.microsoft.com/dotnet/net-core-is-the-future-of-net/

This issue is both a reliability and a security bug (use-after-free), as you have pointed out. I am asking folks to reconsider.

@jnm2
Contributor

jnm2 commented Jul 16, 2019

@jkotas That's reassuring! I'm eager to hear what happens. Ideally we can document which security rollup this gets fixed in.

@jnm2
Contributor

jnm2 commented Jul 29, 2019

@jkotas Since this issue is closed and I'm afraid of it falling off the radar, would it be a good idea for me to open a new issue?

@jkotas
Member Author

jkotas commented Jul 29, 2019

.NET Framework bugs are not tracked on GitHub. They are tracked in an internal bug database. This specific bug is tracked as https://devdiv.visualstudio.com/DevDiv/_workitems/edit/672340

https://github.com/dotnet/core#getting-help has instructions for opening .NET Framework issues. If you would like something whose progress you can see and monitor from the outside, you can open a .NET Framework issue there.

@ocoanet

ocoanet commented Aug 26, 2019

@jkotas Would it be possible to get the status of the .NET Framework bug? This is a major issue for us, randomly and abruptly killing our production services.

@jkotas
Member Author

jkotas commented Aug 26, 2019

@mikem8361 is working on it. I am sorry it has been delayed due to summer vacations.

@jnm2
Contributor

jnm2 commented Aug 26, 2019

I was just debating whether I should pester you again over the weekend. Thanks @ocoanet for doing it for me.

@jkotas
Member Author

jkotas commented Sep 27, 2019

The .NET Framework fix is scheduled for the November update.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 2.2.x milestone Jan 31, 2020
@jnm2
Contributor

jnm2 commented May 9, 2020

@jkotas Just to confirm, was it actually shipped in the February update?

November's update:
https://devblogs.microsoft.com/dotnet/net-framework-november-13-2019-update-for-net-framework-4-8/

February's update:
https://devblogs.microsoft.com/dotnet/net-framework-february-2020-security-and-quality-rollup/

CLR

  • There is a race condition in the portable PDB metadata provider cache that leaked providers and caused crashes in the diagnostic StackTrace API. To fix the race, detect the case where the provider wasn't being disposed and dispose it.

@jkotas
Member Author

jkotas commented May 9, 2020

Yes. Our release team had to adjust the schedule for this.

@jnm2
Contributor

jnm2 commented May 9, 2020

No worries, just wanted to have a number in mind to spread! Thanks again!

@ghost ghost locked as resolved and limited conversation to collaborators Dec 15, 2020