Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc: Globally lock link.exe on 32-bit MSVC #33155

Closed
wants to merge 1 commit into from

Conversation

alexcrichton
Copy link
Member

Looks like link.exe can't be called in parallel when it is targeting 32-bit.
There's some weird complications with mspdbsrv.exe which can't seem to be
worked around. Instead, let's just globally lock all invocations of the linker
to ensure there's only one linker running at a time.

For more detail, see the comments in the commit itself.

Closes #33145

Looks like `link.exe` can't be called in parallel when it is targeting 32-bit.
There's some weird complications with `mspdbsrv.exe` which can't seem to be
worked around. Instead, let's just globally lock all invocations of the linker
to ensure there's only one linker running at a time.

For more detail, see the comments in the commit itself.

Closes rust-lang#33145
@rust-highfive
Copy link
Collaborator

r? @jroesch

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Copy link
Member Author

r? @brson

@rust-highfive rust-highfive assigned brson and unassigned jroesch Apr 22, 2016
@brson
Copy link
Contributor

brson commented Apr 22, 2016

@bors r+ p=2

@bors
Copy link
Contributor

bors commented Apr 22, 2016

📌 Commit 95cc51d has been approved by brson

@bors
Copy link
Contributor

bors commented Apr 22, 2016

⌛ Testing commit 95cc51d with merge 8fd45ae...

@bors
Copy link
Contributor

bors commented Apr 22, 2016

💔 Test failed - auto-win-msvc-32-opt

@alexcrichton
Copy link
Member Author

Well that answers the question of "does this fix the problem"

@retep998
Copy link
Member

So if the issue isn't parallel linker invocations, then what the heck is it?

@nagisa
Copy link
Member

nagisa commented Apr 23, 2016

So if the issue isn't parallel linker invocations, then what the heck is it?

Are we sure that link.exe only exits after ensuring mspdbsrv.exe completed its work? Since it communicates with link.exe with some sort of IPC in the first place, it sounds more likely that the answer to the question is “no”.

So perhaps then the problem is link.exe invocations in a very quick succession?

@eddyb
Copy link
Member

eddyb commented Apr 23, 2016

Can mspdbsrv.exe's status be queried somehow? Or at least wait a second after a PDB failure and try again, with a limit on the number of retries?

@alexcrichton
Copy link
Member Author

So perhaps then the problem is link.exe invocations in a very quick succession?

Yeah this is my suspicion now as well. With locking I was able to reproduce the RPC error locally with the same test script above. It was quite sporadic, though, so hard to see.

I remember that Cargo has run into bugs like this as well (weird issues with mspdbsrv.exe). Specifically after link.exe has finished it looked like sometimes mspdbsrv.exe remains alive and keeps files open (e.g. preventing deletion of them).

The next thing to try would be job objects (kill link.exe and its entire process tree when link.exe exits), but unfortunately our bots aren't on a version of Windows that supports nested job objects and the entire build is already in a job object.

That and job objects are also complicated.

@retep998
Copy link
Member

retep998 commented Apr 24, 2016

What happens if you set MSBUILD_NODE_CONNECTION_TIMEOUT to 0? Taking a shot in the dark here based off of https://ask.electric-cloud.com/questions/6390/link-fatal-error-lnk1318-unexpected-pdb-error.html

Alternatively, and this really isn't optimal, but perhaps adding a 1 second sleep after we acquire the lock but before we execute link?

@alexcrichton
Copy link
Member Author

Attempting again I cannot reproduce the error with locking added. After thousands of runs no linker invocation failed with the same PDB error (now I'm second guessing whether I could reproduce it before). I've been testing with a custom linker, however, which has the locking injected, so I'm now trying to build this branch from source and I'll run tests using that to see what happens.

I haven't tested the msbuild env var yet (as I haven't reproduced with locking), but I'd be very surprised if it worked as we're not using msbuild at all...

@alexcrichton
Copy link
Member Author

Turns out concurrency was a red herring (full explanation), so closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants