-
Notifications
You must be signed in to change notification settings - Fork 17.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
plugin: SIGSEGV or unknown caller pc on ppc64le with -buildmode pie #43228
Comments
This definitely could be a similar problem to what happened before. If r2 is not restored correctly then you would see this type of failure. Do you know if there is a Go version where it works with -buildmode=pie? |
I'm afraid I've only tried 1.13.12 and 1.15.6 here so far. I can try more if there might be other useful versions for troubleshooting, with the caveat that the code requires at least 1.13 |
I think I am able to reproduce this problem with a test where the main program is built with pie and it invokes a plugin. At some point after it returns from the plugin the value of r2 is incorrect which can lead to various errors. I am still trying to debug it to find where it is happening. This failure won't occur if the main program is built with -buildmode=default, because in that case the main program doesn't depend on the r2 value. |
In your example backtraces, are all 3 the same if you run with Go 1.13.12? |
I should be able to check tomorrow and will post here. I believe they are the same SIGSEGV or unknown caller pc - but I want to check. |
@cherrymui @aclements I need some input here. In trying to reproduce the problem in this issue I found a bug in the case where a main program is built with -buildmode=pie and then invokes a function in a plugin. Seems likely to be this issue but haven't verified that yet. I reproduced this by modifying the test in misc/cgo/testplugin so that when building TestIssue25756 it passes in -buildmode=pie. If I do that, I get a segv during the run and after debugging it found that r2 is incorrect after calling crosscall2 in runtime/cgo/asm_ppc64x.s. When compiled with -buildmode=pie, crosscall2 has this generated code in the main program:
But without (-buildmode=default):
So that means when compiled with pie, the r2 that is stored on the stack in crosscall2 is the value computed for the TOC in the current (main) program and that saved value is used by the caller after returning, which is incorrect and will result in a segv or other odd behavior. When compiled for buildmode=default, the value of r2 that is stored in crosscall2 is the r2 value from the caller, which is what it should be. Not sure what the best way to fix it is. I was also surprised to see that there is a crosscall2 in the main program and one in the plugin object. When calling crosscall2 from the plugin, the one in the main program is called. If the one in the plugin was called instead, that would work because it would generate the correct r2 value before storing it. |
Yeah, the combination of PIE+plugin is probably never tested and probably doesn't work (on some platforms). So, you mean crosscall2 should store R2 on stack before computing it from R12? I agree the code doesn't seem to work with assembler-added addis+addi instructions. I guess the part I don't understand is
Why does it store things on the caller's frame? (whereas it doesn't seem to do that on other platforms.) The caller doesn't do that? |
I think so. It should be the R2 that the caller was using before this call.
I don't know that either. I think it has been doing it that way since the beginning and maybe this was done as a hack to make things work and was just never changed. I believe the caller should be saving R2 at that location before the call. I tried removing the line that saved R2 and that caused some testcases to fail, some of the cgo crash and callback tests, which I will investigate further. |
I don't think crosscall2 should not be saving R2 on the caller's stack. if the caller needs R2 and is calling from code that has a different TOC, the caller should be saving and restoring it. However, if the caller of crosscall2 has the same TOC, so doesn't save and restore R2, but R2 gets clobbered by the call to cgocallback in crosscall2 then it needs to be saved and restored around that call, not on the caller's stack but its own. Still investigating this to see if this is correct. |
Yeah, I think it makes sense that crosscall2 preserves R2, just not on the caller's frame. This might work: if the caller needs to save/restore R2, it does it by itself and we don't clobber the slot on the caller's frame. If the caller doesn't need to save R2, it will jump past the inserted two instructions (addis+addi), and R2 will be preserved. |
Does this still fail on Go 1.17? |
On run 1 - SIGILL info
run 2 - SIGSEGV info
|
I've looked at this a bit more.
Interestingly, in the plugin case, crosscall2 in the main program is called with a PLT even though there is a copy of crosscall2 in the plugin binary. If it could call crosscall2 in the plugin that would work because they would have the same TOC and R2 value. It is only when crosscall2 is called in a different binary that there is an issue. I haven't looked at this cgo code generation much before so I don't know how to make 2) happen. |
Maybe we can make crosscall2 not dynamically exported (i.e. hidden visibility) so it will always use the local copy. I still think crosscall2 should never write to the caller's frame (as I commented above). |
Ah I think the right solution is to create a stack frame in crosscall2 and save R2 there, along with the other values it saves now in the caller's frame. If it is called through a PLT then the caller will have saved its own R2 on its stack and will restore it after the call. If it is not called through a PLT then crosscall2 will save and restore it from its own stack frame. I think that covers the cases. I will give it a try. |
Change https://golang.org/cl/319489 mentions this issue: |
@dtrudg Can you test out this CL to see if it fixes your issue? |
Thanks @laboger - I should have time today, or tomorrow morning at the latest. |
This does fix the issue. Using the patched |
@dtrudg I made an update to the CL, could you try again to verify it still fixes the problem before I submit it? Thanks. |
Change https://golang.org/cl/321349 mentions this issue: |
This test is known to be broken on the darwin/arm64 builder. Skip it while it's being investigated so it doesn't mask other failures. For #46239. Updates #43228. Change-Id: I8fe57a0636bba84c3100337146dcb96cc264e524 Reviewed-on: https://go-review.googlesource.com/c/go/+/321349 Trust: Dmitri Shuralyov <[email protected]> Run-TryBot: Dmitri Shuralyov <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Cherry Mui <[email protected]>
What version of Go are you using (
go version
)?(also seen on now unsupported
go version go1.13.12 linux/ppc64le
)Does this issue reproduce with the latest release?
Yes, using
1.15.6
as above.What operating system and processor architecture are you using (
go env
)?go env
OutputDistro Information
What did you do?
Compiled and used a go plugin, where the main executable is compiled using
-buildmode=pie
onppc64le
.Plugin code is, unfortunately, closed source. Main executable is built from https:/hpcng/singularity/releases/tag/v3.7.0 - I will try to get a simpler / open-source reproducer when time permits.
What did you expect to see?
No errors. The same configuration works without issue on
amd64
andarm64
.On
pp64le
using-buildmode=default
there is no error. Only-buildmode=pie
causes issues.Also, we see no issues with
-buildmode=pie
onppc64le
when we are not using plugins.What did you see instead?
SIGSEGV
orunknown caller pc
are seen in consecutive runs, with failures occuring at slightly different points in execution (of code within the plugin). The traces/errors seem similar, superficially, to those in #30283 (from containerd/containerd#3005). Those issues occurred on ppc64le with-buildmode=pie
(but not plugins afaik) and were fixed in go 1.12.3 backtraces from consecutive runs:
Backtrace 1
Backtrace 2
Backtrace 3
The text was updated successfully, but these errors were encountered: