Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid bitcast and LLVM ERROR: Broken function found, compilation aborted! when including PyCall in system image #31473

Open
tkf opened this issue Mar 25, 2019 · 12 comments
Labels
external dependencies Involves LLVM, OpenBLAS, or other linked libraries

Comments

@tkf
Copy link
Member

tkf commented Mar 25, 2019

I'm trying to make PyCall.jl compatible with PackageCompiler.jl in JuliaPy/PyCall.jl#651. This branch itself has been working. You can check it with

julia -e 'using Pkg; pkg"dev PyCall"'
cd ~/.julia/dev/PyCall/
git remote add -f tkf https:/tkf/PyCall.jl
# git checkout -b aot tkf/aot
git checkout -b aot 65eb0c04453420946c6e33d24d885f1cfa7578d0
aot/compile.jl --color=yes

However, merging JuliaPy/PyCall.jl#668

git merge --no-edit v1.91.0
aot/compile.jl --color=yes

breaks AOT-compilation with the following error (sandwiched by some logging messages and a stacktrace):

Invalid bitcast
  %56 = bitcast i64 %55 to %jl_value_t addrspace(64)*, !dbg !413904
LLVM ERROR: Broken function found, compilation aborted!

(Here is an example of the full output: https://travis-ci.org/JuliaPy/PyCall.jl/jobs/511050638#L761-L763)

Does this mean there is a function in JuliaPy/PyCall.jl#668 that invokes a bug in Julia compiler? How do I debug more?

cc: @stevengj

@stevengj stevengj added the external dependencies Involves LLVM, OpenBLAS, or other linked libraries label Mar 25, 2019
@stevengj
Copy link
Member

stevengj commented Mar 25, 2019

This seems like a bug in Julia — user code should not be able to trigger an LLVM error (unless it is doing a raw llvmcall, which doesn't seem to be the case here).

One crude way to debug is to try deleting code from PyCall until the error goes away, to boil it down to a minimal module exhibiting the failure.

@tkf
Copy link
Member Author

tkf commented Mar 25, 2019

OK. I think that's doable.

Let me also note that JuliaPy/PyCall.jl#668 seems to invoke segfault in a non-deterministic manner in LLVM function (see JuliaPy/PyCall.jl#670). Maybe it's related?

@vchuravy
Copy link
Member

You can also try a julia debug + LLVM assertion build. From the short snippet this should probably be a inttoptr instruction.

@tkf
Copy link
Member Author

tkf commented Mar 26, 2019

@vchuravy Thanks. So I build julia-debug with

$ cat Make.user
LLVM_ASSERTIONS = 1

$ make julia-debug

(Is it what you mean by "julia debug + LLVM assertion build"?)

Then executing JULIA=~/repos/watch/julia/usr/bin/julia-debug aot/compile.jl --color=yes --startup-file=no failed with julia-debug: /home/takafumi/repos/watch/julia/usr/include/llvm/Support/Casting.h:255: typename llvm::cast_retty<X, Y*>::ret_type llvm::cast(Y*) [with X = llvm::PointerType; Y = llvm::Type; typename llvm::cast_retty<X, Y*>::ret_type = llvm::PointerType*]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed. followed by

signal (6): Aborted
in expression starting at none:0
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
__assert_fail_base.cold.0 at /usr/lib/libc.so.6 (unknown line)
__assert_fail at /usr/lib/libc.so.6 (unknown line)
cast<llvm::PointerType, llvm::Type> at /home/takafumi/repos/watch/julia/usr/include/llvm/Support/Casting.h:255
getPointerAddressSpace at /home/takafumi/repos/watch/julia/usr/include/llvm/IR/DerivedTypes.h:504
emit_bitcast at /home/takafumi/repos/watch/julia/src/cgutils.cpp:466
emit_ccall at /home/takafumi/repos/watch/julia/src/ccall.cpp:1626
emit_expr at /home/takafumi/repos/watch/julia/src/codegen.cpp:3970
emit_ssaval_assign at /home/takafumi/repos/watch/julia/src/codegen.cpp:3665
emit_stmtpos at /home/takafumi/repos/watch/julia/src/codegen.cpp:3857
emit_function at /home/takafumi/repos/watch/julia/src/codegen.cpp:6343
jl_compile_linfo at /home/takafumi/repos/watch/julia/src/codegen.cpp:1142
jl_compile_hint at /home/takafumi/repos/watch/julia/src/gf.c:1978
jl_compile_specializations at /home/takafumi/repos/watch/julia/src/precompile.c:378
jl_precompile at /home/takafumi/repos/watch/julia/src/precompile.c:387
jl_write_compiler_output at /home/takafumi/repos/watch/julia/src/precompile.c:34
jl_atexit_hook at /home/takafumi/repos/watch/julia/src/init.c:228
main at /home/takafumi/repos/watch/julia/ui/repl.c:218
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
_start at /home/takafumi/repos/watch/julia/usr/bin/julia-debug (unknown line)
Allocations: 78109592 (Pool: 78092347; Big: 17245); GC: 172

Does this help locating the bug?

@tkf
Copy link
Member Author

tkf commented Mar 26, 2019

It looks like the error is due to the way jl_value_ptr is ccalled. Following change seems to fix the system image build (which was failing previously)

diff --git a/src/pytype.jl b/src/pytype.jl
index a88121f..7498d8b 100644
--- a/src/pytype.jl
+++ b/src/pytype.jl
@@ -412,9 +412,11 @@ const Py_TPFLAGS_HAVE_STACKLESS_EXTENSION = Ref(0x00000000)
 # use this to create a new jlwrap type, with init to set up custom members
 function pyjlwrap_type!(init::Function, to::PyTypeObject, name::AbstractString)
     sz = sizeof(Py_jlWrap) + sizeof(PyPtr) # must be > base type
     PyTypeObject!(to, name, sz) do t::PyTypeObject
-        t.tp_base = ccall(:jl_value_ptr, Ptr{Cvoid}, (Ref{PyTypeObject},), jlWrapType)
+        # Not using `Ref{PyTypeObject}` to avoid
+        # https:/JuliaLang/julia/issues/31473
+        t.tp_base = ccall(:jl_value_ptr, Ptr{Cvoid}, (Any,), jlWrapType)
         ccall((@pysym :Py_IncRef), Cvoid, (Any,), jlWrapType)
         init(t)
     end
 end

@stevengj
Copy link
Member

stevengj commented Mar 26, 2019

@yuyichao, it used to use (Any,) here, but you said that it should be changed to Ref{PyTypeObject} once we dropped 0.6 support (JuliaPy/PyCall.jl@d5de0c5). So it seems like Ref{PyTypeObject} should be correct there nowadays (which is why I changed it as part of the cleanup in JuliaPy/PyCall.jl#668), and there is a bug in Julia?

(We can of course use Any here for now as a workaround.)

@stevengj
Copy link
Member

Actually, maybe in this case @yuyichao's comment in the source code is wrong: since jl_value_ptr takes a jl_value_t* argument, Any was actually correct. And since the error comes from a misuse of a low-level ccall, it is not a bug in Julia that we are crashing.

I'm not sure why we aren't just calling pointer_from_objref here. Maybe that function didn't exist when this code was written?

@stevengj
Copy link
Member

stevengj commented Mar 26, 2019

I'm still worried about the best way to implement this code in PyCall. jlWrapType is a pointer to a mutable struct type. What we need is the corresponding C struct *, i.e. what would be passed to ccall for a Ref{PyTypeObject} argument. This isn't necessarily the same as the jl_value_t* pointer, although I guess that is currently the case?

It seems like it would be better for us to do Base.unsafe_convert(Ptr{Cvoid}, Ref(jlWrapType)) (though internally this calls pointer_from_objref so it would be equivalent in the current Julia).

@yuyichao
Copy link
Contributor

since jl_value_ptr takes a jl_value_t* argument, Any was actually correct.

It's also correct for all other ones, but it's not more or less correct than the Ref version.

This actually seems to be a regression from 0.6 likely when Ptr is represented by a integer.

@stevengj stevengj reopened this Mar 26, 2019
@stevengj
Copy link
Member

stevengj commented Mar 26, 2019

Reopening since @yuyichao thinks it is a probably a Julia bug.

@yuyichao
Copy link
Contributor

This isn't necessarily the same as the jl_value_t* pointer, although I guess that is currently the case?

Technically true but we have enough code assuming it after this becomes the case I won't really worry about this get silently changed back. There is still a difference between the two, namely whether your pointer need the tag, Any guarantees that whereas Ref don't. It doesn't matter in this case though.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Mar 27, 2019

This isn't necessarily the same as the jl_value_t* pointer, although I guess that is currently the case?

For a mutable object, they are defined to be the same. Annotating with Ref adds an extra indirection (through allocating a mutable object, fwiw) to ensure there is both a pointer and gc-root available (although as Yichao points out, gc-root can be a bit slippery, since the object doesn't have to contain a type tag unless you call it as type Any). Also, since this is a C function, we can do whatever we want (including crash) if we don't agree the signature is right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external dependencies Involves LLVM, OpenBLAS, or other linked libraries
Projects
None yet
Development

No branches or pull requests

5 participants