Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llvm/test/JitListener/multiple.ll fails on linux #51201

Closed
h-vetinari opened this issue Sep 14, 2021 · 20 comments
Closed

llvm/test/JitListener/multiple.ll fails on linux #51201

h-vetinari opened this issue Sep 14, 2021 · 20 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla

Comments

@h-vetinari
Copy link
Contributor

Bugzilla Link 51859
Resolution FIXED
Resolved on Oct 27, 2021 08:38
Version trunk
OS Linux
Blocks #50580 #51489
CC @AlexDenisov,@andykaylor,@bwyma,@topperc,@h-vetinari,@lhames,@RKSimon,@slacka,@phoebewang,@rotateright,@tstellar
Fixed by commit(s) e49c0c5 08e3a5c

Extended Description

While trying to build 13.0.0-rc3 for conda-forge in conda-forge/llvmdev-feedstock#131, the following error occurs under linux on x86_64:

[...]
[100%] Running the LLVM regression tests
-- Testing: 44819 tests, 2 workers --
Testing:  0.. 10.. 20.. 30.. 40..
FAIL: LLVM :: JitListener/multiple.ll (22231 of 44819)
******************** TEST 'LLVM :: JitListener/multiple.ll' FAILED ********************
Script:
--
: 'RUN: at line 2';   llvm-jitlistener $SRC_DIR/llvm/test/JitListener/multiple.ll | $SRC_DIR/llvm/build/bin/FileCheck $SRC_DIR/llvm/test/JitListener/multiple.ll
--
Exit Code: 1

Command Output (stderr):
--
$SRC_DIR/llvm/test/JitListener/multiple.ll:29:10: error: CHECK: expected string not found in input
; CHECK: Method load [1]: bar, Size = {{[0-9]+}}
         ^
<stdin>:1:1: note: scanning from here
Method load [1]: foo, Size = 11
^
<stdin>:6:1: note: possible intended match here
Method load [2]: bar, Size = 38
^

Input file: <stdin>
Check file: $SRC_DIR/llvm/test/JitListener/multiple.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: Method load [1]: foo, Size = 11
check:29'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
            2:  Line info @ 0: multiple.c, line 1
check:29'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            3:  Line info @ 9: multiple.c, line 1
check:29'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            4:  Line info @ 11: multiple.c, line 2
check:29'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            5:
check:29'0     ~
            6: Method load [2]: bar, Size = 38
check:29'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:29'1     ?                                possible intended match
            7:  Line info @ 0: multiple.c, line 5
check:29'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            8:  Line info @ 7: multiple.c, line 5
check:29'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            9:  Line info @ 9: multiple.c, line 6
check:29'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           10:  Line info @ 11: multiple.c, line 6
check:29'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           11:  Line info @ 21: multiple.c, line 9
check:29'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            .
            .
            .
>>>>>>

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
********************
Failed Tests (1):
  LLVM :: JitListener/multiple.ll


Testing Time: 1818.51s
  Unsupported      :  1595
  Passed           : 43075
  Expectedly Failed:   148
  Failed           :     1
make[3]: *** [test/CMakeFiles/check-llvm.dir/build.make:71: test/CMakeFiles/check-llvm] Error 1
make[2]: *** [CMakeFiles/Makefile2:87867: test/CMakeFiles/check-llvm.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:87874: test/CMakeFiles/check-llvm.dir/rule] Error 2
make: *** [Makefile:17557: check-llvm] Error 2
@h-vetinari
Copy link
Contributor Author

assigned to @andykaylor

@h-vetinari
Copy link
Contributor Author

The build script currently in use by this feedstock (resp. my PR) can be found here: https:/h-vetinari/llvmdev-feedstock/blob/rc/recipe/build.sh

Happy to explain how to reconstruct/reproduce this. A sample CI run can be found here: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=376252 (then check linux_64)

@h-vetinari
Copy link
Contributor Author

Hey all, I'm new to the LLVM bugzilla, and am not sure if I categorized this bug correctly, and whether the right people were informed?

Could someone provide a hint or reclassify the bug accordingly, please? :)

@slacka
Copy link
Mannequin

slacka mannequin commented Sep 25, 2021

Do you know what is necessary to reproduce this in a standard linux environment? What compiler, LDFLAGS, CXXFLAGS etc are you using? For this to get any traction, someone is going to have to be able to reproduce it.

@h-vetinari
Copy link
Contributor Author

Do you know what is necessary to reproduce this in a standard linux
environment? What compiler, LDFLAGS, CXXFLAGS etc are you using? For this to
get any traction, someone is going to have to be able to reproduce it.

Hey, thanks for the response! Happy to help move this forward in any way I can.

The compiler used was GCC 9.4.0 (current default on linux for conda-forge), while the flags were set as follows:

CFLAGS=-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/llvm-package-13.0.0.rc3 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix
CPPFLAGS=-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem $PREFIX/include
CXXFLAGS=-fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/llvm-package-13.0.0.rc3 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix
LDFLAGS=-Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,$PREFIX/lib -Wl,-rpath-link,$PREFIX/lib -L$PREFIX/lib
DEBUG_CFLAGS=-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all -fno-plt -Og -g -Wall -Wextra -fvar-tracking-assignments -ffunction-sections -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/llvm-package-13.0.0.rc3 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix
DEBUG_CPPFLAGS=-D_DEBUG -D_FORTIFY_SOURCE=2 -Og -isystem $PREFIX/include
DEBUG_CXXFLAGS=-fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all -fno-plt -Og -g -Wall -Wextra -fvar-tracking-assignments -ffunction-sections -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/llvm-package-13.0.0.rc3 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix

@lhames
Copy link
Contributor

lhames commented Sep 26, 2021

The test tries to match the following:

; CHECK: Method load [1]: bar, Size = {{[0-9]+}}
; CHECK: Line info @ {{[0-9]+}}: multiple.c, line {{[5,6,7,9]}}
; CHECK: Line info @ {{[0-9]+}}: multiple.c, line {{[5,6,7,9]}}
; CHECK: Line info @ {{[0-9]+}}: multiple.c, line {{[5,6,7,9]}}
; CHECK: Line info @ {{[0-9]+}}: multiple.c, line {{[5,6,7,9]}}

; CHECK: Method load [2]: foo, Size = {{[0-9]+}}
; CHECK: Line info @ {{[0-9]+}}: multiple.c, line {{[1,2]}}
; CHECK: Line info @ {{[0-9]+}}: multiple.c, line {{[1,2]}}

However the output from the failing test case is:

Method load [1]: foo, Size = 11
Line info @ 0: multiple.c, line 1
Line info @ 9: multiple.c, line 1
Line info @ 11: multiple.c, line 2

Method load [2]: bar, Size = 38
Line info @ 0: multiple.c, line 5
Line info @ 7: multiple.c, line 5
Line info @ 9: multiple.c, line 6
Line info @ 11: multiple.c, line 6
Line info @ 21: multiple.c, line 9

The 'foo' and 'bar' functions are in a different order.

At first glance I wouldn't expect this order to be stable. Maybe the test or the llvm-jitlistener tool itself needs to be made resilient to changes in ordering?

Andy, Brock -- Do you guys have any thoughts? Or can you point us at the right people at Intel to ask?

@h-vetinari
Copy link
Contributor Author

Just noticed that the build also uses -DLLVM_USE_INTEL_JITEVENTS=ON. Not sure if it's related, but by the sound of the last comment, it might be?

@h-vetinari
Copy link
Contributor Author

This is still present in rc4.

@lhames
Copy link
Contributor

lhames commented Sep 26, 2021

Unless we get a response from the intel folks I don't think this will be fixed in LLVM 13.

In the short term I would recommend disabling this test locally, or (if you know that you don't need intel JIT events support) build with -DLLVM_USE_INTEL_JITEVENTS=OFF.

@andykaylor
Copy link

This test won't be run at all without -DLLVM_USE_INTEL_JITEVENTS=ON. The failure appears to be benign. Everything looks like it's working correctly except for the test itself. I'm not sure what determines the order in which the functions are reported to the listener. It's possible the memory image layout is non-deterministic.

I can post a patch to make the test less sensitive to the order. If we need the order to be deterministic across different builds of LLVM, that may be more difficult.

@bwyma
Copy link
Contributor

bwyma commented Sep 27, 2021

The layout is expected to be deterministic. If the order is different but deterministic then we can probably just update the test. If the order is different and nondeterministic then there should be a software change to make it deterministic. Do you know which patch caused the order to change?

@andykaylor
Copy link

It looks like we saw failures as early as March in our downstream testing. I thought it was a difference between our code and LLVM trunk. There are some Git revisions listed in some of our proprietary code, but I can't seem to match them to anything in either our repo or LLVM.

In any event, the problem should be fixed (or at least worked around) by this: https://reviews.llvm.org/D110589

The order seems to be deterministic from run to run. My patch above might be overly general.

@lhames
Copy link
Contributor

lhames commented Sep 27, 2021

I've approved Andy's patch in https://reviews.llvm.org/D110589, but I agree with Brock -- if the order has changed but the new one is deterministic then just re-ordering the test case seems like an ideal solution.

Andy, Brock: If either of you decide that you would prefer to reorder the test case you're welcome to go ahead and commit that without further review.

@h-vetinari
Copy link
Contributor Author

Thanks for looking & this and coming up with a fix. I agree that as long things are deterministic within a release it shouldn't be a big issue (but the test should still be written to not care about the order, so we don't play this game every 6 months). :)

@andykaylor
Copy link

I also think the more flexible test is better. I'm working on identifying the commit that caused this test to start failing, because I'd like to understand the kind of change that leads to the re-ordering of the output. The image emitted to memory should be as stable and deterministic as an object file. If the breaking change was something in the JitListener interface itself, that would seem more like a matter of indifference.

I know it's common practice in LLVM to update tests when a change causes minor difference in output, but since this one is off by default and target-specific I don't expect most people to notice when they've broken it. I think that's a strong argument for making it less fragile. Even if the memory layout of the JIT'ed image changes from time to time, that's not entirely bad.

I'll also try to get a buildbot up that enables this feature so we know sooner when it's failing. I'm pretty sure we used to have one. I don't know what happened to it.

@andykaylor
Copy link

BTW, I don't think this should block the 13.0 release. Since the test failure is benign and most people won't even see it, I don't think it's critical to port the fix to the release branch. I guess that's up to Tom though.

@andykaylor
Copy link

This is the change that caused the test to start failing:

962b29d

"ELFObjectWriter: Don't sort non-local symbols"

Given that this was an intentional change to the way symbols are ordered in object files (including the MC-produced object images that MCJIT uses), I expect that's pretty stable. I'll update the test to check the new order. If it breaks again, we can consider using the more general checks.

@andykaylor
Copy link

The test should be fixed by this commit:

e49c0c5

@tstellar
Copy link
Collaborator

Merged: 08e3a5c

@tstellar
Copy link
Collaborator

mentioned in issue #51489

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

5 participants