Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread sanitizer tests in the CI pipeline #2068

Merged
merged 47 commits into from
Jul 18, 2023
Merged

Thread sanitizer tests in the CI pipeline #2068

merged 47 commits into from
Jul 18, 2023

Conversation

jblueh
Copy link
Contributor

@jblueh jblueh commented Jul 3, 2023

Proposed Changes

To simplify the maintenance of hybrid parallel SU2, we want to run thread sanitizer tests as part of the CI pipeline. We use thread-sanitizer enabled containers for this. This is work in progress and there are things that need testing, so I want to take it step by step while observing the behaviour of the CI pipeline.

Related Work

su2code/Docker-Builds#17 and follow-on PRs, #1679

PR Checklist

  • I am submitting my contribution to the develop branch.
  • My contribution generates no new compiler warnings (try with --warnlevel=3 when using meson).
  • My contribution is commented and consistent with SU2 style (https://su2code.github.io/docs_v7/Style-Guide/).
  • I have added a test case that demonstrates my contribution, if necessary.
  • I have updated appropriate documentation (Tutorials, Docs Page, config_template.cpp), if necessary.

@jblueh jblueh changed the title [WIP] Thread sanitizer tests in the CI pipeline Thread sanitizer tests in the CI pipeline Jul 18, 2023
Copy link
Contributor Author

@jblueh jblueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thread sanitizer tests are now executed as part of the CI pipeline. With the tweaks explained in the code comments, the time spent on thread sanitizer builds and tests is right now approximately

job time
build BaseOMP-tsan 20min
build ReverseOMP-tsan 45min
run hybrid_regression.py with tsan 42min
run hybrid_regression_AD.py with tsan 12min

which is reasonable, I would say, given that thread sanitizer runs are expected to be slower and that some of the other build and test jobs take around 20min as well.

I will update the documentation on containers with thread sanitizer specifics when this is merged.

Comment on lines +10568 to +10576
/*--- Use a thread-sanitizer dependent loop schedule to work around suspected false positives ---*/
#ifndef __SANITIZE_THREAD__
#define CPHYSGEO_PARFOR SU2_OMP_FOR_DYN(roundUpDiv(nPoint, 2 * omp_get_max_threads()))
#else
#define CPHYSGEO_PARFOR SU2_OMP_FOR_()
#endif

#define END_CPHYSGEO_PARFOR END_SU2_OMP_FOR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a dynamic loop schedule, there were thread sanitizer findings in optimized debug builds that I suspect to be false positives. This changes the loop schedule if the thread sanitizer is used.

@@ -475,6 +478,7 @@ def main():
cosine_gust.test_iter = 79
cosine_gust.test_vals = [-2.418813, 0.004650, -0.001878, -0.000637, -0.000271]
cosine_gust.unsteady = True
cosine_gust.enabled_with_tsan = False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests can be disabled for thread sanitizer testing. I disabled all tests that took longer than 10 minutes to run.

Comment on lines +862 to +875
new_iter = self.test_iter + 1

if running_with_tsan:

# detect restart
restart_iter = 0
for line in lines:
if line.strip().split("=")[0].strip() == "RESTART_ITER":
restart_iter = int(line.strip().split("=")[1].strip())
break

if new_iter > restart_iter + 2:
new_iter = restart_iter + 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thread sanitzer tests run at most two iterations, to reduce the overall runtime cost.

try:
fromdate = time.ctime(os.stat(fromfile).st_mtime)
fromlines = open(fromfile, 'U').readlines()
if not running_with_tsan: # thread sanitizer tests only check the return code, no need to compare outputs
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test running with the thread sanitizer stops with a non-zero return code as soon as a data race is detected, the actual test output is not parsed.

Comment on lines +83 to +86
- config_set: BaseOMP-tsan
flags: '--buildtype=debugoptimized -Dwith-omp=true -Denable-mixedprec=true -Denable-tecio=false --warnlevel=3'
- config_set: ReverseOMP-tsan
flags: '--buildtype=debugoptimized -Denable-autodiff=true -Denable-normal=false -Dwith-omp=true -Denable-mixedprec=true -Denable-tecio=false --warnlevel=3'
Copy link
Contributor Author

@jblueh jblueh Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimized debug builds have reasonable runtime and stack traces that are still useful if a data race is detected. This is in line with the recommendations.

I had to remove the -werror, there is a warning about control reaching the end of the non-void function CMultizoneDriver::Monitor(). IIRC a fix for this is pending somewhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep in #2011

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -werror can be readded there 👍

Copy link
Member

@pcarruscag pcarruscag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice addition 👍 you can leave the conversations where you documented the main aspects unresolved and I will merge as admin.

@pcarruscag pcarruscag merged commit 8f5d770 into develop Jul 18, 2023
17 checks passed
@pcarruscag pcarruscag deleted the feature_tsan_ci branch July 18, 2023 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants