PyTorch with debug flag fails with a segmentation fault on latest Gramine and GSC master #87

Open
anjalirai-intel opened this issue Oct 4, 2024 · 0 comments · May be fixed by gramineproject/gramine#2017

The Gramine commit [PAL/{Linux,Linux-SGX}] that added trace logs for raw syscalls introduced a bug. As a result, the PyTorch workload built with the debug flag fails with a segmentation fault when using the GSC master branch.

This issue occurs with the latest versions of Gramine and GSC. To reproduce it, you need to modify the curation_script.sh because, by default, the contrib repository uses the tagged versions of GSC and Gramine.

Steps to reproduce:

  1. Generate an OpenSSL key: openssl genrsa -3 -out enclave-key.pem 3072
  2. Clone the contrib repository: git clone https:/gramineproject/contrib.git
  3. Apply the changes shown below to Intel-Confidential-Compute-for-X/util/curation_script.sh
  4. Run base_image_helper.sh
  5. Execute: python3 curate.py pytorch pytorch-encrypted
  6. Provide the signing key path generated in Step 1 and wait for the build to finish
  7. Run the commands from commands.txt

Curation Script Updates:

diff --git a/Intel-Confidential-Compute-for-X/util/curation_script.sh b/Intel-Confidential-Compute-for-X/util/curation_script.sh
index 039813c..81fc3ce 100755
--- a/Intel-Confidential-Compute-for-X/util/curation_script.sh
+++ b/Intel-Confidential-Compute-for-X/util/curation_script.sh
@@ -126,8 +126,9 @@ create_gsc_image () {
     rm -rf gsc >/dev/null 2>&1
     git clone https:/gramineproject/gsc.git
     cd gsc
-    git checkout $(git tag --list 'v*.*' --sort=taggerdate | tail -1)
+    git checkout master
     cp -f config.yaml.template config.yaml
+    sed -i "s/Branch.*master.*\|Branch.*v1.7.*/Branch: 'b6a2d79b641aed7a52220246ad238d241a6fc995'/" config.yaml
     sed -i 's|ubuntu:.*|'$distro'"|' config.yaml
 
     ./gsc build $cmdline_flag --buildtype $1 $app_image_x  $WORKLOAD_DIR/$app_image_manifest
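The sed line added by the patch appears to pin the Gramine version that GSC builds: it rewrites any `Branch` entry naming master or v1.7 in config.yaml to commit b6a2d79b641aed7a52220246ad238d241a6fc995. The sketch below runs that exact substitution on a small stand-in file. The file contents are a hypothetical approximation of the Gramine section of GSC's config.yaml, not copied from the template, and the sketch assumes GNU sed (as on Ubuntu), since `\|` alternation in a basic regex and bare `-i` are GNU extensions.

```shell
#!/usr/bin/env bash
# Sketch: show what the patch's sed line does to a Gramine "Branch" entry.
# Assumes GNU sed; the config contents below are a hypothetical stand-in.
set -euo pipefail

cfg=$(mktemp)
trap 'rm -f "$cfg"' EXIT

# Hypothetical approximation of the Gramine section of GSC's config.yaml.
cat > "$cfg" <<'EOF'
Gramine:
    Repository: "https://github.com/gramineproject/gramine.git"
    Branch:     "master"
EOF

# Same substitution as in the patch: a Branch line naming master or v1.7
# is replaced with a Branch line pinned to a specific Gramine commit.
sed -i "s/Branch.*master.*\|Branch.*v1.7.*/Branch: 'b6a2d79b641aed7a52220246ad238d241a6fc995'/" "$cfg"

# Print the rewritten line; the Repository line is left untouched.
patched=$(grep 'Branch' "$cfg")
echo "$patched"
```

Because the match starts at the word `Branch`, the leading indentation survives and the YAML nesting stays intact; everything after `Branch` on that line is replaced.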

By following these steps, you should be able to reproduce the segmentation fault with this Gramine commit and the GSC master branch.

Error:

(pal_exception.c:237:handle_ud) trace: Emulating raw syscall instruction with number 202 at address libgomp.so.1+0x1fccc (addr = 0x2cbd3ccc)
(pal_exception.c:237:handle_ud) trace: Emulating raw syscall instruction with number 202 at address libgomp.so.1+0x1fccc (addr = 0x2cbd3ccc)
(pal_exception.c:237:handle_ud) trace: Emulating raw syscall instruction with number 202 at address libgomp.so.1+0x1fccc (addr = 0x2cbd3ccc)
(host_exception.c:141:handle_sync_signal) error: Unexpected segmentation fault (SIGSEGV) occurred inside untrusted PAL (libc-2.31.so+0x9a1fe (addr = 0x7f3b1476a1fe))
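On x86-64, syscall number 202 is futex. libgomp (GCC's OpenMP runtime) appears to issue futex calls through inline `syscall` instructions rather than through glibc, which would explain why Gramine traps and emulates them here. To see which raw syscalls dominate a trace, a pipeline like the following can help; it is a minimal sketch that runs on the excerpt above, and the helper name `print_excerpt` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count emulated raw syscalls per syscall number in a Gramine trace.
set -euo pipefail

# Hypothetical helper: prints the log excerpt from this issue.
print_excerpt() {
  cat <<'EOF'
(pal_exception.c:237:handle_ud) trace: Emulating raw syscall instruction with number 202 at address libgomp.so.1+0x1fccc (addr = 0x2cbd3ccc)
(pal_exception.c:237:handle_ud) trace: Emulating raw syscall instruction with number 202 at address libgomp.so.1+0x1fccc (addr = 0x2cbd3ccc)
(pal_exception.c:237:handle_ud) trace: Emulating raw syscall instruction with number 202 at address libgomp.so.1+0x1fccc (addr = 0x2cbd3ccc)
(host_exception.c:141:handle_sync_signal) error: Unexpected segmentation fault (SIGSEGV) occurred inside untrusted PAL (libc-2.31.so+0x9a1fe (addr = 0x7f3b1476a1fe))
EOF
}

# Extract the syscall number from each "Emulating raw syscall" trace line,
# count occurrences, and print "<number> <count>" pairs, most frequent first.
summary=$(print_excerpt \
  | grep -o 'raw syscall instruction with number [0-9]*' \
  | awk '{ print $NF }' \
  | sort | uniq -c | sort -rn \
  | awk '{ print $2, $1 }')
echo "$summary"
```

Replacing the body of `print_excerpt` with `cat test_pytorch_default_with_debug_console.log` would give the same summary for the full attached log.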

This issue is consistently reproducible on an Azure system and also appears intermittently with other workloads on different hosts. The debug logs are attached:

test_pytorch_default_with_debug_console.log
