I'm getting error 137 in a build job running in a container. I have been debugging for about a week and cannot find the cause. I have looked in dmesg, auditctl, /var/log/messages, journalctl, and the kubelet logs, and cannot find what is killing the build inside the pod. I don't see errors anywhere indicating the kernel sent the signal. It is also not a memory leak within the build: there is plenty of memory on the worker node, and the limits are not being reached within the pod.
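One counter worth checking that doesn't always show up in the places listed above is the pod's own cgroup OOM accounting. A minimal sketch, assuming cgroup v2 (where the memory controller exposes a memory.events file inside the container; the fallback path shown for cgroup v1 is the usual location on RHEL/Rocky 8 hosts):

```shell
# Sketch: check whether this pod's cgroup has ever recorded an OOM kill.
# Assumes cgroup v2; on cgroup v1 hosts the equivalent information is in
# /sys/fs/cgroup/memory/memory.oom_control (oom_kill counter).
events_file=/sys/fs/cgroup/memory.events
if [ -r "$events_file" ]; then
    # memory.events is "key value" lines; oom_kill counts kernel OOM kills
    # charged to this cgroup.
    awk '$1 == "oom_kill" { print "oom_kill events:", $2 }' "$events_file"
else
    echo "no cgroup v2 memory.events here; check /sys/fs/cgroup/memory/memory.oom_control"
fi
```

If that counter stays at 0 while the build still dies with 137, the SIGKILL came from userspace (runtime, kubelet, CI supervisor) rather than the kernel OOM killer.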
Current setup:
Three master nodes: Kubernetes 1.27.10, RHEL 8, kernel 4.18.0-477, containerd 1.7.11
Three worker nodes: Kubernetes 1.27.10, Rocky 8, kernel 4.18.0-477, containerd 1.7.11
Both build pods and CI jobs hit the error. It doesn't happen on every build, only on specific ones. My guess is something with cgroups or the kernel, but I cannot pin it down. The pod never OOMs during the process, yet the build inside the pod still gets killed. Below is an example error from a CI job:
2024-02-29 14:13:18 - INFO - | UPD include/generated/timeconst.h
2024-02-29 14:13:18 - INFO - | UPD include/generated/bounds.h
2024-02-29 14:13:18 - INFO - | CC arch/arm64/kernel/asm-offsets.s
2024-02-29 14:13:18 - INFO - | Killed
2024-02-29 14:13:18 - INFO - | make[1]: *** [/builds/***/kernel-source/scripts/Makefile.build:121: arch/arm64/kernel/asm-offsets.s] Error 137
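For context, exit status 137 is 128 + 9: the shell reports a child terminated by a signal as 128 plus the signal number, and signal 9 is SIGKILL. A minimal sketch reproducing the same status make is seeing:

```shell
# Exit status 137 = 128 + signal 9 (SIGKILL). The inner shell kills
# itself with an uncatchable SIGKILL; the outer shell then reports
# the signal-terminated child as 128 + 9 = 137.
sh -c 'kill -KILL $$'
status=$?
echo "exit status: $status"        # 137
echo "signal: $((status - 128))"   # 9 (SIGKILL)
```

Since SIGKILL cannot be caught or logged by the dying process, the sender is always external: the kernel OOM killer (which does log to dmesg), a cgroup memory limit, systemd, the container runtime, or an explicit kill -9. The absence of any kernel log here suggests a userspace sender.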