preempt-rt: dmesg causes device hang and reboot #1165
Comments
Hey @lfdmn I didn't try that myself. I can do that and get back to you.
Sounds like a locking problem. The systemd journal code uses the
Thanks @madisongh for the insights
Thanks for the details! I will drop a message to the forum about this.
Hi, small update on this issue. Stepping through the code with gdbserver is enough to make the device hang and reboot, so it is not just related to dmesg as I originally thought. Any hints on where I could start looking for this kind of locking problem? I'm not familiar with kernel development :)
I've had some success with lock debugging in the kernel before: https://www.kernelconfig.io/config_prove_locking?q=&kernelversion=4.9&arch=arm64 However, it works by printing to dmesg, so I'm not sure how it will do if dmesg is related to the locking issue. You might want to experiment with redirecting prints to the console and see if that helps.
I have tried different things, but I've had no luck redirecting the printk output to the console. Here are some console prints from stepping with gdbserver, in case they mean something to someone.
rtmutex.c:1070 points to some mutex ownership issue:
/*
 * Slow path lock function spin_lock style: this variant is very
 * careful not to miss any non-lock wakeups.
 *
 * We store the current state under p->pi_lock in p->saved_state and
 * the try_to_wake_up() code handles this accordingly.
 */
static void noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock,
						    bool mg_off)
{
	struct task_struct *lock_owner, *self = current;
	struct rt_mutex_waiter waiter, *top_waiter;
	unsigned long flags;
	int ret;

	rt_mutex_init_waiter(&waiter, true);

	raw_spin_lock_irqsave(&lock->wait_lock, flags);

	if (__try_to_take_rt_mutex(lock, self, NULL, STEAL_LATERAL)) {
		raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
		return;
	}

	BUG_ON(rt_mutex_owner(lock) == self); // <------------------------
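For readers less familiar with this check: the BUG_ON above fires when the task entering the slow path is already the owner of the lock it is trying to take. The following is only a userspace analogy, not kernel code and not the actual failing path on the device. It is a minimal sketch that assumes a POSIX system with pthreads and uses an error-checking mutex to show the same condition being detected; the function name relock_by_owner is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Illustration only (hypothetical helper, not from the kernel):
 * lock an error-checking mutex twice from the same thread and
 * return the result of the second attempt.
 */
int relock_by_owner(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t lock;
	int second;

	pthread_mutexattr_init(&attr);
	/* Error-checking mutexes report a relock by the owner instead of hanging. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&lock, &attr);
	pthread_mutexattr_destroy(&attr);

	pthread_mutex_lock(&lock);          /* first acquisition succeeds */
	second = pthread_mutex_lock(&lock); /* owner tries to lock again */

	pthread_mutex_unlock(&lock);
	pthread_mutex_destroy(&lock);
	return second;
}
```

The second lock attempt returns EDEADLK rather than blocking forever. In the RT kernel the equivalent situation is treated as a fatal bug, which would be consistent with the device rebooting instead of just hanging.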
@lfdmn Can you try with the R32.7.4 kernel? I see NVIDIA added another patch in the printk code to work around a locking issue, maybe that will help?
Thanks @madisongh. I tried building the demo distro's demo-image-base image, and now systemd gets blocked at boot and I never get to the shell. If anyone is interested, here are the logs:
For the dmesg case, NVIDIA has made a hack to disable
I had tried it with meta-tegra and dmesg reported nothing. With the Ubuntu distro there is output; I'm not sure why that would be.
With R32.7.3 I found other locks with gdbserver (reboot) and smartctl (no reboot, but disk access is lost).
Here is the current topic on the forum: https://forums.developer.nvidia.com/t/jetpack-4-6-3-preempt-rt-patkernel-reboot-loop/257804/
Hey @lfdmn
Hi,
Reviving an old issue.
The lock in the tegra demo distro is related to nvpmodel deadlocking during boot. One workaround is to set nvpmodel to MAXN. Hot-plugging CPUs isn't stable: https://forums.developer.nvidia.com/t/r32-7-1-4-9-253-rt168-info-possible-circular-locking-dependency-detected-nvpmodel-all-q-mutex-hp-lock/221391/9
But the lock may still appear in different places depending on timing and scheduled work.
The kernel was fine up to 4.6.1. This is something new that came with the introduction of the Hynix memory type. I tried using the latest JetPack and reverting to the 4.6.1 kernel, but this is not possible: loading the GPU kernel module fails. More than the memory has changed.
I wonder if I have been flagged on the forums, as I don't get any answers anymore :) Does anyone have a way to reach NVIDIA to get their attention, or know someone who could help debug the patches?
This PR OE4T/linux-tegra-4.9#43 addresses the issue.
Describe the bug
I'm running on the dunfell branch (R32.7.3, JetPack release 4.6.3) with the preempt-rt patches applied. I disabled all the fragments except for
CONFIG_OVERLAY_FS=y
The dmesg command hangs the kernel and the device reboots.
journalctl -k works without problems.
I see the same behavior with:
To Reproduce
Steps to reproduce the behavior:
devtool modify linux-tegra
cd scripts
./rt-patch.sh apply-patches
dmesg
Does anyone see the same behavior?