ARM64 system calls are entered with interrupts masked #35307
I think the title is a bit misleading. I looked briefly at this issue and I don't think that the system calls are entered with interrupts masked, but sometimes the IRQs are disabled during a syscall by some other arch code (are you sure that the …
On ARM64 there is a peculiarity with the FPU support. Everything that ends …

Because the FP context is huge, we have save areas only for actual threads …

So if you don't do any floating point and you don't encounter a va_start() …

Now I'm thinking of a way to avoid this IRQ masking in the future, at least …
Ah.... OK, thanks. I think that mostly explains this, though my original diagnosis that the interrupts were wrong at syscall entry doesn't quite fit. But then it's not impossible I was just wrong.

This definitely defeats that assert I just added, though. But in this case the interrupt state doesn't represent a "lock" being broken, so as long as a z_swap() call doesn't lose FPU registers while in this state (it doesn't, right?) there's no reason to bother trying to bail. I think that's justification enough to just #if out ARM64 in that warning, and link here for the explanation.

As far as the interrupt trick, that definitely seems like something we should address as a performance bug, though. Being unable to interrupt kernel code in the general case is a pretty severe latency problem. But maybe that's an "enhancement" kind of thing and not a "bug".
Closing this since it seems from the discussion that this isn't a bug. Please open an enhancement for the performance improvement.
The proposed enhancement, and most likely the fix for the failing test mentioned here, is in PR #35410.
I just submitted #35301, which will assert on the case where a thread with interrupts locked tries to context switch. This is an error, but it has traditionally worked in Zephyr and we haven't been able to detect it. Now we can.
But it broke in one spot in the test suite on qemu_cortex_a53_smp. And the symptom turned out to be that the ztest "1cpu" setup routine (which is a system call when USERSPACE=y) was coming out of the trap handler with masked interrupts. The function then creates a highest priority thread to "hold" a CPU, which I guess wants to preempt the current thread[1] and thus tries to context switch. Which hits the new warning.
It's a little suspicious that this got hit only in that one spot out of the whole test suite[2], so there may be more complexity here than I've diagnosed. But the symptom seems really clear. At the top of the system call I can do e.g.
```c
__ASSERT(arch_irq_unlocked(arch_irq_lock()), "")
```
and watch it fail. Which is definitely 100% wrong. System calls need to operate with interrupts unmasked, otherwise the kernel becomes a giant latency trap.

[1] This might be something to look at too: on x86 and xtensa SMP, the IPI announcing that new thread to the other CPU almost always gets picked up synchronously before the current thread reaches the schedule point. It's extremely rare to find a high priority thread spawned that tries to preempt the current thread. Maybe IPIs aren't being delivered promptly?
[2] And in fact that use of 1cpu on the test case was needless anyway, so I removed it. We don't even need to work around this ARM64 issue right now.