Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel timeout_list repeatedly add a thread #16793

Closed
xiaoliang314 opened this issue Jun 13, 2019 · 6 comments
Closed

kernel timeout_list repeatedly add a thread #16793

xiaoliang314 opened this issue Jun 13, 2019 · 6 comments
Labels
bug The issue is a bug, or the PR is fixing a bug

Comments

@xiaoliang314
Copy link

xiaoliang314 commented Jun 13, 2019

When I debug my Bluetooth Mesh program, the kernel often crashes during communication. I traced this reason because it was caused by a thread being repeatedly added to the kernel timeout list. I am using a low speed MCU only 16M. The following is the gdb log information when an error occurs:

(gdb) c
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
assert_post_action (file=file@entry=0x103063c "/home/ubuntu/zephyr/kernel/timeout.c", line=line@entry=92)
at /home/ubuntu/zephyr/lib/os/assert.c:26
26 {
(gdb) bt
#0 assert_post_action (file=file@entry=0x103063c "/home/ubuntu/zephyr/kernel/timeout.c", line=line@entry=92)
at /home/ubuntu/zephyr/lib/os/assert.c:26
#1 0x01022c5e in z_add_timeout (to=to@entry=0x20000ae0 <adv_thread_data+24>,
fn=fn@entry=0x1021989 <z_thread_timeout>, ticks=1001) at /home/ubuntu/zephyr/kernel/timeout.c:92
#2 0x01021dc8 in z_add_thread_timeout (ticks=, th=0x20000ac8 <adv_thread_data>)
at /home/ubuntu/zephyr/kernel/include/timeout_q.h:48
#3 pend (thread=0x20000ac8 <adv_thread_data>, wait_q=wait_q@entry=0x20005500 <adv_thread_stack+880>,
timeout=timeout@entry=10000) at /home/ubuntu/zephyr/kernel/sched.c:374
#4 0x01021e1a in z_pend_curr (lock=lock@entry=0x20003f9c , key=..., key@entry=...,
wait_q=wait_q@entry=0x20005500 <adv_thread_stack+880>, timeout=timeout@entry=10000)
at /home/ubuntu/zephyr/kernel/sched.c:446
#5 0x01022534 in z_impl_k_sem_take (sem=sem@entry=0x20005500 <adv_thread_stack+880>, timeout=timeout@entry=10000)
at /home/ubuntu/zephyr/kernel/sem.c:160
#6 0x01006590 in k_sem_take (timeout=10000, sem=0x20005500 <adv_thread_stack+880>)
at /home/ubuntu/zephyr/samples/bluetooth/mesh_test_demo/build/xxxx/zephyr/include/generated/syscalls/kernel.h
#7 bt_hci_cmd_send_sync (opcode=opcode@entry=8202, buf=buf@entry=0x20008440 <net_buf_hci_cmd_pool>, rsp=rsp@entry=0x0) at /home/ubuntu/zephyr/subsys/bluetooth/host/hci_core.c:311
#8 0x01006a50 in set_advertise_enable (enable=enable@entry=false) at /home/ubuntu/zephyr/subsys/bluetooth/host/hci_core.c:374
#9 0x010076ba in bt_le_adv_stop () at /home/ubuntu/zephyr/subsys/bluetooth/host/hci_core.c:5402
#10 0x01007e7c in adv_send (buf=) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/adv.c:151
#11 adv_thread (p1=, p2=, p3=) at /home/ubuntu/zephyr/subsys/bluetooth/host/mesh/adv.c:213
#12 0x010241a6 in z_thread_entry (entry=0x1007d25 <adv_thread>, p1=, p2=, p3=) at /home/ubuntu/zephyr/lib/os/thread_entry.c:29
#13 0xaaaaaaaa in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) frame 2
#2 0x01021dc8 in z_add_thread_timeout (ticks=, th=0x20000ac8 <adv_thread_data>)
at /home/ubuntu/zephyr/kernel/include/timeout_q.h:48
48 z_add_timeout(&th->base.timeout, z_thread_timeout, ticks);
(gdb) p timeout_list
$1 = {{head = 0x20000ae0 <adv_thread_data+24>, next = 0x20000ae0 <adv_thread_data+24>}, {
tail = 0x20000b3c <beacon_timer+12>, prev = 0x20000b3c <beacon_timer+12>}}
(gdb) frame 1
#1 0x01022c5e in z_add_timeout (to=to@entry=0x20000ae0 <adv_thread_data+24>,
fn=fn@entry=0x1021989 <z_thread_timeout>, ticks=1001) at /home/ubuntu/zephyr/kernel/timeout.c:92
92 __ASSERT(!sys_dnode_is_linked(&to->node), "");
(gdb) p to
$2 = (struct _timeout *) 0x20000ae0 <adv_thread_data+24>
(gdb) p *to
$3 = {node = {{head = 0x20000b3c <beacon_timer+12>, next = 0x20000b3c <beacon_timer+12>}, {
tail = 0x20009850 <timeout_list>, prev = 0x20009850 <timeout_list>}}, dticks = 14,
fn = 0x1021989 <z_thread_timeout>}
(gdb)

zephyer version:
commit 4278bf5 (HEAD -> master, origin/master, origin/HEAD)
Author: Michael Scott [email protected]
Date: Wed Mar 13 15:04:40 2019 -0700

scripts/dts: Fix for PyYAML 5.1 Loader changes

PyYAML 5.1 was just released and it doesn't support !include as
previous versions do.  This breaks our DTS bindings parsing.

Let's fix our extract_dts_include.py script to work with both
3.13 and 5.1.

Also, update the pyyaml requirement to >=3.13 to be sure we're
compatible.

Fixes: https:/zephyrproject-rtos/zephyr/issues/14496

Signed-off-by: Michael Scott <[email protected]>
@xiaoliang314 xiaoliang314 added the bug The issue is a bug, or the PR is fixing a bug label Jun 13, 2019
@xiaoliang314
Copy link
Author

a_add_timeout attempts to add a thread to timeout_list causes the assertion to fail because the inserted thread already exists in timeout_list

@xiaoliang314
Copy link
Author

I have modified the stack of the adv thread to 1024 bytes, and the original 768 has the risk of overflow.

@aescolar
Copy link
Member

@jhedberg Are you able to help here?

@jhedberg
Copy link
Member

@jhedberg Are you able to help here?

I don't know if I can help, but this all sounds quite familiar. I think a bug (or several of them) were fixed in the kernel timeout/scheduling code recently for this kind of issues. So as the first thing I'd recommend trying with the latest master branch to see if the issue is reproducible.

@xiaoliang314
Copy link
Author

@jhedberg Thank you, It looks like this time, right?

commit 4521e0c
kernel/sched: Mark sleeping threads suspended

On SMP, there was a bug where the logic that re-adds _current to the
run queue at swap time would accidentally reschedule threads that had
just gone to sleep, because the is_thread_prevented_from_running()
predicate only tests for threads that are "suspended" or "pending" and
not sleeping.

Overload _THREAD_SUSPENDED to indicate "sleeping" also. Simple fix
for an immediate bug, though long term we really want to unify all the
blocked conditions to prevent this kind of state bug.

Signed-off-by: Andy Ross [email protected]

@xiaoliang314
Copy link
Author

The problem is solved!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The issue is a bug, or the PR is fixing a bug
Projects
None yet
Development

No branches or pull requests

3 participants