A crash was observed in 8.4.4 zebra when FRR processes were being shutting down by
automated script:
Thread 1 (LWP 16051):
#0 0x00007f63f449bb8f in raise () from /lib64/libpthread.so.0
FRRouting#1 0x00007f63f5c68300 in core_handler (signo=11, siginfo=0x7ffec6322cb0, context=<optimized out>) at lib/sigevent.c:261
FRRouting#2 <signal handler called>
FRRouting#3 zebra_router_get_table (zvrf=zvrf@entry=0x0, tableid=tableid@entry=254, afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST) at /usr/include/bits/string_fortified.h:71
FRRouting#4 0x0000560d74192e6d in zebra_vrf_get_table_with_table_id (afi=AFI_IP, safi=SAFI_UNICAST, vrf_id=<optimized out>, table_id=254) at zebra/zebra_vrf.c:335
FRRouting#5 0x0000560d74186d20 in process_subq_early_route_add (ere=<optimized out>) at zebra/zebra_rib.c:2649
FRRouting#6 process_subq_early_route (lnode=0x560d7783c5f0) at zebra/zebra_rib.c:3127
FRRouting#7 process_subq (qindex=META_QUEUE_EARLY_ROUTE, subq=0x560d75cb1e40) at zebra/zebra_rib.c:3150
FRRouting#8 meta_queue_process (dummy=<optimized out>, data=0x560d75cba680) at zebra/zebra_rib.c:3202
FRRouting#9 0x00007f63f5c84550 in work_queue_run (thread=0x7ffec63233d0) at lib/workqueue.c:285
FRRouting#10 0x00007f63f5c7a4c1 in thread_call (thread=thread@entry=0x7ffec63233d0) at lib/thread.c:2008
FRRouting#11 0x00007f63f5c32088 in frr_run (master=0x560d75acbf50) at lib/libfrr.c:1216
FRRouting#12 0x0000560d7411c8f7 in main (argc=<optimized out>, argv=0x7ffec63237a8) at zebra/main.c:499
Below is analysis for the sequence of events which led to zebra crash:
- configs including VRF configs were deleted in zebra
- Some route messages for the deleted VRF were still in the zebra metaq waiting
to be processed
- when the route message was dequeued for processing, the VRF was already deleted
- lookup of zvrf failed for the route vrf_id in route-entry, but the NULL return
was not checked, resulted in SIGSEGV crash when it was dereferenced later
Signed-off-by: Jenny Yuan <[email protected]>