ospf6d: Stop crash in ospf6_write #13897
I'm seeing crashes in ospf6_write on the `assert(node)`. The only sequence of events that I can see that could possibly cause this is:

a) Someone has scheduled an outgoing write on ospf6->t_write and placed item(s) on the ospf6->oi_write_q.
b) A decision is made in ospf6_send_lsupdate() to send an immediate packet via an event_execute(..., ospf6_write, ...).
c) ospf6_write is called and the oi_write_q is cleaned out.
d) The t_write event is now popped, the oi_write_q is empty, and FRR asserts on the `assert(node)`. <crash>

When event_execute is called for ospf6_write, just cancel the t_write event. If ospf6_write has more data to send at the end of the function, it will reschedule itself.

I've only seen this crash one time and am unable to reliably reproduce it. But this is the only mechanism that I can see that could make this happen, given how little the oi_write_q is actually touched in code.

Signed-off-by: Donald Sharp <[email protected]>
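The sequence a) through d) can be sketched as a small standalone model. This is not FRR code: `struct model`, `model_write`, `model_schedule`, `model_send_now`, `model_run_pending`, and `simulate` are hypothetical names that only stand in for `ospf6->oi_write_q`, `ospf6->t_write`, `ospf6_write`, and the event loop. Instead of asserting, the model records whether the write handler ran against an empty queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for ospf6->oi_write_q and ospf6->t_write. */
struct model {
	int queued;           /* entries waiting on the write queue */
	bool write_pending;   /* true while a write event is scheduled */
	bool hit_empty_queue; /* set where the real code would hit assert(node) */
};

/* Models ospf6_write: drains the queue, or flags the empty-queue case. */
static void model_write(struct model *m)
{
	if (m->queued == 0) {
		m->hit_empty_queue = true;
		return;
	}
	m->queued = 0; /* this sketch sends everything in one pass */
}

/* Step a: queue output and schedule a later write. */
static void model_schedule(struct model *m)
{
	m->queued++;
	m->write_pending = true;
}

/* Steps b and c: immediate send; optionally cancel the pending event first. */
static void model_send_now(struct model *m, bool cancel_pending)
{
	if (cancel_pending)
		m->write_pending = false;
	model_write(m);
}

/* Step d: the event loop pops the scheduled write, if one is still pending. */
static void model_run_pending(struct model *m)
{
	if (!m->write_pending)
		return;
	m->write_pending = false;
	model_write(m);
}

/* Returns true if the write handler ever ran against an empty queue. */
bool simulate(bool cancel_pending)
{
	struct model m = {0};

	model_schedule(&m);
	model_send_now(&m, cancel_pending);
	model_run_pending(&m);

	/* In both variants the queued data has been sent by this point. */
	assert(m.queued == 0);
	return m.hit_empty_queue;
}
```

In this model, `simulate(false)` returns true (the stale event fires on an empty queue), while `simulate(true)` returns false because the pending event was cancelled before the immediate write.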
Continuous Integration Result: FAILED

Test incomplete. See below for issues. This is a comment from an automated CI system.

Get source / Pull Request: Successful
Building Stage: Successful
Basic Tests: Incomplete
AddressSanitizer topotests part 4: Incomplete (check logs for details)
Successful on other platforms/tests
ci:rerun
CI system looks like it lost its mind
Continuous Integration Result: FAILED

See below for issues. This is a comment from an automated CI system.

Get source / Pull Request: Successful
Building Stage: Successful
Basic Tests: Failed

Topotests Ubuntu 18.04 i386 part 3: Failed
Topotests Ubuntu 18.04 i386 part 3: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12695/artifact/TOPO3U18I386/TopotestDetails/
Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO3U18I386-12695/test
Topology Tests failed for Topotests Ubuntu 18.04 i386 part 3

Successful on other platforms/tests
ci:rerun
@Mergifyio backport dev/9.0 stable/8.5 stable/8.4
✅ Backports have been created
Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12701/

This is a comment from an automated CI system.
ospf6d: Stop crash in ospf6_write (backport #13897)