jorge at dti2
May 8, 2012, 9:56 AM
Post #4 of 5
Re: [RFC PATCH] lib: fix thread_cancel_event()
On 08/05/2012 17:51, Paul Jakma wrote:
> On Mon, 7 May 2012, Jorge Boncompte [DTI2] wrote:
>> What happens is that since commit b5043aab (lib: fix incorrect thread
>> list...) now a thread can be on the event and ready lists but
>> thread_cancel_event() doesn't account for that.
> Hmm, how does a thread end up on lists? The source of that is what needs fixing,
> very likely - rather than patching the symptom in cancel_event.
Maybe I should have said "event" instead of "thread"?
What I think happened before b5043aabb is that events were queued to the event
list, thread_process() dequeued ONE event to the ready list, and it got
executed: ospf_nsm_event(). The OSPF NSM moved to the NSM_Deleted state, so
ospf_nsm_event() called ospf_nbr_delete() -> ospf_nbr_free() ->
thread_cancel_event(), which cancelled the events for the same "nbr" in the
event list, and finally it freed the neighbour's memory.
After b5043aabb, thread_process() dequeues ALL queued events from the event
list to the ready list, and somehow there are several for the same neighbour.
So after the neighbour is deleted in a first pass, since thread_cancel_event()
only cancelled the events on the event list, the next scheduler iteration
calls ospf_nsm_event() for the already freed neighbour and the daemon crashes.
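
To make the scenario above concrete, here is a minimal toy model in C. It is
not Quagga's code: the types and functions (struct ev, struct sched,
sched_cancel_event(), stale_after_delete()) are hypothetical names made up for
this sketch, and the scan_ready flag only exists to compare "cancel on the
event list only" against "cancel on both lists". It assumes, as described
above, that all queued events are moved to the ready list before any of them
runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: hypothetical types, not Quagga's struct thread. */
struct ev {
    struct ev *next;
    void *arg;              /* object the event refers to, e.g. a neighbour */
};

struct ev_list {
    struct ev *head;
    struct ev *tail;
    int count;
};

struct sched {
    struct ev_list event;   /* events queued but not yet scheduled */
    struct ev_list ready;   /* events about to be executed */
};

static void list_push(struct ev_list *l, struct ev *e)
{
    e->next = NULL;
    if (l->tail)
        l->tail->next = e;
    else
        l->head = e;
    l->tail = e;
    l->count++;
}

static struct ev *list_pop(struct ev_list *l)
{
    struct ev *e = l->head;
    if (!e)
        return NULL;
    l->head = e->next;
    if (!l->head)
        l->tail = NULL;
    e->next = NULL;
    l->count--;
    return e;
}

/* Unlink and free every entry whose arg matches; return how many went. */
static int list_cancel_arg(struct ev_list *l, void *arg)
{
    struct ev *prev = NULL, *e = l->head;
    int removed = 0;
    while (e) {
        struct ev *next = e->next;
        if (e->arg == arg) {
            if (prev)
                prev->next = next;
            else
                l->head = next;
            if (l->tail == e)
                l->tail = prev;
            l->count--;
            free(e);
            removed++;
        } else {
            prev = e;
        }
        e = next;
    }
    return removed;
}

static void sched_add_event(struct sched *s, void *arg)
{
    struct ev *e = calloc(1, sizeof *e);
    if (!e)
        abort();
    e->arg = arg;
    list_push(&s->event, e);
}

/* scan_ready == 0 models a cancel that only looks at the event list. */
static int sched_cancel_event(struct sched *s, void *arg, int scan_ready)
{
    int n = list_cancel_arg(&s->event, arg);
    if (scan_ready)
        n += list_cancel_arg(&s->ready, arg);
    return n;
}

/* Queue n_events for one object, move ALL of them to the ready list, run
 * the first one (whose handler "deletes" the object and cancels its
 * events), and return how many stale events are still on the ready list. */
int stale_after_delete(int n_events, int scan_ready)
{
    struct sched s = {0};
    struct ev *e;
    int nbr = 0;            /* stands in for the neighbour */
    int i, stale;

    for (i = 0; i < n_events; i++)
        sched_add_event(&s, &nbr);
    while ((e = list_pop(&s.event)) != NULL)
        list_push(&s.ready, e);

    e = list_pop(&s.ready); /* the event being executed */
    free(e);
    sched_cancel_event(&s, &nbr, scan_ready);

    stale = s.ready.count;  /* each of these would hit freed memory */
    while ((e = list_pop(&s.ready)) != NULL)
        free(e);
    return stale;
}
```

In this model, stale_after_delete(3, 0) leaves two events for the deleted
object on the ready list, while stale_after_delete(3, 1) leaves none. Whether
scanning both lists is the right fix, rather than preventing the duplicate
events in the first place, is the question raised in the quoted reply.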
Does this make sense? Why are there several events queued? I don't know; it
was a lossy wireless link that triggered it, if that helps. Maybe a packet was
received while ospfd declared the neighbour dead?
Likewise, if I understood how this works (and I am not saying I really do
;-)), expired timers are now moved to the ready list all at once too, so could
another code path trigger a similar problem? For example, an event that
thread_cancel()'s a timer that is already in the ready list?