Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Quagga: Dev

[RFC PATCH] lib: fix thread_cancel_event()

 

 

Quagga dev RSS feed   Index | Next | Previous | View Threaded


jorge at dti2

May 7, 2012, 9:17 AM

Post #1 of 5 (311 views)
Permalink
[RFC PATCH] lib: fix thread_cancel_event()

From: "Jorge Boncompte [DTI2]" <jorge [at] dti2>

ospfd was crashing some times on neighbour going down. The cause was that
ospf_nsm_event() was accessing already freed memory in ospf_nbr_delete()
call from ospf_nsm_event().

What happens is that since commit b5043aab (lib: fix incorrect thread
list...) now a thread can be on the event and ready lists but
thread_cancel_event() doesn't account for that.

* thread.c: (thread_cancel_event) loop on the ready list too to cancel
pending events.

Signed-off-by: Jorge Boncompte [DTI2] <jorge [at] dti2>
---
lib/thread.c | 18 ++++++++++++++++++
1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/lib/thread.c b/lib/thread.c
index b36c43a..dd0413b 100644
--- a/lib/thread.c
+++ b/lib/thread.c
@@ -916,6 +916,24 @@ thread_cancel_event (struct thread_master *m, void *arg)
thread_add_unuse (m, t);
}
}
+
+ /* thread can be on the ready list too */
+ thread = m->ready.head;
+ while (thread)
+ {
+ struct thread *t;
+
+ t = thread;
+ thread = t->next;
+
+ if (t->arg == arg)
+ {
+ ret++;
+ thread_list_delete (&m->ready, t);
+ t->type = THREAD_UNUSED;
+ thread_add_unuse (m, t);
+ }
+ }
return ret;
}

--
1.7.8.3


_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


jorge at dti2

May 7, 2012, 11:00 AM

Post #2 of 5 (300 views)
Permalink
Re: [RFC PATCH] lib: fix thread_cancel_event() [In reply to]

El 07/05/2012 18:17, Jorge Boncompte [DTI2] escribió:
> From: "Jorge Boncompte [DTI2]" <jorge [at] dti2>
>
> ospfd was crashing some times on neighbour going down. The cause was that
> ospf_nsm_event() was accessing already freed memory in ospf_nbr_delete()
> call from ospf_nsm_event().
>
> What happens is that since commit b5043aab (lib: fix incorrect thread
> list...) now a thread can be on the event and ready lists but
> thread_cancel_event() doesn't account for that.
>
> * thread.c: (thread_cancel_event) loop on the ready list too to cancel
> pending events.

It is a RFC patch because I am not sure if other classes of threads (i am
thinking timers) may need a fix like this. Paul you are the expert in this area
I think so your comments are most than welcomed.

>
> Signed-off-by: Jorge Boncompte [DTI2] <jorge [at] dti2>
> ---
> lib/thread.c | 18 ++++++++++++++++++
> 1 files changed, 18 insertions(+), 0 deletions(-)
>
> diff --git a/lib/thread.c b/lib/thread.c
> index b36c43a..dd0413b 100644
> --- a/lib/thread.c
> +++ b/lib/thread.c
> @@ -916,6 +916,24 @@ thread_cancel_event (struct thread_master *m, void *arg)
> thread_add_unuse (m, t);
> }
> }
> +
> + /* thread can be on the ready list too */
> + thread = m->ready.head;
> + while (thread)
> + {
> + struct thread *t;
> +
> + t = thread;
> + thread = t->next;
> +
> + if (t->arg == arg)
> + {
> + ret++;
> + thread_list_delete (&m->ready, t);
> + t->type = THREAD_UNUSED;
> + thread_add_unuse (m, t);
> + }
> + }
> return ret;
> }
>


_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


paul at jakma

May 8, 2012, 8:51 AM

Post #3 of 5 (299 views)
Permalink
Re: [RFC PATCH] lib: fix thread_cancel_event() [In reply to]

On Mon, 7 May 2012, Jorge Boncompte [DTI2] wrote:

> What happens is that since commit b5043aab (lib: fix incorrect thread
> list...) now a thread can be on the event and ready lists but
> thread_cancel_event() doesn't account for that.

Hmm, how does a thread end up on lists? The source of that is what needs
fixing, very likely - rather than patching the symptom in cancel_event.

Hard to see how b5043aabb causes that to happen too.

regards,
--
Paul Jakma paul [at] jakma @pjakma Key ID: 64A2FF6A
Fortune:
Those who have had no share in the good fortunes of the mighty
Often have a share in their misfortunes.
-- Bertolt Brecht, "The Caucasian Chalk Circle"
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


jorge at dti2

May 8, 2012, 9:56 AM

Post #4 of 5 (296 views)
Permalink
Re: [RFC PATCH] lib: fix thread_cancel_event() [In reply to]

El 08/05/2012 17:51, Paul Jakma escribió:
> On Mon, 7 May 2012, Jorge Boncompte [DTI2] wrote:
>
>> What happens is that since commit b5043aab (lib: fix incorrect thread
>> list...) now a thread can be on the event and ready lists but
>> thread_cancel_event() doesn't account for that.
>
> Hmm, how does a thread end up on lists? The source of that is what needs fixing,
> very likely - rather than patching the symptom in cancel_event.

maybe I should have said "event" instead of "thread"?

What I think happened before b5043aabb, is that events were queued to the event
list, thread_process() dequeued ONE event to the ready list, it got executed
"ospf_nsm_event()"... the ospf NSM moved to NSM_Deleted state, so
ospf_nsm_event() called ospf_nbr_delete() -> ospf_nbr_free() ->
thread_cancel_event(), that cancelled the events for the same "nbr" in the
events list, and last it freed neighbour memory.

After b5043aabb... thread_process() dequeues ALL queued events from the events
to the ready list, and somehow there are several for the same neighbour, so
after the neighbour it's deleted in a first pass, as thread_cancel_event() only
cancelled the events on the events list, on the next scheduler iteration that
calls ospf_nsm_event() for the already freed neighbour, the daemon crashes.

Does this make sense? Why there are several events queued? I don't know, it was
a lossy wireless link that triggered it if that helps. Maybe a packet was
received while ospfd declared the neighbour dead?

Likewise, if I did understood how this work, that I am not saying I really do
;-), expired timers are moved to the ready list all at once now too, so could
another codepath trigger a similar problem? For example from an event that
thread_cancel()'s a timer that it's already in the ready list?

Jorge

_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


paul at jakma

May 9, 2012, 2:46 AM

Post #5 of 5 (297 views)
Permalink
Re: [RFC PATCH] lib: fix thread_cancel_event() [In reply to]

On Tue, 8 May 2012, Jorge Boncompte [DTI2] wrote:

> After b5043aabb... thread_process() dequeues ALL queued events
> from the events to the ready list, and somehow there are several for the
> same neighbour, so after the neighbour it's deleted in a first pass, as
> thread_cancel_event() only cancelled the events on the events list, on
> the next scheduler iteration that calls ospf_nsm_event() for the already
> freed neighbour, the daemon crashes.

Ah, yes. It should be deleting by arg, not just a specific thread. So the
patch is correct. Good catch.

regards,
--
Paul Jakma paul [at] jakma @pjakma Key ID: 64A2FF6A
Fortune:
I guess the Little League is even littler than we thought.
-- D. Cavett
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev

Quagga dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.