Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Quagga: Dev

[PATCH RFC] OSPF vertices memory exhaustion

 

 

First page Previous page 1 2 Next page Last page  View All Quagga dev RSS feed   Index | Next | Previous | View Threaded


equinox at opensourcerouting

Jun 19, 2012, 6:28 AM

Post #1 of 40 (809 views)
Permalink
[PATCH RFC] OSPF vertices memory exhaustion

This was found in scale testing at OSR. The fix is a very symptom-oriented
one, it seems that the same vertex simply gets added to the SPF tree a zillion
times, exhausting available memory. This may or may not be an indication of a
more severe bug somewhere else, review and/or comments very welcome...

Code really explains it best, so here:

[this patch is not production-quality]

diff --git a/ospfd/ospf_spf.c b/ospfd/ospf_spf.c
index d7f9ba2..39c6026 100644
--- a/ospfd/ospf_spf.c
+++ b/ospfd/ospf_spf.c
@@ -423,6 +423,7 @@ ospf_spf_add_parent (struct vertex *v, struct vertex *w,
unsigned int distance)
{
struct vertex_parent *vp;
+ int backlink;

/* we must have a newhop, and a distance */
assert (v && w && newhop);
@@ -455,9 +456,26 @@ ospf_spf_add_parent (struct vertex *v, struct vertex *w,
ospf_spf_flush_parents (w);
w->distance = distance;
}
-
+
+ backlink = ospf_lsa_has_link (w->lsa, v->lsa);
+
+ if (distance == w->distance)
+ {
+ struct listnode *node, *nnode;
+ for (ALL_LIST_ELEMENTS (w->parents, node, nnode, vp))
+ if (vp->parent == v && vp->nexthop == newhop && vp->backlink == backlink)
+ {
+ if (IS_DEBUG_OSPF_EVENT)
+ zlog_debug ("%s: duplicate nexthop", __func__);
+ return;
+ }
+
+ if (IS_DEBUG_OSPF_EVENT)
+ zlog_debug ("%s: new equal-cost %d nexthop", __func__, distance);
+ }
+
/* new parent is <= existing parents, add it to parent list */
- vp = vertex_parent_new (v, ospf_lsa_has_link (w->lsa, v->lsa), newhop);
+ vp = vertex_parent_new (v, backlink, newhop);
listnode_add (w->parents, vp);

return;
--
1.7.3.4

_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


joakim.tjernlund at transmode

Jun 19, 2012, 7:22 AM

Post #2 of 40 (778 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

On vacation now but you could try fixing this code:

* http://blogs.sun.com/paulj/entry/the_difference_a_line_makes
*/
if (added)
return added;

It is wrong which I have pointed out earlier but Paul claimed it does not matter.
Try moving it to here:
if (vp->parent == area->spf) /* connects to root? */
{
/* 16.1.1 para 5. ...the parent vertex is a network that
* directly connects the calculating router to the destination
* router. The list of next hops is then determined by
* examining the destination's router-LSA...
*/

assert(w->type == OSPF_VERTEX_ROUTER);
while ((l = ospf_get_next_link (w, v, l)))
{
/* ...For each link in the router-LSA that points back to the
* parent network, the link's Link Data field provides the IP
* address of a next hop router. The outgoing interface to
* use can then be derived from the next hop IP address (or
* it can be inherited from the parent network).
*/
nh = vertex_nexthop_new ();
nh->oi = vp->nexthop->oi;
nh->router = l->link_data;
added = 1;
ospf_spf_add_parent (v, w, nh, distance);
}
return added; <---- here
}
}

-----quagga-dev-bounces [at] lists wrote: -----

> To: quagga-dev [at] lists
> From: David Lamparter
> Sent by: quagga-dev-bounces [at] lists
> Date: 19/06/2012 15:44
> Cc: David Lamparter <equinox [at] opensourcerouting>
> Subject: [quagga-dev 9433] [PATCH RFC] OSPF vertices memory exhaustion
>
>
> This was found in scale testing at OSR. The fix is a very symptom-oriented
> one, it seems that the same vertex simply gets added to the SPF tree a zillion
> times, exhausting available memory. This may or may not be an indication of a
> more severe bug somewhere else, review and/or comments very welcome...
>
> Code really explains it best, so here:
>
> [this patch is not production-quality]
>
> diff --git a/ospfd/ospf_spf.c b/ospfd/ospf_spf.c
> index d7f9ba2..39c6026 100644
> --- a/ospfd/ospf_spf.c
> +++ b/ospfd/ospf_spf.c
> @@ -423,6 +423,7 @@ ospf_spf_add_parent (struct vertex *v, struct vertex *w,
> unsigned int distance)
> {
> struct vertex_parent *vp;
> + int backlink;
>
> /* we must have a newhop, and a distance */
> assert (v && w && newhop);
> @@ -455,9 +456,26 @@ ospf_spf_add_parent (struct vertex *v, struct vertex *w,
> ospf_spf_flush_parents (w);
> w->distance = distance;
> }
> -
> +
> + backlink = ospf_lsa_has_link (w->lsa, v->lsa);
> +
> + if (distance == w->distance)
> + {
> + struct listnode *node, *nnode;
> + for (ALL_LIST_ELEMENTS (w->parents, node, nnode, vp))
> + if (vp->parent == v && vp->nexthop == newhop && vp->backlink == backlink)
> + {
> + if (IS_DEBUG_OSPF_EVENT)
> + zlog_debug ("%s: duplicate nexthop", __func__);
> + return;
> + }
> +
> + if (IS_DEBUG_OSPF_EVENT)
> + zlog_debug ("%s: new equal-cost %d nexthop", __func__, distance);
> + }
> +
> /* new parent is <= existing parents, add it to parent list */
> - vp = vertex_parent_new (v, ospf_lsa_has_link (w->lsa, v->lsa), newhop);
> + vp = vertex_parent_new (v, backlink, newhop);
> listnode_add (w->parents, vp);
>
> return;
> --
> 1.7.3.4
>
> _______________________________________________
> Quagga-dev mailing list
> Quagga-dev [at] lists
> http://lists.quagga.net/mailman/listinfo/quagga-dev
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


equinox at diac24

Jun 19, 2012, 7:41 AM

Post #3 of 40 (775 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

On Tue, Jun 19, 2012 at 04:22:51PM +0200, Joakim Tjernlund wrote:
> On vacation now but you could try fixing this code:
>
> * http://blogs.sun.com/paulj/entry/the_difference_a_line_makes
> */
> if (added)
> return added;
>
> It is wrong which I have pointed out earlier but Paul claimed it does not matter.

Another argument for me to get this entire codepart into my brain RAM
and do a full review... I have to work down a stack of other stuff
first though (e.g. the unnumbered stuff) :S

Have a nice vacation!

-David

> Try moving it to here:
> if (vp->parent == area->spf) /* connects to root? */
> {
> /* 16.1.1 para 5. ...the parent vertex is a network that
> * directly connects the calculating router to the destination
> * router. The list of next hops is then determined by
> * examining the destination's router-LSA...
> */
>
> assert(w->type == OSPF_VERTEX_ROUTER);
> while ((l = ospf_get_next_link (w, v, l)))
> {
> /* ...For each link in the router-LSA that points back to the
> * parent network, the link's Link Data field provides the IP
> * address of a next hop router. The outgoing interface to
> * use can then be derived from the next hop IP address (or
> * it can be inherited from the parent network).
> */
> nh = vertex_nexthop_new ();
> nh->oi = vp->nexthop->oi;
> nh->router = l->link_data;
> added = 1;
> ospf_spf_add_parent (v, w, nh, distance);
> }
> return added; <---- here
> }
> }
>
> -----quagga-dev-bounces [at] lists wrote: -----
>
> > To: quagga-dev [at] lists
> > From: David Lamparter
> > Sent by: quagga-dev-bounces [at] lists
> > Date: 19/06/2012 15:44
> > Cc: David Lamparter <equinox [at] opensourcerouting>
> > Subject: [quagga-dev 9433] [PATCH RFC] OSPF vertices memory exhaustion
> >
> >
> > This was found in scale testing at OSR. The fix is a very symptom-oriented
> > one, it seems that the same vertex simply gets added to the SPF tree a zillion
> > times, exhausting available memory. This may or may not be an indication of a
> > more severe bug somewhere else, review and/or comments very welcome...
> >
> > Code really explains it best, so here:
> >
> > [this patch is not production-quality]
> >
> > diff --git a/ospfd/ospf_spf.c b/ospfd/ospf_spf.c
> > index d7f9ba2..39c6026 100644
> > --- a/ospfd/ospf_spf.c
> > +++ b/ospfd/ospf_spf.c
> > @@ -423,6 +423,7 @@ ospf_spf_add_parent (struct vertex *v, struct vertex *w,
> > unsigned int distance)
> > {
> > struct vertex_parent *vp;
> > + int backlink;
> >
> > /* we must have a newhop, and a distance */
> > assert (v && w && newhop);
> > @@ -455,9 +456,26 @@ ospf_spf_add_parent (struct vertex *v, struct vertex *w,
> > ospf_spf_flush_parents (w);
> > w->distance = distance;
> > }
> > -
> > +
> > + backlink = ospf_lsa_has_link (w->lsa, v->lsa);
> > +
> > + if (distance == w->distance)
> > + {
> > + struct listnode *node, *nnode;
> > + for (ALL_LIST_ELEMENTS (w->parents, node, nnode, vp))
> > + if (vp->parent == v && vp->nexthop == newhop && vp->backlink == backlink)
> > + {
> > + if (IS_DEBUG_OSPF_EVENT)
> > + zlog_debug ("%s: duplicate nexthop", __func__);
> > + return;
> > + }
> > +
> > + if (IS_DEBUG_OSPF_EVENT)
> > + zlog_debug ("%s: new equal-cost %d nexthop", __func__, distance);
> > + }
> > +
> > /* new parent is <= existing parents, add it to parent list */
> > - vp = vertex_parent_new (v, ospf_lsa_has_link (w->lsa, v->lsa), newhop);
> > + vp = vertex_parent_new (v, backlink, newhop);
> > listnode_add (w->parents, vp);
> >
> > return;
> > --
> > 1.7.3.4
> >
> > _______________________________________________
> > Quagga-dev mailing list
> > Quagga-dev [at] lists
> > http://lists.quagga.net/mailman/listinfo/quagga-dev
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


joakim.tjernlund at transmode

Jun 19, 2012, 4:13 PM

Post #4 of 40 (771 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

-----David Lamparter <equinox [at] diac24> wrote: -----

> To: Joakim Tjernlund <joakim.tjernlund [at] transmode>
> From: David Lamparter <equinox [at] diac24>
> Date: 19/06/2012 16:41
> Cc: David Lamparter <equinox [at] opensourcerouting>, quagga-dev [at] lists
> Subject: Re: [quagga-dev 9433] [PATCH RFC] OSPF vertices memory exhaustion
>
>
> On Tue, Jun 19, 2012 at 04:22:51PM +0200, Joakim Tjernlund wrote:
> > On vacation now but you could try fixing this code:
> >
> > * http://blogs.sun.com/paulj/entry/the_difference_a_line_makes
> > */
> > if (added)
> > return added;
> >
> > It is wrong which I have pointed out earlier but Paul claimed it does not matter.
>
> Another argument for me to get this entire codepart into my brain RAM
> and do a full review... I have to work down a stack of other stuff
> first though (e.g. the unnumbered stuff) :S

If you cannot process potential anwers, why do you ask?
I took the time to point out a possible solution(on my vacation) and you
just throw it out the window?

I have head this unnumbered argument over and over again. I can not understand what is
holding you back. I have posted patches with the solution long time ago and it is very
sad that nothing is coming from it(BIRD took the same patches in a hartbeat).
If you cannot wrap your head aroud it, just test them?
I have been using them for years in networks with hundreds of nodes so then cant be that
wrong.

>
> Have a nice vacation!

Thanks, going to the niagara falls next :)

>
> -David
>
> > Try moving it to here:
> > if (vp->parent == area->spf) /* connects to root? */
> > {
> > /* 16.1.1 para 5. ...the parent vertex is a network that
> > * directly connects the calculating router to the destination
> > * router. The list of next hops is then determined by
> > * examining the destination's router-LSA...
> > */
> >
> > assert(w->type == OSPF_VERTEX_ROUTER);
> > while ((l = ospf_get_next_link (w, v, l)))
> > {
> > /* ...For each link in the router-LSA that points back to the
> > * parent network, the link's Link Data field provides the IP
> > * address of a next hop router. The outgoing interface to
> > * use can then be derived from the next hop IP address (or
> > * it can be inherited from the parent network).
> > */
> > nh = vertex_nexthop_new ();
> > nh->oi = vp->nexthop->oi;
> > nh->router = l->link_data;
> > added = 1;
> > ospf_spf_add_parent (v, w, nh, distance);
> > }
> > return added; <---- here
> > }
> > }
> >
> > -----quagga-dev-bounces [at] lists wrote: -----
> >
> > > To: quagga-dev [at] lists
> > > From: David Lamparter
> > > Sent by: quagga-dev-bounces [at] lists
> > > Date: 19/06/2012 15:44
> > > Cc: David Lamparter <equinox [at] opensourcerouting>
> > > Subject: [quagga-dev 9433] [PATCH RFC] OSPF vertices memory exhaustion
> > >
> > >
> > > This was found in scale testing at OSR. The fix is a very symptom-oriented
> > > one, it seems that the same vertex simply gets added to the SPF tree a zillion
> > > times, exhausting available memory. This may or may not be an indication of a
> > > more severe bug somewhere else, review and/or comments very welcome...
> > >
> > > Code really explains it best, so here:
> > >
> > > [this patch is not production-quality]
> > >
> > > diff --git a/ospfd/ospf_spf.c b/ospfd/ospf_spf.c
> > > index d7f9ba2..39c6026 100644
> > > --- a/ospfd/ospf_spf.c
> > > +++ b/ospfd/ospf_spf.c
> > > @@ -423,6 +423,7 @@ ospf_spf_add_parent (struct vertex *v, struct vertex *w,
> > > unsigned int distance)
> > > {
> > > struct vertex_parent *vp;
> > > + int backlink;
> > >
> > > /* we must have a newhop, and a distance */
> > > assert (v && w && newhop);
> > > @@ -455,9 +456,26 @@ ospf_spf_add_parent (struct vertex *v, struct vertex *w,
> > > ospf_spf_flush_parents (w);
> > > w->distance = distance;
> > > }
> > > -
> > > +
> > > + backlink = ospf_lsa_has_link (w->lsa, v->lsa);
> > > +
> > > + if (distance == w->distance)
> > > + {
> > > + struct listnode *node, *nnode;
> > > + for (ALL_LIST_ELEMENTS (w->parents, node, nnode, vp))
> > > + if (vp->parent == v && vp->nexthop == newhop && vp->backlink == backlink)
> > > + {
> > > + if (IS_DEBUG_OSPF_EVENT)
> > > + zlog_debug ("%s: duplicate nexthop", __func__);
> > > + return;
> > > + }
> > > +
> > > + if (IS_DEBUG_OSPF_EVENT)
> > > + zlog_debug ("%s: new equal-cost %d nexthop", __func__, distance);
> > > + }
> > > +
> > > /* new parent is <= existing parents, add it to parent list */
> > > - vp = vertex_parent_new (v, ospf_lsa_has_link (w->lsa, v->lsa), newhop);
> > > + vp = vertex_parent_new (v, backlink, newhop);
> > > listnode_add (w->parents, vp);
> > >
> > > return;
> > > --
> > > 1.7.3.4
> > >
> > > _______________________________________________
> > > Quagga-dev mailing list
> > > Quagga-dev [at] lists
> > > http://lists.quagga.net/mailman/listinfo/quagga-dev
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


equinox at opensourcerouting

Jun 19, 2012, 8:53 PM

Post #5 of 40 (770 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

On Wed, Jun 20, 2012 at 01:13:01AM +0200, Joakim Tjernlund wrote:
> > On Tue, Jun 19, 2012 at 04:22:51PM +0200, Joakim Tjernlund wrote:
> > > On vacation now but you could try fixing this code:
> > >
> > > * http://blogs.sun.com/paulj/entry/the_difference_a_line_makes
> > > */
> > > if (added)
> > > return added;
> > >
> > > It is wrong which I have pointed out earlier but Paul claimed it does not matter.
> >
> > Another argument for me to get this entire codepart into my brain RAM
> > and do a full review... I have to work down a stack of other stuff
> > first though (e.g. the unnumbered stuff) :S
>
> If you cannot process potential anwers, why do you ask?

I didn't say I cannot process your answer, instead quite simply I need
time to do so. Part of this is caused by me trying to maintain Quagga
in its entirety, not just ospfd -- so, since I continually move around
processing stuff on bgpd, zebra and, hooray, SNMP, I can't really dive
headfirst into a very complicated OSPF topic.

> I took the time to point out a possible solution(on my vacation) and you
> just throw it out the window?

(Btw, I think you can safely enjoy your vacation and we'll still be at
this when you return <_<)

> I have head this unnumbered argument over and over again. I can not understand what is
> holding you back. I have posted patches with the solution long time ago and it is very
> sad that nothing is coming from it(BIRD took the same patches in a hartbeat).

Well, that's exactly the problem - you posted it a long time ago and it
doesn't apply cleanly on master anymore. I understand that it should
have been completed years back, but what can I do to change the past?

Now the Cumulus Networks guys took your code and refitted it for HEAD,
which I'm immensely thankful for, especially since they do have the time
to dive into OSPF. (In fact, I think I will bother them with your
comment on the "return added" you mention above!) Now I hear from you
that they apparently based off an early version of your patches and that
there are problems with it. But you were very unspecific on what those
problems are. They promised me to look into that, but you're not
exactly giving us much detail here...

> If you cannot wrap your head aroud it, just test them?
> I have been using them for years in networks with hundreds of nodes so then cant be that
> wrong.

Well, I have in fact been carrying them in my -dn42 tree for 2 years,
but no one used them. Then I switched them on and I see it mishandles
the IP addresses and tries to use an IPv6 address as IPv4 (came out
sending packets with 254.128.0.0 as source, which you may recognise as
fe80:0:). I guess I also had an early version of your patches...

> > Have a nice vacation!
>
> Thanks, going to the niagara falls next :)

I hear someone recently walked over them on a rope - don't try that :)


-David
Attachments: signature.asc (0.22 KB)


joakim.tjernlund at transmode

Jun 20, 2012, 6:49 PM

Post #6 of 40 (775 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

>
> On Wed, Jun 20, 2012 at 01:13:01AM +0200, Joakim Tjernlund wrote:
> > > On Tue, Jun 19, 2012 at 04:22:51PM +0200, Joakim Tjernlund wrote:
> > > > On vacation now but you could try fixing this code:
> > > >
> > > > * http://blogs.sun.com/paulj/entry/the_difference_a_line_makes
> > > > */
> > > > if (added)
> > > > return added;
> > > >
> > > > It is wrong which I have pointed out earlier but Paul claimed it does not matter.
> > >
> > > Another argument for me to get this entire codepart into my brain RAM
> > > and do a full review... I have to work down a stack of other stuff
> > > first though (e.g. the unnumbered stuff) :S
> >
> > If you cannot process potential anwers, why do you ask?
>
> I didn't say I cannot process your answer, instead quite simply I need
> time to do so. Part of this is caused by me trying to maintain Quagga
> in its entirety, not just ospfd -- so, since I continually move around
> processing stuff on bgpd, zebra and, hooray, SNMP, I can't really dive
> headfirst into a very complicated OSPF topic.
>
> > I took the time to point out a possible solution(on my vacation) and you
> > just throw it out the window?
>
> (Btw, I think you can safely enjoy your vacation and we'll still be at
> this when you return <_<)

:), I guees so but when I return I will be very busy with my back log.

>
> > I have head this unnumbered argument over and over again. I can not understand what is
> > holding you back. I have posted patches with the solution long time ago and it is very
> > sad that nothing is coming from it(BIRD took the same patches in a hartbeat).
>
> Well, that's exactly the problem - you posted it a long time ago and it
> doesn't apply cleanly on master anymore. I understand that it should
> have been completed years back, but what can I do to change the past?

You can't of course but you could have started with the already present
patches instead of new stuff. The old stuff gets out of sync when you
apply new stuff first(the new stuff will still be there once you get to it). I
know this isn't the funniest things to start with but that comes with the maintainer role.

I have no chance of testing the current tree in our product so what you have
in the archives and patch work is all you have.

>
> Now the Cumulus Networks guys took your code and refitted it for HEAD,
> which I'm immensely thankful for, especially since they do have the time
> to dive into OSPF. (In fact, I think I will bother them with your
> comment on the "return added" you mention above!) Now I hear from you
> that they apparently based off an early version of your patches and that
> there are problems with it. But you were very unspecific on what those
> problems are. They promised me to look into that, but you're not
> exactly giving us much detail here...

But I did, I pointed you to my old path from which Cumulus based theirs on. That patch
had the limitations in its commit desc.

>
> > If you cannot wrap your head aroud it, just test them?
> > I have been using them for years in networks with hundreds of nodes so then cant be that
> > wrong.
>
> Well, I have in fact been carrying them in my -dn42 tree for 2 years,
> but no one used them. Then I switched them on and I see it mishandles
> the IP addresses and tries to use an IPv6 address as IPv4 (came out
> sending packets with 254.128.0.0 as source, which you may recognise as
> fe80:0:). I guess I also had an early version of your patches...

And this was due to my unnumbered patches? Well, I never tested them on IPV6
but I hardly think that is reason to throw it all out.

>
> > > Have a nice vacation!
> >
> > Thanks, going to the niagara falls next :)
>
> I hear someone recently walked over them on a rope - don't try that :)

Yeah, missed that one. He is already gone :(
Anyhow, got a real shower going "under" the falls with the tourist boat they have here.

Jocke
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


joakim.tjernlund at transmode

Jun 21, 2012, 11:00 AM

Post #7 of 40 (762 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

>
> Well, that's exactly the problem - you posted it a long time ago and it
> doesn't apply cleanly on master anymore. I understand that it should
> have been completed years back, but what can I do to change the past?
>
> Now the Cumulus Networks guys took your code and refitted it for HEAD,
> which I'm immensely thankful for, especially since they do have the time
> to dive into OSPF. (In fact, I think I will bother them with your
> comment on the "return added" you mention above!) Now I hear from you

Found the old patch with the "return added" moved. Paul does't think it matters
so he dropped it. I don't think so and have been using it for years in my code.

http://patchwork.diac24.net/patch/261/
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


paul at jakma

Jul 25, 2012, 1:07 PM

Post #8 of 40 (639 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

On Tue, 19 Jun 2012, Joakim Tjernlund wrote:

> On vacation now but you could try fixing this code:
>
> * http://blogs.sun.com/paulj/entry/the_difference_a_line_makes
> */
> if (added)
> return added;
>
> It is wrong which I have pointed out earlier but Paul claimed it does
> not matter.

When was that discussion? Link to archives?

It seems you didn't convince me this was wrong before. I've just gone
through the original bug and refreshed my understanding from scratch as to
why that was done as it is, and I still don't think it's wrong. Indeed,
I'm not convinced you understand why that code is the way it is (based
just on the commit comment, and my own refresh of the code, the blog
entry, and bug #330 - I havn't gone over earlier discussions). ;)

Now, that's not meant as an insult, Just that my approach to maintenance
was that the maintainer should take a sceptical stance on contributions
and require that they come with convincing explanations as to what the
problem is, and why the submitted patch fixes it. Otherwise, I felt then,
we risked lots of pointless code churn from speculative or
not-fully-understood patches.

That said, maintenance approach today may be different.

regards,
--
Paul Jakma paul [at] jakma @pjakma Key ID: 64A2FF6A
Fortune:
If one cannot enjoy reading a book over and over again, there is no use
in reading it at all.
-- Oscar Wilde
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


paul at jakma

Jul 25, 2012, 2:04 PM

Post #9 of 40 (638 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

On Wed, 25 Jul 2012, Paul Jakma wrote:

> Now, that's not meant as an insult, Just that my approach to maintenance
> was that the maintainer should take a sceptical stance on contributions
> and require that they come with convincing explanations

Basically, maintainers have to be a bit like Idi Amin:

http://www.codinghorror.com/blog/2012/07/but-you-did-not-persuade-me.html

Now clearly there's limits to comparing maintainers to despotic
megalomaniacs - free software maintainers don't usually wear military
uniforms - but they do have to be sceptical about the advice they get.
They can't always tell themselves directly whether a patch is correct or
not. So contributors must persuade them, ideally with clear evidence.

Anyway.

regards,
--
Paul Jakma paul [at] jakma @pjakma Key ID: 64A2FF6A
Fortune:
There's nothing like the face of a kid eating a Hershey bar.
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


joakim.tjernlund at transmode

Jul 31, 2012, 1:22 PM

Post #10 of 40 (612 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

Paul Jakma <paul [at] jakma> wrote on 2012/07/25 22:07:44:
>
> On Tue, 19 Jun 2012, Joakim Tjernlund wrote:
>
> > On vacation now but you could try fixing this code:
> >
> > * http://blogs.sun.com/paulj/entry/the_difference_a_line_makes
> > */
> > if (added)
> > return added;
> >
> > It is wrong which I have pointed out earlier but Paul claimed it does
> > not matter.
>
> When was that discussion? Link to archives?

It is this thread, if you bother to dig a little bit.
Also, the patch itself has more info then what got committed to the repo.

>
> It seems you didn't convince me this was wrong before. I've just gone

Yes, I said as much already.

> through the original bug and refreshed my understanding from scratch as to
> why that was done as it is, and I still don't think it's wrong. Indeed,
> I'm not convinced you understand why that code is the way it is (based
> just on the commit comment, and my own refresh of the code, the blog
> entry, and bug #330 - I havn't gone over earlier discussions). ;)

Can you then explain why your old code is correct w.r.t the OSPF RFC?
Why is it correct to jump from within 16.1.1,para 5:
* ...the parent vertex is a network that
* directly connects the calculating router to the destination
* router. The list of next hops is then determined by
* examining the destination's router-LSA...
just because you didn't find any links to
/* 16.1.1 para 4. If there is at least one intervening router in the
* current shortest path between the destination and the root, the
* destination simply inherits the set of next hops from the
* parent.
*/

Any way I read the RFC these two cases are mutually exclusive.


>
> Now, that's not meant as an insult, Just that my approach to maintenance
> was that the maintainer should take a sceptical stance on contributions
> and require that they come with convincing explanations as to what the
> problem is, and why the submitted patch fixes it. Otherwise, I felt then,
> we risked lots of pointless code churn from speculative or
> not-fully-understood patches.
>
> That said, maintenance approach today may be different.

Yes, thankfully.

_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


paul at jakma

Aug 2, 2012, 3:35 AM

Post #11 of 40 (609 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

On Tue, 31 Jul 2012, Joakim Tjernlund wrote:

> Paul Jakma <paul [at] jakma> wrote on 2012/07/25 22:07:44:

>>> It is wrong which I have pointed out earlier but Paul claimed it does
>>> not matter.
>>
>> When was that discussion? Link to archives?

> It is this thread, if you bother to dig a little bit.

I hadn't been involved in this thread before, so I don't see how this
helps me find that discussion between us.

> Also, the patch itself has more info then what got committed to the repo.

It has no useful information really. :) You don't explain why it's wrong.
You don't even mention any of the rationale for why the original change
was made - which makes me wonder if you even familiarised yourself with
it. Basically, there's no evidence in the commit message or recent
discussion here that you understand why the change was made. Without that,
it would not be possible to understand the code, nor to judge it to be
wrong.

You say 2 well known OSPF people say it is wrong, but they presumably did
so based on how you explained the code to them - which you didn't state in
the commit message, nor give references to (if public). If your
understanding happens to be wrong, your explanation to will have been
deficient, and their opinion will be meaningless.

> Can you then explain why your old code is correct w.r.t the OSPF RFC?

Well, first off, why should anyone explain a change to you when it seems
you may not have familiarised yourself with the motivation for the change?

> Why is it correct to jump from within 16.1.1,para 5:
> * ...the parent vertex is a network that
> * directly connects the calculating router to the destination
> * router. The list of next hops is then determined by
> * examining the destination's router-LSA...
> just because you didn't find any links to
> /* 16.1.1 para 4. If there is at least one intervening router in the
> * current shortest path between the destination and the root, the
> * destination simply inherits the set of next hops from the
> * parent.
> */
>
> Any way I read the RFC these two cases are mutually exclusive.

Well, what was the context of the change (that you reversed)? It was in
the context of SPF over LSDBs that have /not/ converged. Hence, next-hop
validation can fail, because V->W is being explored, but no W->V backlink
exists.

From the paragraph before 16.1.1:

"it is important to note that links between
transit vertices must be bidirectional in order to be included
in the above tree."

The patch was meant to improve this edge-case - it has no effect when the
LSDBs are converged (as can be read in bug #330).

The RFC in 16.1.1 is assuming that there is bidirectional connectivity
already, when it describes the nexthop-calculation.

You ask "how can both apply?", but what's happening is that initially we
think we have a root connection, but then nexthop-calculation fails. So
really, we no longer are exploring a root connection. Thus we need to back
track. If we don't then we will calculate there is /no/ path - even when
there are other longer, non-root-connected paths. This is what bug #330
was about - improving routing connectivity /during/ the process of
convergence.

regards,
--
Paul Jakma paul [at] jakma @pjakma Key ID: 64A2FF6A
Fortune:
WARNING TO ALL PERSONNEL:

Firings will continue until morale improves.
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


paul at jakma

Aug 2, 2012, 6:19 AM

Post #12 of 40 (607 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

On Thu, 2 Aug 2012, Paul Jakma wrote:

> The RFC in 16.1.1 is assuming that there is bidirectional connectivity
> already, when it describes the nexthop-calculation.

Oh, if there is a problem here, it is that we're getting back-link-finding
failures in the nexthop-calculation code. The SPF exploration (spf_next)
should already have filtered out such cases, I'd have thought. So it's
possibly there's some kind of discrepancy between the check done in the
general SPF exploration and at the nexthop-calculation stage - the latter
is checking more than the former. Alternatively, there's a good reason for
that and I've forgotten why.

regards,
--
Paul Jakma paul [at] jakma @pjakma Key ID: 64A2FF6A
Fortune:
The problem with this country is that there is no death penalty for
incompetence.
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


joakim.tjernlund at transmode

Aug 2, 2012, 8:30 AM

Post #13 of 40 (607 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

Paul Jakma <paul [at] jakma> wrote on 2012/08/02 12:35:35:
>
> On Tue, 31 Jul 2012, Joakim Tjernlund wrote:
>
> > Paul Jakma <paul [at] jakma> wrote on 2012/07/25 22:07:44:
>
> >>> It is wrong which I have pointed out earlier but Paul claimed it does
> >>> not matter.
> >>
> >> When was that discussion? Link to archives?
>
> > It is this thread, if you bother to dig a little bit.
>
> I hadn't been involved in this thread before, so I don't see how this
> helps me find that discussion between us.

There is link to patchwork where with some discussion between us:
http://patchwork.diac24.net/patch/261/
There are probably some more in the mail archive too.

>
> > Also, the patch itself has more info then what got committed to the repo.
>
> It has no useful information really. :) You don't explain why it's wrong.
> You don't even mention any of the rationale for why the original change
> was made - which makes me wonder if you even familiarised yourself with
> it. Basically, there's no evidence in the commit message or recent
> discussion here that you understand why the change was made. Without that,
> it would not be possible to understand the code, nor to judge it to be
> wrong.

Well, it is there. A bit terse but you could work it out if you read the
the RFC. to my defence your explanations up till now has been likewise :)

>
> You say 2 well known OSPF people say it is wrong, but they presumably did
> so based on how you explained the code to them - which you didn't state in
> the commit message, nor give references to (if public). If your
> understanding happens to be wrong, your explanation to will have been
> deficient, and their opinion will be meaningless.

Yes, you are quite right that I might explained it wrong. In any case I found a
reference to the discussion:
http://www.ietf.org/mail-archive/web/ospf/current/msg05568.html
I am sure I mentioned it to you at some point, if not I apologize
for missing that.

>
> > Can you then explain why your old code is correct w.r.t the OSPF RFC?
>
> Well, first off, why should anyone explain a change to you when it seems
> you may not have familiarised yourself with the motivation for the change?

I tried to, but bug 330 is a big bug and is also wrong in its impl. Furthermore,
if you are to deviate from the RFC you should clearly document why. One should
not have to look in some confusing bug or a blog entry(now no existing) and
try to make sense of the different parts.

>
> > Why is it correct to jump from within 16.1.1,para 5:
> > * ...the parent vertex is a network that
> > * directly connects the calculating router to the destination
> > * router. The list of next hops is then determined by
> > * examining the destination's router-LSA...
> > just because you didn't find any links to
> > /* 16.1.1 para 4. If there is at least one intervening router in the
> > * current shortest path between the destination and the root, the
> > * destination simply inherits the set of next hops from the
> > * parent.
> > */
> >
> > Any way I read the RFC these two cases are mutually exclusive.
>
> Well, what was the context of the change (that you reversed)? It was in
> the context of SPF over LSDBs that have /not/ converged. Hence, next-hop
> validation can fail, because V->W is being explored, but no W->V backlink
> exists.

Yes.

>
> From the paragraph before 16.1.1:
>
> "it is important to note that links between
> transit vertices must be bidirectional in order to be included
> in the above tree."

Yes.

>
> The patch was meant to improve this edge-case - it has no effect when the
> LSDBs are converged (as can be read in bug #330).
>
> The RFC in 16.1.1 is assuming that there is bidirectional connectivity
> already, when it describes the nexthop-calculation.
>
> You ask "how can both apply?", but what's happening is that initially we
> think we have a root connection, but then nexthop-calculation fails. So
> really, we no longer are exploring a root connection. Thus we need to back
> track. If we don't then we will calculate there is /no/ path - even when
> there are other longer, non-root-connected paths. This is what bug #330
> was about - improving routing connectivity /during/ the process of
> convergence.

And you are sure that no other ill effects can happen from doing this?
Like routing loops or black holes? Check the ietf ref above.

Jocke

_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


joakim.tjernlund at transmode

Aug 2, 2012, 10:11 AM

Post #14 of 40 (607 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

Paul Jakma <paul [at] jakma> wrote on 2012/08/02 15:19:51:
>
> On Thu, 2 Aug 2012, Paul Jakma wrote:
>
> > The RFC in 16.1.1 is assuming that there is bidirectional connectivity
> > already, when it describes the nexthop-calculation.
>
> Oh, if there is a problem here, it is that we're getting back-link-finding
> failures in the nexthop-calculation code. The SPF exploration (spf_next)
> should already have filtered out such cases, I'd have thought. So it's
> possibly there's some kind of discrepancy between the check done in the
> general SPF exploration and at the nexthop-calculation stage - the latter
> is checking more than the former. Alternatively, there's a good reason for
> that and I've forgotten why.

The back link "check" in ospf_spf_add_parent() doesn't really do anything. It only
records the backlink and the only user that I can see of this backlink is
VLINKs.

BTW, why is VLINKs impl. the way its is? For example ospf_nexthop_calculation
calcs the nexthop for a VLINK but the RFC says that VLINK nexthop should be
deferred to later.

Jocke

_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


paul at jakma

Aug 2, 2012, 4:54 PM

Post #15 of 40 (605 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

On Thu, 2 Aug 2012, Joakim Tjernlund wrote:

> Yes, you are quite right that I might explained it wrong. In any case I found a
> reference to the discussion:

> http://www.ietf.org/mail-archive/web/ospf/current/msg05568.html

> I am sure I mentioned it to you at some point, if not I apologize
> for missing that.

Acee answered your question with that you're not supposed to use paths
that don't have backlinks.

The behaviour is that if somehow during nexthop-calc, we find there isn't
a backlink, we don't use it. The patch you reverted in no way changes
that.

The change was that it makes it possible to use paths that /do/ work. Bug
#330 was complaining that, if we got to such a failure in nexthop-calc, we
/lost/ *other* working paths.

> I tried to, but bug 330 is a big bug and is also wrong in its impl.
> Furthermore, if you are to deviate from the RFC

We're not deviating from the RFC. At least, not where you see one.

> you should clearly document why. One should not have to look in some
> confusing bug or a blog entry(now no existing) and try to make sense of
> the different parts.

There was a large comment there, you deleted it. The blog entry still
exists, you just have to s/sun/oracle/ in the url. You could have emailed
me and asked me about the url, or indeed the bug in general, before
submitting a patch to fiddle with code you dislike because you don't under
stand it.

Really, do you think it is a good idea that we should encourage people, if
they don't understand some previous change, to submit patches to revert
them, rather than taking the time to read the bug reports that informed
the change first?

> And you are sure that no other ill effects can happen from doing this?
> Like routing loops or black holes? Check the ietf ref above.

Yes, I'm sure. Again, if you read bug #330, we get blackholes *without*
that change - we are trying to minimise blackholes during convergence!

regards,
--
Paul Jakma paul [at] jakma @pjakma Key ID: 64A2FF6A
Fortune:
We should have a Vollyballocracy. We elect a six-pack of presidents.
Each one serves until they screw up, at which point they rotate.
-- Dennis Miller
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


paul at jakma

Aug 2, 2012, 5:04 PM

Post #16 of 40 (605 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

On Thu, 2 Aug 2012, Joakim Tjernlund wrote:

> The back link "check" in ospf_spf_add_parent() doesn't really do
> anything. It only records the backlink and the only user that I can see
> of this backlink is VLINKs.

I mean:

if (ospf_lsa_has_link (w_lsa->data, v->lsa) < 0 )
{
if (IS_DEBUG_OSPF_EVENT)
zlog_debug ("The LSA doesn't have a link back");
continue;
}

in ospf_spf_next(). That's the high-level bit that's supposed to filter
out V->W's which have no back-link (i.e. W->V) - which is what Acee
referred to. So if there's a problem, it's because this check isn't doing
its job and get_next_link behaves differently. E.g. likely because the
get_next_link() which next-hop calc uses later is stricter in some way
than ospf_lsa_has_link(). The nexthop-calc backlink should just be a
sanity check, it shouldn't fail.

There was a reason why I wasn't able to unify those 2 functions at the
time, but I can't remember why.

> BTW, why is VLINKs impl. the way its is? For example
> ospf_nexthop_calculation calcs the nexthop for a VLINK but the RFC says
> that VLINK nexthop should be deferred to later.

I think it is deferred as it should be, or at least gets updated later
equivalently, though don't remember the details off-hand. We spent a bit
of time at Sun testing VLinks, going back and forth with UNH on
conformance testing - a long time ago. The original SPF code got it wrong
a lot, we got it passing.

It could be my changes aren't structured the best way, but it did pass UNH
testing.

regards,
--
Paul Jakma paul [at] jakma @pjakma Key ID: 64A2FF6A
Fortune:
I think we're all Bozos on this bus.
-- Firesign Theatre
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


joakim.tjernlund at transmode

Aug 3, 2012, 6:38 AM

Post #17 of 40 (601 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

Paul Jakma <paul [at] jakma> wrote on 2012/08/03 01:54:36:
>
> On Thu, 2 Aug 2012, Joakim Tjernlund wrote:
>
> > Yes, you are quite right that I might explained it wrong. In any case I found a
> > reference to the discussion:
>
> > http://www.ietf.org/mail-archive/web/ospf/current/msg05568.html
>
> > I am sure I mentioned it to you at some point, if not I apologize
> > for missing that.
>
> Acee answered your question with that you're not supposed to use paths
> that don't have backlinks.
>
> The behaviour is that if somehow during nexthop-calc, we find there isn't
> a backlink, we don't use it. The patch you reverted in no way changes
> that.
>
> The change was that it makes it possible to use paths that /do/ work. Bug
> #330 was complaining that, if we got to such a failure in nexthop-calc, we
> /lost/ *other* working paths.
>
> > I tried to, but bug 330 is a big bug and is also wrong in its impl.
> > Furthermore, if you are to deviate from the RFC
>
> We're not deviating from the RFC. At least, not where you see one.

Well, we are not following it either, this nexthop_calculation tries
to follow the it exactly, even quoting text from the RFC

>
> > you should clearly document why. One should not have to look in some
> > confusing bug or a blog entry(now no existing) and try to make sense of
> > the different parts.
>
> There was a large comment there, you deleted it. The blog entry still
> exists, you just have to s/sun/oracle/ in the url. You could have emailed
> me and asked me about the url, or indeed the bug in general, before
> submitting a patch to fiddle with code you dislike because you don't under
> stand it.
>
> Really, do you think it is a good idea that we should encourage people, if
> they don't understand some previous change, to submit patches to revert
> them, rather than taking the time to read the bug reports that informed
> the change first?

I did read it all and I wrote in the patch why thought there was a bug still.
It is already clear that the intention and the actual fix from bug 330 is very
different, the fix completely botched SPF.

>
> > And you are sure that no other ill effects can happen from doing this?
> > Like routing loops or black holes? Check the ietf ref above.
>
> Yes, I'm sure. Again, if you read bug #330, we get blackholes *without*
> that change - we are trying to minimise blackholes during convergence!

Bug 330 is has a lot of stuff, to much to make proper sense.
It has this this hack which also solves the problem for the reporter.
- pqueue_enqueue (w, candidate);
+ if(w->parents->count != 0)
+ pqueue_enqueue (w, candidate);
Then we have your solution which we know is broken!
Referring to this bug to explain everything does not work, there was to much
going on and fixing in one go.

I explained in the patch why I figured there was still a bug and your reply was
very brief and didn't hold on closer examination so to me it looked like you
missed my point.

Anyhow, now it is much clearer what you tried to do so:

Now finding your blog again you write:

The idea being to:

1) Make the unconditional return able to indicate failure
2) Allow the block of code above to find all the parents by moving the return out of the inner loop

2) Reads to me that you really are trying to find all parents connected to the root, not
falling into "intervening router" case. If so, this seems like a better fix.

diff --git a/ospfd/ospf_spf.c b/ospfd/ospf_spf.c
index ffac985..11316cd 100644
--- a/ospfd/ospf_spf.c
+++ b/ospfd/ospf_spf.c
@@ -483,7 +483,7 @@ ospf_nexthop_calculation (struct ospf_area *area, struct vertex *v,
struct vertex_nexthop *nh;
struct vertex_parent *vp;
struct ospf_interface *oi = NULL;
- unsigned int added = 0;
+ unsigned int added = 0, root_found = 0;
char buf1[BUFSIZ];
char buf2[BUFSIZ];

@@ -663,7 +663,7 @@ ospf_nexthop_calculation (struct ospf_area *area, struct vertex *v,
* router. The list of next hops is then determined by
* examining the destination's router-LSA...
*/
-
+ root_found = 1;
assert(w->type == OSPF_VERTEX_ROUTER);
while ((l = ospf_get_next_link (w, v, l)))
{
@@ -679,11 +679,11 @@ ospf_nexthop_calculation (struct ospf_area *area, struct vertex *v,
added = 1;
ospf_spf_add_parent (v, w, nh, distance);
}
- /* Always return here as we known that 16.1.1 para 4
- does not apply one you have found a connection to root */
- return added;
}
}
+ /* Let the above loop find all parents so return here */
+ if (root_found)
+ return added;
}

/* 16.1.1 para 4. If there is at least one intervening router in the

_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


joakim.tjernlund at transmode

Aug 3, 2012, 7:56 AM

Post #18 of 40 (600 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

Paul Jakma <paul [at] jakma> wrote on 2012/08/03 02:04:04:
>
> On Thu, 2 Aug 2012, Joakim Tjernlund wrote:
>
> > The back link "check" in ospf_spf_add_parent() doesn't really do
> > anything. It only records the backlink and the only user that I can see
> > of this backlink is VLINKs.
>
> I mean:
>
> if (ospf_lsa_has_link (w_lsa->data, v->lsa) < 0 )
> {
> if (IS_DEBUG_OSPF_EVENT)
> zlog_debug ("The LSA doesn't have a link back");
> continue;
> }
>
> in ospf_spf_next(). That's the high-level bit that's supposed to filter
> out V->W's which have no back-link (i.e. W->V) - which is what Acee
> referred to. So if there's a problem, it's because this check isn't doing
> its job and get_next_link behaves differently. E.g. likely because the
> get_next_link() which next-hop calc uses later is stricter in some way
> than ospf_lsa_has_link(). The nexthop-calc backlink should just be a
> sanity check, it shouldn't fail.

Yeah, I have a vague recollection that the high order check only can/should check for
ANY link(not sure from where, probably the ospf ietf list). One gets the impression that
one should only do the backlink check for transit vertexes though:

(b) Otherwise, W is a transit vertex (router or transit
network). Look up the vertex W's LSA (router-LSA or
network-LSA) in Area A's link state database. If the
LSA does not exist, or its LS age is equal to MaxAge, or
it does not have a link back to vertex V, examine the
next link in V's LSA.[23]

Now it is done for Network LSAs too.


However, the return value from nexthop_calculation serves as a better check. One
could possible remove the high order check and rely on the nexthop_calculation
instead. One would have do an extra check in some places in nexthop_calculation
for backlink though.

>
> There was a reason why I wasn't able to unify those 2 functions at the
> time, but I can't remember why.
>
> > BTW, why is VLINKs impl. the way its is? For example
> > ospf_nexthop_calculation calcs the nexthop for a VLINK but the RFC says
> > that VLINK nexthop should be deferred to later.
>
> I think it is deferred as it should be, or at least gets updated later
> equivalently, though don't remember the details off-hand. We spent a bit
> of time at Sun testing VLinks, going back and forth with UNH on
> conformance testing - a long time ago. The original SPF code got it wrong
> a lot, we got it passing.
>
> It could be my changes aren't structured the best way, but it did pass UNH
> testing.

Would be nice if one could get rid of the backlink used be VLINKS though, now
you do an extra ospf_lsa_has_link() in ospf_spf_add_parent() which is only
used by VLINKS

_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


paul at jakma

Aug 3, 2012, 8:21 AM

Post #19 of 40 (596 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

On Fri, 3 Aug 2012, Joakim Tjernlund wrote:

> Well, we are not following it either, this nexthop_calculation tries to
> follow the it exactly, even quoting text from the RFC

Sigh. It doesn't deviate from the RFC.

The RFC says:

If there is an intervening router:
<inherit>
Else if parent vertex is the root:
<do the appropriate nexthop calculation>
Else the parent vertex is a network that
directly connects the calculating router to the destination
router:
<do the appropriate nexthop calculation>

Our code does:

If parent is the root:
If the W vertex is a router:
<do the appropriate nexthop calculation>
Else if W is a network:
<do the appropriate nexthop calculation>
return
Else if the W vertex is a network:
For every parent of W, if it's a root:
<do the appropriate nexthop calculation>
return.

/* intervening case */
for every parent of W
<inherit>

The code differs significantly from the RFC in the flow of its logic. To
say it differs from the RFC is in the strict sense correct. I am sure a
lot of code doesn't follow the logic flow of RFCs strictly - there is a
reason why human-language RFCs can take a lot of effort to turn into
executable machine-language. Submitting a patch to revert a change because
it doesn't follow the RFC is non-sensical, when the code at no stages
matches the RFC, before or after any of these changes.

However, it seems to stick to the spirit.

Bug #330 argued that <do the appropriate nexthop calculation> could fail
to find a suitable nexthop - note that this is a very _*significant*_
difference from the logic in the RFC (the text you keep pointing out does
not consider failure of that process is possible). The problem being then
that it would return out of nexthop-calculation from the for-loop - having
added no next-hop.

However, there might have been other parents connecting W to the root,
which we could explore. Failing that, there may be intervening parent
paths we can take. Thus, that return needed to be made conditional on
whether nexthop-calc worked:

If parent is the root:
If the W vertex is a router:
<do the appropriate nexthop calculation>
Else if W is a network:
<do the appropriate nexthop calculation>
return
Else if the W vertex is a network:
For every parent of W, if it's a root:
if <do the appropriate nexthop calculation> succeeds
added = true
If added is true
return

/* intervening case */
for every parent of W
<inherit>

Note again, the RFC assumes that the "appropriate nexthop-calc" part
(which involves finding the link back) can not fail. Again, the RFC is not
code, and even if it were, it is not our code.

The RFC goes into the last "Else if the W vertex is a network and it has a
parent that's root" branch assuming it must work out (cause SPF
exploration should have checked that and filtered out). So of course, in
the RFC, there is no provision for leaving that branch.

In our code, that's not true. We can go into that branch and find a W with
a parent to root that isn't a valid V-W vertex, so we shouldn't consider
it at all. We need to continue.

Actually, I know why we have to do the W->V check at this stage, because W
can be connected by both an invalid V->W and other valid, non-root-parent
ones. SPF next will filter out exploration of Ws via the invalid links,
but when it finds Ws via valid ones, the nexthop calculation will still
run across the invalid ones when trying to find the correct one. I think
that's why this is so.

> I did read it all and I wrote in the patch why thought there was a bug
> still. It is already clear that the intention and the actual fix from
> bug 330 is very different, the fix completely botched SPF.

You say that, but you havn't shown this at all. Assertions don't become
truth through repetition. You may be right, but here you need to actually
persuade me, with an argument that's a bit more specific on details and
reasoning.

If it comes down to the 2 of us just saying "I'm right", well then I'm
going to take my side (for a number of reasons, arbitrary and not so
arbitrary).

> Bug 330 is has a lot of stuff, to much to make proper sense.
> It has this this hack which also solves the problem for the reporter.
> - pqueue_enqueue (w, candidate);
> + if(w->parents->count != 0)
> + pqueue_enqueue (w, candidate);
> Then we have your solution which we know is broken!

For a value of "we" that equals "Joakim".

> I explained in the patch

Assertions are not explanations.

I shall revert your revert. The previous code was tested for both the
failing case (which in bug #330 was due to differing routers not having
converged - it optimises that case alone, eliminating a temporary
blackhole, and had no effect when converged).

To really make this code follow the spirit of the RFC, nexthop-calculation
shouldn't have to worry /at all/ about back-links not being found. The
higher-level spf_next function should have filtered those out.

Patches to SPF on this topic should try address that (or explain why
nexthop-calc has to be concerned with it - I failed to document that at
the time, or perhaps I just didn't realise).

Further, the SPF code /really/ needs unit-tests. It's clear this code is
hard to understand - I think not just because of any problems in the
structure of the code (which may be my fault), but just cause there are a
lot of details to OSPF SPF. I would strongly suggest unit-tests are
written for this code, /before/ any significant further changes are made
to the SPF code in the repo.

We really need a way to feed an LSDB into the SPF code and see the result.
Which means we need an easy way to construct LSDBs and/or dump LSDBs from
running networks.

Also, I have to say, your extreme confidence in your abilities and
scepticism in others causes people to have to waste a lot of time arguing
with you.

regards,
--
Paul Jakma paul [at] jakma @pjakma Key ID: 64A2FF6A
Fortune:
"It's a dog-eat-dog world out there, and I'm wearing Milkbone underware."
-- Norm, from _Cheers_
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


paul at jakma

Aug 3, 2012, 8:28 AM

Post #20 of 40 (597 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

On Fri, 3 Aug 2012, Paul Jakma wrote:

> Actually, I know why we have to do the W->V check at this stage, because W
> can be connected by both an invalid V->W and other valid, non-root-parent
> ones. SPF next will filter out exploration of Ws via the invalid links, but
> when it finds Ws via valid ones, the nexthop calculation will still run
> across the invalid ones when trying to find the correct one. I think that's
> why this is so.

Ah, I wrote the above after the below:

> To really make this code follow the spirit of the RFC, nexthop-calculation
> shouldn't have to worry /at all/ about back-links not being found. The
> higher-level spf_next function should have filtered those out.

So I think it's just because spf_next has lost the backlink context.
Perhaps it might be possible to:

- extend ospf_lsa_has_link take the link from V->W under consideration
as an argument
- return the backlink that matches that link, in a form suitable
to be passed on to ospf_nexthop_calculation, so that it doesn't have to
replicate that.

I think it could be a lot of work, if it's possible. Unfortunately, it
could also be that the apparently simple language of the RFC simply hides
a lot of detail that's just hard to consolidate. I remember I tried to do
at the time, but couldn't, at least.

regards,
--
Paul Jakma paul [at] jakma @pjakma Key ID: 64A2FF6A
Fortune:
love, n.:
When it's growing, you don't mind watering it with a few tears.
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


joakim.tjernlund at transmode

Aug 3, 2012, 9:19 AM

Post #21 of 40 (601 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

Paul Jakma <paul [at] jakma> wrote on 2012/08/03 17:21:01:
>
> On Fri, 3 Aug 2012, Joakim Tjernlund wrote:
>
> > Well, we are not following it either, this nexthop_calculation tries to
> > follow the it exactly, even quoting text from the RFC
>
> Sigh. It doesn't deviate from the RFC.
>
> The RFC says:
>
> If there is an intervening router:
> <inherit>
> Else if parent vertex is the root:
> <do the appropriate nexthop calculation>
> Else the parent vertex is a network that
> directly connects the calculating router to the destination
> router:
> <do the appropriate nexthop calculation>
>
> Our code does:
>
> If parent is the root:
> If the W vertex is a router:
> <do the appropriate nexthop calculation>
> Else if W is a network:
> <do the appropriate nexthop calculation>
> return
> Else if the W vertex is a network:
> For every parent of W, if it's a root:
> <do the appropriate nexthop calculation>
> return.
>
> /* intervening case */
> for every parent of W
> <inherit>
>
> The code differs significantly from the RFC in the flow of its logic. To
> say it differs from the RFC is in the strict sense correct. I am sure a
> lot of code doesn't follow the logic flow of RFCs strictly - there is a
> reason why human-language RFCs can take a lot of effort to turn into
> executable machine-language. Submitting a patch to revert a change because
> it doesn't follow the RFC is non-sensical, when the code at no stages
> matches the RFC, before or after any of these changes.
>
> However, it seems to stick to the spirit.
>
> Bug #330 argued that <do the appropriate nexthop calculation> could fail
> to find a suitable nexthop - note that this is a very _*significant*_
> difference from the logic in the RFC (the text you keep pointing out does
> not consider failure of that process is possible). The problem being then
> that it would return out of nexthop-calculation from the for-loop - having
> added no next-hop.
>
> However, there might have been other parents connecting W to the root,
> which we could explore. Failing that, there may be intervening parent
> paths we can take. Thus, that return needed to be made conditional on
> whether nexthop-calc worked:
>
> If parent is the root:
> If the W vertex is a router:
> <do the appropriate nexthop calculation>
> Else if W is a network:
> <do the appropriate nexthop calculation>
> return
> Else if the W vertex is a network:
> For every parent of W, if it's a root:
> if <do the appropriate nexthop calculation> succeeds
> added = true
> If added is true
> return
>
> /* intervening case */
> for every parent of W
> <inherit>
>
> Note again, the RFC assumes that the "appropriate nexthop-calc" part
> (which involves finding the link back) can not fail. Again, the RFC is not
> code, and even if it were, it is not our code.
>
> The RFC goes into the last "Else if the W vertex is a network and it has a
> parent that's root" branch assuming it must work out (cause SPF
> exploration should have checked that and filtered out). So of course, in
> the RFC, there is no provision for leaving that branch.
>
> In our code, that's not true. We can go into that branch and find a W with
> a parent to root that isn't a valid V-W vertex, so we shouldn't consider
> it at all. We need to continue.
>
> Actually, I know why we have to do the W->V check at this stage, because W
> can be connected by both an invalid V->W and other valid, non-root-parent
> ones. SPF next will filter out exploration of Ws via the invalid links,
> but when it finds Ws via valid ones, the nexthop calculation will still
> run across the invalid ones when trying to find the correct one. I think
> that's why this is so.

Ah, now I get what you are doing.

>
> > I did read it all and I wrote in the patch why thought there was a bug
> > still. It is already clear that the intention and the actual fix from
> > bug 330 is very different, the fix completely botched SPF.
>
> You say that, but you havn't shown this at all. Assertions don't become
> truth through repetition. You may be right, but here you need to actually
> persuade me, with an argument that's a bit more specific on details and
> reasoning.

I was, referring to the changes in bug 330 and there was a bug
in there. You corrected it your self later and documented it in "The difference a line makes"

>
> If it comes down to the 2 of us just saying "I'm right", well then I'm
> going to take my side (for a number of reasons, arbitrary and not so
> arbitrary).

No, now I understand what you were trying to do and why. This was not
at all clear from reading bug 330 and later your blog so it looked like
a bug and from your replies I didn't think you understood what I meant.

>
> > Bug 330 is has a lot of stuff, to much to make proper sense.
> > It has this this hack which also solves the problem for the reporter.
> > - pqueue_enqueue (w, candidate);
> > + if(w->parents->count != 0)
> > + pqueue_enqueue (w, candidate);
> > Then we have your solution which we know is broken!
>
> For a value of "we" that equals "Joakim".

Not quite, the context was bug 330. I was trying to explain that one
could not trust the outcome of bug 330 as you yourself found a bug later
and that there might be more.

>
> > I explained in the patch
>
> Assertions are not explanations.

The top part was an assertion, meant to go in the commit msg if the patch
was accepted. The part below ---
was me trying to explain how I reasoned, somewhat terse though.

>
> I shall revert your revert. The previous code was tested for both the
> failing case (which in bug #330 was due to differing routers not having
> converged - it optimises that case alone, eliminating a temporary
> blackhole, and had no effect when converged).

Please do.

>
> To really make this code follow the spirit of the RFC, nexthop-calculation
> shouldn't have to worry /at all/ about back-links not being found. The
> higher-level spf_next function should have filtered those out.

Precisely, I just got to that conclusion too.

>
> Patches to SPF on this topic should try address that (or explain why
> nexthop-calc has to be concerned with it - I failed to document that at
> the time, or perhaps I just didn't realise).

Yes, maybe backlink checks should be moved to nexthop_calculation for impl. reasons.

>
> Further, the SPF code /really/ needs unit-tests. It's clear this code is
> hard to understand - I think not just because of any problems in the
> structure of the code (which may be my fault), but just cause there are a
> lot of details to OSPF SPF. I would strongly suggest unit-tests are
> written for this code, /before/ any significant further changes are made
> to the SPF code in the repo.
>
> We really need a way to feed an LSDB into the SPF code and see the result.
> Which means we need an easy way to construct LSDBs and/or dump LSDBs from
> running networks.
>
> Also, I have to say, your extreme confidence in your abilities and
> scepticism in others causes people to have to waste a lot of time arguing
> with you.

Sorry about that, I get a bit carried away sometimes and it seldom goes this far.
I think you are an exception because in the past I felt many times I never got
through to you or just ignored.

_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


joakim.tjernlund at transmode

Aug 3, 2012, 10:20 AM

Post #22 of 40 (603 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

Paul Jakma <paul [at] jakma> wrote on 2012/08/03 17:28:51:
>
> On Fri, 3 Aug 2012, Paul Jakma wrote:
>
> > Actually, I know why we have to do the W->V check at this stage, because W
> > can be connected by both an invalid V->W and other valid, non-root-parent
> > ones. SPF next will filter out exploration of Ws via the invalid links, but
> > when it finds Ws via valid ones, the nexthop calculation will still run
> > across the invalid ones when trying to find the correct one. I think that's
> > why this is so.
>
> Ah, I wrote the above after the below:
>
> > To really make this code follow the spirit of the RFC, nexthop-calculation
> > shouldn't have to worry /at all/ about back-links not being found. The
> > higher-level spf_next function should have filtered those out.
>
> So I think it's just because spf_next has lost the backlink context.
> Perhaps it might be possible to:
>
> - extend ospf_lsa_has_link take the link from V->W under consideration
> as an argument
> - return the backlink that matches that link, in a form suitable
> to be passed on to ospf_nexthop_calculation, so that it doesn't have to
> replicate that.
>
> I think it could be a lot of work, if it's possible. Unfortunately, it
> could also be that the apparently simple language of the RFC simply hides
> a lot of detail that's just hard to consolidate. I remember I tried to do
> at the time, but couldn't, at least.

I think to it its hard to do too, it might be better/easier to just defer
to check to nexthop_calc because I don't think it is possible to find the exact
link in all cases so better to let nexthop_calc do it instead and possibly cheat when one
cannot find the exact link. Possibly one could have different return values:
0 - no link/nexthop fond.
1 - ANY link and nexthop found
2 - correct backlink and nexthop found.


Jocke

_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


paul at jakma

Aug 3, 2012, 1:23 PM

Post #23 of 40 (597 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

On Fri, 3 Aug 2012, Joakim Tjernlund wrote:

> I think to it its hard to do too, it might be better/easier to just
> defer to check to nexthop_calc because I don't think it is possible to
> find the exact link in all cases so better to let nexthop_calc do it
> instead

Yeah, I think that's the conclusion I made that time, and I was forced
into having nexthop_calc also check. It's easy for the RFC to say:

"If a vertex doesn't have a backlink, ignore it from consideration"

and then for its nexthop-calculation to assume there must be a backlink.

What the RFC doesn't talk about it is that this means a vertex may have
have 2 or more edges to it. One invalid - which was ignored in the
higher-level vertex exploration - and another valid. And nexthop-calc has
to be careful not to get led astray by the invalid one.

These invalid (in the non-bidirectional sense) edges occur while OSPF is
converging.

> and possibly cheat when one cannot find the exact link. Possibly
> one could have different return values:

> 0 - no link/nexthop fond.
> 1 - ANY link and nexthop found
> 2 - correct backlink and nexthop found.

Maybe. If it can be cleaned up and made easier to grok / structurally
cleaner, that'd be great.

regards,
--
Paul Jakma paul [at] jakma @pjakma Key ID: 64A2FF6A
Fortune:
Humor in the Court:
Q. And who is this person you are speaking of?
A. My ex-widow said it.
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


paul at jakma

Aug 3, 2012, 1:43 PM

Post #24 of 40 (596 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

On Fri, 3 Aug 2012, Joakim Tjernlund wrote:

> No, now I understand what you were trying to do and why. This was not
> at all clear from reading bug 330 and later your blog so it looked like
> a bug and from your replies I didn't think you understood what I meant.

Yeah, there were regressions in the bug #330 changes.

All of those commits are *years* old though. ;)

> Not quite, the context was bug 330. I was trying to explain that one
> could not trust the outcome of bug 330 as you yourself found a bug later
> and that there might be more.

Sure, but sending in revert patches to a new maintainer, to old code that
is clearly tricky (cause it's badly structured and/or just cause the
functionality is complex, subtle, and/or fully of details and edge-cases
in the abstract) - and is documented to be tricky - is, I'd suggest, not
the proper way forward. :)

> Sorry about that, I get a bit carried away sometimes and it seldom goes
> this far. I think you are an exception because in the past I felt many
> times I never got through to you or just ignored.

We seem to both be have a tendency to be stubborn and argumentative. ;)

It's really good that you're sceptical. It's a good trait. I just wish
you'd also apply that scepticism to yourself. ;) We all have blind-spots
in our understanding, and we need to "use" other people as tools to help
show us where those are. By explaining our own understanding we best allow
others to highlight those blind spots to us.

In 2009, I think (?), we had a few weeks of heavy arguing, about your
unnumbered PtP patches, and to be honest I may have given up on trying to
merge your stuff after that. I think we'd both gotten each other's gruff
up by that stage, and we didn't seem to be able to have productive
discussions.

regards,
--
Paul Jakma paul [at] jakma @pjakma Key ID: 64A2FF6A
Fortune:
Be regular and orderly in your life, so that you may be violent
and original in your work.
-- Flaubert
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


paul at jakma

Aug 3, 2012, 2:10 PM

Post #25 of 40 (599 views)
Permalink
Re: [PATCH RFC] OSPF vertices memory exhaustion [In reply to]

On Fri, 3 Aug 2012, Paul Jakma wrote:

> merge your stuff after that. I think we'd both gotten each other's gruff
> up by that stage, and we didn't seem to be able to have productive
> discussions.

Oh.. Hopefully though, we can do better in the future (not that I can be
very active for the foreseeable future anyway), and also things will go
more smoothly with David. ;)

regards,
--
Paul Jakma paul [at] jakma @pjakma Key ID: 64A2FF6A
Fortune:
That function is not currently supported, but Bill Gates assures us it will be featured in the next upgrade.
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev

First page Previous page 1 2 Next page Last page  View All Quagga dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.