Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Quagga: Dev

Re: [PATCH] ospfd: avoid exhausting memory with OSPF vertices (BZ#476)

 

 

Quagga dev RSS feed   Index | Next | Previous | View Threaded


joakim.tjernlund at transmode

Jul 31, 2012, 2:31 PM

Post #1 of 8 (827 views)
Permalink
Re: [PATCH] ospfd: avoid exhausting memory with OSPF vertices (BZ#476)

>
> On Jul 25, 2012, at 10:44 AM, Paul Jakma wrote:
>
> > On Wed, 25 Jul 2012, David Lamparter wrote:
> >
> >> This was found in scale testing at OSR; ospfd is adding the same link
> >> over and over again to the SPF tree. This fix prevents the resulting
> >> memory corruption from happening and adds a debug message to track
> >> occurence of this issue and/or confirm a proper fix.
> >
> > It'd be really good to document a topology that causes this "add same link forever" problem to occur. I.e. in terms of the router and network LSA content. Ideally, to find a very simple topology that triggers this.
>
> The topology used in testing was an NxN grid of routers. For example, a 4x4 grid:
>
> r1
> /\
> / \
> r2 r3
> /\ /\
> / \/ \
> r4 r5 r6
> /\ /\ /\
> / \/ \/ \
> r7 r8 r9 r10
> \ /\ /\ /
> \/ \/ \/
> r11 r12 r13
> \ /\ /
> \/ \/
> r14 r15
> \ /
> \/
> r16
>
> So from root r1, r4 gets nh thru r2, r5 gets nhs thru r2/3, and r8 inherits nhs from r4 and r5, so it get 3 nhs, two of which are duplicates. The further you get from root, the more duplication. For this small grid, the duplicates aren't a big deal. However, the duplicate nhs grow
combinatorially, so for a larger N, say 13, you have a very large number indeed. Enough to exhaust memory in our testing. The patch removes the duplicates as SPF works thru the graph, so r16 above would have just the 2 nhs, not the 20 nhs without the patch.

It would be interesting to rerun this test with latest Quagga as the SPF code got some changes.

Jocke

_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


sfeldma at cumulusnetworks

Jul 31, 2012, 3:15 PM

Post #2 of 8 (802 views)
Permalink
Re: [PATCH] ospfd: avoid exhausting memory with OSPF vertices (BZ#476) [In reply to]

On Jul 31, 2012, at 2:31 PM, Joakim Tjernlund wrote:

>> The topology used in testing was an NxN grid of routers. For example, a 4x4 grid:
>>
>> r1
>> /\
>> / \
>> r2 r3
>> /\ /\
>> / \/ \
>> r4 r5 r6
>> /\ /\ /\
>> / \/ \/ \
>> r7 r8 r9 r10
>> \ /\ /\ /
>> \/ \/ \/
>> r11 r12 r13
>> \ /\ /
>> \/ \/
>> r14 r15
>> \ /
>> \/
>> r16
>>
>> So from root r1, r4 gets nh thru r2, r5 gets nhs thru r2/3, and r8 inherits nhs from r4 and r5, so it get 3 nhs, two of which are duplicates. The further you get from root, the more duplication. For this small grid, the duplicates aren't a big deal. However, the duplicate nhs grow
>> combinatorially, so for a larger N, say 13, you have a very large number indeed. Enough to exhaust memory in our testing. The patch removes the duplicates as SPF works thru the graph, so r16 above would have just the 2 nhs, not the 20 nhs without the patch.
>>
> It would be interesting to rerun this test with latest Quagga as the SPF code got some changes.

I re-tested 13x13 with the above fix on top of your 4 SPF patches, and test PASSES.

-scott
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


joakim.tjernlund at transmode

Aug 1, 2012, 2:41 AM

Post #3 of 8 (791 views)
Permalink
Re: [PATCH] ospfd: avoid exhausting memory with OSPF vertices (BZ#476) [In reply to]

Scott Feldman <sfeldma [at] cumulusnetworks> wrote on 2012/08/01 00:15:24:
>
> On Jul 31, 2012, at 2:31 PM, Joakim Tjernlund wrote:
>
> >> The topology used in testing was an NxN grid of routers. For example, a 4x4 grid:
> >>
> >> r1
> >> /\
> >> / \
> >> r2 r3
> >> /\ /\
> >> / \/ \
> >> r4 r5 r6
> >> /\ /\ /\
> >> / \/ \/ \
> >> r7 r8 r9 r10
> >> \ /\ /\ /
> >> \/ \/ \/
> >> r11 r12 r13
> >> \ /\ /
> >> \/ \/
> >> r14 r15
> >> \ /
> >> \/
> >> r16
> >>
> >> So from root r1, r4 gets nh thru r2, r5 gets nhs thru r2/3, and r8 inherits nhs from r4 and r5, so it get 3 nhs, two of which are duplicates. The further you get from root, the more duplication. For this small grid, the duplicates aren't a big deal. However, the duplicate nhs grow
> >> combinatorially, so for a larger N, say 13, you have a very large number indeed. Enough to exhaust memory in our testing. The patch removes the duplicates as SPF works thru the graph, so r16 above would have just the 2 nhs, not the 20 nhs without the patch.
> >>
> > It would be interesting to rerun this test with latest Quagga as the SPF code got some changes.
>
> I re-tested 13x13 with the above fix on top of your 4 SPF patches, and test PASSES.

Nice! If you got the time, could you back out the
ospfd: Do not fall back to intervening router
and see if that makes the error come back?

Jocke

_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


sfeldma at cumulusnetworks

Aug 1, 2012, 11:06 AM

Post #4 of 8 (797 views)
Permalink
Re: [PATCH] ospfd: avoid exhausting memory with OSPF vertices (BZ#476) [In reply to]

On Aug 1, 2012, at 2:41 AM, Joakim Tjernlund wrote:

> Scott Feldman <sfeldma [at] cumulusnetworks> wrote on 2012/08/01 00:15:24:
>>
>> On Jul 31, 2012, at 2:31 PM, Joakim Tjernlund wrote:
>>
>>>> The topology used in testing was an NxN grid of routers. For example, a 4x4 grid:
>>>>
>>>> r1
>>>> /\
>>>> / \
>>>> r2 r3
>>>> /\ /\
>>>> / \/ \
>>>> r4 r5 r6
>>>> /\ /\ /\
>>>> / \/ \/ \
>>>> r7 r8 r9 r10
>>>> \ /\ /\ /
>>>> \/ \/ \/
>>>> r11 r12 r13
>>>> \ /\ /
>>>> \/ \/
>>>> r14 r15
>>>> \ /
>>>> \/
>>>> r16
>>>>
>>>> So from root r1, r4 gets nh thru r2, r5 gets nhs thru r2/3, and r8 inherits nhs from r4 and r5, so it get 3 nhs, two of which are duplicates. The further you get from root, the more duplication. For this small grid, the duplicates aren't a big deal. However, the duplicate nhs grow
>>>> combinatorially, so for a larger N, say 13, you have a very large number indeed. Enough to exhaust memory in our testing. The patch removes the duplicates as SPF works thru the graph, so r16 above would have just the 2 nhs, not the 20 nhs without the patch.
>>>>
>>> It would be interesting to rerun this test with latest Quagga as the SPF code got some changes.
>>
>> I re-tested 13x13 with the above fix on top of your 4 SPF patches, and test PASSES.
>
> Nice! If you got the time, could you back out the
> ospfd: Do not fall back to intervening router
> and see if that makes the error come back?

I'm not quite sure what you're asking, but I did try same 13x13 test without the above fix and with your 4 SPF patches, and the test FAILS, as before.

-scott


_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


joakim.tjernlund at transmode

Aug 1, 2012, 1:57 PM

Post #5 of 8 (787 views)
Permalink
Re: [PATCH] ospfd: avoid exhausting memory with OSPF vertices (BZ#476) [In reply to]

Scott Feldman <sfeldma [at] cumulusnetworks> wrote on 2012/08/01 20:06:29:
>
>
> On Aug 1, 2012, at 2:41 AM, Joakim Tjernlund wrote:
>
> > Scott Feldman <sfeldma [at] cumulusnetworks> wrote on 2012/08/01 00:15:24:
> >>
> >> On Jul 31, 2012, at 2:31 PM, Joakim Tjernlund wrote:
> >>
> >>>> The topology used in testing was an NxN grid of routers. For example, a 4x4 grid:
> >>>>
> >>>> r1
> >>>> /\
> >>>> / \
> >>>> r2 r3
> >>>> /\ /\
> >>>> / \/ \
> >>>> r4 r5 r6
> >>>> /\ /\ /\
> >>>> / \/ \/ \
> >>>> r7 r8 r9 r10
> >>>> \ /\ /\ /
> >>>> \/ \/ \/
> >>>> r11 r12 r13
> >>>> \ /\ /
> >>>> \/ \/
> >>>> r14 r15
> >>>> \ /
> >>>> \/
> >>>> r16
> >>>>
> >>>> So from root r1, r4 gets nh thru r2, r5 gets nhs thru r2/3, and r8 inherits nhs from r4 and r5, so it get 3 nhs, two of which are duplicates. The further you get from root, the more duplication. For this small grid, the duplicates aren't a big deal. However, the duplicate nhs grow
> >>>> combinatorially, so for a larger N, say 13, you have a very large number indeed. Enough to exhaust memory in our testing. The patch removes the duplicates as SPF works thru the graph, so r16 above would have just the 2 nhs, not the 20 nhs without the patch.
> >>>>
> >>> It would be interesting to rerun this test with latest Quagga as the SPF code got some changes.
> >>
> >> I re-tested 13x13 with the above fix on top of your 4 SPF patches, and test PASSES.
> >
> > Nice! If you got the time, could you back out the
> > ospfd: Do not fall back to intervening router
> > and see if that makes the error come back?
>
> I'm not quite sure what you're asking, but I did try same 13x13 test without the above fix and with your 4 SPF patches, and the test FAILS, as before.

hmm, there seems to be a misunderstanding here. Lets start over.
Start with current quagga tree + your own get_next_link() in nexthop_calculation.

Then remove the patch
ospfd: avoid exhausting memory with OSPF vertices (BZ#476)
Run test and if that passes, remove the patch
ospfd: Do not fall back to intervening router
and tell if that passes or fails.

Jocke

_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


sfeldma at cumulusnetworks

Aug 1, 2012, 2:03 PM

Post #6 of 8 (794 views)
Permalink
Re: [PATCH] ospfd: avoid exhausting memory with OSPF vertices (BZ#476) [In reply to]

On Aug 1, 2012, at 1:57 PM, Joakim Tjernlund wrote:

> Scott Feldman <sfeldma [at] cumulusnetworks> wrote on 2012/08/01 20:06:29:
>>
>>
>> On Aug 1, 2012, at 2:41 AM, Joakim Tjernlund wrote:
>>
>>> Scott Feldman <sfeldma [at] cumulusnetworks> wrote on 2012/08/01 00:15:24:
>>>>
>>>> On Jul 31, 2012, at 2:31 PM, Joakim Tjernlund wrote:
>>>>
>>>>>> The topology used in testing was an NxN grid of routers. For example, a 4x4 grid:
>>>>>>
>>>>>> r1
>>>>>> /\
>>>>>> / \
>>>>>> r2 r3
>>>>>> /\ /\
>>>>>> / \/ \
>>>>>> r4 r5 r6
>>>>>> /\ /\ /\
>>>>>> / \/ \/ \
>>>>>> r7 r8 r9 r10
>>>>>> \ /\ /\ /
>>>>>> \/ \/ \/
>>>>>> r11 r12 r13
>>>>>> \ /\ /
>>>>>> \/ \/
>>>>>> r14 r15
>>>>>> \ /
>>>>>> \/
>>>>>> r16
>>>>>>
>>>>>> So from root r1, r4 gets nh thru r2, r5 gets nhs thru r2/3, and r8 inherits nhs from r4 and r5, so it get 3 nhs, two of which are duplicates. The further you get from root, the more duplication. For this small grid, the duplicates aren't a big deal. However, the duplicate nhs grow
>>>>>> combinatorially, so for a larger N, say 13, you have a very large number indeed. Enough to exhaust memory in our testing. The patch removes the duplicates as SPF works thru the graph, so r16 above would have just the 2 nhs, not the 20 nhs without the patch.
>>>>>>
>>>>> It would be interesting to rerun this test with latest Quagga as the SPF code got some changes.
>>>>
>>>> I re-tested 13x13 with the above fix on top of your 4 SPF patches, and test PASSES.
>>>
>>> Nice! If you got the time, could you back out the
>>> ospfd: Do not fall back to intervening router
>>> and see if that makes the error come back?
>>
>> I'm not quite sure what you're asking, but I did try same 13x13 test without the above fix and with your 4 SPF patches, and the test FAILS, as before.
>
> hmm, there seems to be a misunderstanding here. Lets start over.
> Start with current quagga tree + your own get_next_link() in nexthop_calculation.
>
> Then remove the patch
> ospfd: avoid exhausting memory with OSPF vertices (BZ#476)

The test fails with that setup.

> Run test and if that passes, remove the patch
> ospfd: Do not fall back to intervening router
> and tell if that passes or fails.


_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


joakim.tjernlund at transmode

Aug 1, 2012, 2:09 PM

Post #7 of 8 (792 views)
Permalink
Re: [PATCH] ospfd: avoid exhausting memory with OSPF vertices (BZ#476) [In reply to]

Scott Feldman <sfeldma [at] cumulusnetworks> wrote on 2012/08/01 23:03:33:

>
>
> On Aug 1, 2012, at 1:57 PM, Joakim Tjernlund wrote:
>
> > Scott Feldman <sfeldma [at] cumulusnetworks> wrote on 2012/08/01 20:06:29:
> >>
> >>
> >> On Aug 1, 2012, at 2:41 AM, Joakim Tjernlund wrote:
> >>
> >>> Scott Feldman <sfeldma [at] cumulusnetworks> wrote on 2012/08/01 00:15:24:
> >>>>
> >>>> On Jul 31, 2012, at 2:31 PM, Joakim Tjernlund wrote:
> >>>>
> >>>>>> The topology used in testing was an NxN grid of routers. For example, a 4x4 grid:
> >>>>>>
> >>>>>> r1
> >>>>>> /\
> >>>>>> / \
> >>>>>> r2 r3
> >>>>>> /\ /\
> >>>>>> / \/ \
> >>>>>> r4 r5 r6
> >>>>>> /\ /\ /\
> >>>>>> / \/ \/ \
> >>>>>> r7 r8 r9 r10
> >>>>>> \ /\ /\ /
> >>>>>> \/ \/ \/
> >>>>>> r11 r12 r13
> >>>>>> \ /\ /
> >>>>>> \/ \/
> >>>>>> r14 r15
> >>>>>> \ /
> >>>>>> \/
> >>>>>> r16
> >>>>>>
> >>>>>> So from root r1, r4 gets nh thru r2, r5 gets nhs thru r2/3, and r8 inherits nhs from r4 and r5, so it get 3 nhs, two of which are duplicates. The further you get from root, the more duplication. For this small grid, the duplicates aren't a big deal. However, the duplicate nhs grow
> >>>>>> combinatorially, so for a larger N, say 13, you have a very large number indeed. Enough to exhaust memory in our testing. The patch removes the duplicates as SPF works thru the graph, so r16 above would have just the 2 nhs, not the 20 nhs without the patch.
> >>>>>>
> >>>>> It would be interesting to rerun this test with latest Quagga as the SPF code got some changes.
> >>>>
> >>>> I re-tested 13x13 with the above fix on top of your 4 SPF patches, and test PASSES.
> >>>
> >>> Nice! If you got the time, could you back out the
> >>> ospfd: Do not fall back to intervening router
> >>> and see if that makes the error come back?
> >>
> >> I'm not quite sure what you're asking, but I did try same 13x13 test without the above fix and with your 4 SPF patches, and the test FAILS, as before.
> >
> > hmm, there seems to be a misunderstanding here. Lets start over.
> > Start with current quagga tree + your own get_next_link() in nexthop_calculation.
> >
> > Then remove the patch
> > ospfd: avoid exhausting memory with OSPF vertices (BZ#476)
>
> The test fails with that setup.

OK, thanks for confirming that. The
ospfd: avoid exhausting memory with OSPF vertices (BZ#476)
really feels like a workaround for another bug, yet to be found.

Jocke

_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev


sfeldma at cumulusnetworks

Aug 1, 2012, 2:35 PM

Post #8 of 8 (789 views)
Permalink
Re: [PATCH] ospfd: avoid exhausting memory with OSPF vertices (BZ#476) [In reply to]

On Aug 1, 2012, at 2:09 PM, Joakim Tjernlund wrote:

> Scott Feldman <sfeldma [at] cumulusnetworks> wrote on 2012/08/01 23:03:33:
>
>>
>>
>> On Aug 1, 2012, at 1:57 PM, Joakim Tjernlund wrote:
>>
>>> Scott Feldman <sfeldma [at] cumulusnetworks> wrote on 2012/08/01 20:06:29:
>>>>
>>>>
>>>> On Aug 1, 2012, at 2:41 AM, Joakim Tjernlund wrote:
>>>>
>>>>> Scott Feldman <sfeldma [at] cumulusnetworks> wrote on 2012/08/01 00:15:24:
>>>>>>
>>>>>> On Jul 31, 2012, at 2:31 PM, Joakim Tjernlund wrote:
>>>>>>
>>>>>>>> The topology used in testing was an NxN grid of routers. For example, a 4x4 grid:
>>>>>>>>
>>>>>>>> r1
>>>>>>>> /\
>>>>>>>> / \
>>>>>>>> r2 r3
>>>>>>>> /\ /\
>>>>>>>> / \/ \
>>>>>>>> r4 r5 r6
>>>>>>>> /\ /\ /\
>>>>>>>> / \/ \/ \
>>>>>>>> r7 r8 r9 r10
>>>>>>>> \ /\ /\ /
>>>>>>>> \/ \/ \/
>>>>>>>> r11 r12 r13
>>>>>>>> \ /\ /
>>>>>>>> \/ \/
>>>>>>>> r14 r15
>>>>>>>> \ /
>>>>>>>> \/
>>>>>>>> r16
>>>>>>>>
>>>>>>>> So from root r1, r4 gets nh thru r2, r5 gets nhs thru r2/3, and r8 inherits nhs from r4 and r5, so it get 3 nhs, two of which are duplicates. The further you get from root, the more duplication. For this small grid, the duplicates aren't a big deal. However, the duplicate nhs grow
>>>>>>>> combinatorially, so for a larger N, say 13, you have a very large number indeed. Enough to exhaust memory in our testing. The patch removes the duplicates as SPF works thru the graph, so r16 above would have just the 2 nhs, not the 20 nhs without the patch.
>>>>>>>>
>>>>>>> It would be interesting to rerun this test with latest Quagga as the SPF code got some changes.
>>>>>>
>>>>>> I re-tested 13x13 with the above fix on top of your 4 SPF patches, and test PASSES.
>>>>>
>>>>> Nice! If you got the time, could you back out the
>>>>> ospfd: Do not fall back to intervening router
>>>>> and see if that makes the error come back?
>>>>
>>>> I'm not quite sure what you're asking, but I did try same 13x13 test without the above fix and with your 4 SPF patches, and the test FAILS, as before.
>>>
>>> hmm, there seems to be a misunderstanding here. Lets start over.
>>> Start with current quagga tree + your own get_next_link() in nexthop_calculation.
>>>
>>> Then remove the patch
>>> ospfd: avoid exhausting memory with OSPF vertices (BZ#476)
>>
>> The test fails with that setup.
>
> OK, thanks for confirming that. The
> ospfd: avoid exhausting memory with OSPF vertices (BZ#476)
> really feels like a workaround for another bug, yet to be found.

Not a workaround. ospf_spf_add_parent will happily add duplicate parent references with same nexthop. Each reference takes a little bit of memory (vertex_parent_new). Not a big deal generally, but with a topo like the "pachinko machine" above where paths fan-in and fan-out, the duplication compounds at a crazy rate. So the patch for BZ#476 just prunes the duplicates as SPF proceeds. I don't see any bug with SPF.

-scott
_______________________________________________
Quagga-dev mailing list
Quagga-dev [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-dev

Quagga dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.