
joakim.tjernlund at transmode
Aug 3, 2012, 9:19 AM
Post #21 of 40
(601 views)
Permalink
|
|
Re: [PATCH RFC] OSPF vertices memory exhaustion
[In reply to]
|
|
Paul Jakma <paul [at] jakma> wrote on 2012/08/03 17:21:01: > > On Fri, 3 Aug 2012, Joakim Tjernlund wrote: > > > Well, we are not following it either, this nexthop_calculation tries to > > follow the it exactly, even quoting text from the RFC > > Sigh. It doesn't deviate from the RFC. > > The RFC says: > > If there is an intervening router: > <inherit> > Else if parent vertex is the root: > <do the appropriate nexthop calculation> > Else the parent vertex is a network that > directly connects the calculating router to the destination > router: > <do the appropriate nexthop calculation> > > Our code does: > > If parent is the root: > If the W vertex is a router: > <do the appropriate nexthop calculation> > Else if W is a network: > <do the appropriate nexthop calculation> > return > Else if the W vertex is a network: > For every parent of W, if it's a root: > <do the appropriate nexthop calculation> > return. > > /* intervening case */ > for every parent of W > <inherit> > > The code differs significantly from the RFC in the flow of its logic. To > say it differs from the RFC is in the strict sense correct. I am sure a > lot of code doesn't follow the logic flow of RFCs strictly - there is a > reason why human-language RFCs can take a lot of effort to turn into > executable machine-language. Submitting a patch to revert a change because > it doesn't follow the RFC is non-sensical, when the code at no stages > matches the RFC, before or after any of these changes. > > However, it seems to stick to the spirit. > > Bug #330 argued that <do the appropriate nexthop calculation> could fail > to find a suitable nexthop - note that this is a very _*significant*_ > difference from the logic in the RFC (the text you keep pointing out does > not consider failure of that process is possible). The problem being then > that it would return out of nexthop-calculation from the for-loop - having > added no next-hop. > > However, there might have been other parents connecting W to the root, > which we could explore. Failing that, there may be intervening parent > paths we can take. Thus, that return needed to be made conditional on > whether nexthop-calc worked: > > If parent is the root: > If the W vertex is a router: > <do the appropriate nexthop calculation> > Else if W is a network: > <do the appropriate nexthop calculation> > return > Else if the W vertex is a network: > For every parent of W, if it's a root: > if <do the appropriate nexthop calculation> succeeds > added = true > If added is true > return > > /* intervening case */ > for every parent of W > <inherit> > > Note again, the RFC assumes that the "appropriate nexthop-calc" part > (which involves finding the link back) can not fail. Again, the RFC is not > code, and even if it were, it is not our code. > > The RFC goes into the last "Else if the W vertex is a network and it has a > parent that's root" branch assuming it must work out (cause SPF > exploration should have checked that and filtered out). So of course, in > the RFC, there is no provision for leaving that branch. > > In our code, that's not true. We can go into that branch and find a W with > a parent to root that isn't a valid V-W vertex, so we shouldn't consider > it at all. We need to continue. > > Actually, I know why we have to do the W->V check at this stage, because W > can be connected by both an invalid V->W and other valid, non-root-parent > ones. SPF next will filter out exploration of Ws via the invalid links, > but when it finds Ws via valid ones, the nexthop calculation will still > run across the invalid ones when trying to find the correct one. I think > that's why this is so. Ah, now I get what you are doing. > > > I did read it all and I wrote in the patch why thought there was a bug > > still. It is already clear that the intention and the actual fix from > > bug 330 is very different, the fix completely botched SPF. > > You say that, but you havn't shown this at all. Assertions don't become > truth through repetition. You may be right, but here you need to actually > persuade me, with an argument that's a bit more specific on details and > reasoning. I was, referring to the changes in bug 330 and there was a bug in there. You corrected it your self later and documented it in "The difference a line makes" > > If it comes down to the 2 of us just saying "I'm right", well then I'm > going to take my side (for a number of reasons, arbitrary and not so > arbitrary). No, now I understand what you were trying to do and why. This was not at all clear from reading bug 330 and later your blog so it looked like a bug and from your replies I didn't think you understood what I meant. > > > Bug 330 is has a lot of stuff, to much to make proper sense. > > It has this this hack which also solves the problem for the reporter. > > - pqueue_enqueue (w, candidate); > > + if(w->parents->count != 0) > > + pqueue_enqueue (w, candidate); > > Then we have your solution which we know is broken! > > For a value of "we" that equals "Joakim". Not quite, the context was bug 330. I was trying to explain that one could not trust the outcome of bug 330 as you yourself found a bug later and that there might be more. > > > I explained in the patch > > Assertions are not explanations. The top part was an assertion, meant to go in the commit msg if the patch was accepted. The part below --- was me trying to explain how I reasoned, somewhat terse though. > > I shall revert your revert. The previous code was tested for both the > failing case (which in bug #330 was due to differing routers not having > converged - it optimises that case alone, eliminating a temporary > blackhole, and had no effect when converged). Please do. > > To really make this code follow the spirit of the RFC, nexthop-calculation > shouldn't have to worry /at all/ about back-links not being found. The > higher-level spf_next function should have filtered those out. Precisely, I just got to that conclusion too. > > Patches to SPF on this topic should try address that (or explain why > nexthop-calc has to be concerned with it - I failed to document that at > the time, or perhaps I just didn't realise). Yes, maybe backlink checks should be moved to nexthop_calculation for impl. reasons. > > Further, the SPF code /really/ needs unit-tests. It's clear this code is > hard to understand - I think not just because of any problems in the > structure of the code (which may be my fault), but just cause there are a > lot of details to OSPF SPF. I would strongly suggest unit-tests are > written for this code, /before/ any significant further changes are made > to the SPF code in the repo. > > We really need a way to feed an LSDB into the SPF code and see the result. > Which means we need an easy way to construct LSDBs and/or dump LSDBs from > running networks. > > Also, I have to say, your extreme confidence in your abilities and > scepticism in others causes people to have to waste a lot of time arguing > with you. Sorry about that, I get a bit carried away sometimes and it seldom goes this far. I think you are an exception because in the past I felt many times I never got through to you or just ignored. _______________________________________________ Quagga-dev mailing list Quagga-dev [at] lists http://lists.quagga.net/mailman/listinfo/quagga-dev
|