Mailing List Archive: Quagga: Users

bgp stuck in clearing state


mike at sentex

Nov 20, 2006, 5:24 PM

Post #1 of 16
bgp stuck in clearing state

I was trying out the latest snapshot from the cvs with a simple config


RELENG_5-bgpd# show run

Current configuration:
!
hostname RELENG_5-bgpd
password test
!
router bgp 2
bgp router-id 192.168.43.34
network 99.99.99.99/32
redistribute static
neighbor 192.168.44.223 remote-as 1
neighbor 192.168.44.223 update-source 192.168.44.2
neighbor 192.168.44.223 soft-reconfiguration inbound
neighbor 192.168.44.223 prefix-list ROUTE-IN in
neighbor 192.168.44.223 prefix-list ROUTE-OUT out
!
ip prefix-list ROUTE-OUT seq 10 permit any
!
line vty
!
end
RELENG_5-bgpd#

When the other side clears the BGP session, the peer occasionally gets
stuck in the Clearing state. Even if I shut the neighbor and then
unshut it (as in the session below), it goes back to the "Clearing" state.



RELENG_5-bgpd# sh ip bgp sum
BGP router identifier 192.168.43.34, local AS number 2
RIB entries 388615, using 24 MiB of memory
Peers 1, using 2508 bytes of memory

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.44.223  4     1    5200    5157        0    0    1 00:12:30 Clearing

Total number of neighbors 1
RELENG_5-bgpd# show log
Syslog logging: disabled
Stdout logging: disabled
Monitor logging: level debugging
File logging: disabled
Protocol name: BGP
Record priority: disabled
RELENG_5-bgpd# conf t
RELENG_5-bgpd(config)# router bgp 1
BGP is already running; AS is 2
RELENG_5-bgpd(config)# router bgp 2
RELENG_5-bgpd(config-router)# nei 192.168.44.223 shut
RELENG_5-bgpd(config-router)#
RELENG_5-bgpd# wr
Configuration saved to /usr/local/etc/quagga/bgpd.conf
RELENG_5-bgpd# sh ip bgp sum
BGP router identifier 192.168.43.34, local AS number 2
RIB entries 388615, using 24 MiB of memory
Peers 1, using 2508 bytes of memory

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.44.223  4     1    5200    5157        0    0    1 00:12:51 Idle (Admin)

Total number of neighbors 1
RELENG_5-bgpd# conf t
RELENG_5-bgpd(config)# router bgp 2
RELENG_5-bgpd(config-router)# no neighbor 192.168.44.223 shutdown
RELENG_5-bgpd(config-router)#
RELENG_5-bgpd# wr
Configuration saved to /usr/local/etc/quagga/bgpd.conf
RELENG_5-bgpd# sh ip bgp sum
BGP router identifier 192.168.43.34, local AS number 2
RIB entries 388615, using 24 MiB of memory
Peers 1, using 2508 bytes of memory

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.44.223  4     1    5200    5157        0    0    1 00:17:39 Clearing

Total number of neighbors 1
RELENG_5-bgpd#

RELENG_5-bgpd# sh work
      List  (ms)    Q. Runs        Cycle Counts
P    Items  Hold      Total    Best   Gran.    Avg.  Name
         0    50      11498   35910   11972   23608  process_main_queue
         0    50          0       0       1       0  process_rsclient_queue
         0    10        198   33176    8320   18645  clear 192.168.44.223


---Mike


--------------------------------------------------------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike [at] sentex
Providing Internet since 1994 www.sentex.net
Cambridge, Ontario Canada www.sentex.net/mike

_______________________________________________
Quagga-users mailing list
Quagga-users [at] lists
http://lists.quagga.net/mailman/listinfo/quagga-users


mike at sentex

Nov 21, 2006, 5:05 AM

Post #2 of 16
Re: bgp stuck in clearing state

At 08:24 PM 11/20/2006, Mike Tancsa wrote:
>I was trying out the latest snapshot from the cvs with a simple config

This is pretty easy to reproduce. On one of the peers, if I make it
busy forwarding a lot of packets/s I can trigger the broken / stuck
state. Is there any way to work around this ? Any other debugging
info I can provide ?

---Mike

j.kammer at eurodata

Nov 21, 2006, 6:09 AM

Post #3 of 16
Re: bgp stuck in clearing state

On Tue, Nov 21, 2006 at 08:05:44AM -0500, Mike Tancsa wrote:
> At 08:24 PM 11/20/2006, Mike Tancsa wrote:
> >I was trying out the latest snapshot from the cvs with a simple config
>
> This is pretty easy to reproduce. On one of the peers, if I make it
> busy forwarding a lot of packets/s I can trigger the broken / stuck
> state. Is there any way to work around this ? Any other debugging
> info I can provide ?
>
> ---Mike
>

Turn on debugging of fsm, look whether you get something like

<date....> BGP: <ip> [FSM] Timer (holdtime timer expire)

to check whether it is an expiry of a hold timer.
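
If file or syslog logging is enabled, a short script can do the looking. The following is only a sketch: it assumes log lines of the shape shown above (peer address, then the [FSM] tag), the function name is made up, and the two sample lines are invented for illustration rather than taken from a real log.

```python
import re

# Matches the hold-timer message quoted above. Whatever precedes the peer
# address (date, "BGP:", a syslog prefix) is deliberately not parsed.
HOLD_EXPIRE = re.compile(
    r"(?P<peer>\d{1,3}(?:\.\d{1,3}){3}) \[FSM\] Timer \(holdtime timer expire\)"
)


def peers_with_hold_expiry(lines):
    """Return {peer_ip: count} of hold-timer expiry messages (hypothetical helper)."""
    counts = {}
    for line in lines:
        match = HOLD_EXPIRE.search(line)
        if match:
            peer = match.group("peer")
            counts[peer] = counts.get(peer, 0) + 1
    return counts


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = [
        "2006/11/21 11:00:50 BGP: 192.168.44.223 [FSM] Timer (holdtime timer expire)",
        "2006/11/21 11:00:54 BGP: 192.168.44.223 [Event] BGP connection closed fd 12",
    ]
    print(peers_with_hold_expiry(sample))
```

A peer that shows up in the result lost its session to a hold-timer expiry at least once; an empty result points to some other cause.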

With a whole lot of packets you may lose too many keepalive packets,
causing the hold timer to expire, which then tears down the session.

Handling of an expired hold timer has a bug, see

http://bugzilla.quagga.net/show_bug.cgi?id=302

I added a possible fix on 2006-11-17 11:46 there.


If the fix fixes this (it did for me, but nobody else has said anything),
then the peer will not get stuck in Clearing. So you'll have no problem
any more with the other peer closing the session.


But, having a holdtime expire because of too much traffic is
another problem. You do not want the session to be dropped in that
situation; you want to keep forwarding your traffic! Possible
fixes for that could be:
- send keepalives more often
- increase the holdtime
(either one, or both). But a longer holdtime also increases the time
bgpd takes to notice a real loss of a session, which delays the
re-connect.

So that trade-off won't get you far.

If you really have to keep the traffic rolling, you may have to give
BGP packets precedence over the "bulk traffic" you are shovelling
through, i.e. use quality of service (which is not part of
quagga, but of your underlying OS).
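
The timer trade-off can be put in rough numbers. This is a back-of-the-envelope sketch, not anything bgpd computes: it assumes integer timers in seconds, a peer that sends keepalives exactly on schedule, and zero network delay. The function names are hypothetical. The 180 s hold time is the default mentioned elsewhere in this thread; the 60 s keepalive (one third of the hold time) is an assumption.

```python
def keepalives_before_expiry(hold_time_s: int, keepalive_s: int) -> int:
    """Keepalives scheduled strictly before the hold timer would fire."""
    if hold_time_s <= 0 or keepalive_s <= 0:
        raise ValueError("timers must be positive")
    return (hold_time_s - 1) // keepalive_s


def summarize(hold_time_s: int, keepalive_s: int) -> dict:
    """Idealized view of loss tolerance versus failure detection time."""
    chances = keepalives_before_expiry(hold_time_s, keepalive_s)
    return {
        # The session survives as long as one of these keepalives arrives.
        "keepalives_before_expiry": chances,
        "consecutive_losses_survived": max(chances - 1, 0),
        # A truly dead peer is noticed only when the hold timer fires.
        "worst_case_detection_s": hold_time_s,
    }


if __name__ == "__main__":
    for hold, keepalive in [(180, 60), (180, 30), (360, 60)]:
        print(hold, keepalive, summarize(hold, keepalive))
```

In this idealized model, more frequent keepalives buy loss tolerance without slowing detection, while a longer hold time buys the same tolerance at the cost of noticing a dead peer later.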


Regards,
Juergen.

--
Juergen Kammer j.kammer [at] eurodata

mike at sentex

Nov 21, 2006, 8:05 AM

Post #4 of 16
Re: bgp stuck in clearing state

At 09:09 AM 11/21/2006, Juergen Kammer wrote:
>On Tue, Nov 21, 2006 at 08:05:44AM -0500, Mike Tancsa wrote:
> > At 08:24 PM 11/20/2006, Mike Tancsa wrote:
> > >I was trying out the latest snapshot from the cvs with a simple config
> >
> > This is pretty easy to reproduce. On one of the peers, if I make it
> > busy forwarding a lot of packets/s I can trigger the broken / stuck
> > state. Is there any way to work around this ? Any other debugging
> > info I can provide ?
> >
> > ---Mike
> >
>
>Turn on debugging of fsm, look whether you get something like
>
><date....> BGP: <ip> [FSM] Timer (holdtime timer expire)
>
>to check whether it is an expiry of a hold timer.
>
>With a whole lot of packets you may lose too many keepalive packets,
>causing the hold timer to expire, which then tears down the session.

Hi,
Thanks for the info! In this case, it's all 3 test peers
that can get stuck in the Clearing state. I have one box in the
middle (which I posted the config for) that is doing the high-pps
routing, and 2 test peers that just have a bunch of static routes
defined, which I am then advertising to the central router. Any one of
the three can be made to get stuck in the Clearing state. I added
debugging to syslog and I see the following over and over again:


Nov 21 11:00:54 releng6-865 bgpd[40504]: 192.168.44.223 [Event] BGP connection closed fd 12
Nov 21 11:00:54 releng6-865 bgpd[40504]: 192.168.44.223 [FSM] bgp_ignore called
Nov 21 11:00:54 releng6-865 bgpd[40504]: 192.168.44.223 [Event] BGP connection closed fd 12
Nov 21 11:00:54 releng6-865 bgpd[40504]: 192.168.44.223 [FSM] bgp_ignore called
Nov 21 11:00:54 releng6-865 bgpd[40504]: 192.168.44.223 [Event] BGP connection closed fd 12
Nov 21 11:00:54 releng6-865 bgpd[40504]: 192.168.44.223 [FSM] bgp_ignore called


>Handling of an expired hold timer has a bug, see
>
> http://bugzilla.quagga.net/show_bug.cgi?id=302
>
>I added a possible fix on 2006-11-17 11:46 there.


Super, I will give it a try !



>If the fix fixes this (it did for me, but nobody else has said anything),
>then the peer will not get stuck in Clearing. So you'll have no problem
>any more with the other peer closing the session.
>
>
>But, having a holdtime expire because of too much traffic is
>another problem.

Actually, that problem is fine and I can live with that. There are
only so many pps the box can deal with. The real problem I want to
work around is when the routing software gets stuck to the point
where I can't clear the session, i.e. a clear or a shut/no shut does
not even work.

---Mike


mike at sentex

Nov 21, 2006, 1:50 PM

Post #5 of 16
Re: bgp stuck in clearing state

At 09:09 AM 11/21/2006, Juergen Kammer wrote:


>I added a possible fix on 2006-11-17 11:46 there.
>
>
>If the fix fixes this (it did for me, but nobody else has said anything),
>then the peer will not get stuck in Clearing. So you'll have no problem
>any more with the other peer closing the session.

Hi,
That indeed seems to have fixed it ! Are there any other side
effects with this change, or is it an obvious bug ?


---Mike


j.kammer at eurodata

Nov 21, 2006, 2:40 PM

Post #6 of 16
Re: [SOLVED] bgp stuck in clearing state

On Tue, Nov 21, 2006 at 04:50:32PM -0500, Mike Tancsa wrote:
> At 09:09 AM 11/21/2006, Juergen Kammer wrote:
>
>
> >I added a possible fix on 2006-11-17 11:46 there.
> >
> >
> >If the fix fixes this (it did for me, but nobody else has said anything),
> >then the peer will not get stuck in Clearing. So you'll have no problem
> >any more with the other peer closing the session.
>
> Hi,
> That indeed seems to have fixed it ! Are there any other side
> effects with this change, or is it an obvious bug ?

Splendid! A confirmed bug fix!


The only situation affected is when a hold time expire happens.

It is a bug, but not an obvious one; you have to think a bit in circles to
find this ;-). I happened to stumble upon it in a test environment
with a quagga hosted on a virtual machine. Because the clock on the
virtual machine is not reliable, once a day the hold time expired on
its peers, and they got stuck in Clearing. The fix Paul did cleaned up
another race, but this one was buried deeper. Paul fixed a
missing Clearing_Completed caused by a race; here it is a bgp_stop
which never gets executed, so this time the Clearing_Completed is missing
because bgp_stop never generates it.
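
The stuck state can be reproduced in a toy model. The sketch below is not Quagga's code: the class, its flag, and the event handling are simplified assumptions built only from the description in this thread (a hold-timer expiry that merely queues BGP_Stop, and a Clearing state that ignores everything except Clearing_Completed).

```python
from collections import deque


class ToyFsmPeer:
    """Toy model of the behaviour described above; names are illustrative."""

    def __init__(self, holdtime_expire_calls_stop: bool):
        self.state = "Established"
        self.events = deque()
        self.holdtime_expire_calls_stop = holdtime_expire_calls_stop

    def bgp_stop(self):
        # In this model, stopping the peer clears its routes, and finishing
        # that work is what produces Clearing_Completed.
        self.events.append("Clearing_Completed")

    def handle(self, event: str):
        if self.state == "Established":
            if event == "BGP_Stop":
                self.bgp_stop()
                self.state = "Clearing"
            elif event == "Hold_Timer_expired":
                if self.holdtime_expire_calls_stop:
                    self.bgp_stop()                # variant: stop immediately
                else:
                    self.events.append("BGP_Stop") # variant: only queue the stop
                self.state = "Clearing"
        elif self.state == "Clearing":
            if event == "Clearing_Completed":
                self.state = "Idle"
            # Any other event, including the queued BGP_Stop, is ignored here.

    def run(self, first_event: str) -> str:
        self.events.append(first_event)
        while self.events:
            self.handle(self.events.popleft())
        return self.state


if __name__ == "__main__":
    print(ToyFsmPeer(False).run("Hold_Timer_expired"))
    print(ToyFsmPeer(True).run("Hold_Timer_expired"))
```

With the stop only queued, the peer ends up in Clearing with nothing left to run; when the stop runs before the state changes, it reaches Idle.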

The question is whether to fix *entering* Clearing, either by a
change in the state machine (as I did) or by a change in the routine
that sets up bgp_stop when a hold time expiry happens, both of which
are quite reliably free of side effects, or to handle events
*in Clearing*, which could also solve this but would have to be
thought over more carefully.
... Paul, are you reading this?

Kind regards,
Juergen.
--
Juergen Kammer j.kammer [at] eurodata

mike at sentex

Nov 28, 2006, 7:58 AM

Post #7 of 16
Re: [SOLVED] bgp stuck in clearing state

I have it deployed on one of my ibgp routers and so far so good. I
havent seen any ill effects.

Any thoughts on including it in the cvs ?

---Mike


paul at clubi

Dec 4, 2006, 1:42 PM

Post #8 of 16
Re: [SOLVED] bgp stuck in clearing state

On Tue, 21 Nov 2006, Juergen Kammer wrote:

> It is a bug, but not an obvious one; you have to think a bit in
> circles to find this ;-). I happened to stumble upon it in a
> test environment with a quagga hosted on a virtual machine. Because
> the clock on the virtual machine is not reliable, once a day
> the hold time expired on its peers, and they got stuck in Clearing.
> The fix Paul did cleaned up another race, but this one was buried
> deeper. Paul fixed a missing Clearing_Completed caused by a race;
> here it is a bgp_stop which never gets executed, so this time
> the Clearing_Completed is missing because bgp_stop never generates
> it.

Urg, wowser, yes. That's a silly bug. Clearing state assumes:

- can only be entered from Established
- clear_route_all gets called (ie via bgp_stop())

Technically the state machine also is buggy for
ConnectRetry_timer_expired and TCP_connection_open_failed, but those
events should never be raised in Established in practice.

> The question is whether to fix *entering* Clearing, either by a
> change in the state machine (as I did) or by a change in the routine
> that sets up bgp_stop when a hold time expiry happens, both of which
> are quite reliably free of side effects, or to handle events
> *in Clearing*, which could also solve this but would have to be
> thought over more carefully.

Yeah, might it be more robust to handle this unconditionally on
transition into Clearing I wonder? E.g. as in:

http://hibernia.jakma.org/~paul/patches/quagga-bgpd-clearing-miss.diff

?

Thanks for looking into this bug.

regards,
--
Paul Jakma paul [at] clubi paul [at] jakma Key ID: 64A2FF6A
Fortune:
A sect or party is an elegant incognito devised to save a man from
the vexation of thinking.
-- Ralph Waldo Emerson, Journals, 1831

j.kammer at eurodata

Dec 6, 2006, 12:45 AM

Post #9 of 16
Re: other solution for bgp stuck in clearing state

On Mon, Dec 04, 2006 at 09:42:59PM +0000, Paul Jakma wrote:
> Yeah, might it be more robust to handle this unconditionally on
> transition into Clearing I wonder? E.g. as in:
>
> http://hibernia.jakma.org/~paul/patches/quagga-bgpd-clearing-miss.diff
>
> ?
>

Tried this, i.e. changed my "Established" to "Clearing" again, and put
your patch in.

Stopped a running (previous) quagga, and started the thusly patched
one.

Result:
Boom.

I.e.:

2006/12/06 09:32:36 BGP: BGPd 0.99.5 starting: vty [at] 260, bgp [at] 17
2006/12/06 09:32:39 BGP: 10.1.28.2 [Error] State Idle is not Established, but routes were cleared - bug!
2006/12/06 09:32:39 BGP: Assertion `peer->status == 6' failed in file bgp_route.c, line 2655, function bgp_clear_route
2006/12/06 09:32:39 BGP: Backtrace for 10 stack frames:
2006/12/06 09:32:39 BGP: [bt 0] /usr/lib/libzebra.so.0(zlog_backtrace+0x1f) [0xb7ebae98]
2006/12/06 09:32:39 BGP: [bt 1] /usr/lib/libzebra.so.0(_zlog_assert_failed+0x83) [0xb7ebb0b8]
2006/12/06 09:32:39 BGP: [bt 2] /usr/lib/quagga/bgpd(bgp_clear_route+0x150) [0x806d6e5]
2006/12/06 09:32:39 BGP: [bt 3] /usr/lib/quagga/bgpd(bgp_clear_route_all+0x21) [0x806d70e]
2006/12/06 09:32:39 BGP: [bt 4] /usr/lib/quagga/bgpd(bgp_event+0xa6) [0x8064cb8]
2006/12/06 09:32:39 BGP: [bt 5] /usr/lib/quagga/bgpd [0x8063dbc]
2006/12/06 09:32:39 BGP: [bt 6] /usr/lib/libzebra.so.0(thread_call+0x62) [0xb7eb065b]
2006/12/06 09:32:39 BGP: [bt 7] /usr/lib/quagga/bgpd(main+0x273) [0x805becf]
2006/12/06 09:32:39 BGP: [bt 8] /lib/tls/libc.so.6(__libc_start_main+0xc8) [0xb7d22ea8]
2006/12/06 09:32:39 BGP: [bt 9] /usr/lib/quagga/bgpd [0x805bb01]

So your quagga-bgpd-clearing-miss.diff does change too much.

bgp_clear_route gets called when we are leaving Idle, oups:
you do not check whether you change into Clearing at all there, you
unconditionally clear all routes whenever a transition happens, uh, oh.


Regards,
Juergen.

--
Juergen Kammer j.kammer [at] eurodata

paul at clubi

Dec 6, 2006, 5:36 AM

Post #10 of 16
Re: other solution for bgp stuck in clearing state

On Wed, 6 Dec 2006, Juergen Kammer wrote:

>
> 2006/12/06 09:32:36 BGP: BGPd 0.99.5 starting: vty [at] 260, bgp [at] 17
> 2006/12/06 09:32:39 BGP: 10.1.28.2 [Error] State Idle is not Established, but routes were cleared - bug!
> 2006/12/06 09:32:39 BGP: Assertion `peer->status == 6' failed in file bgp_route.c, line 2655, function bgp_clear_route

Oh, oops, forgot about that ;).

> bgp_clear_route gets called when we are leaving Idle, oups: you do
> not check whether you change into Clearing at all there, you
> unconditionally clear all routes whenever a transition happens, uh,
> oh.

D'Oh. Ok, I didn't say I tested it. ;)

Just curious what you think of the approach at least, given your
comment on best place to do it.

regards,
--
Paul Jakma paul [at] clubi paul [at] jakma Key ID: 64A2FF6A
Fortune:
The devil finds work for idle glands.

j.kammer at eurodata

Dec 6, 2006, 7:13 AM

Post #11 of 16
Re: other solution for bgp stuck in clearing state

On Wed, Dec 06, 2006 at 01:36:15PM +0000, Paul Jakma wrote:
> On Wed, 6 Dec 2006, Juergen Kammer wrote:
>
> >
> >2006/12/06 09:32:36 BGP: BGPd 0.99.5 starting: vty [at] 260, bgp [at] 17
> >2006/12/06 09:32:39 BGP: 10.1.28.2 [Error] State Idle is not Established, but routes were cleared - bug!
> >2006/12/06 09:32:39 BGP: Assertion `peer->status == 6' failed in file bgp_route.c, line 2655, function bgp_clear_route
>
> Oh, oops, forgot about that ;).
>
> >bgp_clear_route gets called when we are leaving Idle, oups: you do
> >not check whether you change into Clearing at all there, you
> >unconditionally clear all routes whenever a transition happens, uh,
> >oh.
>
> D'Oh. Ok, I didn't say I tested it. ;)
>
> Just curious what you think of the approach at least, given your
> comment on best place to do it.

Hm... to do something whenever the state changes into a specific one
sounds more like sweeping something under the carpet. Would be better
to ensure that we have done the right thing when we transition into
Clearing.

A look into the transition table shows that we enter Clearing
only after calling one of:
  bgp_stop                - already does the right thing
  bgp_ignore              - happens only in situations where no
                            connection was there anyway
  bgp_fsm_holdtime_expire - does not do the right thing
  bgp_stop_with_error     - calls bgp_stop, see there

So, if this is right, it should suffice that bgp_fsm_holdtime_expire
does not enqueue BGP_Stop, but calls bgp_stop instead (or calls only
bgp_clear_route if that suffices).

Does that sound logical?

Juergen.
--
Juergen Kammer Email: j.kammer [at] eurodata

paul at clubi

Dec 6, 2006, 1:37 PM

Post #12 of 16
Re: other solution for bgp stuck in clearing state

On Wed, 6 Dec 2006, Juergen Kammer wrote:

> Hm... to do something whenever the state changes into a specific
> one sounds more like sweeping something under the carpet.

Nah, it's just to ensure that something which /must/ be done on
transition into a state *does* get done.

We could add a field to the FSM table (or add another table) to
specify actions specific to transition /into/ some state, but given
that here we're only talking about one state, we might as well just
have it in the state-change function, rather than bloat up the FSM
table (and add more function-call indirection).

We did similar cleanups in ospfd's neighbour FSM btw, where there
were actions common to whole classes of state-changes, and we cleaned
up code by moving such actions to the state-change function, rather
than replicating code/calls across several different transition
action functions.
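
The idea of putting the action in the state-change function, and the pitfall reported earlier in the thread, can be sketched as follows. This is a toy illustration and does not reproduce the linked patch: the class, both functions, and the route set are hypothetical, and the check in clear_route_all only mimics the assertion from the backtrace.

```python
from dataclasses import dataclass, field


@dataclass
class PeerModel:
    status: str = "Idle"
    routes: set = field(default_factory=set)


def clear_route_all(peer: PeerModel) -> None:
    # Mimics the assertion seen in the backtrace: routes may only be
    # cleared while the peer is still marked Established.
    if peer.status != "Established":
        raise RuntimeError(
            f"State {peer.status} is not Established, but routes were cleared"
        )
    peer.routes.clear()


def change_state_unconditional(peer: PeerModel, new_status: str) -> None:
    clear_route_all(peer)          # runs on every transition: the pitfall
    peer.status = new_status


def change_state_guarded(peer: PeerModel, new_status: str) -> None:
    if new_status == "Clearing":   # act only on the transition into Clearing
        clear_route_all(peer)
    peer.status = new_status


if __name__ == "__main__":
    peer = PeerModel(status="Established", routes={"99.99.99.99/32"})
    change_state_guarded(peer, "Clearing")
    print(peer)
```

The guarded variant leaves every other transition untouched, so a peer leaving Idle no longer trips the check, while any path into Clearing gets its routes cleared regardless of which action function ran.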

Updated:

http://hibernia.jakma.org/~paul/patches/quagga-bgpd-clearing-miss.diff

regards,
--
Paul Jakma paul [at] clubi paul [at] jakma Key ID: 64A2FF6A
Fortune:
During the voyage of life, remember to keep an eye out for a fair wind; batten
down during a storm; hail all passing ships; and fly your colors proudly.

j.kammer at eurodata

Dec 7, 2006, 6:21 AM

Post #13 of 16
Re: other solution for bgp stuck in clearing state

On Wed, Dec 06, 2006 at 09:37:43PM +0000, Paul Jakma wrote:
> We did similar cleanups in ospfd's neighbour FSM btw, where there
> were actions common to whole classes of state-changes, and we cleaned
> up code by moving such actions to the state-change function, rather
> than replicating code/calls across several different transition
> action functions.

OK, if there is a modus operandi for such thingies, I'm the last to
complain.

> Updated:
>
> http://hibernia.jakma.org/~paul/patches/quagga-bgpd-clearing-miss.diff

Hm, works in my testbed!

regards,
Juergen.

--
Juergen Kammer j.kammer [at] eurodata

Preeti.Khurana at guavus

Mar 29, 2012, 8:57 AM

Post #14 of 16
Re: BGP stuck in clearing state

On 29/03/12 6:44 PM, "Paul Jakma" <paul [at] jakma> wrote:

>On Thu, 29 Mar 2012, Preeti Khurana wrote:
>
>> Hi,
>
>> I am seeing a strange issue of bgp flapping on my box. Am using quagga
>> 0.99.16 version.
>
>I can't think of any specific fixes that would make a difference, but it
>could be a useful data-point to reproduce with the latest 0.99.20.1
>release.
>
>> Also the memory utilization of bgpd process becomes huge. Reaches to 36
>> G from 4 G . The show memory output at that time is :: ( Look at the
>> Link Node & Hash Bucket).
>
>> Work queue item : 471092973
>
>Sounds like it's struggling to process the route updates due to the
>flapping. I'm sure "show work-queues" shows there's a lot of work waiting
>to be done.

It's because the FSM state has reached "Clearing", and only a
ClearingCompleted event can move it on to the next state.
No such event ever arrives, and the work queue items increase because of
that.

>
>Are you using a low hold-time? Perhaps it's just too low to cope with
>mass-resets.

Default value of hold time is used. Increasing that also does not help.
The peer is also having the default hold time of 180.

>
>regards,
>--
>Paul Jakma paul [at] jakma twitter: @pjakma PGP: 64A2FF6A
>Fortune:
>Personifiers of the world, unite! You have nothing to lose but Mr.
>Dignity!
> -- Bernadette Bosky

paul at jakma

Mar 30, 2012, 2:34 AM

Post #15 of 16
Re: BGP stuck in clearing state

On Thu, 29 Mar 2012, Preeti Khurana wrote:

> It's because the FSM state has reached "Clearing", and only a
> ClearingCompleted event can move it on to the next state. No such
> event ever arrives, and the work queue items increase because of that.

Sure. The question is whether or not this queue of items is being
processed at all. E.g. have a look at "show work-queues", and "show thread
cpu" from the bgpd vty.

I.e. is this some kind of "rate work arriving > rate work being done"
backlog, or is there some bug with processing the work?
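
One way to tell the two cases apart is to compare two samples of the same work queue taken some time apart. The sketch below is an assumption-laden illustration: it reads the "Items" column of "show work-queues" as the number of queued entries and the "Q. Runs Total" column as a run counter, and the class, function, and labels are made up.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    items: int  # assumed meaning of the "Items" column: entries queued now
    runs: int   # assumed meaning of "Q. Runs Total": times the queue has run


def classify(before: QueueSample, after: QueueSample) -> str:
    """Compare two samples of one queue, the second taken later."""
    if after.items == 0:
        return "empty"            # nothing waiting in this queue
    if after.runs == before.runs:
        return "stalled"          # work is queued but the queue never ran
    if after.items > before.items:
        return "backlog growing"  # it runs, but work arrives faster
    return "draining"             # it runs and is catching up


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    print(classify(QueueSample(items=1000, runs=50), QueueSample(items=4000, runs=50)))
    print(classify(QueueSample(items=1000, runs=50), QueueSample(items=4000, runs=90)))
```

A "stalled" result would point towards a processing bug, while "backlog growing" would point towards work arriving faster than it is done.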

regards,
--
Paul Jakma paul [at] jakma twitter: @pjakma PGP: 64A2FF6A
Fortune:
The most disagreeable thing that your worst enemy says to your face does
not approach what your best friends say behind your back.
-- Alfred De Musset

Preeti.Khurana at guavus

Apr 4, 2012, 12:38 AM

Post #16 of 16
Re: BGP stuck in clearing state

On 30/03/12 3:04 PM, "Paul Jakma" <paul [at] jakma> wrote:

>On Thu, 29 Mar 2012, Preeti Khurana wrote:
>
>> It's because the FSM state has reached "Clearing", and only a
>> ClearingCompleted event can move it on to the next state. No such
>> event ever arrives, and the work queue items increase because of that.
>
>Sure. The question is whether or not this queue of items is being
>processed at all. E.g. have a look at "show work-queues", and "show
>thread cpu" from the bgpd vty.
>
>I.e. is this some kind of "rate work arriving > rate work being done"
>backlog, or is there some bug with processing the work?

Paul,

One interesting thing I noticed is that when I run the same bgpd with
fewer routers (50, as compared to some 100+), it runs fine and doesn't
show any such symptom. So as a workaround I now run two bgpd processes,
each peering with 50 routers. Since this was production, I can't get you
the output you are looking for.

