
lambert at lambertfam
Nov 15, 2005, 10:35 PM
Post #1 of 2
(1389 views)
Permalink
|
|
Problem with PPPoE, post-authentication timeouts?
|
|
I am having difficulties with certain users being unable to get online or having a delay of 30 or so seconds between PPPoE authenticated and passing traffic. I think the former may be a side effect of the latter. None of my known cisco PIX users on this ATM trunk can get online. They successfully authenticate three times per minute. However, the PIX says "Could not determine remote IP address" and sends a PPP term request. When a technician goes on site and uses a Windows XP SP2 laptop at the end of the ethernet cable that was plugged into the PIX they report that they are able to get online, but there is a delay between authentication, as the same username the PIX is using, and being able to actually pass traffic. The delay is reported at between 32 and 120 seconds. There is a report of no notable delay for one attempt sometime in the middle of the test process. When I go on-site with my PowerBook, I authenticate as the same username with no notable delay between authentication and passing traffic. The Speedstream DSL modems are able to log in but they do not recieve a remote gateway IP address. So they don't know where to send traffic and the packets don't flow. We had no problems until Friday afternoon when the power company dropped the grid in half the town, including our office. The generator and UPS kept the routers and servers up just fine during the 2.5 hour outage, but one of the outlets that supported our smartjack boxes is apparantly not a generator backed circuit. This was the first outage since I've been here. One of the guys trying to help while I looked into why the T1s were down from the router perspective, unplugged the telco fiber cabinet thinking that it was the one not on a generator backed circuit. Power to the cabinet had already been down during the wait for the generator to pick up the load. It was assumed that the cabinet had enough batteries to take care of itself for 30 seconds. I don't have proof that the cabinet ever lost power, but it's possible. I think it is likely due to the fact that all our ADSL PPPoE sessions on the DS3 from that cabinet were dropped. At that point, no one was able to authenticate. Even our radius authentication for management access to IOS would not authenticate. Reloading the RADIUS server did not fix the problem so I punted and reloaded the router, after doing a "write". When the router came back up, users were able to authenticate and we thought everything was hunky dory. We had a few users still reporting difficulty authenticating, but power-cycleing there CPE fixed those that reported problems Friday afternoon. All, with the exception of one company using a PIX. That user was authenticating three times per minute but unable to keep the connection for even a second. They had a Zywall which had been pulled from service because they wanted a "business class router" sitting on the shelf next to the PIX. When we swapped it in for the PIX, their DSL connection came up and everybody was happy. So, we thought "hmm, PIX got fried." Monday morning we walked into a hellstorm of complaints from the businesses that had gone home before the power came up Friday. One of them is using a PIX. They have the same three authentications per minute symptoms as the Friday place had. These guys have more going on behind the PIX than the Friday place and a router swap isn't going to cut it. So the techs go out and play with the windows client. No problem with the DSL line, just a funky post authentication delay. Tuesday we put a Linksys cable/dsl router sitting in front of the PIX. Most things seem to work, but it's kind of ridiculous. All over town, even the areas that didn't lose power are having similar problems. For the most part a power cycle and patience are able to get them up. But the users with routers are more likely to still have problems. We have multiple reports that power cycling a linksys router didn't fix the problem but hitting the "Disconnect" button on the PPPoE status page and then hitting the "Connect" button made that site work. I can't find anything that has changed on my aggregation box, Cisco 3661, from before the power event to after. There are no changes in RADIUS from before to after. Well over 100 customers on the problem aggregation box are able to get on easily and probably don't know there was ever a problem. At the end of Monday, we had 12 open problem reports for this issue. I have 200+ users on this box. Not all of them leave their CPE powered at all times. There are some errors on the ATM interface. The FLM 150 ADM box in the fiber cabinet has a red light labelled MJ. The telco says they don't see any issues with passive monitoring of the circuit. They are going to do intrusive testing during the maintenance window. router#show int atm1/0 ATM1/0 is up, line protocol is up Hardware is RS8234 ATM DS3 MTU 4470 bytes, sub MTU 4470, BW 45000 Kbit, DLY 190 usec, reliability 255/255, txload 36/255, rxload 34/255 Encapsulation ATM, loopback not set Encapsulation(s): AAL5 1023 maximum active VCs, 230 current VCCs VC Auto Creation Disabled. VC idle disconnect time: 300 seconds Last input 00:00:00, output 00:00:00, output hang never Last clearing of "show interface" counters 1d14h Input queue: 0/75/207/0 (size/max/drops/flushes); Total output drops: 17255 Queueing strategy: Per VC Queueing 5 minute input rate 6137000 bits/sec, 1817 packets/sec 5 minute output rate 6354000 bits/sec, 1807 packets/sec 330414631 packets input, 3403682365 bytes, 0 no buffer Received 0 broadcasts, 0 runts, 0 giants, 0 throttles 28363 input errors, 1469 CRC, 0 frame, 96 overrun, 0 ignored, 2 abort 312474242 packets output, 165638520 bytes, 0 underruns 0 output errors, 0 collisions, 0 interface resets 0 output buffer failures, 0 output buffers swapped out This is a representative radius authentication sequence from the aggregation router, Cisco 3661: Nov 15 01:56:17 router-e0 1727136: 3d11h: RADIUS(00034691): Storing nasport 268566566 in rad_db Nov 15 01:56:17 router-e0 1727137: 3d11h: RADIUS(00034691): Config NAS IP: 0.0.0.0 Nov 15 01:56:17 router-e0 1727138: 3d11h: RADIUS/ENCODE(00034691): acct_session_id: 265149 Nov 15 01:56:17 router-e0 1727139: 3d11h: RADIUS(00034691): sending Nov 15 01:56:17 router-e0 1727140: 3d11h: RADIUS/ENCODE: Best Local IP-Address 10.61.218.5 for Radius-Server 10.61.218.2 Nov 15 01:56:17 router-e0 1727141: 3d11h: RADIUS(00034691): Send Access-Request to 10.61.218.2:1645 id 21745/59, len 80 Nov 15 01:56:17 router-e0 1727142: 3d11h: RADIUS: authenticator C1 CB 72 5C 6E 0B C8 EE - A0 3A 44 8E 1D 97 00 26 Nov 15 01:56:17 router-e0 1727143: 3d11h: RADIUS: User-Name [1] 13 "domain.net" Nov 15 01:56:17 router-e0 1727144: 3d11h: RADIUS: User-Password [2] 18 Nov 15 01:56:17 router-e0 1727145: * Nov 15 01:56:17 router-e0 1727146: 3d11h: RADIUS: NAS-Port-Type [61] 6 Virtual [5] Nov 15 01:56:17 router-e0 1727147: 3d11h: RADIUS: NAS-Port [5] 6 268566566 Nov 15 01:56:17 router-e0 1727148: 3d11h: RADIUS: Connect-Info [77] 5 "dsl" Nov 15 01:56:17 router-e0 1727149: 3d11h: RADIUS: Service-Type [6] 6 Outbound [5] Nov 15 01:56:17 router-e0 1727150: 3d11h: RADIUS: NAS-IP-Address [4] 6 10.61.218.5 Nov 15 01:56:17 router-e0 1727151: 3d11h: RADIUS: Received from id 21745/59 10.61.218.2:1645, Access-Reject, len 39 Nov 15 01:56:17 router-e0 1727152: 3d11h: RADIUS: authenticator 67 A0 AD 3A AC 78 99 61 - 1E 4E 03 D1 6E 50 CB 8D Nov 15 01:56:17 router-e0 1727153: 3d11h: RADIUS: Reply-Message [18] 19 Nov 15 01:56:17 router-e0 1727154: 3d11h: RADIUS: 0D 0A 41 63 63 65 73 73 20 64 65 6E 69 65 64 0D [??Access denied?] Nov 15 01:56:17 router-e0 1727155: 3d11h: RADIUS: 0A [?] Nov 15 01:56:17 router-e0 1727156: 3d11h: RADIUS(00034691): Received from id 21745/59 Nov 15 01:56:17 router-e0 1727157: 3d11h: RADIUS/DECODE: Reply-Message fragments, 17, total 17 bytes Nov 15 01:56:17 router-e0 1727158: 3d11h: RADIUS: AAA Unsupported [150] 10 Nov 15 01:56:17 router-e0 1727159: 3d11h: RADIUS: 31 2F 30 2F 30 2F 32 2E [1/0/0/2.] Nov 15 01:56:17 router-e0 1727160: 3d11h: RADIUS(00034691): Using existing nas_port 268566566 Nov 15 01:56:17 router-e0 1727161: 3d11h: RADIUS(00034691): Config NAS IP: 0.0.0.0 Nov 15 01:56:17 router-e0 1727162: 3d11h: RADIUS/ENCODE(00034691): acct_session_id: 265149 Nov 15 01:56:17 router-e0 1727163: 3d11h: RADIUS(00034691): sending Nov 15 01:56:17 router-e0 1727164: 3d11h: RADIUS/ENCODE: Best Local IP-Address 10.61.218.5 for Radius-Server 10.61.218.2 Nov 15 01:56:17 router-e0 1727165: 3d11h: RADIUS(00034691): Send Access-Request to 10.61.218.2:1645 id 21745/60, len 97 Nov 15 01:56:17 router-e0 1727166: 3d11h: RADIUS: authenticator 39 5D 5F 1F 9C 8F 47 80 - 2F 91 17 FD 75 75 37 EC Nov 15 01:56:17 router-e0 1727167: 3d11h: RADIUS: Framed-Protocol [7] 6 PPP [1] Nov 15 01:56:17 router-e0 1727168: 3d11h: RADIUS: User-Name [1] 24 "username [at] domain" Nov 15 01:56:17 router-e0 1727169: 3d11h: RADIUS: User-Password [2] 18 * Nov 15 01:56:17 router-e0 1727170: 3d11h: RADIUS: NAS-Port-Type [61] 6 Virtual [5] Nov 15 01:56:17 router-e0 1727171: 3d11h: RADIUS: NAS-Port [5] 6 268566566 Nov 15 01:56:17 router-e0 1727172: 3d11h: RADIUS: Connect-Info [77] 5 "dsl" Nov 15 01:56:17 router-e0 1727173: 3d11h: RADIUS: Service-Type [6] 6 Framed [2] Nov 15 01:56:17 router-e0 1727174: 3d11h: RADIUS: NAS-IP-Address [4] 6 10.61.218.5 Nov 15 01:56:17 router-e0 1727175: 3d11h: RADIUS: Received from id 21745/60 10.61.218.2:1645, Access-Accept, len 44 Nov 15 01:56:17 router-e0 1727176: 3d11h: RADIUS: authenticator 9D D7 FF 4D 6F 08 70 3D - 59 FB 8B 90 DA 3E 35 3C Nov 15 01:56:17 router-e0 1727177: 3d11h: RADIUS: Service-Type [6] 6 Framed [2] Nov 15 01:56:17 router-e0 1727178: 3d11h: RADIUS: Framed-Protocol [7] 6 PPP [1] Nov 15 01:56:17 router-e0 1727179: 3d11h: RADIUS: Framed-IP-Netmask [9] 6 255.255.255.255 Nov 15 01:56:17 router-e0 1727180: 3d11h: RADIUS: Framed-IP-Address [8] 6 10.66.76.136 Nov 15 01:56:17 router-e0 1727181: 3d11h: RADIUS(00034691): Received from id 21745/60 Nov 15 01:56:17 router-e0 1727182: 3d11h: RADIUS/ENCODE(00034691): Acct-session-id pre-pended with Nas Port = 1/0/0/2.38 Nov 15 01:56:17 router-e0 1727183: 3d11h: RADIUS(00034691): Using existing nas_port 268566566 Nov 15 01:56:18 router-e0 1727184: 3d11h: RADIUS(00034691): Config NAS IP: 0.0.0.0 Nov 15 01:56:18 router-e0 1727185: 3d11h: RADIUS/ENCODE: Best Local IP-Address 10.61.218.5 for Radius-Server 10.61.218.2 Nov 15 01:56:18 router-e0 1727186: 3d11h: RADIUS/ENCODE(00034691): Acct-session-id pre-pended with Nas Port = 1/0/0/2.38 Nov 15 01:56:18 router-e0 1727187: 3d11h: RADIUS(00034691): Using existing nas_port 268566566 Nov 15 01:56:18 router-e0 1727188: 3d11h: RADIUS(00034691): Config NAS IP: 0.0.0.0 Nov 15 01:56:18 router-e0 1727189: 3d11h: RADIUS/ENCODE: Best Local IP-Address 10.61.218.5 for Radius-Server 10.61.218.2 Nov 15 01:56:18 router-e0 1727190: 3d11h: RADIUS: AAA Unsupported [150] 10 Nov 15 01:56:18 router-e0 1727191: 3d11h: RADIUS: 31 2F 30 2F 30 2F 32 2E [1/0/0/2.] $ radlast -10 username username router 268566566 10.66.76.136 Tue Nov 15 02:00 - 02:00 (00:00) username router 268566566 10.66.76.136 Tue Nov 15 02:00 - 02:00 (00:00) username router 268566566 10.66.76.136 Tue Nov 15 01:59 - 01:59 (00:00) username router 268566566 10.66.76.136 Tue Nov 15 01:59 - 01:59 (00:00) username router 268566566 10.66.76.136 Tue Nov 15 01:59 - 01:59 (00:00) username router 268566566 10.66.76.136 Tue Nov 15 01:58 - 01:58 (00:00) username router 268566566 10.66.76.136 Tue Nov 15 01:58 - 01:58 (00:00) username router 268566566 10.66.76.136 Tue Nov 15 01:58 - 01:58 (00:00) username router 268566566 10.66.76.136 Tue Nov 15 01:57 - 01:57 (00:00) username router 268566566 10.66.76.136 Tue Nov 15 01:57 - 01:57 (00:00) Here is a ppp negotiation debug session from the PIX: PPP virtual access open, ifc = 0 Xmit Link Control Protocol pkt, Action code is: Config Request, len is: 6 tPkt dump: 05064affaac1 LCP Option: MAGIC_NUMBER, len: 6, data: 4affaac1 Rcvd Link Control Protocol pkt, Action code is: Config Request, len is: 14 Pkt dump: 010405d40304c02305064096ed14 LCP Option: Max_Rcv_Units, len: 4, data: 05d4 LCP Option: AUTHENTICATION_TYPES, len: 4, data: c023 LCP Option: MAGIC_NUMBER, len: 6, data: 4096ed14 Xmit Link Control Protocol pkt, Action code is: Config ACK, len is: 14 Pkt dump: 010405d40304c02305064096ed14 LCP Option: Max_Rcv_Units, len: 4, data: 05d4 LCP Option: AUTHENTICATION_TYPES, len: 4, data: c023 LCP Option: MAGIC_NUMBER, len: 6, data: 4096ed14 Rcvd Link Control Protocol pkt, Action code is: Config ACK, len is: 6 Pkt dump: 05064affaac1 LCP Option: MAGIC_NUMBER, len: 6, data: 4affaac1 Xmit Link Control Protocol pkt, Action code is: Echo Request, len is: 4 Pkt dump: 4affaac1 Rcvd Link Control Protocol pkt, Action code is: Echo Reply, len is: 4 Pkt dump: 4096ed14 PPP upap rcvd authen ack: 4096ed14 Rcvd IP Control Protocol pkt, Action code is: Config Request, len is: 6 Pkt dump: 030600000000 IPCP Option: Config IP, IP = 0.0.0.0 Xmit IP Control Protocol pkt, Action code is: Config Request, len is: 6 Pkt dump: 030600000000 IPCP Option: Config IP, IP = 0.0.0.0 Xmit IP Control Protocol pkt, Action code is: Config Request, len is: 6 Pkt dump: 030600000000 IPCP Option: Config IP, IP = 0.0.0.0 Xmit IP Control Protocol pkt, Action code is: Config Reject, len is: 6 Pkt dump: 030600000000 IPCP Option: Config IP, IP = 0.0.0.0 Rcvd IP Control Protocol pkt, Action code is: Config NAK, len is: 6 Pkt dump: 030641424c88 IPCP Option: Config IP, IP = 10.66.76.136 Xmit IP Control Protocol pkt, Action code is: Config Request, len is: 6 Pkt dump: 030641424c88 IPCP Option: Config IP, IP = 10.66.76.136 Rcvd IP Control Protocol pkt, Action code is: Config Request, len is: 0 Xmit IP Control Protocol pkt, Action code is: Config ACK, len is: 0 Rcvd IP Control Protocol pkt, Action code is: Config ACK, len is: 6 Pkt dump: 030641424c88 IPCP Option: Config IP, IP = 10.66.76.136 Could not determine remote IP address Xmit IP Control Protocol pkt, Action code is: Termination Request, len is: 37 Pkt dump: 436f756c64206e6f742064657465726d696e652072656d6f74652049502061646472657373 PPP va close, device = 1 Xmit Link Control Protocol pkt, Action code is: Termination Request, len is: 15 Pkt dump: 50656572205465726d696e61746564 RADIUS users file: username Auth-Type = System Service-Type = Framed-User, Framed-Protocol = PPP, Framed-Netmask = 255.255.255.255, Framed-Address = 10.66.76.136 DEFAULT Auth-Type = System, Group = "dsl", Simultaneous-Use = 5 Service-Type = Framed-User, Framed-Protocol = PPP, Framed-Netmask = 255.255.255.255, Ascend-Client-Primary-DNS = 10.61.218.2, Ascend-Client-Secondary-DNS = 10.61.218.3, Ascend-Client-Assign-DNS = DNS-Assign-Yes, # Ascend-Data-Filter = "ip in forward tcp est", # Ascend-Data-Filter = "ip in forward dstip 10.61.218.4/32", # Ascend-Data-Filter = "ip in drop tcp dstport = 25", # Ascend-Data-Filter = "ip in forward", Session-Timeout = 0, Port-Limit = 1, Framed-MTU = 1500 (The Ascend-Data-Filter lines were originally active. I took them out in an attempt to simplify during troubleshooting.) I've munged the IPs, username, and domain name consistently. If I've given away my passwords in the encoded data, please let me know. I would appreciate any insight you guys can provide. This stuff is still over my head. It only seems to affect our core town where the power outage occured. The power event region was the East half of the town. However the complaints seem to be in the southern half of the town, in and out of the power event region. This router services towns all over the state without, registerred, complaints from outside our town. That and the lack of config changes from working to non-working have me thinking it could be a problem with the DSLAMs in the power affected areas. Maybe some of the DSLAMs service into the non-power affected region? -- Scott Lambert KC5MLE Unix SysAdmin lambert [at] lambertfam _______________________________________________ cisco-nas mailing list cisco-nas [at] puck https://puck.nether.net/mailman/listinfo/cisco-nas
|