Mailing List Archive: Cisco: VOIP

Trace files for CUCM/NTP problem

 

 



ealeatherman at gmail

Dec 13, 2010, 6:41 AM

Post #1 of 7 (2609 views)
Trace files for CUCM/NTP problem

Hi folks,

Our operations team updated the NTP service recently (infoblox), and
right after that happened, I started getting syslog errors per below
on two different CUCM 7 clusters, both of which use that NTP server.

ntpRunningStatus.sh: Primary node NTP server, OWP-PUB, is currently
inaccessible or down. Verify the network between the primary and
secondary nodes. Check the status of NTP on both the primary and
secondary nodes via CLI 'utils ntp status'. If the network is fine,
try restarting NTP using CLI 'utils ntp restart'.

Looking at the status on these servers, the pub looks OK, but
'utils ntp status' on all secondary nodes comes up with (example):
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*127.127.1.0     LOCAL(0)        10 l   32   64  377    0.000    0.000   0.004
 10.192.20.10    .STEP.          16 u  488  512  376    0.244   16.553   0.052

Restarting NTP on all nodes fixes the problem temporarily (NTP status
goes back to normal) but only for a short time.

The NTP logs don't show anything other than what appears to be the NTP
service restarting every 30 minutes.. is this normal?
11/16/2010 23:00:02 sd_ntp|*********************************************************|<LVL::Info>
11/16/2010 23:00:02 sd_ntp| Running sd_ntp. Process Id=12302 |<LVL::Info>
11/16/2010 23:00:02 sd_ntp|*********************************************************|<LVL::Info>
11/16/2010 23:00:02 sd_ntp||<LVL::Info>
11/16/2010 23:00:02 sd_ntp|[528] Command Line parameters: -list -s|<LVL::Info>
11/16/2010 23:00:02 sd_ntp|[585] The file /etc/ntp.conf exists|<LVL::Debug>
11/16/2010 23:00:02 sd_ntp|[421] /etc/ntp/drift file is not changed|<LVL::Debug>
11/16/2010 23:00:02 sd_ntp|[603] Listing all the servers|<LVL::Debug>
11/16/2010 23:00:02 sd_ntp|sd_ntp exitinng normally.|<LVL::Info>

In both clusters, the pub and most or all of the subs are on the same
VLAN and physical switch.

What other traces can I look at on CM to troubleshoot this? Anyone
know if there is a debug for the process that's generating my syslog
errors?

I want to make sure it's not an error on my end and hopefully have
some better information on what's broken before I go back to the
operations group. All the IOS routers using infoblox for NTP appear to
be working just fine, so they see no problems :)

Thanks in advance!

--
Ed Leatherman
_______________________________________________
cisco-voip mailing list
cisco-voip [at] puck
https://puck.nether.net/mailman/listinfo/cisco-voip


wsisk at cisco

Dec 13, 2010, 8:50 AM

Post #2 of 7 (2562 views)
Re: Trace files for CUCM/NTP problem [In reply to]

What version of CM? There have been many NTP changes, especially this one:
CSCsk70971 publisher NTP down if configured NTP down or unreliable

My interpretation:
something on the network NTP source changed
now subscribers are giving an error that the pub is unreliable

This is expected if the pub cannot sync to its NTP source. What changes did
they make? Is it still a viable NTP source for the publisher? If not, the
publisher will use its local clock, which makes it an invalid source for all
subs.

http://www.cisco.com/en/US/docs/voice_ip_comm/cucm/srnd/8x/netstruc.html#wpmkr1185636


/Wes



burns.jason at gmail

Dec 13, 2010, 3:57 PM

Post #3 of 7 (2616 views)
Re: Trace files for CUCM/NTP problem [In reply to]

Ed,

CUCM is preferring the local clock, because your NTP reference has a Stratum
of 16!

 10.192.20.10    .STEP.          16 u  488  512  376    0.244   16.553   0.052

Fix your NTP server 10.192.20.10 and you'll fix your CUCM.
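
The check above (look at the stratum column, spot the 16) can be sketched in Python. This is only an illustration: it assumes peer lines in the whitespace-separated layout shown in this thread, and the names `Peer`, `parse_peer`, and `unsynchronized` are made up for the example, not part of CUCM or ntpq.

```python
from typing import List, NamedTuple

class Peer(NamedTuple):
    selected: bool  # True when the line starts with '*' (current sync source)
    remote: str
    refid: str
    stratum: int

def parse_peer(line: str) -> Peer:
    """Parse one peer line, e.g. '*127.127.1.0 LOCAL(0) 10 l 32 64 377 ...'."""
    fields = line.split()
    remote = fields[0]
    selected = remote.startswith("*")
    # Assumption: only the common tally symbols are stripped from the name.
    if remote[0] in "*+-#":
        remote = remote[1:]
    return Peer(selected, remote, fields[1], int(fields[2]))

def unsynchronized(peers: List[Peer]) -> List[str]:
    """Return the remotes that report stratum 16 (no usable time)."""
    return [p.remote for p in peers if p.stratum >= 16]

# Sample rows copied from the subscriber output earlier in the thread.
SAMPLE = """\
*127.127.1.0 LOCAL(0) 10 l 32 64 377 0.000 0.000 0.004
10.192.20.10 .STEP. 16 u 488 512 376 0.244 16.553 0.052
"""

peers = [parse_peer(line) for line in SAMPLE.splitlines() if line.strip()]
print(unsynchronized(peers))
```

Here the local clock (127.127.1.0) is the selected source and 10.192.20.10 is the peer reported as unsynchronized.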

-Burns



jason.aarons at us

Dec 13, 2010, 4:12 PM

Post #4 of 7 (2578 views)
Re: Trace files for CUCM/NTP problem [In reply to]

Have you pointed a different router/switch to your NTP server? Are they getting 16 as well? I recall a high offset/variation from clock can also make it 16.


An IOS device initially polls every 64 seconds; as the NTP server and client become better synced and no packets are dropped, the interval increases to a maximum of 1024 seconds.
http://www.nil.si/ipcorner/BeOnTime/
http://www.cisco.com/en/US/products/sw/iosswrel/ps1818/products_tech_note09186a008015bb3a.shtml

"while the highest level (stratum 16) usually indicates that the clock is not working or unaccessible"




ealeatherman at gmail

Dec 14, 2010, 8:15 AM

Post #5 of 7 (2585 views)
Re: Trace files for CUCM/NTP problem [In reply to]

Wes,

This is what one of the pubs looks like:
admin:utils ntp status
ntpd (pid 14883) is running...

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     LOCAL(0)        10 l   21   64    0    0.000    0.000 4000.00
 10.0.1.132      .STEP.          16 u   77   64    0    0.000    0.000 4000.00
 10.0.1.140      .STEP.          16 u  108   64    0    0.000    0.000 4000.00
 10.0.1.148      .STEP.          16 u  117   64    0    0.000    0.000 4000.00

From my googling, the "reach" field is a bitmap (shown in octal) of the
results of the last 8 polls. 377 is the normal value: all 8 polls got a
reply. Zero is obviously bad. I can watch these values progress from 0
up to 377 and then everything syncs back up. This happens on a
somewhat periodic basis. What isn't clear to me yet is whether Call
Manager is restarting the synchronization because it was getting values
it didn't like, or whether it was truly losing connectivity with
NTP for 8ish minutes (8 polls at a 64 sec polling interval).
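
As a rough illustration of how the "reach" value reads, here is a small Python sketch. It assumes the usual ntpq convention that the value is printed in octal and that the lowest bit is the most recent poll; `decode_reach` is a made-up helper name.

```python
from typing import List

def decode_reach(reach: str) -> List[bool]:
    """Return the last 8 poll results, most recent first (True = reply seen)."""
    value = int(reach, 8)  # the register is printed in octal
    return [bool((value >> bit) & 1) for bit in range(8)]

def replies(reach: str) -> int:
    """Count how many of the last 8 polls got a reply."""
    return sum(decode_reach(reach))

for sample in ("0", "1", "17", "376", "377"):
    print(sample, replies(sample), decode_reach(sample))
```

Under that assumption, 377 means all eight polls were answered, 0 means none were, and 376 (from the subscriber output) means only the most recent poll was missed. A value climbing 1, 3, 7, 17, ... 377 is what a peer looks like while it recovers after a restart.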

I'm running utils network capture port 123 on one of the pubs -
perhaps this will indicate if the connectivity is dropping.

I would point CUCM to a different NTP source for troubleshooting but I
don't have any maintenance windows to reset all the servers until
after the holidays, so trying to attack it from a different angle for
now.




--
Ed Leatherman



ealeatherman at gmail

Dec 14, 2010, 8:23 AM

Post #6 of 7 (2550 views)
Re: Trace files for CUCM/NTP problem [In reply to]

I was about to say IOS devices are OK, but I noticed the poll value
was 64 on the one I was reviewing. In a stable environment that
should be 1024 at "steady-state", it sounds like. Some mischief is afoot.

Thanks for the NTP tips.




--
Ed Leatherman



burns.jason at gmail

Dec 14, 2010, 12:26 PM

Post #7 of 7 (2620 views)
Re: Trace files for CUCM/NTP problem [In reply to]

Ed,

One more thing. From all of your debug output here we can see something
different is happening with the "RefID" field.

The "RefID" field shows us the next upstream hop past the server we're
pointing to. Consider something like this:

Note - don't ever point to this server - just using it for example.

time.nist.gov (Stratum 1) <-- Your Local NTP Master (Stratum 2)
    <-- CUCM Pub (Stratum 3) <-- CUCM Subs (Stratum 4)

So on your CUCM pub in this instance we would see:

Remote: Local NTP Master
Stratum: 2
RefID: time.nist.gov

This tells us your CUCM server is pointing to your local NTP server. Your
local NTP master is just 1 hop away from the root server, time.nist.gov.

In your case though, you see Stratum 16. 16 is like the infinite route in
RIP. NTP is throwing its hands in the air and saying "I have no idea what
the hell time it is".

Further, instead of an upstream server, the RefID shows ".STEP." This is a
special keyword that means the time is off by further than NTP can adjust
for in one single shot.
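
To make the stratum counting concrete, here is a small Python sketch of the example chain above. It is an illustration only: the function names are made up, and it simply assumes each hop away from the root adds one, with 16 meaning "unsynchronized".

```python
from typing import Dict, List

UNSYNCHRONIZED = 16  # stratum value NTP uses for "no usable time"

def strata_for_chain(chain: List[str], root_stratum: int = 1) -> Dict[str, int]:
    """Assign a stratum to each hop, counting up from the root server."""
    result = {}
    for hops, name in enumerate(chain):
        # Each hop away from the root adds one, capped at 16.
        result[name] = min(root_stratum + hops, UNSYNCHRONIZED)
    return result

def is_usable(stratum: int) -> bool:
    """A source at stratum 16 cannot be used for synchronization."""
    return stratum < UNSYNCHRONIZED

chain = ["time.nist.gov", "Local NTP Master", "CUCM Pub", "CUCM Subs"]
strata = strata_for_chain(chain)
print(strata)
print(is_usable(strata["Local NTP Master"]), is_usable(16))
```

With a healthy chain the pub sees its NTP master at stratum 2. A server showing 16, as in the output in this thread, fails the `is_usable` check, which is why the nodes fall back to the local clock.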

This seems to make sense with NTP restarting every 30 minutes. If we're
further out of sync than NTP can correct for, then we need to restart NTP so
the ntponeshot command can run to step the clock forwards or backwards by
several seconds or minutes at a time. Something NTP can't do alone without
the restart.

Here is an interesting experiment to find out what's happening. Remove the
NTP server entries from CUCM right at the start of the hour. Wait 4 hours.
At the end of those 4 hours, find out how far off the CUCM clock is
from your watch (or PC clock) via "show status". Then look at the NTP server
and find out if its time still matches what you expect.

What is probably causing this is a hardware clock on the CUCM server
(motherboard) or the NTP server, drifting faster than NTP can correct for.

NTP can correct for errors of 500 parts per million. This is something like
43 seconds drift in 24 hours. If after 4 hours your CUCM server clock is off
by more than 7 seconds - then you need a new motherboard. It might be best
to wait 24 hours instead of 4 hours, just to get a more accurate idea of how
fast your clock might be drifting.
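
The arithmetic behind those numbers, as a quick Python sketch. The 500 ppm limit is the figure quoted above; the function names and the 9-second sample reading are made up for illustration.

```python
MAX_CORRECTION_PPM = 500  # correction limit quoted above, in parts per million

def max_drift_seconds(hours: float, ppm: float = MAX_CORRECTION_PPM) -> float:
    """Largest drift (in seconds) that stays within the ppm limit over `hours`."""
    return hours * 3600 * ppm / 1_000_000

def measured_ppm(offset_seconds: float, hours: float) -> float:
    """Convert an observed clock offset after `hours` into parts per million."""
    return offset_seconds / (hours * 3600) * 1_000_000

print(max_drift_seconds(24))  # budget over a full day
print(max_drift_seconds(4))   # budget over the 4-hour experiment
print(measured_ppm(9.0, 4))   # hypothetical reading: 9 s off after 4 hours
```

This works out to about 43.2 seconds in 24 hours and 7.2 seconds in 4 hours. The hypothetical 9-second reading would be 625 ppm, past what NTP can correct.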

I'd be really interested to see what you find. I've replaced a few
motherboards in IBM servers for this exact problem.

-Burns

