
Mailing List Archive: Linux-HA: Users

Re: The active trap of the SNMP is delayed.

 

 



renayama19661014 at ybb

Jul 18, 2011, 7:04 PM

Post #1 of 38 (1866 views)
Re: The active trap of the SNMP is delayed.

Hi All,

We are still troubled by this problem.
Please advise.

* I have moved this thread to this mailing list because it appears to be a problem in Heartbeat (HA) itself.

Best Regards,
Hideo Yamauchi.



--- On Fri, 2011/6/17, renayama19661014 [at] ybb <renayama19661014 [at] ybb> wrote:

> Hi All,
>
> I registered this problem in Bugzilla.
>
> * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2604
>
> Best Regards,
> Hideo Yamauch.
>
> --- On Wed, 2011/6/15, renayama19661014 [at] ybb <renayama19661014 [at] ybb> wrote:
>
> > Hi All,
> >
> > I found a problem with a trap of the SNMP.(from hbagent.)
> >
> > A trap of active of the node seems to have possibilities to be delayed.
> >
> > In addition, this problem sometimes occurs and does not always occur.
> >
> >
> > I confirmed it in the next procedure.
> >
> > Step1) Start a node.
> >
> > ============
> > Last updated: Wed Jun 15 19:23:39 2011
> > Stack: Heartbeat
> > Current DC: srv02 (afe72fff-b7b4-4663-b845-872df29c635d) - partition WITHOUT quorum
> > Version: 1.0.11-6e010d6b0d49a6b929d17c0114e9d2d934dc8e04
> > 2 Nodes configured, unknown expected votes
> > 1 Resources configured.
> > ============
> >
> > Online: [ srv01 srv02 ]
> >
> >  Resource Group: group-1
> >      prmDummy1  (ocf::heartbeat:Dummy): Started srv01
> >
> > Migration summary:
> > * Node srv02:
> > * Node srv01:
> >
> >
> > Step2) Intercept one interface of the Heartbeat communication.
> >
> > # iptables -A INPUT -i eth1 -s ! 192.168.10.110 -j DROP
> > # iptables -A INPUT -i eth1 -s ! 192.168.10.120 -j DROP
> >
> >
> > Step3) The next trap is received in SNMP managers.
> >
> > (snip)
> > Jun 15 19:24:30 snmp-manager snmptrapd[4771]: 2011-06-15 19:24:30 <UNKNOWN> [UDP: [192.168.40.120]:59010]: DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (23014) 0:03:50.14       SNMPv2-MIB::snmpTrapOID.0 = OID: LINUX-HA-MIB::LHAIFStatusUpdate        LINUX-HA-MIB::LHANodeName = STRING: srv01       LINUX-HA-MIB::LHAIFName = STRING: eth1       LINUX-HA-MIB::LHAIFStatus = INTEGER: down(2)
> >    ----> No problem.
> > Jun 15 19:24:32 snmp-manager snmptrapd[4771]: 2011-06-15 19:24:32 <UNKNOWN> [UDP: [192.168.40.110]:44001]: DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (23597) 0:03:55.97       SNMPv2-MIB::snmpTrapOID.0 = OID: LINUX-HA-MIB::LHANodeStatusUpdate      LINUX-HA-MIB::LHANodeName = STRING: srv02       LINUX-HA-MIB::LHANodeStatus = INTEGER: active(3)
> >    ----> The trap of active is improper in this timing.
> > Jun 15 19:24:34 snmp-manager snmptrapd[4771]: 2011-06-15 19:24:34 <UNKNOWN> [UDP: [192.168.40.110]:44001]: DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (23803) 0:03:58.03       SNMPv2-MIB::snmpTrapOID.0 = OID: LINUX-HA-MIB::LHAIFStatusUpdate        LINUX-HA-MIB::LHANodeName = STRING: srv02       LINUX-HA-MIB::LHAIFName = STRING: eth1       LINUX-HA-MIB::LHAIFStatus = INTEGER: down(2)
> >    ----> No problem.
> > (snip)
> >
> > Between the traps which interface intercepted, it is strange that the active trap of the node comes.
> >
> > And I think that it is necessary for the active trap to be sent in an earlier timing.
> >
> >
> > This problem seems to happen in Heartbeat2.1.4.
> >
> > I watched some sources, but think that client_lib of Heartbeat has a problem somehow or other.
> > Transmitted F_STATUS message is late and seems to be handled.
> >
> >
> > Best Regards,
> > Hideo Yamauchi.
> >
> >
>
_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


lars.ellenberg at linbit

Jul 21, 2011, 5:27 PM

Post #2 of 38 (1816 views)
Re: The active trap of the SNMP is delayed. [In reply to]

On Tue, Jul 19, 2011 at 11:04:51AM +0900, renayama19661014 [at] ybb wrote:
> Hi All,
>
> We are troubled in the face of this problem.
> Please give advice.
>
> * This problem changed the destination of the mailing list to seem to be a problem of the HA.
>
> Best Regards,
> Hideo Yamauchi.
>
>
>
> --- On Fri, 2011/6/17, renayama19661014 [at] ybb <renayama19661014 [at] ybb> wrote:
>
> > Hi All,
> >
> > I registered this problem in Bugzilla.
> >
> > * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2604
> >
> > Best Regards,
> > Hideo Yamauch.
> >
> > --- On Wed, 2011/6/15, renayama19661014 [at] ybb <renayama19661014 [at] ybb> wrote:
> >
> > > Hi All,
> > >
> > > I found a problem with a trap of the SNMP.(from hbagent.)
> > >
> > > A trap of active of the node seems to have possibilities to be delayed.
> > >
> > > In addition, this problem sometimes occurs and does not always occur.
> > >
> > >
> > > I confirmed it in the next procedure.
> > >
> > > Step1) Start a node.
> > >
> > > ============
> > > Last updated: Wed Jun 15 19:23:39 2011
> > > Stack: Heartbeat
> > > Current DC: srv02 (afe72fff-b7b4-4663-b845-872df29c635d) - partition WITHOUT quorum
> > > Version: 1.0.11-6e010d6b0d49a6b929d17c0114e9d2d934dc8e04
> > > 2 Nodes configured, unknown expected votes
> > > 1 Resources configured.
> > > ============
> > >
> > > Online: [ srv01 srv02 ]
> > >
> > >  Resource Group: group-1
> > >      prmDummy1  (ocf::heartbeat:Dummy): Started srv01
> > >
> > > Migration summary:
> > > * Node srv02:
> > > * Node srv01:
> > >
> > >
> > > Step2) Intercept one interface of the Heartbeat communication.
> > >
> > > # iptables -A INPUT -i eth1 -s ! 192.168.10.110 -j DROP
> > > # iptables -A INPUT -i eth1 -s ! 192.168.10.120 -j DROP
> > >
> > >
> > > Step3) The next trap is received in SNMP managers.
> > >
> > > (snip)
> > > Jun 15 19:24:30 snmp-manager snmptrapd[4771]: 2011-06-15 19:24:30 <UNKNOWN> [UDP: [192.168.40.120]:59010]: DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (23014) 0:03:50.14       SNMPv2-MIB::snmpTrapOID.0 = OID: LINUX-HA-MIB::LHAIFStatusUpdate        LINUX-HA-MIB::LHANodeName = STRING: srv01       LINUX-HA-MIB::LHAIFName = STRING: eth1       LINUX-HA-MIB::LHAIFStatus = INTEGER: down(2)
> > >    ----> No problem.
> > > Jun 15 19:24:32 snmp-manager snmptrapd[4771]: 2011-06-15 19:24:32 <UNKNOWN> [UDP: [192.168.40.110]:44001]: DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (23597) 0:03:55.97       SNMPv2-MIB::snmpTrapOID.0 = OID: LINUX-HA-MIB::LHANodeStatusUpdate      LINUX-HA-MIB::LHANodeName = STRING: srv02       LINUX-HA-MIB::LHANodeStatus = INTEGER: active(3)
> > >    ----> The trap of active is improper in this timing.

Why?

> > > Jun 15 19:24:34 snmp-manager snmptrapd[4771]: 2011-06-15 19:24:34 <UNKNOWN> [UDP: [192.168.40.110]:44001]: DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (23803) 0:03:58.03       SNMPv2-MIB::snmpTrapOID.0 = OID: LINUX-HA-MIB::LHAIFStatusUpdate        LINUX-HA-MIB::LHANodeName = STRING: srv02       LINUX-HA-MIB::LHAIFName = STRING: eth1       LINUX-HA-MIB::LHAIFStatus = INTEGER: down(2)
> > >    ----> No problem.
> > > (snip)
> > >
> > > Between the traps which interface intercepted, it is strange that the active trap of the node comes.
> > >
> > > And I think that it is necessary for the active trap to be sent in an earlier timing.
> > >
> > >
> > > This problem seems to happen in Heartbeat2.1.4.
> > >
> > > I watched some sources, but think that client_lib of Heartbeat has a problem somehow or other.
> > > Transmitted F_STATUS message is late and seems to be handled.

hbagent is no longer in the heartbeat code.
According to mercurial, it was removed three years ago.
I doubt it is/was used by many.
So I fear you won't get much help for this.


Still, I don't see "the problem".
You have two communication channels configured.
You block one.
You get a *link* down trap, immediately, probably because sending fails
locally if you do iptables -j DROP.

> > > Jun 15 19:24:30 snmp-manager snmptrapd[4771]: LHAIFStatusUpdate LHANodeName srv01 LHAIFName eth1 LHAIFStatus down(2)


You get a *node* active.
Why do you think this is wrong?
Which timing would have been "proper", and why?

> > > Jun 15 19:24:32 snmp-manager snmptrapd[4771]: LHANodeStatusUpdate LHANodeName srv02 LHANodeStatus active(3)


And after timeout, you get the *link* down to the other node.

> > > Jun 15 19:24:34 snmp-manager snmptrapd[4771]: LHAIFStatusUpdate LHANodeName srv02 LHAIFName eth1 LHAIFStatus down(2)


--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


renayama19661014 at ybb

Jul 21, 2011, 6:18 PM

Post #3 of 38 (1831 views)
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Lars,

Thank you for your comment.

> You get a *node* active.
> Why do you think this is wrong?
> Which timing would have been "proper", and why?

When I investigated this earlier, I modified the source (adding debug output) and obtained the following result.

I synchronized at the time of each node and took log.

The srv01 node processed the F_STATUS message for the active status at 16:44:41.
----------------------------------------------------------------
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: ###yamauchi send_cluster_msg() : add_controls ###
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG: Dumping message with 12 fields
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[0] : [t=status]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[1] : [st=active]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[2] : [dt=5dc0]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[3] : [protocol=1]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[4] : [src=srv01]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[5] : [(1)srcuuid=0x9f292d8(36 27)]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[6] : [seq=a]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[7] : [hg=4ddb360f]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[8] : [ts=4def2869]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[9] : [ld=0.17 0.07 0.01 2/73 14132]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[10] : [ttl=3]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[11] : [auth=1 ee7d14643b83b7e49684cf0d679ee7e6a0ea3aaa]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: ###yamauchi HBDoMsg_T_STATUS RECV : heartbeat_monitor NOCHANGE
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG: Dumping message with 12 fields
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[0] : [t=status]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[1] : [st=active]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[2] : [dt=5dc0]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[3] : [protocol=1]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[4] : [src=srv01]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[5] : [(1)srcuuid=0x9f292d8(36 27)]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[6] : [seq=a]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[7] : [hg=4ddb360f]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[8] : [ts=4def2869]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[9] : [ld=0.17 0.07 0.01 2/73 14132]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[10] : [ttl=3]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: MSG[11] : [auth=1 ee7d14643b83b7e49684cf0d679ee7e6a0ea3aaa]
Jun 8 16:44:41 srv01 heartbeat: [14110]: info: Local status now set to: 'active'
----------------------------------------------------------------

But the srv02 node did not receive the F_STATUS message until 16:47:04.
----------------------------------------------------------------
Jun 8 16:47:04 srv02 lha-snmpagent: [6707]: info: #### yamauchi ##### T_STATUS
Jun 8 16:47:04 srv02 heartbeat: [6690]: info: MSG[10] : [ttl=3]
Jun 8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG: Dumping message with 12 fields
Jun 8 16:47:04 srv02 heartbeat: [6690]: info: MSG[11] : [auth=1 1fef495857b200940cb7fcb61223c85b299a6a99]
Jun 8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[0] : [t=status]
Jun 8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[1] : [st=active]
Jun 8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[2] : [dt=5dc0]
Jun 8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[3] : [protocol=1]
Jun 8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[4] : [src=srv01]
Jun 8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[5] : [(1)srcuuid=0x98dcb20(36 27)]
Jun 8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[6] : [seq=a]
Jun 8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[7] : [hg=4ddb360f]
Jun 8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[8] : [ts=4def2869]
Jun 8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[9] : [ld=0.17 0.07 0.01 2/73 14132]
Jun 8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[10] : [ttl=3]
Jun 8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[11] : [auth=1 ee7d14643b83b7e49684cf0d679ee7e6a0ea3aaa]
Jun 8 16:47:04 srv02 lha-snmpagent: [6707]: info: #### yamauchi ##### node_callback() call
Jun 8 16:47:04 srv02 lha-snmpagent: [6707]: notice: Status update: Node srv01 now has status active
Jun 8 16:47:05 srv02 lha-snmpagent: [6707]: info: node 1: srv02, type: normal, status: active
----------------------------------------------------------------
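
As a quick check of the gap between the two logs above (16:44:41 on srv01, 16:47:04 on srv02), here is a small C sketch. The helper names tod_to_seconds and delay_seconds are made up for this example, and it assumes both timestamps fall on the same day.

```c
#include <stdio.h>

/* Convert an "HH:MM:SS" time-of-day string to seconds since midnight.
 * Returns -1 if the string does not parse. */
static int tod_to_seconds(const char *tod)
{
    int h, m, s;

    if (sscanf(tod, "%d:%d:%d", &h, &m, &s) != 3) {
        return -1;
    }
    return h * 3600 + m * 60 + s;
}

/* Seconds elapsed between two times on the same day (handled after sent),
 * or -1 if either string does not parse. */
static int delay_seconds(const char *sent, const char *handled)
{
    int a = tod_to_seconds(sent);
    int b = tod_to_seconds(handled);

    if (a < 0 || b < 0) {
        return -1;
    }
    return b - a;
}
```

With the two timestamps above, delay_seconds("16:44:41", "16:47:04") gives 143, that is, 2 minutes 23 seconds.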

I think that the active trap should be handled earlier.

What do you think?

Best Regards,
Hideo Yamauchi.




lars.ellenberg at linbit

Jul 21, 2011, 8:09 PM

Post #4 of 38 (1833 views)
Re: The active trap of the SNMP is delayed. [In reply to]

To first rule out the obvious:
Double check the time of your servers.
NTP enabled, and really synchronized?

Well, I guess that's what you meant with
> I synchronized at the time of each node and took log.

So, OK.

I expect the "trap" to be sent when the agent daemon
processes the message.

The message should be processed when it is delivered to this daemon --
unless the daemon was busy doing other stuff... it is a single-threaded
daemon, after all, maybe it blocked gathering information, sending or
receiving data?

It should be delivered "immediately" when the heartbeat core messaging
layer receives it from the network.

You need to figure out where the observed delay happens.

Do a tcpdump, to find out the timing of the messages on the wire.
Tune up debugging (or add your own) in the message core,
and add debugging to the hbagent.

If you can pin down where the delay happens,
you should be able to find what is causing the delay.

Is the message delayed (or lost and retransmitted?) on the network?
Is it delayed in the messaging core?
Both seem unlikely.

Or was it dispatched to the hbagent in time,
but the hbagent did not notice until two and a half minutes later?
If so, what did the agent do all this time?
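
One way to organise the timestamps collected from tcpdump and the added debug output is to line them up per observation point and look for the largest gap. The following is only a sketch of that bookkeeping: the struct hop type, the slowest_hop function and the idea of recording one timestamp per stage are assumptions made for illustration, not part of heartbeat or hbagent.

```c
#include <stddef.h>

/* One observation point along the message path (names are illustrative). */
struct hop {
    const char *name;   /* where the timestamp was taken */
    long        t_sec;  /* time of observation, seconds since midnight */
};

/* Return the index i (>= 1) of the hop with the largest gap to hop i-1.
 * Returns 0 if fewer than two hops are given or if every timestamp goes
 * backwards relative to its predecessor. */
static size_t slowest_hop(const struct hop *hops, size_t n)
{
    size_t worst = 0;
    long worst_gap = -1;
    size_t i;

    for (i = 1; i < n; i++) {
        long gap = hops[i].t_sec - hops[i - 1].t_sec;

        if (gap > worst_gap) {
            worst_gap = gap;
            worst = i;
        }
    }
    return worst;
}
```

Filled with one entry each for the sender, the wire, the receiving messaging core and the agent, the returned index names the stage in front of which the message sat longest.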

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


renayama19661014 at ybb

Jul 21, 2011, 9:13 PM

Post #5 of 38 (1820 views)
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Lars,

Thank you for the advice.
I will check the details of the problem again using tools such as tcpdump.

Please wait.

Best Regards,
Hideo Yamauchi.


--- On Fri, 2011/7/22, Lars Ellenberg <lars.ellenberg [at] linbit> wrote:

> On Fri, Jul 22, 2011 at 10:18:18AM +0900, renayama19661014 [at] ybb wrote:
> > Hi Lars,
> >
> > Thank you for comment.
> >
> > > You get a *node* active.
> > > Why do you think this is wrong?
> > > Which timing would have been "proper", and why?
> >
> > When I examined it before, I changed a source and obtained the following result.
> >
> > I synchronized at the time of each node and took log.
> >
> > It is 16:44:41 that srv01 node processed F_STATUS message of active.
> > ----------------------------------------------------------------
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: ###yamauchi send_cluster_msg() : add_controls ###
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG: Dumping message with 12 fields
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[0] : [t=status]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[1] : [st=active]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[2] : [dt=5dc0]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[3] : [protocol=1]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[4] : [src=srv01]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[5] : [(1)srcuuid=0x9f292d8(36 27)]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[6] : [seq=a]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[7] : [hg=4ddb360f]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[8] : [ts=4def2869]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[9] : [ld=0.17 0.07 0.01 2/73 14132]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[10] : [ttl=3]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[11] : [auth=1 ee7d14643b83b7e49684cf0d679ee7e6a0ea3aaa]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: ###yamauchi HBDoMsg_T_STATUS RECV : heartbeat_monitor NOCHANGE
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG: Dumping message with 12 fields
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[0] : [t=status]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[1] : [st=active]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[2] : [dt=5dc0]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[3] : [protocol=1]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[4] : [src=srv01]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[5] : [(1)srcuuid=0x9f292d8(36 27)]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[6] : [seq=a]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[7] : [hg=4ddb360f]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[8] : [ts=4def2869]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[9] : [ld=0.17 0.07 0.01 2/73 14132]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[10] : [ttl=3]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: MSG[11] : [auth=1 ee7d14643b83b7e49684cf0d679ee7e6a0ea3aaa]
> > Jun  8 16:44:41 srv01 heartbeat: [14110]: info: Local status now set to: 'active'
> > ----------------------------------------------------------------
> >
> > But, it is 16:47:04 that srv02 node received F_STATUS message.
> > ----------------------------------------------------------------
> > Jun  8 16:47:04 srv02 lha-snmpagent: [6707]: info: #### yamauchi ##### T_STATUS
> > Jun  8 16:47:04 srv02 heartbeat: [6690]: info: MSG[10] : [ttl=3]
> > Jun  8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG: Dumping message with 12 fields
> > Jun  8 16:47:04 srv02 heartbeat: [6690]: info: MSG[11] : [auth=1 1fef495857b200940cb7fcb61223c85b299a6a99]
> > Jun  8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[0] : [t=status]
> > Jun  8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[1] : [st=active]
> > Jun  8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[2] : [dt=5dc0]
> > Jun  8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[3] : [protocol=1]
> > Jun  8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[4] : [src=srv01]
> > Jun  8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[5] : [(1)srcuuid=0x98dcb20(36 27)]
> > Jun  8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[6] : [seq=a]
> > Jun  8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[7] : [hg=4ddb360f]
> > Jun  8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[8] : [ts=4def2869]
> > Jun  8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[9] : [ld=0.17 0.07 0.01 2/73 14132]
> > Jun  8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[10] : [ttl=3]
> > Jun  8 16:47:04 srv02 lha-snmpagent: [6707]: info: MSG[11] : [auth=1 ee7d14643b83b7e49684cf0d679ee7e6a0ea3aaa]
> > Jun  8 16:47:04 srv02 lha-snmpagent: [6707]: info: #### yamauchi ##### node_callback() call
> > Jun  8 16:47:04 srv02 lha-snmpagent: [6707]: notice: Status update: Node srv01 now has status active
> > Jun  8 16:47:05 srv02 lha-snmpagent: [6707]: info: node 1: srv02, type: normal, status: active
> > ----------------------------------------------------------------
> >
> > I think that trap of active should be handled earlier.
> >
> > How do you think?
>
> To first rule out the obvious:
> Double check the time of your servers.
> NTP enabled, and really synchronized?
>
> Well, I guess that's what meant with
> > I synchronized at the time of each node and took log.
>
> So, OK.
>
> I expect the "trap" to be send when the agent daemon
> processes the message.
>
> The message should be processed when it is delivered to this daemon --
> unless the daemon was busy doing other stuff...  it is a single threaded
> daemon, after all, maybe it blocked gathering information, sending or
> receiving data?
>
> It should be delivered "immediately" when the heartbeat core messaging
> layer receives it from the network.
>
> You need to figure out where the observed delay happens.
>
> Do a tcpdump, to find out the timing of the messages on the wire.
> Tune up debugging (or add your own) in the message core,
> and add debugging to the hbagent.
>
> If you can pin down where the delay happens,
> you should be able to find what is causing the delay.
>
> Is the message delayed (or lost and retransmitted?) on the network?
> Is it delayed in the messaging core?
> Both seem unlikely.
>
> Or was it dispatched to the hbagent in time,
> but the hbagent did not notice until two and a half minutes later?
> If so, what did the agent do all this time?
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> _______________________________________________
> Linux-HA mailing list
> Linux-HA [at] lists
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>


renayama19661014 at ybb

Jul 25, 2011, 9:43 PM

Post #6 of 38 (1797 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Lars,
Hi All,

The cause of the delay has become clear.

The problem is timing-dependent.

If hbagent receives an F_STATUS message while it is waiting for the reply to an API call, the F_STATUS message is queued.

The queued message is handled only when hbagent next catches an event from Heartbeat.
In practice, that means it is handled at the time of an event such as one of the interconnects going down.

As a result, the active trap for the node is sent only when the interconnect fails.

/*
 * Read an API message.  All other messages are enqueued to be read later.
 */
static struct ha_msg *
read_api_msg(llc_private_t* pi)
{

    for (;;) {
        struct ha_msg*    msg;
        const char *    type;

        pi->chan->ops->waitin(pi->chan);
        if (pi->chan->ch_status == IPC_DISCONNECT){
            break;
        }
        if ((msg=msgfromIPC(pi->chan, 0)) == NULL) {
            ha_api_perror("read_api_msg: "
                      "Cannot read reply from IPC channel");
            continue;
        }
        if ((type=ha_msg_value(msg, F_TYPE)) != NULL
        &&    strcmp(type, T_APIRESP) == 0) {
            return(msg);
        }
        /* Got an unexpected non-api message */
        /* Queue it up for reading later */
        enqueue_msg(pi, msg);
    }
    /*NOTREACHED*/
    return(NULL);
}
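
To make the timing problem concrete, here is a minimal, self-contained simulation. It is not Heartbeat code: the array-based wire and pending queues and the names read_api_reply, loop_once and demo are hypothetical stand-ins for the IPC socket, the client-side queue filled by enqueue_msg, and the hbagent main loop. It assumes a main loop that only dispatches messages when the socket itself is readable.

```c
#include <assert.h>
#include <string.h>

#define MAXQ 8

/* Hypothetical stand-ins: "wire" is the IPC socket, "pending" the client-side queue. */
struct queue {
    const char *item[MAXQ];
    int head, tail;
};

static struct queue wire, pending;

static void push(struct queue *q, const char *m)
{
    assert(q->tail < MAXQ);
    q->item[q->tail++] = m;
}

static const char *pop(struct queue *q)
{
    return q->head < q->tail ? q->item[q->head++] : NULL;
}

static int readable(const struct queue *q)
{
    return q->head < q->tail;
}

/* Mirrors read_api_msg(): return the API reply, set everything else aside. */
static const char *read_api_reply(void)
{
    const char *m;

    while ((m = pop(&wire)) != NULL) {
        if (strcmp(m, "apiresp") == 0)
            return m;
        push(&pending, m);
    }
    return NULL;
}

/* One main-loop pass that dispatches only when the socket is readable.
 * Returns the number of messages handled. */
static int loop_once(void)
{
    int handled = 0;

    if (!readable(&wire))
        return 0;               /* a real loop would sleep in select() here */
    while (pop(&pending) != NULL)
        handled++;              /* queued messages are finally seen */
    while (pop(&wire) != NULL)
        handled++;
    return handled;
}

/* Scenario from this thread; returns 0 if the status message was stranded. */
static int demo(void)
{
    memset(&wire, 0, sizeof wire);
    memset(&pending, 0, sizeof pending);

    push(&wire, "status");      /* arrives while an API call is outstanding */
    push(&wire, "apiresp");
    if (read_api_reply() == NULL)
        return 1;
    if (loop_once() != 0)       /* socket idle: "status" is not handled */
        return 2;
    push(&wire, "linkdown");    /* later event, e.g. an interconnect failing */
    if (loop_once() != 2)       /* only now are both messages handled */
        return 3;
    return 0;
}
```

Calling demo() from a small main() walks through the sequence: the status message is set aside during the API call, the next loop pass handles nothing, and the message is only processed once an unrelated event makes the socket readable again.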



I think that the following correction is necessary.
snmp_subagent/hbagent.c
(snip)
            } else {

                /* snmp request */
                snmp_read(&fdset);

                ret = handle_heartbeat_msg();  /* added: read the queued messages */
            }
(snip)

Best Regards,
Hideo Yamauchi.




ygao at novell

Jul 27, 2011, 8:35 AM

Post #7 of 38 (1783 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

On 07/26/11 12:43, renayama19661014 [at] ybb wrote:
> Hi Lars,
> Hi All,
>
> The cause of the delay has become clear.
>
> The problem is timing-dependent.
>
> If hbagent receives an F_STATUS message while it is waiting for the reply to an API call,
Under this circumstance, is there a specific heartbeat op that hbagent
is waiting for?

> the F_STATUS message is queued.
>
> The queued message is handled only when hbagent next catches an event from Heartbeat.
> In practice, that means it is handled at the time of an event such as one of the interconnects going down.
>
> As a result, the active trap for the node is sent only when the interconnect fails.
>
> /*
>  * Read an API message.  All other messages are enqueued to be read later.
>  */
> static struct ha_msg *
> read_api_msg(llc_private_t* pi)
> {
>
>     for (;;) {
>         struct ha_msg*    msg;
>         const char *    type;
>
>         pi->chan->ops->waitin(pi->chan);
>         if (pi->chan->ch_status == IPC_DISCONNECT){
>             break;
>         }
>         if ((msg=msgfromIPC(pi->chan, 0)) == NULL) {
>             ha_api_perror("read_api_msg: "
>                       "Cannot read reply from IPC channel");
>             continue;
>         }
>         if ((type=ha_msg_value(msg, F_TYPE)) != NULL
>         &&    strcmp(type, T_APIRESP) == 0) {
>             return(msg);
>         }
>         /* Got an unexpected non-api message */
>         /* Queue it up for reading later */
>         enqueue_msg(pi, msg);
>     }
>     /*NOTREACHED*/
>     return(NULL);
> }
>
>
>
> I think that the following correction is necessary.
> snmp_subagent/hbagent.c
> (snip)
>             } else {
>
>                 /* snmp request */
>                 snmp_read(&fdset);
>
>                 ret = handle_heartbeat_msg();  /* added: read the queued messages */
>             }
> (snip)
I'm still confused about invoking handle_heartbeat_msg() when select()
finds that the SNMP socket has input. Is that the appropriate time?

Regards,
Yan
--
Gao,Yan <ygao [at] suse>
Software Engineer
China Server Team, SUSE.


renayama19661014 at ybb

Jul 27, 2011, 4:51 PM

Post #8 of 38 (1768 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Yan,

Thank you for comment.

> > Hi Lars,
> > Hi All,
> >
> > The cause of the delay has become clear.
> >
> > The problem is timing-dependent.
> >
> > If hbagent receives an F_STATUS message while it is waiting for the reply to an API call,
> Under this circumstance, is there a specific heartbeat op that hbagent
> is waiting for?

Yes.

However, the F_STATUS message that hbagent queues is one from a very early stage.
I will pinpoint which hb_api call in hbagent is involved.

With the following modification, I obtained a log of the queueing.

(snip)
/*
 * Read an API message.  All other messages are enqueued to be read later.
 */
static struct ha_msg *
read_api_msg(llc_private_t* pi)
{
    for (;;) {
        struct ha_msg*  msg;
        const char *    type;

        pi->chan->ops->waitin(pi->chan);
        if (pi->chan->ch_status == IPC_DISCONNECT){
            break;
        }
        if ((msg=msgfromIPC(pi->chan, 0)) == NULL) {
            ha_api_perror("read_api_msg: "
                      "Cannot read reply from IPC channel");
            continue;
        }
        if ((type=ha_msg_value(msg, F_TYPE)) != NULL
        &&    strcmp(type, T_APIRESP) == 0) {
            return(msg);
        }
        /* Got an unexpected non-api message */
        /* Queue it up for reading later */
        /* yamauchi: debug logging to show which status messages get queued */
        /* type may be NULL here, so check it before comparing */
        if (type != NULL && strcasecmp(type, T_STATUS) == 0) {
            cl_log(LOG_INFO, "##### yamuchi enqure_msg ()#####");
            cl_log_message(LOG_INFO, msg);
        }
        enqueue_msg(pi, msg);
    }
    /*NOTREACHED*/
    return(NULL);
}

(snip)
Jul 27 19:13:50 srv01 ccm: [5432]: info: ##### yamuchi enqure_msg ()#####
Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG: Dumping message with 12 fields
Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[0] : [t=status]
Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[1] : [st=active]
Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[2] : [dt=6590]
Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[3] : [protocol=1]
Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[4] : [src=srv02]
Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[5] : [(1)srcuuid=0xa006540(36 27)]
Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[6] : [seq=6]
Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: ##### yamuchi enqure_msg ()#####
Jul 27 19:13:50 srv01 stonithd: [5435]: info: ##### yamuchi enqure_msg ()#####
Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[7] : [hg=4ddb3648]
Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG: Dumping message with 12 fields
Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG: Dumping message with 12 fields
Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[8] : [ts=4e2fe4dd]
Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[0] : [t=status]
Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[0] : [t=status]
Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[9] : [ld=0.04 0.12 0.15 3/89 5394]
Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[1] : [st=active]
Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[1] : [st=active]
Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[10] : [ttl=3]
Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[2] : [dt=6590]
Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[2] : [dt=6590]
Jul 27 19:13:50 srv01 ccm: [5432]: info: MSG[11] : [auth=1 69619762aa14655cdccd9778ec4c4861a15a0f19]
Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[3] : [protocol=1]
Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[3] : [protocol=1]
Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[4] : [src=srv02]
Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[4] : [src=srv02]
Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[5] : [(1)srcuuid=0x84255e0(36 27)]
Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[5] : [(1)srcuuid=0x83b7bf8(36 27)]
Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[6] : [seq=6]
Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[6] : [seq=6]
Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[7] : [hg=4ddb3648]
Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[7] : [hg=4ddb3648]
Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[8] : [ts=4e2fe4dd]
Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[8] : [ts=4e2fe4dd]
Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[9] : [ld=0.04 0.12 0.15 3/89 5394]
Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[9] : [ld=0.04 0.12 0.15 3/89 5394]
Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[10] : [ttl=3]
Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[10] : [ttl=3]
Jul 27 19:13:50 srv01 lha-snmpagent: [5438]: info: MSG[11] : [auth=1 69619762aa14655cdccd9778ec4c4861a15a0f19]
Jul 27 19:13:50 srv01 stonithd: [5435]: info: MSG[11] : [auth=1 69619762aa14655cdccd9778ec4c4861a15a0f19]
(snip)
Jul 27 19:13:52 srv01 cib: [5433]: info: ##### yamuchi enqure_msg ()#####
Jul 27 19:13:52 srv01 cib: [5433]: info: MSG: Dumping message with 12 fields
Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[0] : [t=status]
Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[1] : [st=active]
Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[2] : [dt=6590]
Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[3] : [protocol=1]
Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[4] : [src=srv02]
Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[5] : [(1)srcuuid=0x8fc9060(36 27)]
Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[6] : [seq=6]
Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[7] : [hg=4ddb3648]
Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[8] : [ts=4e2fe4dd]
Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[9] : [ld=0.04 0.12 0.15 3/89 5394]
Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[10] : [ttl=3]
Jul 27 19:13:52 srv01 cib: [5433]: info: MSG[11] : [auth=1 69619762aa14655cdccd9778ec4c4861a15a0f19]
(snip)
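
The ts field in these dumps looks like the sender's timestamp written as hexadecimal seconds since the Unix epoch. That reading is an assumption based on the values themselves, since the field is not explained in this thread. Under that assumption, a small sketch like the following decodes it so it can be compared with the syslog time of the enqueue. The helper names and the UTC+9 offset for the servers' local time are illustrative assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a hex "ts" value such as "4e2fe4dd" into epoch seconds. */
static unsigned long parse_ts(const char *hex)
{
    char *end;
    unsigned long v = strtoul(hex, &end, 16);

    assert(*end == '\0');       /* the whole string must be hex digits */
    return v;
}

/* Seconds since local midnight for a fixed UTC offset given in hours. */
static unsigned long seconds_of_day(unsigned long epoch, int utc_offset_hours)
{
    long s = (long)(epoch % 86400UL) + utc_offset_hours * 3600L;

    return (unsigned long)((s % 86400L + 86400L) % 86400L);
}

/* Decode the ts from the log above; returns 0 if it falls at 19:13:49 (UTC+9). */
static int demo(void)
{
    unsigned long sod = seconds_of_day(parse_ts("4e2fe4dd"), 9);

    return (sod / 3600 == 19 && (sod % 3600) / 60 == 13 && sod % 60 == 49) ? 0 : 1;
}
```

With ts=4e2fe4dd this gives 19:13:49, one second before the 19:13:50 syslog stamps on the enqueue lines, which would be consistent with the status message being queued as soon as it arrived rather than being delayed on the network.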


>
> > the F_STATUS message is queued.
> >
> > The queued message is handled only when hbagent next catches an event from Heartbeat.
> > In practice, that means it is handled at the time of an event such as one of the interconnects going down.
> >
> > As a result, the active trap for the node is sent only when the interconnect fails.
> >
> > /*
> >  * Read an API message.  All other messages are enqueued to be read later.
> >  */
> > static struct ha_msg *
> > read_api_msg(llc_private_t* pi)
> > {
> >
> >     for (;;) {
> >         struct ha_msg*    msg;
> >         const char *    type;
> >        
> >         pi->chan->ops->waitin(pi->chan);
> >         if (pi->chan->ch_status  == IPC_DISCONNECT){
> >             break;
> >         }
> >         if ((msg=msgfromIPC(pi->chan, 0)) == NULL) {
> >             ha_api_perror("read_api_msg: "
> >                       "Cannot read reply from IPC channel");
> >             continue;
> >         }
> >         if ((type=ha_msg_value(msg, F_TYPE)) != NULL
> >         &&    strcmp(type, T_APIRESP) == 0) {
> >             return(msg);
> >         }
> >         /* Got an unexpected non-api message */
> >         /* Queue it up for reading later */
> >         enqueue_msg(pi, msg);
> >     }
> >     /*NOTREACHED*/
> >     return(NULL);
> > }
> >
> >
> >
> > I think that the following correction is necessary.
> > snmp_subagent/hbagent.c
> > (snip)
> >                         } else {
> >
> >                                 /* snmp request */
> >                                 snmp_read(&fdset);
> >
> >                                 ret = handle_heartbeat_msg();  /* added: read the queued messages */
> >                         }
> > (snip)
> I'm still confused about invoking handle_heartbeat_msg() when select()
> finds that the SNMP socket has input. Is that the appropriate time?

Sorry.

This correction is only one example.
I am not very familiar with how hbagent handles messages, so I would appreciate your guidance on the correct fix.

Best Regards,
Hideo Yamauchi.

>
> Regards,
>   Yan
> --
> Gao,Yan <ygao [at] suse>
> Software Engineer
> China Server Team, SUSE.


renayama19661014 at ybb

Jul 27, 2011, 6:04 PM

Post #9 of 38 (1770 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Yan,

> However, the F_STATUS message that hbagent queues is one from a very early stage.
> I will pinpoint which hb_api call in hbagent is involved.

I confirmed it.

The F_STATUS message appears to be queued during the get_uuid processing.

--- For the next log, I added the FUNCTION macro to the calls to read_api_msg. ---
--- get_uuid appears in the first log line. ---
Jul 28 18:51:03 srv01 lha-snmpagent: [6538]: info: ##### yamuchi enqure_msg (): get_uuid #####
Jul 28 18:51:03 srv01 lha-snmpagent: [6538]: info: MSG: Dumping message with 12 fields
Jul 28 18:51:03 srv01 lha-snmpagent: [6538]: info: MSG[0] : [t=status]
Jul 28 18:51:03 srv01 lha-snmpagent: [6538]: info: MSG[1] : [st=active]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[2] : [dt=6590]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[3] : [protocol=1]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[4] : [src=srv02]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[5] : [(1)srcuuid=0x889db30(36 27)]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[6] : [seq=6]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[7] : [hg=4ddb3649]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[8] : [ts=4e313107]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[9] : [ld=0.16 0.04 0.01 2/89 6264]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[10] : [ttl=3]
Jul 28 18:51:04 srv01 lha-snmpagent: [6538]: info: MSG[11] : [auth=1 60410427f13e2377858cc0e403a8014c4704ab36]
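
The caller tagging used for this log can be sketched roughly as below. This is a hypothetical reconstruction, not the actual patch: the wrapper macro READ_API_MSG, the stub read_api_msg_tagged and the last_caller buffer are invented for illustration. Only the idea of passing the calling function's name (here via the C99 identifier __func__) comes from the message above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char last_caller[64];

/* Stub standing in for read_api_msg(); records who asked for the reply. */
static int read_api_msg_tagged(const char *caller)
{
    assert(caller != NULL);
    snprintf(last_caller, sizeof last_caller, "%s", caller);
    /* A real patch would print the caller next to each enqueued message,
     * for example through cl_log(). */
    return 0;
}

/* Expands at the call site, so __func__ names the calling function. */
#define READ_API_MSG() read_api_msg_tagged(__func__)

/* Hypothetical caller, named after the function seen in the log. */
static int get_uuid(void)
{
    return READ_API_MSG();
}

/* Returns 0 if the recorded caller is "get_uuid". */
static int demo(void)
{
    if (get_uuid() != 0)
        return 2;
    return strcmp(last_caller, "get_uuid") == 0 ? 0 : 1;
}
```

Because the macro expands inside each caller, every call site is tagged without being edited individually, which is how a name such as get_uuid can end up in the first log line.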

In hbagent, I think the queueing happens at one of the following two calls.

(snip)
int
init_heartbeat(void)
{
(snip)
    /*
     * get uuid for trap message.
     * see: hbagentv2_update_diff() in hbagentv2.c
     */
    if (hb->llc_ops->get_uuid_by_name(hb, myid, &uuid) == HA_FAIL) {
        cl_log(LOG_ERR, "Cannot get mynodeid");
        cl_log(LOG_ERR, "REASON: %s", hb->llc_ops->errmsg(hb));
        return HA_FAIL;
    }

(snip)
int
walk_nodetable(void)
{
(snip)
#ifdef HAVE_NEW_HB_API
    /* the get_uuid_by_name is not available for STABLE_1_2 branch. */
    if (hb->llc_ops->get_uuid_by_name(hb, name, &uuid) == HA_FAIL) {
        cl_log(LOG_DEBUG, "Cannot get the uuid for node: %s", name);
    }
#endif /* HAVE_NEW_HB_API */
(snip)


Best Regards,
Hideo Yamauchi.






lars.ellenberg at linbit

Jul 28, 2011, 2:46 AM

Post #10 of 38 (1766 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

On Tue, Jul 26, 2011 at 01:43:00PM +0900, renayama19661014 [at] ybb wrote:
> Hi Lars,
> Hi All,
>
> The cause of the delay has become clear.
>
> The problem is timing-dependent.
>
> If hbagent receives an F_STATUS message while it is waiting for the reply to an API call, the F_STATUS message is queued.
>
> The queued message is handled only when hbagent next catches an event from Heartbeat.
> In practice, that means it is handled at the time of an event such as one of the interconnects going down.
>
> As a result, the active trap for the node is sent only when the interconnect fails.
>
> /*
>  * Read an API message.  All other messages are enqueued to be read later.
>  */
> static struct ha_msg *
> read_api_msg(llc_private_t* pi)
> {
>
>     for (;;) {
>         struct ha_msg*    msg;
>         const char *    type;
>
>         pi->chan->ops->waitin(pi->chan);
>         if (pi->chan->ch_status == IPC_DISCONNECT){
>             break;
>         }
>         if ((msg=msgfromIPC(pi->chan, 0)) == NULL) {
>             ha_api_perror("read_api_msg: "
>                       "Cannot read reply from IPC channel");
>             continue;
>         }
>         if ((type=ha_msg_value(msg, F_TYPE)) != NULL
>         &&    strcmp(type, T_APIRESP) == 0) {
>             return(msg);
>         }
>         /* Got an unexpected non-api message */
>         /* Queue it up for reading later */
>         enqueue_msg(pi, msg);
>     }
>     /*NOTREACHED*/
>     return(NULL);
> }
>
>
>
> I think the following fix is needed.
> snmp_subagent/hbagent.c
> (snip)
>                         } else {
>
>                                 /* snmp request */
>                                 snmp_read(&fdset);
>
>                                 ret = handle_heartbeat_msg(); ----> read the queued messages!

I suggest placing this before the select instead,
or immediately after each call that involves read_api_msg or
enqueue_msg.

It is probably easier to just place it before the select, or before any
other call that may sleep or block for some time.
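
The queueing effect described above, and what a drain step changes, can be
modelled in a few lines of C. This is only a sketch under assumptions, not
the hbagent or Heartbeat client code: pending_queue, wait_for_api_reply,
drain_pending and the message type strings "api-reply" and "status" are
hypothetical stand-ins for the real queue, read_api_msg, the queued-message
handler and the T_APIRESP / status message types.

```c
#include <stddef.h>
#include <string.h>

#define QUEUE_MAX 16

/* Hypothetical stand-in for the client-side queue filled by enqueue_msg(). */
struct pending_queue {
    const char *types[QUEUE_MAX];
    size_t      count;
};

/*
 * Model of read_api_msg(): scan incoming messages until the API reply
 * arrives. Every other message seen on the way is parked in the queue.
 * Returns the index of the reply in `incoming`, or -1 if there is none.
 */
int wait_for_api_reply(struct pending_queue *q,
                       const char *const *incoming, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(incoming[i], "api-reply") == 0) {
            return (int)i;
        }
        if (q->count < QUEUE_MAX) {
            q->types[q->count++] = incoming[i];   /* parked, not handled */
        }
    }
    return -1;
}

/*
 * Model of the drain step: handle everything that was parked so that it
 * does not wait for an unrelated later event. `handle` may be NULL.
 * Returns the number of messages taken off the queue.
 */
size_t drain_pending(struct pending_queue *q,
                     void (*handle)(const char *type))
{
    size_t handled = 0;

    for (size_t i = 0; i < q->count; i++) {
        if (handle != NULL) {
            handle(q->types[i]);
        }
        handled++;
    }
    q->count = 0;
    return handled;
}
```

In this model a "status" message that arrives ahead of the "api-reply"
stays in the queue until drain_pending() runs, so calling the drain just
before any blocking call bounds how long such a message can wait.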

As hbagent.c was dropped from the heartbeat source tree three years ago,
you will have to carry that patch yourself, I'm afraid.

That is, unless someone resurrects hbagent for current heartbeat,
if it is still applicable, and possibly improves it or integrates it
with the pacemaker side of things.


--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


renayama19661014 at ybb

Jul 28, 2011, 5:29 PM

Post #11 of 38 (1770 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Lars,
Hi Yan,
Hi All,

Thank you for your comments.

> On Tue, Jul 26, 2011 at 01:43:00PM +0900, renayama19661014 [at] ybb wrote:
> > Hi Lars,
> > Hi All,
> >
> > The cause of the delay is now clear.
> >
> > The problem is a matter of timing.
> >
> > When hbagent receives an F_STATUS message while it is waiting for the reply to an API call, the F_STATUS message is queued.
> >
> > The queued message is handled only when hbagent next catches an event from Heartbeat.
> > So it is handled at the time of a later event, such as one interconnect link going down.
> >
> > As a result, the active trap for the node is sent only when the interconnect fails.
> >
> > /*
> >  * Read an API message.  All other messages are enqueued to be read later.
> >  */
> > static struct ha_msg *
> > read_api_msg(llc_private_t* pi)
> > {
> >
> >     for (;;) {
> >         struct ha_msg*    msg;
> >         const char *    type;
> >        
> >         pi->chan->ops->waitin(pi->chan);
> >         if (pi->chan->ch_status  == IPC_DISCONNECT){
> >             break;
> >         }
> >         if ((msg=msgfromIPC(pi->chan, 0)) == NULL) {
> >             ha_api_perror("read_api_msg: "
> >                       "Cannot read reply from IPC channel");
> >             continue;
> >         }
> >         if ((type=ha_msg_value(msg, F_TYPE)) != NULL
> >         &&    strcmp(type, T_APIRESP) == 0) {
> >             return(msg);
> >         }
> >         /* Got an unexpected non-api message */
> >         /* Queue it up for reading later */
> >         enqueue_msg(pi, msg);
> >     }
> >     /*NOTREACHED*/
> >     return(NULL);
> > }
> >
> >
> >
> > I think the following fix is needed.
> > snmp_subagent/hbagent.c
> > (snip)
> >                         } else {
> >
> >                                 /* snmp request */
> >                                 snmp_read(&fdset);
> >
> >                                 ret = handle_heartbeat_msg(); ----> read the queued messages!
>
> I suggest placing this before the select instead,
> or immediately after each call that involves read_api_msg or
> enqueue_msg.
>
> It is probably easier to just place it before the select, or before any
> other call that may sleep or block for some time.

Thank you for suggesting a fix.
I would like to wait for Yan's opinion.

> As hbagent.c was dropped from the heartbeat source tree three years ago,
> you will have to carry that patch yourself, I'm afraid.
>
> That is, unless someone resurrects hbagent for current heartbeat,
> if it is still applicable, and possibly improves it or integrates it
> with the pacemaker side of things.

For the moment, we are not considering a fix on the Heartbeat side either.
We think fixing only Pacemaker-mgmt is enough.

Best Regards,
Hideo Yamauchi.



ygao at novell

Jul 28, 2011, 9:13 PM

Post #12 of 38 (1764 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

On 07/29/11 08:29, renayama19661014 [at] ybb wrote:
>>>
>>> I think the following fix is needed.
>>> snmp_subagent/hbagent.c
>>> (snip)
>>>                         } else {
>>>
>>>                                 /* snmp request */
>>>                                 snmp_read(&fdset);
>>>
>>>                                 ret = handle_heartbeat_msg(); ----> read the queued messages!
>>
>> I suggest placing this before the select instead,
>> or immediately after each call that involves read_api_msg or
>> enqueue_msg.
>>
>> It is probably easier to just place it before the select, or before any
>> other call that may sleep or block for some time.
>
> Thank you for suggesting a fix.
> I would like to wait for Yan's opinion.
>
>> As hbagent.c was dropped from the heartbeat source tree three years ago,
>> you will have to carry that patch yourself, I'm afraid.
>>
>> That is, unless someone resurrects hbagent for current heartbeat,
>> if it is still applicable, and possibly improves it or integrates it
>> with the pacemaker side of things.
>
> For the moment, we are not considering a fix on the Heartbeat side either.
> We think fixing only Pacemaker-mgmt is enough.
The attached patch places it at the end of the loop. I think it should
work. Please give it a test.
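
For readers without the attachment, the idea of draining at the end of each
loop pass can be sketched as below. This is not the attached patch and not
the hbagent source: agent_state, drain_queued, loop_once and the timeout
parameter are hypothetical names chosen for illustration, the read() call is
only a placeholder for snmp_read(), and the assumption that select() is
called with a timeout is mine.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical counters standing in for the queue of parked messages. */
struct agent_state {
    int queued;    /* messages parked while waiting for an API reply */
    int handled;   /* messages processed so far */
};

/* Stand-in for handle_heartbeat_msg(): process everything that is queued. */
int drain_queued(struct agent_state *st)
{
    int n = st->queued;

    st->handled += n;
    st->queued = 0;
    return n;
}

/*
 * One pass of a select()-based loop. Returns -1 if select() fails,
 * otherwise the number of queued messages drained at the end of the pass.
 */
int loop_once(struct agent_state *st, int snmp_fd, long timeout_ms)
{
    fd_set fdset;
    struct timeval tv;
    char buf[64];
    int ready;

    FD_ZERO(&fdset);
    FD_SET(snmp_fd, &fdset);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    ready = select(snmp_fd + 1, &fdset, NULL, NULL, &tv);
    if (ready < 0) {
        return -1;
    }
    if (ready > 0 && FD_ISSET(snmp_fd, &fdset)) {
        /* Placeholder for snmp_read(&fdset): consume the pending input. */
        ssize_t got = read(snmp_fd, buf, sizeof(buf));
        (void)got;
    }

    /* End of the loop body: drain on every pass, including timeouts. */
    return drain_queued(st);
}
```

Because the drain runs whether select() returned input or timed out, a
parked message waits at most one timeout interval in this sketch instead of
waiting for the next Heartbeat event.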

Regards,
Yan
--
Gao,Yan <ygao [at] suse>
Software Engineer
China Server Team, SUSE.

Attachments: pacemaker-mgmt-snmp-subagent.diff (0.51 KB)


renayama19661014 at ybb

Jul 28, 2011, 9:43 PM

Post #13 of 38 (1774 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Yan,

> The attached patch places it at the end of the loop. I think it should
> work. Please give it a test.

Thank you for making the patch.

I will verify its behavior and report the result.

Best Regards,
Hideo Yamauchi.



renayama19661014 at ybb

Aug 1, 2011, 6:14 PM

Post #14 of 38 (1735 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Yan,

I confirmed that, with the patch applied, the trap is sent reliably.

We request that the patch be applied to pacemaker-mgmt for both Pacemaker 1.0 and Pacemaker 1.1.

* After this fix, we hope a new version of pacemaker-mgmt for Pacemaker 1.0 will be released (pacemaker-mgmt-2.1.0?).

Thanks!

Hideo Yamauchi.



ygao at novell

Aug 3, 2011, 5:14 AM

Post #15 of 38 (1731 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Hideo,

On 08/02/11 09:14, renayama19661014 [at] ybb wrote:
> Hi Yan,
>
> I confirmed that, with the patch applied, the trap is sent reliably.
OK, thanks!

>
> We request that the patch be applied to pacemaker-mgmt for both Pacemaker 1.0 and Pacemaker 1.1.
Pushed. Since we don't have a separate branch, you might need to
back-port this patch to pacemaker-mgmt-2.0.0, which is compatible with
pacemaker-1.0.x.

>
> * After this fix, we hope a new version of pacemaker-mgmt for Pacemaker 1.0 will be released (pacemaker-mgmt-2.1.0?).
>
We'll probably tag a new version in the near future.

Regards,
Gao,Yan
--
Gao,Yan <ygao [at] suse>
Software Engineer
China Server Team, SUSE.


renayama19661014 at ybb

Aug 3, 2011, 5:13 PM

Post #16 of 38 (1730 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Yan,

> Pushed. Since we don't have a separate branch, you might need to
> back-port this patch to pacemaker-mgmt-2.0.0, which is compatible with
> pacemaker-1.0.x

Thanks!!

However, we need a release of pacemaker-mgmt for Pacemaker 1.0.

Would it be possible for you to apply the patch to a pacemaker-mgmt-2.0.0 repository and make a release?

* http://hg.clusterlabs.org/pacemaker/pygui/rev/18332eae086e

> > * After this fix, we hope a new version of pacemaker-mgmt for Pacemaker 1.0 will be released (pacemaker-mgmt-2.1.0?).
> >
> We'll probably tag a new version in the near future.

Ok.

Best Regards,
Hideo Yamauchi.




ygao at novell

Aug 21, 2011, 11:25 PM

Post #17 of 38 (1668 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Hideo,

On 08/04/11 08:13, renayama19661014 [at] ybb wrote:
> Hi Yan,
>
>> Pushed. Since we don't have a separate branch, you might need to
>> back-port this patch to pacemaker-mgmt-2.0.0, which is compatible with
>> pacemaker-1.0.x
>
> Thanks!!
>
> However, we need a release of pacemaker-mgmt for Pacemaker 1.0.
>
> Would it be possible for you to apply the patch to a pacemaker-mgmt-2.0.0 repository and make a release?
We'll probably need to ask Andrew to help move pacemaker-mgmt to git
later. Then we can create a separate branch/repo for pacemaker-mgmt-2.0
that stays compatible with pacemaker-1.0.

BTW, pacemaker-1.1 has been merged with the devel repo. The tip of
pacemaker-mgmt can be built against pacemaker-1.1 now.

Regards,
Gaoyan
--
Gao,Yan <ygao [at] suse>
Software Engineer
China Server Team, SUSE.


renayama19661014 at ybb

Aug 22, 2011, 12:05 AM

Post #18 of 38 (1660 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Yan,
Hi Andrew,

Thank you for your comment.

> Hi Hideo,
>
> On 08/04/11 08:13, renayama19661014 [at] ybb wrote:
> > Hi Yan,
> >
> >> Pushed. Since we don't have a separate branch, you might need to
> >> back-port this patch to pacemaker-mgmt-2.0.0, which is compatible with
> >> pacemaker-1.0.x
> >
> > Thanks!!
> >
> > However, we need a release of pacemaker-mgmt for Pacemaker 1.0.
> >
> > Would it be possible for you to apply the patch to a pacemaker-mgmt-2.0.0 repository and make a release?
> We'll probably need to ask Andrew to help move pacemaker-mgmt to git
> later. Then we can create a separate branch/repo for pacemaker-mgmt-2.0
> that stays compatible with pacemaker-1.0.

To Andrew: Please lend us a hand.

>
> BTW, pacemaker-1.1 has been merged with devel repo. The tip of
> pacemaker-mgmt can be built against pacemaker-1.1 now.

Many Thanks!!

Hideo Yamauchi.



andrew at beekhof

Sep 18, 2011, 5:19 PM

Post #19 of 38 (1552 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

On Mon, Aug 22, 2011 at 5:05 PM, <renayama19661014 [at] ybb> wrote:
> Hi Yan,
> Hi Andrew,
>
> Thank you for your comment.
>
>> Hi Hideo,
>>
>> On 08/04/11 08:13, renayama19661014 [at] ybb wrote:
>> > Hi Yan,
>> >
>> >> Pushed. Since we don't have a separate branch, you might need to
>> >> back-port this patch to pacemaker-mgmt-2.0.0, which is compatible with
>> >> pacemaker-1.0.x
>> >
>> > Thanks!!
>> >
>> > However, we need a release of pacemaker-mgmt for Pacemaker 1.0.
>> >
>> > Would it be possible for you to apply the patch to a pacemaker-mgmt-2.0.0 repository and make a release?
>> We'll probably need to ask Andrew to help move pacemaker-mgmt to git
>> later. Then we can create a separate branch/repo for pacemaker-mgmt-2.0
>> that stays compatible with pacemaker-1.0.
>
> To Andrew: Please lend us a hand.

I can create a place for it on clusterlabs.org or github if that's
where Yan would like it to live, but the conversion itself is for the
project owner to do (and verify) ;-)



renayama19661014 at ybb

Sep 18, 2011, 9:19 PM

Post #20 of 38 (1555 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Andrew,

> I can create a place for it on clusterlabs.org or github if that's
> where Yan would like it to live, but the conversion itself is for the
> project owner to do (and verify) ;-)

Thank you for your comment.

I will wait for Yan's opinion.

Best Regards,
Hideo Yamauchi.





ygao at novell

Sep 20, 2011, 8:09 AM

Post #21 of 38 (1551 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

On 09/19/11 12:19, renayama19661014 [at] ybb wrote:
> Hi Andrew,
>
>> I can create a place for it on clusterlabs.org or github if that's
>> where Yan would like it to live,
Thanks!

>> but the conversion itself is for the
>> project owner to do (and verify) ;-)
OK. I'll convert it and decide where to host it.

Regards,
Gaoyan
--
Gao,Yan <ygao [at] suse>
Software Engineer
China Server Team, SUSE.


renayama19661014 at ybb

Nov 23, 2011, 11:48 PM

Post #22 of 38 (1477 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Yan,

Regarding this matter, have you decided yet?

Best Regards,
Hideo Yamauchi.



ygao at suse

Nov 24, 2011, 12:50 AM

Post #23 of 38 (1479 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Hideo,

On 11/24/11 15:48, renayama19661014 [at] ybb wrote:
> Hi Yan,
>
> Regarding this matter, have you decided yet?
>
> Best Regards,
> Hideo Yamauchi.
>
> --- On Wed, 2011/9/21, Gao,Yan <ygao [at] novell> wrote:
>
>>
>> On 09/19/11 12:19, renayama19661014 [at] ybb wrote:
>>> Hi Andrew,
>>>
>>>> I can create a place for it on clusterlabs.org or github if that's
>>>> where Yan would like it to live,
>> Thanks!
>>
>>>> but the conversion itself is for the
>>>> project owner to do (and verify) ;-)
>> OK. I'll convert it and decide where to host it.
I've converted it and, for now, put it at:
https://github.com/gao-yan/pacemaker-mgmt

Andrew, what do you think about creating a place for it under
https://github.com/ClusterLabs?

Thanks,
Gaoyan
--
Gao,Yan <ygao [at] suse>
Software Engineer
China Server Team, SUSE.


renayama19661014 at ybb

Nov 24, 2011, 1:54 AM

Post #24 of 38 (1470 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Yan,

Thank you for your comment.

> I've converted it and, for now, put it at:
> https://github.com/gao-yan/pacemaker-mgmt

I will take a look at the contents.

Cheers,
Hideo Yamauchi.



renayama19661014 at ybb

Nov 24, 2011, 4:26 PM

Post #25 of 38 (1463 views)
Permalink
Re: The active trap of the SNMP is delayed. [In reply to]

Hi Yan,

I have checked the contents.
I do not see any problems.

I would like to request a 2.0.1 tag that includes the following patch:
* http://hg.clusterlabs.org/pacemaker/pygui/rev/c08b84a8203f

We want the latest GUI for Pacemaker 1.0.

Best Regards,
Hideo Yamauchi.

