Mailing List Archive: Linux-HA: Users

glib: ucast: error binding socket. Retrying: Address already in use

 

 



groen692 at grosc

Jun 10, 2009, 3:20 AM

Post #1 of 6 (669 views)

Hi everybody,

I just ran into some strange behavior: after manually rebooting our server,
heartbeat did not come back into service. The message log shows
"Retrying: Address already in use", but netstat shows nothing on port 694.
The nodes were able to see each other. On both nodes, services were
connecting over the same link (br0).

Stopping and starting heartbeat did not help and produced the same log messages.
After a second reboot the problem was gone.

heartbeat V2.99.2
openSUSE 11.1

Has anybody seen this before, or does anyone know the cause?

best regards

jeroen

====== log =========
ClusterNode1:/ # tail /var/log/messages
Jun 10 12:00:08 ClusterNode1 heartbeat: [5315]: ERROR: glib: ucast: error binding socket. Retrying: Address already in use
Jun 10 12:00:09 ClusterNode1 heartbeat: [5315]: ERROR: glib: ucast: error binding socket. Retrying: Address already in use
Jun 10 12:00:10 ClusterNode1 heartbeat: [5315]: ERROR: glib: ucast: error binding socket. Retrying: Address already in use
Jun 10 12:00:11 ClusterNode1 heartbeat: [5315]: ERROR: glib: ucast: error binding socket. Retrying: Address already in use
Jun 10 12:00:12 ClusterNode1 heartbeat: [5315]: ERROR: glib: ucast: error binding socket. Retrying: Address already in use
Jun 10 12:00:13 ClusterNode1 heartbeat: [5315]: ERROR: glib: ucast: unable to bind socket. Giving up: Address already in use
Jun 10 12:00:13 ClusterNode1 heartbeat: [5315]: ERROR: make_io_childpair: cannot open ucast br0
Jun 10 12:00:14 ClusterNode1 heartbeat: [5317]: CRIT: Emergency Shutdown: Master Control process died.
Jun 10 12:00:14 ClusterNode1 heartbeat: [5317]: CRIT: Killing pid 5315 with SIGTERM
Jun 10 12:00:14 ClusterNode1 heartbeat: [5317]: CRIT: Emergency Shutdown(MCP dead): Killing ourselves.


========= netstat -ntlp ============

ClusterNode1:/ # netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address     Foreign Address   State    PID/Program name
tcp        0      0 0.0.0.0:5801      0.0.0.0:*         LISTEN   4039/xinetd
tcp        0      0 0.0.0.0:5901      0.0.0.0:*         LISTEN   4039/xinetd
tcp        0      0 0.0.0.0:111       0.0.0.0:*         LISTEN   3063/rpcbind
tcp        0      0 0.0.0.0:6004      0.0.0.0:*         LISTEN   4823/Xvnc
tcp        0      0 0.0.0.0:22        0.0.0.0:*         LISTEN   3907/sshd
tcp        0      0 127.0.0.1:631     0.0.0.0:*         LISTEN   3841/cupsd
tcp        0      0 127.0.0.1:25      0.0.0.0:*         LISTEN   3868/master
tcp        0      0 :::111            :::*              LISTEN   3063/rpcbind
tcp        0      0 :::6004           :::*              LISTEN   4823/Xvnc
tcp        0      0 :::22             :::*              LISTEN   3907/sshd


======= ha.cf ==========

use_logd yes
ucast br0 192.168.1.1
ucast br0 192.168.1.2
ucast br1 172.27.74.136
ucast br1 172.27.74.137
#serial /dev/ttyS0
node ClusterNode1
node ClusterNode2
respawn root /usr/lib64/heartbeat/hbagent
apiauth mgmtd uid=root
respawn root /usr/lib64/heartbeat/mgmtd -v
crm on

_______________________________________________
Linux-HA mailing list
Linux-HA[at]lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


dejanmm at fastmail

Jun 10, 2009, 3:52 AM

Post #2 of 6 (633 views)

Hi,

On Wed, Jun 10, 2009 at 12:20:14PM +0200, jeroen groenewegen van der weyden wrote:
> Hi everybody,
>
> I just ran into some strange behavior: after manually rebooting our server,
> heartbeat did not come back into service. The message log shows
> "Retrying: Address already in use", but netstat shows nothing on port

Did you try lsof or fuser?

> 694. The nodes were able to see each other. On both nodes, services were
> connecting over the same link (br0).
>
> Stopping and starting heartbeat did not help and produced the same log messages.
> After a second reboot the problem was gone.
>
> heartbeat V2.99.2
> openSUSE 11.1
>
> Anybody seen this before? or know the cause of it?

No. The only explanation I can imagine is that another process is
using this port.

Thanks,

Dejan



groen692 at grosc

Jun 10, 2009, 4:17 AM

Post #3 of 6 (635 views)

No, I did not try lsof or fuser (next time I will). But shouldn't
netstat show the process as well? Besides, it would be strange for another
process to take this port "randomly": after a reboot it should occupy the
same one again, shouldn't it?


regards

jeroen



dejanmm at fastmail

Jun 10, 2009, 5:08 AM

Post #4 of 6 (633 views)

Hi,

On Wed, Jun 10, 2009 at 01:17:19PM +0200, jeroen groenewegen van der weyden wrote:
> no I did not try lsof or fuser (next time I will). But shouldn't
> netstat show the process also.

Yes, netstat should show connections, but with lsof you'll see
which processes are holding them.

> further it would be strange for another
> process to keep this port. "randomly" after a reboot it should occupy the
> same again, shouldn't it?

Yes. Though there are processes which get dynamically assigned
ports from portmapper (yellow pages and similar).

Thanks,

Dejan
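
For readers following along, here is a rough sketch in Python of the kind of lookup that lsof or fuser perform on Linux. It assumes the usual /proc/net/udp layout (local address as HEXIP:HEXPORT in the second column, socket inode in the tenth) and covers IPv4 only. The function names are made up for this illustration and are not part of heartbeat, and seeing sockets owned by other users generally requires root.

```python
import os
from pathlib import Path

HEARTBEAT_PORT = 694  # UDP port discussed in this thread


def udp_sockets_on_port(proc_net_udp_text, port):
    """Return (local_ip_hex, inode) pairs for UDP sockets bound to `port`.

    Expects text laid out like Linux /proc/net/udp: one header line, then
    one socket per line (IPv4 only; /proc/net/udp6 holds IPv6 sockets).
    """
    matches = []
    for line in proc_net_udp_text.splitlines()[1:]:  # skip the header
        fields = line.split()
        if len(fields) < 10:
            continue  # not a socket row
        ip_hex, port_hex = fields[1].rsplit(":", 1)
        if int(port_hex, 16) == port:
            matches.append((ip_hex, int(fields[9])))
    return matches


def pids_holding_inode(inode, proc_root="/proc"):
    """Return PIDs with an open descriptor pointing at socket:[inode]."""
    target = f"socket:[{inode}]"
    pids = []
    for fd_dir in Path(proc_root).glob("[0-9]*/fd"):
        try:
            links = [os.readlink(fd) for fd in fd_dir.iterdir()]
        except OSError:
            continue  # process exited meanwhile, or we lack permission
        if target in links:
            pids.append(int(fd_dir.parent.name))
    return pids


if __name__ == "__main__":
    udp_table = Path("/proc/net/udp")
    if udp_table.exists():
        found = udp_sockets_on_port(udp_table.read_text(), HEARTBEAT_PORT)
        for ip_hex, inode in found:
            print(ip_hex, inode, pids_holding_inode(inode))
```

Run as root on the affected node, this would print the bound address (in hex), the socket inode, and the PIDs holding it, if anything is bound to 694/udp.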



lars.ellenberg at linbit

Jun 10, 2009, 6:03 AM

Post #5 of 6 (636 views)

On Wed, Jun 10, 2009 at 12:52:35PM +0200, Dejan Muhamedagic wrote:
> No. The only explanation I can imagine is that another process is
> using this port.

E.g. rpc.mountd is known to do that sometimes, as is anything else
that registers with the RPC portmapper: such services are free to choose
their own port. Most of them _can_ be told to use a fixed port, though
(see man rpc.mountd, -p option).

Maybe the best solution would be to have heartbeat start earlier, just
after network and sshd are up and before any other network services are
started: yes, even before portmapper or any RPC services with
"arbitrary" ports. Or tell portmapper to choose arbitrary ports from
above 1024, if that is possible.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


hop at unlieb

Jul 3, 2009, 2:31 AM

Post #6 of 6 (446 views)

Hello,

> ========= netstat -ntlp ============

heartbeat binds to 694/udp, so "netstat -lnup" should show the cause of
your problem.

Bye,
Andreas
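
To illustrate the point about UDP versus TCP, the following Python sketch reproduces the same "Address already in use" error on a UDP port and then binds a TCP socket to the same port number. It uses a kernel-chosen unprivileged port on 127.0.0.1 instead of 694, since binding 694 needs root; the function name is made up for this example.

```python
import errno
import socket


def demo_udp_vs_tcp(host="127.0.0.1"):
    """Return (udp_conflict, tcp_bind_ok) for one port number.

    udp_conflict: a second UDP bind on an occupied UDP port failed
                  with EADDRINUSE.
    tcp_bind_ok:  a TCP bind on the same port number succeeded.
    """
    holder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    holder.bind((host, 0))  # let the kernel pick a free UDP port
    port = holder.getsockname()[1]

    # A second UDP socket on the same address and port should be refused.
    udp_conflict = False
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        second.bind((host, port))
    except OSError as exc:
        udp_conflict = exc.errno == errno.EADDRINUSE
    finally:
        second.close()

    # TCP and UDP have separate port spaces, so this bind normally works.
    tcp_bind_ok = True
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        tcp.bind((host, port))
    except OSError:
        tcp_bind_ok = False
    finally:
        tcp.close()

    holder.close()
    return udp_conflict, tcp_bind_ok


if __name__ == "__main__":
    print(demo_udp_vs_tcp())
```

Because the two protocols keep separate port tables, a TCP-only listing such as "netstat -ntlp" cannot show whatever holds a UDP port; the "-u" flag is what brings UDP sockets into the output.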
