
Mailing List Archive: Linux-HA: Pacemaker

Remote Access not Working

 

 



colin.hch at gmail

Nov 9, 2009, 8:24 AM

Post #1 of 26 (1128 views)
Permalink
Remote Access not Working

Hi All,

I just tried to get remote access to the cluster up and running, but
with more errors than success...

Starting point was a working cluster installation. Then I did

# cibadmin --modify -X '<cib remote-clear-port="6900"/>'
# /etc/init.d/corosync stop
# /etc/init.d/corosync start

to get the listener, erm, listening:

# netstat -ant | grep 6900
tcp 0 0 0.0.0.0:6900 0.0.0.0:* LISTEN

For a first test I also changed the password of the "hacluster" user.

Then, on another machine, I set up the environment variables as follows:

# env | grep CIB
CIB_server=192.168.80.10
CIB_user=hacluster
CIB_port=6900

Then I issued a simple command, crm_resource --list. The crm_resource
command asks for a password and then hangs. On the cluster machine I
find the following in /var/log/daemon.log:

Nov 9 17:15:10 mz-dom0-001-4000 cib: [15698]: debug:
cib_remote_listen: New clear-text connection
Nov 9 17:15:10 mz-dom0-001-4000 cib: [15698]: ERROR: crm_xml_err: XML
Error: Entity: line 1: parsererror : Start tag expected, '<' not found
Nov 9 17:15:10 mz-dom0-001-4000 cib: [15698]: ERROR: crm_xml_err: XML
Error: #026#003#002
Nov 9 17:15:10 mz-dom0-001-4000 cib: [15698]: ERROR: crm_xml_err: XML Error: ^
Nov 9 17:15:10 mz-dom0-001-4000 cib: [15698]: WARN: string2xml:
Parsing failed (domain=1, level=3, code=4): Start tag expected, '<'
not found
Nov 9 17:15:10 mz-dom0-001-4000 cib: [15698]: ERROR: string2xml:
Couldn't parse 3 chars: #026#003#002
Nov 9 17:15:10 mz-dom0-001-4000 cib: [15698]: ERROR:
cib_recv_remote_msg: Couldn't parse: '#026#003#002'
Nov 9 17:15:26 mz-dom0-001-4000 cib: [15698]: ERROR:
cib_recv_remote_msg: Empty reply
Nov 9 17:15:27 mz-dom0-001-4000 cib: [15698]: ERROR:
cib_recv_remote_msg: Empty reply
Nov 9 17:15:28 mz-dom0-001-4000 cib: [15698]: ERROR:
cib_recv_remote_msg: Empty reply
Nov 9 17:15:29 mz-dom0-001-4000 cib: [15698]: ERROR:
cib_recv_remote_msg: Empty reply
Nov 9 17:15:30 mz-dom0-001-4000 cib: [15698]: ERROR:
cib_recv_remote_msg: Empty reply
.........

This continues forever, one error message per second, and the cib
process does not stop when corosync is shut down the normal way:

# /etc/init.d/corosync stop
Stopping corosync daemon: corosync.
# ps aux | grep cib
105 15698 0.3 0.7 13844 4588 ? S 17:12 0:01
/usr/lib/heartbeat/cib

This seems to prevent other processes from cleanly shutting down, too.

Am I doing something obviously wrong?

Thanks, Colin


PS: AFAICS the remote access does not support something like failover,
or connections to multiple cluster hosts, so I'll have to roll my own
wrapper that takes care of the issue?

_______________________________________________
Pacemaker mailing list
Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


colin.hch at gmail

Nov 10, 2009, 6:54 AM

Post #2 of 26 (1095 views)
Permalink
Re: Remote Access not Working [In reply to]

Does anybody else successfully use this feature, or is it suffering
from bit-rot?

Thanks, Colin



andrew at beekhof

Nov 12, 2009, 6:36 AM

Post #3 of 26 (1092 views)
Permalink
Re: Remote Access not Working [In reply to]

I used it the other day.

http://www.clusterlabs.org/doc/pacemaker-explained/ch-advanced-options.html#s-remote-connection

Try setting CIB_encrypted to false.
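
A note on why this setting matters, offered as an assumption rather than
something stated in the thread: the three bytes logged as #026#003#002 in
the first post are octal escapes for 0x16 0x03 0x02, and 0x16 followed by
0x03 is how a TLS handshake record starts. That would be consistent with a
client that tries to encrypt while talking to the clear-text port. A
minimal sketch of such a check in C (looks_like_tls_handshake is a
hypothetical name, not a Pacemaker function):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of Pacemaker: returns true if the buffer
 * starts like a TLS/SSLv3 record carrying a handshake message, i.e.
 * content type 22 (0x16) followed by protocol major version 3. */
bool looks_like_tls_handshake(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return len >= 2 && buf[0] == 0x16 && buf[1] == 0x03;
}
```

A listener on a clear-text port could use a check like this to log
"client appears to be speaking TLS" instead of an XML parse error.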

On Tue, Nov 10, 2009 at 3:54 PM, Colin <colin.hch [at] gmail> wrote:
> Does anybody else successfully use this feature, or is it suffering
> from bit-rot?
>
> Thanks, Colin


colin.hch at gmail

Nov 12, 2009, 7:46 AM

Post #4 of 26 (1089 views)
Permalink
Re: Remote Access not Working [In reply to]

On Thu, Nov 12, 2009 at 3:36 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
> I used it the other day.
>
> http://www.clusterlabs.org/doc/pacemaker-explained/ch-advanced-options.html#s-remote-connection
>
> Try setting CIB_encrypted to false.

Thanks, that got me a step further...

...but there are still various issues:

1) In cib/remote.c, the function check_group_membership() only checks
whether the user is explicitly listed as member of the group in
/etc/group, but does not accept the user if only the user's primary
group in /etc/passwd is set to the correct group (and the explicit,
then redundant, membership in /etc/group is missing).

2) "Configuration Explained" does not mention CIB_encrypted, that's why
my first attempts didn't work in the first place.

3) "Configuration Explained" says "remote-open-port" instead of
"remote-clear-port" in one place.

4) "Configuration Explained" says that CIB_user must be in the
"hacluster" group, rather than the "haclient" group.

5) The log message "cib: [2941]: debug: cib_remote_listen: New
clear-text connection" should include from where the connection came.

6) The log message "cib: [2941]: ERROR: cib_remote_listen: User is not
a member of the required group" might mention which user and which
group...

7) "Configuration Explained" and the page you just sent me both state
that CIB_user must be part of the hacluster group; apart from the
mistake that the group is haclient, the comment in cib/remote.c and my
observations show that CIB_user actually must be the user as which
the cib process is running.

8) Just tried with crm_resource: The password prompt when not setting
CIB_password is sent to stdout, rather than stderr [which makes it
near impossible to send the output someplace].

9) I am getting completely bogus results via the remote connection,
e.g. "crm_resource --list" shows only 2 of 8 resources, and shows them
as stopped, whereas on the cluster nodes I see the -- correct -- list
with 8 resources which are all started. With "cibadmin -Q" I get:

# cibadmin -Q | wc # on a cluster node
379 1895 50474

# cibadmin -Q | wc # via the remote connection
cibadmin: Opened connection to 192.168.80.10:6900
66 193 4731

10) It's very easy to trash the cib process, e.g. by connecting via
telnet and sending a few bytes of garbage; result is an endless loop
of "cib: [7846]: ERROR: cib_recv_remote_msg: Empty reply" messages,
one per second, and that I need to "killall -9 cib" in order to get
everything working again.
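
To illustrate item 1, here is a minimal sketch of a membership check that
accepts both an explicit entry in /etc/group and a matching primary group
in /etc/passwd. It is not the code in cib/remote.c; user_in_group is a
hypothetical name, and the sketch assumes the standard getgrnam() and
getpwnam() lookups are acceptable in that context:

```c
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper, not the Pacemaker implementation: returns true if
 * `user` belongs to `group` either through an explicit member entry in
 * the group database or through the primary gid in the passwd database. */
bool user_in_group(const char *user, const char *group)
{
    assert(user != NULL && group != NULL);

    struct group *grp = getgrnam(group);
    if (grp == NULL) {
        return false;                       /* unknown group */
    }
    gid_t wanted = grp->gr_gid;             /* copy before further lookups */

    /* Explicit (supplementary) membership, the case item 1 says is checked. */
    for (char **member = grp->gr_mem; member != NULL && *member != NULL; member++) {
        if (strcmp(*member, user) == 0) {
            return true;
        }
    }

    /* Primary group, the case item 1 reports as not being accepted. */
    struct passwd *pwd = getpwnam(user);
    return pwd != NULL && pwd->pw_gid == wanted;
}
```

On a system where hacluster has haclient only as its primary group,
user_in_group("hacluster", "haclient") would then return true.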

Only once, out of a couple dozen attempts, did the remote access
actually yield the correct output, other times it completely fails
without any apparent reason ... at this point I'm not quite sure what
to make of all this.

Regards, Colin



andrew at beekhof

Nov 16, 2009, 6:19 AM

Post #5 of 26 (1080 views)
Permalink
Re: Remote Access not Working [In reply to]

On Thu, Nov 12, 2009 at 4:46 PM, Colin <colin.hch [at] gmail> wrote:
> On Thu, Nov 12, 2009 at 3:36 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>> I used it the other day.
>>
>> http://www.clusterlabs.org/doc/pacemaker-explained/ch-advanced-options.html#s-remote-connection
>>
>> Try setting CIB_encrypted to false.
>
> Thanks, that got me a step further...
>
> ...but there are still various issues:
>
> 1) In cib/remote.c, the function check_group_membership() only checks
> whether the user is explicitly listed as member of the group in
> /etc/group, but does not accept the user if only the user's primary
> group in /etc/passwd is set to the correct group (and the explicit,
> then redundant, membership in /etc/group is missing).

Agreed. Seems to be a PAM thing that I can't do much about though.

>
> 2) "Configuration Explained" does not mention CIB_encrypted, that's why
> my first attempts didn't work in the first place.

I know, the docs I pointed you to are the new location for
"configuration explained", they'll be announced shortly.

>
> 3) "Configuration Explained" says "remote-open-port" instead of
> "remote-clear-port" in one place.

the new docs are correct

>
> 4) "Configuration Explained" says that CIB_user must be in the
> "hacluster" group, rather than the "haclient" group.

the new docs are correct

> 5) The log message "cib: [2941]: debug: cib_remote_listen: New
> clear-text connection" should include from where the connection came.

why and how?

> 6) The log message "cib: [2941]: ERROR: cib_remote_listen: User is not
> a member of the required group" might mention which user and which
> group...

it doesn't do so for security reasons

> 7) "Configuration Explained" and the page you just sent me both state
> that CIB_user must be part of the hacluster group; apart from the
> mistake that the group is haclient, the comment in cib/remote.c and my
> observations show that CIB_user actually must be the user as which
> the cib process is running.

correct

> 8) Just tried with crm_resource: The password prompt when not setting
> CIB_password is sent to stdout, rather than stderr [which makes it
> near impossible to send the output someplace].

we can probably change that

> 9) I am getting completely bogus results via the remote connection,
> e.g. "crm_resource --list" shows only 2 of 8 resources, and shows them
> as stopped, whereas on the cluster nodes I see the -- correct -- list
> with 8 resources which are all started. With "cibadmin -Q" I get:
>
> # cibadmin -Q | wc  # on a cluster node
>    379    1895   50474
>
> # cibadmin -Q | wc  # via the remote connection
> cibadmin: Opened connection to 192.168.80.10:6900
>     66     193    4731

Someone else mentioned that; I've not been able to reproduce it yet.

> 10) It's very easy to trash the cib process, e.g. by connecting via
> telnet and sending a few bytes of garbage; result is an endless loop
> of "cib: [7846]: ERROR: cib_recv_remote_msg: Empty reply" messages,
> one per second, and that I need to "killall -9 cib" in order to get
> everything working again.

OK, that's not good.
I think this patch should fix it though:

diff -r 828b3329a64c cib/remote.c
--- a/cib/remote.c      Fri Nov 06 16:28:21 2009 +0100
+++ b/cib/remote.c      Mon Nov 16 15:18:41 2009 +0100
@@ -220,7 +220,7 @@ cib_remote_listen(int ssock, gpointer da
        }

        do {
-               crm_debug_2("Iter: %d", lpc++);
+               crm_debug_2("Iter: %d", lpc);
                if(ssock == remote_tls_fd) {
 #ifdef HAVE_GNUTLS_GNUTLS_H
                    login = cib_recv_remote_msg(session, TRUE);
@@ -230,7 +230,7 @@ cib_remote_listen(int ssock, gpointer da
                }
                sleep(1);

-       } while(login == NULL && lpc < 10);
+       } while(login == NULL && ++lpc < 10);

        crm_log_xml_info(login, "Login: ");
        if(login == NULL) {
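
Why moving the increment out of the crm_debug_2() call should help is not
spelled out here. One plausible reading, stated as an assumption: if
crm_debug_2() is a macro that only evaluates its arguments when that debug
level is enabled, then lpc++ never runs with debugging off and the loop
never reaches its limit. The sketch below reproduces that effect with a
hypothetical DEBUG_LOG macro; the hard_stop parameter exists only so that
the buggy variant terminates in a demonstration:

```c
#include <assert.h>
#include <stdio.h>

int debug_enabled = 0;   /* hypothetical switch, off by default */

/* Hypothetical logging macro: arguments are evaluated only when enabled. */
#define DEBUG_LOG(fmt, ...) \
    do { if (debug_enabled) { fprintf(stderr, fmt "\n", __VA_ARGS__); } } while (0)

/* Counter is incremented inside the macro argument (pre-patch shape). */
int count_attempts_buggy(int max_iter, int hard_stop)
{
    assert(max_iter > 0 && hard_stop > 0);
    int lpc = 0;
    int iterations = 0;
    do {
        DEBUG_LOG("Iter: %d", lpc++);   /* skipped entirely when debug is off */
        iterations++;
    } while (lpc < max_iter && iterations < hard_stop);
    return iterations;
}

/* Counter is incremented in the loop condition (post-patch shape). */
int count_attempts_fixed(int max_iter, int hard_stop)
{
    assert(max_iter > 0 && hard_stop > 0);
    int lpc = 0;
    int iterations = 0;
    do {
        DEBUG_LOG("Iter: %d", lpc);
        iterations++;
    } while (++lpc < max_iter && iterations < hard_stop);
    return iterations;
}
```

With debug_enabled left at 0, the first variant runs until hard_stop while
the second stops after max_iter iterations.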


>
> Only once, out of a couple dozen attempts, did the remote access
> actually yield the correct output, other times it completely fails
> without any apparent reason ... at this point I'm not quite sure what
> to make of all this.
>
> Regards, Colin


colin.hch at gmail

Nov 16, 2009, 7:31 AM

Post #6 of 26 (1080 views)
Permalink
Re: Remote Access not Working [In reply to]

Hi Andrew,

thanks for your response!

On Mon, Nov 16, 2009 at 3:19 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
> On Thu, Nov 12, 2009 at 4:46 PM, Colin <colin.hch [at] gmail> wrote:
>> On Thu, Nov 12, 2009 at 3:36 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>
>> 5) The log message "cib: [2941]: debug: cib_remote_listen: New
>> clear-text connection" should include from where the connection came.
>
> why and how?

Why: It's like "file not found" without the info which file wasn't
found ... perhaps it's just me, but I would like to see the source IP
and port of the connection.

How: You're probably not asking me how to implement the feature, so
I'm assuming that you misunderstood what exactly I was asking for(?).

>> 6) The log message "cib: [2941]: ERROR: cib_remote_listen: User is not
>> a member of the required group" might mention which user and which
>> group...
>
> it doesn't do so for security reasons

Hm.

Security? I see, that's when you use unencrypted remote syslogging --
anybody already on the machine could just use ps(1).

How about logging it in the ERROR messages, but only when
debug-logging is enabled?

>> 8) Just tried with crm_resource: The password prompt when not setting
>> CIB_password is sent to stdout, rather than stderr [which makes it
>> near impossible to send the output someplace].
>
> we can probably change that

That'd be great, also because the new behaviour would be more in-line
with what many other command line programs do...
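
As a small illustration of the behaviour asked for in item 8, the sketch
below writes a prompt to a caller-chosen stream so that stdout carries
only data and can be redirected safely. prompt_to is a hypothetical
helper, not the crm_resource code, and it does not deal with disabling
terminal echo:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `prompt` to `prompt_stream` (stderr in real
 * use) and flush it so it appears before the program waits for input.
 * Returns 0 on success, -1 on a write error. */
int prompt_to(FILE *prompt_stream, const char *prompt)
{
    assert(prompt_stream != NULL && prompt != NULL);

    size_t n = strlen(prompt);
    if (fwrite(prompt, 1, n, prompt_stream) != n) {
        return -1;
    }
    return fflush(prompt_stream) == 0 ? 0 : -1;
}
```

Calling prompt_to(stderr, "Password: ") keeps something like
"cibadmin -Q > cib.xml" from mixing the prompt into the saved output.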

>> 9) I am getting completely bogus results via the remote connection,
>> e.g. "crm_resource --list" shows only 2 of 8 resources, and shows the
>> as stopped, whereas on the cluster nodes I see the -- correct -- list
>> with 8 resources which are all started. With "cibadmin -Q" I get:
>>
>> # cibadmin -Q | wc  # on a cluster node
>>    379    1895   50474
>>
>> # cibadmin -Q | wc  # via the remote connection
>> cibadmin: Opened connection to 192.168.80.10:6900
>>     66     193    4731
>
> someone else mentioned that, i've not been able to reproduce it yet.

Weird. I'm using the precompiled Debian packages for Pacemaker 1.0.6
with Corosync. Anything that might help debug the problem?

root [at] cluster:~# tail -f /var/log/daemon.log
Nov 16 15:53:33 cluster1 cib: [24749]: debug: cib_remote_listen: New
clear-text connection
Nov 16 15:53:34 cluster1 cib: [24749]: info: log_data_element:
cib_remote_listen: Login: <cib_command op="authenticate"
user="hacluster" password="*****" hidden="password" />
Nov 16 15:53:34 cluster1 cib: [24749]: debug: cib_remote_listen: New
clear-text connection
Nov 16 15:53:35 cluster1 cib: [24749]: info: log_data_element:
cib_remote_listen: Login: <cib_command op="authenticate"
user="hacluster" password="*****" hidden="password" />
Nov 16 15:53:35 cluster1 corosync[7426]: [TOTEM ] mcasted message
added to pending queue
[... more corosync messages ...]
Nov 16 15:53:35 cluster1 corosync[7426]: [TOTEM ] releasing messages
up to and including 48a
Nov 16 15:53:35 cluster1 cib: [24749]: ERROR: cib_recv_remote_msg: Empty reply
Nov 16 15:53:35 cluster1 cib: [24749]: ERROR: cib_recv_plaintext:
Error receiving message: -1: Connection reset by peer (104)
Nov 16 15:53:35 cluster1 cib: [24749]: ERROR: cib_recv_remote_msg: Empty reply
^C
root [at] cluster:~# cibadmin -Q | wc
382 1943 51825
root [at] cluster:~#

root [at] admi:~# cibadmin -Q > cib.xml
cibadmin: Opened connection to 192.168.80.10:6900
root [at] admi:~# wc cib.xml
86 255 6379 cib.xml
root [at] admi:~#

>> 10) It's very easy to trash the cib process, e.g. by connecting via
>> telnet and sending a few bytes of garbage; result is an endless loop
>> of "cib: [7846]: ERROR: cib_recv_remote_msg: Empty reply" messages,
>> one per second, and that I need to "killall -9 cib" in order to get
>> everything working again.
>
> ok, thats not good.
> I think this patch should fix it though:
>
> diff -r 828b3329a64c cib/remote.c
> --- a/cib/remote.c      Fri Nov 06 16:28:21 2009 +0100
> +++ b/cib/remote.c      Mon Nov 16 15:18:41 2009 +0100
> @@ -220,7 +220,7 @@ cib_remote_listen(int ssock, gpointer da
>        }
>
>        do {
> -               crm_debug_2("Iter: %d", lpc++);
> +               crm_debug_2("Iter: %d", lpc);
>                if(ssock == remote_tls_fd) {
>  #ifdef HAVE_GNUTLS_GNUTLS_H
>                    login = cib_recv_remote_msg(session, TRUE);
> @@ -230,7 +230,7 @@ cib_remote_listen(int ssock, gpointer da
>                }
>                sleep(1);
>
> -       } while(login == NULL && lpc < 10);
> +       } while(login == NULL && ++lpc < 10);
>
>        crm_log_xml_info(login, "Login: ");
>        if(login == NULL) {

Thanks, since we have been using precompiled packages I haven't
actually gone through the exercise of compiling Pacemaker, so it might
take some time before I get around to testing this patch...

Regards, Colin



andrew at beekhof

Nov 16, 2009, 7:42 AM

Post #7 of 26 (1075 views)
Permalink
Re: Remote Access not Working [In reply to]

On Mon, Nov 16, 2009 at 4:31 PM, Colin <colin.hch [at] gmail> wrote:
> Hi Andrew,
>
> thanks for your response!
>
> On Mon, Nov 16, 2009 at 3:19 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>> On Thu, Nov 12, 2009 at 4:46 PM, Colin <colin.hch [at] gmail> wrote:
>>> On Thu, Nov 12, 2009 at 3:36 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>>
>>> 5) The log message "cib: [2941]: debug: cib_remote_listen: New
>>> clear-text connection" should include from where the connection came.
>>
>> why and how?
>
> Why: It's like "file not found" without the info which file wasn't
> found ... perhaps it's just me, but I would like to see the source IP
> and port of the connection.
>
> How: You're probably not asking me how to implement the feature, so
> I'm assuming that you misunderstood what exactly I was asking for(?).

No, I'm saying that I'm pretty sure we don't have access to the IP information.

>
>>> 6) The log message "cib: [2941]: ERROR: cib_remote_listen: User is not
>>> a member of the required group" might mention which user and which
>>> group...
>>
>> it doesn't do so for security reasons
>
> Hm.
>
> Security? I see, that's when you use unencrypted remote syslogging --
> anybody already on the machine could just use ps(1).
>
> How about logging it in the ERROR messages, but only when
> debug-logging is enabled?

No, because then I'll get confused emails from people wondering why
there is a stream of ERRORs in the logs.

>
>>> 8) Just tried with crm_resource: The password prompt when not setting
>>> CIB_password is sent to stdout, rather than stderr [which makes it
>>> near impossible to send the output someplace].
>>
>> we can probably change that
>
> That'd be great, also because the new behaviour would be more in-line
> with what many other command line programs do...
>
>>> 9) I am getting completely bogus results via the remote connection,
>>> e.g. "crm_resource --list" shows only 2 of 8 resources, and shows the
>>> as stopped, whereas on the cluster nodes I see the -- correct -- list
>>> with 8 resources which are all started. With "cibadmin -Q" I get:
>>>
>>> # cibadmin -Q | wc  # on a cluster node
>>>    379    1895   50474
>>>
>>> # cibadmin -Q | wc  # via the remote connection
>>> cibadmin: Opened connection to 192.168.80.10:6900
>>>     66     193    4731
>>
>> someone else mentioned that, i've not been able to reproduce it yet.
>
> Weird. I'm using the precompiled Debian packages for Pacemaker 1.0.6
> with Corosync. Anything that might help debug the problem?

add more hours to the day? :)

>
> root [at] cluster:~# tail -f /var/log/daemon.log
> Nov 16 15:53:33 cluster1 cib: [24749]: debug: cib_remote_listen: New
> clear-text connection
> Nov 16 15:53:34 cluster1 cib: [24749]: info: log_data_element:
> cib_remote_listen: Login:  <cib_command op="authenticate"
> user="hacluster" password="*****" hidden="password" />
> Nov 16 15:53:34 cluster1 cib: [24749]: debug: cib_remote_listen: New
> clear-text connection
> Nov 16 15:53:35 cluster1 cib: [24749]: info: log_data_element:
> cib_remote_listen: Login:  <cib_command op="authenticate"
> user="hacluster" password="*****" hidden="password" />
> Nov 16 15:53:35 cluster1 corosync[7426]:   [TOTEM ] mcasted message
> added to pending queue
> [... more corosync messages ...]
> Nov 16 15:53:35 cluster1 corosync[7426]:   [TOTEM ] releasing messages
> up to and including 48a
> Nov 16 15:53:35 cluster1 cib: [24749]: ERROR: cib_recv_remote_msg: Empty reply
> Nov 16 15:53:35 cluster1 cib: [24749]: ERROR: cib_recv_plaintext:
> Error receiving message: -1: Connection reset by peer (104)
> Nov 16 15:53:35 cluster1 cib: [24749]: ERROR: cib_recv_remote_msg: Empty reply
> ^C
> root [at] cluster:~# cibadmin -Q | wc
>    382    1943   51825
> root [at] cluster:~#
>
> root [at] admi:~# cibadmin -Q > cib.xml
> cibadmin: Opened connection to 192.168.80.10:6900
> root [at] admi:~# wc cib.xml
>  86  255 6379 cib.xml
> root [at] admi:~#
>
>>> 10) It's very easy to trash the cib process, e.g. by connecting via
>>> telnet and sending a few bytes of garbage; result is an endless loop
>>> of "cib: [7846]: ERROR: cib_recv_remote_msg: Empty reply" messages,
>>> one per second, and that I need to "killall -9 cib" in order to get
>>> everything working again.
>>
>> ok, thats not good.
>> I think this patch should fix it though:
>>
>> diff -r 828b3329a64c cib/remote.c
>> --- a/cib/remote.c      Fri Nov 06 16:28:21 2009 +0100
>> +++ b/cib/remote.c      Mon Nov 16 15:18:41 2009 +0100
>> @@ -220,7 +220,7 @@ cib_remote_listen(int ssock, gpointer da
>>        }
>>
>>        do {
>> -               crm_debug_2("Iter: %d", lpc++);
>> +               crm_debug_2("Iter: %d", lpc);
>>                if(ssock == remote_tls_fd) {
>>  #ifdef HAVE_GNUTLS_GNUTLS_H
>>                    login = cib_recv_remote_msg(session, TRUE);
>> @@ -230,7 +230,7 @@ cib_remote_listen(int ssock, gpointer da
>>                }
>>                sleep(1);
>>
>> -       } while(login == NULL && lpc < 10);
>> +       } while(login == NULL && ++lpc < 10);
>>
>>        crm_log_xml_info(login, "Login: ");
>>        if(login == NULL) {
>
> Thanks, since we have been using precompiled packages I haven't
> actually gone through the exercise of compiling Pacemaker, so it might
> take some time before I get around to testing this patch...
>
> Regards, Colin


colin.hch at gmail

Nov 16, 2009, 7:54 AM

Post #8 of 26 (1075 views)
Permalink
Re: Remote Access not Working [In reply to]

On Mon, Nov 16, 2009 at 4:42 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
> On Mon, Nov 16, 2009 at 4:31 PM, Colin <colin.hch [at] gmail> wrote:
>>
>> On Mon, Nov 16, 2009 at 3:19 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>>> On Thu, Nov 12, 2009 at 4:46 PM, Colin <colin.hch [at] gmail> wrote:
>>>> On Thu, Nov 12, 2009 at 3:36 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>>>
>>>> 5) The log message "cib: [2941]: debug: cib_remote_listen: New
>>>> clear-text connection" should include from where the connection came.
>>>
>>> why and how?
>>
>> Why: It's like "file not found" without the info which file wasn't
>> found ... perhaps it's just me, but I would like to see the source IP
>> and port of the connection.
>>
>> How: You're probably not asking me how to implement the feature, so
>> I'm assuming that you misunderstood what exactly I was asking for(?).
>
> No, I'm saying that I'm pretty sure we don't have access to the IP information.

In cib/remote.c the call to accept(2) which fills in the data
structure with the IP is just 2 lines after the call to crm_debug(),
is it a problem to change the order?
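
For reference, a minimal sketch of what "log after accept(2)" could look
like. It is not the cib/remote.c code: accept_and_describe is a
hypothetical helper, and the sketch assumes an IPv4 listening socket:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: accept one connection on an IPv4 listening socket
 * and format the peer as "ip:port" into `peer`, so the caller can put it
 * in the log message. Returns the connected fd, or -1 on failure. */
int accept_and_describe(int listen_fd, char *peer, size_t peer_len)
{
    assert(peer != NULL && peer_len > 0);

    struct sockaddr_in addr;
    socklen_t addr_len = sizeof(addr);
    memset(&addr, 0, sizeof(addr));

    /* accept(2) fills in the address of the connecting peer. */
    int fd = accept(listen_fd, (struct sockaddr *)&addr, &addr_len);
    if (fd < 0) {
        return -1;
    }

    char ip[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &addr.sin_addr, ip, sizeof(ip)) == NULL) {
        strcpy(ip, "?");                /* address could not be formatted */
    }
    snprintf(peer, peer_len, "%s:%u", ip, (unsigned)ntohs(addr.sin_port));
    return fd;
}
```

The resulting string could then be appended to the "New clear-text
connection" debug message.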

>>>> 6) The log message "cib: [2941]: ERROR: cib_remote_listen: User is not
>>>> a member of the required group" might mention which user and which
>>>> group...
>>>
>>> it doesn't do so for security reasons
>>
>> Hm.
>>
>> Security? I see, that's when you use unencrypted remote syslogging --
>> anybody already on the machine could just use ps(1).
>>
>> How about logging it in the ERROR messages, but only when
>> debug-logging is enabled?
>
> No, because then I'll get confused emails from people wondering why
> there are a stream of ERRORs in the logs.

Erm, I don't want to change the frequency or the level of any message,
just that the one ERROR message quoted above is changed in content to
include the uid/user and gid/group to which it refers when
debug-logging is enabled.

>> Weird. I'm using the precompiled Debian packages for Pacemaker 1.0.6
>> with Corosync. Anything that might help debug the problem?
>
> add more hours to the day? :)

One-way ticket to Mars help?

Colin ;-)



andrew at beekhof

Nov 19, 2009, 11:31 AM

Post #9 of 26 (1065 views)
Permalink
Re: Remote Access not Working [In reply to]

Fixed the plaintext connections and made a couple of the changes you suggested.

http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/971d8989e9f0

On Mon, Nov 16, 2009 at 4:54 PM, Colin <colin.hch [at] gmail> wrote:
> On Mon, Nov 16, 2009 at 4:42 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>> On Mon, Nov 16, 2009 at 4:31 PM, Colin <colin.hch [at] gmail> wrote:
>>>
>>> On Mon, Nov 16, 2009 at 3:19 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>>>> On Thu, Nov 12, 2009 at 4:46 PM, Colin <colin.hch [at] gmail> wrote:
>>>>> On Thu, Nov 12, 2009 at 3:36 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>>>>
>>>>> 5) The log message "cib: [2941]: debug: cib_remote_listen: New
>>>>> clear-text connection" should include from where the connection came.
>>>>
>>>> why and how?
>>>
>>> Why: It's like "file not found" without the info which file wasn't
>>> found ... perhaps it's just me, but I would like to see the source IP
>>> and port of the connection.
>>>
>>> How: You're probably not asking me how to implement the feature, so
>>> I'm assuming that you misunderstood what exactly I was asking for(?).
>>
>> No, I'm saying that I'm pretty sure we don't have access to the IP information.
>
> In cib/remote.c the call to accept(2) which fills in the data
> structure with the IP is just 2 lines after the call to crm_debug(),
> is it a problem to change the order?
>
>>>>> 6) The log message "cib: [2941]: ERROR: cib_remote_listen: User is not
>>>>> a member of the required group" might mention which user and which
>>>>> group...
>>>>
>>>> it doesn't do so for security reasons
>>>
>>> Hm.
>>>
>>> Security? I see, that's when you use unencrypted remote syslogging --
>>> anybody already on the machine could just use ps(1).
>>>
>>> How about logging it in the ERROR messages, but only when
>>> debug-logging is enabled?
>>
>> No, because then I'll get confused emails from people wondering why
>> there are a stream of ERRORs in the logs.
>
> Erm, I don't want to change the frequency or the level of any message,
> just that the one ERROR message quoted above is changed in content to
> include the uid/user and gid/group to which it refers when
> debug-logging is enabled.
>
>>> Weird. I'm using the precompiled Debian packages for Pacemaker 1.0.6
>>> with Corosync. Anything that might help debug the problem?
>>
>> add more hours to the day? :)
>
> One-way ticket to Mars help?
>
> Colin ;-)


colin.hch at gmail

Nov 19, 2009, 11:18 PM

Post #10 of 26 (1058 views)
Permalink
Re: Remote Access not Working [In reply to]

On Thu, Nov 19, 2009 at 8:31 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
> Fixed the plaintext connections and made a couple of the changes you suggested.
>
> http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/971d8989e9f0

That's great, thanks!

/me is off to compile Pacemaker.

Colin



colin.hch at gmail

Nov 20, 2009, 2:17 AM

Post #11 of 26 (1058 views)
Permalink
Re: Remote Access not Working [In reply to]

Hi,

this is looking better again: A remote "cibadmin -Q" is now doing the
right thing, however a remote "crm_mon" is still _not_ working
correctly.

Let's see, now that I should know where to look ... the function
cib_recv_plaintext() in lib/common/remote.c looks a bit suspicious to
me:

- The "if (len == 0)" check will never be true because len is
initialised to 512 and then only grows.
- The assumption that a partial read (wrt. the buffer) signals no more
data is IMO not valid.

With the following patch I can at least get a "crm_mon -1rf" to do the
right thing:

diff -ur Pacemaker-1-0-f7a8250d23fc/lib/common/remote.c Pacemaker-my/lib/common/remote.c
--- Pacemaker-1-0-f7a8250d23fc/lib/common/remote.c      2009-11-19 21:12:53.000000000 +0100
+++ Pacemaker-my/lib/common/remote.c    2009-11-20 10:52:36.000000000 +0100
@@ -220,33 +220,29 @@
 char*
 cib_recv_plaintext(int sock)
 {
-       int last = 0;
        char* buf = NULL;
-       int chunk_size = 512;
-       int len = chunk_size;
+       ssize_t buf_size = 512;
+       ssize_t len = 0;

-       crm_malloc0(buf, chunk_size);
+       crm_malloc0(buf, buf_size);

        while(1) {
-               int rc = recv(sock, buf+last, chunk_size, 0);
+               ssize_t rc = recv(sock, buf+len, buf_size-len, 0);
                if (rc == 0) {
                        if(len == 0) {
                                goto bail;
                        }
                        return buf;

-               } else if(rc > 0 && rc < chunk_size) {
-                       return buf;
-
-               } else if(rc == chunk_size) {
-                       last = len;
-                       len += chunk_size;
-                       crm_realloc(buf, len);
-                       CRM_ASSERT(buf != NULL);
+               } else if(rc > 0) {
+                 len += rc;
+                 if (len == buf_size) {
+                   crm_realloc(buf, buf_size += 512);  /* Should do exponential growth for amortized constant time? */
+                   CRM_ASSERT(buf != NULL);
+                 }
                }
-
                if(rc < 0 && errno != EINTR) {
-                       crm_perror(LOG_ERR,"Error receiving message: %d", rc);
+                 crm_perror(LOG_ERR,"Error receiving message: %d", (int)rc);
                        goto bail;
                }
        }
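
The comment in the patch asks about exponential growth. A standalone
sketch of that variant follows, using plain malloc/realloc instead of the
crm_* macros. recv_until_eof is a hypothetical name, and the sketch
assumes the sender closes (or shuts down) its side after the message; a
connection that stays open would need a length prefix or delimiter
instead:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read from `sock` until the peer closes its side.
 * Returns a malloc'd, NUL-terminated buffer (caller frees) and stores the
 * byte count in *out_len if out_len is non-NULL. Returns NULL on error or
 * if no data arrived. A short read is NOT treated as end of message. */
char *recv_until_eof(int sock, size_t *out_len)
{
    assert(sock >= 0);

    size_t cap = 512;
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) {
        return NULL;
    }

    for (;;) {
        if (len + 1 >= cap) {               /* keep one byte for the NUL */
            size_t new_cap = cap * 2;       /* doubling: amortized O(1) per byte */
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }

        ssize_t rc = recv(sock, buf + len, cap - len - 1, 0);
        if (rc > 0) {
            len += (size_t)rc;
        } else if (rc == 0) {
            break;                          /* orderly shutdown by the peer */
        } else if (errno != EINTR) {
            free(buf);
            return NULL;                    /* real error */
        }
    }

    if (len == 0) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (out_len != NULL) {
        *out_len = len;
    }
    return buf;
}
```

Doubling the capacity keeps the number of realloc() calls logarithmic in
the message size, whereas growing by a fixed 512 bytes makes it linear.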

And that is as far as I can get with crm_mon, as it doesn't support
continuous update via remote access?

static int cib_remote_set_connection_dnotify(
cib_t *cib, void (*dnotify)(gpointer user_data))
{
return cib_NOTSUPPORTED;
}


Regards, Colin



andrew at beekhof

Nov 20, 2009, 3:36 AM

Post #12 of 26 (1057 views)
Permalink
Re: Remote Access not Working [In reply to]

On Fri, Nov 20, 2009 at 11:17 AM, Colin <colin.hch [at] gmail> wrote:
> Hi,
>
> this is looking better again: A remote "cibadmin -Q" is now doing the
> right thing, however a remote "crm_mon" is still _not_ working
> correctly.
>
> Let's see, now that I should know where to look ... the function
> cib_recv_plaintext() in lib/common/remote.c looks a bit suspicious to
> me:
>
> - The "if (len == 0)" check will never be true because len is
> initialised to 512 and then only grows.
> - The assumption that a partial read (wrt. the buffer) signals no more
> data is IMO not valid.

It is if you didn't get a signal.
But I agree the code needs a cleanup.

I went with: http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/5acf9f2e9c9e

> With the following patch I can at least get a "crm_mon -1rf" to do the
> right thing:
>
> diff -ur Pacemaker-1-0-f7a8250d23fc/lib/common/remote.c
> Pacemaker-my/lib/common/remote.c
> --- Pacemaker-1-0-f7a8250d23fc/lib/common/remote.c      2009-11-19
> 21:12:53.000000000 +0100
> +++ Pacemaker-my/lib/common/remote.c    2009-11-20 10:52:36.000000000 +0100
> @@ -220,33 +220,29 @@
>  char*
>  cib_recv_plaintext(int sock)
>  {
> -       int last = 0;
>        char* buf = NULL;
> -       int chunk_size = 512;
> -       int len = chunk_size;
> +       ssize_t buf_size = 512;
> +       ssize_t len = 0;
>
> -       crm_malloc0(buf, chunk_size);
> +       crm_malloc0(buf, buf_size);
>
>        while(1) {
> -               int rc = recv(sock, buf+last, chunk_size, 0);
> +               ssize_t rc = recv(sock, buf+len, buf_size-len, 0);
>                if (rc == 0) {
>                        if(len == 0) {
>                                goto bail;
>                        }
>                        return buf;
>
> -               } else if(rc > 0 && rc < chunk_size) {
> -                       return buf;
> -
> -               } else if(rc == chunk_size) {
> -                       last = len;
> -                       len += chunk_size;
> -                       crm_realloc(buf, len);
> -                       CRM_ASSERT(buf != NULL);
> +               } else if(rc > 0) {
> +                 len += rc;
> +                 if (len == buf_size) {
> +                   crm_realloc(buf, buf_size += 512);  /* Should do
> exponential growth for amortized constant time? */
> +                   CRM_ASSERT(buf != NULL);
> +                 }
>                }
> -
>                if(rc < 0 && errno != EINTR) {
> -                       crm_perror(LOG_ERR,"Error receiving message: %d", rc);
> +                 crm_perror(LOG_ERR,"Error receiving message: %d", (int)rc);
>                        goto bail;
>                }
>        }
>
> And that is as far as I can get with crm_mon, as it doesn't support
> continuous update via remote access?
>
> static int cib_remote_set_connection_dnotify(
>    cib_t *cib, void (*dnotify)(gpointer user_data))
> {
>    return cib_NOTSUPPORTED;
> }

No, that's something else.
Remote notifications should work; I'll test that today.



colin.hch at gmail

Nov 20, 2009, 4:01 AM

Post #13 of 26 (1060 views)
Permalink
Re: Remote Access not Working [In reply to]

On Fri, Nov 20, 2009 at 12:36 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
> On Fri, Nov 20, 2009 at 11:17 AM, Colin <colin.hch [at] gmail> wrote:
>> - The assumption that a partial read (wrt. the buffer) signals no more
>> data is IMO not valid.
>
> It is if you didn't get a signal.

What if the number of payload bytes per IP packet is not a multiple of
the third argument to recv(), and you have a slow connection? This is
TCP, so the data can arrive at any rate, fast or slow. And since TCP
lacks any kind of implicit record markers (unlike UDP or SCTP, which
have them), you normally have to look at the data to know when you're
done reading... At least that's my current understanding of [the
shortcomings of the stream abstraction provided by] TCP.
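
To make the point concrete, here is a minimal sketch of a receive loop that never treats a short read as end-of-message. It assumes the sender terminates each message with a NUL byte; that framing is an assumption made for this example, the thread does not say how the real protocol marks message boundaries. The name recv_until_nul() is hypothetical and not part of Pacemaker. The buffer is doubled when full, which is the "exponential growth" mentioned in the patch comment.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, not part of Pacemaker: read from a stream socket
 * until the sender's terminating NUL byte arrives (assumed framing).
 * A short read only means "nothing more has arrived yet".
 * Returns a malloc'd NUL-terminated string, or NULL on error or if the
 * peer closes the connection before the terminator was seen.
 * Bytes that arrive after the terminator in the same read are dropped. */
char *recv_until_nul(int sock)
{
    size_t cap = 512;   /* current buffer capacity */
    size_t len = 0;     /* bytes received so far */
    char *buf = malloc(cap);

    if (buf == NULL) {
        return NULL;
    }

    for (;;) {
        ssize_t rc;

        if (len == cap) {
            /* Double the capacity so total copying stays amortized O(n). */
            char *tmp = realloc(buf, cap * 2);

            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }

        rc = recv(sock, buf + len, cap - len, 0);
        if (rc < 0) {
            if (errno == EINTR) {
                continue;   /* interrupted by a signal: just retry */
            }
            free(buf);
            return NULL;
        }
        if (rc == 0) {
            /* Peer closed before sending the terminator. */
            free(buf);
            return NULL;
        }

        /* Look for the terminator only in the newly received bytes. */
        if (memchr(buf + len, '\0', (size_t)rc) != NULL) {
            return buf;
        }
        len += (size_t)rc;
    }
}
```

The caller owns the returned buffer and must free() it. A length prefix instead of a terminator would work just as well; the important part is that the decision to stop reading comes from the data, not from the size of a single recv().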

> But I agree the code needs a cleanup.
>
> I went with: http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/5acf9f2e9c9e

Great, I'll set up Mercurial and then I'll test it.

>> And that is as far as I can get with crm_mon, as it doesn't support
>> continuous update via remote access?
>>
>> static int cib_remote_set_connection_dnotify(
>>    cib_t *cib, void (*dnotify)(gpointer user_data))
>> {
>>    return cib_NOTSUPPORTED;
>> }
>
> No, that's something else.
> Remote notifications should work; I'll test that today.

Right, this function does not seem to get used. With:

    if(full) {
        crm_debug_3("Full connect: start");
        if(rc == cib_ok) {
            crm_debug_3("Full connect: dnotify");
            rc = cib->cmds->set_connection_dnotify(cib, mon_cib_connection_destroy);
        }

        if(rc == cib_ok) {
            crm_debug_3("Full connect: callback");
            cib->cmds->del_notify_callback(cib, T_CIB_DIFF_NOTIFY, crm_diff_update);
            rc = cib->cmds->add_notify_callback(cib, T_CIB_DIFF_NOTIFY, crm_diff_update);
        }

        if(rc != cib_ok) {
            print_as("Notification setup failed, could not monitor CIB actions");
            if(as_console) { sleep(2); }
            clean_up(-rc);
        }
    }

the output of 'tools/.libs/crm_mon -VVVVVVVNrf' finishes with:

Migration summary:
* Node cluster1:
crm_mon[21188]: 2009/11/20_12:51:58 debug: debug3: cleanup_calculations: deleting resources
crm_mon[21188]: 2009/11/20_12:51:58 debug: debug3: cleanup_calculations: deleting actions
crm_mon[21188]: 2009/11/20_12:51:58 debug: debug3: cleanup_calculations: deleting nodes
crm_mon[21188]: 2009/11/20_12:51:58 debug: debug3: cib_connect: Full connect: start
crm_mon[21188]: 2009/11/20_12:51:58 debug: debug3: cib_connect: Full connect: dnotify
crm_mon[21188]: 2009/11/20_12:51:58 debug: cib_remote_signoff: Signing out of the CIB Service
crm_mon[21188]: 2009/11/20_12:51:58 WARN: cib_remote_free: Freeing CIB
Notification setup failed, could not monitor CIB actionscluster1:~/Pacemaker-my# fg

Side note: Now I often get two password prompts?!?

cluster1:~/Pacemaker-my# tools/.libs/crm_mon -VNrf
Attempting connection to the cluster...Password:Password:

Thanks, Colin



colin.hch at gmail

Nov 20, 2009, 4:05 AM

Post #14 of 26 (1062 views)
Permalink
Re: Remote Access not Working [In reply to]

PS: I believe this CRM_ASSERT() in lib/common/remote.c can never trigger.

    if(encrypted) {
#ifdef HAVE_GNUTLS_GNUTLS_H
        reply = cib_recv_tls(session);
#else
        CRM_ASSERT(encrypted == FALSE);
#endif
    } else {



andrew at beekhof

Nov 20, 2009, 4:23 AM

Post #15 of 26 (1058 views)
Permalink
Re: Remote Access not Working [In reply to]

On Fri, Nov 20, 2009 at 1:05 PM, Colin <colin.hch [at] gmail> wrote:
> PS: I believe this CRM_ASSERT() in lib/common/remote.c can never trigger.

It's designed to detect if somehow we asked for an encrypted message
when Pacemaker wasn't built with gnutls.
It's a sanity check; it's not supposed to go off.

>
>    if(encrypted) {
> #ifdef HAVE_GNUTLS_GNUTLS_H
>        reply = cib_recv_tls(session);
> #else
>        CRM_ASSERT(encrypted == FALSE);
> #endif
>    } else {


andrew at beekhof

Nov 20, 2009, 11:05 AM

Post #16 of 26 (1040 views)
Permalink
Re: Remote Access not Working [In reply to]

On Fri, Nov 20, 2009 at 12:36 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
> Remote notifications should work, I'll test that today.

As of http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/a6d70b1b479d
they finally work for clear-text connections.
Testing encrypted ones now.



andrew at beekhof

Nov 20, 2009, 12:04 PM

Post #17 of 26 (1039 views)
Permalink
Re: Remote Access not Working [In reply to]

On Fri, Nov 20, 2009 at 8:05 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
> On Fri, Nov 20, 2009 at 12:36 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>> Remote notifications should work, I'll test that today.
>
> As of http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/a6d70b1b479d
> they finally work for clear-text connections.
> Testing encrypted ones now.
>

And now TLS as of
http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/83f81a1219f1



colin.hch at gmail

Nov 23, 2009, 12:59 AM

Post #18 of 26 (1032 views)
Permalink
Re: Remote Access not Working [In reply to]

On Fri, Nov 20, 2009 at 8:05 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
> On Fri, Nov 20, 2009 at 12:36 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>> Remote notifications should work, I'll test that today.
>
> As of http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/a6d70b1b479d
> they finally work for clear-text connections.

Downloading ... Compiling ... Testing ... Success!

(Although there's still the following message from crm_mon:
"Notification setup failed, won't be able to reconnect after failure",
it does seem to hang on and update itself correctly when the CIB
changes...)

Thanks a lot, Colin



andrew at beekhof

Nov 23, 2009, 1:04 AM

Post #19 of 26 (1032 views)
Permalink
Re: Remote Access not Working [In reply to]

On Mon, Nov 23, 2009 at 9:59 AM, Colin <colin.hch [at] gmail> wrote:
> On Fri, Nov 20, 2009 at 8:05 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>> On Fri, Nov 20, 2009 at 12:36 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>>> Remote notifications should work, I'll test that today.
>>
>> As of http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/a6d70b1b479d
>> they finally work for clear-text connections.
>
> Downloading ... Compiling ... Testing ... Success!
>
> (Although there's still the following message from crm_mon:
> "Notification setup failed, won't be able to reconnect after failure",
> it does seem to hang on and update itself correctly when the CIB
> changes...)

Eventually I'll implement that functionality too and the message will go away.



colin.hch at gmail

Nov 23, 2009, 1:11 AM

Post #20 of 26 (1029 views)
Permalink
Re: Remote Access not Working [In reply to]

>> (Although there's still the following message from crm_mon:
>> "Notification setup failed, won't be able to reconnect after failure",
>> it does seem to hang on and update itself correctly when the CIB
>> changes...)
>
> Eventually I'll implement that functionality too and the message will go away.

Then the next Cool Thing would be to support multiple CIB_servers and
use the first one that a connection can be made to.
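
Roughly what I have in mind, as a sketch only: try a list of candidates in order and keep the first socket that connects. The struct server type and connect_first() are hypothetical names made up for this example; nothing like them exists in the CIB client code as far as this thread shows, and how the list would be passed in (e.g. via the environment) is left open.

```c
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical type for this sketch: one candidate server. */
struct server {
    const char *host;   /* hostname or numeric address */
    const char *port;   /* service name or decimal port number */
};

/* Try each candidate in order and return a connected TCP socket for the
 * first one that accepts the connection, or -1 if none does.
 * If chosen is not NULL, it receives the index of the server used. */
int connect_first(const struct server *servers, size_t count, size_t *chosen)
{
    size_t i;

    for (i = 0; i < count; i++) {
        struct addrinfo hints;
        struct addrinfo *res = NULL;
        struct addrinfo *ai;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
        hints.ai_socktype = SOCK_STREAM;    /* TCP */

        if (getaddrinfo(servers[i].host, servers[i].port, &hints, &res) != 0) {
            continue;   /* cannot resolve this one: try the next server */
        }

        /* A host may resolve to several addresses; try each of them. */
        for (ai = res; ai != NULL; ai = ai->ai_next) {
            int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);

            if (fd < 0) {
                continue;
            }
            if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
                freeaddrinfo(res);
                if (chosen != NULL) {
                    *chosen = i;
                }
                return fd;
            }
            close(fd);
        }
        freeaddrinfo(res);
    }
    return -1;
}
```

Note that connect() blocks here, so a candidate that silently drops packets can stall the loop for the full TCP connect timeout; a real client would probably want a non-blocking connect with its own timeout.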

Hm.

Or do other people use a clustered IP address for remote
administration, together with e.g. some iptables forwarding?

Regards, Colin



colin.hch at gmail

Nov 27, 2009, 1:54 AM

Post #21 of 26 (1002 views)
Permalink
Re: Remote Access not Working [In reply to]

On Mon, Nov 23, 2009 at 9:59 AM, Colin <colin.hch [at] gmail> wrote:
> On Fri, Nov 20, 2009 at 8:05 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>> On Fri, Nov 20, 2009 at 12:36 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>>> Remote notifications should work, I'll test that today.
>>
>> As of http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/a6d70b1b479d
>> they finally work for clear-text connections.
>
> Downloading ... Compiling ... Testing ... Success!
>
> (Although there's still the following message from crm_mon:
> "Notification setup failed, won't be able to reconnect after failure",
> it does seem to hang on and update itself correctly when the CIB
> changes...)

On my other test cluster, with 32bit systems, the notification does
not work, i.e. crm_mon gives me the correct status and then doesn't
ever update.

Colin



andrew at beekhof

Dec 10, 2009, 5:00 AM

Post #22 of 26 (947 views)
Permalink
Re: Remote Access not Working [In reply to]

On Fri, Nov 27, 2009 at 10:54 AM, Colin <colin.hch [at] gmail> wrote:
> On Mon, Nov 23, 2009 at 9:59 AM, Colin <colin.hch [at] gmail> wrote:
>> On Fri, Nov 20, 2009 at 8:05 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>>> On Fri, Nov 20, 2009 at 12:36 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>>>> Remote notifications should work, I'll test that today.
>>>
>>> As of http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/a6d70b1b479d
>>> they finally work for clear-text connections.
>>
>> Downloading ... Compiling ... Testing ... Success!
>>
>> (Although there's still the following message from crm_mon:
>> "Notification setup failed, won't be able to reconnect after failure",
>> it does seem to hang on and update itself correctly when the CIB
>> changes...)
>
> On my other test cluster, with 32bit systems, the notification does
> not work, i.e. crm_mon gives me the correct status and then doesn't
> ever update.

Very odd. Client and host were both 32-bit?



colin.hch at gmail

Dec 10, 2009, 1:51 PM

Post #23 of 26 (944 views)
Permalink
Re: Remote Access not Working [In reply to]

On Thu, Dec 10, 2009 at 2:00 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
> On Fri, Nov 27, 2009 at 10:54 AM, Colin <colin.hch [at] gmail> wrote:
>> On Mon, Nov 23, 2009 at 9:59 AM, Colin <colin.hch [at] gmail> wrote:
>>> On Fri, Nov 20, 2009 at 8:05 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>>>> On Fri, Nov 20, 2009 at 12:36 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>>>>> Remote notifications should work, I'll test that today.
>>>>
>>>> As of http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/a6d70b1b479d
>>>> they finally work for clear-text connections.
>>>
>>> Downloading ... Compiling ... Testing ... Success!
>>>
>>> (Although there's still the following message from crm_mon:
>>> "Notification setup failed, won't be able to reconnect after failure",
>>> it does seem to hang on and update itself correctly when the CIB
>>> changes...)
>>
>> On my other test cluster, with 32bit systems, the notification does
>> not work, i.e. crm_mon gives me the correct status and then doesn't
>> ever update.
>
> Very odd.  Client and host were both 32-bit?

AFAIR yes, one testing cluster has hardware that isn't even 64bit capable.

(Would you expect problems between mixed hosts?)

Colin



andrew at beekhof

Dec 11, 2009, 2:26 AM

Post #24 of 26 (945 views)
Permalink
Re: Remote Access not Working [In reply to]

On Thu, Dec 10, 2009 at 10:51 PM, Colin <colin.hch [at] gmail> wrote:
> On Thu, Dec 10, 2009 at 2:00 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>> On Fri, Nov 27, 2009 at 10:54 AM, Colin <colin.hch [at] gmail> wrote:
>>> On Mon, Nov 23, 2009 at 9:59 AM, Colin <colin.hch [at] gmail> wrote:
>>>> On Fri, Nov 20, 2009 at 8:05 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>>>>> On Fri, Nov 20, 2009 at 12:36 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>>>>>> Remote notifications should work, I'll test that today.
>>>>>
>>>>> As of http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/a6d70b1b479d
>>>>> they finally work for clear-text connections.
>>>>
>>>> Downloading ... Compiling ... Testing ... Success!
>>>>
>>>> (Although there's still the following message from crm_mon:
>>>> "Notification setup failed, won't be able to reconnect after failure",
>>>> it does seem to hang on and update itself correctly when the CIB
>>>> changes...)
>>>
>>> On my other test cluster, with 32bit systems, the notification does
>>> not work, i.e. crm_mon gives me the correct status and then doesn't
>>> ever update.
>>
>> Very odd.  Client and host were both 32-bit?
>
> AFAIR yes, one testing cluster has hardware that isn't even 64bit capable.
>
> (Would you expect problems between mixed hosts?)

No, but it's good to at least rule that out as a possibility.
Debug logs?



ygao at novell

Dec 13, 2009, 11:33 PM

Post #25 of 26 (937 views)
Permalink
Re: Remote Access not Working [In reply to]

Hi,

Andrew Beekhof wrote:
> On Thu, Nov 12, 2009 at 4:46 PM, Colin <colin.hch [at] gmail> wrote:
>> On Thu, Nov 12, 2009 at 3:36 PM, Andrew Beekhof <andrew [at] beekhof> wrote:
>>
>> 1) In cib/remote.c, the function check_group_membership() only checks
>> whether the user is explicitly listed as member of the group in
>> /etc/group, but does not accept the user if only the user's primary
>> group in /etc/passwd is set to the correct group (and the explicit,
>> then redundant, membership in /etc/group is missing).
>
> Agreed. Seems to be a PAM thing that I can't do much about though.
I think it should first check whether the user's primary group is
"haclient", and only then check whether the user is listed among the
group's members. A patch for this is attached.
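
For readers without the attachment, the idea can be sketched as follows. This is not the attached patch and not the actual check_group_membership() code; user_in_group() is a hypothetical standalone helper that only uses the standard getpwnam() and getgrnam() lookups.

```c
#include <grp.h>
#include <pwd.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper, not the code from the attached patch.
 * Returns 1 if user belongs to group, either because it is the user's
 * primary group (passwd entry) or because the user is listed as a
 * member (group entry); returns 0 otherwise or if a lookup fails. */
int user_in_group(const char *user, const char *group)
{
    const struct passwd *pw;
    const struct group *gr;
    gid_t primary_gid;
    char **member;

    pw = getpwnam(user);
    if (pw == NULL) {
        return 0;   /* unknown user */
    }
    /* Copy the gid out before making further lookups. */
    primary_gid = pw->pw_gid;

    gr = getgrnam(group);
    if (gr == NULL) {
        return 0;   /* unknown group */
    }

    /* 1) Primary group from the passwd entry. */
    if (gr->gr_gid == primary_gid) {
        return 1;
    }

    /* 2) Explicit membership list from the group entry. */
    for (member = gr->gr_mem; member != NULL && *member != NULL; member++) {
        if (strcmp(*member, user) == 0) {
            return 1;
        }
    }
    return 0;
}
```

With this, a call such as user_in_group("hacluster", "haclient") would accept a user whose primary group is "haclient" even when the redundant entry in /etc/group is missing.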

Thanks,
Yan
--
ygao [at] novell
Software Engineer
China Server Team, OPS Engineering

Novell, Inc.
Making IT Work As One™
Attachments: pacemaker-cib-primary-group.diff (0.90 KB)
