andrew at beekhof
Jul 3, 2012, 11:12 PM
Re: Call cib_query failed (-41): Remote node did not respond
On Wed, Jul 4, 2012 at 10:06 AM, Brian J. Murrell <brian [at] interlinx> wrote:
> On 12-07-03 04:26 PM, David Vossel wrote:
>> This is not a definite. Perhaps you are experiencing this given the pacemaker version you are running
> Yes, that is absolutely possible and it certainly has been under
> consideration throughout this process. I did also recognize however,
> that I am running the latest stable (1.1.6) release and while I might be
> able to experiment with a development branch in the lab, I could
> not use it in production. So while it would be an interesting
> experiment, my primary goal had to be getting 1.1.6 to run stably.
>> and the torture test you are running with all those parallel commands,
> It is worth keeping in mind that all of those parallel commands are just
> as parallel with the 4 node cluster as they are with the 8 (4 nodes
> actively modifying the CIB + 4 completely idle nodes) and 16 node
> clusters -- both of which failed.
> Just because I reduced the number of nodes doesn't mean that I reduced
> the parallelism any.
Yes. You did. You reduced the number of "check what state the
resource is on every node" probes.
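
To put rough numbers on that, here is a minimal sketch. It assumes one probe per resource per node when the cluster starts, with each result recorded in the CIB. The resource count and the function name are hypothetical, picked only for illustration.

```python
def probe_count(nodes: int, resources: int) -> int:
    """Probes issued if every resource is checked once on every node.

    Assumption: one probe per (resource, node) pair, regardless of
    whether the node ever runs or modifies anything itself.
    """
    return nodes * resources


RESOURCES = 32  # hypothetical resource count, not taken from this thread

for nodes in (4, 8, 16):
    print(f"{nodes:2d} nodes -> {probe_count(nodes, RESOURCES)} probes")
```

Under that assumption, going from 4 to 8 nodes doubles the probe results that have to be written and synced, even if the 4 extra nodes are otherwise idle.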
> The commands being run on each node are not
> serialized and are all launched in parallel on the 4 node cluster as
> much as they were with the 16 node cluster.
> So strictly speaking, it doesn't seem that parallelism in the CIB
> modifications is as much of a factor as simply the number of nodes in
> the cluster, even when some (i.e. in the 8 node test I did) of the nodes
> are entirely passive and not modifying the CIB at all.
Now I'm getting annoyed.
I keep explaining this is not true yet you keep repeating the above assertion.
Please go back and re-read my previous answers (both here and
off-list). Properly. I will be happy to clarify anything that is
>> but I wouldn't go as far as to say pacemaker cannot scale to more than a handful of nodes.
> I'd totally welcome being shown the error of my ways.
>> I'm sure you know this, I just wanted to be explicit about this so there is no confusion caused by people who may use your example as a concrete metric.
> But of course. In my experiments, it was clear that the cib process
> could peak a single core on my 12 core Xeons with just 4 nodes in the
> cluster at times.
> Therefore it is also clear that some time down the road, assuming CPU is
> the limiting factor here, it's quite easy to see how a faster CPU core,
> or multithreading the cib would allow for better scaling, but my point
> was simply at the current time, and again, assuming (since I don't know
> for sure what the limiting factor really is) CPU is the limiting factor
> here, somewhere between 4-8 nodes is the limit with more or less default
>> From the deployments I've seen on the mailing list and bug reports, the most common clusters appear to be around the 2-6 node mark.
> Which seems consistent.
>> The messaging involved with keeping all the local resource operations in the CIB synced across that many nodes is pretty insane.
> Indeed, and I most certainly had considered that. What really threw a
> curve in that train of thought for me though was that even idle,
> non-CIB-modifying nodes (i.e. turning a working 4 node cluster into a
> non-working 8 node cluster by adding 4 nodes that do nothing with the
> CIB) can tip a working configuration over into non-working.
> I could most certainly see how the contention of 8 nodes all trying to
> jam stuff into the CIB might be taxing with all of the locking that
> needs to go on, etc, but for those 4 added idle nodes to add enough
> complexity to make a working 4 node cluster not work is puzzling.
> Puzzling enough (granted, to somebody who knows zilch about the
> messaging that goes on with CIB operations) to make it smell more like a
> bug than simple contention.
>> If you are set on using pacemaker,
> Well, I am not necessarily married to it. It did just seem like the
> tool with the critical mass behind it. As sketchy as it might seem to
> ask, (and I only am since you seem to be hinting that there might be a
> better tool for the job) is there a tool more suited to the job?
>> the best approach for scaling for your situation would probably be to try and figure out how to break nodes into smaller clusters that are easier to manage.
> Indeed, that is what I ended up doing. Now my 16 node cluster is four 4
> node clusters. The problem with that though, is that when a node in a
> cluster fails, it has only 3 other nodes to spread its resources around
> onto, and if 2 should fail, 2 nodes are trying to service twice their
> normal load. The benefit of larger clusters is clear in giving
> pacemaker more nodes to evenly distribute resources to, impacting the
> load of the other nodes minimally when one or more nodes of the
> cluster do fail.
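
A rough sketch of the arithmetic being described. It assumes every node carries an equal share of the load and that a failed node's resources are spread evenly over the survivors. The function name is hypothetical.

```python
from fractions import Fraction


def load_per_survivor(nodes: int, failed: int) -> Fraction:
    """Load on each surviving node, as a multiple of its normal load.

    Assumption: load is spread evenly before and after the failure.
    """
    if not 0 <= failed < nodes:
        raise ValueError("failed must be between 0 and nodes - 1")
    return Fraction(nodes, nodes - failed)


for nodes in (4, 16):
    for failed in (1, 2):
        factor = load_per_survivor(nodes, failed)
        print(f"{nodes:2d} nodes, {failed} failed -> "
              f"{float(factor):.2f}x normal load per survivor")
```

With 4 nodes, losing 2 leaves each survivor at twice its normal load. With 16 nodes, the same two failures raise each survivor's load by only one seventh.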
>> I have not heard of a single deployment as large as you are thinking of.
> Heh. Not atypical of me to push the envelope I'm afraid. :-/
> Cheers, and many thanks for your input. It is valuable to this discussion.
> Pacemaker mailing list: Pacemaker [at] oss
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org