Mailing List Archive: Linux-HA: HA-WG

Report: High Availability and Distributed Storage miniconf at LCA 2012




tserong at suse

Jan 29, 2012, 7:24 PM


Hi All,

Apologies for the mass email, but it seemed most appropriate to post a
followup to all the lists I originally sent the LCA 2012 HA miniconf CFP
to. I would humbly suggest that any miniconf-related replies be sent
either direct to myself, or to ha-wg [at] lists.
Comments on the HA BOF mentioned below should probably go to either
pacemaker [at] oss or linux-ha [at] lists.


The High Availability and Distributed Storage miniconf[1] at LCA 2012
went very well. Probably 60+ in attendance (so about 1/8th of the
conference attendees, given 7 other concurrent miniconfs), with maybe a
few less later in the day. First half was more linux-ha type stuff,
second half more database-y, with a bit of CTDB and Samba foo in the
middle. Sadly we didn't actually get much in the way of distributed
storage talks -- oddly enough, there was a conspicuous absence of
Gluster and Ceph talks in the main conf track as well. We hope to have
better luck next year (I plan to propose this miniconf again).

The talks were almost all 25 minute slots, as follows:

Storage Replication in High-Performance High-Availability Environments
by Florian Haas; discussion of using drbd with flashcache to
provide failover while still keeping the cache hot.

Building a Non-Shared Storage HA Cluster with Pacemaker & PostgreSQL 9.1
by Keisuke Mori; enhanced pgsql RA to work with PostgreSQL
streaming replication.

Extend Pacemaker to Support Geographically Distributed Clustering
by Tim Serong on behalf of Jiaju Zhang; an introduction to
Booth (what it is, how to configure it).

HiPBX - HiAv VoIP with Open Source Software and 5000 Lines of Bash
by Rob Thomas; showing how he built an HA VoIP system with live
demo (which almost worked) and a rickroll. Very entertaining.

Squashing SPOFs with Common Sense, Velcro, and a Hammer
also by Rob Thomas; somewhat more generic (label everything,
do proper cable management etc.), but still also entertaining.

CTDB Overview
by Ronnie Sahlberg; CTDB's approach to clustering - run
everything everywhere instead of classic active/passive, and
know what state is safe to drop/lose if a node dies.

High Availability Login Services with Samba4 Active Directory
by Kai Blin; brief overview of using Samba4 for AD auth - Kai
has a whole bunch of little embedded systems in his house
running this, which is kind of cute.

HA Lessons Learned from Darth Vader
by Ronnie Sahlberg; essentially saying the Empire got it wrong
with the Death Star (big SPOF), but did better on Hoth with its
redundant army of AT-ATs.

MySQL for the Developer in a Post-Oracle World
by Adam Donnison; various forking etc. of MySQL, both project
forking and different companies providing dev, consulting etc.

MySQL and Postgres Cloud Offerings
by Stewart Smith & Selena Deckelmann; basically there aren't
many sensible DB cloud offerings and/or they don't work and/or
they don't scale (I might be exaggerating, but probably not).

Scaling Data: Postgres, The Stack and the Future of Replication
by Selena Deckelmann; some general postgres discussion, live
demo of setting up binary replication, new stuff in 9.2.

Swift 101
by Monty Taylor; introduction to Swift in OpenStack - it's not
a RAID, it's not distributed storage, it's not (etc.), it's an
object store! Good for backups (large, write once, read never)
and web content (small, write once, read many).

MySQL Web Infra Scaling and Keeping it Online, Cheaply
by Arjen Lentz; the approaches his company takes when "fixing"
client systems so that they're resilient to failure (mysql
tuning, split web/db servers, backups, monitoring, master/slave
systems etc.)

We also had two lightning talks which apparently weren't recorded. One
was Avi Miller from Oracle announcing that they're supporting DRBD 8.3
in UEK2 (which is currently in beta). The other was from Florian Haas
ranting about crappy HA stack usability (e.g.: inscrutable command line
options and incomprehensible error messages). It was fun.

On Thursday, I co-presented the tutorial "High Availability Sprint: from
the brink of disaster to the Zen of Pacemaker" with Florian Haas. We
ran through basic concepts of drbd, corosync, pacemaker etc. then did a
walkthrough of setting up drbd+corosync+pacemaker+mysql on two VMs (VM
images were provided in advance, so participants could follow along).
This was well received, with people coming out of it actually
understanding what the hell we were talking about. Probably 30-40
attendees. The video is at http://www.youtube.com/watch?v=3GoT36cK6os

After that we had an HA birds of a feather session for a couple of
hours, maybe 15-20 people. Partly this was answering questions and
random discussion, but also us (myself, Florian, Andrew Beekhof) seeking
feedback about pain points with the HA stack. Comments included:

- Documentation is still too hard to find.

- crm shell lacks some facilities for automation with e.g.: puppet.
Someone wanted to be able to query the current value of a monitor op
on a resource. Querying the whole primitive and grepping is too

- The whole stack is too complicated(?) and/or some concern about
maintenance of documentation going forwards.

- Corosync 2.0 drops support for plugins, and requires libqb.

- Someone wants resource-agents manpage generation foo to go to a
devel package, so people shipping their own RAs can utilize that.

- A "frequently encountered errors and solutions" help page somewhere
would be of major benefit. We could probably crowdsource this to some
extent. We're still evaluating where this could be hosted best, but
currently the Clusterlabs wiki seems like the most suitable candidate.

- The need to deprecate resource agents came up again ("should I use
ocf:heartbeat:drbd or ocf:linbit:drbd?"), highlighting the need for
the overdue OCF spec update.

- Part of the reason Red Hat decided to use their own (new, in
development) shell for Pacemaker in RHEL 7(?) is that they want that
shell to do whole cluster setup, including corosync etc., which is a
different scope than the crm shell.
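
To illustrate the automation complaint above (pulling the current value of
a monitor op out of a primitive), here is a rough Python sketch of the
"query the whole primitive and pick it apart" approach. It is only a
sketch under assumptions: the resource name p_mysql, its parameters, and
the sample text are all made up for illustration, and the text merely
imitates the general shape of crm shell primitive syntax. The helper
op_settings() is hypothetical, not part of any existing tool.

```python
import re
import shlex

# Hypothetical sample of a primitive definition in crm shell syntax.
# The resource name and all values are invented for this example.
SAMPLE = '''primitive p_mysql ocf:heartbeat:mysql \\
    params config="/etc/my.cnf" \\
    op start interval="0" timeout="120s" \\
    op monitor interval="30s" timeout="20s"
'''


def op_settings(primitive_text, op_name):
    """Return the name=value settings of the first `op <op_name>` clause.

    Returns None if the primitive has no such operation.
    """
    # Join backslash-continued lines, then split shell-style so that
    # the surrounding quotes are removed from the values.
    joined = re.sub(r"\\\s*\n", " ", primitive_text)
    tokens = shlex.split(joined)
    for i, tok in enumerate(tokens):
        if tok == "op" and i + 1 < len(tokens) and tokens[i + 1] == op_name:
            settings = {}
            for item in tokens[i + 2:]:
                if "=" not in item:
                    break  # next keyword (op, params, meta, ...) ends the clause
                key, value = item.split("=", 1)
                settings[key] = value
            return settings
    return None


if __name__ == "__main__":
    print(op_settings(SAMPLE, "monitor"))
```

Having to write something like this for every tool that drives the
cluster is roughly the pain point raised: a direct query for a single op
attribute would make the parsing unnecessary.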

Thanks for reading, hope it was interesting.



Tim Serong
Senior Clustering Engineer
tserong [at] suse
ha-wg mailing list
ha-wg [at] lists
