Mailing List Archive: Linux-HA: Users

Single Point of Failure

 

 

paul at paulororke

Jan 12, 2012, 12:20 PM

Post #1 of 12 (712 views)
Single Point of Failure

Please excuse me if this is documented and I failed to find it. I have
been investigating ha-linux to provide Business Continuity in our mail
server. Currently we have a single mail server in our main office. We
would like to set up a second server in a geographically different location
and make a cluster of 2 nodes so that we can continue doing business if one
fails.

As far as I can tell ha-linux with Pacemaker is ideally suited to this. My
question is around how the cluster handles requests to the mail server(s).
Can anyone suggest some appropriate reading for where/how this is
handled? My concern is that should there be a failure at the location
that is receiving the requests how does it know to use the second node? Is
this typically done through zone files and a priority? Obviously I am
missing some important reading here because it would seem to me that there
could still be a single point of failure and that doesn't seem right.

What am I missing?

Please and thanks

--
Paul O'Rorke
_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


mfidelman at meetinghouse

Jan 12, 2012, 12:59 PM

Post #2 of 12 (688 views)
Re: Single Point of Failure

Paul O'Rorke wrote:
> Please excuse me if this is documented and I failed to find it. I have
> been investigating ha-linux to provide Business Continuity in our mail
> server. Currently we have a single mail server in our main office. We
> would like to set up a second server in a geographically different location
> and make a cluster of 2 nodes so that we can continue doing business if one
> fails.
>
> As far as I can tell ha-linux with Pacemaker is ideally suited to this. My
> question is around how the cluster handles requests to the mail server(s).
> Can anyone suggest some appropriate reading for where/how this is
> handled? My concern is that should there be a failure at the location
> that is receiving the requests how does it know to use the second node? Is
> this typically done through zone files and a priority? Obviously I am
> missing some important reading here because it would seem to me that there
> could still be a single point of failure and that doesn't seem right.

Not really. ha-linux is primarily for local clusters, not
geographically dispersed ones. You can use ha-linux to make a single
mail server more reliable, but you need to do something different for
geographic redundancy - which is really a question for the list
associated with whatever mail software you're using (e.g.,
postfix-users). But... having said that:

If you want to make a single mail server more reliable (or make each
distributed server more redundant), then you can do what we do, and run
a stack that looks roughly like this:
mail server & list manager (and antivirus and antispam, ...)
xen-VM
DRBD
pacemaker/etc.

If a node fails, the entire VM fails over to another node - all the data
(mail spool, inboxes) is replicated by DRBD, all the IP addresses
migrate with a failover, and everything keeps humming along.

----
For geographic redundancy, you need to deal with several things that
have to do with DNS records and mail server configs:

1. For outgoing mail:

- it depends on whether mail clients do DNS lookups and send mail
directly to their destinations, or whether the clients route everything
through your central server -- if the former, you don't need to do
anything; if the latter:

- set up a 2nd server and either configure your clients to know about it
(not always possible), or set up the DNS record for your outgoing server
to contain records for both outgoing servers -- for the most part, this
will take care of things, with three caveats:
-- depending on how the clients do DNS lookups, and how they do retries
if they can't reach a server, stuff might sit in queue for a while
-- if mail is in transit between client and server, and the server
fails, that message might get lost (depends on the client behavior)
-- mail that's queued on the server, when it fails, will probably get
sent when the server comes back up, but also might get lost, depending
on the type of failure (note: there are some ways to configure some
servers, so that a mail session does not complete until the mail goes to
the next hop - not sure off the top of my head if you can set things up
so that an incoming session is kept open until mail has made it through
the server to its next hop)
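
For the client side of point 1, here is a minimal sketch of a sender that knows about two outgoing relays and tries them in order. The relay host names and the helper `send_via_first_available` are hypothetical; only `smtplib.SMTP` and `send_message` are real standard-library calls. If every relay fails, the function raises, so the caller can keep the message queued rather than lose it.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical relay names; substitute your own outgoing servers.
RELAYS = ["smtp1.example.com", "smtp2.example.com"]


def smtp_send(host, msg, timeout=10):
    """Hand one message to one relay over SMTP (raises on failure)."""
    with smtplib.SMTP(host, 25, timeout=timeout) as conn:
        conn.send_message(msg)


def send_via_first_available(msg, relays=RELAYS, send=smtp_send):
    """Try each relay in order; return the host that accepted the message.

    If every relay fails, raise so the caller can keep the message
    queued and retry later instead of silently dropping it.
    """
    errors = []
    for host in relays:
        try:
            send(host, msg)
            return host
        except (OSError, smtplib.SMTPException) as exc:
            errors.append(f"{host}: {exc}")
    raise RuntimeError("all relays failed: " + "; ".join(errors))
```

The `send` parameter exists only so the fallback logic can be exercised without a live mail server.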

2a. For incoming mail - case 1: SMTP, mailboxes, POP, IMAP all on the
same server:

- first off, make sure that you have some redundancy on that server, so
that you don't lose mail

- you can set up a secondary server (give it an MX record with lower
priority, i.e. a higher preference number, than the primary server) - it
will receive mail when the primary goes down; and you can set up the mail
config to forward stuff automatically to the primary server when it comes
back up -- people won't be able to get to their mail until the primary
comes back up, but mail will get accepted and will eventually get delivered

- if you want to have geographic failover for mailboxes/POP/IMAP, things
get a lot trickier (e.g. replication of mail directories, configuring
DNS and/or clients to know about alternate locations)
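
The primary/secondary MX behaviour in case 2a can be sketched from the sending host's side. This is an illustration only: the host names and preference numbers are made up, and `is_reachable` stands in for whatever connection attempt a real mail server makes.

```python
def pick_mx(mx_records, is_reachable):
    """Return the first reachable host, trying the lowest preference number first.

    mx_records: list of (preference, hostname) pairs, as in DNS MX records.
    is_reachable: callable(hostname) -> bool, e.g. a TCP connect to port 25.
    Returns None if no host is reachable (the sender then queues and retries).
    """
    for _pref, host in sorted(mx_records):
        if is_reachable(host):
            return host
    return None


# Hypothetical records: the lower number is tried first.
MX = [(20, "mx2.example.com"), (10, "mx1.example.com")]

print(pick_mx(MX, lambda h: True))                    # mx1.example.com
print(pick_mx(MX, lambda h: h != "mx1.example.com"))  # mx2.example.com
```

Note that "reachable" is judged by each sender independently, so the secondary can receive mail even while the primary looks healthy from your side.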

2b. For incoming mail - case 2: SMTP on one machine, mailboxes/POP/IMAP
on other machines (e.g., incoming mail host forwards to local servers):

- this is easy, set up MX records for each incoming server (they can be
of equal priority if you want, for load leveling)

- incoming mail will go to one server or the other, and get forwarded by
whichever server handles the message, to the local destinations

- if one server fails, mail will continue to flow through the other one
-- the only stuff that will get delayed or lost is stuff that's been
queued on the server, but not yet forwarded (and, as noted above, it may
be possible to set up your servers so that failure recovery is pushed
back to the sending host - i.e., if the mail hasn't been forwarded, the
incoming transaction fails and the sending host tries again)
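
The idea of pushing failure recovery back to the sending host can be sketched as follows. This is a toy model, not the configuration of any particular mail server: `handle_incoming` and `forward` are hypothetical names, and the only real-world facts used are the SMTP reply codes 250 (accepted) and 451 (temporary failure, sender should retry).

```python
def handle_incoming(message, forward):
    """Return an SMTP reply for one incoming message.

    forward: callable(message) that delivers to the next hop and raises
    on failure. The reply is produced only after forward() has returned,
    so the sender keeps responsibility for the message until then.
    """
    try:
        forward(message)
    except Exception:
        # 451 is a temporary failure: the sending host keeps the message
        # in its own queue and retries later.
        return (451, "Temporary failure, please retry")
    return (250, "OK")
```

The trade-off is that the incoming session stays open for as long as the forward takes, so a slow next hop slows down every sender.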


Hope this helps,

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra




jc at info-systems

Jan 12, 2012, 1:14 PM

Post #3 of 12 (691 views)
Re: Single Point of Failure

Miles Fidelman wrote:
> - you can set up a 2ndry server (give it an MX record with lower
> priority than the primary server) - it will receive mail when the
> primary goes down; and you can set up the mail config to forward stuff
> automatically to the primary server when it comes back up -- people
> won't be able to get to their mail until the primary comes back up, but
> mail will get accepted and will eventually get delivered
>
Just one additional note: in such a setup, you should not assume that
the secondary server only receives mail when the first one is down from
your point of view.
A client somewhere might have a different view of connectivity and might
deliver mail to your secondary MX at any time. It is well known that
spammer systems deliberately try to deliver to the secondary in the hope
that protection there is weaker. So, if you have a secondary, you must
arrange for mail delivered to that server to be passed on to the primary
or a separate backend server. And you need to protect it just as well as
your primary against viruses, spam, and DoS attacks.

Best regards,
Jakob Curdes


arnold at arnoldarts

Jan 12, 2012, 1:53 PM

Post #4 of 12 (686 views)
Re: Single Point of Failure

On Thursday 12 January 2012 22:14:41 Jakob Curdes wrote:
> Miles Fidelman wrote:
> > - you can set up a 2ndry server (give it an MX record with lower
> > priority than the primary server) - it will receive mail when the
> > primary goes down; and you can set up the mail config to forward stuff
> > automatically to the primary server when it comes back up -- people
> > won't be able to get to their mail until the primary comes back up, but
> > mail will get accepted and will eventually get delivered
>
> Just one additional note: in such a setup, you should not assume that
> the secondary server only receives mail when the first one is down from
> your point of view.
> A client somewhere might have a different view of connectivity and might
> deliver mail to your secondary MX at any time. It is well known that
> spammer systems deliberately try to deliver to the secondary in the hope
> that protection there is weaker. So, if you have a secondary, you must
> arrange for mail delivered to that server to be passed on to the primary
> or a separate backend server. And you need to protect it just as well as
> your primary against viruses, spam, and DoS attacks.

So: If you go through the hassle of setting up a second receiving host with
the same quality and administration requirements as the first one, you will
also want to reflect that by giving it the same preference value in its MX
record. That way both servers will be used equally and you get load-balancing
where you originally meant to buy hot-standby :-)
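
To illustrate why equal MX preference gives load-balancing: senders are expected to randomize among hosts that share a preference value, so over many deliveries each host ends up being tried first roughly half the time. The sketch below simulates that; the host names, the helper names and the fixed seed are all assumptions made for the example.

```python
import random
from collections import Counter


def order_mx(mx_records, rng=random):
    """Order MX hosts as a sender would try them.

    A lower preference number comes first; hosts that share a preference
    are shuffled, which spreads load across them.
    """
    by_pref = {}
    for pref, host in mx_records:
        by_pref.setdefault(pref, []).append(host)
    ordered = []
    for pref in sorted(by_pref):
        group = by_pref[pref][:]
        rng.shuffle(group)
        ordered.extend(group)
    return ordered


# Hypothetical records with equal preference.
EQUAL_MX = [(10, "mx1.example.com"), (10, "mx2.example.com")]


def first_choice_counts(mx_records, trials=1000, seed=0):
    """Count how often each host is tried first over many deliveries."""
    rng = random.Random(seed)
    return Counter(order_mx(mx_records, rng)[0] for _ in range(trials))
```

Calling `first_choice_counts(EQUAL_MX)` returns a count per host; with unequal preferences the lower-numbered host would be first every time.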

Another comment from here: Email is such an old protocol that immunity to
network errors was built in. If a sending host can't reach the receiver, it
will try again after some time. And then again and again until a timeout is
reached. And that timeout is not 2-4 seconds like with many TCP-based
protocols but typically 4 days, giving the admins the chance on Monday to fix
the mail server that crashed on Friday evening. Of course, if you rely on
"fast" mail for your business, the price of redundant SMTP and redundant
POP3/IMAP servers might pay off.
For redundant POP3/IMAP the Cyrus project (and probably the others too) seems
to have a special daemon to sync mails and mail actions across servers. Add a
redundant master-slave replicating MySQL (or PostgreSQL) for the account
database, or even LDAP, and you should get something that even scales beyond
2 machines. Completely off-topic for this list, as I haven't thrown in any
Heartbeat, Pacemaker, Corosync or DRBD at this point.
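
The retry behaviour described above can be made concrete with a small schedule calculation. Everything numeric here is an assumption for illustration (15-minute first retry, doubling up to a 6-hour cap, giving up after 4 days); real mail servers have their own configurable retry rules.

```python
from datetime import datetime, timedelta


def retry_times(first_failure, initial=timedelta(minutes=15),
                factor=2.0, max_interval=timedelta(hours=6),
                give_up=timedelta(days=4)):
    """List the times at which a sender would retry a failed delivery.

    The interval grows by `factor` after each attempt, capped at
    `max_interval`; no attempt is scheduled after `give_up` has elapsed.
    All defaults are illustrative, not taken from any particular MTA.
    """
    times = []
    interval = initial
    when = first_failure + interval
    while when - first_failure <= give_up:
        times.append(when)
        interval = min(interval * factor, max_interval)
        when += interval
    return times


# The receiving server crashes Friday 18:00 and is fixed Monday 09:00.
crash = datetime(2012, 1, 13, 18, 0)
fixed = datetime(2012, 1, 16, 9, 0)
attempts = retry_times(crash)
# First retry that happens after the server is back.
delivered_at = next(t for t in attempts if t >= fixed)
```

Under these assumptions the message is still in the sender's queue when the server returns on Monday and goes out on the next scheduled retry, at most one capped interval later.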

Have fun,

Arnold


paul at paulororke

Jan 12, 2012, 3:22 PM

Post #5 of 12 (681 views)
Re: Single Point of Failure

hmmm - it looks like I may have to re-evaluate this.

Geographic redundancy is the point of this exercise; our office is in a
location that has a less than ideal history of power reliability. We are
a small software company and rely on email for online sales and product
delivery, so our solution - whatever it may be - must allow for one location
to completely lose power and still deliver client emails.

Mail is a very complex subject and I must confess that the excellent
suggestions made here may be a little more than I was prepared to dive into.

Given that this is a HA-Linux list, and that if I understand this correctly
it is not really designed for multi-site clusters, can anyone suggest a
more suitable technology? (the server is running CentOS/Exim)

Or perhaps I should be doing the grunt work and trying out some of the
above suggestions...

I do appreciate the excellent feedback to date!

thanks

--
Paul O'Rorke


dmaziuk at bmrb

Jan 12, 2012, 3:48 PM

Post #6 of 12 (680 views)
Re: Single Point of Failure

On 01/12/2012 05:22 PM, Paul O'Rorke wrote:
> hmmm - it looks like I may have to re-evaluate this.
>
> Geographic redundancy is the point of this exercise; our office is in a
> location that has a less than ideal history of power reliability. We are
> a small software company and rely on email for online sales and product
> delivery, so our solution - whatever it may be - must allow for one location
> to completely lose power and still deliver client emails.

One solution is called "backup generator and a big fuel tank". Another
one is called gmail.

HTH
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


mfidelman at meetinghouse

Jan 12, 2012, 3:57 PM

Post #7 of 12 (681 views)
Re: Single Point of Failure

Paul O'Rorke wrote:
>
> Geographic redundancy is the point of this exercise; our office is in a
> location that has a less than ideal history of power reliability. We are
> a small software company and rely on email for online sales and product
> delivery, so our solution - whatever it may be - must allow for one location
> to completely lose power and still deliver client emails.
>
>
> Given that this is a HA-Linux list, and that if I understand this correctly
> it is not really designed for multi-site clusters, can anyone suggest a
> more suitable technology? (the server is running CentOS/Exim)
>
>
I would ask the question on a mail-oriented list (e.g., postfix,
sendmail, or whatever server you're using).

There might be something you can do with DRBD's relatively new functions
for maintaining remote synchronization of data.

Or you might consider migrating your email to a leased server in a data
center or a hosted solution where you're not as vulnerable to power
outages (we're a tiny R&D company - we pay a few hundred a month for
rack space in a data center w/ multiple backbone connections, rock solid
power, etc. - our email runs in a Xen VM on top of a 2-node HA
cluster). Short of a tornado (not too much of a problem in Boston),
it's all pretty solid.

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra




andreas at hastexo

Jan 12, 2012, 4:22 PM

Post #8 of 12 (680 views)
Re: Single Point of Failure

On 01/13/2012 12:22 AM, Paul O'Rorke wrote:
> hmmm - it looks like I may have to re-evaluate this.
>
> Geographic redundancy is the point of this exercise; our office is in a
> location that has a less than ideal history of power reliability. We are
> a small software company and rely on email for online sales and product
> delivery, so our solution - whatever it may be - must allow for one location
> to completely lose power and still deliver client emails.

As Dimitri says ... you really want to have a look at Google Apps for
Business ...

Regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now


jc at info-systems

Jan 12, 2012, 11:57 PM

Post #9 of 12 (674 views)
Re: Single Point of Failure

On 13.01.2012 00:22, Paul O'Rorke wrote:
> hmmm - it looks like I may have to re-evaluate this.
>
> Given that this is a HA-Linux list, and that if I understand this correctly
> it is not really designed for multi-site clusters, can anyone suggest a
> more suitable technology? (the server is running CentOS/Exim)
>
For the filesystem part, you may want to look into GlusterFS, which
offers a sync mode suitable for slow links.
The rest might not be that hard - you can have separate SMTP servers
that each just send mail independently on a regular basis. The mail
receiving part is more tricky:
how do you achieve a takeover of the front IP address or DNS name of the
service? You probably do not have an IP that you can let freely wander
from one network to the other (this is how Linux-HA normally does a
takeover in a local network, needing a redundant link between the
servers for this).
DNS, on the other hand, can be changed, but how long will it take until the
clients pick up the change?
You might be able to control the local DNS resolution and put in a tweak
at this point, depending on reachability of the two mail servers, having
a DNS entry with a short lifetime (TTL) pointing at a reachable server. But
perhaps a remote datacenter solution is the cheaper way... note that to
reach high availability you need to test your setup thoroughly,
otherwise you will end up with bad things like "split-brain" or unusable
services despite your hard work.
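
A rough sketch of the "short-lived DNS entry pointing at a reachable server" idea: a checker probes each mail server's SMTP port and decides which address should be published. The record name, the addresses (taken from documentation ranges) and both function names are hypothetical, and actually pushing the result into a DNS server is deliberately left out.

```python
import socket


def smtp_reachable(ip, port=25, timeout=5):
    """Return True if a TCP connection to the SMTP port succeeds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_a_record(name, candidates, is_reachable=smtp_reachable, ttl=60):
    """Pick the address to publish for `name`.

    candidates: IP addresses in order of preference (primary first).
    Returns a (name, ttl, ip) tuple for the first reachable address, or
    None if nothing answers, in which case the existing record is best
    left alone.
    """
    for ip in candidates:
        if is_reachable(ip):
            return (name, ttl, ip)
    return None
```

The checker only sees the network from where it runs, so it can disagree with what outside clients see; that is one way such a setup drifts toward the split-brain situation mentioned above.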

HTH,
Jakob Curdes



paul at paulororke

Jan 13, 2012, 11:57 AM

Post #10 of 12 (669 views)
Re: Single Point of Failure

he he - I already thought that might be simpler...

On Thu, Jan 12, 2012 at 3:48 PM, Dimitri Maziuk <dmaziuk [at] bmrb> wrote:

> One solution is called "backup generator and a big fuel tank". Another
> one is called gmail.



--
Paul O'Rorke


dmaziuk at bmrb

Jan 13, 2012, 2:14 PM

Post #11 of 12 (664 views)
Re: Single Point of Failure

On 01/13/2012 01:57 PM, Paul O'Rorke wrote:
> he he - I already thought that might be simpler...

Part of it is what you mean by "deliver client e-mails". SMTP is one
thing, POP/IMAP is another, direct read from an mbox file is different
still (though it sits behind POP/IMAP as well). Another side is that, although
the SMTP RFCs start with "reliably and efficiently", e-mail delivery has
always been "best effort" -- so I would not build a business plan on the
assumption that e-mail is or will ever be reliable.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


paul at paulororke

Jan 13, 2012, 2:45 PM

Post #12 of 12 (665 views)
Re: Single Point of Failure

Understood, perhaps 'as reliable as possible' would be a better
expectation.

I think that for now we will probably just set up a secondary server to cache
mail in the event that the primary goes down. This was in place before we
moved the mail server to this office, but we had DNS issues with our service
provider's IP being spam-blocked and we were at their mercy to get it
cleared. Now we do it through our own office and an ISP that we have an
agreement with to resolve such things promptly. I think this is a stop-gap
solution until I do a little more research.

The generator is looking good right about now...

thanks for all the input so far.

--
Paul O'Rorke