Mailing List Archive: Linux-HA: Users

Conceptual Setup - Apache and Postgresql

 

 



ghenry at suretecsystems

Jul 28, 2005, 6:11 AM

Post #1 of 6 (1161 views)
Conceptual Setup - Apache and Postgresql

Dear list,

I have posted this to the linux-vs list too as it concerns that software
as well.

Our proposed setup for an upcoming project for HA Apache and Postgresql,
using PHP is as follows:

Internet
|
Firewall
|
Cisco Gigabit Switch
|
Director 1 + Director 2 (Director 2 is a backup via Heartbeat)
|
Realserver 1 + Realserver 2 (Apache and PHP with sessions going to the
below DBs, so only a bit of logging on these machines)
|
Postgresql DB 1 + Postgresql DB 2 (DB2 is a backup via Heartbeat and DRBD)
|
Some kind of NAS for backing up all of above configuration with images for
SystemImager for more realservers.
|
LTO2 Tape Library


Would you be so kind as to share your thoughts and experience with me and
offer any advice?

Are we missing anything?

Thanks,

Gavin.

--
Kind Regards,

Gavin Henry.
Managing Director.

T +44 (0) 1224 279484
M +44 (0) 7930 323266
F +44 (0) 1224 742001
E ghenry [at] suretecsystems

Open Source. Open Solutions(tm).

http://www.suretecsystems.com/


_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha


dlang at digitalinsight

Jul 29, 2005, 7:06 PM

Post #2 of 6 (1105 views)
Re: Conceptual Setup - Apache and Postgresql [In reply to]

Backing up the underlying filesystem via DRBD for databases is a somewhat
questionable approach. If it were me, I would look into using slony
to replicate the database and then write a couple of scripts (to be run by
heartbeat) to change from being the slony slave to being the master (slony
should already include scripts to do this, but they may need a wrapper to
allow heartbeat to trigger them properly).

The task of making the failed box a slave to the other box after it
comes back up has enough pitfalls that I would seriously consider leaving
this as a manual step.
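
[Editor's sketch] The wrapper idea above might look roughly like the following. It assumes heartbeat invokes the resource script with start/stop/status and that promotion is done by feeding a FAILOVER statement to slonik on stdin. The cluster name, node ids and conninfo strings are hypothetical placeholders, status reporting is stubbed out, and rejoining the failed node is deliberately left manual, as suggested above.

```python
import subprocess

# Hypothetical values: cluster name, node ids and conninfo strings are
# placeholders for illustration, not taken from this thread.
CLUSTER = "webapp"
CONNINFO = {
    1: "dbname=app host=db1 user=slony",
    2: "dbname=app host=db2 user=slony",
}


def build_failover_script(cluster, conninfo, failed_node, backup_node):
    """Return slonik input that promotes backup_node after failed_node dies."""
    lines = [f"cluster name = {cluster};"]
    for node_id in sorted(conninfo):
        lines.append(f"node {node_id} admin conninfo = '{conninfo[node_id]}';")
    lines.append(f"failover (id = {failed_node}, backup node = {backup_node});")
    return "\n".join(lines) + "\n"


def run_slonik(script):
    """Feed the script to slonik on stdin and return its exit code."""
    return subprocess.run(["slonik"], input=script, text=True).returncode


def handle(action, runner=run_slonik):
    """Map a heartbeat-style start/stop/status call to an exit code."""
    if action == "start":
        # Taking over the resource: promote the local (backup) node.
        return runner(build_failover_script(CLUSTER, CONNINFO, 1, 2))
    if action == "stop":
        # Demotion and re-subscription are left as a manual step.
        return 0
    if action == "status":
        # Stub: a real script would check whether this node is the origin.
        return 0
    return 1  # unknown action
```

A real resource script would call handle() with its first command-line argument and exit with the returned code. The runner parameter exists only so the logic can be exercised without slonik installed.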

As for the other layers:

The directors and failovers are trivial and standard.

Since the webservers don't store any state info on them, there should not
be any issues with your plan.

Overall it looks like a good approach; I would just handle the database a
little differently.

David Lang



--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare


alanr at unix

Jul 29, 2005, 7:47 PM

Post #3 of 6 (1101 views)
Re: Conceptual Setup - Apache and Postgresql [In reply to]


I'm afraid I have to disagree here - because of information I have which
might not be correct. If my next sentence is incorrect, then by all
means correct me.

It is my understanding that slony does not provide synchronous
replication (that is, it replicates asynchronously).

General rule:
Don't use asynchronous replication for a database over short distances
if synchronous replication is available.

Otherwise you may lose transactions - at the very least you risk working
from (and updating) an out-of-date database. DRBD is quite reliable and
has been around a good bit longer than slony.

Asynchronous replication is great for longer distances, and smart
replication like databases perform is also great for longer distances.

Since DRBD can be synchronous and (to my knowledge) slony can't, then
I'd go for synchronous when I can.
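
[Editor's sketch] The lost-transaction risk described above can be illustrated with a toy model. This is not how DRBD or slony work internally. It only assumes that a synchronous pair acknowledges a commit after the standby has it, while an asynchronous pair acknowledges at once and ships changes later. The class and method names are made up for the illustration.

```python
class ReplicatedPair:
    """Toy primary/standby pair; transactions are just opaque values."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []    # commits acknowledged to clients
        self.standby = []    # commits the standby actually holds
        self.in_flight = []  # async only: acknowledged but not yet shipped

    def commit(self, txn):
        self.primary.append(txn)
        if self.synchronous:
            # Acknowledge only once the standby has the data too.
            self.standby.append(txn)
        else:
            # Acknowledge immediately; replication happens later.
            self.in_flight.append(txn)

    def ship(self):
        """Asynchronous catch-up: deliver everything queued so far."""
        self.standby.extend(self.in_flight)
        self.in_flight.clear()

    def fail_over(self):
        """Primary dies now; return acknowledged commits the standby lacks."""
        return [t for t in self.primary if t not in self.standby]


async_pair = ReplicatedPair(synchronous=False)
async_pair.commit("t1")
async_pair.commit("t2")
async_pair.ship()
async_pair.commit("t3")  # acknowledged, but the primary fails before shipping

sync_pair = ReplicatedPair(synchronous=True)
for txn in ("t1", "t2", "t3"):
    sync_pair.commit(txn)

lost_async = async_pair.fail_over()
lost_sync = sync_pair.fail_over()
```

In this model the asynchronous standby takes over without "t3", while the synchronous one is missing nothing. How large that window is in practice depends on replication lag.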

--
Alan Robertson <alanr [at] unix>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce


david.lang at digitalinsight

Jul 29, 2005, 8:29 PM

Post #4 of 6 (1109 views)
Re: Conceptual Setup - Apache and Postgresql [In reply to]

On Fri, 29 Jul 2005, Alan Robertson wrote:

> I'm afraid I have to disagree here - because of information I have which
> might not be correct. If my next sentence is incorrect, then by all means
> correct me.
>
> It is my understanding that slony does not provide synchronous replication
> (that is, it replicates asynchronously).

Slony does not cause the primary database to stall while the secondary is
updating, no.

I wasn't aware that DRBD did this.

There are several replication options available for postgres, some of
which do offer synchronous updating of all databases (however, they
usually have to talk to a gatekeeper box, which you have to make HA
itself, and then that gatekeeper talks to the multiple database servers).

You make good points. My general rule is: don't use filesystem-level
replication if application-level replication is available.

In general, application-level replication will be significantly more
efficient than a filesystem one, and (depending on how you have things
set up) may also end up being more reliable.

Note that postgres does a _lot_ of syncs. Forcing the machine to pause
while the other system gets the data over the network and does a sync as
well (and acknowledges over the network that it has completed the sync) is
going to result in a significant performance hit. While you can turn off
the use of fsync, doing so seriously weakens your data reliability; you
would be FAR better off with fsync on and async replication.
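
[Editor's sketch] The cost being described can be put into rough numbers. The timings below are invented for illustration and are not measurements. Whether the local and remote syncs overlap depends on the replication layer, so the sketch shows both a fully serial and a fully parallel assumption.

```python
def commit_latency_us(local_fsync_us, network_rtt_us, remote_fsync_us,
                      synchronous, overlapped):
    """Estimate the time one fsync-bound commit waits, in microseconds.

    synchronous=False: only the local fsync is on the commit path.
    overlapped=True:   local and remote writes are assumed to run in parallel.
    overlapped=False:  the remote round trip is assumed to start only after
                       the local fsync finishes (pessimistic serial model).
    """
    if not synchronous:
        return local_fsync_us
    remote_path_us = network_rtt_us + remote_fsync_us
    if overlapped:
        return max(local_fsync_us, remote_path_us)
    return local_fsync_us + remote_path_us


# Hypothetical figures: 8 ms per disk sync, 0.2 ms round trip on a LAN.
LOCAL_US, RTT_US, REMOTE_US = 8000, 200, 8000

async_us = commit_latency_us(LOCAL_US, RTT_US, REMOTE_US, False, False)
serial_us = commit_latency_us(LOCAL_US, RTT_US, REMOTE_US, True, False)
parallel_us = commit_latency_us(LOCAL_US, RTT_US, REMOTE_US, True, True)
```

With these made-up inputs the serial model roughly doubles per-commit latency, while the overlapped model adds only the network round trip. Real numbers would have to come from testing.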

David Lang


--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare


alanr at unix

Jul 29, 2005, 10:36 PM

Post #5 of 6 (1093 views)
Re: Conceptual Setup - Apache and Postgresql [In reply to]

DRBD with protocol C (I think that's the one) is synchronous.

The worst case I've ever heard of for DRBD (when writes outnumbered
reads 10 to 1) was a 10% reduction in database throughput when doing
local replication. Actual measured write speed was reduced by 30% - but
even in this horrible case, the total throughput wasn't affected by that
much. [Older versions of DRBD were faster, but then the current one
never does a full sync once set up.]

On more normal workloads, there are at least 2 reads to every write. If
you take that more typical case you can see why DRBD overhead may not
even be measurable...

Let's say you do 2 reads to every write, and you're performing 10 writes
per second. That means your local disk is seeing 30 I/Os per second
with the expected effect on head movement, etc. Now, the other side is
only seeing 10 I/Os per second over a network that's MUCH faster than
the disk (if you use gigabit which is cheap enough to do).

So, it is _more_ than possible that the remote writes are completing on
average at least as fast as the local writes. If you have a high ratio
of reads to writes, they may be even completing faster than local writes
- because of shorter I/O queues, and less head movement.
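
[Editor's sketch] The arithmetic in the example above, written out. It assumes, as the example does, that reads are served only by the local disk and that only writes are replicated to the other side. The function name is made up for the illustration.

```python
def io_per_second(writes_per_sec, reads_per_write):
    """Return (primary_ios, secondary_ios) per second for a replicated pair.

    The primary disk serves every read and every write; the secondary
    only receives the replicated writes.
    """
    reads_per_sec = writes_per_sec * reads_per_write
    primary_ios = reads_per_sec + writes_per_sec
    secondary_ios = writes_per_sec
    return primary_ios, secondary_ios


# The case from the text: 2 reads to every write, 10 writes per second.
primary, secondary = io_per_second(writes_per_sec=10, reads_per_write=2)
print(f"primary: {primary} I/Os per second, secondary: {secondary} I/Os per second")
```

The secondary disk sees a third of the primary's I/O load in this case, which is the basis for the shorter-queue argument.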

Your comments about it consuming less bandwidth (i.e., being more
efficient) are correct. But, if you have a dedicated gigabit link
between the machines, it may not be important.

If filesystem level replication is less reliable than app-level
replication, then there's probably a bug somewhere - and it's probably
in the app (which probably can't recover properly from a crash).


--
Alan Robertson <alanr [at] unix>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce


ghenry at suretecsystems

Jul 30, 2005, 3:38 PM

Post #6 of 6 (1099 views)
Re: Conceptual Setup - Apache and Postgresql [In reply to]


Thanks for all this.

We'll take all your points on board.

So it looks like DRBD is a good option.

We will prove all this in the testing phase and let you know how it goes.

Thanks.

--
Kind Regards,

Gavin Henry.
Managing Director.

T +44 (0) 1224 279484
M +44 (0) 7930 323266
F +44 (0) 1224 742001
E ghenry [at] suretecsystems

Open Source. Open Solutions(tm).

http://www.suretecsystems.com/