
Mailing List Archive: Linux-HA: Users

MOSIX

 

 



tommy at artech

May 22, 1998, 11:38 AM

Post #1 of 18 (1185 views)
MOSIX

Hi again!

I have done some searching on clusters and HA on the net and found
this...

Does anyone know whether MOSIX (http://www.cs.huji.ac.il/mosix/) could be
something for Linux HA?
It seems to have some of the nice features you need in a cluster.
It is currently only available for BSD/OS, but they claim to be developing
a Linux version. Does anyone know
more about it?

Thanks, and sorry for my stupid questions...

//Tommy


hm at seneca

May 24, 1998, 1:08 AM

Post #2 of 18 (1165 views)
MOSIX [In reply to]

> Yes, you are right about MOSIX... Btw, CMIIW???

Correct me if I'm wrong ...

> About my previous question on the HA list: do you know any good way to
> mirror a filesystem across two or more Linux (or FreeBSD/NetBSD/OpenBSD
> is OK too) boxes in a cluster? I need (demand) fault tolerance and
> instant repair... I don't have time for

Currently, there's no way to do that on Linux. I know the *BSD variants too
little to speak for them.

Linux has the network block device driver and RAID-1. Together they could in
theory set up partition mirroring across nodes. The partition can also contain
a filesystem, but if the FS does not work in a transaction-oriented way,
you'll have a hard time fsck'ing it after a failure unless you take care of
the issues I outlined in an earlier post to the list. File corruption is
not the worst part; filesystem corruption is. It could be that fsck -A -a
doesn't work and you have to intervene manually. That has nothing to do with
fault tolerance...

> the master then it goes down without a nice shutdown....... Linux need a
> journalled /logging filesystem but do you know any why to fix it will
> linux today?

There are some projects underway, but at very different stages. You could
join the linux-ljfs mailing list.

Ciao,
hm

--
Mother is the invention of necessity.


tommy at artech

May 24, 1998, 2:20 AM

Post #3 of 18 (1164 views)
MOSIX [In reply to]

Harald Milz wrote:

> Currently, there's no way for Linux. I know the *BSD variants too little to
> speak for them.

OK... Not so good. I hope the Linux lfs project will get some results soon;
that would make life a bit easier...

> Linux has the network block device driver and RAID-1. Both could in theory
> set up partition mirroring across nodes.

Yes, I know, or sharing the same SCSI disk works... I have tried it. But you
get the fsck problem, so it isn't good. If you do an umount and/or a shutdown
on the other disk there is no problem, but you can't expect that to happen
every time.

> File corruption is
> not the worst part but filesystem corruption. Could be fsck -A -a doesn't
> work, and you have to intervene manually. This has nothing to do with fault
> tolerance...

OK... I see, but my goal is to have the backup box up and running in less than
1-5 min. Right now the biggest obstacle I see to getting this to work
is that Linux lacks a journalled/logging filesystem.
But couldn't filesystem corruption happen even with jfs/lfs?

Another way to solve this problem is to have a truly distributed and redundant
filesystem across two or more computers, where every
computer holds all the data... But that filesystem would need some logging
to work. I have looked at Coda and Xfs, but I think none of them is there today...

Does anyone have other ideas about a good distributed filesystem and how to
build one? I am just interested to know whether anyone has thought more about
different filesystems that are distributed, fault tolerant, instantly
repairable, etc. If you look around the net you see a couple of different
distributed filesystem projects at universities, companies or "internet"
projects, but I think none of them has any interesting results today.

> There are some projects underway but in very different statuses. You could
> join the linux-ljfs mailing list.

Yes... I have already joined that list... But I think it moves too slowly....

Thanks...

//TOmmy


h.milz at seneca

May 24, 1998, 6:44 AM

Post #4 of 18 (1164 views)
MOSIX [In reply to]

Tommy Svensson (tommy [at] artech) wrote:
> Yes I know or share the same SCSI disk work... Have try it.... But you got the
> fsck problem so it isn't good... Do you doumount or/and shutdown on the other
> disk then no problem, but you can't except that to happen every time.

Especially not because a HA solution should typically cope with the
disaster situations, not clean unmounts. Sync mounts don't help either.
Sync mount means the file and filesystem data are less likely to be
corrupted but the fsck is necessary anyway.

> But about filesystem corruption could happpen even to jfs/lfs or???

Not really. I've been using and kicking AIX boxes for years now and never
had a FS corruption problem. Please note that logging / journalling FS
typically only care about FS metadata, not file contents. On AIX, if a file
is open for write and the box dies, the FS will come back just fine but the
file length will be zero.
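
For a single file, one common application-level way around the zero-length outcome described above is to never overwrite in place: write a complete new copy, force it to disk, then rename it over the old name. What follows is only a minimal Python sketch, assuming a POSIX system where a rename within one filesystem is atomic. The helper name atomic_write is made up for illustration and is not something from this thread or from AIX.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` so that a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the rename below is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the file contents to disk
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory too, so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This only protects one file at a time. It does nothing for an application whose state spans several files.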

> filesystem at university,company or "internet" project but none of them has
> some intressting ressult today I think.

Exactly *sigh*. Worse, they don't cooperate to get something done quickly.


tommy at artech

May 24, 1998, 1:49 PM

Post #5 of 18 (1168 views)
MOSIX [In reply to]

Harald Milz wrote:

> Not really. I've been using and kicking AIX boxes for years now and never
> had a FS corruption problem. Please note that logging / journalling FS
> typically only care for FS meta data, not file contents. On AIX; if a file
> is open for write and the box dies, the FS will come back just fine but the
> file length will be zero.

OK... so all files that are open for writing when the box dies will be
zero-length files? Is there any way to fix that? Could you have a filesystem
where you save old copies of files that have been changed in the last x
minutes? Disk space is cheap today, so it isn't a big deal (at least I think
so) to lose 10-25% in exchange for easy, fast repair.
The problem is that if you run some kind of application, a lot of files can be
open for writing when the box dies, and then you get nothing out of them. But
can't a filesystem like NTFS repair itself so that you get the last copy of the
files that were being written when the box died? Or am I wrong?
I think everything should work fine after the box dies and the backup is up and
working. You just must get the file back: not the information you planned to
write to it when the box died, but the information the file contained before
you began the write that was interrupted because the machine died. Do you know
of any filesystem today that can handle that? And if not, do you know of any
project, or just any idea how to make a filesystem do it in theory? Do you know
any good source of information on filesystem theory and so on?

> Exactly *sigh*. More badly, they don't cooperate to get something quickly.

Yes, that was my point too... I think we would already have seen good results
if they had cooperated...

Bye!

Thanks!

//Tommy


grefen at carpe

May 24, 1998, 2:47 PM

Post #6 of 18 (1158 views)
MOSIX [In reply to]

In message <356887F0.6B91DF3 [at] artech> Tommy Svensson wrote:
> Harald Milz wrote:
>
> > Not really. I've been using and kicking AIX boxes for years now and never
> > had a FS corruption problem. Please note that logging / journalling FS
> > typically only care for FS meta data, not file contents. On AIX; if a file
> > is open for write and the box dies, the FS will come back just fine but the
> > file length will be zero.
>
> OK... so all files that are open for writing when the box dies will be
> zero-length files? Is there any way to fix that? Could you have a filesystem
> where you save old copies of files that have been changed in the last x
> minutes? Disk space is cheap today, so it isn't a big deal (at least I think
> so) to lose 10-25% in exchange for easy, fast repair.

The size is zero if nothing from the buffer cache was committed to disk.
So you don't lose anything that is already on the disk (unless you rewrite
the file). In other words, your file size is equal to the highest block
that was committed to disk. I wouldn't bet that the application's write
order is preserved ...
You don't solve anything by keeping a backup copy.
Suppose a program has 2 files open, a database index and the data.
Only one write makes it to disk before the crash ...

> The problem is that if you run some kind of application, a lot of files can
> be open for writing when the box dies, and then you get nothing out of them.
> But can't a filesystem like NTFS repair itself so that you get the last copy
> of the files that were being written when the box died? Or am I wrong?

NTFS repair would love to be like fsck. The purpose is the same.
It doesn't magically repair the data, only the inode structures.
I think repair is the wrong word. It makes them consistent, that's all.

> > Exactly *sigh*. More badly, they don't cooperate to get something quickly.
>
> Yes, that was my point too... I think we would already have seen good results
> if they had cooperated...

I think the reasons differ here:
1) some don't want others to spoil their thesis
2) some tried a project way too big (if I need to figure out how a
filesystem works, a jfs project is way too big)
3) some ran out of time (happens to me all the time :-)) )
4) some think they are so brilliant that they don't need help


I can understand 1-3; I consider people doing 4 to be plain stupid
(ok, then it loops back to 2 :-)) )

Some universities are afraid of OpenSource stuff; just look at all the
failed OS attempts there, the 'yield' in the OpenSource community ...

Stefan

>
> Bye!
>
> Thanks!
>
> //Tommy
>
>

--
Stefan Grefen Am Hummertal 4, 55283 Nierstein, Germany
grefen [at] carpe +49 6133 927484 Fax:+49 6133 927486
Idiot, n.:
A member of a large and powerful tribe whose influence in human
affairs has always been dominant and controlling.
-- Ambrose Bierce, "The Devil's Dictionary"


tommy at artech

May 24, 1998, 3:43 PM

Post #7 of 18 (1160 views)
MOSIX [In reply to]

Hi Stefan!

Stefan Grefen wrote:

> The size is zero, if nothing from the buffercache was committed to disk.
> So you don't loose anything that is already on the disk (unless you rewrite
> the file). In other words, your filesize is equal to the highest block
> that was commited to disk,

OK... good enough, I think....

> You don't solve anything by keeping a backup copy.
> Suppose a program has 2 files open, a database index and the data.
> Only one write makes it to disk before the crash ...

Yes, you are right... and you can't start making backup copies to cover 2
files, 3 files, etc. It would never end. So there isn't any good idea for
getting instant and 100% repair after a crash? I think that for it to be
possible at all, everything must be built for it, from the hardware up to the
application level.

Another problem is the disk cache. As far as I understand, Linux even caches
writes in the filesystem cache (you see this if you write to a floppy and
forget to sync), which is also a problem. If you have a heavy machine with a
lot of memory, upwards of 512 MB or a couple of GB, you probably have a lot of
disk information in the cache. All that information is lost in a crash. You can
run sync from e.g. crontab every few minutes to fix it, or more correctly, to
make you lose less information.

One goal of instant repair after a crash should/could be that you only lose the
information/transactions that were added/changed in the seconds just before the
crash. But I don't know whether that is possible even in theory.

Could you in theory distribute everything over a couple of computers? I
mean processors (you could do this with PVM/MOSIX), memory, hard disks, etc. I
know about the big problem of getting the computers to talk fast enough, the
problem of finding a way to do that with low latency, and so on...

How do the big ones like AS/400, VAX and other special HA and cluster computers
solve this problem?

I think the Intel architecture isn't the best for HA because it lacks some
basic HA features even in hardware, like shared high-speed, low-latency buses,
etc.


> NTFS repair would love to be like fsck. The purpose is the same.
> It doesn't magicly repair the data, only the inode-structures.
> I think repair is the wrong word. It makes them consistent thats all.

OK, so the only thing a jfs/lfs does is repair the inode structures so you can
read the filesystem without any errors. But it doesn't repair the files, so you
can still get broken files? But could you design a filesystem (if it needs
another computer over the network to work, that is fine) so that you don't get
broken files? If you lose the information that was written just before/during
the crash, that is OK, I think. But you should have an almost 100% (I know you
will never be 100% sure) chance that everything will be up and running without
risk of broken files.
The way I think of (but this sounds too easy, so I think I have forgotten
something important) is to mirror over the network, with the other end only
writing the file once the transfer from the "master" host has been completed.
That doesn't protect you if the two boxes die at the same time. But then no
cluster of two computers does, and an n-computer cluster doesn't protect
against n computers failing at the same time :-)


> I can understand 1-3 , I consider people doing 4 to be plain stupid
> (ok, than it loops back to 2 :-)) )

I agree with you....

> Some universities are afraid of OpenSource stuff, just look at all the
> failed OS attempts there, the 'yield' in the OpenSource community ...

Yes, you are right... Too many universities try to build their own solutions
for things. Not just OSes; programming languages are also a good example, I
think. Almost every bigger university has its own language that is used only
there.

But OK, it is good that there is some variety and you can choose between
different languages and things like that. Everything has a good and a bad side,
so...


Sorry for my basic, lame and stupid questions...

//Tommy


grefen at carpe

May 24, 1998, 4:39 PM

Post #8 of 18 (1160 views)
MOSIX [In reply to]

In message <3568A29E.A2D04149 [at] artech> you wrote:
> Hi Stefan!
>

>
> Yes, you are right... and you can't start making backup copies to cover 2
> files, 3 files, etc. It would never end. So there isn't any good idea for
> getting instant and 100% repair after a crash? I think that for it to be
> possible at all, everything must be built for it, from the hardware up to the
> application level.

It is impossible, because the relation is only known to the application.
For every algorithm you can think of, you can produce an application scenario
where it either fails or is unusable.

>
> Another problem is the disk cache. As far as I understand, Linux even caches
> writes in the filesystem cache (you see this if you write to a floppy and
> forget to sync), which is also a problem. If you have a heavy machine with a
> lot of memory, upwards of 512 MB or a couple of GB, you probably have a lot of
> disk information in the cache. All that information is lost in a crash. You
> can run sync from e.g. crontab every few minutes to fix it, or more correctly,
> to make you lose less information.

That's normal UNIX (besides the aggressive reordering, which is the wrong
default IMHO); on other machines there is an update daemon to do the same
thing, and your application can sync too if it's important.

>
> One goal of instant repair after a crash should/could be that you only lose
> the information/transactions that were added/changed in the seconds just
> before the crash. But I don't know whether that is possible even in theory.

Unless the application goofs badly, that's the case. Only dirty (i.e. changed)
buffer cache data is lost.

>
> Could you in theory distribute everything over a couple of computers? I
> mean processors (you could do this with PVM/MOSIX), memory, hard disks, etc.
> I know about the big problem of getting the computers to talk fast enough,
> the problem of finding a way to do that with low latency, and so on...
>
> How do the big ones like AS/400, VAX and other special HA and cluster
> computers solve this problem?
>
> I think the Intel architecture isn't the best for HA because it lacks some
> basic HA features even in hardware, like shared high-speed, low-latency
> buses, etc.

The 'leader' in HA stuff, Tandem, basically uses a shared-nothing architecture.
The biggest problem is a proper design of the API and the application.

It also means you're accepting a performance penalty compared to the raw
CPU power. Then you can build HA applications on top of nearly any OS (NT is
a good test case, it crashes so often :-)) )
A good API and OS architecture help, but getting this done without the
application 'helping' doesn't work.

If you need to build an HA application now on Linux, build HA into the
application; an HA filesystem is some time away.

>
>
> > NTFS repair would love to be like fsck. The purpose is the same.
> > It doesn't magicly repair the data, only the inode-structures.
> > I think repair is the wrong word. It makes them consistent thats all.
>
> OK, so the only thing a jfs/lfs does is repair the inode structures so you
> can read the filesystem without any errors. But it doesn't repair the files,
> so you can still get broken files?

Yes, you can still get damaged data.

> But could you design a filesystem (if it needs another
> computer over the network to work, that is fine) so that you don't get broken
> files?

Just sync after every important step in your application, and record the fact
only after the sync has completed. You can use sync NFS to a machine that
exports synchronously (the defaults for that vary). This is slow as hell,
but has the required semantics.
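
The "sync, then record the fact" ordering described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the ".data"/".done" file naming and the marker contents are all hypothetical, and a real application would also have to think about syncing the containing directory and about steps that touch several files.

```python
import os


def write_durably(path, data):
    """Write data and return only after fsync reports it has reached the disk."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def perform_step(workdir, step_name, payload):
    """Do one 'important step', then record that it completed."""
    data_path = os.path.join(workdir, step_name + ".data")
    done_path = os.path.join(workdir, step_name + ".done")
    write_durably(data_path, payload)    # 1. the work itself, synced
    write_durably(done_path, b"done\n")  # 2. the record, only after the sync


def step_completed(workdir, step_name):
    """After a restart, trust a step only if its record exists and is intact."""
    done_path = os.path.join(workdir, step_name + ".done")
    try:
        with open(done_path, "rb") as f:
            return f.read() == b"done\n"
    except FileNotFoundError:
        return False
```

If the box dies between the two writes, the data file may exist without its record, and the application simply redoes the step after restart.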

> If you lose the information that was written just before/during the crash,
> that is OK, I think. But you should have an almost 100% (I know you will never
> be 100% sure) chance that everything will be up and running without risk of
> broken files.
> The way I think of (but this sounds too easy, so I think I have forgotten
> something important) is to mirror over the network, with the other end only
> writing the file once the transfer from the "master" host has been
> completed. But that doesn't p

OK, how does the OS detect that the transfer is completed? Remember, a
state change in an application may involve multiple files and processes.
In other words, the OS can't detect such a thing, and trying could make the
damage worse.

> Yes, you are right... Too many universities try to build their own solutions
> for things. Not just OSes; programming languages are also a good example, I
> think. Almost every bigger university has its own language that is used only
> there.
>
> But OK, it is good that there is some variety and you can choose between
> different languages and things like that. Everything has a good and a bad
> side, so...

There must be variety and universities should try stuff; on the other hand,
the various OS attempts often enough try to reinvent the wheel, or try
to do an OS from scratch with 4 people on a 2-year grant.

Neither approach works, and there are rarely success stories. So they try
to keep us out.

Stefan

>
> Sorry for my basic, lame and stupid questions...
>
> //Tommy
>
>
>



tommy at artech

May 25, 1998, 1:45 PM

Post #9 of 18 (1163 views)
MOSIX [In reply to]

Hi Stefan!


Stefan Grefen wrote:

> It is impossible, because the relation is only known to the application.
> For every algorithm you can think of, you can produce an application scenario
> where it either fails or is unusable.

Okey...

> Thats normal UNIX (besides the aggressive reordering, which is wrong default
> IMHO), on other machines there is an update daemon to do the same thing,
> your application can sync too, if its important.

Yes, I know that... I don't know how many applications actually sync, e.g. SQL
servers after they have updated the database and so on. But syncing too often
is also a huge performance drop. Different SQL servers take different
approaches to this, I think. And if you have the source code you can do your
own hack to make them more resistant against failures and crashes...

> Unless the application goofs badly, thats the case. Only dirty (eg. changed)
> buffer cache data is lost.

OK... just in theory (once again): suppose you have a jfs/lfs (Linux
doesn't have one yet, and probably won't get one working OK this year :-(
*sob*)...

I am not 100% sure I have understood the advantages and disadvantages of a
jfs/lfs if the box crashes during a file write, and what happens to the file(s)
being written at that point in time.

As far as I have understood, a jfs/lfs only protects the filesystem from
errors, not the files, if your box crashes. So if you have a couple of files
that were being written at the moment the computer dies, those files are more
or less lost? Or they will be zero-length files or something like that (it may
be implementation dependent)? Right?
Then suppose you have some huge application writing big, very BIG files (in
theory an SQL server can do this). If this SQL server is under heavy load and
saves all its data in a few huge files, then you probably lose all the data if
your box crashes, right? And as far as I have understood you (and also from my
own experience at work), there is no way to prevent this at a general level,
only at the application level.
You said before that you can record a change just after a sync has completed.
But is that enough? If your computer dies during one of the syncs, you
probably lose the information in a lot of your huge dirty files? Not good... Or
am I thinking wrong here? And then you have the problem of recording the
change: if you record it to disk, the box may hang just while you are recording
the change to one of your big fat files, and you again lose information and
perhaps the whole file (if your jfs/lfs has the disadvantage that every file
being changed when the box dies becomes a zero-length file, or a file without
any meaningful information, e.g. filled with nulls/zeros). Then you haven't
gained much, I think...

But it must be solvable, and has been solved, in HA applications and systems
like banking systems and so on...

As far as I have understood jfs/lfs from some minor information sources on the
net, they don't change inodes and data blocks directly but just put them in the
"log" part of the disk. At some interval they update the inodes to point to the
new data locations and free the old blocks. But does this prevent data loss?
I don't think so, because if your computer dies right during the inode update,
what happens then?
Or, I think, the jfs/lfs goes through the information in the log since the
last checkpoint at boot and does the inode update again? But can the jfs/lfs
decide whether it should do the inode update, or does it just go back to the
old information from before the checkpoint? The information in the log can be
trashed, because the only information that was written during / just before the
box died was written to the log.
So does a jfs/lfs roll back and point the inode at the old blocks?
Updating an inode doesn't take a lot of writes to the disk, so I think it can be
done by writing just one block to the disk. But can you ensure that that block
is either written or not written, and not just half written? I don't think
so... And as far as I have understood, you can't rule out scenarios where you
lose a lot of your information even with a jfs/lfs, right?
You pointed out that you can record a change just after you have completed a
sync. But what happens if the box dies just when you are about to record the
change?
I think the only way to get a near-100% chance of not losing a lot of data is
to record the change over the network. But even there you can get problems if
the recording process hangs because the computer dies. But can't you construct
some protocol that needs to be terminated in some way, so that you are about
100% sure that you don't get into trouble (you can't be 100% sure: there is a
small chance that when the box dies, the last thing the network card sends out
is something that looks like the terminating control string)?
Or is there any easier way?

Sorry for my many questions, but I would like to understand exactly how a
jfs/lfs works and its disadvantages. If you lose all the data in the files that
were being changed, and the inode update from the log, when the box dies, that
isn't good enough, I think.
If you have a banking system you can't have a lot of transactions lost in
space. How do they fix it?


> The 'leader' in HA stuff, Tandem, uses basicly a shared nothing architekture.
> That biggest problem is a propper design of the API and the application.

After thinking about it, I believe that is the best way. But it would be better
if the OS level had some support for cluster lfs/jfs, RAID and things like
that, because it makes things much easier. Then you don't have to take a huge
performance loss from having to send every change over the network and get
confirmation messages back and forth and so on ...

> Also it means you're accepting a performance penalty, compared to the raw
> CPU Power.

I don't think performance loss is such a big deal today if you can run things
in a cluster and make the machines work together doing load balancing (best if
the application and client side also support it). No big deal; computers are
cheap today.

> Than you can build HA applications on top of nearly any OS (NT is
> a good testcase, it crashes so often :-)) )

:-) *smile*


> If you need to build a HS application now on LINUX, build HA into the
> application, an HA - filesystem is some time away.

The place I work at doesn't need 100% truly HA applications right now...
This is only my own project. But I think some "HA", or just an automatic backup
computer that can take some time to come up because it has to check the disk,
is good enough for them today. But I am not satisfied with that. I would like
to have even more HA, and instant repair is one of my goals...

> Just sync after every important step in your application. And record the fact
> only after the sync completed. You can use sync-NFS to an machine that
> exports synchronous (the defaults for that vary). This is slow as hell,
> but has the requires semantics.

But does that ensure that if you start writing/syncing a file to NFS, the NFS
server doesn't write it if the transmission of the file is interrupted because
the other side dies? Can the NFS server detect that, or does sync NFS send some
special status bit in the protocol to tell it that the transmission of the file
is complete? And when the NFS server gets that message, it does the update...
(OK, your NFS server can die at the same time, but that is another
question/scenario.)


> Ok, how does the OS detect that transfer is completed?? remember a
> state change in an application may incolve multiple files and processes.
> In other words the OS can't detect such a thing, anf trying could make the
> damage worse.

Yes, you are right... But what about the sync NFS you mentioned above? Doesn't
it have the same problem? Couldn't you, in the NFS protocol or even better in
the application protocol (through a low-level API for HA), send some
end-of-file/ack once the whole file has been sent, and only when the backup box
gets that does it write the file and update the filesystem, etc.? OK, I know
that if the box dies the network card could send out an ack even if the box
didn't intend it. But the chance of that shouldn't be very big, should it? And
you can design the protocol to make that chance smaller, but then you spend
more bandwidth and lose more performance... Or am I missing something?
I am assuming that the backup box is working 100% perfectly when the master box
dies. I know you can never be 100% sure of that. And if you believe in
Murphy's law, the backup box will die or get some hardware error at the
exact moment the master dies :-) *lol*
Correct me if I have missed something in my thinking; I think I have missed a
lot of important steps.


> There must be variety and Universities should try stuff, on the other hand
> the various OS attempts often enough try to reinvent the wheel, or try to
> to do an OS from scratch with 4 people on a 2 year grant.

Yes, you are right about that... But sometimes it gets to be too much of a good
thing. It is always hard to draw a line for when the universities do too much
or too little of it.
And I think they can keep on doing their stuff... It is often good, and
sometimes you get a spin-off project, which can be really good...


//Tommy


grefen at carpe

May 25, 1998, 2:42 PM

Post #10 of 18 (1162 views)
MOSIX [In reply to]

In message <3569D85A.4E694E13 [at] artech> you wrote:
> Hi Stefan!
>
>
> Stefan Grefen wrote:
>

> > That's normal UNIX (besides the aggressive reordering, which is wrong default
> > IMHO), on other machines there is an update daemon to do the same thing,
> > your application can sync too, if it's important.
>
> Yes, I know that... I don't know how many applications actually sync, e.g.
> SQL servers after they have updated the database and so on. But syncing too
> often is also a huge performance drop. Different SQL servers take different
> approaches to this, I think. And if you have the source code you can do your
> own hack to make them more resistant against failures and crashes...

Most databases use raw devices and not the filesystem, so they don't have
that problem. They do have their own transaction logs, which they use
to complete or back out a partial transaction after a crash.
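
As a rough illustration of what "complete or back out a partial transaction" can mean, here is a toy log replay in Python. It is a sketch under assumptions, not how any particular database does it: the JSON record format, the field names ("op", "tx", "key", "value") and the recover function are all invented for this example. Committed transactions are reapplied, and anything without a commit record is dropped.

```python
import json


def recover(log_lines):
    """Rebuild state from a log, applying only committed transactions."""
    state = {}
    pending = {}  # transaction id -> list of (key, value) not yet committed
    for line in log_lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            break  # torn record at the tail of the log: stop reading here
        kind = rec["op"]
        txid = rec["tx"]
        if kind == "begin":
            pending[txid] = []
        elif kind == "set":
            pending[txid].append((rec["key"], rec["value"]))
        elif kind == "commit":
            for key, value in pending.pop(txid):
                state[key] = value
    # Whatever is still in `pending` never committed, so it is backed out
    # simply by not being applied.
    return state
```

The point of the design is that the commit record is the single place where a transaction becomes visible, so a crash anywhere before it leaves the old state intact.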

>
> > Unless the application goofs badly, thats the case. Only dirty (eg. changed)
> > buffer cache data is lost.
>
> OK... just in theory (once again): suppose you have a jfs/lfs (Linux
> doesn't have one yet, and probably won't get one working OK this year :-(
> *sob*)...
>
> I am not 100% sure I have understood the advantages and disadvantages of a
> jfs/lfs if the box crashes during a file write, and what happens to the
> file(s) being written at that point in time.
>
> As far as I have understood, a jfs/lfs only protects the filesystem from
> errors, not the files, if your box crashes. So if you have a couple of files
> that were being written at the moment the computer dies, those files are more
> or less lost? Or they will be zero-length files or something like that (it may
> be implementation dependent)

The advantage of an lfs/jfs after a crash is that you normally don't have to
wait for fsck to complete, because their repair mechanism is much faster.
If it doesn't work and you have to do a real fsck on them, it usually takes
much longer. The Veritas filesystem (in HP-UX and UnixWare ...) manages to
get there from time to time :-((

> ....
> Right?
> Then suppose you have some huge application writing big, very BIG files (in
> theory an SQL server can do this). If this SQL server is under heavy load and
> saves all its data in a few huge files, then you probably lose all the data if
> your box crashes, right? And as far as I have understood you (and also from my
> own experience at work), there is no way to prevent this at a general level,
> only at the application level.
> You said before that you can record a change just after a sync has completed.
> But is that enough? If your computer dies during one of the syncs, you
> probably lose the information in a lot of your huge dirty files? Not good...
> Or am I thinking wrong here? And then you have the problem of recording the
> change: if you record it to disk, the box may hang just while you are
> recording the change to one of your big fat files, and you again lose
> information and perhaps the whole file (if your jfs/lfs has the disadvantage
> that every file being changed when the box dies becomes a zero-length file, or
> a file without any meaningful information, e.g. filled with nulls/zeros). Then
> you haven't gained much, I think...
>
> But it must be solvable, and has been solved, in HA applications and systems
> like banking systems and so on...

If you change parts of a big file, then you don't lose it in a crash; just the
changed area is in an undefined state. There are various options to deal
with this on an application level.
Basic rule: you must be able to roll back an incomplete operation.

>
> As far as I understand jfs/lfs from some minor information sources on the
> net, they don't change inodes and data blocks directly but just put them in
> the "log" part of the disk.... And at some interval they update the inode
> to point to the new data locations and free the old blocks... But does this
> prevent data loss??? I don't think so, because if your computer dies just
> during the inode update, what happens then???

You have the old version ... It's like crashing before the update.
Inode updates are synchronous (on all systems with a lfs/jfs).

> Or I think the jfs/lfs, on boot-up, goes through the information in the log
> since the last checkpoint and does the inode update again?? But can the
> jfs/lfs decide whether it should do the inode update, or does it just go
> back to the old information from before the checkpoint??? Because the
> information in the log can be trashed, since the only information written
> during / just before the box died is what was written to the log??? So does
> the jfs/lfs filesystem roll back and point the inode at the old blocks???
> Updating an inode doesn't take a lot of writes to the disk... So I think it
> can be done by writing just one block to the disk.... But can you ensure
> that the block is either written or not written, and not just half
> written???? I don't think so... And as far as I understand, you can't
> prevent a scenario where you lose a lot of your information even with
> jfs/lfs, right??

A disk write is atomic. You can't write half a disk block. With old disks
you produced a bad sector; a modern disk doesn't write the block if the
transfer from the host was incomplete.


> You point out that you can record a change just after you have completed a
> sync??? But what happens if the box dies just when you should record the
> change??? I think the only way to get near a 100% chance of not losing a
> lot of data is to record the change over the network?? But even there you
> can get problems if the recording process hangs because the computer
> dies.... But can't you construct some protocol which needs to be terminated
> in some way, so you are ca. 100% sure that you don't get into trouble (100%
> sure you can't be; there is a small chance that when the box dies, the last
> thing the network card sends out is something that looks like the
> terminating control string)...
> Or is there any easy way???

The network doesn't help. The mechanism of the update is irrelevant;
the order and the information updated are what matter.

>
> Sorry for my many questions, but I'd like to understand exactly how a
> jfs/lfs works and its disadvantages... If you lose all the data in the files
> that were being changed, and the inode updates from the log, just when the
> box dies, it isn't good enough, I think.
> If you have a banking system then you can't have a lot of transactions lost
> in space.... How do they fix it???

There are multiple ways to do that, depending on system requirements.
One thing:

copy old data to safe place
sync
mark copy as valid
sync
rewrite new data
sync
mark copy as invalid
sync

Recovery is easy: if the copy is marked valid, copy it back to the old place.

The performance sucks if this has a lot of updates ...

Again, most databases don't use the filesystem.

>
>
> > The 'leader' in HA stuff, Tandem, uses basically a shared-nothing
> > architecture.
> > The biggest problem is a proper design of the API and the application.
>
> After that, I think that is the best way... But it would be better if the
> OS level had some support for cluster lfs/jfs, RAID and stuff like that,
> because it makes things much easier... And then you don't have to take a
> huge performance loss from sending every change over the network and
> getting confirmation messages back again and again and so on ...
>
> > Also it means you're accepting a performance penalty, compared to the raw
> > CPU Power.
>
> Performance loss isn't such a big deal today, I think, if you can run the
> stuff in a cluster and make the machines work together doing load balancing
> (best if the application and client side also support it). No big deal;
> computers are cheap today..

Clusters don't scale if the application is not designed for it.
Clusters don't provide HA if the application is not designed for it.
Those performance penalties can come as latency, and then they are a big deal.

As someone noted a long time ago (I think it was Seymour Cray):
You can buy bandwidth but latency is here to stay ....

> > Just sync after every important step in your application. And record the
> > fact only after the sync completed. You can use sync-NFS to a machine that
> > exports synchronously (the defaults for that vary). This is slow as hell,
> > but has the required semantics.
>
> But does it ensure that if you start to write/sync a file to NFS, it isn't
> written at the NFS server if the transmission of the file is interrupted
> because the other side dies??? Can the NFS server detect it, or does
> sync-NFS send some special status bit in the protocol to tell the server
> that the transmission of the file is complete??? And when the NFS server
> gets that message, it does the update... (OK, your NFS server can die at
> the same time, but that is another question/scenario)....

No, the sync-NFS just ensures that after a write completes it is on disk
(no buffer cache).
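
For a local file, a rough analogue of this "the write returns only once the
data is on disk" behaviour is opening the file with O_SYNC. The following is a
minimal sketch under assumptions: it needs a POSIX system where Python exposes
os.O_SYNC, and the function name and the separate "done" file used to record
the fact are invented for illustration.

```python
import os

SYNC_FLAGS = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC

def write_step_synchronously(data_path, done_path, payload):
    """Write payload synchronously, then record that the step completed."""
    # With O_SYNC each write() returns only after the OS reports that the
    # data has reached stable storage (it does not linger in the buffer cache).
    fd = os.open(data_path, SYNC_FLAGS, 0o600)
    try:
        view = memoryview(payload)
        while len(view) > 0:
            written = os.write(fd, view)   # may write less than requested
            view = view[written:]
    finally:
        os.close(fd)
    # Record the fact only after the data write has completed.
    fd = os.open(done_path, SYNC_FLAGS, 0o600)
    try:
        os.write(fd, b"done\n")
    finally:
        os.close(fd)
```

As with sync-NFS, this says nothing about a write that was interrupted halfway;
it only guarantees that a write which returned is no longer sitting in memory.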

> I am assuming that the backup box is working 100% perfectly when the master
> box dies.... I know that you can never be 100% sure of that.... And if you
> think of Murphy's law, the backup box will die or get some hardware error
> at the exact moment the master dies :-) *lol*
> Correct me if I have messed something up in my thinking; I think that I have
> missed a lot of important steps.

You have. The key to HA is to be fault-tolerant. That means your data integrity
is not threatened by a fault in a device (CPU, network, disk ..., power).
You may end up with a state a few transactions before the actual crash, but
you can always replay them to get back to that step.
Only the application can do that.
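
One common way for an application to make such a replay possible is an
append-only log that is synced before a transaction is acknowledged. The
following is only a sketch under assumptions: the record format (one JSON
object per line), the key/value "state" and the function names are invented
for illustration and do not come from any particular product.

```python
import json
import os

def log_append(log_path, record):
    """Append one record and force it to disk before reporting success."""
    line = json.dumps(record) + "\n"
    with open(log_path, "ab") as f:
        f.write(line.encode("utf-8"))
        f.flush()
        os.fsync(f.fileno())

def replay(log_path):
    """Rebuild the state from the log, ignoring a torn final record."""
    state = {}
    if not os.path.exists(log_path):
        return state
    with open(log_path, "rb") as f:
        for raw in f:
            if not raw.endswith(b"\n"):
                break          # incomplete last line: the crash hit mid-write
            try:
                rec = json.loads(raw)
            except ValueError:
                break          # unreadable record: stop at the last good one
            state[rec["key"]] = rec["value"]
    return state
```

A transaction whose log record was not completely written is simply treated as
if it never happened, which matches "a state a few transactions before the
actual crash".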

>
> //Tommy
>

Stefan

--
Stefan Grefen Am Hummertal 4, 55283 Nierstein, Germany
grefen [at] carpe +49 6133 927484 Fax:+49 6133 927486
Idiot, n.:
A member of a large and powerful tribe whose influence in human
affairs has always been dominant and controlling.
-- Ambrose Bierce, "The Devil's Dictionary"


h.milz at seneca

May 25, 1998, 4:15 PM

Post #11 of 18 (1165 views)
MOSIX [In reply to]

Stefan Grefen (grefen [at] carpe) wrote:
> CPU Power. Then you can build HA applications on top of nearly any OS (NT is
> a good test case, it crashes so often :-)) )

That could be one reason why so many HA/failover solutions exist for NT
(see IX Mag 10/97 pp. 86).

> If you need to build an HA application now on Linux, build HA into the
> application; an HA filesystem is some time away.

Yes you could modify server _and_ client to achieve HA-like behaviour. The
client needs mods too (IP / MAC address failover doesn't exist if you
just let the app be HA).


grefen at carpe

May 25, 1998, 4:47 PM

Post #12 of 18 (1162 views)
MOSIX [In reply to]

In message <m0ye4b0-0000dIC [at] senec> Harald Milz wrote:
> Stefan Grefen (grefen [at] carpe) wrote:

>
> > If you need to build an HA application now on Linux, build HA into the
> > application; an HA filesystem is some time away.
>
> Yes you could modify server _and_ client to achieve HA-like behaviour. The
> client needs mods too (IP / MAC address failover doesn't exist if you
> just let the app be HA).

I think IP / MAC takeover are just goodies. If the application is not
designed for HA, the failover will do more harm than good.

Stefan



h.milz at seneca

May 26, 1998, 5:01 AM

Post #13 of 18 (1162 views)
MOSIX [In reply to]

Stefan Grefen (grefen [at] carpe) wrote:
> Most Databases use raw-devices and not the filesystem, so they don't have
> that problem. They do have their own transaction logs, which they use
> to complete or backout a partial transaction after a crash.

If only we had true raw devices in Linux. Fact is we don't.


h.milz at seneca

May 26, 1998, 12:03 PM

Post #14 of 18 (1166 views)
MOSIX [In reply to]

Stefan Grefen (grefen [at] carpe) wrote:
> In message <m0ye4b0-0000dIC [at] senec> Harald Milz wrote:
> > Stefan Grefen (grefen [at] carpe) wrote:
>
> >
> > > If you need to build an HA application now on Linux, build HA into the
> > > application; an HA filesystem is some time away.
> >
> > Yes you could modify server _and_ client to achieve HA-like behaviour. The
> > client needs mods too (IP / MAC address failover doesn't exist if you
> > just let the app be HA).
>
> I think IP / MAC takeover are just goodies. If the application is not
> designed for HA, the failover will do more harm than good.

HACMP and the customer satisfaction achieved with it prove the contrary.
I've had too many talks with customers as a tech support and tech marketing
specialist to believe what you say :-)


grefen at carpe

May 26, 1998, 1:41 PM

Post #15 of 18 (1167 views)
MOSIX [In reply to]

In message <m0yeN8S-0000EPC [at] senec> Harald Milz wrote:
> >
> > I think IP / MAC takeover are just goodies. If the application is not
> > designed for HA, the failover will do more harm than good.
>
> HACMP and the customer satisfaction achieved with it prove the contrary.
> I've had too many talks with customers as a tech support and tech marketing
> specialist to believe what you say :-)

This proves nothing from a technical point of view. There are rumors about
customer satisfaction with NT :-))))

Stefan



sopwith at cuc

May 26, 1998, 7:22 PM

Post #16 of 18 (1166 views)
MOSIX [In reply to]

On Tue, 26 May 1998, Harald Milz wrote:

> Stefan Grefen (grefen [at] carpe) wrote:
> > Most Databases use raw-devices and not the filesystem, so they don't have
> > that problem. They do have their own transaction logs, which they use
> > to complete or backout a partial transaction after a crash.
>
> If only we had true raw devices in Linux. Fact is we don't.

You don't need them. Use fsync().

-- Elliot http://www.redhat.com/
Chicken Little was right.


h.milz at seneca

May 27, 1998, 3:09 PM

Post #17 of 18 (1163 views)
MOSIX [In reply to]

Elliot Lee (sopwith [at] cuc) wrote:
> > If only we had true raw devices in Linux. Fact is we don't.
>
> You don't need them. Use fsync().

After every write? Do you think you can convince Oracle etc. developers to
just do that? Or do you propose a special library loaded with LD_PRELOAD?
;^)

Performance?


h.milz at seneca

May 27, 1998, 3:10 PM

Post #18 of 18 (1165 views)
MOSIX [In reply to]

Stefan Grefen (grefen [at] carpe) wrote:
>
> This proves nothing from a technical point of view. There are rumors about
> customer satisfaction with NT :-))))

Rumours, as you say. The other thing I mentioned I know for sure. :-)
Our big customers handle this as follows: the person mentioning NT won't get
any coffee during the next meeting.
