
grefen at carpe
May 25, 1998, 2:42 PM
Post #10 of 18
In message <3569D85A.4E694E13 [at] artech> you wrote:
> Hi Stefan!
>
> Stefan Grefen wrote:
>
> > That's normal UNIX (besides the aggressive reordering, which is the wrong
> > default IMHO), on other machines there is an update daemon to do the same
> > thing, your application can sync too, if it's important.
>
> Yes I know that.... I don't know how many application that does sync... Ex
> sql servers after they have update database and so on.... But to do sync too
> often is also a huge performance drop... Different sql server has different
> approach to this I think.... And if you have the source code you can do your
> own hack to make them more resistant against failure and crashes....

Most databases use raw devices and not the filesystem, so they don't have
that problem. They do have their own transaction logs, which they use to
complete or back out a partial transaction after a crash.

> > Unless the application goofs badly, that's the case. Only dirty (e.g.
> > changed) buffer cache data is lost.
>
> Ok.... just in theory (now again).... If you said you have jfs/lfs (now that
> linux don't have it yet, and probably don't will get one working ok in this
> year :-( *sob*)...
>
> I am not 100% sure if I have understand the advantages and disadvantages of
> jfs/lfs if it crash during file write... And about the file(s) which was
> writing just at that point in time.
>
> So far as I have understand jfs/lfs just prevent the filesystem from error
> but not files from error if your box crash.... So if you have a couple of
> files which was writing just the moment then the computer dies.... Those
> files are more or less lost??? Or will be zero length files or like that
> (can be implementation dependent)

The advantage of lfs/jfs after a crash is that you normally don't have to
wait for fsck to complete, because their repair mechanism is much faster.
If it doesn't work and you have to do a real fsck on them, it usually takes
much longer.
The Veritas filesystem (in HP-UX and UnixWare ...) manages to get there from
time to time :-((

> ....
> Right???
> Then you have some huge application writing big-very-BIG files (in theory a
> sql server can do this)... If this sql-server is huge load and on save all
> data in few huge files then you probably lost all data if you box crash...
> right??? And as far as I have understand you (and also my own think I do to
> day at work) where is no way to prevent it at a general level.... just at
> application level....
> You said before that you can record a change just after a sync has been
> complete... But is that enough??? If you computer dies during one of the
> sync process you then probably lost the information in a lot of your huge
> dirty files??? Not good.... Or do I think wrong here??? And then you have
> the problem with recording the change if you recording it to the disk the
> box perhaps hang just during you record the change of one of you big fat
> file and you again lost information and perhaps the whole file (if you
> jfs/lfs has the disadvantage that every file just in change then the box
> dies will be zero length file or file without any meaningful information
> like filled with null/zeros).... Then you haven't won so much i think...
>
> But it must and have been solved in HA application and system like banking
> system and so on...

If you change parts of a big file, then you don't lose it in a crash; only
the changed area is in an undefined state. There are various options to deal
with this at the application level. Basic rule: you must be able to roll
back an incomplete operation.

> As far as I have understand jfs/lfs from some minor information source at
> the net they don't change inode, data block directly but just put them in
> the "log" part of the disks.... And at some interval they updating the inode
> to point to the new data locations and free old blocks... But does this
> prevent from data loss???
> I don't think so because if you computer dies just during the inode update
> what happen then???

You have the old version ... It's like crashing before the update. Inode
updates are synchronous (on all systems with a lfs/jfs).

> Or I think the jfs/lfs on boot up go through the information in the log
> since the last check point and do the inode check up again?? But can the
> jfs/lfs decide if it should do inode update or does it just goes back to the
> old information before the checkpoint??? Because the information in the log
> can be trashed because the only information that was write during / just
> before the box dies is write to the log??? So does jfs/lfs filesystem roll
> back and update the inode to the old blocks???
> To update a inode doesn't do a lot of write to the disks... So i think that
> can be done in write just one block to the disk.... But can you ensure that
> that block either write or doesn't write and not just write the half
> block???? I don't think so... And as far I have understand you can't prevent
> from scenario there you lost a lot of your information even with jfs/lfs,
> right??

A disk write is atomic. You can't write half a disk block. With an old disk
you produced a bad sector; a modern disk doesn't write the block if the
transfer from the host was incomplete.

> You point out that you can just record a change just after you have complete
> a sync??? But what happen if the box dies just then you should record the
> change??? I think the only way to get near 100% chance that you don't lost a
> lot of data is to record the change over network?? But even where you can
> get problem if the record process hangs because of the computer dies....
> But can't you construct some protocol which need to be terminated in some
> way so you are ca 100% sure that you don't get trouble (100% sure you can't
> be it is a little chance that then the box dies the last thing the network
> card send out is just something that is like the terminated control
> string)...
> Or is there any easy way???

The network doesn't help. The mechanism of the update is irrelevant; the
order of the updates and the information updated are what matter.

> Sorry for my many question but like to understand how exactly a jfs/lfs work
> and its disadvantage... If you lost all data in the files that was changed
> just and inode update from the log just during the box dies it isn't good
> enough i think.
> If you have a banking system then you can't get a lot of transaction lost in
> space.... How do they fix it???

There are multiple ways to do that, depending on system requirements.
One of them:

  copy old data to a safe place
  sync
  mark copy as valid
  sync
  rewrite new data
  sync
  mark copy as invalid
  sync

Recovery is easy: if the copy is marked valid, copy it back to the old
place. The performance sucks if there are a lot of updates ...
Again, most databases don't use the filesystem.

> > The 'leader' in HA stuff, Tandem, uses basically a shared-nothing
> > architecture.
> > The biggest problem is a proper design of the API and the application.
>
> After that I thinking that is the best way i think... But it should be
> better if the os level has some support for cluster lfs/jfs, raid and stuff
> like that because it make stuff much easier... And then you don't have to
> get huge performance lost if you should send every change over network and
> get confirm messages back and back again and so on ...
>
> > Also it means you're accepting a performance penalty, compared to the raw
> > CPU Power.
>
> Performance lost i don't think is so big deal today if you can run the stuff
> in cluster and make them work together doing load-balance (also best if the
> application and client side also have support for it) no big deal computer
> is cheap today..

Clusters don't scale if the application is not designed for it.
Clusters don't provide HA if the application is not designed for it.
Those performance penalties can show up as latency, and then they are a big
deal. As someone noted a long time ago (I think it was Seymour Cray):
you can buy bandwidth, but latency is here to stay ....

> > Just sync after every important step in your application. And record the
> > fact only after the sync completed. You can use sync-NFS to a machine that
> > exports synchronous (the defaults for that vary). This is slow as hell,
> > but has the required semantics.
>
> But does it ensure that if you start write/sync a file to nfs that it don't
> write it at the nfs server if the transmission of the file interrupted/end
> because of the other side dies??? Can the NFS server detect it or does
> sync-nfs send some special status bit in the protocol to tell them the
> transmission of the file is complete from the server??? And then the nfs get
> that message it update it...(ok then your nfs server can dies at the same
> time but it is other questions/scenario)....

No, sync-NFS just ensures that after a write completes the data is on disk
(no buffer cache).

> I am assume that the backup box is working 100% perfect then the master box
> dies.... I know that you never can be 100% sure of that.... And if you
> should think of Murphy's law the backup box will dies or get some error in
> some hardware at the exact point then the master dies :-) *lol*
> Correct me if i have miss something in my think, I think that i have miss a
> lot of important step.

You have. The key to HA is to be fault-tolerant.
That means your data integrity is not threatened by a fault in a device
(CPU, network, disk, ..., power). You may end up with a state a few
transactions before the actual crash, but you can always replay them to get
back to that step. Only the application can do that.

>
> //Tommy
>

Stefan
--
Stefan Grefen
Am Hummertal 4, 55283 Nierstein, Germany
grefen [at] carpe
+49 6133 927484 Fax:+49 6133 927486

Idiot, n.: A member of a large and powerful tribe whose influence in human
affairs has always been dominant and controlling.
    -- Ambrose Bierce, "The Devil's Dictionary"