
Mailing List Archive: Python: Dev

Ext4 data loss

 

 



lists at cheimes

Mar 10, 2009, 1:11 PM

Post #1 of 82 (4677 views)
Permalink
Ext4 data loss

Multiple blogs and news sites are swamped with a discussion about ext4
and KDE 4.0. Theodore Ts'o - the developer of ext4 - explains the issue
at
https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54.


Python's file type doesn't use fsync() and may be the victim of the very
same issue, too. Should we do anything about it?

Christian

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


guido at python

Mar 10, 2009, 1:23 PM

Post #2 of 82 (4617 views)
Permalink
Re: Ext4 data loss [In reply to]

On Tue, Mar 10, 2009 at 1:11 PM, Christian Heimes <lists [at] cheimes> wrote:
> Multiple blogs and news sites are swamped with a discussion about ext4
> and KDE 4.0. Theodore Ts'o - the developer of ext4 - explains the issue
> at
> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54.
>
>
> Python's file type doesn't use fsync() and may be the victim of the very
> same issue, too. Should we do anything about it?

If I understand the post properly, it's up to the app to call fsync(),
and it's only necessary when you're doing one of the rename dances, or
updating a file in place. Basically, as he explains, fsync() is a very
heavyweight operation; I'm against calling it by default anywhere.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


nyamatongwe at gmail

Mar 10, 2009, 1:46 PM

Post #3 of 82 (4595 views)
Permalink
Re: Ext4 data loss [In reply to]

The technique advocated by Theodore Ts'o (save to temporary then
rename) discards metadata. What would be useful is a simple, generic
way in Python to copy all the appropriate metadata (ownership, ACLs,
...) to another file so the temporary-and-rename technique could be
used.

On Windows, there is a hack in the file system that tries to track
the use of temporary-and-rename and reapply ACLs and on OS X there is
a function FSPathReplaceObject but I don't know how to do this
correctly on Linux.

Neil


barry at python

Mar 10, 2009, 1:49 PM

Post #4 of 82 (4606 views)
Permalink
Re: Ext4 data loss [In reply to]


On Mar 10, 2009, at 4:23 PM, Guido van Rossum wrote:

> On Tue, Mar 10, 2009 at 1:11 PM, Christian Heimes <lists [at] cheimes>
> wrote:
>> Multiple blogs and news sites are swamped with a discussion about
>> ext4
>> and KDE 4.0. Theodore Ts'o - the developer of ext4 - explains the
>> issue
>> at
>> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54
>> .
>>
>>
>> Python's file type doesn't use fsync() and may be the victim of the very
>> same issue, too. Should we do anything about it?
>
> If I understand the post properly, it's up to the app to call fsync(),
> and it's only necessary when you're doing one of the rename dances, or
> updating a file in place. Basically, as he explains, fsync() is a very
> heavyweight operation; I'm against calling it by default anywhere.

Right. Python /applications/ should call fsync() and do the rename
dance if appropriate, and fortunately it's easy enough to implement in
Python. Mailman's queue runner has done exactly this for ages.
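For readers following along, the rename dance can be sketched roughly as below. This is a minimal illustration, not Mailman's actual code: the function name atomic_write is made up for this example, and os.replace() only exists in Python 3.3 and later (code from the era of this thread would use os.rename() on POSIX).

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes) so a crash leaves old or new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the target directory so the final
    # rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # push Python's buffer to the OS
            os.fsync(tmp.fileno())   # ask the OS to push it to the disk
        os.replace(tmp_path, path)   # rename over the old file
    except BaseException:
        # Remove the temporary file if anything went wrong before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Note that the new file gets the permissions mkstemp() chose, not those of the file it replaces, which is the metadata problem Neil raises elsewhere in the thread. A stricter version might also fsync the containing directory after the rename; that step is left out here.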

Barry



guido at python

Mar 10, 2009, 1:54 PM

Post #5 of 82 (4605 views)
Permalink
Re: Ext4 data loss [In reply to]

On Tue, Mar 10, 2009 at 1:46 PM, Neil Hodgson <nyamatongwe [at] gmail> wrote:
>   The technique advocated by Theodore Ts'o (save to temporary then
> rename) discards metadata. What would be useful is a simple, generic
> way in Python to copy all the appropriate metadata (ownership, ACLs,
> ...) to another file so the temporary-and-rename technique could be
> used.
>
>   On Windows, there is a hack in the file system that tries to track
> the use of temporary-and-rename and reapply ACLs and on OS X there is
> a function FSPathReplaceObject but I don't know how to do this
> correctly on Linux.

I don't know how to implement this for metadata beyond the traditional
stat metadata, but the API could be an extension of shutil.copystat().

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


barry at python

Mar 10, 2009, 2:30 PM

Post #6 of 82 (4607 views)
Permalink
Re: Ext4 data loss [In reply to]


On Mar 10, 2009, at 4:46 PM, Neil Hodgson wrote:

> The technique advocated by Theodore Ts'o (save to temporary then
> rename) discards metadata. What would be useful is a simple, generic
> way in Python to copy all the appropriate metadata (ownership, ACLs,
> ...) to another file so the temporary-and-rename technique could be
> used.
>
> On Windows, there is a hack in the file system that tries to track
> the use of temporary-and-rename and reapply ACLs and on OS X there is
> a function FSPathReplaceObject but I don't know how to do this
> correctly on Linux.

Of course, a careful *nix application can ensure that the file owners
and mod bits are set the way it needs them to be set. A convenience
function might be useful though.

Barry



martin at v

Mar 10, 2009, 3:03 PM

Post #7 of 82 (4594 views)
Permalink
Re: Ext4 data loss [In reply to]

> If I understand the post properly, it's up to the app to call fsync(),

Correct.

> and it's only necessary when you're doing one of the rename dances, or
> updating a file in place.

No. It's in general necessary when you want to be sure that the data is
on disk, even if the power is lost. So even if you write a file (say, a
.pyc) only once - if the lights go out, and on again, your .pyc might be
corrupted, as the file system may have chosen to flush the metadata onto
disk, but not the actual data (or only parts of it). This may
happen even if the close(2) operation was successful.

In the specific case of config files, that's unfortunate because you
then can't revert to the old state, either - because that may be gone.
Ideally, you want transactional updates - you get either the old config
or the new config after a crash. You can get that with explicit
fdatasync, or with a transactional database (which may choose to sync
only infrequently, but then will be able to rollback the old state if
the new one wasn't written completely).

But yes, I agree, it's the applications' responsibility to properly
sync. If I had to place sync calls into the standard library, they would
go into dumbdbm.

I somewhat disagree that it is the application's fault entirely, and not
the operating system's/file system's fault. Ideally, there would be an
option of specifying transaction brackets for file operations, so that
the system knows it cannot flush the unlink operation of the old file
before it has flushed the data of the new file. This would still allow
the system to schedule IO fairly freely, but also guarantee that not all
gets lost in a crash. I thought that the data=ordered ext3 mount option
was going in that direction - not sure what happened to it in ext4.

Regards,
Martin


amk at amk

Mar 10, 2009, 3:09 PM

Post #8 of 82 (4598 views)
Permalink
Re: Ext4 data loss [In reply to]

On Tue, Mar 10, 2009 at 09:11:38PM +0100, Christian Heimes wrote:
> Python's file type doesn't use fsync() and may be the victim of the very
> same issue, too. Should we do anything about it?

The mailbox module tries to be careful and always fsync() before
closing files, because mail messages are pretty important. The
various *dbm modules mostly have a .sync() method.

dumbdbm.py doesn't call fsync(), AFAICT; _commit() writes stuff and
closes the file, but doesn't call fsync().

sqlite3 doesn't have a sync() or flush() call. Does SQLite handle
this itself?

The tarfile, zipfile, and gzip/bzip2 classes don't seem to use fsync()
at all, either implicitly or by having methods for calling them.
Should they? What about cookielib.CookieJar?

--amk


solipsis at pitrou

Mar 10, 2009, 4:34 PM

Post #9 of 82 (4602 views)
Permalink
Re: Ext4 data loss [In reply to]

Neil Hodgson <nyamatongwe <at> gmail.com> writes:
>
> What would be useful is a simple, generic
> way in Python to copy all the appropriate metadata (ownership, ACLs,
> ...) to another file so the temporary-and-rename technique could be
> used.

How about shutil.copystat()?




cs at zip

Mar 10, 2009, 5:31 PM

Post #10 of 82 (4596 views)
Permalink
Re: Ext4 data loss [In reply to]

On 10Mar2009 18:09, A.M. Kuchling <amk [at] amk> wrote:
| On Tue, Mar 10, 2009 at 09:11:38PM +0100, Christian Heimes wrote:
| > Python's file type doesn't use fsync() and may be the victim of the very
| > same issue, too. Should we do anything about it?

IMHO, beyond _offering_ an fsync method, no.

| The mailbox module tries to be careful and always fsync() before
| closing files, because mail messages are pretty important.

Can it be turned off? I hadn't realised this.

| The
| various *dbm modules mostly have a .sync() method.
|
| dumbdbm.py doesn't call fsync(), AFAICT; _commit() writes stuff and
| closes the file, but doesn't call fsync().
|
| sqlite3 doesn't have a sync() or flush() call. Does SQLite handle
| this itself?

Yeah, most obnoxiously. There's a longstanding firefox bug about the
horrendous performance side effects of sqlite's zeal in this regard:

https://bugzilla.mozilla.org/show_bug.cgi?id=421482

At least there's now an (almost undocumented) preference to disable it,
which I do on a personal basis.

| The tarfile, zipfile, and gzip/bzip2 classes don't seem to use fsync()
| at all, either implicitly or by having methods for calling them.
| Should they? What about cookielib.CookieJar?

I think they should not do this implicitly. By all means let a user
issue policy.

In case you hadn't guessed, I fall into the "never fsync" group,
something of a simplification of my real position. In my opinion,
deciding to fsync is almost always a user policy decision, not an app
decision. An app talks to the OS; if the OS' filesystem has accepted
responsibility for the data (as it has after a successful fflush, for
example) then normally the app should have no further responsibility;
that is _exactly_ what the OS is responsible for.

Recovery is what backups are for, generally speaking.
All this IMHO, of course.

Of course there are some circumstances where one might fsync, as part
of one's risk mitigation policies (eg database checkpointing etc). But
whenever you do this you're basically saying you don't trust the OS
abstraction of the hardware and also imposing an inherent performance
bottleneck.

With things like ext3 (and ext4 may well be the same - I have not
checked) an fsync doesn't just sync that file data and metadata, it does
a whole-filesystem sync. Really expensive. If underlying libraries do that
quietly and without user oversight/control then this failure to trust the
OS puts an unresolvable bottleneck on various things, and as an app scales
up in I/O or operational throughput or as a library or facility becomes
"higher level" (i.e. _involving_ more and more underlying complexity or
number of basic operations) the more intrusive and unfixable such a low
level "auto-fsync" would become.

Also, how far do you want to go to assure integrity for particular
filesystems' integrity issues/behaviours? Most filesystems sync to disc
regularly (or frequently, at any rate) anyway. What's too big a window
of potential loss?

For myself, I'm against libraries that implicitly do fsyncs, especially
if the user can't issue policy about it.

Cheers,
--
Cameron Simpson <cs [at] zip> DoD#743
http://www.cskk.ezoshosting.com/cs/

If it can't be turned off, it's not a feature. - Karl Heuer


lists at cheimes

Mar 10, 2009, 6:01 PM

Post #11 of 82 (4594 views)
Permalink
Re: Ext4 data loss [In reply to]

Guido van Rossum wrote:
> If I understand the post properly, it's up to the app to call fsync(),
> and it's only necessary when you're doing one of the rename dances, or
> updating a file in place. Basically, as he explains, fsync() is a very
> heavyweight operation; I'm against calling it by default anywhere.

I agree with you, fsync() shouldn't be called by default. I didn't plan
on adding fsync() calls all over our code. However, I'd like to suggest a
file.sync() method and a synced flag for files to make the job of
application developers easier.

When a file is opened for writing and has the synced flag set, fsync()
is called immediately before the FILE *fp is closed.

Suggested syntax:

>>> f = open("somefile", "ws")
>>> f.synced
True

or:

>>> f = open("somefile", "w")
>>> f.synced
False
>>> f.synced = True
>>> f.synced
True


The sync() method of a file object flushes the internal buffer (fflush()
for Python 2's file object) and calls fsync() on the file descriptor.
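The behaviour proposed here can be approximated today without changing the file type. The sketch below is only an illustration: sync() and synced_open() are hypothetical helper names, not part of Python, and neither the "ws" mode nor a synced attribute exists on real file objects.

```python
import contextlib
import os


def sync(f):
    """Flush Python's buffer, then ask the OS to write the file to disk."""
    f.flush()
    os.fsync(f.fileno())


@contextlib.contextmanager
def synced_open(path, mode="w"):
    """Open `path` and fsync it just before closing (hypothetical helper)."""
    f = open(path, mode)
    try:
        yield f
        sync(f)      # only reached if the body raised no exception
    finally:
        f.close()
```

A caller would write "with synced_open('somefile') as f: f.write(...)" and get the flush-then-fsync sequence on a clean exit. Skipping the sync when the body raises is a design choice of this sketch, on the assumption that a half-written file is not worth forcing to disk.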

Finally the documentation should give the user a hint that close() does
not necessarily mean the data is written to disk. It's not our
responsibility to teach Python users how to deal with low level stuff.
On the other hand a short warning doesn't hurt us and may help Python
users to write better programs.

For the rest I concur with MvL and AMK.


amk at amk

Mar 10, 2009, 7:14 PM

Post #12 of 82 (4595 views)
Permalink
Re: Ext4 data loss [In reply to]

On Wed, Mar 11, 2009 at 11:31:52AM +1100, Cameron Simpson wrote:
> On 10Mar2009 18:09, A.M. Kuchling <amk [at] amk> wrote:
> | The mailbox module tries to be careful and always fsync() before
> | closing files, because mail messages are pretty important.
>
> Can it be turned off? I hadn't realised this.

No, there's no way to turn it off (well, you could delete 'fsync' from
the os module).

> | The tarfile, zipfile, and gzip/bzip2 classes don't seem to use fsync()
> | at all, either implicitly or by having methods for calling them.
> | Should they? What about cookielib.CookieJar?
>
> I think they should not do this implicitly. By all means let a user
> issue policy.

The problem is that in some cases the user can't issue policy. For
example, look at dumbdbm._commit(). It renames a file to a backup,
opens a new file object, writes to it, and closes it. A caller can't
fsync() because the file object is created, used, and closed
internally. With zipfile, you could at least access the .fp attribute
to sync it (though is the .fp documented as part of the interface?).

In other words, do we need to ensure that all the relevant library
modules expose an interface to allow requesting a sync, or getting the
file descriptor in order to sync it?
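One way around the question of whether .fp is a public attribute, at least for zipfile, is for the caller to open the underlying file itself and pass it in. The sketch below assumes that approach; write_zip_synced is a made-up name, and it relies on the fact that ZipFile leaves a caller-supplied file object open when the archive is closed.

```python
import os
import zipfile


def write_zip_synced(path, members):
    """Write `members` (name -> bytes) to a zip archive and fsync it."""
    with open(path, "wb") as raw:
        # ZipFile does not close a file object that the caller passed in,
        # so `raw` is still open after the inner block finishes.
        with zipfile.ZipFile(raw, "w") as archive:
            for name, payload in members.items():
                archive.writestr(name, payload)
        raw.flush()
        os.fsync(raw.fileno())
```

This does not help with modules such as dumbdbm that create and close their files internally, which is the harder case described above.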

--amk


solipsis at pitrou

Mar 10, 2009, 7:20 PM

Post #13 of 82 (4595 views)
Permalink
Re: Ext4 data loss [In reply to]

Christian Heimes <lists <at> cheimes.de> writes:
>
> I agree with you, fsync() shouldn't be called by default. I didn't plan
> on adding fsync() calls all over our code. However, I'd like to suggest a
> file.sync() method and a synced flag for files to make the job of
> application developers easier.

We already have os.fsync() and os.fdatasync(). Should the sync() (and
datasync()?) method be added as an object-oriented convenience?





lists at cheimes

Mar 10, 2009, 7:45 PM

Post #14 of 82 (4599 views)
Permalink
Re: Ext4 data loss [In reply to]

Antoine Pitrou wrote:
> Christian Heimes <lists <at> cheimes.de> writes:
>> I agree with you, fsync() shouldn't be called by default. I didn't plan
>> on adding fsync() calls all over our code. However, I'd like to suggest a
>> file.sync() method and a synced flag for files to make the job of
>> application developers easier.
>
> We already have os.fsync() and os.fdatasync(). Should the sync() (and
> datasync()?) method be added as an object-oriented convenience?

It's more than an object oriented convenience. fsync() takes a file
descriptor as argument. Therefore I assume fsync() only syncs the data
to disk that was written to the file descriptor. [*] In Python 2.x we
are using a FILE* based stream. In Python 3.x we have our own buffered
writer class.

In order to write all data to disk the FILE* stream must be flushed
first before fsync() is called:

    PyFileObject *f;
    if (fflush(f->f_fp) != 0) {
        /* report error */
    }
    if (fsync(fileno(f->f_fp)) != 0) {
        /* report error */
    }


Christian

[*] Is my assumption correct, anybody?


guido at python

Mar 10, 2009, 7:55 PM

Post #15 of 82 (4594 views)
Permalink
Re: Ext4 data loss [In reply to]

On Tue, Mar 10, 2009 at 7:45 PM, Christian Heimes <lists [at] cheimes> wrote:
> Antoine Pitrou wrote:
>> Christian Heimes <lists <at> cheimes.de> writes:
>>> I agree with you, fsync() shouldn't be called by default. I didn't plan
>>> on adding fsync() calls all over our code. However, I'd like to suggest a
>>> file.sync() method and a synced flag for files to make the job of
>>> application developers easier.
>>
>> We already have os.fsync() and os.fdatasync(). Should the sync() (and
>> datasync()?) method be added as an object-oriented convenience?
>
> It's more than an object oriented convenience. fsync() takes a file
> descriptor as argument. Therefore I assume fsync() only syncs the data
> to disk that was written to the file descriptor. [*] In Python 2.x we
> are using a FILE* based stream. In Python 3.x we have our own buffered
> writer class.
>
> In order to write all data to disk the FILE* stream must be flushed
> first before fsync() is called:
>
>    PyFileObject *f;
>    if (fflush(f->f_fp) != 0) {
>        /* report error */
>    }
>    if (fsync(fileno(f->f_fp)) != 0) {
>        /* report error */
>    }

Let's not think too Unix-specific. If we add such an API it should do
something on Windows too -- the app shouldn't have to test for the
presence of the API. (And thus the API probably shouldn't be called
fsync.)

> Christian
>
> [*] Is my assumption correct, anybody?

It seems to be, at least it's ambiguous.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


cs at zip

Mar 10, 2009, 7:59 PM

Post #16 of 82 (4581 views)
Permalink
Re: Ext4 data loss [In reply to]

On 10Mar2009 22:14, A.M. Kuchling <amk [at] amk> wrote:
| On Wed, Mar 11, 2009 at 11:31:52AM +1100, Cameron Simpson wrote:
| > On 10Mar2009 18:09, A.M. Kuchling <amk [at] amk> wrote:
| > | The mailbox module tries to be careful and always fsync() before
| > | closing files, because mail messages are pretty important.
| >
| > Can it be turned off? I hadn't realised this.
|
| No, there's no way to turn it off (well, you could delete 'fsync' from
| the os module).

Ah. For myself, were I writing a high load mailbox tool (eg a mail filer
or more to the point, a mail refiler - which I do actually intend to) I
would want to be able to do a huge mass of mailbox stuff and then
possibly issue a sync at the end. For "unix mbox" that might be ok but
for maildirs I'd imagine it leads to an fsync per message.

| > | The tarfile, zipfile, and gzip/bzip2 classes don't seem to use fsync()
| > | at all, either implicitly or by having methods for calling them.
| > | Should they? What about cookielib.CookieJar?
| >
| > I think they should not do this implicitly. By all means let a user
| > issue policy.
|
| The problem is that in some cases the user can't issue policy. For
| example, look at dumbdbm._commit(). It renames a file to a backup,
| opens a new file object, writes to it, and closes it. A caller can't
| fsync() because the file object is created, used, and closed
| internally. With zipfile, you could at least access the .fp attribute
| to sync it (though is the .fp documented as part of the interface?).

I didn't mean giving the user an fsync hook so much as publishing a
flag such as ".do_critical_fsyncs" inside the dbm or zipfile object. If true,
issue fsyncs at appropriate times.

| In other words, do we need to ensure that all the relevant library
| modules expose an interface to allow requesting a sync, or getting the
| file descriptor in order to sync it?

With a policy flag you could solve the control issue even for things
which don't expose the fd such as your dumbdbm._commit() example.
If you supply both a flag and an fsync() method it becomes easy for
a user of a module to go:

obj = get_dbm_handle(....)
obj.do_critical_fsyncs = False
... do lots and lots of stuff ...
obj.fsync()
obj.close()
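To make the flag-plus-method idea concrete, here is a toy sketch. Everything in it is hypothetical: AppendLog is an invented class standing in for a dbm or zipfile object, and only the do_critical_fsyncs attribute and the fsync() method come from the suggestion above.

```python
import os


class AppendLog:
    """Toy record store with a caller-controlled fsync policy (hypothetical)."""

    def __init__(self, path, do_critical_fsyncs=True):
        self.do_critical_fsyncs = do_critical_fsyncs
        self._file = open(path, "ab")

    def append(self, record):
        self._file.write(record + b"\n")
        if self.do_critical_fsyncs:
            self.fsync()   # library-chosen sync point, honoured only if enabled

    def fsync(self):
        """Explicit sync for callers that batch their writes."""
        self._file.flush()
        os.fsync(self._file.fileno())

    def close(self):
        self._file.close()
```

With the flag set to False, a bulk loader can append many records and call fsync() once at the end, while the default keeps the cautious per-record behaviour for callers that never think about it.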

Cheers,
--
Cameron Simpson <cs [at] zip> DoD#743
http://www.cskk.ezoshosting.com/cs/

In the end, winning is the only safety. - Kerr Avon


cs at zip

Mar 10, 2009, 8:02 PM

Post #17 of 82 (4618 views)
Permalink
Re: Ext4 data loss [In reply to]

On 11Mar2009 02:20, Antoine Pitrou <solipsis [at] pitrou> wrote:
| Christian Heimes <lists <at> cheimes.de> writes:
| > I agree with you, fsync() shouldn't be called by default. I didn't plan
| > on adding fsync() calls all over our code. However, I'd like to suggest a
| > file.sync() method and a synced flag for files to make the job of
| > application developers easier.
|
| We already have os.fsync() and os.fdatasync(). Should the sync() (and
| datasync()?) method be added as an object-oriented convenience?

I can imagine plenty of occasions when there may not be an available
file descriptor to hand to os.fsync() et al. Having sync() and
datasync() methods in the object would obviate the need for the caller
to know the object internals.
--
Cameron Simpson <cs [at] zip> DoD#743
http://www.cskk.ezoshosting.com/cs/

I must construct my own System, or be enslaved to another Man's.
- William Blake


Scott.Daniels at Acm

Mar 11, 2009, 12:17 AM

Post #18 of 82 (4590 views)
Permalink
Re: Ext4 data loss [In reply to]

A.M. Kuchling wrote:
> .... With zipfile, you could at least access the .fp attribute
> to sync it (though is the .fp documented as part of the interface?).

For this one, I'd like to add the sync as a method (so that Zip-inside-
Zip is eventually possible). In fact, a sync on an exposed writable
for a single file should probably push back out to a full sync. This
would be trickier to accomplish if the using code had to suss out how
to get to the fp. Clearly I have plans for a ZipFile expansion, but
this could only conceivably hit 2.7, and 2.8 / 3.2 is a lot more likely.

--Scott David Daniels
Scott.Daniels [at] Acm



martin at v

Mar 11, 2009, 1:47 AM

Post #19 of 82 (4596 views)
Permalink
Re: Ext4 data loss [In reply to]

>> We already have os.fsync() and os.fdatasync(). Should the sync() (and
>> datasync()?) method be added as an object-oriented convenience?
>
> It's more than an object oriented convenience. fsync() takes a file
> descriptor as argument. Therefore I assume fsync() only syncs the data
> to disk that was written to the file descriptor. [*]
[...]
> [*] Is my assumption correct, anybody?

Not necessarily. In Linux, for many releases, fsync() was really
equivalent to sync() (i.e. flushing all data for all files on all
file systems to disk). It may be that some systems still implement
it that way today.

However, even if it were true, I don't see why a .sync method would
be more than a convenience. An application wishing to sync a file
before close can do

f.flush()
os.fsync(f.fileno())
f.close()

With a sync method, it would become

f.flush()
f.sync()
f.close()

which is *really* nothing more than convenience.

I'd also like to point to the O_SYNC/O_DSYNC/O_RSYNC open(2)
flags. Applications that require durable writes can also choose
to set those on open, and be done.
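From Python, the open(2) flags are reachable through os.open(). The sketch below is an illustration under assumptions: open_durable is a made-up helper name, and because os.O_SYNC is not defined on every platform the code falls back to no extra flag instead of failing, so on such platforms the writes are not synchronous.

```python
import os


def open_durable(path):
    """Open `path` for writing with O_SYNC where the platform defines it."""
    # os.O_SYNC and os.O_BINARY are each missing on some platforms, so
    # getattr() falls back to 0, which means "no extra flag".
    flags = (os.O_WRONLY | os.O_CREAT | os.O_TRUNC
             | getattr(os, "O_SYNC", 0) | getattr(os, "O_BINARY", 0))
    fd = os.open(path, flags, 0o644)
    # buffering=0 makes every write() go straight to the descriptor.
    return os.fdopen(fd, "wb", buffering=0)
```

The trade-off is that every write() then pays the synchronous cost, whereas an explicit os.fsync() lets the application choose the one point where it pays.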

Regards,
Martin


him at online

Mar 11, 2009, 2:09 AM

Post #20 of 82 (4595 views)
Permalink
Re: Ext4 data loss [In reply to]

Guido van Rossum wrote:
> On Tue, Mar 10, 2009 at 1:11 PM, Christian Heimes <lists [at] cheimes> wrote:
>
>> [...]
>> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54.
>> [...]
>>
> If I understand the post properly, it's up to the app to call fsync(),
> and it's only necessary when you're doing one of the rename dances, or
> updating a file in place. Basically, as he explains, fsync() is a very
> heavyweight operation; I'm against calling it by default anywhere.
>
>
To me, the flaw seems to be in the close() call (of the operating
system). I'd expect the data to be in a persistent state once the
close() returns. So there would be no need to fsync if the file gets
closed anyway.

Of course the close() call could take a while (up to 30 seconds in
laptop mode), but if one does not want to wait that long, then one can
continue without calling close() and take the risk.

Of course, if the data should be on persistent storage without closing
the file (e.g. for database applications), then one has to carefully
call the different sync methods, but that's another story.

Why has this ext4 problem not come up for other filesystems?





nyamatongwe at gmail

Mar 11, 2009, 2:14 AM

Post #21 of 82 (4586 views)
Permalink
Re: Ext4 data loss [In reply to]

Antoine Pitrou:

> How about shutil.copystat()?

shutil.copystat does not copy over the owner, group or ACLs.

Modeling a copymetadata method on copystat would provide an easy to
understand API and should be implementable on Windows and POSIX.
Reading the OS X documentation shows a set of low-level POSIX
functions for ACLs. Since there are multiple pieces of metadata and an
application may not want to copy all pieces there could be multiple
methods (copygroup ...) or one method with options
shutil.copymetadata(src, dst, group=True, resource_fork=False)
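A rough idea of what such a helper could look like on top of the existing pieces is sketched below. It is only an approximation of the proposal: copy_metadata and its owner/group parameters are invented for illustration, shutil.copymetadata does not exist, and ACLs and resource forks are not handled explicitly here.

```python
import os
import shutil


def copy_metadata(src, dst, owner=False, group=True):
    """Copy stat metadata from src to dst, optionally owner and group too."""
    shutil.copystat(src, dst)   # permission bits, timestamps, flags
    if not hasattr(os, "chown"):
        return                  # no POSIX ownership calls on this platform
    info = os.stat(src)
    uid = info.st_uid if owner else -1   # -1 leaves that id unchanged
    gid = info.st_gid if group else -1
    try:
        os.chown(dst, uid, gid)
    except PermissionError:
        # Changing owner usually needs root; changing group needs membership.
        pass
```

Swallowing PermissionError keeps the sketch usable by unprivileged callers; a real API would probably want to report which pieces of metadata could not be copied.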

Neil


hrvoje.niksic at avl

Mar 11, 2009, 2:55 AM

Post #22 of 82 (4577 views)
Permalink
Re: Ext4 data loss [In reply to]

Joachim König wrote:
> To me, the flaw seems to be in the close() call (of the operating
> system). I'd expect the data to be
> in a persistent state once the close() returns.

I wouldn't, because that would mean that every cp -r would effectively
do an fsync() for each individual file it copies, which would bog down
in the case of copying many small files. Operating systems aggressively
buffer file systems for good reason: performance of the common case.

> Why has this ext4 problem not come up for other filesystems?

It has come up for XFS many many times, for example
https://launchpad.net/ubuntu/+bug/37435

ext3 was resilient to the problem because of its default allocation
policy; now that ext4 has implemented the same optimization XFS had
before, it shares the problems.


solipsis at pitrou

Mar 11, 2009, 4:43 AM

Post #23 of 82 (4563 views)
Permalink
Re: Ext4 data loss [In reply to]

Neil Hodgson <nyamatongwe <at> gmail.com> writes:
>
> shutil.copystat does not copy over the owner, group or ACLs.

It depends on what you call "ACLs". It does copy the chmod permission bits.
As for owner and group, I think there is a very good reason that it doesn't copy
them: under Linux, only root can change these properties.





phd at phd

Mar 11, 2009, 4:46 AM

Post #24 of 82 (4553 views)
Permalink
Re: Ext4 data loss [In reply to]

On Wed, Mar 11, 2009 at 11:43:33AM +0000, Antoine Pitrou wrote:
> As for owner and group, I think there is a very good reason that it doesn't copy
> them: under Linux, only root can change these properties.

Only root can change file ownership - and yes, there are scripts that
run with root privileges, so why not copy? As for group ownership - any
user can change group if [s]he belongs to the group.

Oleg.
--
Oleg Broytmann http://phd.pp.ru/ phd [at] phd
Programmers don't die, they just GOSUB without RETURN.


solipsis at pitrou

Mar 11, 2009, 4:47 AM

Post #25 of 82 (4579 views)
Permalink
Re: Ext4 data loss [In reply to]

Christian Heimes <lists <at> cheimes.de> writes:
>
> It's more than an object oriented convenience. fsync() takes a file
> descriptor as argument. Therefore I assume fsync() only syncs the data
> to disk that was written to the file descriptor.

Ok, I agree that a .sync() method makes sense.


