
Mailing List Archive: OpenStack: Operators

XFS documentation seems to conflict with recommendations in Swift

 

 



slyphon at gmail

Oct 12, 2011, 8:08 AM

Post #1 of 5
XFS documentation seems to conflict with recommendations in Swift

Hello all,

I'm in the middle of a 120T Swift deployment, and I've had some
concerns about the backing filesystem. I formatted everything with
ext4 with 1024b inodes (for storing xattrs), but the process took so
long that I'm now looking at XFS again. In particular, this concerns
me http://xfs.org/index.php/XFS_FAQ#Write_barrier_support.

In the swift documentation, it's recommended to mount the filesystems
w/ 'nobarrier', but it would seem to me that this would leave the data
open to corruption in the case of a crash. AFAIK, swift doesn't do
checksumming (and checksum checking) of stored data (after it is
written), which would mean that any data corruption would silently get
passed back to the users.

Now, I haven't had operational experience running XFS in production,
I've mainly used ZFS, JFS, and ext{3,4}. Are there any recommendations
for using XFS safely in production?
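For comparison, the XFS setup usually suggested for Swift object drives at the time looked roughly like the sketch below; the device and mount point are placeholders, and the larger inode size is there so the xattrs Swift writes stay in-inode:

    # Sketch only: /dev/sdb1 and /srv/node/sdb1 are placeholder names.
    mkfs.xfs -i size=1024 /dev/sdb1      # bigger inodes so per-object xattrs fit in-inode
    mkdir -p /srv/node/sdb1
    mount -o noatime,nodiratime,logbufs=8 /dev/sdb1 /srv/node/sdb1   # barriers left at their default here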


btorch-os at zeroaccess

Oct 13, 2011, 9:18 AM

Post #2 of 5
XFS documentation seems to conflict with recommendations in Swift [In reply to]

Hi Jonathan,


I guess that will depend on how your storage nodes are configured (hardware-wise). The reason it's recommended is that the storage drives are actually attached to a controller that has a R/W cache enabled.



Q. Should barriers be enabled with storage which has a persistent write cache?

Many hardware RAID have a persistent write cache which preserves it across power failure, interface resets, system crashes, etc. Using write barriers in this instance is not recommended and will in fact lower performance. Therefore, it is recommended to turn off the barrier support and mount the filesystem with "nobarrier". But take care about the hard disk write cache, which should be off.
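To make that concrete: on a node where the controller really does have a battery-backed write cache, the fstab entry and drive-cache tweak implied by the FAQ would look something like this (device names are placeholders, and behind a hardware raid controller the disk cache is normally switched off through the controller's own tools rather than hdparm):

    # Placeholder device /dev/sdb1 mounted at /srv/node/sdb1; only sane with a battery-backed controller cache.
    /dev/sdb1  /srv/node/sdb1  xfs  noatime,nodiratime,nobarrier,logbufs=8  0 0
    # And, per the FAQ, make sure the volatile on-disk write cache itself is off:
    hdparm -W 0 /dev/sdb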


Marcelo Martins
Openstack-swift
btorch-os at zeroaccess.org

"Knowledge is the wings on which our aspirations take flight and soar. When it comes to surfing and life if you know what to do you can do it. If you desire anything become educated about it and succeed."






linuxcole at gmail

Oct 13, 2011, 1:50 PM

Post #3 of 5
XFS documentation seems to conflict with recommendations in Swift [In reply to]

Generally, mounting with -o nobarrier is a bad idea (ext4 or xfs) unless you have disks that do not have write caches. Don't follow that recommendation, or, for example, fsync won't work, which is something swift relies upon.
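If you take this advice and keep barriers enabled, the main thing worth checking is whether the drives actually have a volatile write cache in play; something along these lines, with a placeholder device name:

    # Placeholder device /dev/sdb: report the drive's write-cache setting.
    hdparm -W /dev/sdb    # "write-caching = 1 (on)" means fsync durability depends on barriers/flushes
    # ...and simply leave 'barrier' at its default by not passing nobarrier in the mount options.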




gordon.irving at sophos

Oct 13, 2011, 3:11 PM

Post #4 of 5
XFS documentation seems to conflict with recommendations in Swift [In reply to]

If you are on a battery-backed (BBU) raid controller, then it's generally safe to disable barriers for journaling filesystems. If you're doing soft raid, JBOD, or single-disk arrays, or you cheaped out and did not get a BBU, then you may want to enable barriers for filesystem consistency.

For raid cards with a BBU, set your io scheduler to noop and disable barriers. The raid card does its own re-ordering of io operations, and the OS has an incomplete picture of the true drive geometry: the raid card is emulating a single disk geometry that may really be an array of 2 to 100+ disks. The OS simply cannot make good judgment calls on how best to schedule io to different parts of the disk, because it is built around the assumption of a single spinning disk. The same goes for knowing whether a write has made it only to a non-persistent cache (i.e. the disk cache), to a persistent cache (i.e. the battery-backed cache on your raid card), or to persistent storage (the array of disks itself). This is a failure of the raid card <-> OS interface: there simply is not the richness to signal "this write is OK once it is on the platter or in persistent cache, but not while it is only in the disk cache".

Enabling barriers effectively turns all writes into Write-Through operations, so the write goes straight to the disk platter and you get little performance benefit from the raid card (which hurts a lot in terms of lost iops). If the BBU loses charge or fails, the raid controller downgrades to Write-Through (vs Write-Back) operation.

BBU raid controllers disable the disk caches, as these are not safe in the event of power loss and do not provide any benefit over the raid card's cache.

In the context of swift, hdfs and other highly replicated datastores, I run them as JBOD or raid-0 with nobarrier, noatime, nodiratime and a filesystem aligned to the geometry of the underlying storage*, to squeeze as much performance as possible out of the raw storage. Let the application layer deal with redundancy of data across the network: if a machine or disk dies, so what? You have N other copies of that data elsewhere on the network, and only a bit of storage is lost. Do consider how many nodes can be down at any one time when operating these sorts of clusters: big boxen with lots of storage may seem attractive from a density perspective until you lose one, and 25% of your storage capacity with it ... many smaller baskets ...
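A rough sketch of that tuning, with placeholder device names, and again only sensible where the raid card has a BBU or the application replicates everything anyway:

    # Placeholder device sdb: let the raid card do the re-ordering, not the OS elevator.
    echo noop > /sys/block/sdb/queue/scheduler
    # Mount options from the paragraph above: no barriers, no atime/diratime updates.
    mount -o nobarrier,noatime,nodiratime /dev/sdb1 /srv/node/sdb1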

For network-level data consistency, swift should have a data scrubber (a periodic process to read and compare checksums of replicated blocks); I have not checked whether this is implemented or on the roadmap. I would be very surprised if it was not part of swift.
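Purely to illustrate the kind of periodic scrub meant here (a toy sketch, not a description of anything swift itself ships; the paths and the stored manifest are made up):

    # Toy scrubber: recompute checksums under an object directory and compare to a saved manifest.
    # /srv/node/sdb1/objects and /var/lib/scrub/checksums.md5 are hypothetical paths.
    cd /srv/node/sdb1/objects
    find . -type f -print0 | xargs -0 md5sum | sort -k 2 > /tmp/current.md5
    diff /var/lib/scrub/checksums.md5 /tmp/current.md5 || echo "checksum mismatch detected"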

*You can hint to the fs layer how to align block writes by specifying a stride/stripe width, which is derived from the number of data-carrying disks in the array and the chunk size (typically 64k by default for raid arrays).
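For that footnote, the usual alignment knobs look roughly like this for a hypothetical array of 4 data-carrying disks with a 64k chunk size:

    # Hypothetical geometry: 4 data disks, 64k stripe unit (chunk).
    mkfs.xfs  -d su=64k,sw=4 /dev/sdb1                  # XFS: stripe unit and stripe width
    mkfs.ext4 -E stride=16,stripe-width=64 /dev/sdb1    # ext4: counted in 4k blocks (64k/4k=16; 16*4=64)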



slyphon at gmail

Oct 24, 2011, 7:13 AM

Post #5 of 5
XFS documentation seems to conflict with recommendations in Swift [In reply to]

Thanks all for the information! I'm going to use this advice as part
of the next round of hardware purchasing we're doing.



 
 

