
Mailing List Archive: MythTV: Users

Network storage fault tolerance (was Re: Myth autoexpiring brand new shows)

 

 



mtdean at thirdcontact

Aug 26, 2008, 3:24 PM

Post #1 of 13 (1649 views)
Network storage fault tolerance (was Re: Myth autoexpiring brand new shows)

On 08/26/2008 05:47 PM, Kevin Kuphal wrote:
> On Tue, Aug 26, 2008 at 4:36 PM, Allen Edwards wrote:
>
>> I am having a hard time understanding why anyone would want that to
>> happen. I mean, if I set the drives up in the same storage group, why
>> is it an advantage to me as a user to effectively not use the drive,
>> which is what I read would happen...
> In the case of network disks, it has to do with network I/O from, for
> example, multiple HD recordings + multiple commflag operations that could
> seriously degrade a network and/or result in bad recordings.

Oh, and a far more important, IMHO, reason that neither Kevin nor I
mentioned--fault tolerance. If your network storage goes down in the
middle of a recording, the recording is ruined. Your local storage is
/far/ less likely to go down in the middle of the recording.

The network storage could go down because of a) network switch goes down
(Myth box is on a UPS but network switch isn't or whatever), b) host
containing the network storage goes down (reboots, crashes, ...), plus
all the same reasons that local storage could go down.

Speaking of which, does anyone know of a way to make NFS tolerant of the
NFS server's going down so that it will automatically unmount/remount
the filesystems?

Mike
_______________________________________________
mythtv-users mailing list
mythtv-users [at] mythtv
http://mythtv.org/cgi-bin/mailman/listinfo/mythtv-users


matt at mossholder

Aug 26, 2008, 3:31 PM

Post #2 of 13 (1582 views)
Re: Network storage fault tolerance (was Re: Myth autoexpiring brand new shows) [In reply to]

----- "Michael T. Dean" <mtdean [at] thirdcontact> wrote:

> Speaking of which, does anyone know of a way to make NFS tolerant of
> the
> NFS server's going down so that it will automatically unmount/remount
>
> the filesystems?
>
> Mike

You might want to take a look at NFSv4.1, whose parallel NFS (pNFS) extension
allows you to stripe/mirror data across servers rather than sticking to just a
single server.

Copied from a SNIA PowerPoint:



==========================
pNFS allows servers to stripe data of regular
files across multiple storage devices

A pNFS server consists of:
– A metadata server (MDS) that implements the
full NFSv4.1 protocol
– One or more storage devices

A pNFS client is an NFSv4.1 client that is
prepared to directly access storage devices

The pNFS client finds out about storage devices
from the MDS via a new LAYOUTGET operation

LAYOUTGET returns a layout that describes the
striping pattern for a given file

layouts are recallable which allows pNFS
servers to re-stripe a file if desired or necessary

striping patterns can indicate if some or all of a
pattern has mirrors
– clients are not required to construct mirrors
– Thus pNFS offers RAID 0 and RAID 1+0

===========================


--Matt



gull at gull

Aug 26, 2008, 3:40 PM

Post #3 of 13 (1569 views)
Re: Network storage fault tolerance (was Re: Myth autoexpiring brand new shows) [In reply to]

Michael T. Dean wrote:
> Speaking of which, does anyone know of a way to make NFS tolerant of the
> NFS server's going down so that it will automatically unmount/remount
> the filesystems?

Technically NFS is stateless -- if a server goes down, I/O requests on
that filesystem just hang until it comes back. No need to remount.
This usually isn't what people want, though, because it leads to lots of
applications sitting around in unkillable D states.



myth at dermanouelian

Aug 26, 2008, 4:34 PM

Post #4 of 13 (1558 views)
Re: Network storage fault tolerance (was Re: Myth autoexpiring brand new shows) [In reply to]

On Aug 26, 2008, at 3:24 PM, Michael T. Dean wrote:

> Speaking of which, does anyone know of a way to make NFS tolerant of
> the
> NFS server's going down so that it will automatically unmount/remount
> the filesystems?

Unfortunately, the company I used to work for did it as a cron job to
see if a touched file was present in the path. If not, try to remount
10 times. If not, try again in 5 minutes.
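
The cron approach described above could be sketched roughly as follows. This is a minimal Python sketch, not the script that company actually used: the name ensure_mounted, the sentinel file, and the umount -l / mount commands are all assumptions, and it presumes an /etc/fstab entry for the mount point. Scheduling it from cron every 5 minutes would give the "try again in 5 minutes" behavior.

```python
import os
import subprocess
import time


def _default_remount(mountpoint):
    """Lazy-unmount, then mount again using the /etc/fstab entry (needs root)."""
    # Assumption: a lazy unmount is acceptable for this mount point.
    subprocess.run(["umount", "-l", mountpoint], check=False)
    return subprocess.run(["mount", mountpoint], check=False).returncode == 0


def ensure_mounted(sentinel, mountpoint, attempts=10, delay_s=1.0,
                   exists=os.path.exists, remount=_default_remount,
                   sleep=time.sleep):
    """Return True if the sentinel file is visible, remounting if needed.

    Note: on a hard mount with a dead server, the existence check itself
    may block, so this only helps when the check returns at all.
    """
    if exists(sentinel):
        return True
    for _ in range(attempts):
        if remount(mountpoint) and exists(sentinel):
            return True
        sleep(delay_s)
    # Give up for now; the next cron run will try again.
    return False
```

The exists, remount, and sleep parameters are injectable only so the logic can be exercised without touching a real mount.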



myth at dermanouelian

Aug 26, 2008, 4:44 PM

Post #5 of 13 (1572 views)
Re: Network storage fault tolerance (was Re: Myth autoexpiring brand new shows) [In reply to]

On Aug 26, 2008, at 3:40 PM, David Brodbeck wrote:

> Michael T. Dean wrote:
>> Speaking of which, does anyone know of a way to make NFS tolerant
>> of the
>> NFS server's going down so that it will automatically unmount/remount
>> the filesystems?
>
> Technically NFS is stateless -- if a server goes down, I/O requests on
> that filesystem just hang until it comes back. No need to remount.
> This usually isn't what people want, though, because it leads to
> lots of
> applications sitting around in unkillable D states.

That's why I used the intr option when I was using remote storage:

    intr
        Allow signals to interrupt an NFS call. Useful for aborting when the
        server doesn't respond.


rogerheflin at gmail

Aug 26, 2008, 4:47 PM

Post #6 of 13 (1577 views)
Re: Network storage fault tolerance (was Re: Myth autoexpiring brand new shows) [In reply to]

David Brodbeck wrote:
> Michael T. Dean wrote:
>> Speaking of which, does anyone know of a way to make NFS tolerant of the
>> NFS server's going down so that it will automatically unmount/remount
>> the filesystems?
>
> Technically NFS is stateless -- if a server goes down, I/O requests on
> that filesystem just hang until it comes back. No need to remount.
> This usually isn't what people want, though, because it leads to lots of
> applications sitting around in unkillable D states.
>

Given the D state, and given intr set on the mount, it should be possible for
an application to always set up a timer and detect the hang (with a signal
handler catching the timer-expired signal), and then do something about it:
either move to another machine or put data on a local disk until things come
back. To be truly tolerant, things would need to be threaded so that writing
to the final disk could not stall the recording; the data could be buffered on
local disk and copied back to the NFS server when it returned. With a
reasonable amount of local disk (say 30-60 GB), one could buffer quite a lot
and fix the NFS server whenever the outage was noticed.
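
The timer-plus-signal-handler idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything MythTV does: write_with_fallback, primary_write, and fallback_write are hypothetical names, it is Unix-only, it must run in the main thread, and it only works if the blocked call is actually interruptible (a process stuck in a truly uninterruptible wait will not see the signal until the call returns).

```python
import signal


class WriteTimeout(Exception):
    """Raised by the alarm handler when the primary write takes too long."""


def _on_alarm(signum, frame):
    raise WriteTimeout()


def write_with_fallback(data, primary_write, fallback_write, timeout_s=5.0):
    """Try primary_write(data); on timeout or OSError use fallback_write(data).

    Returns "primary" or "fallback" to report which path took the data.
    A timeout that races with a late success could store data twice; a real
    implementation would have to deal with that.
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        signal.setitimer(signal.ITIMER_REAL, timeout_s)
        try:
            primary_write(data)       # e.g. a write to the NFS mount
        finally:
            signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer
    except (WriteTimeout, OSError):
        fallback_write(data)          # e.g. a write to a local spool dir
        return "fallback"
    finally:
        signal.signal(signal.SIGALRM, old_handler)
    return "primary"
```

Copying the locally spooled data back once the server returns is left out; that is the part that would need the threading described above.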

Though this would not be terribly helpful if the NFS server and the backend were
the same machine (though if it were fixed before the next recordings came up it
would be possible to not lose any record data).

Roger



rogerheflin at gmail

Aug 26, 2008, 5:13 PM

Post #7 of 13 (1566 views)
Re: Network storage fault tolerance (was Re: Myth autoexpiring brand new shows) [In reply to]

Brad DerManouelian wrote:
> On Aug 26, 2008, at 3:24 PM, Michael T. Dean wrote:
>
>> Speaking of which, does anyone know of a way to make NFS tolerant of
>> the
>> NFS server's going down so that it will automatically unmount/remount
>> the filesystems?
>
> Unfortunately, the company I used to work for did it as a cron job to
> see if a touched file was present in the path. If not, try to remount
> 10 times. If not, try again in 5 minutes.

That is unfortunate; with hard,intr, things almost always recover automatically.

The only case where I have ever had to automatically umount/remount NFS
filesystems was when, for an unknown reason, the mount went stale or hit a
certain class of really weird error (both detectable from entries in the
messages log) every so often on some of the many machines that were using it.

Roger



mtdean at thirdcontact

Aug 26, 2008, 5:41 PM

Post #8 of 13 (1571 views)
Re: Network storage fault tolerance (was Re: Myth autoexpiring brand new shows) [In reply to]

On 08/26/2008 06:40 PM, David Brodbeck wrote:
> Michael T. Dean wrote:
>
>> Speaking of which, does anyone know of a way to make NFS tolerant of the
>> NFS server's going down so that it will automatically unmount/remount
>> the filesystems?
>>
> Technically NFS is stateless -- if a server goes down, I/O requests on
> that filesystem just hang until it comes back.

For me the hanging part works, but the "until it comes back," doesn't
when there's a reboot of the server involved. When the server comes
back, it continues to hang. I have to unmount (which gives errors
because the server "isn't there"--even if it is). Then, eventually, I
just remount, and it works (though sometimes it seems to show multiple
connections in the nfs *tab files).

> No need to remount.
> This usually isn't what people want, though, because it leads to lots of
> applications sitting around in unkillable D states.

I'm mounting with:

rw,_netdev,rsize=8192,wsize=8192,hard,intr,actimeo=0

(from http://mythtv.org/docs/mythtv-HOWTO-23.html#ss23.10 , but with
nfsvers=3 removed as it's not required on my systems). But, after doing
some more reading on nfs.sf.net, I switched it to use nfs4 (rather than
3) because it's supposed to be more reboot tolerant. Can't test while
the backend is recording, but if it works, I'll try to remember to post
the results.

Mike


rogerheflin at gmail

Aug 26, 2008, 6:18 PM

Post #9 of 13 (1580 views)
Re: Network storage fault tolerance (was Re: Myth autoexpiring brand new shows) [In reply to]

Michael T. Dean wrote:
> On 08/26/2008 06:40 PM, David Brodbeck wrote:
>> Michael T. Dean wrote:
>>
>>> Speaking of which, does anyone know of a way to make NFS tolerant of the
>>> NFS server's going down so that it will automatically unmount/remount
>>> the filesystems?
>>>
>> Technically NFS is stateless -- if a server goes down, I/O requests on
>> that filesystem just hang until it comes back.
>
> For me the hanging part works, but the "until it comes back," doesn't
> when there's a reboot of the server involved. When the server comes
> back, it continues to hang. I have to unmount (which gives errors
> because the server "isn't there"--even if it is). Then, eventually, I
> just remount, and it works (though sometimes it seems to show multiple
> connections in the nfs *tab files).

How long do you wait? The retry timeouts get larger until they reach a
fairly large number of minutes (I don't remember what the max timeout is),
so once the NFS server comes back it can take up to that long for things
to retry and continue on.
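
To get a feel for how a growing retry timeout stretches the wait, here is a small Python sketch of a capped backoff schedule. The function name and the numbers in the usage note are purely illustrative assumptions, not the actual NFS client values; the real behavior depends on the timeo and retrans mount options and on the transport, so check nfs(5) on your system.

```python
def retry_timeouts(initial_s, max_s, retries, factor=2.0):
    """Return the timeout used for each successive retry, in seconds.

    Each timeout is `factor` times the previous one, capped at max_s.
    All values are hypothetical and only illustrate the growth pattern.
    """
    timeouts = []
    current = initial_s
    for _ in range(retries):
        timeouts.append(min(current, max_s))
        current *= factor
    return timeouts
```

With a hypothetical 1-second initial timeout doubling up to a 60-second cap, retry_timeouts(1.0, 60.0, 8) gives 1, 2, 4, 8, 16, 32, 60, 60 seconds. Once the cap is reached, a client could sit idle for up to the full cap after the server returns before its next retry goes out.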

>
>> No need to remount.
>> This usually isn't what people want, though, because it leads to lots of
>> applications sitting around in unkillable D states.
>
> I'm mounting with:
>
> rw,_netdev,rsize=8192,wsize=8192,hard,intr,actimeo=0
>
> (from http://mythtv.org/docs/mythtv-HOWTO-23.html#ss23.10 , but with
> nfsvers=3 removed as it's not required on my systems). But, after doing
> some more reading on nfs.sf.net, I switched it to use nfs4 (rather than
> 3) because it's supposed to be more reboot tolerant. Can't test while
> the backend is recording, but if it works, I'll try to remember to post
> the results.

Lose the rsize/wsize too; you only really need them if your network is
screwed up. 8192 has not been an optimal setting for anything for quite
a while.

Roger


chrisribe at gmail

Aug 26, 2008, 8:14 PM

Post #10 of 13 (1563 views)
Re: Network storage fault tolerance (was Re: Myth autoexpiring brand new shows) [In reply to]

>
> Speaking of which, does anyone know of a way to make NFS tolerant of the
> NFS server's going down so that it will automatically unmount/remount
> the filesystems?
>

Not exactly what you are asking for, but DRBD + Heartbeat + NFS is a
nice solution for highly available NAS.


gull at gull

Aug 26, 2008, 10:58 PM

Post #11 of 13 (1539 views)
Re: Network storage fault tolerance (was Re: Myth autoexpiring brand new shows) [In reply to]

Michael T. Dean wrote:
>
> For me the hanging part works, but the "until it comes back," doesn't
> when there's a reboot of the server involved. When the server comes
> back, it continues to hang. I have to unmount (which gives errors
> because the server "isn't there"--even if it is). Then, eventually, I
> just remount, and it works (though sometimes it seems to show multiple
> connections in the nfs *tab files).
>
Huh, odd. It seems to work OK on reboot on the cluster at work,
although sometimes it takes a couple of minutes for the machines to
notice the server is back.

NFS behavior varies somewhat from OS to OS and version to version,
though. It's just part of the 'fun' of it.



mtdean at thirdcontact

Aug 27, 2008, 12:07 AM

Post #12 of 13 (1535 views)
Re: Network storage fault tolerance (was Re: Myth autoexpiring brand new shows) [In reply to]

On 08/27/2008 01:58 AM, David Brodbeck wrote:
> Michael T. Dean wrote:
>
>> For me the hanging part works, but the "until it comes back," doesn't
>> when there's a reboot of the server involved. When the server comes
>> back, it continues to hang. I have to unmount (which gives errors
>> because the server "isn't there"--even if it is). Then, eventually, I
>> just remount, and it works (though sometimes it seems to show multiple
>> connections in the nfs *tab files).
> Huh, odd. It seems to work OK on reboot on the cluster at work,
> although sometimes it takes a couple of minutes for the machines to
> notice the server is back.
>

Yeah, now I'm thinking there's a problem with the statd NSM reboot
notification message passing on my systems. At least now I know how
it's supposed to work (which makes figuring out how to make it work a
lot easier).

> NFS behavior varies somewhat from OS to OS and version to version,
> though. It's just part of the 'fun' of it.

Oh, that's "fun," is it? I didn't realize that. :)

Thanks for the input. This is the one issue that's been really annoying
me on my Myth setup.

Mike


belcampo at zonnet

Aug 27, 2008, 12:36 AM

Post #13 of 13 (1543 views)
Re: Network storage fault tolerance (was Re: Myth autoexpiring brand new shows) [In reply to]

Michael T. Dean wrote:
> On 08/26/2008 06:40 PM, David Brodbeck wrote:
>
>> Michael T. Dean wrote:
>>
>>
>>> Speaking of which, does anyone know of a way to make NFS tolerant of the
>>> NFS server's going down so that it will automatically unmount/remount
>>> the filesystems?
>>>
>>>
>> Technically NFS is stateless -- if a server goes down, I/O requests on
>> that filesystem just hang until it comes back.
>>
>
> For me the hanging part works, but the "until it comes back," doesn't
> when there's a reboot of the server involved. When the server comes
> back, it continues to hang. I have to unmount (which gives errors
> because the server "isn't there"--even if it is). Then, eventually, I
> just remount, and it works (though sometimes it seems to show multiple
> connections in the nfs *tab files).
>
>
>> No need to remount.
>> This usually isn't what people want, though, because it leads to lots of
>> applications sitting around in unkillable D states.
>>
>
> I'm mounting with:
>
> rw,_netdev,rsize=8192,wsize=8192,hard,intr,actimeo=0
>
Try leaving out the rsize and wsize options and see what sizes the server and
client negotiate. Check with less /proc/mounts. If playback from NFS had
stuttering problems, they will probably be gone.
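
If reading /proc/mounts by eye gets tedious, the negotiated sizes can be pulled out with a few lines of Python. This is a minimal sketch: nfs_transfer_sizes is a hypothetical helper name, and it assumes the usual whitespace-separated /proc/mounts layout (device, mount point, filesystem type, options).

```python
def nfs_transfer_sizes(mounts_text):
    """Map each NFS mount point to its (rsize, wsize) from /proc/mounts text.

    A size is None when the option is not listed for that mount.
    """
    sizes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = {}
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            opts[key] = value
        rsize = int(opts["rsize"]) if "rsize" in opts else None
        wsize = int(opts["wsize"]) if "wsize" in opts else None
        sizes[fields[1]] = (rsize, wsize)
    return sizes
```

On a live system it would be called as nfs_transfer_sizes(open("/proc/mounts").read()).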
> (from http://mythtv.org/docs/mythtv-HOWTO-23.html#ss23.10 , but with
> nfsvers=3 removed as it's not required on my systems). But, after doing
> some more reading on nfs.sf.net, I switched it to use nfs4 (rather than
> 3) because it's supposed to be more reboot tolerant. Can't test while
> the backend is recording, but if it works, I'll try to remember to post
> the results.
