Mailing List Archive: Netapp: toasters

How many retries on a disk before you pre-fail?

 

 



fcocquyt at stanford

Aug 4, 2012, 9:47 AM

Post #1 of 3 (543 views)
How many retries on a disk before you pre-fail?

I ran a syslog search for "retry" on this 3270 head, covering the last 7 days, and got disk retry messages like this one:

Aug 3 20:55:29 [na04:scsi.cmd.retrySuccess:debug]: Enclosure services device 3a.03.99: request successful after retry #1/#0: cdb 0x3c (1301).

Then I ran a frequency count to find the chronically retrying disks:

awk '{print $12}' irt-na04.retry | sort | uniq -c | sort -nr
15 3a.06.99:
14 3a.05.99:
13 3a.03.99:
11 3a.04.99:
10 3a.02.99:
10 3a.01.99:
9 3c.00.99:
7 0a.08.99:
6 0a.07.99:
4 3a.08.99:
3 3a.07.99:
3 0b.00.99:
3 0a.05.99:
3 0a.03.99:
2 0a.06.99:
2 0a.04.99:
1 0a.02.99:
1 0a.01.99:
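
The same count can be sketched without relying on a fixed field number, since `$12` only works for one particular log layout. This is a minimal sketch, not the method used above: the function name `count_retries` is hypothetical, and it assumes every retry line contains `scsi.cmd.retrySuccess` and the text `device <address>:`, as in the single sample message shown.

```shell
# Hypothetical helper: count retrySuccess messages per device address.
# Usage: count_retries LOGFILE
count_retries() {
  # Keep only retry-success events, extract the token that follows
  # "device " (assumed layout, taken from the one sample line),
  # then tally and sort by descending count.
  grep -F 'scsi.cmd.retrySuccess' "$1" |
    sed -n 's/.*device \([0-9a-zA-Z.]*\):.*/\1/p' |
    sort | uniq -c | sort -nr |
    awk '{ print $1, $2 }'   # normalize the padded output of uniq -c
}
```

Calling `count_retries irt-na04.retry` would print one `count address` pair per line, highest count first.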

Q: At what retry count should we consider pre-failing a disk and replacing it proactively?
Will the retries cause performance issues if ignored?

thanks,

Fletcher






_______________________________________________
Toasters mailing list
Toasters [at] teaparty
http://www.teaparty.net/mailman/listinfo/toasters


speedtoys.racing at gmail

Aug 4, 2012, 10:35 AM

Post #2 of 3 (517 views)
Re: How many retries on a disk before you pre-fail? [In reply to]

ONTAP manages that for you, and softly pre-fails a disk when specific thresholds are met.

Doesn't need user management.

Retries are normal.




jeff.cleverley at avagotech

Aug 4, 2012, 12:43 PM

Post #3 of 3 (515 views)
Re: How many retries on a disk before you pre-fail? [In reply to]

Fletcher,

I have a batch of 288 Seagate 1 TB SATA drives in DS4243s that do
this on a fairly regular basis. The log file showed they had spun
down and took a while to spin back up when requested. These
are on a NearStore, so I don't worry much about the performance aspect.
I also have a lot of spares, so I just wait until the system fails
them. If these are SAS drives on a primary filer, I would expect the
retries to cause latency issues. I would install the latest disk
firmware bundle and see if it helps.

Jeff




--
Jeff Cleverley
Unix Systems Administrator
4380 Ziegler Road
Fort Collins, Colorado 80525
970-288-4611
