Mailing List Archive: MythTV: Users

Hard Drive Problems

 

 



dbrieck at gmail

May 1, 2012, 5:55 AM

Post #1 of 9
Hard Drive Problems

This past weekend my master backend froze up hard and wouldn't boot
back up due to hard drive problems.

I was able to boot from a USB stick and my two 1 TB WD Green drives
eventually showed up. However unlikely it seems, they both seem to
have problems.

The drive that's got me confused the most is currently on /dev/sda.
It's got 38625 hours on it (4.5 yrs). I was able to mount it once
without a problem, but the next time I had to run fsck on it and it
came up with a ton of errors. After the errors were fixed it mounted,
however, here's where it gets even more confusing.

Here's the output from fdisk:

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00047280

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1      121601   976760001   83  Linux

Based on this, you would see that the drive has one partition and you
would assume it was 1 TB. However, that's not what the OS is seeing.

Here's the output of df -h:

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             147G  6.8G  133G   5% /mnt/data2

So where did the rest of the drive go? Oddly enough, the SMART data
says there's nothing wrong with this drive, but it is giving me errors
on the other drive. The other drive, currently on /dev/sdb, originally
didn't want to mount, but after a few reboots it mounted without any
errors and I was able to pull data off without a problem.

However, here's the SMART data for that drive:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   142   124   021    Pre-fail  Always       -       5858
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       27
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   078   078   000    Old_age   Always       -       16072
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       24
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       19
193 Load_Cycle_Count        0x0032   017   017   000    Old_age   Always       -       550422
194 Temperature_Celsius     0x0022   110   102   000    Old_age   Always       -       37
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   192   192   000    Old_age   Always       -       1425
198 Offline_Uncorrectable   0x0030   200   194   000    Old_age   Offline      -       33
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   148   148   000    Old_age   Offline      -       10543

SMART Error Log Version: 1
Warning: ATA error count 834 inconsistent with error log pointer 1

ATA Error Count: 834 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]

This drive is actually younger than the other drive at 16072 hours
(1.8 yrs), but there are a bunch of errors on it.

Was it a pure coincidence that both drives went bad at the same time
or is one of them still good??

I have some replacement drives on the way, but I'm not sure whether any
recordings can be recovered from the first drive (the one that seems to
be missing most of its space), or whether I should even bother keeping
either of these drives around.

I'm going to try to get a warranty replacement for the newer drive
since it's still in warranty, but the older drive by all rights still
seems like it might be good.

Thoughts?
_______________________________________________
mythtv-users mailing list
mythtv-users [at] mythtv
http://www.mythtv.org/mailman/listinfo/mythtv-users


drescherjm at gmail

May 1, 2012, 6:16 AM

Post #2 of 9
Re: Hard Drive Problems

On Tue, May 1, 2012 at 8:55 AM, David Brieck Jr. <dbrieck [at] gmail> wrote:
> This past weekend my master backend froze up hard and wouldn't boot
> back up due to hard drive problems.
>
> I was able to boot from a USB stick and my two 1 TB WD Green drives
> eventually showed up. However unlikely it seems, they both seem to
> have problems.
>
> The drive that's got me confused the most is currently on /dev/sda.
> It's got 38625 hours on it (4.5 yrs). I was able to mount it once
> without a problem, but the next time I had to run fsck on it and it
> came up with a ton of errors. After the errors were fixed it mounted,
> however, here's where it gets even more confusing.
>
> Here's the output from fdisk:
>
> Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00047280
>
>   Device Boot      Start         End      Blocks   Id  System
> /dev/sda1               1      121601   976760001   83  Linux
>
> Based on this, you would see that the drive has one partition and you
> would assume it was 1 TB. However, that's not what the OS is seeing.
>
> Here's the output of df -h:
>
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda1             147G  6.8G  133G   5% /mnt/data2
>
> So where did the rest of the drive go?

Filesystem corruption. You may be able to recover from this by making
a duplicate using ddrescue and then running fsck on the duplicate.
ddrescue will try to copy as many of the good sectors as it can,
skipping over the bad ones. It keeps a log file, so if you need to
reboot in the middle of the copy (say the drive went totally offline)
you can continue where you left off.
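
As a rough illustration of the duplicate-then-fsck approach, the sketch
below only builds the ddrescue command lines and checks that the copy
target is large enough; it runs nothing. The target device /dev/sdX and
the map file name sda.map are placeholders, and the two-pass split (-n
first, then -r3) is a common ddrescue pattern assumed here, not
something stated in this thread.

```python
import shlex

SOURCE_BYTES = 1000204886016  # size fdisk reported for /dev/sda above


def ddrescue_commands(source, target, mapfile):
    """Build (but do not run) two ddrescue passes sharing one map/log file."""
    # Pass 1: grab everything that reads cleanly. -f allows writing to a
    # block device, -n skips the slow scraping of bad areas for now.
    first = ["ddrescue", "-f", "-n", source, target, mapfile]
    # Pass 2: revisit the bad areas recorded in the map file, retrying 3 times.
    second = ["ddrescue", "-f", "-r3", source, target, mapfile]
    return [first, second]


def target_is_big_enough(source_bytes, target_bytes):
    """The duplicate needs at least as many bytes as the failing drive."""
    return target_bytes >= source_bytes


# /dev/sdX and sda.map are hypothetical names used only for illustration.
for cmd in ddrescue_commands("/dev/sda", "/dev/sdX", "sda.map"):
    print(shlex.join(cmd))
```

Because both passes name the same map file, an interrupted copy can
resume where it stopped; fsck would then be run against the copy rather
than the original drive.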

> Oddly enough, the SMART data
> says there's nothing wrong with this drive, but it is giving me errors
> on the other drive. The other drive, currently on /dev/sdb, originally
> didn't want to mount, but after a few reboots it mounted without any
> errors and I was able to pull data off without a problem.
>
> However, here's the SMART data for that drive:
>
> <snip SMART attribute table and error log>
>
> This drive is actually younger than the other drive at 16072 hours
> (1.8 yrs), but there are a bunch of errors on it.
>
> Was it a pure coincidence that both drives went bad at the same time
> or is one of them still good??
>
Have you looked at the SMART data before? At work I monitor SMART for
the lifetime of all drives in my servers.
>
> I have some replacement drives on the way, but I'm not sure whether any
> recordings can be recovered from the first drive (the one that seems to
> be missing most of its space), or whether I should even bother keeping
> either of these drives around.
>
> I'm going to try to get a warranty replacement for the newer drive
> since it's still in warranty, but the older drive by all rights still
> seems like it might be good.
>
> Thoughts?


The drive whose SMART data you posted could be dying, or this could
be an isolated media defect. SMART attributes 197 and 198 should be
close to 0. Both of these represent what some call drive amnesia: the
drive wrote data that it can no longer read.
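
For anyone who wants to watch those two attributes over time, here is a
minimal sketch that pulls the raw values for 197 and 198 out of
attribute rows in the layout `smartctl -A` prints. The two sample
strings are the relevant rows for the two drives in this thread; the
function names are made up for illustration, and the "anything above
zero is worth a look" rule is a simplification of "should be close to
0".

```python
# Attributes singled out above; names as they appear in the smartctl table.
WATCHED = {197: "Current_Pending_Sector", 198: "Offline_Uncorrectable"}


def parse_raw_values(smart_text):
    """Map attribute ID -> integer RAW_VALUE for each attribute row."""
    raw = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID;
        # rows whose raw value is not a plain integer are skipped.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            raw[int(fields[0])] = int(fields[9])
    return raw


def suspicious(raw):
    """Return the watched attributes whose raw count is above zero."""
    return {WATCHED[i]: raw[i] for i in WATCHED if raw.get(i, 0) > 0}


SDB_ROWS = """\
197 Current_Pending_Sector  0x0032   192   192   000    Old_age   Always       -       1425
198 Offline_Uncorrectable   0x0030   200   194   000    Old_age   Offline      -       33
"""

SDA_ROWS = """\
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
"""

print("sdb:", suspicious(parse_raw_values(SDB_ROWS)))
print("sda:", suspicious(parse_raw_values(SDA_ROWS)))
```

Run periodically and logged, something like this gives the point of
reference that a one-off look at the table cannot.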

John


drescherjm at gmail

May 1, 2012, 6:32 AM

Post #3 of 9
Re: Hard Drive Problems

> The drive whose SMART data you posted could be dying, or this could
> be an isolated media defect. SMART attributes 197 and 198 should be
> close to 0. Both of these represent what some call drive amnesia: the
> drive wrote data that it can no longer read.

BTW, one way to tell whether this is an isolated media defect or a real
problem with the heads is to copy your data to a new disk (possibly
using ddrescue) and then run a 4-pattern badblocks read/write test
on the drive. If any bad sectors are reported, repeat the test. If the
second run also reports bad blocks, I would call the drive..

badblocks -wsv /dev/mydrive

Remember to copy the data first, as this will wipe the disk.
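
The repeat-the-test rule can be written down as a small sketch. It
assumes each badblocks run is given -o to save its list of bad blocks
(one block number per line) to a file; the file name and the wording of
the verdicts are illustrative only. Nothing here runs badblocks.

```python
def badblocks_command(device, log_path):
    """Build the destructive test command; -o saves the bad-block list."""
    # -w write-mode (wipes the disk), -s show progress, -v verbose.
    return ["badblocks", "-wsv", "-o", log_path, device]


def count_bad_blocks(lines):
    """Count non-empty lines, i.e. bad block numbers, in a badblocks -o log."""
    return sum(1 for line in lines if line.strip())


def verdict(first_pass_bad, second_pass_bad):
    """Apply the two-pass rule: bad blocks on both runs means a bad drive."""
    if first_pass_bad == 0:
        return "no bad blocks reported"
    if second_pass_bad == 0:
        return "bad blocks on first run only: possibly an isolated defect"
    return "bad blocks on both runs: treat the drive as failing"


# pass1.log is a hypothetical file name; /dev/mydrive is the placeholder above.
print(" ".join(badblocks_command("/dev/mydrive", "pass1.log")))
print(verdict(count_bad_blocks(["1024\n", "1025\n"]), count_bad_blocks([])))
```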



--
John M. Drescher


dbrieck at gmail

May 1, 2012, 6:40 AM

Post #4 of 9
Re: Hard Drive Problems

On Tue, May 1, 2012 at 9:32 AM, John Drescher <drescherjm [at] gmail> wrote:
>> The drive whose SMART data you posted could be dying, or this could
>> be an isolated media defect. SMART attributes 197 and 198 should be
>> close to 0. Both of these represent what some call drive amnesia: the
>> drive wrote data that it can no longer read.
>
> BTW, one way to tell whether this is an isolated media defect or a real
> problem with the heads is to copy your data to a new disk (possibly
> using ddrescue) and then run a 4-pattern badblocks read/write test
> on the drive. If any bad sectors are reported, repeat the test. If the
> second run also reports bad blocks, I would call the drive..
>
> badblocks -wsv /dev/mydrive
>
> Remember to copy the data first, as this will wipe the disk.
>
>
>
> --
> John M. Drescher

Thanks for the response John.

I've checked the SMART data infrequently so I don't have a real point
of reference for when it could have started showing signs of a
problem. This isn't the first time I've lost drives on myth dedicated
systems in the past, but it's the first time two seem to be showing
problems at once.

The only other drive I've ever lost that was still under warranty was
an IBM DeathStar.

Here's the SMART data for the drive you're thinking has file system corruption:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       601
  3 Spin_Up_Time            0x0003   180   180   021    Pre-fail  Always       -       7991
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       76
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   048   048   000    Old_age   Always       -       38625
 10 Spin_Retry_Count        0x0012   100   253   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       74
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       101
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       649774
194 Temperature_Celsius     0x0022   119   099   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       19

I'm tempted to try the ddrescue command, but the SMART data isn't
showing any bad sectors.


max at mjhodgson

May 1, 2012, 6:53 AM

Post #5 of 9
Re: Hard Drive Problems

<snip>

> I've checked the SMART data infrequently so I don't have a real point
> of reference for when it could have started showing signs of a
> problem. This isn't the first time I've lost drives on myth dedicated
> systems in the past, but it's the first time two seem to be showing
> problems at once.
>

Have you ruled out dodgy PSU and dodgy hard disk controller?

I had hd errors late last year on multiple drives and it turned out to
be the PCI SATA card causing them.


drescherjm at gmail

May 1, 2012, 6:54 AM

Post #6 of 9
Re: Hard Drive Problems

> Thanks for the response John.
>
> I've checked the SMART data infrequently so I don't have a real point
> of reference for when it could have started showing signs of a
> problem. This isn't the first time I've lost drives on myth dedicated
> systems in the past, but it's the first time two seem to be showing
> problems at once.
>
> The only other drive I've ever lost that was still under warranty was
> an IBM DeathStar.
>

At work, where I have 200 to 300 drives, I have shipped back 10 to 20
annually for the last 3 to 4 years. Most of these are by a single
manufacturer.

>
> Here's the SMART data for the drive you're thinking has file system corruption:
>
> <snip SMART attribute table>
>
> I'm tempted to try the ddrescue command, but the SMART data isn't
> showing any bad sectors.

I agree the SMART looks fine on this one. I would try what you said.

--
John M. Drescher


dbrieck at gmail

May 1, 2012, 7:07 AM

Post #7 of 9
Re: Hard Drive Problems

On Tue, May 1, 2012 at 9:53 AM, Max Hodgson <max [at] mjhodgson> wrote:
> <snip>
>
>> I've checked the SMART data infrequently so I don't have a real point
>> of reference for when it could have started showing signs of a
>> problem. This isn't the first time I've lost drives on myth dedicated
>> systems in the past, but it's the first time two seem to be showing
>> problems at once.
>>
>
> Have you ruled out dodgy PSU and dodgy hard disk controller?
>
> I had hd errors late last year on multiple drives and it turned out to
> be the PCI SATA card causing them.

There's a 3rd drive in the system that's about 4 yrs old that's not
showing any problems at all, which is why I haven't really looked at
PSU problems.

This system is running an Antec earthwatts EA430 430W Continuous Power
ATX12V v2.0 80 PLUS Certified Active PFC Power Supply.

As far as the SATA ports are concerned, the system has an Intel
BOXDG31PR LGA 775 Intel G31 Micro ATX Intel Motherboard and I'm using
the onboard SATA ports.

I try to use pretty good stuff in these Myth systems because running
24/7 is really hard on cheap parts.


dbrieck at gmail

May 1, 2012, 7:18 AM

Post #8 of 9
Re: Hard Drive Problems

On Tue, May 1, 2012 at 9:54 AM, John Drescher <drescherjm [at] gmail> wrote:
>> Thanks for the response John.
>>
>> I've checked the SMART data infrequently so I don't have a real point
>> of reference for when it could have started showing signs of a
>> problem. This isn't the first time I've lost drives on myth dedicated
>> systems in the past, but it's the first time two seem to be showing
>> problems at once.
>>
>> The only other drive I've ever lost that was still under warranty was
>> an IBM DeathStar.
>>
>
> At work, where I have 200 to 300 drives, I have shipped back 10 to 20
> annually for the last 3 to 4 years. Most of these are by a single
> manufacturer.
>
>>
>> Here's the SMART data for the drive you're thinking has file system corruption:
>>
>> <snip SMART attribute table>
>>
>> I'm tempted to try the ddrescue command, but the SMART data isn't
>> showing any bad sectors.
>
> I agree the SMART looks fine on this one. I would try what you said.
>
> --
> John M. Drescher

How does this sound... The drive with the questionable SMART data also
happened to be the main OS drive. I'm thinking maybe with that drive
going bad it caused some FS corruption on the other drive as it was
writing data from NFS coming from my slave system?

Seems pretty out there, but that's the best I can come up with to
explain what I'm seeing.


linux at thehobsons

May 1, 2012, 8:52 AM

Post #9 of 9
Re: Hard Drive Problems

Max Hodgson wrote:

>Have you ruled out dodgy PSU and dodgy hard disk controller?

I'd be thinking that as well - it is a bit of a coincidence for two
drives to "fail" at once. I'd stick the drives in another machine
(one at a time) and see how they look.

--
Simon Hobson
