Gossamer Forum

Errors with disk

Hi

My file server has been crashing a lot lately when trying to write new data to disk, so I decided to do a little investigative work. The machine is running Debian testing (sarge). It's got two disks in it:

disk 1: Operating system and all programs.
disk 2: Storage space for files.

Thankfully, disk 1 is where I'm having problems. Before throwing it out and replacing it with a new drive and a reinstall of the OS, however, I'm trying to figure out what's going wrong.

I downloaded 'smartmontools' and ran 'smartctl' on the two disks. As I suspected, the second disk came out without errors, but the first gave me a few errors along these lines:

Code:
Error 203 occurred at disk power-on lifetime: 936 hours
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER:40 SC:01 SN:89 CL:e0 CH:09 D/H:e0 ST:51
Sequence of commands leading to the command that caused the error were:
DCR FR SC SN CL CH D/H CR Timestamp
00 00 01 89 e0 09 e0 c4 329.000

Anyone know what these errors mean? There are no explanations on the smartmontools web pages.
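
For anyone reading along, the register dump above can be decoded by hand. Here is a rough sketch in Python, assuming the standard ATA task-file bit layout and 28-bit LBA addressing (an assumption based on the drive's era, not something the log states). The function and table names are made up for illustration. The CR value c4 is READ MULTIPLE in the ATA command set, which suggests the failing command was a read rather than a write.

```python
# Bit names follow the ATA task-file register layout.
# Assumption: 28-bit LBA addressing, as on older IDE drives.
ERROR_BITS = {0x80: "ICRC/BBK", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "AMNF"}
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}


def flags(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True)
            if value & bit]


def decode_registers(line):
    """Decode a smartctl register line such as 'ER:40 SC:01 ... ST:51'."""
    regs = {key: int(val, 16)
            for key, val in (field.split(":") for field in line.split())}
    # LBA28: low nibble of D/H holds bits 27..24, then CH, CL, SN.
    lba = ((regs["D/H"] & 0x0F) << 24) | (regs["CH"] << 16) \
        | (regs["CL"] << 8) | regs["SN"]
    return {
        "error": flags(regs["ER"], ERROR_BITS),
        "status": flags(regs["ST"], STATUS_BITS),
        "lba_mode": bool(regs["D/H"] & 0x40),
        "lba": lba,
        "sector_count": regs["SC"],
    }


result = decode_registers("ER:40 SC:01 SN:89 CL:e0 CH:09 D/H:e0 ST:51")
print(result)
```

Under those assumptions the error register decodes to UNC (uncorrectable data error) and the status register to DRDY, DSC and ERR, with the fault at LBA 647305.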

I take it the disk is dying in some way, as it's having problems writing data to bad sectors.

I'm just wondering: is there a way to save this disk by barring data from being written to those sectors, or should I just cash in my chips and replace the disk now while I have my data intact?

Cheers

- wil
Re: [Wil] Errors with disk In reply to
Running a quick 'health check' on the disk returns this error:

Code:
memphis# smartctl -H /dev/hda2
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 001 032 Pre-fail Always In_the_past 12430

memphis#
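
A note on why the health check says PASSED while still flagging the attribute: smartmontools marks an attribute FAILING_NOW when its normalized VALUE is at or below THRESH, and In_the_past when only WORST is. The sketch below mirrors that rule in Python. The function name is made up, and treating a threshold of 0 as "can never fail" is my assumption about how smartctl handles that case.

```python
def when_failed(value, worst, thresh):
    """Mimic smartctl's WHEN_FAILED column from the normalized values."""
    if thresh == 0:
        # Assumption: a zero threshold means the attribute never fails.
        return "-"
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"


# Raw_Read_Error_Rate from the output above: VALUE 100, WORST 001, THRESH 032
print(when_failed(100, 1, 32))
```

So the current value is fine, but at some point the drive recorded a value low enough to cross the threshold.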

- wil
Re: [Wil] Errors with disk In reply to
Hi,

Try `smartctl -a /dev/hda` which should give you something like:

Code:
Attribute Flag Value Worst Threshold Raw Value
( 1)Raw Read Error Rate 0x0029 100 253 020 0
( 3)Spin Up Time 0x0027 082 082 020 2304
( 4)Start Stop Count 0x0032 100 100 008 26
( 5)Reallocated Sector Ct 0x0033 100 100 020 0
( 7)Seek Error Rate 0x000b 100 093 023 0
( 9)Power On Hours 0x0012 086 086 001 9792
( 10)Spin Retry Count 0x0026 100 100 000 0
( 11)Calibration Retry Count 0x0013 100 253 020 0
( 12)Power Cycle Count 0x0032 100 100 008 25
( 13)Read Soft Error Rate 0x000b 100 100 023 0
(194)Temperature 0x0022 088 081 042 32
(195)Hardware ECC Recovered 0x001a 014 001 000 368535908
(196)Reallocated Event Count 0x0010 100 253 020 0
(197)Current Pending Sector 0x0032 100 100 020 0
(198)Offline Uncorrectable 0x0010 100 253 000 0
(199)UDMA CRC Error Count 0x001a 200 200 000 0
SMART Error Log:

Our experience has been to focus mainly on reallocated sectors. If that number starts increasing, it's a good sign that the drive is going.
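
One way to watch for that is to save the attribute table from time to time and compare the raw Reallocated_Sector_Ct values. This is only a sketch: it assumes the ten-column table layout that newer smartctl versions print (ID#, ATTRIBUTE_NAME, FLAG, ..., RAW_VALUE), the helper names are made up, and the numbers in the two sample rows are invented for illustration.

```python
def raw_values(smartctl_output):
    """Map attribute name -> raw value from a smartctl attribute table."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have a 0x.... flag;
        # the raw value is the tenth column.
        if (len(fields) >= 10 and fields[0].isdigit()
                and fields[2].startswith("0x") and fields[9].isdigit()):
            values[fields[1]] = int(fields[9])
    return values


def reallocated_growth(before, after):
    """Change in raw Reallocated_Sector_Ct between two saved outputs."""
    name = "Reallocated_Sector_Ct"
    return raw_values(after).get(name, 0) - raw_values(before).get(name, 0)


# Two made-up snapshots of the same drive, taken some time apart.
EARLIER = "5 Reallocated_Sector_Ct 0x0033 092 092 020 Pre-fail Always - 44"
LATER = "5 Reallocated_Sector_Ct 0x0033 090 090 020 Pre-fail Always - 52"

growth = reallocated_growth(EARLIER, LATER)
print("reallocated sectors grew by", growth)
```

A count that stays flat is less worrying than one that keeps climbing between checks.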

Cheers,

Alex
--
Gossamer Threads Inc.
Re: [Alex] Errors with disk In reply to
Hi Alex

Thanks for the reply. Here's the output on the two disks.

Disk 1 (suspected of being faulty):

Code:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 001 032 Pre-fail Always In_the_past 208954
2 Throughput_Performance 0x0005 100 100 020 Pre-fail Offline - 0
3 Spin_Up_Time 0x0007 096 093 025 Pre-fail Always - 1
4 Start_Stop_Count 0x0012 098 098 016 Old_age Always - 885
5 Reallocated_Sector_Ct 0x0033 100 100 024 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 020 Pre-fail Always - 397
8 Seek_Time_Performance 0x0005 100 100 019 Pre-fail Offline - 0
9 Power_On_Hours 0x0012 038 038 020 Old_age Always - 33619052
10 Spin_Retry_Count 0x0013 100 100 020 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 095 095 020 Old_age Always - 793
196 Reallocated_Event_Count 0x0033 100 100 024 Pre-fail Always - 0
197 Current_Pending_Sector 0x0010 100 100 020 Old_age Offline - 0
198 Offline_Uncorrectable 0x0010 100 100 020 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x000b 100 092 020 Pre-fail Always - 1453

Disk 2 (suspected of being all OK):

Code:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0029 100 253 020 Pre-fail Offline - 0
3 Spin_Up_Time 0x0027 065 065 020 Pre-fail Always - 4379
4 Start_Stop_Count 0x0032 100 100 008 Old_age Always - 41
5 Reallocated_Sector_Ct 0x0033 092 092 020 Pre-fail Always - 44
7 Seek_Error_Rate 0x000b 100 100 023 Pre-fail Always - 0
9 Power_On_Hours 0x0012 088 088 001 Old_age Always - 8129
10 Spin_Retry_Count 0x0026 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0013 100 100 020 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 008 Old_age Always - 41
13 Read_Soft_Error_Rate 0x000b 100 100 023 Pre-fail Always - 0
194 Temperature_Celsius 0x0022 087 083 042 Old_age Always - 34
195 Hardware_ECC_Recovered 0x001a 100 010 000 Old_age Always - 437039
196 Reallocated_Event_Count 0x0010 100 253 020 Old_age Offline - 0
197 Current_Pending_Sector 0x0032 100 100 020 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x001a 200 200 000 Old_age Always - 0

- wil
Re: [Wil] Errors with disk In reply to
Oh boy, here's the output from our firewall here:

Code:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 032 Pre-fail Always - 180054
2 Throughput_Performance 0x0005 100 100 020 Pre-fail Offline - 0
3 Spin_Up_Time 0x0007 094 094 025 Pre-fail Always - 2
4 Start_Stop_Count 0x0012 100 100 016 Old_age Always - 104
5 Reallocated_Sector_Ct 0x0033 100 100 024 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 020 Pre-fail Always - 3788
8 Seek_Time_Performance 0x0005 100 100 019 Pre-fail Offline - 0
9 Power_On_Hours 0x0012 001 001 020 Old_age Always FAILING_NOW 81347865
10 Spin_Retry_Count 0x0013 100 100 020 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 104
196 Reallocated_Event_Count 0x0033 100 100 024 Pre-fail Always - 0
197 Current_Pending_Sector 0x0010 100 100 020 Old_age Offline - 0
198 Offline_Uncorrectable 0x0010 100 100 020 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 197 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x000b 100 098 020 Pre-fail Always - 44

- wil
Re: [Wil] Errors with disk In reply to
Remember that you can't always rely on the data that the drive returns, as there's no standard for what's what (so some of the values might be waaaay off). If you can afford some downtime, download the drive manufacturer's disk checking tool and run it on the drive.

Adrian

Last edited by: brewt, Sep 10, 2003, 12:41 PM
Re: [brewt] Errors with disk In reply to
Well, the disk died sometime last night and I don't think I can save it. To be honest, I don't want to save it, either. Once a disk is faulty then it's best to throw it out and get a new one, in my experience.

So I'm now re-installing Debian onto a new disk so I can then re-attach the data drive to get access back to the file server. Fun fun fun.

- wil
Re: [Wil] Errors with disk In reply to
Well. Three brand spanking new disks later, it became pretty obvious that the disk wasn't the source of the problem. Kernel panics kept occurring. I've got no idea if it's the video card, bad memory or what, so I took the disks out, put them into a new machine and rebuilt Debian, again. All works so far, with no crashes, although I've now only got 46MB of RAM to play with, and this is our test web and database server. Ouch.

- wil
Re: [Wil] Errors with disk In reply to
That's gotta suck :(

Andy (mod)
andy@ultranerds.co.uk