Hi
My file server has been crashing a lot lately when trying to write new data to disk, so I decided to do a little investigative work. The machine is running Debian testing (sarge). It's got two disks in it:
disk 1: Operating system and all programs.
disk 2: Storage space for files.
Thankfully, disk 1 is where I'm having problems, so the stored files aren't on the failing drive. Before I throw it out, replace it with a new drive, and reinstall the OS, though, I'd like to figure out what's going wrong.
I downloaded 'smartmontools' and ran 'smartctl' on both disks. As I suspected, the second disk came out without errors, but the first gave me a few errors along these lines:
Code:
Error 203 occurred at disk power-on lifetime: 936 hours
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER:40 SC:01 SN:89 CL:e0 CH:09 D/H:e0 ST:51
Sequence of commands leading to the command that caused the error were:
DCR  FR  SC  SN  CL  CH  D/H  CR  Timestamp
00   00  01  89  e0  09  e0   c4  329.000
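For what it's worth, here is my own rough attempt at decoding the register line. It is only a sketch: it assumes the standard ATA task-file layout (ER is the error register, ST the status register, and in LBA mode the 28-bit sector address is spread across SN, CL, CH and the low four bits of D/H). The bit names are taken from the ATA spec as I understand it, and the function names are just ones I made up, so please correct me if I've got any of it wrong.

```python
# Bit names for the ATA error and status registers (assumed from the ATA spec;
# older drives may label bit 7 of the error register BBK instead of ICRC).
ERROR_BITS = {7: "ICRC", 6: "UNC", 5: "MC", 4: "IDNF",
              3: "MCR", 2: "ABRT", 1: "NM", 0: "AMNF"}
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 4: "DSC",
               3: "DRQ", 2: "CORR", 1: "IDX", 0: "ERR"}


def flags(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [names[bit] for bit in range(7, -1, -1) if value & (1 << bit)]


def decode_lba(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from the task-file registers.

    Only meaningful when bit 6 of D/H is set (LBA addressing).
    """
    if not dh & 0x40:
        raise ValueError("D/H bit 6 is clear: registers hold a CHS address")
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Values copied from the smartctl error log above.
er, sn, cl, ch, dh, st = 0x40, 0x89, 0xE0, 0x09, 0xE0, 0x51

print("error bits: ", flags(er, ERROR_BITS))
print("status bits:", flags(st, STATUS_BITS))
print("LBA:        ", decode_lba(sn, cl, ch, dh))
```

If those assumptions hold, ER:40 would be the UNC (uncorrectable data) bit, and the failing sector would be the LBA the last line prints.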
Does anyone know what these errors mean? There are no explanations on the smartmontools web pages.
I take it the disk is dying in some way, as it's having problems writing data to bad sectors.
I'm just wondering: is there a way to save this disk by barring data from being written to those sectors, or should I just cash in my chips and replace the disk now while I have my data intact?
Cheers
- wil