
Mailing List Archive: Linux: Kernel

sata_nv issues with MCP51 SATA controller



jonry at pvv

Sep 13, 2007, 12:46 AM

Post #1 of 25 (938 views)
Permalink
sata_nv issues with MCP51 SATA controller

Hi, I'm resending (didn't see my first attempt appear on the maillist):



I'm having serious disk-issues when using the on-board nvidia controller
for my HDDs (My motherboard is a Gigabyte GA-N650SLI-DS4 with nvidia
chipset, cpu is intel Core2Quad)

excerpt from "lspci":
00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
(rev a1)
00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
(rev a1)

I have a normal IDE/P-ATA-disk attached to the "IDE"-controller and that
works fine (/dev/hda)

However, any number of disks (I have tried 2 and 4) connected to the
SATA-controller(s), will eventually fail. - See attached log (excerpt /
anything relevant from /var/log/messages)

At first, disks were REALLY unstable, but then I disabled S.M.A.R.T.
(both in BIOS and Linux), and I updated from the CentOS5 (equivalent of
RHEL5) kernel (2.6.18) to the latest (at that time) official kernel from
kernel.org:

> uname -a
Linux mirakel 2.6.22.5-custom_jir #2 SMP Thu Aug 30 22:06:21 CEST 2007
i686 i686 i386 GNU/Linux

Now it will normally take a day or two before SATA crashes, so things
are better, but still rather useless.

The first error when sata_nv runs into problems is always:
"exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"
(as shown in the attached log-file.) - when this happens to one device,
it'll almost instantly happen to the other disk attached to that
controller as well. A couple of minutes (or so) later, the disk(s)
connected to the other controller will start acting up as well (in the
same manner). - I/O freezes, and nothing helps except a reboot...
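
To see which port fails first, the log can be reduced to the first
"exception Emask" line for each ataN port. The following is only a minimal
sketch: the function name first_exceptions is made up for illustration, and
the default path /var/log/messages is assumed from the report above.

```shell
#!/bin/sh
# Sketch: print the first "exception Emask" line logged for each ataN port.
# first_exceptions is a hypothetical helper name; the optional first argument
# is the log file to scan (default assumed to be /var/log/messages).
first_exceptions() {
    log_file="${1:-/var/log/messages}"
    grep 'exception Emask' "$log_file" | awk '
        match($0, /ata[0-9]+/) {
            port = substr($0, RSTART, RLENGTH)
            # Keep only the first line seen for each port.
            if (!(port in seen)) { seen[port] = 1; print }
        }'
}

# Only scan the default log if it is actually readable on this machine.
if [ -r /var/log/messages ]; then
    first_exceptions
fi
```

The order of the printed lines follows the log, so the port that started
acting up first comes out on top.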

As I run a rather large (software / md) RAID-5 disk array on this server
(I'm doing a bit of video editing), every crash means a time-consuming
rebuild of the disk-array...

I have given up on the sata_nv / nvidia-controllers for the time being.
I now resort to some old PCI-connected sata-controllers which work fine
(but slow, as they are outdated and "overloaded").

So, if anyone has a good solution / suggestion / improved driver (over
the one supplied with the official 2.6.22.5-kernel) I am eager to give
it a go and see if the situation can be resolved.

I appreciate any sensible suggestions.

BR
Jon Ivar
Attachments: sata_nv-error.log (16.7 KB)


jonry at pvv

Sep 13, 2007, 12:18 AM

Post #2 of 25 (924 views)
Permalink
sata_nv issues with MCP51 SATA controller [In reply to]

Hi, I was told to forward my error report to this address.
I am keen to test again if someone has a good suggestion / updated
driver etc... (Give me a couple of days in that case...)


-----
Hi,

I'm having serious disk-issues when using the on-board nvidia controller
for my HDDs (My motherboard is a Gigabyte GA-N650SLI-DS4 with nvidia
chipset, cpu is intel Core2Quad)

excerpt from "lspci":
00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
(rev a1)
00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
(rev a1)

I have a normal IDE/P-ATA-disk attached to the "IDE"-controller and that
works fine (/dev/hda)

However, any number of disks (I have tried 2 and 4) connected to the
SATA-controller(s), will eventually fail. - See attached log (excerpt /
anything relevant from /var/log/messages)

At first, disks were REALLY unstable, but then I disabled S.M.A.R.T.
(both in BIOS and Linux), and I updated from the CentOS5 (equivalent of
RHEL5) kernel (2.6.18) to the latest (at that time) official kernel from
kernel.org:

> uname -a
Linux mirakel 2.6.22.5-custom_jir #2 SMP Thu Aug 30 22:06:21 CEST 2007
i686 i686 i386 GNU/Linux

Now it will normally take a day or two before SATA crashes, so things
are better, but still rather useless.

The first error when sata_nv runs into problems is always:
"exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"
(as shown in the attached log-file.) - when this happens to one device,
it'll almost instantly happen to the other disk attached to that
controller as well. A couple of minutes (or so) later, the disk(s)
connected to the other controller will start acting up as well (in the
same manner). - I/O freezes, and nothing helps except a reboot...

As I run a rather large (software / md) RAID-5 disk array on this server
(I'm doing a bit of video editing), every crash means a time-consuming
rebuild of the disk-array...

I have given up on the sata_nv / nvidia-controllers for the time being.
I now resort to some old PCI-connected sata-controllers which work fine
(but slow, as they are outdated and "overloaded").

So, if anyone has a good solution / suggestion / improved driver (over
the one supplied with the official 2.6.22.5-kernel) I am eager to give
it a go and see if the situation can be resolved.

I appreciate any sensible suggestions.

BR
Jon Ivar

-----
Attachments: sata_nv-error.log (16.7 KB)


htejun at gmail

Sep 13, 2007, 2:16 AM

Post #3 of 25 (921 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Jon Ivar Rykkelid wrote:
> I'm having serious disk-issues when using the on-board nvidia controller
> for my HDDs (My motherboard is a Gigabyte GA-N650SLI-DS4 with nvidia
> chipset, cpu is intel Core2Quad)
>
> excerpt from "lspci":
> 00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
> 00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
> (rev a1)
> 00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
> (rev a1)
>
> I have a normal IDE/P-ATA-disk attached to the "IDE"-controller and that
> works fine (/dev/hda)
>
> However, any number of disks (I have tried 2 and 4) connected to the
> SATA-controller(s), will eventually fail. - See attached log (excerpt /
> anything relevant from /var/log/messages)
>
> At first, disks were REALLY unstable, but then I disabled S.M.A.R.T.
> (both in BIOS and Linux), and I updated from the CentOS5 (equivalent of
> RHEL5) kernel (2.6.18) to the latest (at that time) official kernel from
> kernel.org:
>
>> uname -a
> Linux mirakel 2.6.22.5-custom_jir #2 SMP Thu Aug 30 22:06:21 CEST 2007
> i686 i686 i386 GNU/Linux
>
> Now it will normally take a day or two before SATA crashes, so things
> are better, but still rather useless.
>
> First error when sata_nv get into problems is always:
> "exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"
> (as shown in the attached log-file.) - when this happens to one device,
> it'll almost instantly happen to the other disk attached to that
> controller as well. A couple of minutes (or so) later, the disk(s)
> connected to the other controller will start acting up as well (in the
> same manner). - I/O freezes, and nothing helps except a reboot...
>
> As I run a rather large (software / md) RAID-5 disk array on this server
> (I'm doing a bit of video editing), every crash means a time-consuming
> rebuild of the disk-array...
>
> I have given up on the sata_nv / nvidia-controllers for the time being.
> I now resort to some old PCI-connected sata-controllers which work fine
> (but slow, as they are outdated and "overloaded").
>
> So, if anyone has a good solution / suggestion / improved driver (over
> the one supplied with the official 2.6.22.5-kernel) I am eager to give
> it a go and see if the situation can be resolved.
>
> I appreciate any sensible suggestions.

Wheeee... the whole controller seems to have gone down at once and it's
not even an IRQ routing problem - resets are failing. This is the first
time I've seen something like this. Sorry, but I don't have any idea
what's going on. cc'ing Robert. Any ideas?

--
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jeff at garzik

Sep 13, 2007, 7:20 AM

Post #4 of 25 (919 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Jon Ivar Rykkelid wrote:
>
> Hi, I'm resending (didn't see my first attempt appear on the maillist):
>
>
>
> I'm having serious disk-issues when using the on-board nvidia controller
> for my HDDs (My motherboard is a Gigabyte GA-N650SLI-DS4 with nvidia
> chipset, cpu is intel Core2Quad)
>
> excerpt from "lspci":
> 00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
> 00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
> (rev a1)
> 00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
> (rev a1)
>
> I have a normal IDE/P-ATA-disk attached to the "IDE"-controller and that
> works fine (/dev/hda)
>
> However, any number of disks (I have tried 2 and 4) connected to the
> SATA-controller(s), will eventually fail. - See attached log (excerpt /
> anything relevant from /var/log/messages)
>
> At first, disks were REALLY unstable, but then I disabled S.M.A.R.T.
> (both in BIOS and Linux), and I updated from the CentOS5 (equivalent of
> RHEL5) kernel (2.6.18) to the latest (at that time) official kernel from
> kernel.org:
>
> > uname -a
> Linux mirakel 2.6.22.5-custom_jir #2 SMP Thu Aug 30 22:06:21 CEST 2007
> i686 i686 i386 GNU/Linux
>
> Now it will normally take a day or two before SATA crashes, so things
> are better, but still rather useless.
>
> First error when sata_nv get into problems is always:
> "exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"
> (as shown in the attached log-file.) - when this happens to one device,
> it'll almost instantly happen to the other disk attached to that
> controller as well. A couple of minutes (or so) later, the disk(s)
> connected to the other controller will start acting up as well (in the
> same manner). - I/O freezes, and nothing helps except a reboot...
>
> As I run a rather large (software / md) RAID-5 disk array on this server
> (I'm doing a bit of video editing), every crash means a time-consuming
> rebuild of the disk-array...
>
> I have given up on the sata_nv / nvidia-controllers for the time being.
> I now resort to some old PCI-connected sata-controllers which work fine
> (but slow, as they are outdated and "overloaded").
>
> So, if anyone has a good solution / suggestion / improved driver (over
> the one supplied with the official 2.6.22.5-kernel) I am eager to give
> it a go and see if the situation can be resolved.

Does the adma=0 module option do anything?

Jeff





jonry at pvv

Sep 13, 2007, 8:05 AM

Post #5 of 25 (918 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Jeff Garzik wrote:
> Jon Ivar Rykkelid wrote:
>>
>> Hi, I'm resending (didn't see my first attempt appear on the maillist):
>>
>>
>>
>> I'm having serious disk-issues when using the on-board nvidia controller
>> for my HDDs (My motherboard is a Gigabyte GA-N650SLI-DS4 with nvidia
>> chipset, cpu is intel Core2Quad)
>>
>> excerpt from "lspci":
>> 00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
>> 00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
>> (rev a1)
>> 00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
>> (rev a1)
>>
>> I have a normal IDE/P-ATA-disk attached to the "IDE"-controller and that
>> works fine (/dev/hda)
>>
>> However, any number of disks (I have tried 2 and 4) connected to the
>> SATA-controller(s), will eventually fail. - See attached log (excerpt /
>> anything relevant from /var/log/messages)
>>
>> At first, disks were REALLY unstable, but then I disabled S.M.A.R.T.
>> (both in BIOS and Linux), and I updated from the CentOS5 (equivalent of
>> RHEL5) kernel (2.6.18) to the latest (at that time) official kernel from
>> kernel.org:
>>
>> > uname -a
>> Linux mirakel 2.6.22.5-custom_jir #2 SMP Thu Aug 30 22:06:21 CEST 2007
>> i686 i686 i386 GNU/Linux
>>
>> Now it will normally take a day or two before SATA crashes, so things
>> are better, but still rather useless.
>>
>> First error when sata_nv get into problems is always:
>> "exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"
>> (as shown in the attached log-file.) - when this happens to one device,
>> it'll almost instantly happen to the other disk attached to that
>> controller as well. A couple of minutes (or so) later, the disk(s)
>> connected to the other controller will start acting up as well (in the
>> same manner). - I/O freezes, and nothing helps except a reboot...
>>
>> As I run a rather large (software / md) RAID-5 disk array on this server
>> (I'm doing a bit of video editing), every crash means a time-consuming
>> rebuild of the disk-array...
>>
>> I have given up on the sata_nv / nvidia-controllers for the time being.
>> I now resort to some old PCI-connected sata-controllers which work fine
>> (but slow, as they are outdated and "overloaded").
>>
>> So, if anyone has a good solution / suggestion / improved driver (over
>> the one supplied with the official 2.6.22.5-kernel) I am eager to give
>> it a go and see if the situation can be resolved.
>
> does adma=0 module option do anything?
>
> Jeff
Thanks for the suggestion, but sata_nv is not built modular in my
current kernel, so "no can do" at the moment
(However, if some expert REALLY thinks this will fix things, I will
CERTAINLY recompile and give it a go)

As I said before, it all works for some time (a day or two) before it
crashes with the current kernel & no "S.M.A.R.T.". With my current setup
I have always had the time to fully rebuild my disk-array before a new
crash. - In the case of 4 disks attached to the nvidia controllers
(disregarding the disks on other controllers), this means that the
sata_nv-driver / controllers alone have read at least 750GB and written
250GB of data before the crash (with no resets working) - soft reboot
fixes everything. - I'm pretty confident that this is a driver issue.
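
The 750GB / 250GB figures are consistent with a RAID-5 rebuild in which the
three surviving members are read in full and the rebuilt member is written
in full. A minimal sketch of that arithmetic follows; the 250 GB member size
is an assumption inferred from the numbers above (it is not stated in the
thread), and rebuild_io is a made-up helper name.

```shell
#!/bin/sh
# Sketch: data moved during one RAID-5 rebuild, counting only the members
# attached to the controllers in question.
# Assumptions: all members are the same size, and exactly one of them is
# being rebuilt (written) while the others are read in full.
rebuild_io() {
    disks=$1      # number of array members on these controllers
    size_gb=$2    # size of each member in GB (assumed, not from the thread)
    echo "read=$(( (disks - 1) * size_gb ))GB written=${size_gb}GB"
}

rebuild_io 4 250
```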

As Tejun Heo <htejun[at]gmail.com> writes "the whole controller seems to
have went down at once and it's not even IRQ routing problem - resets
are failing."

The error-messages / crash-symptoms were the same with SMART enabled and
the original CentOS5-kernel, except that with that setup, the crashes
were much more frequent.

Any help?

BR
Jon Ivar



htejun at gmail

Sep 13, 2007, 8:14 AM

Post #6 of 25 (919 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Jon Ivar Rykkelid wrote:
> Thanks for the suggestion, but sata_nv is not built modular in my
> current kernel, so "no can do" at the moment
> (However, if some expert REALLY thinks this will fix things, I will
> CERTAINLY recompile and give it a go)

Passing "sata_nv.adma=0" as kernel boot parameter will do the trick.
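
After rebooting, whether the parameter actually reached the kernel can be
checked against /proc/cmdline. This is only a sketch: has_boot_param is a
hypothetical helper name, and the check says nothing about whether the
driver honoured the option, only that it was on the command line.

```shell
#!/bin/sh
# Sketch: test whether an exact word (e.g. sata_nv.adma=0) appears on the
# kernel command line. The optional second argument overrides the file that
# is read, which defaults to /proc/cmdline.
has_boot_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    [ -r "$cmdline_file" ] || return 1
    # Split the command line into one word per line and match whole words.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

if has_boot_param sata_nv.adma=0; then
    echo "sata_nv.adma=0 is on the kernel command line"
else
    echo "sata_nv.adma=0 is NOT on the kernel command line"
fi
```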

--
tejun


jonry at pvv

Sep 13, 2007, 11:01 AM

Post #7 of 25 (916 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Resending, as my first attempt contained HTML and was blocked...

Tejun Heo wrote:
> Jon Ivar Rykkelid wrote:
>
>> Thanks for the suggestion, but sata_nv is not built modular in my
>> current kernel, so "no can do" at the moment
>> (However, if some expert REALLY thinks this will fix things, I will
>> CERTAINLY recompile and give it a go)
>>
>
> Passing "sata_nv.adma=0" as kernel boot parameter will do the trick.
>
>
Ahh, silly me... Of course!
Ooops, I just got back, and verified: I actually have sata_nv running as
a module after all on this server... My bad.
I fixed /etc/modprobe.conf to include the following two lines:
"
alias scsi_hostadapter sata_nv
options sata_nv adma=0
...
"

I then ran "mkinitrd" (to ensure that the latest options from
modprobe.conf were included in the initrd-image that I load at boot).

- Do you guys think this is worth a try? Anyway, I have rebooted now, so
I'll test it for a few days and let you know - We'll just have to wait
and see...
Do you think I should re-enable SMART to provoke a failure, or would
that be tempting fate too much? (For now I have not re-enabled SMART)

PS: Is there any way of testing / verifying that sata_nv is now running
with this option? - I am pretty sure I have done it correctly, but I
would still like to confirm that the proper option has been passed if
possible.
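
One possible check, as a minimal sketch: a loaded module's parameters are
normally visible under /sys/module/<name>/parameters/ when the driver
exports them. Whether sata_nv exports adma this way depends on how the
driver declares the parameter in the kernel version in use, so the path
below is an assumption; show_adma is a made-up helper name.

```shell
#!/bin/sh
# Sketch: report the adma setting the loaded sata_nv module is using.
# Assumption: the driver exports the parameter read-only through sysfs.
# The optional first argument overrides the file that is read.
show_adma() {
    param_file="${1:-/sys/module/sata_nv/parameters/adma}"
    if [ -r "$param_file" ]; then
        echo "adma=$(cat "$param_file")"
    else
        echo "adma parameter not visible (module not loaded, or not exported)"
    fi
}

show_adma
```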

Thanks
Jon Ivar



jonry at pvv

Sep 13, 2007, 12:26 PM

Post #8 of 25 (916 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Hi,

I now tested with the adma=0 option, but if anything I got a crash
quicker than before. Same error message started coming in, but this time
the system hung before I was able to capture the log as well (but I saw
the error, and it was the same as before, except that this time it was
the ata3-channel that first started acting up..) - To remind you all
what this is about, I have reattached the log that I originally captured...

Any help / clever suggestions are appreciated.

Jon Ivar Rykkelid wrote:
> I fixed /etc/modprobe.conf to include the following two lines:
> "
> alias scsi_hostadapter sata_nv
> options sata_nv adma=0
> ...
> "

Jon Ivar
Attachments: sata_nv-error.log (16.7 KB)


jeff at garzik

Sep 13, 2007, 12:54 PM

Post #9 of 25 (914 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Jon Ivar Rykkelid wrote:
> Hi,
>
> I now tested with the adma=0 option, but if anything I got a crash
> quicker than before. Same error message started coming in, but this time
> the system hung before I was able to capture the log as well (but I saw
> the error, and it was the same as before, except that this time it was
> the ata3-channel that first started acting up..) - To remind you all
> what this is about, I have reattached the log that I originally captured...

Sounds like a hardware problem, since disabling ADMA is generally the
cure-all we use -- it appears to stress the hardware less.

Jeff





jonry at pvv

Sep 13, 2007, 2:15 PM

Post #10 of 25 (913 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Is this the general opinion? - Should I try to get a replacement
motherboard of the same type?

If so, can anyone confirm that the sata_nv-driver is working with the
Gigabyte GA-N650SLI-DS4 motherboard at all / has anyone been successful
with this MB? How about the MCP51 SATA controller? - Can anyone confirm
that the driver is working for this HW? I would feel awkward claiming a
warranty replacement if it turns out that the HW is OK after all, and
the problem is with the linux-driver...

BR
Jon Ivar

Jeff Garzik wrote:
> Jon Ivar Rykkelid wrote:
>> Hi,
>>
>> I now tested with the adma=0 option, but if anything I got a crash
>> quicker than before. Same error message started coming in, but this
>> time the system hung before I was able to capture the log as well
>> (but I saw the error, and it was the same as before, except that this
>> time it was the ata3-channel that first started acting up..) - To
>> remind you all what this is about, I have reattached the log that I
>> originally captured...
>
> Sounds like a hardware problem, since disabling ADMA is generally the
> cure-all we use -- it appears to stress the hardware less.
>
> Jeff
>
>
>


--
Jon Ivar Rykkelid Web: http://www.pvv.org/~jonry
Enromvegen 191 Phone: +47 72 56 86 86
N-7026 Trondheim Mob.: +47 906 20 250
Norway Email: jonry[at]pvv.org



hancockr at shaw

Sep 13, 2007, 5:37 PM

Post #11 of 25 (917 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Jeff Garzik wrote:
> Jon Ivar Rykkelid wrote:
>> Hi,
>>
>> I now tested with the adma=0 option, but if anything I got a crash
>> quicker than before. Same error message started coming in, but this
>> time the system hung before I was able to capture the log as well (but
>> I saw the error, and it was the same as before, except that this time
>> it was the ata3-channel that first started acting up..) - To remind
>> you all what this is about, I have reattached the log that I
>> originally captured...
>
> Sounds like a hardware problem, since disabling ADMA is generally the
> cure-all we use -- it appears to stress the hardware less.

If this is an MCP51 chipset, adma=0 will make no difference since that
chipset does not support ADMA in the first place.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr[at]nospamshaw.ca
Home Page: http://www.roberthancock.com/



jonry at pvv

Sep 14, 2007, 5:10 AM

Post #12 of 25 (910 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Hi,

To eliminate the possibility of this being a hardware issue, I have now
acquired another "Gigabyte GA-N650SLI-DS4" motherboard (with the "MCP51"
chipset) for testing. I'll swap parts this evening. Hopefully I'll be
able to tell you in a few hours whether this appears to be working as it
should. The motherboard that I'm going to swap to has actually been
tested (with MS Windows OS+driver) for more than a day with a disk
connected, so if this MB also fails, I think it will be safe to say that
the issue is with the sata_nv driver... So hang on.

(Can you think of anything else that could conflict with the sata_nv
driver after a bit of time, like two of my raid-disks being encrypted,
me running a SW raid-5 array / some special HW (quad-core CPU) / me
running vmware on this server ... ? - To me, all these suggestions seem
rather far-fetched, especially as all is working with another
controller, so I'm arguing that unless there's a HW issue, the issue is
with the driver, but you're the expert(s), so let me know if you differ.)

I'll keep you posted as to the result of swapping HW.. Give me a few
hours. :-)

BR
Jon Ivar

Robert Hancock wrote:
> Jeff Garzik wrote:
>> Jon Ivar Rykkelid wrote:
>>> Hi,
>>>
>>> I now tested with the adma=0 option, but if anything I got a crash
>>> quicker than before. Same error message started coming in, but this
>>> time the system hung before I was able to capture the log as well
>>> (but I saw the error, and it was the same as before, except that
>>> this time it was the ata3-channel that first started acting up..) -
>>> To remind you all what this is about, I have reattached the log that
>>> I originally captured...
>>
>> Sounds like a hardware problem, since disabling ADMA is generally the
>> cure-all we use -- it appears to stress the hardware less.
>
> If this is an MCP51 chipset, adma=0 will make no difference since that
> chipset does not support ADMA in the first place.
>


prakash at punnoor

Sep 14, 2007, 6:29 AM

Post #13 of 25 (910 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

On the day of Thursday 13 September 2007 Jon Ivar Rykkelid hast written:
> Resending, as my first attempts contained HTML and was blocked...
>
> Tejun Heo wrote:
> > Jon Ivar Rykkelid wrote:
> >> Thanks for the suggestion, but sata_nv is not built modular in my
> >> current kernel, so "no can do" at the moment
> >> (However, if some expert REALLY thinks this will fix things, I will
> >> CERTAINLY recompile and give it a go)
> >
> > Passing "sata_nv.adma=0" as kernel boot parameter will do the trick.
>
> Ahh, silly me... Of course!
> Ooops, I just got back, and verified: I actually have sata_nv running as
> a module after all on this server... My bad.
> I fixed /etc/modprobe.conf to include the following two lines:
> "
> alias scsi_hostadapter sata_nv
> options sata_nv adma=0
> ...
> "

I don't think it will matter, as adma doesn't affect MCP51, but only nforce4.
So I'd look for other trouble makers.
--
(°= =°)
//\ Prakash Punnoor /\\
V_/ \_V
Attachments: signature.asc (0.18 KB)


jonry at pvv

Sep 14, 2007, 7:17 AM

Post #14 of 25 (910 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Prakash Punnoor wrote:
> I don't think it will matter, as adma doesn't affect MCP51, but only nforce4.
> So I'd look for other trouble makers.
>
Robert told me. (And you're correct - It didn't help).

I'm going to test another (identical) motherboard this evening to
establish whether it could be a HW-issue.

I'll keep you posted

Jon Ivar



jeff at garzik

Sep 14, 2007, 7:25 AM

Post #15 of 25 (911 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Jon Ivar Rykkelid wrote:
> Prakash Punnoor wrote:
>> I don't think it will matter, as adma doesn't affect MCP51, but only
>> nforce4. So I'd look for other trouble makers.
>>
> Robert told me. (And you're correct - It didn't help).

Yes, it was already in slow-and-safe mode.


> I'm going to test another (identical) motherboard this evening to
> establish whether it could be a HW-issue.

Not just motherboard. It is more likely to be a cable, drive or PSU
problem.

Jeff




htejun at gmail

Sep 14, 2007, 7:39 AM

Post #16 of 25 (909 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Jeff Garzik wrote:
> Jon Ivar Rykkelid wrote:
>> Prakash Punnoor wrote:
>>> I don't think it will matter, as adma doesn't affect MCP51, but only
>>> nforce4. So I'd look for other trouble makers.
>>>
>> Robert told me. (And you're correct - It didn't help).
>
> Yes, it was already in slow-and-safe mode.
>
>
>> I'm going to test another (identical) motherboard this evening to
>> establish whether it could be a HW-issue.
>
> Not just motherboard. It is more likely to be a cable, drive or PSU
> problem.

I don't think it's cable as the problem occurs on multiple ports. My
bet is either the controller or PSU.

Thanks.

--
tejun


jeff at garzik

Sep 14, 2007, 8:58 AM

Post #17 of 25 (916 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Jon Ivar Rykkelid wrote:
> It is NOT the PSU, nor is it cables, as all the drives work well using
> the same cables + PSU (in the same box) if I connect them to my other
> two controllers (in that same box).


It's sometimes the combination that matters most. You cannot really
make that determination yet.

Jeff




jonry at pvv

Sep 14, 2007, 11:38 AM

Post #18 of 25 (907 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Jeff Garzik wrote:
> Jon Ivar Rykkelid wrote:
>
>> I'm going to test another (identical) motherboard this evening to
>> establish whether it could be a HW-issue.
>
> Not just motherboard. It is more likely to be a cable, drive or PSU
> problem.

>> It is NOT the PSU, nor is it cables, as all the drives work well
>> using the same cables + PSU (in the same box) if I connect them to my
>> other two controllers (in that same box).
>
>
> It's sometimes the combination that matters most. You cannot really
> make that determination yet.
>

Whatever.
(Though I must confess that, in spite of my Master's degree in Electrical
Engineering and extensive HW experience, I cannot for the life of me
understand how you can find it more likely to be cables (that work fine
with other controllers), disks (that also work fine with other
controllers) or the power-supply (that also works fine with exactly the
same things connected to it) rather than the motherboard's
SATA-controller (that is the item that actually is reported to fail in
the first place). - Sure, I'm well aware that sometimes the combination
of HW matters, but in my experience we're normally not talking about
"dumb" stuff like cables and PSU if that is the issue.)

Anyway, I have just changed to the other (identical) motherboard, and
things are running just fine at the moment...
I'll let you know if they start acting up (as they did before). If not I
guess the fault was with the motherboard and not the driver - Guess
we'll know pretty soon...

Thanks for all your effort, gents, let's hope it all works now!

BR
Jon Ivar



auxsvr at gmail

Sep 14, 2007, 1:24 PM

Post #19 of 25 (911 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Hello,

I get a similar, if not identical, problem with an ASUS A8N SLI nforce4 based
motherboard. The PC (with a Seagate SATA-2 120 GB HDD) ran fine for two
years. Last Christmas, Windows XP (I didn't change either hardware or
drivers) started crashing, and the filesystem got corrupted beyond repair
within 8 hours after every installation. The system log contained entries
about bad sectors and, based on the Seagate diagnosis tool, I returned the
system to the supplier. According to the retail shop, neither the disk nor
the system had any problems, so I was coerced into paying for a replacement
disk. The replacement HDD (Seagate again, 120 GB) ran fine until a month ago
(this time the system is connected to a UPS), when the same problem occurred!
I moved the disk to a Linux system with the Promise TX2plus controller (the
one I'm typing this from), found bad sectors, formatted it, and everything
has worked fine for at least 6 hours of continuous disk writes and reads in
this system. If I return the disk to the nforce4 system, it becomes corrupted
within some hours of disk access, no matter whether Linux or Windows is
installed, regardless of NCQ settings, drivers and cables.

The symptoms are the same in both cases: the system crashes, then runs for
some hours, then the controller stops responding completely (the first error
message is "ata1: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0x2
frozen"), the disk access LED blinks continuously, Linux 2.6.18 (openSUSE
10.2) throws lots of error messages similar to the ones you mention above,
Linux says that the device is dead and the system becomes unusable (no disk
access). After a reboot, the filesystem is fine for some time; afterwards
similar error messages appear, seek errors appear and the filesystem becomes
completely destroyed. The positive part of this ordeal is that the Linux SATA
error handling works fine and Linux recovered the first time, without access
to the drive of course, while Windows crashed badly and I was unable to find
out what was happening in the beginning.
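
For anyone comparing logs, here is a minimal Python sketch (not part of the original post) that pulls the register values out of an exception line like the one quoted above and lists which SErr bits are set. The regular expression and the names parse_exception and set_bits are assumptions based only on the message format shown in this thread; translating bit positions into named error conditions is left to the kernel's libata headers.

```python
import re

# Assumed layout, taken from the message quoted in this thread:
# "ata1: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0x2 frozen"
EXCEPTION_RE = re.compile(
    r"(?P<port>ata\d+(?:\.\d+)?): exception "
    r"Emask (?P<emask>0x[0-9a-fA-F]+) "
    r"SAct (?P<sact>0x[0-9a-fA-F]+) "
    r"SErr (?P<serr>0x[0-9a-fA-F]+) "
    r"action (?P<action>0x[0-9a-fA-F]+)"
    r"(?P<frozen> frozen)?"
)


def set_bits(value):
    """Return the positions of the bits set in value, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


def parse_exception(line):
    """Parse one exception line into a dict; return None if it does not match."""
    m = EXCEPTION_RE.search(line)
    if m is None:
        return None
    fields = {k: int(m.group(k), 16) for k in ("emask", "sact", "serr", "action")}
    fields["port"] = m.group("port")
    fields["frozen"] = m.group("frozen") is not None
    fields["serr_bits"] = set_bits(fields["serr"])
    return fields


if __name__ == "__main__":
    line = "ata1: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0x2 frozen"
    print(parse_exception(line))
```

Running it over every matching line in /var/log/messages would also show which port reports the first exception after each boot.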

I cannot say with certainty that this is a hardware error or damage. Seagate
technical support insists that their HDD is at fault, which is obviously
wrong; the PC is (after the second incident) connected to a UPS and was
checked by the service at the shop. The weirdest thing, which I cannot
explain, is that the system ran fine for 8 months after I changed the
disk, even though the disk wasn't damaged! Either the motherboard is damaged
or faulty, or there is some very weird interaction between the HDD and the
SATA controller, which isn't unlikely considering the problems reported about
combinations of nforce4 and Maxtor HDDs, yet that still doesn't explain the
2-year and 8-month periods of normal operation. I'm going to contact the
service again and see how this comes out.


jonry at pvv

Sep 14, 2007, 1:35 PM

Post #20 of 25 (907 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Hi, I'm getting more confident that the driver is the issue.

I have now been able to reproduce the same error on the new motherboard
as well... - (the same MB was tested to work in Windows with
windows-drivers)...

Unless you guys can come up with something clever, I'll see if I can get
my hands on / change to another (non-nvidia) chipset in a day or two, as
the sata_nv with this chipset apparently isn't working.

(Or has anyone EVER been successful with the latest kernel/driver on
this HW)?

Attaching everything relevant from /var/log/messages...


Jon Ivar Rykkelid wrote:
> I'm going to test another (identical) motherboard this evening to
> establish whether it could be a HW-issue.
>
> I'll keep you posted
Jon Ivar
Attachments: sata_nv-new.log (8.41 KB)


prakash at punnoor

Sep 15, 2007, 12:12 AM

Post #21 of 25 (904 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

On the day of Friday 14 September 2007 Jon Ivar Rykkelid hast written:
> Hi, I'm getting more confident that the driver is the issue.
>
>
> (Or has anyone EVER been successful with the latest kernel/driver on
> this HW)?

I don't have exactly the same hw, but the same chipset and I don't have any
problems - even with the swncq patch applied. Do you have an hpet? If not,
try booting with acpi_use_time_override. My system won't work with skipping
the override.

--
(°= =°)
//\ Prakash Punnoor /\\
V_/ \_V
Attachments: signature.asc (0.18 KB)


jonry at pvv

Sep 15, 2007, 3:14 AM

Post #22 of 25 (906 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

Prakash Punnoor wrote:
> I don't have exactly the same hw, but the same chipset and I don't have any
> problems - even with the swncq patch applied. Do you have an hpet? If not,
> try booting with acpi_use_time_override. My system won't work with skipping
> the override.
>
>
Hi, I reconnected and rebooted with the kernel option
"acpi_use_timer_override" (this is the correct spelling, isn't it? -
Kernel didn't complain.). Didn't help, the same error received as
before. - I'll have to connect all disks back to my PCI-connected SATA
controllers and start rebuilding my RAID yet again.
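
On the "Kernel didn't complain" point: as far as I know, the kernel does not normally reject a parameter it fails to recognise, so a quiet boot is weak evidence either way. A minimal Python sketch (not part of the original message) for at least confirming that the option reached the running kernel's command line; has_boot_param and read_cmdline are hypothetical helper names, and /proc/cmdline is the standard Linux file holding the boot command line. This only shows the option was passed, not that any code acted on it.

```python
def has_boot_param(cmdline, name):
    """True if name appears as a bare flag or as name=value in cmdline."""
    for token in cmdline.split():
        if token == name or token.startswith(name + "="):
            return True
    return False


def read_cmdline(path="/proc/cmdline"):
    """Return the boot command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    print(has_boot_param(read_cmdline(), "acpi_use_timer_override"))
```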

It seems random which disk is affected first (so far, I know that it
has happened to ata1, ata3 and ata4, three of my potential disks) - I
guess it just happens to the disk that is being used at the moment when
the driver / controller acts up.

I'm about to give in. I think I'll try to replace both ( Gigabyte
GA-N650SLI-DS4 ) motherboards, as the driver simply isn't working for
the on-board controller of these boards. Could be a combination of the
controllers and some other HW on the motherboards of course, but all is
working when I connect all disks to my non-nvidia controllers. - Guess
I'll opt for a motherboard with an intel-chipset after all...

BR
Jon Ivar


prakash at punnoor

Sep 15, 2007, 4:30 AM

Post #23 of 25 (906 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

On the day of Saturday 15 September 2007 Jon Ivar Rykkelid hast written:
> Do you get the same error messages that I do if you're running without
> the "acpi_use_timer_override" (this is how it is spelled, isn't it) ?

I don't remember which messages I get, but for me the kernel didn't boot with
certain versions. And yes, you spelled it correctly.
--
(°= =°)
//\ Prakash Punnoor /\\
V_/ \_V
Attachments: signature.asc (0.18 KB)


john at stoffel

Sep 15, 2007, 7:47 AM

Post #24 of 25 (906 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

>>>>> "Jon" == Jon Ivar Rykkelid <jonry[at]pvv.org> writes:

Jon> Prakash Punnoor wrote:
>> I don't have exactly the same hw, but the same chipset and I don't have any
>> problems - even with the swncq patch applied. Do you have an hpet? If not,
>> try booting with acpi_use_time_override. My system won't work with skipping
>> the override.

Jon> Hi, I reconnected and rebooted with the kernel option
Jon> "acpi_use_timer_override" (this is the correct spelling, isn't
Jon> it? - Kernel didn't complain.). Didn't help, the same error
Jon> received as before. - I'll have to connect all disks back to my
Jon> PCI-connected SATA controllers and start rebuilding my RAID yet
Jon> again.

What happens when you just have ONE disk connected to the motherboard
controller, and the rest connected to PCI controllers? Does it crap
out then? You've just such a nice repeatable problem across
motherboards that it's a shame to waste this debugging time.

I'm wondering if it's a PCI bus issue somehow, and that the load on
the motherboard controller isn't supportable when you have a bunch of
disks on PCI controllers as well. Shot in the dark...

Thanks for all your hard work on this, I know how frustrating it is to
not have a stable system!

John


jonry at pvv

Sep 15, 2007, 12:29 PM

Post #25 of 25 (905 views)
Permalink
Re: sata_nv issues with MCP51 SATA controller [In reply to]

John Stoffel wrote:
> What happens when you just have ONE disk connected to the motherboard
> controller, and the rest connected to PCI controllers? Does it crap
> out then? You've just such a nice repeatable problem across
> motherboards that it's a shame to waste this debugging time.
>
Sorry, I gave in. I have now abandoned my nvidia trials (both
motherboards have been returned, and I'm now running with Intel chipset)
- My current motherboard is less ideal (in terms of PCI-slots etc.), but
on the other hand it works...
> I'm wondering if it's a PCI bus issue somehow, and that the load on
> the motherboard controller isn't supportable when you have a bunch of
> disks on PCI controllers as well. Shot in the dark...
>
That was actually not such a bad idea... Unfortunately it's too late now
(otherwise I would have tested it for sure). I was/am after all running an
8-disk SATA array (plus a normal IDE disk - not in the raid). I had 4
disks running through two PCI-cards and 4 disks used the motherboard's
controller. - When all 8 disks were connected to the two PCI-cards the
speed dropped compared to when the motherboard's controller took some
load.. (So it could maybe be an issue with bandwidth / load ? - I don't
know.)
> Thanks for all your hard work on this, I know how frustrating it is to
> not have a stable system!
>
Sorry for giving in, but I felt I was banging my head against the wall
(and with too few sensible solutions being suggested). Now I guess I'm
semi-happy that all seems to work OK with the Intel chipset..
Frustrating that the sata_nv-driver / nvidia HW didn't work with my
configuration, though...

Thank you all for your effort as well - hope someone figures this out
sometime in the future.

All the best
Jon Ivar
