Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

[PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338

 

 

First page Previous page 1 2 Next page Last page  View All Linux kernel RSS feed   Index | Next | Previous | View Threaded


tripie at cpan

Jul 19, 2008, 10:13 PM

Post #1 of 40 (4746 views)
Permalink
[PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338

Hello,

this message includes a patch that provides a workaround for
a silent data corruption bug caused by incorrect error handling in
the JMicron JM20337 Hi-Speed USB to SATA & PATA Combo Bridge chipset,
USB device id 152d:2338.


- the problem occurs quite rarely, approx. once for
every 20 GB of transfered data during heavy load

- it seems that only read operations are affected

- the problem is accompanied by these messages in syslog each
time it occurs:

May 17 15:06:56 kernel: sd 6:0:0:0: [sdb] Sense Key : 0x0 [current]
May 17 15:06:56 kernel: sd 6:0:0:0: [sdb] ASC=0x0 ASCQ=0x0

- the bug is not detected as an error and incorrect data is returned,
causing insidious data corruption

- tested with 3 external disk enclosures (Akasa Integral AK-ENP2SATA-BL)
with different disks on different computers, with kernel 2.6.24 and 2.6.25

- the patch provides a crude workaround by detecting the error condition
and retrying the faulty transfer


The fix needs a review as I don't know much about USB and SCSI.
It's possible that this approach is wrong and that the problem should
be fixed somewhere else.

There are other problems with this chipset that make it necessary
to disconnect and power off the enclosure from time to time, but at least
there's no data corruption anymore.


--
Tomas Styblo <tripie [at] cpan>
PGP: C97EA4B6 / 817E B8A8 1AFD 3256 3181 A730 85CF 7BEB C97E A4B6
Attachments: linux-kernel-2.6.25.9-usb-storage-jmicron-JM20337-no-sense-bugfix.patch (1.19 KB)


hancockr at shaw

Jul 21, 2008, 12:37 PM

Post #2 of 40 (4617 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

(adding CCs)

Tomas Styblo wrote:
>
> Hello,
>
> this message includes a patch that provides a workaround for
> a silent data corruption bug caused by incorrect error handling in
> the JMicron JM20337 Hi-Speed USB to SATA & PATA Combo Bridge chipset,
> USB device id 152d:2338.
>
>
> - the problem occurs quite rarely, approx. once for
> every 20 GB of transfered data during heavy load
>
> - it seems that only read operations are affected
>
> - the problem is accompanied by these messages in syslog each
> time it occurs:
>
> May 17 15:06:56 kernel: sd 6:0:0:0: [sdb] Sense Key : 0x0 [current]
> May 17 15:06:56 kernel: sd 6:0:0:0: [sdb] ASC=0x0 ASCQ=0x0
>
> - the bug is not detected as an error and incorrect data is returned,
> causing insidious data corruption
>
> - tested with 3 external disk enclosures (Akasa Integral AK-ENP2SATA-BL)
> with different disks on different computers, with kernel 2.6.24 and 2.6.25
>
> - the patch provides a crude workaround by detecting the error condition
> and retrying the faulty transfer
>
>
> The fix needs a review as I don't know much about USB and SCSI.
> It's possible that this approach is wrong and that the problem should
> be fixed somewhere else.
>
> There are other problems with this chipset that make it necessary
> to disconnect and power off the enclosure from time to time, but at least
> there's no data corruption anymore.

I'm not sure this is a good approach. More that this code right above in
usb_stor_invoke_transport, which your code undoes the effect of for this
device, doesn't seem right:

/* If things are really okay, then let's show that. Zero
* out the sense buffer so the higher layers won't realize
* we did an unsolicited auto-sense. */
if (result == USB_STOR_TRANSPORT_GOOD &&
/* Filemark 0, ignore EOM, ILI 0, no sense */
(srb->sense_buffer[2] & 0xaf) == 0 &&
/* No ASC or ASCQ */
srb->sense_buffer[12] == 0 &&
srb->sense_buffer[13] == 0) {
srb->result = SAM_STAT_GOOD;
srb->sense_buffer[0] = 0x0;
}

So if the transport initially gets a failure, but then request sense
doesn't show any error, we just go "hmm, guess it was ok after all".
That seems kind of dangerous, I shouldn't think we should assume a
successful transfer occurred if we got any kind of error.

If you just delete that code above, does the corruption go away?

Original attached patch was (likely whitespace damaged now):

--- linux-2.6.25.9/drivers/usb/storage/transport.c.orig 2008-06-24
23:09:06.000000000 +0200
+++ linux-2.6.25.9/drivers/usb/storage/transport.c 2008-07-20
05:14:32.000000000 +0200
@@ -661,6 +661,21 @@ void usb_stor_invoke_transport(struct sc
srb->result = SAM_STAT_GOOD;
srb->sense_buffer[0] = 0x0;
}
+
+ /* JMicron JM20337 chipset bug workaround - BEGIN */
+ if (us->pusb_dev->descriptor.idVendor == 0x152d &&
+ us->pusb_dev->descriptor.idProduct == 0x2338 &&
+ result == USB_STOR_TRANSPORT_FAILED &&
+ /* Filemark 0, ignore EOM, ILI 0, no sense */
+ (srb->sense_buffer[2] & 0xaf) == 0 &&
+ /* No ASC or ASCQ */
+ srb->sense_buffer[12] == 0 &&
+ srb->sense_buffer[13] == 0) {
+ printk(KERN_WARNING "USB Storage - Working around the
JMicron JM20337 chipset bug (idVendor=%04x, idProduct=%04x, NO_SENSE,
ASC=0, ASCQ=0) - retrying the read operation\n",
us->pusb_dev->descriptor.idVendor, us->pusb_dev->descriptor.idProduct);
+ srb->result = (DID_ERROR << 16) | (SUGGEST_RETRY << 24);
+ return;
+ }
+ /* JMicron JM20337 chipset bug workaround - END */
}

/* Did we transfer less than the minimum amount required? */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


stern at rowland

Jul 21, 2008, 7:37 PM

Post #3 of 40 (4618 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

On Mon, 21 Jul 2008, Robert Hancock wrote:

> (adding CCs)
>
> Tomas Styblo wrote:
> >
> > Hello,
> >
> > this message includes a patch that provides a workaround for
> > a silent data corruption bug caused by incorrect error handling in
> > the JMicron JM20337 Hi-Speed USB to SATA & PATA Combo Bridge chipset,
> > USB device id 152d:2338.

The two of you should read through

http://bugzilla.kernel.org/show_bug.cgi?id=9638

which concerns this very problem.

> > - the problem occurs quite rarely, approx. once for
> > every 20 GB of transfered data during heavy load
> >
> > - it seems that only read operations are affected
> >
> > - the problem is accompanied by these messages in syslog each
> > time it occurs:
> >
> > May 17 15:06:56 kernel: sd 6:0:0:0: [sdb] Sense Key : 0x0 [current]
> > May 17 15:06:56 kernel: sd 6:0:0:0: [sdb] ASC=0x0 ASCQ=0x0
> >
> > - the bug is not detected as an error and incorrect data is returned,
> > causing insidious data corruption
> >
> > - tested with 3 external disk enclosures (Akasa Integral AK-ENP2SATA-BL)
> > with different disks on different computers, with kernel 2.6.24 and 2.6.25
> >
> > - the patch provides a crude workaround by detecting the error condition
> > and retrying the faulty transfer
> >
> >
> > The fix needs a review as I don't know much about USB and SCSI.
> > It's possible that this approach is wrong and that the problem should
> > be fixed somewhere else.
> >
> > There are other problems with this chipset that make it necessary
> > to disconnect and power off the enclosure from time to time, but at least
> > there's no data corruption anymore.
>
> I'm not sure this is a good approach. More that this code right above in
> usb_stor_invoke_transport, which your code undoes the effect of for this
> device, doesn't seem right:
>
> /* If things are really okay, then let's show that. Zero
> * out the sense buffer so the higher layers won't realize
> * we did an unsolicited auto-sense. */
> if (result == USB_STOR_TRANSPORT_GOOD &&
> /* Filemark 0, ignore EOM, ILI 0, no sense */
> (srb->sense_buffer[2] & 0xaf) == 0 &&
> /* No ASC or ASCQ */
> srb->sense_buffer[12] == 0 &&
> srb->sense_buffer[13] == 0) {
> srb->result = SAM_STAT_GOOD;
> srb->sense_buffer[0] = 0x0;
> }
>
> So if the transport initially gets a failure, but then request sense
> doesn't show any error, we just go "hmm, guess it was ok after all".
> That seems kind of dangerous, I shouldn't think we should assume a

No, no -- you have misread the code. If the transport initially got a
failure then result would be equal to USB_STOR_TRANSPORT_FAILED, not
USB_STOR_TRANSPORT_GOOD, so this code wouldn't run.

> If you just delete that code above, does the corruption go away?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tripie at cpan

Jul 21, 2008, 10:11 PM

Post #4 of 40 (4612 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

* Robert Hancock <hancockr [at] shaw> [Mon, 21 Jul 2008]:
>
> I'm not sure this is a good approach. More that this code right above in
> usb_stor_invoke_transport, which your code undoes the effect of for this
> device, doesn't seem right:
>
> /* If things are really okay, then let's show that. Zero
> * out the sense buffer so the higher layers won't realize
> * we did an unsolicited auto-sense. */
> if (result == USB_STOR_TRANSPORT_GOOD &&
> /* Filemark 0, ignore EOM, ILI 0, no sense */
> (srb->sense_buffer[2] & 0xaf) == 0 &&
> /* No ASC or ASCQ */
> srb->sense_buffer[12] == 0 &&
> srb->sense_buffer[13] == 0) {
> srb->result = SAM_STAT_GOOD;
> srb->sense_buffer[0] = 0x0;
> }
>

The patch doesn't exactly undo the effect of the code above,
because the value of _result_ is different. When this problem
happens, the condition above is false, _result_ is
USB_STOR_TRANSPORT_FAILED and scsi_get_resid(srb) > 0, but the
chipset doesn't report any error (NO_SENSE,ASC==0,ASCQ==0). That's
why I think there's something wrong with the chipset.

There are Windows users on various message boards who report the
same problem with this chipset - a kind of silent data corruption
that occurs only when copying large amounts of data.

But as I said I know little about SCSI and USB. I tried to locate
and fix the problem, but I can't tell whether the current error
handling code is written according to the relevant standards.

A more generic approach would certainly be better than hardcoded
device ids. Perhaps this check should be enabled for all devices?
Why not?



--
Tomas Styblo <tripie [at] cpan>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


hancockr at shaw

Jul 21, 2008, 10:31 PM

Post #5 of 40 (4609 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

Tomas Styblo wrote:
> * Robert Hancock <hancockr [at] shaw> [Mon, 21 Jul 2008]:
>> I'm not sure this is a good approach. More that this code right above in
>> usb_stor_invoke_transport, which your code undoes the effect of for this
>> device, doesn't seem right:
>>
>> /* If things are really okay, then let's show that. Zero
>> * out the sense buffer so the higher layers won't realize
>> * we did an unsolicited auto-sense. */
>> if (result == USB_STOR_TRANSPORT_GOOD &&
>> /* Filemark 0, ignore EOM, ILI 0, no sense */
>> (srb->sense_buffer[2] & 0xaf) == 0 &&
>> /* No ASC or ASCQ */
>> srb->sense_buffer[12] == 0 &&
>> srb->sense_buffer[13] == 0) {
>> srb->result = SAM_STAT_GOOD;
>> srb->sense_buffer[0] = 0x0;
>> }
>>
>
> The patch doesn't exactly undo the effect of the code above,
> because the value of _result_ is different. When this problem

Yes, you and Alan are right, that code only triggers on a good result.

> happens, the condition above is false, _result_ is
> USB_STOR_TRANSPORT_FAILED and scsi_get_resid(srb) > 0, but the
> chipset doesn't report any error (NO_SENSE,ASC==0,ASCQ==0). That's
> why I think there's something wrong with the chipset.

I'm assuming what triggers the transport failure is it getting a
US_BULK_STAT_FAIL result from the transfer inside
usb_stor_Bulk_transport. It could be there's some condition in the chip
that causes that error, yet doesn't provide any sense data.

In any case, given that your code apparently fixes the corruption it
seems that srb->result is being set to SAM_STAT_CHECK_CONDITION, but the
DID_ERROR and SUGGEST_RETRY flags are not being set. Presumably then the
SCSI layer looks at the sense data and says "hmm, nothing to worry
about here" and carries on.

I think we do need something like your patch, though it should likely be
moved inside the if (need_auto_sense) check, and I don't see a reason to
limit to this device ID only.

>
> There are Windows users on various message boards who report the
> same problem with this chipset - a kind of silent data corruption
> that occurs only when copying large amounts of data.
>
> But as I said I know little about SCSI and USB. I tried to locate
> and fix the problem, but I can't tell whether the current error
> handling code is written according to the relevant standards.
>
> A more generic approach would certainly be better than hardcoded
> device ids. Perhaps this check should be enabled for all devices?
> Why not?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tripie at cpan

Jul 21, 2008, 11:11 PM

Post #6 of 40 (4613 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

* Robert Hancock <hancockr [at] shaw> [Tue, 22 Jul 2008]:
>
> In any case, given that your code apparently fixes the corruption it seems
> that srb->result is being set to SAM_STAT_CHECK_CONDITION, but the
> DID_ERROR and SUGGEST_RETRY flags are not being set. Presumably then the
> SCSI layer looks at the sense data and says "hmm, nothing to worry about
> here" and carries on.

That's exactly what I thought was happening, after a cursory
look at the SCSI code.

>
> I think we do need something like your patch, though it should likely be
> moved inside the if (need_auto_sense) check, and I don't see a reason to
> limit to this device ID only.

Thank you. This is a very insidious bug as it doesn't manifest
itself very often, months of data corruption may pass before you
notice it.

So is there a bug in the chipset, or does the error handling code
not follow specifications?

I wonder if the company that makes the chipset should be notified
about this problem?


--
Tomas Styblo <tripie [at] cpan>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


hancockr at shaw

Jul 22, 2008, 1:45 AM

Post #7 of 40 (4616 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

Tomas Styblo wrote:
> * Robert Hancock <hancockr [at] shaw> [Tue, 22 Jul 2008]:
>> In any case, given that your code apparently fixes the corruption it seems
>> that srb->result is being set to SAM_STAT_CHECK_CONDITION, but the
>> DID_ERROR and SUGGEST_RETRY flags are not being set. Presumably then the
>> SCSI layer looks at the sense data and says "hmm, nothing to worry about
>> here" and carries on.
>
> That's exactly what I thought was happening, after a cursory
> look at the SCSI code.
>
>> I think we do need something like your patch, though it should likely be
>> moved inside the if (need_auto_sense) check, and I don't see a reason to
>> limit to this device ID only.
>
> Thank you. This is a very insidious bug as it doesn't manifest
> itself very often, months of data corruption may pass before you
> notice it.
>
> So is there a bug in the chipset, or does the error handling code
> not follow specifications?

It looks clear to me that it's a bug in the chipset. It's supposed to
set some valid sense data if an error occurs, not just set the "failed"
flag in the USB storage status word. (Presumably the fact that these
errors are occurring in the first place is a bug in itself.. though that
could be a problem with the enclosure or drive as well.)

However the kernel should be more robust and not ignore the error
indication that it is giving.

> I wonder if the company that makes the chipset should be notified
> about this problem?

I suppose it wouldn't hurt to let JMicron know about this. I doubt they
could do anything for existing chipsets, but it might help them avoid
this bug in future designs.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tripie at cpan

Jul 22, 2008, 2:03 AM

Post #8 of 40 (4643 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

* Alan Stern <stern [at] rowland> [Tue, 22 Jul 2008]:
> On Mon, 21 Jul 2008, Robert Hancock wrote:
> > > this message includes a patch that provides a workaround for
> > > a silent data corruption bug caused by incorrect error handling in
> > > the JMicron JM20337 Hi-Speed USB to SATA & PATA Combo Bridge chipset,
> > > USB device id 152d:2338.
>
> The two of you should read through
>
> http://bugzilla.kernel.org/show_bug.cgi?id=9638
>
> which concerns this very problem.

I had found this bugreport and read through it before I posted my
patch. I don't think this is the same problem. The error messages
and the description of the problem are different from what I've
been trying to fix.

Anyway, I'll send the patch to this person so he can try it.
I guess it won't fix his problem. This patch is much simpler and doesn't
need any delays - I really think this is a different situation.

I sometimes experience the problems described by this person, as I
noted in the first message with the patch. When these "reset high
speed USB device" messages appear, it is usually necessary to
disconnect and power off the device. In my experience, data
corruption can be prevented in this situation by setting:

# tune2fs -e remount-ro /dev/sdX

But this is a different problem - in this situation the error
actually IS detected.



--
Tomas Styblo <tripie [at] cpan>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


stern at rowland

Jul 22, 2008, 7:38 PM

Post #9 of 40 (4603 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

On Tue, 22 Jul 2008, Tomas Styblo wrote:

> * Alan Stern <stern [at] rowland> [Tue, 22 Jul 2008]:
> > On Mon, 21 Jul 2008, Robert Hancock wrote:
> > > > this message includes a patch that provides a workaround for
> > > > a silent data corruption bug caused by incorrect error handling in
> > > > the JMicron JM20337 Hi-Speed USB to SATA & PATA Combo Bridge chipset,
> > > > USB device id 152d:2338.
> >
> > The two of you should read through
> >
> > http://bugzilla.kernel.org/show_bug.cgi?id=9638
> >
> > which concerns this very problem.
>
> I had found this bugreport and read through it before I posted my
> patch. I don't think this is the same problem. The error messages
> and the description of the problem are different from what I've
> been trying to fix.

Maybe the initial descriptions are. If you read through the entire
report, though, you'll see that the underlying problems are exactly
the same:

A READ fails to transfer all the data requested. The device
sends back Check Condition status, but the sense data is all
0, indicating that nothing was wrong. The system does not
retry the READ, leading to data corruption.

A WRITE fails and times out after 30 seconds. The system
tries to reset the device, but the reset fails.

The system doesn't detect the first problem for two reasons: The
scsi_eh_prep_cmnd routine fails to preserve scmd->underflow, and
usb-storage fails to check for underflow when the device returns Check
Condition status. The Bugzilla report contains fixes for both of
these, and they solve the problem.

The second problem is harder. The device is supposed to send a STALL
to cut the WRITE short -- and a USB trace under Windows seems to
indicate that it does -- but under Linux no STALL is received. I
don't know why not. But since Linux does not respond in the way the
device expects, the device crashes.

> Anyway, I'll send the patch to this person so he can try it.
> I guess it won't fix his problem. This patch is much simpler and doesn't
> need any delays - I really think this is a different situation.

It isn't. And your patch is an ad-hoc correction that doesn't address
the true underlying reasons for the errors.

You should also try adding the delay mentioned in the bug report.
There's an excellent chance it will also prevent your problems.

> I sometimes experience the problems described by this person, as I
> noted in the first message with the patch. When these "reset high
> speed USB device" messages appear, it is usually necessary to
> disconnect and power off the device.

Because the device's firmware has crashed. That's why the reset fails.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


hancockr at shaw

Jul 23, 2008, 4:20 PM

Post #10 of 40 (4627 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

Alan Stern wrote:
> The system doesn't detect the first problem for two reasons: The
> scsi_eh_prep_cmnd routine fails to preserve scmd->underflow, and
> usb-storage fails to check for underflow when the device returns Check
> Condition status. The Bugzilla report contains fixes for both of
> these, and they solve the problem.
>
> The second problem is harder. The device is supposed to send a STALL
> to cut the WRITE short -- and a USB trace under Windows seems to
> indicate that it does -- but under Linux no STALL is received. I
> don't know why not. But since Linux does not respond in the way the
> device expects, the device crashes.
>
>> Anyway, I'll send the patch to this person so he can try it.
>> I guess it won't fix his problem. This patch is much simpler and doesn't
>> need any delays - I really think this is a different situation.
>
> It isn't. And your patch is an ad-hoc correction that doesn't address
> the true underlying reasons for the errors.

Tomas, you should try the patch listed in that bug report (well, except
maybe the part that's actually adding the JMicron unusual devices entry)
and see if that fixes the problem.

It remains an issue, though, that if there's no underflow, if the device
reports an error in the CSW but doesn't provide sense data, we assume
nothing bad happened and don't retry. That definitely does not seem
correct. The device is not supposed to do this, but with how crappily
some of these devices are designed we should be more defensive.

It's pretty gross to add random delays in without some good evidence or
input from the manufacturer that it will really fix the problem. (We do
have one GO_SLOW flag with a delay in there already for some Genesys
chips, but I believe that was based on some actual input from Genesys.)

>
> You should also try adding the delay mentioned in the bug report.
> There's an excellent chance it will also prevent your problems.
>
>> I sometimes experience the problems described by this person, as I
>> noted in the first message with the patch. When these "reset high
>> speed USB device" messages appear, it is usually necessary to
>> disconnect and power off the device.
>
> Because the device's firmware has crashed. That's why the reset fails.
>
> Alan Stern
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


stern at rowland

Jul 23, 2008, 8:42 PM

Post #11 of 40 (4608 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

On Wed, 23 Jul 2008, Robert Hancock wrote:

> It remains an issue, though, that if there's no underflow, if the device
> reports an error in the CSW but doesn't provide sense data, we assume
> nothing bad happened and don't retry. That definitely does not seem
> correct. The device is not supposed to do this, but with how crappily
> some of these devices are designed we should be more defensive.

The problem is, what can you do? The device has said that something
was wrong, but it hasn't told you what. Without knowing what went
wrong, you can't know how to recover.

I suppose in such cases we could simply report that the command failed
completely.

> It's pretty gross to add random delays in without some good evidence or
> input from the manufacturer that it will really fix the problem. (We do
> have one GO_SLOW flag with a delay in there already for some Genesys
> chips, but I believe that was based on some actual input from Genesys.)

That's right.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


alex.buell at munted

Jul 23, 2008, 11:15 PM

Post #12 of 40 (4613 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

On Tue, 22 Jul 2008 02:45:32 -0600, I waved a wand and this message
magically appears in front of Robert Hancock:

> I suppose it wouldn't hurt to let JMicron know about this. I doubt
> they could do anything for existing chipsets, but it might help them
> avoid this bug in future designs.

I, too, have the same chipet. I have observed the same corruption bug
with SATA disks only. PATA disks works perfectly.

When I have more time and a spare SATA disk, I will try and reproduce
the same problem and post results on here, in the hope someone will
come up with a fix for it.

It is very annoying not being able to use cheaper SATA disks for
backing up my boxes instead of slightly more expensive PATA disks.
--
http://www.munted.org.uk

Fearsome grindings.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tripie at cpan

Jul 25, 2008, 1:44 AM

Post #13 of 40 (4584 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

* Robert Hancock <hancockr [at] shaw> [Thu, 24 Jul 2008]:
>>> Anyway, I'll send the patch to this person so he can try it. I guess it
>>> won't fix his problem. This patch is much simpler and doesn't need any
>>> delays - I really think this is a different situation.
>>
>> It isn't. And your patch is an ad-hoc correction that doesn't address
>> the true underlying reasons for the errors.
>
> Tomas, you should try the patch listed in that bug report (well, except
> maybe the part that's actually adding the JMicron unusual devices entry)
> and see if that fixes the problem.

I'll try that patch next week - but I have no confidence in the
reliability of fixing the problem by inserting random delays. All
I have wanted to achieve is to detect the errors and thus prevent
the silent data corruption.

I don't really mind those sporadic forced resets since they cause
no data corruption, at least in my setup. My simple patch
accomplishes the goal of preventing data corruption more reliably,
I think.


--
Tomas Styblo <tripie [at] cpan>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


hancockr at shaw

Jul 25, 2008, 1:54 AM

Post #14 of 40 (4573 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

Tomas Styblo wrote:
> * Robert Hancock <hancockr [at] shaw> [Thu, 24 Jul 2008]:
>>>> Anyway, I'll send the patch to this person so he can try it. I guess it
>>>> won't fix his problem. This patch is much simpler and doesn't need any
>>>> delays - I really think this is a different situation.
>>> It isn't. And your patch is an ad-hoc correction that doesn't address
>>> the true underlying reasons for the errors.
>> Tomas, you should try the patch listed in that bug report (well, except
>> maybe the part that's actually adding the JMicron unusual devices entry)
>> and see if that fixes the problem.
>
> I'll try that patch next week - but I have no confidence in the
> reliability of fixing the problem by inserting random delays. All
> I have wanted to achieve is to detect the errors and thus prevent
> the silent data corruption.
>
> I don't really mind those sporadic forced resets since they cause
> no data corruption, at least in my setup. My simple patch
> accomplishes the goal of preventing data corruption more reliably,
> I think.

I agree, that's why I mentioned to leave out the part that adds the
unusual devices entry which actually enables the random delays. There's
another part of the patch which fixes underflow detection, which might
have the same effect as your patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


alex.buell at munted

Jul 29, 2008, 2:09 PM

Post #15 of 40 (4522 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

On Tue, 22 Jul 2008 02:45:32 -0600, I waved a wand and this message
magically appears in front of Robert Hancock:

> I suppose it wouldn't hurt to let JMicron know about this. I doubt
> they could do anything for existing chipsets, but it might help them
> avoid this bug in future designs.

I can reproduce this EVERY time with SATA disks using the same JM20337
chipset.

1. mke2fs /dev/sda1
2. tunefs -j /dev/sda1
3. e2fsck /dev/sda1

This spits out a whole load of:
Jul 29 22:04:41 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current]
Jul 29 22:04:41 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:04:43 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current]
Jul 29 22:04:43 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:04:44 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current]
Jul 29 22:04:44 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:04:45 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current]
Jul 29 22:04:45 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:04:45 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current]
Jul 29 22:04:45 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:06:31 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current]
Jul 29 22:06:31 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:06:32 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current]
Jul 29 22:06:32 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:06:35 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current]
Jul 29 22:06:35 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:06:51 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current]
Jul 29 22:06:51 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:06:51 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current]
Jul 29 22:06:51 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:06:54 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current]
Jul 29 22:06:54 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:07:37 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current]
Jul 29 22:07:37 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:07:37 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current]
Jul 29 22:07:37 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:07:47 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current]
Jul 29 22:07:47 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:08:04 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current]
Jul 29 22:08:04 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:08:06 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current]
Jul 29 22:08:06 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0

--
http://www.munted.org.uk

Fearsome grindings.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


mdharm-kernel at one-eyed-alien

Jul 29, 2008, 3:33 PM

Post #16 of 40 (4565 views)
Permalink
Re: [usb-storage] [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

On Tue, Jul 29, 2008 at 10:09:08PM +0100, Alex Buell wrote:
> On Tue, 22 Jul 2008 02:45:32 -0600, I waved a wand and this message
> magically appears in front of Robert Hancock:
>
> > I suppose it wouldn't hurt to let JMicron know about this. I doubt
> > they could do anything for existing chipsets, but it might help them
> > avoid this bug in future designs.
>
> I can reproduce this EVERY time with SATA disks using the same JM20337
> chipset.
>
> 1. mke2fs /dev/sda1
> 2. tunefs -j /dev/sda1
> 3. e2fsck /dev/sda1

I already sent a note to my contact at JMicron, but got no response.

Matt

--
Matthew Dharm Home: mdharm-usb [at] one-eyed-alien
Maintainer, Linux USB Mass Storage Driver

Somebody call an exorcist!
-- Dust Puppy
User Friendly, 5/16/1998


hancockr at shaw

Jul 30, 2008, 12:55 PM

Post #17 of 40 (4520 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

Alan Stern wrote:
> On Wed, 23 Jul 2008, Robert Hancock wrote:
>
>> It remains an issue, though, that if there's no underflow, if the device
>> reports an error in the CSW but doesn't provide sense data, we assume
>> nothing bad happened and don't retry. That definitely does not seem
>> correct. The device is not supposed to do this, but with how crappily
>> some of these devices are designed we should be more defensive.
>
> The problem is, what can you do? The device has said that something
> was wrong, but it hasn't told you what. Without knowing what went
> wrong, you can't know how to recover.
>
> I suppose in such cases we could simply report that the command failed
> completely.

I think that is what we need to do. The SCSI/block layers should retry
the command or report a failure to userspace. Above all else we can't
just continue on our merry way and assume success, otherwise data will
get silently corrupted.

It could be that's what Windows is doing, and it's just silently
retrying commands. Though, some people have reported corruption on these
chipsets on Windows too, so maybe it has the same issue.

Is that SCSI/USB underflow fix from
http://bugzilla.kernel.org/show_bug.cgi?id=9638 being pushed forward by
anybody? If not, I can try and cook up a patch to take care of that as
well as the non-underflow failed CSW/no sense data issue.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


mdharm-kernel at one-eyed-alien

Jul 30, 2008, 1:00 PM

Post #18 of 40 (4515 views)
Permalink
Re: [usb-storage] [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

On Wed, Jul 30, 2008 at 01:55:25PM -0600, Robert Hancock wrote:
> Alan Stern wrote:
> > On Wed, 23 Jul 2008, Robert Hancock wrote:
> >
> >> It remains an issue, though, that if there's no underflow, if the device
> >> reports an error in the CSW but doesn't provide sense data, we assume
> >> nothing bad happened and don't retry. That definitely does not seem
> >> correct. The device is not supposed to do this, but with how crappily
> >> some of these devices are designed we should be more defensive.
> >
> > The problem is, what can you do? The device has said that something
> > was wrong, but it hasn't told you what. Without knowing what went
> > wrong, you can't know how to recover.

Yes and no. If ASC/ASCQ is clear, then it's telling you that nothing is
wrong. The device is contradicting itself. That doesn't really help us
here, but it's a point I like to be clear on.

> > I suppose in such cases we could simply report that the command failed
> > completely.
>
> I think that is what we need to do. The SCSI/block layers should retry
> the command or report a failure to userspace. Above all else we can't
> just continue on our merry way and assume success, otherwise data will
> get silently corrupted.

The code path to supress the reporting of an error when auto-sense shows no
ASC/ASCQ was added for a reason. That reason has likely been lost to time,
but I worry about devices that are out there that rely on the current
behavior to function properly....

Matt

--
Matthew Dharm Home: mdharm-usb [at] one-eyed-alien
Maintainer, Linux USB Mass Storage Driver

A: The most ironic oxymoron wins ...
DP: "Microsoft Works"
A: Uh, okay, you win.
-- A.J. & Dust Puppy
User Friendly, 1/18/1998


hancockr at shaw

Jul 30, 2008, 1:46 PM

Post #19 of 40 (4515 views)
Permalink
Re: [usb-storage] [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

Matthew Dharm wrote:
> On Wed, Jul 30, 2008 at 01:55:25PM -0600, Robert Hancock wrote:
>> Alan Stern wrote:
>>> On Wed, 23 Jul 2008, Robert Hancock wrote:
>>>
>>>> It remains an issue, though, that if there's no underflow, if the device
>>>> reports an error in the CSW but doesn't provide sense data, we assume
>>>> nothing bad happened and don't retry. That definitely does not seem
>>>> correct. The device is not supposed to do this, but with how crappily
>>>> some of these devices are designed we should be more defensive.
>>> The problem is, what can you do? The device has said that something
>>> was wrong, but it hasn't told you what. Without knowing what went
>>> wrong, you can't know how to recover.
>
> Yes and no. If ASC/ASCQ is clear, then it's telling you that nothing is
> wrong. The device is contradicting itself. That doesn't really help us
> here, but it's a point I like to be clear on.
>
>>> I suppose in such cases we could simply report that the command failed
>>> completely.
>> I think that is what we need to do. The SCSI/block layers should retry
>> the command or report a failure to userspace. Above all else we can't
>> just continue on our merry way and assume success, otherwise data will
>> get silently corrupted.
>
> The code path to supress the reporting of an error when auto-sense shows no
> ASC/ASCQ was added for a reason. That reason has likely been lost to time,
> but I worry about devices that are out there that rely on the current
> behavior to function properly....

My original comment was that that code should be removed, but this is
incorrect. In fact that code path is unrelated to this problem since it
only executes if no transport error was detected. This code path is
needed since retrieving sense data is done for multiple reasons other
than a transport failure. For one, "If we're running the CB transport,
which is incapable of determining status on its own, we will auto-sense
unless the operation involved a data-in transfer." In this case, for a
successful transfer the status must be reset to good after getting the
sense data.

In the case in question here, the BOT transport reports a failure, and
we retrieve sense data, but the sense data doesn't indicate an error.
This results in the failure essentially being ignored. In this case I
think we should be doing the same thing as we do on detecting an underflow:

srb->result = (DID_ERROR << 16) | (SUGGEST_RETRY << 24);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


stern at rowland

Jul 30, 2008, 2:07 PM

Post #20 of 40 (4553 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

On Wed, 30 Jul 2008, Robert Hancock wrote:

> > I suppose in such cases we could simply report that the command failed
> > completely.
>
> I think that is what we need to do. The SCSI/block layers should retry
> the command or report a failure to userspace. Above all else we can't
> just continue on our merry way and assume success, otherwise data will
> get silently corrupted.
>
> It could be that's what Windows is doing, and it's just silently
> retrying commands. Though, some people have reported corruption on these
> chipsets on Windows too, so maybe it has the same issue.

It's probably a combination of problems.

> Is that SCSI/USB underflow fix from
> http://bugzilla.kernel.org/show_bug.cgi?id=9638 being pushed forward by
> anybody? If not, I can try and cook up a patch to take care of that as
> well as the non-underflow failed CSW/no sense data issue.

The patch situation is a little confused. The change to the SCSI stack
are already present in 2.6.27-rc1. I am not pushing the added delay
patch; I want to do some testing of my own and hear from other people
about it before submitting it.

The USB change discussed there is written but not yet submitted. It
also needs testing. Would you like to try it under 2.6.27-rc1? Here
it is.

Alan Stern


Index: usb-2.6/drivers/usb/storage/transport.c
===================================================================
--- usb-2.6.orig/drivers/usb/storage/transport.c
+++ usb-2.6/drivers/usb/storage/transport.c
@@ -663,7 +663,7 @@ void usb_stor_invoke_transport(struct sc
}

/* Did we transfer less than the minimum amount required? */
- if (srb->result == SAM_STAT_GOOD &&
+ if ((srb->result == SAM_STAT_GOOD || srb->sense_buffer[2] == 0) &&
scsi_bufflen(srb) - scsi_get_resid(srb) < srb->underflow)
srb->result = (DID_ERROR << 16) | (SUGGEST_RETRY << 24);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


stern at rowland

Jul 30, 2008, 2:18 PM

Post #21 of 40 (4519 views)
Permalink
Re: [usb-storage] [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

On Wed, 30 Jul 2008, Matthew Dharm wrote:

> The code path to supress the reporting of an error when auto-sense shows no
> ASC/ASCQ was added for a reason. That reason has likely been lost to time,
> but I worry about devices that are out there that rely on the current
> behavior to function properly....

In this case we shouldn't have to worry about that code path. It
applies to situations where there actually is no error. In fact, if
you follow the logic flow you'll see that the supression code gets
invoked only for CB-like transports, i.e., unsolicited auto-sense.

But in this situation there really _is_ an error: We get an underflow.
The existing logic checks for underflow only when the device reports
Okay status; my patch changes the logic so that it also checks for
underflow when the device reports Check Condition status with no sense
data.

The idea is that the higher SCSI layers don't know what to do with
Check Condition plus no-sense -- so they treat the transfer as a
success. But that's definitely the wrong thing to do when there was an
underflow; in that situation we want to report the underflow so that
the midlayer will retry the command.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


alex.buell at munted

Aug 1, 2008, 2:15 PM

Post #22 of 40 (4507 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

On Wed, 30 Jul 2008 17:07:30 -0400 (EDT), I waved a wand and this
message magically appears in front of Alan Stern:

> Index: usb-2.6/drivers/usb/storage/transport.c
[ snip ]

I've just applied this patch to 2.6.25 tonight, and so far, despite my
number of tests, I've yet to experience any more of these failures.

I've done a lot of mkfs and fsck, haven't seen any problems so far.
Fingers crossed that this patch works.
--
http://www.munted.org.uk

Fearsome grindings.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


alex.buell at munted

Aug 1, 2008, 3:22 PM

Post #23 of 40 (4489 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

On Fri, 1 Aug 2008 22:15:04 +0100, I waved a wand and this message
magically appears in front of Alex Buell:

> I've done a lot of mkfs and fsck, haven't seen any problems so far.
> Fingers crossed that this patch works.

Spoke too soon, that patch that was posted doesn't work for me. Here's
my log:

Aug 1 23:18:08 lithium scsi 5:0:0:0: Direct-Access SAMSUNG SP1213C 0-30 PQ: 0 ANSI: 2 CCS
Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] 234493056 512-byte hardware sectors (120060 MB)
Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] Write Protect is off
Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] Mode Sense: 00 38 00 00
Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] Assuming drive cache: write through
Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] 234493056 512-byte hardware sectors (120060 MB)
Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] Write Protect is off
Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] Mode Sense: 00 38 00 00
Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] Assuming drive cache: write through
Aug 1 23:18:08 lithium sda: sda1
Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] Attached SCSI disk
Aug 1 23:18:08 lithium sd 5:0:0:0: Attached scsi generic sg1 type 0
Aug 1 23:18:08 lithium usb-storage: device scan complete
Aug 1 23:19:11 lithium sd 5:0:0:0: [sda] Sense Key : 0x0 [current]
Aug 1 23:19:11 lithium sd 5:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Aug 1 23:19:11 lithium ERROR: (device sda1): dbAllocNext: Corrupt dmap page
Aug 1 23:19:11 lithium ERROR: (device sda1): dbAllocNext: Corrupt dmap page
Aug 1 23:19:20 lithium sd 5:0:0:0: [sda] Sense Key : 0x0 [current]
Aug 1 23:19:20 lithium sd 5:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Aug 1 23:19:20 lithium sd 5:0:0:0: [sda] Sense Key : 0x0 [current]
Aug 1 23:19:20 lithium sd 5:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Aug 1 23:19:20 lithium sd 5:0:0:0: [sda] Sense Key : 0x0 [current]
Aug 1 23:19:20 lithium sd 5:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Aug 1 23:20:01 lithium cron[13504]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Aug 1 23:20:34 lithium usb 1-3.4.3: USB disconnect, address 8

:-(
--
http://www.munted.org.uk

Fearsome grindings.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


hancockr at shaw

Aug 1, 2008, 7:32 PM

Post #24 of 40 (4489 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

Alex Buell wrote:
> On Fri, 1 Aug 2008 22:15:04 +0100, I waved a wand and this message
> magically appears in front of Alex Buell:
>
>> I've done a lot of mkfs and fsck, haven't seen any problems so far.
>> Fingers crossed that this patch works.
>
> Spoke too soon, that patch that was posted doesn't work for me. Here's
> my log:
>
> Aug 1 23:18:08 lithium scsi 5:0:0:0: Direct-Access SAMSUNG SP1213C 0-30 PQ: 0 ANSI: 2 CCS
> Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] 234493056 512-byte hardware sectors (120060 MB)
> Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] Write Protect is off
> Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] Mode Sense: 00 38 00 00
> Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] Assuming drive cache: write through
> Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] 234493056 512-byte hardware sectors (120060 MB)
> Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] Write Protect is off
> Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] Mode Sense: 00 38 00 00
> Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] Assuming drive cache: write through
> Aug 1 23:18:08 lithium sda: sda1
> Aug 1 23:18:08 lithium sd 5:0:0:0: [sda] Attached SCSI disk
> Aug 1 23:18:08 lithium sd 5:0:0:0: Attached scsi generic sg1 type 0
> Aug 1 23:18:08 lithium usb-storage: device scan complete
> Aug 1 23:19:11 lithium sd 5:0:0:0: [sda] Sense Key : 0x0 [current]
> Aug 1 23:19:11 lithium sd 5:0:0:0: [sda] ASC=0x0 ASCQ=0x0

These patches (mine or Alan's) won't stop those messages from showing
up. The request should get retried, though.

> Aug 1 23:19:11 lithium ERROR: (device sda1): dbAllocNext: Corrupt dmap page
> Aug 1 23:19:11 lithium ERROR: (device sda1): dbAllocNext: Corrupt dmap page

Not sure what is producing these messages?

> Aug 1 23:19:20 lithium sd 5:0:0:0: [sda] Sense Key : 0x0 [current]
> Aug 1 23:19:20 lithium sd 5:0:0:0: [sda] ASC=0x0 ASCQ=0x0
> Aug 1 23:19:20 lithium sd 5:0:0:0: [sda] Sense Key : 0x0 [current]
> Aug 1 23:19:20 lithium sd 5:0:0:0: [sda] ASC=0x0 ASCQ=0x0
> Aug 1 23:19:20 lithium sd 5:0:0:0: [sda] Sense Key : 0x0 [current]
> Aug 1 23:19:20 lithium sd 5:0:0:0: [sda] ASC=0x0 ASCQ=0x0
> Aug 1 23:20:01 lithium cron[13504]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Aug 1 23:20:34 lithium usb 1-3.4.3: USB disconnect, address 8
>
> :-(
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


stern at rowland

Aug 2, 2008, 4:49 PM

Post #25 of 40 (4512 views)
Permalink
Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 [In reply to]

On Fri, 1 Aug 2008, Alex Buell wrote:

> On Fri, 1 Aug 2008 22:15:04 +0100, I waved a wand and this message
> magically appears in front of Alex Buell:
>
> > I've done a lot of mkfs and fsck, haven't seen any problems so far.
> > Fingers crossed that this patch works.
>
> Spoke too soon, that patch that was posted doesn't work for me. Here's
> my log:

Which patch? Several different ones have been posted.

The log doesn't provide much help. We need to see a log with
CONFIG_USB_STORAGE_DEBUG enabled.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First page Previous page 1 2 Next page Last page  View All Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.