Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

[PATCH] cciss: Ignore stale commands after reboot

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


hare at suse

Jul 2, 2009, 1:23 AM

Post #1 of 14 (400 views)
Permalink
[PATCH] cciss: Ignore stale commands after reboot

When doing an unexpected shutdown like kexec the cciss
firmware might still have some commands in flight, which
it is trying to complete.
The driver is doing it's best on resetting the HBA,
but sadly there's a firmware issue causing the firmware
_not_ to abort or drop old commands.
So the firmware will send us commands which we haven't
accounted for, causing the driver to panic.

With this patch we're just ignoring these commands as
there is nothing we could be doing with them anyway.

Signed-off-by: Hannes Reinecke <hare [at] suse>
---
drivers/block/cciss.c | 14 ++++++++++++--
drivers/block/cciss_cmd.h | 1 +
2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index c7a527c..8dd4c0d 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -226,7 +226,16 @@ static inline void addQ(struct hlist_head *list, CommandList_struct *c)

static inline void removeQ(CommandList_struct *c)
{
- if (WARN_ON(hlist_unhashed(&c->list)))
+ /*
+ * After kexec/dump some commands might still
+ * be in flight, which the firmware will try
+ * to complete. Resetting the firmware doesn't work
+ * with old fw revisions, so we have to mark
+ * them off as 'stale' to prevent the driver from
+ * falling over.
+ */
+ if (unlikely(hlist_unhashed(&c->list))) {
+ c->cmd_type = CMD_MSG_STALE;
return;

hlist_del_init(&c->list);
@@ -4246,7 +4255,8 @@ static void fail_all_cmds(unsigned long ctlr)
while (!hlist_empty(&h->cmpQ)) {
c = hlist_entry(h->cmpQ.first, CommandList_struct, list);
removeQ(c);
- c->err_info->CommandStatus = CMD_HARDWARE_ERR;
+ if (c->cmd_type != CMD_MSG_STALE)
+ c->err_info->CommandStatus = CMD_HARDWARE_ERR;
if (c->cmd_type == CMD_RWREQ) {
complete_command(h, c, 0);
} else if (c->cmd_type == CMD_IOCTL_PEND)
diff --git a/drivers/block/cciss_cmd.h b/drivers/block/cciss_cmd.h
index cd665b0..dbaed1e 100644
--- a/drivers/block/cciss_cmd.h
+++ b/drivers/block/cciss_cmd.h
@@ -274,6 +274,7 @@ typedef struct _ErrorInfo_struct {
#define CMD_SCSI 0x03
#define CMD_MSG_DONE 0x04
#define CMD_MSG_TIMEOUT 0x05
+#define CMD_MSG_STALE 0xff

/* This structure needs to be divisible by 8 for new
* indexing method.
--
1.5.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jens.axboe at oracle

Jul 2, 2009, 1:28 AM

Post #2 of 14 (396 views)
Permalink
Re: [PATCH] cciss: Ignore stale commands after reboot [In reply to]

On Thu, Jul 02 2009, Hannes Reinecke wrote:
>
> When doing an unexpected shutdown like kexec the cciss
> firmware might still have some commands in flight, which
> it is trying to complete.
> The driver is doing it's best on resetting the HBA,
> but sadly there's a firmware issue causing the firmware
> _not_ to abort or drop old commands.
> So the firmware will send us commands which we haven't
> accounted for, causing the driver to panic.
>
> With this patch we're just ignoring these commands as
> there is nothing we could be doing with them anyway.
>
> Signed-off-by: Hannes Reinecke <hare [at] suse>
> ---
> drivers/block/cciss.c | 14 ++++++++++++--
> drivers/block/cciss_cmd.h | 1 +
> 2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
> index c7a527c..8dd4c0d 100644
> --- a/drivers/block/cciss.c
> +++ b/drivers/block/cciss.c
> @@ -226,7 +226,16 @@ static inline void addQ(struct hlist_head *list, CommandList_struct *c)
>
> static inline void removeQ(CommandList_struct *c)
> {
> - if (WARN_ON(hlist_unhashed(&c->list)))
> + /*
> + * After kexec/dump some commands might still
> + * be in flight, which the firmware will try
> + * to complete. Resetting the firmware doesn't work
> + * with old fw revisions, so we have to mark
> + * them off as 'stale' to prevent the driver from
> + * falling over.
> + */
> + if (unlikely(hlist_unhashed(&c->list))) {
> + c->cmd_type = CMD_MSG_STALE;
> return;
>
> hlist_del_init(&c->list);

Ehm, that looks rather dangerous. What's the level of testing this patch
received?

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


hare at suse

Jul 2, 2009, 1:44 AM

Post #3 of 14 (394 views)
Permalink
Re: [PATCH] cciss: Ignore stale commands after reboot [In reply to]

Jens Axboe wrote:
> On Thu, Jul 02 2009, Hannes Reinecke wrote:
>> When doing an unexpected shutdown like kexec the cciss
>> firmware might still have some commands in flight, which
>> it is trying to complete.
>> The driver is doing it's best on resetting the HBA,
>> but sadly there's a firmware issue causing the firmware
>> _not_ to abort or drop old commands.
>> So the firmware will send us commands which we haven't
>> accounted for, causing the driver to panic.
>>
>> With this patch we're just ignoring these commands as
>> there is nothing we could be doing with them anyway.
>>
>> Signed-off-by: Hannes Reinecke <hare [at] suse>
>> ---
>> drivers/block/cciss.c | 14 ++++++++++++--
>> drivers/block/cciss_cmd.h | 1 +
>> 2 files changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
>> index c7a527c..8dd4c0d 100644
>> --- a/drivers/block/cciss.c
>> +++ b/drivers/block/cciss.c
>> @@ -226,7 +226,16 @@ static inline void addQ(struct hlist_head *list, CommandList_struct *c)
>>
>> static inline void removeQ(CommandList_struct *c)
>> {
>> - if (WARN_ON(hlist_unhashed(&c->list)))
>> + /*
>> + * After kexec/dump some commands might still
>> + * be in flight, which the firmware will try
>> + * to complete. Resetting the firmware doesn't work
>> + * with old fw revisions, so we have to mark
>> + * them off as 'stale' to prevent the driver from
>> + * falling over.
>> + */
>> + if (unlikely(hlist_unhashed(&c->list))) {
>> + c->cmd_type = CMD_MSG_STALE;
>> return;
>>
>> hlist_del_init(&c->list);
>
> Ehm, that looks rather dangerous. What's the level of testing this patch
> received?
>
Where is the danger here?

With the original code we would be issuing a warning
and return.
But then we hit this codepath:

while (!hlist_empty(&h->cmpQ)) {
c = hlist_entry(h->cmpQ.first, CommandList_struct, list);
removeQ(c);
c->err_info->CommandStatus = CMD_HARDWARE_ERR;

and the driver goes boom as c->err_info is not initialized.

This frequently happens if you're trying to do a kdump
while the system is doing I/O.
If you object to the removed WARN() I can easily put this
in, but without the fix there is a good chance that
kdump fails on cciss machines.

And note we can't do anything with the stale commands anyway,
as the context having sent the commands originally is long gone.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare [at] suse +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jens.axboe at oracle

Jul 2, 2009, 2:18 AM

Post #4 of 14 (395 views)
Permalink
Re: [PATCH] cciss: Ignore stale commands after reboot [In reply to]

On Thu, Jul 02 2009, Hannes Reinecke wrote:
> Jens Axboe wrote:
> > On Thu, Jul 02 2009, Hannes Reinecke wrote:
> >> When doing an unexpected shutdown like kexec the cciss
> >> firmware might still have some commands in flight, which
> >> it is trying to complete.
> >> The driver is doing it's best on resetting the HBA,
> >> but sadly there's a firmware issue causing the firmware
> >> _not_ to abort or drop old commands.
> >> So the firmware will send us commands which we haven't
> >> accounted for, causing the driver to panic.
> >>
> >> With this patch we're just ignoring these commands as
> >> there is nothing we could be doing with them anyway.
> >>
> >> Signed-off-by: Hannes Reinecke <hare [at] suse>
> >> ---
> >> drivers/block/cciss.c | 14 ++++++++++++--
> >> drivers/block/cciss_cmd.h | 1 +
> >> 2 files changed, 13 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
> >> index c7a527c..8dd4c0d 100644
> >> --- a/drivers/block/cciss.c
> >> +++ b/drivers/block/cciss.c
> >> @@ -226,7 +226,16 @@ static inline void addQ(struct hlist_head *list, CommandList_struct *c)
> >>
> >> static inline void removeQ(CommandList_struct *c)
> >> {
> >> - if (WARN_ON(hlist_unhashed(&c->list)))
> >> + /*
> >> + * After kexec/dump some commands might still
> >> + * be in flight, which the firmware will try
> >> + * to complete. Resetting the firmware doesn't work
> >> + * with old fw revisions, so we have to mark
> >> + * them off as 'stale' to prevent the driver from
> >> + * falling over.
> >> + */
> >> + if (unlikely(hlist_unhashed(&c->list))) {
> >> + c->cmd_type = CMD_MSG_STALE;
> >> return;
> >>
> >> hlist_del_init(&c->list);
> >
> > Ehm, that looks rather dangerous. What's the level of testing this patch
> > received?
> >
> Where is the danger here?

The danger is that the patch doesn't even compile :-)
At least it had the { at the end of the if, otherwise it would have been
insta-hang.


>
> With the original code we would be issuing a warning
> and return.
> But then we hit this codepath:
>
> while (!hlist_empty(&h->cmpQ)) {
> c = hlist_entry(h->cmpQ.first, CommandList_struct, list);
> removeQ(c);
> c->err_info->CommandStatus = CMD_HARDWARE_ERR;
>
> and the driver goes boom as c->err_info is not initialized.
>
> This frequently happens if you're trying to do a kdump
> while the system is doing I/O.
> If you object to the removed WARN() I can easily put this
> in, but without the fix there is a good chance that
> kdump fails on cciss machines.
>
> And note we can't do anything with the stale commands anyway,
> as the context having sent the commands originally is long gone.
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke zSeries & Storage
> hare [at] suse +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Markus Rex, HRB 16746 (AG Nürnberg)

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


hare at suse

Jul 2, 2009, 2:36 AM

Post #5 of 14 (385 views)
Permalink
[PATCH] cciss: Ignore stale commands after reboot [In reply to]

When doing an unexpected shutdown like kexec the cciss
firmware might still have some commands in flight, which
it is trying to complete.
The driver is doing it's best on resetting the HBA,
but sadly there's a firmware issue causing the firmware
_not_ to abort or drop old commands.
So the firmware will send us commands which we haven't
accounted for, causing the driver to panic.

With this patch we're just ignoring these commands as
there is nothing we could be doing with them anyway.

Signed-off-by: Hannes Reinecke <hare [at] suse>
---
drivers/block/cciss.c | 15 +++++++++++++--
drivers/block/cciss_cmd.h | 1 +
2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index c7a527c..65a0655 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -226,8 +226,18 @@ static inline void addQ(struct hlist_head *list, CommandList_struct *c)

static inline void removeQ(CommandList_struct *c)
{
- if (WARN_ON(hlist_unhashed(&c->list)))
+ /*
+ * After kexec/dump some commands might still
+ * be in flight, which the firmware will try
+ * to complete. Resetting the firmware doesn't work
+ * with old fw revisions, so we have to mark
+ * them off as 'stale' to prevent the driver from
+ * falling over.
+ */
+ if (WARN_ON(hlist_unhashed(&c->list))) {
+ c->cmd_type = CMD_MSG_STALE;
return;
+ }

hlist_del_init(&c->list);
}
@@ -4246,7 +4256,8 @@ static void fail_all_cmds(unsigned long ctlr)
while (!hlist_empty(&h->cmpQ)) {
c = hlist_entry(h->cmpQ.first, CommandList_struct, list);
removeQ(c);
- c->err_info->CommandStatus = CMD_HARDWARE_ERR;
+ if (c->cmd_type != CMD_MSG_STALE)
+ c->err_info->CommandStatus = CMD_HARDWARE_ERR;
if (c->cmd_type == CMD_RWREQ) {
complete_command(h, c, 0);
} else if (c->cmd_type == CMD_IOCTL_PEND)
diff --git a/drivers/block/cciss_cmd.h b/drivers/block/cciss_cmd.h
index cd665b0..dbaed1e 100644
--- a/drivers/block/cciss_cmd.h
+++ b/drivers/block/cciss_cmd.h
@@ -274,6 +274,7 @@ typedef struct _ErrorInfo_struct {
#define CMD_SCSI 0x03
#define CMD_MSG_DONE 0x04
#define CMD_MSG_TIMEOUT 0x05
+#define CMD_MSG_STALE 0xff

/* This structure needs to be divisible by 8 for new
* indexing method.
--
1.5.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


hare at suse

Jul 2, 2009, 2:36 AM

Post #6 of 14 (385 views)
Permalink
Re: [PATCH] cciss: Ignore stale commands after reboot [In reply to]

Jens Axboe wrote:
> On Thu, Jul 02 2009, Hannes Reinecke wrote:
>> Jens Axboe wrote:
>>> On Thu, Jul 02 2009, Hannes Reinecke wrote:
>>>> When doing an unexpected shutdown like kexec the cciss
>>>> firmware might still have some commands in flight, which
>>>> it is trying to complete.
>>>> The driver is doing it's best on resetting the HBA,
>>>> but sadly there's a firmware issue causing the firmware
>>>> _not_ to abort or drop old commands.
>>>> So the firmware will send us commands which we haven't
>>>> accounted for, causing the driver to panic.
>>>>
>>>> With this patch we're just ignoring these commands as
>>>> there is nothing we could be doing with them anyway.
>>>>
>>>> Signed-off-by: Hannes Reinecke <hare [at] suse>
>>>> ---
>>>> drivers/block/cciss.c | 14 ++++++++++++--
>>>> drivers/block/cciss_cmd.h | 1 +
>>>> 2 files changed, 13 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
>>>> index c7a527c..8dd4c0d 100644
>>>> --- a/drivers/block/cciss.c
>>>> +++ b/drivers/block/cciss.c
>>>> @@ -226,7 +226,16 @@ static inline void addQ(struct hlist_head *list, CommandList_struct *c)
>>>>
>>>> static inline void removeQ(CommandList_struct *c)
>>>> {
>>>> - if (WARN_ON(hlist_unhashed(&c->list)))
>>>> + /*
>>>> + * After kexec/dump some commands might still
>>>> + * be in flight, which the firmware will try
>>>> + * to complete. Resetting the firmware doesn't work
>>>> + * with old fw revisions, so we have to mark
>>>> + * them off as 'stale' to prevent the driver from
>>>> + * falling over.
>>>> + */
>>>> + if (unlikely(hlist_unhashed(&c->list))) {
>>>> + c->cmd_type = CMD_MSG_STALE;
>>>> return;
>>>>
>>>> hlist_del_init(&c->list);
>>> Ehm, that looks rather dangerous. What's the level of testing this patch
>>> received?
>>>
>> Where is the danger here?
>
> The danger is that the patch doesn't even compile :-)
> At least it had the { at the end of the if, otherwise it would have been
> insta-hang.
>
Bah. Should've said so.
That's what you get when you cut-n-paste from a different version.

OK, resending.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare [at] suse +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jens.axboe at oracle

Jul 2, 2009, 3:26 AM

Post #7 of 14 (384 views)
Permalink
Re: [PATCH] cciss: Ignore stale commands after reboot [In reply to]

On Thu, Jul 02 2009, Hannes Reinecke wrote:
> Jens Axboe wrote:
> > On Thu, Jul 02 2009, Hannes Reinecke wrote:
> >> Jens Axboe wrote:
> >>> On Thu, Jul 02 2009, Hannes Reinecke wrote:
> >>>> When doing an unexpected shutdown like kexec the cciss
> >>>> firmware might still have some commands in flight, which
> >>>> it is trying to complete.
> >>>> The driver is doing it's best on resetting the HBA,
> >>>> but sadly there's a firmware issue causing the firmware
> >>>> _not_ to abort or drop old commands.
> >>>> So the firmware will send us commands which we haven't
> >>>> accounted for, causing the driver to panic.
> >>>>
> >>>> With this patch we're just ignoring these commands as
> >>>> there is nothing we could be doing with them anyway.
> >>>>
> >>>> Signed-off-by: Hannes Reinecke <hare [at] suse>
> >>>> ---
> >>>> drivers/block/cciss.c | 14 ++++++++++++--
> >>>> drivers/block/cciss_cmd.h | 1 +
> >>>> 2 files changed, 13 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
> >>>> index c7a527c..8dd4c0d 100644
> >>>> --- a/drivers/block/cciss.c
> >>>> +++ b/drivers/block/cciss.c
> >>>> @@ -226,7 +226,16 @@ static inline void addQ(struct hlist_head *list, CommandList_struct *c)
> >>>>
> >>>> static inline void removeQ(CommandList_struct *c)
> >>>> {
> >>>> - if (WARN_ON(hlist_unhashed(&c->list)))
> >>>> + /*
> >>>> + * After kexec/dump some commands might still
> >>>> + * be in flight, which the firmware will try
> >>>> + * to complete. Resetting the firmware doesn't work
> >>>> + * with old fw revisions, so we have to mark
> >>>> + * them off as 'stale' to prevent the driver from
> >>>> + * falling over.
> >>>> + */
> >>>> + if (unlikely(hlist_unhashed(&c->list))) {
> >>>> + c->cmd_type = CMD_MSG_STALE;
> >>>> return;
> >>>>
> >>>> hlist_del_init(&c->list);
> >>> Ehm, that looks rather dangerous. What's the level of testing this patch
> >>> received?
> >>>
> >> Where is the danger here?
> >
> > The danger is that the patch doesn't even compile :-)
> > At least it had the { at the end of the if, otherwise it would have been
> > insta-hang.
> >
> Bah. Should've said so.

Sorry, just annoys me when people send out patches for inclusion that
don't even compile. It usually means that some other form of the patch
was tested and that this one hasn't even been run (obviously, since it
doesn't compile).

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


hare at suse

Jul 2, 2009, 3:28 AM

Post #8 of 14 (387 views)
Permalink
Re: [PATCH] cciss: Ignore stale commands after reboot [In reply to]

Jens Axboe wrote:
[ ... ]
>
> Sorry, just annoys me when people send out patches for inclusion that
> don't even compile. It usually means that some other form of the patch
> was tested and that this one hasn't even been run (obviously, since it
> doesn't compile).
>
Accepted.

Apologies.

Cheers,

Hannes

--
Dr. Hannes Reinecke zSeries & Storage
hare [at] suse +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jens.axboe at oracle

Jul 2, 2009, 12:00 PM

Post #9 of 14 (372 views)
Permalink
Re: [PATCH] cciss: Ignore stale commands after reboot [In reply to]

On Thu, Jul 02 2009, Hannes Reinecke wrote:
>
> When doing an unexpected shutdown like kexec the cciss
> firmware might still have some commands in flight, which
> it is trying to complete.
> The driver is doing it's best on resetting the HBA,
> but sadly there's a firmware issue causing the firmware
> _not_ to abort or drop old commands.
> So the firmware will send us commands which we haven't
> accounted for, causing the driver to panic.
>
> With this patch we're just ignoring these commands as
> there is nothing we could be doing with them anyway.

Looks good to me. Mike, Stephen?

>
> Signed-off-by: Hannes Reinecke <hare [at] suse>
> ---
> drivers/block/cciss.c | 15 +++++++++++++--
> drivers/block/cciss_cmd.h | 1 +
> 2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
> index c7a527c..65a0655 100644
> --- a/drivers/block/cciss.c
> +++ b/drivers/block/cciss.c
> @@ -226,8 +226,18 @@ static inline void addQ(struct hlist_head *list, CommandList_struct *c)
>
> static inline void removeQ(CommandList_struct *c)
> {
> - if (WARN_ON(hlist_unhashed(&c->list)))
> + /*
> + * After kexec/dump some commands might still
> + * be in flight, which the firmware will try
> + * to complete. Resetting the firmware doesn't work
> + * with old fw revisions, so we have to mark
> + * them off as 'stale' to prevent the driver from
> + * falling over.
> + */
> + if (WARN_ON(hlist_unhashed(&c->list))) {
> + c->cmd_type = CMD_MSG_STALE;
> return;
> + }
>
> hlist_del_init(&c->list);
> }
> @@ -4246,7 +4256,8 @@ static void fail_all_cmds(unsigned long ctlr)
> while (!hlist_empty(&h->cmpQ)) {
> c = hlist_entry(h->cmpQ.first, CommandList_struct, list);
> removeQ(c);
> - c->err_info->CommandStatus = CMD_HARDWARE_ERR;
> + if (c->cmd_type != CMD_MSG_STALE)
> + c->err_info->CommandStatus = CMD_HARDWARE_ERR;
> if (c->cmd_type == CMD_RWREQ) {
> complete_command(h, c, 0);
> } else if (c->cmd_type == CMD_IOCTL_PEND)
> diff --git a/drivers/block/cciss_cmd.h b/drivers/block/cciss_cmd.h
> index cd665b0..dbaed1e 100644
> --- a/drivers/block/cciss_cmd.h
> +++ b/drivers/block/cciss_cmd.h
> @@ -274,6 +274,7 @@ typedef struct _ErrorInfo_struct {
> #define CMD_SCSI 0x03
> #define CMD_MSG_DONE 0x04
> #define CMD_MSG_TIMEOUT 0x05
> +#define CMD_MSG_STALE 0xff
>
> /* This structure needs to be divisible by 8 for new
> * indexing method.
> --
> 1.5.3.2
>

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


Mike.Miller at hp

Jul 2, 2009, 12:58 PM

Post #10 of 14 (377 views)
Permalink
RE: Re: [PATCH] cciss: Ignore stale commands after reboot [In reply to]

> -----Original Message-----
> From: Andrew Morton [mailto:akpm [at] linux-foundation]
> Sent: Thursday, July 02, 2009 2:51 PM
> To: Miller, Mike (OS Dev)
> Subject: Fw: Re: [PATCH] cciss: Ignore stale commands after reboot
>
>
> oh, Jens already did it.
>
> Begin forwarded message:
>
> Date: Thu, 2 Jul 2009 21:00:49 +0200
> From: Jens Axboe <jens.axboe [at] oracle>
> To: Hannes Reinecke <hare [at] suse>
> Cc: scameron [at] beardog,
> linux-kernel [at] vger, mikem [at] beardog
> Subject: Re: [PATCH] cciss: Ignore stale commands after reboot
>
>
> On Thu, Jul 02 2009, Hannes Reinecke wrote:
> >
> > When doing an unexpected shutdown like kexec the cciss
> firmware might
> > still have some commands in flight, which it is trying to complete.
> > The driver is doing it's best on resetting the HBA, but
> sadly there's
> > a firmware issue causing the firmware _not_ to abort or drop old
> > commands.
> > So the firmware will send us commands which we haven't
> accounted for,
> > causing the driver to panic.
> >
> > With this patch we're just ignoring these commands as there
> is nothing
> > we could be doing with them anyway.
>
> Looks good to me. Mike, Stephen?

Sorry I haven't seen this before. The beardog addresses are no longer valid. We moved into a dungeon and into a new domain. The good folks in IT have yet to assign another IP address/domain name or an MX record for the mail servers. I hope that by next week that will be corrected. Until then all Steve and I have to use is some form of OutHouse mail client.

Acked-by: Mike Miller <mike.miller [at] hp>

>
> >
> > Signed-off-by: Hannes Reinecke <hare [at] suse>
> > ---
> > drivers/block/cciss.c | 15 +++++++++++++--
> > drivers/block/cciss_cmd.h | 1 +
> > 2 files changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c index
> > c7a527c..65a0655 100644
> > --- a/drivers/block/cciss.c
> > +++ b/drivers/block/cciss.c
> > @@ -226,8 +226,18 @@ static inline void addQ(struct
> hlist_head *list,
> > CommandList_struct *c)
> >
> > static inline void removeQ(CommandList_struct *c) {
> > - if (WARN_ON(hlist_unhashed(&c->list)))
> > + /*
> > + * After kexec/dump some commands might still
> > + * be in flight, which the firmware will try
> > + * to complete. Resetting the firmware doesn't work
> > + * with old fw revisions, so we have to mark
> > + * them off as 'stale' to prevent the driver from
> > + * falling over.
> > + */
> > + if (WARN_ON(hlist_unhashed(&c->list))) {
> > + c->cmd_type = CMD_MSG_STALE;
> > return;
> > + }
> >
> > hlist_del_init(&c->list);
> > }
> > @@ -4246,7 +4256,8 @@ static void fail_all_cmds(unsigned long ctlr)
> > while (!hlist_empty(&h->cmpQ)) {
> > c = hlist_entry(h->cmpQ.first,
> CommandList_struct, list);
> > removeQ(c);
> > - c->err_info->CommandStatus = CMD_HARDWARE_ERR;
> > + if (c->cmd_type != CMD_MSG_STALE)
> > + c->err_info->CommandStatus = CMD_HARDWARE_ERR;
> > if (c->cmd_type == CMD_RWREQ) {
> > complete_command(h, c, 0);
> > } else if (c->cmd_type == CMD_IOCTL_PEND) diff --git
> > a/drivers/block/cciss_cmd.h b/drivers/block/cciss_cmd.h index
> > cd665b0..dbaed1e 100644
> > --- a/drivers/block/cciss_cmd.h
> > +++ b/drivers/block/cciss_cmd.h
> > @@ -274,6 +274,7 @@ typedef struct _ErrorInfo_struct {
> > #define CMD_SCSI 0x03
> > #define CMD_MSG_DONE 0x04
> > #define CMD_MSG_TIMEOUT 0x05
> > +#define CMD_MSG_STALE 0xff
> >
> > /* This structure needs to be divisible by 8 for new
> > * indexing method.
> > --
> > 1.5.3.2
> >
>
> --
> Jens Axboe
>
> --
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in
> the body of a message to majordomo [at] vger
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
> --
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jens.axboe at oracle

Jul 2, 2009, 1:01 PM

Post #11 of 14 (371 views)
Permalink
Re: Re: [PATCH] cciss: Ignore stale commands after reboot [In reply to]

On Thu, Jul 02 2009, Miller, Mike (OS Dev) wrote:
>
>
> > -----Original Message-----
> > From: Andrew Morton [mailto:akpm [at] linux-foundation]
> > Sent: Thursday, July 02, 2009 2:51 PM
> > To: Miller, Mike (OS Dev)
> > Subject: Fw: Re: [PATCH] cciss: Ignore stale commands after reboot
> >
> >
> > oh, Jens already did it.
> >
> > Begin forwarded message:
> >
> > Date: Thu, 2 Jul 2009 21:00:49 +0200
> > From: Jens Axboe <jens.axboe [at] oracle>
> > To: Hannes Reinecke <hare [at] suse>
> > Cc: scameron [at] beardog,
> > linux-kernel [at] vger, mikem [at] beardog
> > Subject: Re: [PATCH] cciss: Ignore stale commands after reboot
> >
> >
> > On Thu, Jul 02 2009, Hannes Reinecke wrote:
> > >
> > > When doing an unexpected shutdown like kexec the cciss
> > firmware might
> > > still have some commands in flight, which it is trying to complete.
> > > The driver is doing it's best on resetting the HBA, but
> > sadly there's
> > > a firmware issue causing the firmware _not_ to abort or drop old
> > > commands.
> > > So the firmware will send us commands which we haven't
> > accounted for,
> > > causing the driver to panic.
> > >
> > > With this patch we're just ignoring these commands as there
> > is nothing
> > > we could be doing with them anyway.
> >
> > Looks good to me. Mike, Stephen?
>
> Sorry I haven't seen this before. The beardog addresses are no longer
> valid. We moved into a dungeon and into a new domain. The good folks
> in IT have yet to assign another IP address/domain name or an MX
> record for the mail servers. I hope that by next week that will be
> corrected. Until then all Steve and I have to use is some form of
> OutHouse mail client.
>
> Acked-by: Mike Miller <mike.miller [at] hp>

OK, I'll add the patch with your ack.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


Mike.Miller at hp

Jul 2, 2009, 1:06 PM

Post #12 of 14 (378 views)
Permalink
RE: Re: [PATCH] cciss: Ignore stale commands after reboot [In reply to]

Jens wrote:

> > Acked-by: Mike Miller <mike.miller [at] hp>
>
> OK, I'll add the patch with your ack.
>
> --
> Jens Axboe

Thanks, Jens.

>
> --
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


adbrunelle at gmail

Jul 6, 2009, 1:33 PM

Post #13 of 14 (349 views)
Permalink
Re: [PATCH] cciss: Ignore stale commands after reboot [In reply to]

Hannes Reinecke wrote:
> When doing an unexpected shutdown like kexec the cciss
> firmware might still have some commands in flight, which
> it is trying to complete.
> The driver is doing it's best on resetting the HBA,
> but sadly there's a firmware issue causing the firmware
> _not_ to abort or drop old commands.
> So the firmware will send us commands which we haven't
> accounted for, causing the driver to panic.
>
> With this patch we're just ignoring these commands as
> there is nothing we could be doing with them anyway.
>
> Signed-off-by: Hannes Reinecke <hare [at] suse>

Pardon my ignorance here, but don't you have a bigger problem: if the
reset is not dropping or aborting old commands, doesn't this also mean
that these old commands can still be _executing_? In which case any
(old) reads being executed could be scribbling over memory? (Memory that
may be being used for other purposes?)

Alan D. Brunelle

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


hare at suse

Jul 7, 2009, 12:34 AM

Post #14 of 14 (358 views)
Permalink
Re: [PATCH] cciss: Ignore stale commands after reboot [In reply to]

Hi Alan,

Alan D. Brunelle wrote:
> Hannes Reinecke wrote:
>> When doing an unexpected shutdown like kexec the cciss
>> firmware might still have some commands in flight, which
>> it is trying to complete.
>> The driver is doing it's best on resetting the HBA,
>> but sadly there's a firmware issue causing the firmware
>> _not_ to abort or drop old commands.
>> So the firmware will send us commands which we haven't
>> accounted for, causing the driver to panic.
>>
>> With this patch we're just ignoring these commands as
>> there is nothing we could be doing with them anyway.
>>
>> Signed-off-by: Hannes Reinecke <hare [at] suse>
>
> Pardon my ignorance here, but don't you have a bigger problem: if the
> reset is not dropping or aborting old commands, doesn't this also mean
> that these old commands can still be _executing_? In which case any
> (old) reads being executed could be scribbling over memory? (Memory that
> may be being used for other purposes?)
>
Yes and no.

This scenario is being observed whilst doing a kexec/kdump reboot.
IE a new kernel is started directly from the context of an
already running kernel, so there is a fair chance that IO is
still in flight.
In flight here means the kernel/driver has send the commands to the
firmware but not yet received a reply/completion to them.

So the kdump kernel boots and initializes the driver.
The driver itself tries to initializes the firmware, but due to the
abovementioned bug this initialization does _not_ clear out old
commands, so when the driver is up and running is receives
command completions.
But these completions are not associated with any commands the
driver has been sent, so we can as well drop them to the floor.
Which is what this patch is all about.

So yes, there is some sort of overwrite in the sense the 'old'
IO is being committed to disk by the time the new kernel starts.
But no, it doesn't really matter to us as we're starting out
with any operations only _after_ we have received these stale
IO.

HTH.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare [at] suse +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.