
Mailing List Archive: Linux: Kernel

[PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up


imammedo at redhat

May 9, 2012, 2:20 AM

Post #1 of 10
[PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up

When bringing up cpuX1, it can stall in start_secondary() for more than
5 seconds before setting its bit in cpu_callin_mask. That forces
do_boot_cpu() to give up waiting and take the error return path,
printing one of:
    pr_err("CPU%d: Stuck ??\n", cpuX1);
or
    pr_err("CPU%d: Not responding.\n", cpuX1);
and native_cpu_up() returns early with -EIO. However, the AP may continue
its boot process until it reaches check_tsc_sync_target(), where
it waits for the boot cpu to run cpu_up...=>check_tsc_sync_source.
That never happens, since cpu_up has already returned with an error.
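
To make the timing concrete, here is a minimal userspace sketch of the
wait described above. It is not kernel code: simulate_bringup(), struct
bringup_outcome and the millisecond parameters are hypothetical names
chosen for illustration, and the only thing taken from the description
is that the boot cpu stops polling after a fixed timeout without
telling the AP.

```c
#include <stdbool.h>

/* Hypothetical model of one bring-up attempt; not a kernel structure. */
struct bringup_outcome {
    bool boot_cpu_saw_callin; /* boot cpu observed the callin bit in time */
    bool ap_left_waiting;     /* AP keeps booting with no partner waiting for it */
};

/*
 * ap_callin_ms: time at which the AP sets its callin bit.
 * timeout_ms:   how long the boot cpu polls (5000 in the description above).
 * poll_ms:      polling interval of the boot cpu (assumed value).
 */
static struct bringup_outcome simulate_bringup(unsigned ap_callin_ms,
                                               unsigned timeout_ms,
                                               unsigned poll_ms)
{
    struct bringup_outcome out = { false, false };

    if (poll_ms == 0)
        poll_ms = 1; /* avoid an endless loop in the model */

    for (unsigned now = 0; now <= timeout_ms; now += poll_ms) {
        if (now >= ap_callin_ms) {
            /* The AP called in while the boot cpu was still polling. */
            out.boot_cpu_saw_callin = true;
            return out;
        }
    }

    /*
     * Timeout: the boot cpu returns an error, but nothing in this model
     * (or in the described code) stops the AP, so it is left waiting
     * for a TSC sync partner that never arrives.
     */
    out.ap_left_waiting = true;
    return out;
}
```

Calling simulate_bringup(6000, 5000, 100) models the failure case: the
boot cpu gives up at 5 seconds and the AP, which calls in at 6 seconds,
is left waiting.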

Note that cpuX1 is marked as active in smp_callin() before it gets stuck
in check_tsc_sync_target(). When another cpu, cpuX2, is then being
onlined, start_secondary on it calls
    smp_callin
    -> smp_store_cpu_info
    -> identify_secondary_cpu
    -> mtrr_ap_init
    -> set_mtrr_from_inactive_cpu
    -> stop_machine_from_inactive_cpu
where it schedules stop_machine work on all ACTIVE cpus
    smdata.num_threads = num_active_cpus() + 1;
and waits until they all complete it before continuing. As noted
above, cpuX1 is marked as active but cannot execute any work, since
it has not completed initialization and is stuck in check_tsc_sync_target().
As a result, the system soft-locks up in stop_machine_cpu_stop.
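
The counting problem can be shown with a small userspace model. This is
a sketch under stated assumptions, not the stop_machine implementation:
struct cpu_model and stop_machine_would_complete() are hypothetical
names, and the model only captures the rule quoted above (the expected
thread count is the number of active cpus plus the caller).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cpu state for the model; not a kernel structure. */
struct cpu_model {
    bool active;       /* bit set in the active mask */
    bool runs_stopper; /* cpu is actually able to run queued stopper work */
};

/*
 * Returns true if every thread the caller waits for would acknowledge.
 * A false result models the hang: the caller spins forever waiting for
 * a cpu that is counted as active but never runs its stopper work.
 */
static bool stop_machine_would_complete(const struct cpu_model *cpus, size_t n)
{
    size_t expected = 1; /* the calling, not yet active, cpu itself */
    size_t acked = 1;    /* the caller always runs its own part */

    for (size_t i = 0; i < n; i++) {
        if (!cpus[i].active)
            continue; /* inactive cpus are not counted at all */
        expected++;
        if (cpus[i].runs_stopper)
            acked++;
    }
    return acked == expected;
}
```

With one healthy cpu and one cpu that is active but stuck, the function
returns false. If the stuck cpu is never marked active, it returns true,
which is the effect the patch is after.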

backtrace from reproducer:

PID: 3324 TASK: ffff88007c00ae20 CPU: other cpus COMMAND: "migration/1"
[exception RIP: stop_machine_cpu_stop+131]
...
#0 [ffff88007b4d7de8] cpu_stopper_thread at ffffffff810c66bd
#1 [ffff88007b4d7ee8] kthread at ffffffff8107871e
#2 [ffff88007b4d7f48] kernel_thread_helper at ffffffff8154af24

PID: 0 TASK: ffff88007c029710 CPU: 2 COMMAND: "swapper/2"
[exception RIP: check_tsc_sync_target+33]
...
#0 [ffff88007c025f30] start_secondary at ffffffff81539876

PID: 0 TASK: ffff88007c041710 CPU: 3 COMMAND: "swapper/3"
[exception RIP: stop_machine_cpu_stop+131]
...
#0 [ffff88007c04be50] stop_machine_from_inactive_cpu at ffffffff810c6b2f
#1 [ffff88007c04bee0] mtrr_ap_init at ffffffff8102e963
#2 [ffff88007c04bf10] identify_secondary_cpu at ffffffff81536799
#3 [ffff88007c04bf20] smp_store_cpu_info at ffffffff815396d5
#4 [ffff88007c04bf30] start_secondary at ffffffff81539800

Could be fixed by not marking being onlined cpu as active too early.

Signed-off-by: Igor Mammedov <imammedo [at] redhat>
---
arch/x86/kernel/smpboot.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6e1e406..ae19d90 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -232,8 +232,6 @@ static void __cpuinit smp_callin(void)
 	set_cpu_sibling_map(raw_smp_processor_id());
 	wmb();
 
-	notify_cpu_starting(cpuid);
-
 	/*
 	 * Allow the master to continue.
 	 */
@@ -268,6 +266,8 @@ notrace static void __cpuinit start_secondary(void *unused)
 	 */
 	check_tsc_sync_target();
 
+	notify_cpu_starting(smp_processor_id());
+
 	/*
 	 * We need to hold call_lock, so there is no inconsistency
 	 * between the time smp_call_function() determines number of
--
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


shuahkhan at gmail

May 9, 2012, 8:04 AM

Post #2 of 10
Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up

On Wed, 2012-05-09 at 12:24 +0200, Igor Mammedov wrote:
> When bringing up cpuX1, it could stall in start_secondary
> before setting cpu_callin_mask for more than 5 sec. That forces
> do_boot_cpu() to give up on waiting and go to error return path
> printing messages:
> pr_err("CPU%d: Stuck ??\n", cpuX1);

I am seeing this with the linux-next May 7th build on my laptop HP
EliteBook 8440p during boot. Could this problem be not specific to virt
envs.? Anybody else seeing it?

-- Shuah




imammedo at redhat

May 9, 2012, 8:22 AM

Post #3 of 10
Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up

On 05/09/2012 05:04 PM, Shuah Khan wrote:
> On Wed, 2012-05-09 at 12:24 +0200, Igor Mammedov wrote:
>> When bringing up cpuX1, it could stall in start_secondary
>> before setting cpu_callin_mask for more than 5 sec. That forces
>> do_boot_cpu() to give up on waiting and go to error return path
>> printing messages:
>> pr_err("CPU%d: Stuck ??\n", cpuX1);
>
> I am seeing this with the linux-next May 7th build on my laptop HP
> EliteBook 8440p during boot. Could this problem be not specific to virt
> envs.? Anybody else seeing it?

I can only guess that on bare metal an SMI or some other firmware issue may interfere with AP boot.
Could you test whether this patch set fixes the issue for you?
And do you see the same problem during suspend/resume (assuming that it booted without a problem)?

--
-----
Igor


shuahkhan at gmail

May 9, 2012, 8:34 AM

Post #4 of 10
Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up

On Wed, 2012-05-09 at 17:22 +0200, Igor Mammedov wrote:
> On 05/09/2012 05:04 PM, Shuah Khan wrote:
> > On Wed, 2012-05-09 at 12:24 +0200, Igor Mammedov wrote:
> >> When bringing up cpuX1, it could stall in start_secondary
> >> before setting cpu_callin_mask for more than 5 sec. That forces
> >> do_boot_cpu() to give up on waiting and go to error return path
> >> printing messages:
> >> pr_err("CPU%d: Stuck ??\n", cpuX1);
> >
> > I am seeing this with the linux-next May 7th build on my laptop HP
> > EliteBook 8440p during boot. Could this problem be not specific to virt
> > envs.? Anybody else seeing it?
>
> I could only guess that on bare metal SMI or other firmware issue may interfere with AP boot.
> Could you test if this patch-set fixes issue for you?
> And do you see the same problem during suspend/resume (assuming that it's booted without problem)?
>

I had to abandon the boot and go back to distro installed kernel - yes I
will test with your patch set and see if the problem goes away. Might
not happen until close to end of day today, but will report back.

-- Shuah




shuahkhan at gmail

May 10, 2012, 8:26 AM

Post #5 of 10
Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up

> > I could only guess that on bare metal SMI or other firmware issue may interfere with AP boot.
> > Could you test if this patch-set fixes issue for you?
> > And do you see the same problem during suspend/resume (assuming that it's booted without problem)?
> >
>
> I had to abandon the boot and go back to distro installed kernel - yes I
> will test with your patch set and see if the problem goes away. Might
> not happen until close to end of day today, but will report back.

These patches failed to apply on linux-next May 9th. Couldn't get the
testing done as planned.

-- Shuah



imammedo at redhat

May 10, 2012, 9:29 AM

Post #6 of 10
Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up

just cloned git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
last commit 0bed306

All 5 patches applied cleanly, maybe we are talking about different linux-next trees?


----- Original Message -----
> From: "Shuah Khan" <shuahkhan [at] gmail>
> To: "Igor Mammedov" <imammedo [at] redhat>
> Cc: shuahkhan [at] gmail, linux-kernel [at] vger, rob [at] landley, tglx [at] linutronix, mingo [at] redhat,
> hpa [at] zytor, x86 [at] kernel, luto [at] mit, "suresh b siddha" <suresh.b.siddha [at] intel>, avi [at] redhat, "a p
> zijlstra" <a.p.zijlstra [at] chello>, johnstul [at] us, arjan [at] linux, linux-doc [at] vger
> Sent: Thursday, May 10, 2012 5:26:24 PM
> Subject: Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up
>
>
> > > I could only guess that on bare metal SMI or other firmware issue
> > > may interfere with AP boot.
> > > Could you test if this patch-set fixes issue for you?
> > > And do you see the same problem during suspend/resume (assuming
> > > that it's booted without problem)?
> > >
> >
> > I had to abandon the boot and go back to distro installed kernel -
> > yes I
> > will test with your patch set and see if the problem goes away.
> > Might
> > not happen until close to end of day today, but will report back.
>
> These patches failed to apply on linux-next May 9th. Couldn't get the
> testing done as planned.
>
> -- Shuah
>
>


shuahkhan at gmail

May 10, 2012, 9:38 AM

Post #7 of 10
Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up

On Thu, 2012-05-10 at 12:29 -0400, Igor Mammedov wrote:
> just cloned git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> last commit 0bed306
>
> All 5 patches applied cleanly, maybe we are talking about different linux-next trees?

The same git from May 8th. The commit id I have is
407655a15be465ca284a68843e66c6fe7decf4bc. Let me try again.

-- Shuah



tglx at linutronix

May 11, 2012, 4:45 AM

Post #8 of 10
Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up

On Wed, 9 May 2012, Igor Mammedov wrote:

> When bringing up cpuX1, it could stall in start_secondary
> before setting cpu_callin_mask for more than 5 sec. That forces
> do_boot_cpu() to give up on waiting and go to error return path
> printing messages:
> pr_err("CPU%d: Stuck ??\n", cpuX1);
> or
> pr_err("CPU%d: Not responding.\n", cpuX1);
> and native_cpu_up returns early with -EIO. However AP may continue
> its boot process till it reaches check_tsc_sync_target(), where
> it will wait for boot cpu to run cpu_up...=>check_tsc_sync_source.
> That will never happen since cpu_up have returned with error before.
>
> Now we need to note that cpuX1 is marked as active in smp_callin
> before it stuck in check_tsc_sync_target. And when another cpuX2
> is being onlined, start_secondary on it will call
> smp_callin
> -> smp_store_cpu_info
> -> identify_secondary_cpu
> -> mtrr_ap_init
> -> set_mtrr_from_inactive_cpu
> -> stop_machine_from_inactive_cpu
> where it's going to schedule stop_machine work on all ACTIVE cpus
> smdata.num_threads = num_active_cpus() + 1;
> and wait till they all complete it before continuing. As was noted
> before cpuX1 was marked as active but can't execute any work since
> it's not completed initialization and stuck in check_tsc_sync_target.
> As result system soft lockups in stop_machine_cpu_stop.
>
> backtrace from reproducer:
>
> PID: 3324 TASK: ffff88007c00ae20 CPU: other cpus COMMAND: "migration/1"
> [exception RIP: stop_machine_cpu_stop+131]
> ...
> #0 [ffff88007b4d7de8] cpu_stopper_thread at ffffffff810c66bd
> #1 [ffff88007b4d7ee8] kthread at ffffffff8107871e
> #2 [ffff88007b4d7f48] kernel_thread_helper at ffffffff8154af24
>
> PID: 0 TASK: ffff88007c029710 CPU: 2 COMMAND: "swapper/2"
> [exception RIP: check_tsc_sync_target+33]
> ...
> #0 [ffff88007c025f30] start_secondary at ffffffff81539876
>
> PID: 0 TASK: ffff88007c041710 CPU: 3 COMMAND: "swapper/3"
> [exception RIP: stop_machine_cpu_stop+131]
> ...
> #0 [ffff88007c04be50] stop_machine_from_inactive_cpu at ffffffff810c6b2f
> #1 [ffff88007c04bee0] mtrr_ap_init at ffffffff8102e963
> #2 [ffff88007c04bf10] identify_secondary_cpu at ffffffff81536799
> #3 [ffff88007c04bf20] smp_store_cpu_info at ffffffff815396d5
> #4 [ffff88007c04bf30] start_secondary at ffffffff81539800
>
> Could be fixed by not marking being onlined cpu as active too early.

This explanation is completely useless. What's fixed by what. And is
it fixed or could it be fixed?

This also wants an explanation why moving the cpu_starting notifier
does not hurt any assumptions of the code which has notifiers
registered for CPU_STARTING. In fact your change can result in
CPU_ONLINE notifier being called _BEFORE_ CPU_STARTING. Do you really
think that's correct?

Aside of that your whole patch series tackles the wrong aspect.

Why the heck do you need extra magic in check_tsc_sync_target() ?

If the booting CPU fails to set the callin map within 5 seconds then
it should not even reach check_tsc_sync_target() at all.

And just for the record, the new CPU can run into the very same
timeout problem, when the boot CPU fails to set the callout mask.

This whole stuff is a complete trainwreck already and I don't want to
see anything like your "fixing the symptoms" hackery near it, really.

This whole stuff needs a proper rewrite and not some more braindamaged
bandaids. And if we apply bandaids for the time being, then certainly
not bandaids like the mess you created.

Thanks,

tglx


imammedo at redhat

May 11, 2012, 8:16 AM

Post #9 of 10
Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up

On 05/11/2012 01:45 PM, Thomas Gleixner wrote:
> On Wed, 9 May 2012, Igor Mammedov wrote:
>
>> When bringing up cpuX1, it could stall in start_secondary
>> before setting cpu_callin_mask for more than 5 sec. That forces
>> do_boot_cpu() to give up on waiting and go to error return path
>> printing messages:
>> pr_err("CPU%d: Stuck ??\n", cpuX1);
>> or
>> pr_err("CPU%d: Not responding.\n", cpuX1);
>> and native_cpu_up returns early with -EIO. However AP may continue
>> its boot process till it reaches check_tsc_sync_target(), where
>> it will wait for boot cpu to run cpu_up...=>check_tsc_sync_source.
>> That will never happen since cpu_up have returned with error before.
>>
>> Now we need to note that cpuX1 is marked as active in smp_callin
>> before it stuck in check_tsc_sync_target. And when another cpuX2
>> is being onlined, start_secondary on it will call
>> smp_callin
>> -> smp_store_cpu_info
>> -> identify_secondary_cpu
>> -> mtrr_ap_init
>> -> set_mtrr_from_inactive_cpu
>> -> stop_machine_from_inactive_cpu
>> where it's going to schedule stop_machine work on all ACTIVE cpus
>> smdata.num_threads = num_active_cpus() + 1;
>> and wait till they all complete it before continuing. As was noted
>> before cpuX1 was marked as active but can't execute any work since
>> it's not completed initialization and stuck in check_tsc_sync_target.
>> As result system soft lockups in stop_machine_cpu_stop.
>>
>> backtrace from reproducer:
>>
>> PID: 3324 TASK: ffff88007c00ae20 CPU: other cpus COMMAND: "migration/1"
>> [exception RIP: stop_machine_cpu_stop+131]
>> ...
>> #0 [ffff88007b4d7de8] cpu_stopper_thread at ffffffff810c66bd
>> #1 [ffff88007b4d7ee8] kthread at ffffffff8107871e
>> #2 [ffff88007b4d7f48] kernel_thread_helper at ffffffff8154af24
>>
>> PID: 0 TASK: ffff88007c029710 CPU: 2 COMMAND: "swapper/2"
>> [exception RIP: check_tsc_sync_target+33]
>> ...
>> #0 [ffff88007c025f30] start_secondary at ffffffff81539876
>>
>> PID: 0 TASK: ffff88007c041710 CPU: 3 COMMAND: "swapper/3"
>> [exception RIP: stop_machine_cpu_stop+131]
>> ...
>> #0 [ffff88007c04be50] stop_machine_from_inactive_cpu at ffffffff810c6b2f
>> #1 [ffff88007c04bee0] mtrr_ap_init at ffffffff8102e963
>> #2 [ffff88007c04bf10] identify_secondary_cpu at ffffffff81536799
>> #3 [ffff88007c04bf20] smp_store_cpu_info at ffffffff815396d5
>> #4 [ffff88007c04bf30] start_secondary at ffffffff81539800
>>
>> Could be fixed by not marking being onlined cpu as active too early.
>
> This explanation is completely useless. What's fixed by what. And is
> it fixed or could it be fixed?
What's fixed:
the above-mentioned hang in stop_machine_from_inactive_cpu(). Even if
a cpu fails to set cpu_callin_mask in time and the boot cpu marks it as
not present and removes it from some maps, with this move the failed cpu
won't set cpu_active_mask before it completes check_tsc_sync_target().
And with patches [2,3,4]/5 it will not set cpu_active_mask at all,
making itself unavailable to the rest of the kernel.

> This also want's an explanation why moving the cpu_starting notifier
> does not hurt any assumptions of the code which has notifiers
> registered for CPU_STARTING.
I checked the in-kernel users [sched, kvm, pmu] before moving it here.
It looked safe. However, I might have missed something.

> In fact your change can result in
> CPU_ONLINE notifier being called _BEFORE_ CPU_STARTING. Do you really
> think that's correct?
That certainly is not correct; it calls for a barrier after
cpu_starting and before setting cpu_online_mask.

> Aside of that your whole patch series tackles the wrong aspect.
The patch series tries to prevent a cpu that failed to boot from
wreaking havoc on the running kernel. How wrong is that?
What should be fixed instead?

> Why the heck do you need extra magic in check_tsc_sync_target() ?
Because it's plainly racy. patch 2/5 describes/fixes race condition in
check_tsc_sync_target().

> If the booting CPU fails to set the callin map within 5 seconds then
> it should not even reach check_tsc_sync_target() at all.
Why shouldn't it reach check_tsc_sync_target() at all? There is nothing
that prevents it or guarantees such behavior.
For example, it happens when the kernel is running inside a guest on an overloaded host.
And it seems to happen on bare metal as well: https://lkml.org/lkml/2012/5/9/336

> And just for the record, the new CPU can run into the very same
> timeout problem, when the boot CPU fails to set the callout mask.
Yes, it can.
I've tried to fix only what was reproducible on my test system, so I
haven't touched this.
That might result in panic in smp_callin():
panic("%s: CPU%d started up but did not get a callout!\n"

> This whole stuff is a complete trainwreck already and I don't want to
> see anything like your "fixing the symptoms" hackery near it, really.
Fixing a slow-to-respond cpu might not be an option, so we need to gracefully
abort the failed cpu_online operation instead of hanging in stop_machine or
crashing in the scheduler [https://lkml.org/lkml/2012/5/9/137].

>
> This whole stuff needs a proper rewrite and not some more braindamaged
> bandaids. And if we apply bandaids for the time being, then certainly
> not bandaids like the mess you created.
A rewrite will need to deal with a cpu that fails to boot in time as well.
So if the rewrite is not near completion, then maybe bandaids are needed
for the time being.
Any ideas/suggestions for "right bandaids" instead of braindamaged ones?

> Thanks,
>
> tglx


--
-----
Thanks,
Igor


tglx at linutronix

May 11, 2012, 2:14 PM

Post #10 of 10
Re: [PATCH 1/5] Fix soft-lookup in stop machine on secondary cpu bring up

On Fri, 11 May 2012, Igor Mammedov wrote:
> On 05/11/2012 01:45 PM, Thomas Gleixner wrote:
> > In fact your change can result in
> > CPU_ONLINE notifier being called _BEFORE_ CPU_STARTING. Do you really
> > think that's correct?
>
> That's certainly is not correct, it asks for a barrier after
> cpu_starting and before setting cpu_online_mask.

Ah, a barrier will solve that. Interesting approach.

> > Aside of that your whole patch series tackles the wrong aspect.
>
> patch series tries to prevent a failed to boot cpu wreck havoc on
> running kernel. How wrong is that?

Emphasis on "tries". And as I explained before you are fixing stuff at
the wrong place.

> What should be fixed instead?

If that timeout happens, then prevent that the following code is
reached, perhaps ?

> > Why the heck do you need extra magic in check_tsc_sync_target() ?
> Because it's plainly racy. patch 2/5 describes/fixes race condition in
> check_tsc_sync_target().

Crap. You do not understand at all. If the code which is before
check_tsc_sync_target() is failing then check_tsc_sync_target() should
not be called at all. It's that simple. Putting weird ass checks into
that code is simply the wrong solution.

> > If the booting CPU fails to set the callin map within 5 seconds then
> > it should not even reach check_tsc_sync_target() at all.
>
> Why it shouldn't reach check_tsc_sync_target () at all. There is nothing
> that prevents it and guaranties such behavior.

That's the whole fcking point. The code is missing which prevents that
and instead of hacking that crap into check_tsc_sync_target() we need
to add that what's missing.

> > And just for the record, the new CPU can run into the very same
> > timeout problem, when the boot CPU fails to set the callout mask.
>
> Yes, it can.
> I've tried to fix only what was reproducible on my test system, so I
> haven't touched this.
> That might result in panic in smp_callin():
> panic("%s: CPU%d started up but did not get a callout!\n"

I know that. And it's fucking wrong and I don't care whether you are
only fixing what's reproducible on your test system. If we touch that
code for that purpose then we better touch it so it's correct in all
aspects and not in some "this fixes my esoteric problem" approach.

> > This whole stuff is a complete trainwreck already and I don't want to
> > see anything like your "fixing the symptoms" hackery near it, really.
>
> Fixing slow to respond cpu might be not option, so we need to gracefully
> abort failed cpu_online operation instead of hanging in stop_machine or
> crashing in scheduler[https://lkml.org/lkml/2012/5/9/137].

I'm tired of your symptom links. You are simply not understanding the
scope of the problem and you just try to fix it so your testing
failures go away.

> > This whole stuff needs a proper rewrite and not some more braindamaged
> > bandaids. And if we apply bandaids for the time being, then certainly
> > not bandaids like the mess you created.
>
> Rewrite will need to deal with failed to boot in time cpu as well.

Really? Thanks for the hint, didn't know that.

> So if rewrite is not near completion, then maybe for a time being bandaids
> would be needed.

As I said, I don't object against proper bandaids, but I object
against the hackery you provided.

> Any ideas/suggestions for "right bandaids" instead of braindamaged ones?

Maybe start to think about my answers instead of blindly repeating
your observations of symptoms and praising your symptom cures.

Even in bandaid mode we can fix that behaviour by putting proper
synchronization into the right points, instead of hacking weird
failure handling into code which should never be affected by that.
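
One possible reading of "proper synchronization at the right points",
given only as a hedged userspace sketch and not as the change that was
eventually made to the kernel: both sides race on a single state word,
so either the new cpu calls in or the boot cpu aborts, never both. The
names bringup_handshake, ap_try_callin() and boot_cpu_try_abort() are
hypothetical, and C11 atomics stand in for whatever primitives the
kernel would use.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical states of one bring-up attempt. */
enum bringup_state {
    BRINGUP_PENDING,   /* boot cpu is waiting, AP has not called in yet */
    BRINGUP_CALLED_IN, /* AP won the race and may continue booting */
    BRINGUP_ABORTED    /* boot cpu timed out first; AP must not continue */
};

struct bringup_handshake {
    atomic_int state;
};

static void handshake_init(struct bringup_handshake *h)
{
    atomic_init(&h->state, BRINGUP_PENDING);
}

/*
 * AP side: returns true only if the AP is allowed to go on to later
 * steps (such as TSC sync). After an abort it returns false and the
 * AP should park itself instead of continuing.
 */
static bool ap_try_callin(struct bringup_handshake *h)
{
    int expected = BRINGUP_PENDING;
    return atomic_compare_exchange_strong(&h->state, &expected,
                                          BRINGUP_CALLED_IN);
}

/*
 * Boot cpu side, called when its timeout expires: returns true if the
 * abort took effect. A false result means the AP called in just before
 * the timeout, so the boot cpu should carry on with the bring-up.
 */
static bool boot_cpu_try_abort(struct bringup_handshake *h)
{
    int expected = BRINGUP_PENDING;
    return atomic_compare_exchange_strong(&h->state, &expected,
                                          BRINGUP_ABORTED);
}
```

Because both transitions start from BRINGUP_PENDING and use
compare-and-exchange, exactly one of them can succeed, so code after
the callin point is never reached by a cpu whose bring-up has already
been abandoned.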

Thanks,

tglx