Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

[PATCH 4/6] hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events

 

 

First page Previous page 1 2 Next page Last page  View All Linux kernel RSS feed   Index | Next | Previous | View Threaded


paulus at samba

Nov 7, 2009, 2:03 AM

Post #26 of 29 (26 views)
Permalink
Re: [PATCH 4/6] hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events [In reply to]

Frederic Weisbecker writes:

> On Thu, Nov 05, 2009 at 10:59:44AM +1100, Paul Mackerras wrote:
> > What I haven't managed to understand yet is how you provide reliable
> > breakpoints for debugging purposes. If I'm debugging a program and I
> > have set a breakpoint, I'll be very unhappy if the breakpoint should
> > trigger but doesn't because the perf_event infrastructure has decided
> > it can't schedule that breakpoint in. If the breakpoint isn't going
> > to work then I want to know that at the time that I set it.
>
>
>
> That won't happen because of the set of constraints we have.
> We never overcommit the debug register resources, except in
> the case of non-pinned counter, but that's in their nature :)

Suppose you have 4 breakpoint registers per cpu and there are two
pinned per-cpu breakpoint events, three non-pinned per-cpu breakpoint
events, and one pinned per-task breakpoint event. I believe your
constraints will allow that situation.

What will happen is that the two pinned per-cpu breakpoint events will
use two of the hardware registers, and the three non-pinned per-cpu
breakpoint events will get round-robined onto the other two hardware
registers. The per-task breakpoint will never get to use a hardware
register, because the code in perf_event.c schedules per-cpu events
before it schedules per-task events (see for example
perf_event_task_tick()).

We will have to make the event scheduling in kernel/perf_event.c a bit
more sophisticated before we can guarantee that a pinned breakpoint
event will always get to use a hardware register.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


fweisbec at gmail

Nov 7, 2009, 11:52 AM

Post #27 of 29 (22 views)
Permalink
Re: [PATCH 4/6] hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events [In reply to]

On Sat, Nov 07, 2009 at 09:03:00PM +1100, Paul Mackerras wrote:
> Frederic Weisbecker writes:
>
> > On Thu, Nov 05, 2009 at 10:59:44AM +1100, Paul Mackerras wrote:
> > > What I haven't managed to understand yet is how you provide reliable
> > > breakpoints for debugging purposes. If I'm debugging a program and I
> > > have set a breakpoint, I'll be very unhappy if the breakpoint should
> > > trigger but doesn't because the perf_event infrastructure has decided
> > > it can't schedule that breakpoint in. If the breakpoint isn't going
> > > to work then I want to know that at the time that I set it.
> >
> >
> >
> > That won't happen because of the set of constraints we have.
> > We never overcommit the debug register resources, except in
> > the case of non-pinned counter, but that's in their nature :)
>
> Suppose you have 4 breakpoint registers per cpu and there are two
> pinned per-cpu breakpoint events, three non-pinned per-cpu breakpoint
> events, and one pinned per-task breakpoint event. I believe your
> constraints will allow that situation.
>
> What will happen is that the two pinned per-cpu breakpoint events will
> use two of the hardware registers, and the three non-pinned per-cpu
> breakpoint events will get round-robined onto the other two hardware
> registers. The per-task breakpoint will never get to use a hardware
> register, because the code in perf_event.c schedules per-cpu events
> before it schedules per-task events (see for example
> perf_event_task_tick()).



Oh! :-(


> We will have to make the event scheduling in kernel/perf_event.c a bit
> more sophisticated before we can guarantee that a pinned breakpoint
> event will always get to use a hardware register.
>
> Paul.


Ok, so the only solution for now (a part from fixing that into perf) is to
consider the non-pinned events as being pinned in the constraints.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


prasad at linux

Nov 8, 2009, 9:32 AM

Post #28 of 29 (21 views)
Permalink
Re: [PATCH 4/6] hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events [In reply to]

On Thu, Nov 05, 2009 at 10:06:55PM +0100, Frederic Weisbecker wrote:
> On Thu, Nov 05, 2009 at 09:04:04PM +0530, K.Prasad wrote:
> > On Tue, Nov 03, 2009 at 08:11:12PM +0100, Frederic Weisbecker wrote:
> > > +/*
> > > + * Kernel breakpoints are not associated with any particular thread.
> > > + */
> > > +extern struct perf_event *
> > > +register_wide_hw_breakpoint_cpu(unsigned long addr,
> > > + int len,
> > > + int type,
> > > + perf_callback_t triggered,
> > > + int cpu,
> >
> > Can't it be cpumask_t instead of int cpu? Given that per-cpu breakpoints
> > will be implemented, it should be very different to implement them for a
> > subset of cpus too.
>
> I can't figure out any usecase where we want to only bind to,
> say, cpu 1 and 3 or any kind of such strange combination.
>
> Either we want a wide breakpoint, or we want to profile
> a single cpu, but I don't imagine we need a middle case.

When we originally had this discussion on LKML, one of the use-cases
cited was http://lkml.org/lkml/2009/7/29/243. I can't see why such
need should be restricted to a given CPU only, rather than a subset of
CPUs (say 'x' is a variable normally read/written-to in the interrupt
path, and if the said interrupt is has a cpu affinity to a subset of
cpus only).

Although in the normal case, this feature could be implemented later, in
case of breakpoints we accept that as input from the user (and hence
part of the well-defined interface), so it is better to design it for
a subset of CPUs from start. The logic isn't very different and given that
there are plenty of helper routines in cpumask.h the implementation is easy too.

>
> > > -static DEFINE_SPINLOCK(hw_breakpoint_lock);
> >
> > Wouldn't you want to hold this lock while checking for system-wide
> > availability of debug registers (during rollbacks) to avoid contenders
> > from checking for the same simultaneously?
>
>
> If we want to lock such path, we probably more likely want a mutex.
> Registering a breakpoint is not a fastpath and also perf does
> some sleepable things while creating a counter.
>
> The check to register constraints, which is part of this path,
> is itself a mutex.
>
> But we'll probably need something NMI safe in the future so
> that it can be used without any problem by kgdb.
>

I suspect that it will be required for cpu-hotplug handler, where
previously load_debug_registers() was called from a softirq context.

>
> > <snipped>
> >
> > > -int register_kernel_hw_breakpoint(struct hw_breakpoint *bp)
> > > +struct perf_event **
> > > +register_wide_hw_breakpoint(unsigned long addr,
> > > + int len,
> > > + int type,
> > > + perf_callback_t triggered,
> > > + bool active)
> > > {
> > > - int rc;
> > > + struct perf_event **cpu_events, **pevent, *bp;
> > > + long err;
> > > + int cpu;
> > > +
> > > + cpu_events = alloc_percpu(typeof(*cpu_events));
> > > + if (!cpu_events)
> > > + return ERR_PTR(-ENOMEM);
> > >
> > > - rc = arch_validate_hwbkpt_settings(bp, NULL);
> > > - if (rc)
> > > - return rc;
> > > + for_each_possible_cpu(cpu) {
> > > + pevent = per_cpu_ptr(cpu_events, cpu);
> > > + bp = register_kernel_hw_breakpoint_cpu(addr, len, type,
> > > + triggered, cpu, active);
> > >
> >
> > I'm assuming that there'd be an implementation for system-wide
> > perf-events (and hence breakpoints) in the forthcoming version(s) of
> > this patchset.
>
>
> If that becomes a necessary feature, then yeah.
>
>

Apart from the several benefits of having system-wide perf-events,
implementing them in the first iteration itself will
help us fully realise the cost of perf-events + hw-breakpoint
integration! When implemented, perf-events will also be ready to
accomodate future users (apart from bp and perf-top) having a
need for system-wide counter.

Thanks,
K.Prasad

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


fweisbec at gmail

Nov 12, 2009, 7:42 AM

Post #29 of 29 (4 views)
Permalink
Re: [PATCH 4/6] hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events [In reply to]

On Sun, Nov 08, 2009 at 11:02:05PM +0530, K.Prasad wrote:
> On Thu, Nov 05, 2009 at 10:06:55PM +0100, Frederic Weisbecker wrote:
> > > Can't it be cpumask_t instead of int cpu? Given that per-cpu breakpoints
> > > will be implemented, it should be very different to implement them for a
> > > subset of cpus too.
> >
> > I can't figure out any usecase where we want to only bind to,
> > say, cpu 1 and 3 or any kind of such strange combination.
> >
> > Either we want a wide breakpoint, or we want to profile
> > a single cpu, but I don't imagine we need a middle case.
>
> When we originally had this discussion on LKML, one of the use-cases
> cited was http://lkml.org/lkml/2009/7/29/243. I can't see why such
> need should be restricted to a given CPU only, rather than a subset of
> CPUs (say 'x' is a variable normally read/written-to in the interrupt
> path, and if the said interrupt is has a cpu affinity to a subset of
> cpus only).
>
> Although in the normal case, this feature could be implemented later, in
> case of breakpoints we accept that as input from the user (and hence
> part of the well-defined interface), so it is better to design it for
> a subset of CPUs from start. The logic isn't very different and given that
> there are plenty of helper routines in cpumask.h the implementation is easy too.


Well, if one day someone wants to profile a subset of cpus and then need
this feature, I'll implement it. But I don't think we should anticipate
every possible corner usecases for now.

It's not possible to request that from any user interface anyway
(either ptrace, perf or ftrace). And if it becomes needed for in-kernel
use, then it's trivial to change.


> > If we want to lock such path, we probably more likely want a mutex.
> > Registering a breakpoint is not a fastpath and also perf does
> > some sleepable things while creating a counter.
> >
> > The check to register constraints, which is part of this path,
> > is itself a mutex.
> >
> > But we'll probably need something NMI safe in the future so
> > that it can be used without any problem by kgdb.
> >
>
> I suspect that it will be required for cpu-hotplug handler, where
> previously load_debug_registers() was called from a softirq context.


Nop. There is no register/unregister on cpu hotplug time.
Perf will just reschedule the events on that cpu (through
pmu::enable/disable calls).



> > > I'm assuming that there'd be an implementation for system-wide
> > > perf-events (and hence breakpoints) in the forthcoming version(s) of
> > > this patchset.
> >
> >
> > If that becomes a necessary feature, then yeah.
> >
> >
>
> Apart from the several benefits of having system-wide perf-events,
> implementing them in the first iteration itself will
> help us fully realise the cost of perf-events + hw-breakpoint
> integration! When implemented, perf-events will also be ready to
> accomodate future users (apart from bp and perf-top) having a
> need for system-wide counter.


For now this is meant to be costly (wrt cross cpu contention)
as I explained you before.

But if the ftrace ring buffer becomes integrated by perf (which
seem to be in the plans), then yeah this may become a very useful
feature because we could use a single counter for wide profiling
without the cost of the cpu contention (ftrace ring buffer is
per cpu and fully lockless).

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First page Previous page 1 2 Next page Last page  View All Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact lists@gossamer-threads.com
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.