Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

[BUG] perf stat: useless output for raw events with new event parser

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


eranian at google

Apr 23, 2012, 3:45 AM

Post #1 of 17 (187 views)
Permalink
[BUG] perf stat: useless output for raw events with new event parser

Hi,

With the new event parser, one can express raw events field by field:

$ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1

The problem with this is that the output of perf stat becomes useless:

$ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
noploop for 1 seconds

Performance counter stats for 'noploop 1':

2395038678 pmu
10787 pmu
^^^^^^
1.000802603 seconds time elapsed

We lose the event names or encoding completely. Now for all events
expressed via this
new syntax , all we see is 'pmu'. That is pretty useless. It is hard
to decrypt the results
without some serious scripting.

Not sure how to solve this given how the parser works. This looks like
a regression to me.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


peterz at infradead

Apr 23, 2012, 3:48 AM

Post #2 of 17 (186 views)
Permalink
Re: [BUG] perf stat: useless output for raw events with new event parser [In reply to]

On Mon, 2012-04-23 at 12:45 +0200, Stephane Eranian wrote:
> Hi,
>
> With the new event parser, one can express raw events field by field:
>
> $ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
>
> The problem with this is that the output of perf stat becomes useless:
>
> $ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
> noploop for 1 seconds
>
> Performance counter stats for 'noploop 1':
>
> 2395038678 pmu
> 10787 pmu
> ^^^^^^
> 1.000802603 seconds time elapsed

Yeah, I already complained about that.. Jolsa proposed adding a name=
parameter so you could explicitly name your events. I think I've seen a
patch adding that, but can't atm seem to locate it.

Jiri?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jolsa at redhat

Apr 23, 2012, 3:55 AM

Post #3 of 17 (183 views)
Permalink
Re: [BUG] perf stat: useless output for raw events with new event parser [In reply to]

On Mon, Apr 23, 2012 at 12:48:52PM +0200, Peter Zijlstra wrote:
> On Mon, 2012-04-23 at 12:45 +0200, Stephane Eranian wrote:
> > Hi,
> >
> > With the new event parser, one can express raw events field by field:
> >
> > $ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
> >
> > The problem with this is that the output of perf stat becomes useless:
> >
> > $ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
> > noploop for 1 seconds
> >
> > Performance counter stats for 'noploop 1':
> >
> > 2395038678 pmu
> > 10787 pmu
> > ^^^^^^
> > 1.000802603 seconds time elapsed
>
> Yeah, I already complained about that.. Jolsa proposed adding a name=
> parameter so you could explicitly name your events. I think I've seen a
> patch adding that, but can't atm seem to locate it.

it's waiting in here:
https://lkml.org/lkml/2012/4/15/74

I already talked to Arnaldo and I need to resend this serie,
due to another changes that are yet about to get into tip tree..
Once it's in, I'll resend ;)

jirka
>
> Jiri?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


robert.richter at amd

Apr 23, 2012, 3:56 AM

Post #4 of 17 (184 views)
Permalink
Re: [BUG] perf stat: useless output for raw events with new event parser [In reply to]

On 23.04.12 12:48:52, Peter Zijlstra wrote:
> On Mon, 2012-04-23 at 12:45 +0200, Stephane Eranian wrote:
> > Hi,
> >
> > With the new event parser, one can express raw events field by field:
> >
> > $ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
> >
> > The problem with this is that the output of perf stat becomes useless:
> >
> > $ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
> > noploop for 1 seconds
> >
> > Performance counter stats for 'noploop 1':
> >
> > 2395038678 pmu
> > 10787 pmu
> > ^^^^^^
> > 1.000802603 seconds time elapsed
>
> Yeah, I already complained about that.. Jolsa proposed adding a name=
> parameter so you could explicitly name your events. I think I've seen a
> patch adding that, but can't atm seem to locate it.

"pmu" is always useless, because we have multiple pmus in one system
and don't know which pmu is meant. So the events name should be close
(maybe exactly) to what the -e option says instead of "pmu".

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jolsa at redhat

Apr 23, 2012, 3:57 AM

Post #5 of 17 (187 views)
Permalink
Re: [BUG] perf stat: useless output for raw events with new event parser [In reply to]

On Mon, Apr 23, 2012 at 12:45:34PM +0200, Stephane Eranian wrote:
> Hi,
>
> With the new event parser, one can express raw events field by field:
>
> $ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
>
> The problem with this is that the output of perf stat becomes useless:
>
> $ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
> noploop for 1 seconds
>
> Performance counter stats for 'noploop 1':
>
> 2395038678 pmu
> 10787 pmu
> ^^^^^^
> 1.000802603 seconds time elapsed
>
> We lose the event names or encoding completely. Now for all events
> expressed via this
> new syntax , all we see is 'pmu'. That is pretty useless. It is hard
> to decrypt the results
> without some serious scripting.
>
> Not sure how to solve this given how the parser works. This looks like
> a regression to me.

I dont think thats regression, since you use new way to express an event.
Please check my answer in the other reply to Peter.

thanks,
jirka
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


eranian at google

Apr 23, 2012, 4:17 AM

Post #6 of 17 (184 views)
Permalink
Re: [BUG] perf stat: useless output for raw events with new event parser [In reply to]

On Mon, Apr 23, 2012 at 12:48 PM, Peter Zijlstra <peterz [at] infradead> wrote:
> On Mon, 2012-04-23 at 12:45 +0200, Stephane Eranian wrote:
>> Hi,
>>
>> With the new event parser, one can express raw events field by field:
>>
>> $ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
>>
>> The problem with this is that the output of perf stat becomes useless:
>>
>> $ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
>> noploop for 1 seconds
>>
>>  Performance counter stats for 'noploop 1':
>>
>>         2395038678 pmu
>>             10787 pmu
>>                        ^^^^^^
>>        1.000802603 seconds time elapsed
>
> Yeah, I already complained about that.. Jolsa proposed adding a name=
> parameter so you could explicitly name your events. I think I've seen a
> patch adding that, but can't atm seem to locate it.
>
If the proposed solution is to add a pseudo field called "name", then I don't
see the point of this new parser compared to directly using my libpfm4
library which already allows:

perf stat -e inst_retired:any,wsm_unc:unc_cycles:u ...
And
perf record -e instr_retired:period=200000,cpu_clk_unhalted:freq=100 ...

If you have to make up names, you might as well, use the actual PMU
event names, but if you do so, why would you have to bother breaking
down the encoding into fields?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


peterz at infradead

Apr 26, 2012, 3:27 AM

Post #7 of 17 (225 views)
Permalink
Re: [BUG] perf stat: useless output for raw events with new event parser [In reply to]

On Mon, 2012-04-23 at 13:17 +0200, Stephane Eranian wrote:
> On Mon, Apr 23, 2012 at 12:48 PM, Peter Zijlstra <peterz [at] infradead> wrote:
> > On Mon, 2012-04-23 at 12:45 +0200, Stephane Eranian wrote:
> >> Hi,
> >>
> >> With the new event parser, one can express raw events field by field:
> >>
> >> $ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
> >>
> >> The problem with this is that the output of perf stat becomes useless:
> >>
> >> $ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
> >> noploop for 1 seconds
> >>
> >> Performance counter stats for 'noploop 1':
> >>
> >> 2395038678 pmu
> >> 10787 pmu
> >> ^^^^^^
> >> 1.000802603 seconds time elapsed
> >
> > Yeah, I already complained about that.. Jolsa proposed adding a name=
> > parameter so you could explicitly name your events. I think I've seen a
> > patch adding that, but can't atm seem to locate it.
> >
> If the proposed solution is to add a pseudo field called "name", then I don't
> see the point of this new parser compared to directly using my libpfm4
> library which already allows:
>
> perf stat -e inst_retired:any,wsm_unc:unc_cycles:u ...
> And
> perf record -e instr_retired:period=200000,cpu_clk_unhalted:freq=100 ...
>
> If you have to make up names, you might as well, use the actual PMU
> event names, but if you do so, why would you have to bother breaking
> down the encoding into fields?

I'd like to use the event names, but really, when you do:

cpu/event=instructions_retired,inv,cmask=16/

is it still instructions_retired?

In SFO we spoke about exporting those event tables you have in libpfm4
in some structured file format so that we both might use them, the
primary difficulty in that (IIRC) was the umask constraints per event.

I'm not too overly bothered by people specifying wrong events (at some
point they really had better know wth they're doing anyway), so can we
start with something that mostly works?

I've already spoken with Jiri about adding this before all this parser
stuff got implemented, so it should all be quite possible to add.

Furthermore, once we have a common format, we could even ask Intel/AMD
(and other vendors) to provide their data in this format.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


eranian at google

Apr 26, 2012, 5:53 AM

Post #8 of 17 (177 views)
Permalink
Re: [BUG] perf stat: useless output for raw events with new event parser [In reply to]

On Thu, Apr 26, 2012 at 12:27 PM, Peter Zijlstra <peterz [at] infradead> wrote:
> On Mon, 2012-04-23 at 13:17 +0200, Stephane Eranian wrote:
>> On Mon, Apr 23, 2012 at 12:48 PM, Peter Zijlstra <peterz [at] infradead> wrote:
>> > On Mon, 2012-04-23 at 12:45 +0200, Stephane Eranian wrote:
>> >> Hi,
>> >>
>> >> With the new event parser, one can express raw events field by field:
>> >>
>> >> $ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
>> >>
>> >> The problem with this is that the output of perf stat becomes useless:
>> >>
>> >> $ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
>> >> noploop for 1 seconds
>> >>
>> >>  Performance counter stats for 'noploop 1':
>> >>
>> >>         2395038678 pmu
>> >>             10787 pmu
>> >>                        ^^^^^^
>> >>        1.000802603 seconds time elapsed
>> >
>> > Yeah, I already complained about that.. Jolsa proposed adding a name=
>> > parameter so you could explicitly name your events. I think I've seen a
>> > patch adding that, but can't atm seem to locate it.
>> >
>> If the proposed solution is to add a pseudo field called "name", then I don't
>> see the point of this new parser compared to directly using my libpfm4
>> library which already allows:
>>
>>  perf stat -e inst_retired:any,wsm_unc:unc_cycles:u ...
>> And
>>  perf record -e instr_retired:period=200000,cpu_clk_unhalted:freq=100 ...
>>
>> If you have to make up names, you might as well, use the actual PMU
>> event names, but if you do so, why would you have to bother breaking
>> down the encoding into fields?
>
> I'd like to use the event names, but really, when you do:
>
>  cpu/event=instructions_retired,inv,cmask=16/
>
> is it still instructions_retired?

Yes, to some extent. Remember this is not equivalent to
unhalted_core_cycles.

I think this new syntax is too heavy. Again, all you need to know
is the event name + umask + modifiers (inv, cmask, edge, u, k).
You can structure the event string to make the tool auto detect
which string is pmu vs. event name vs. umask vs. modifier.


>
> In SFO we spoke about exporting those event tables you have in libpfm4
> in some structured file format so that we both might use them, the
> primary difficulty in that (IIRC) was the umask constraints per event.
>
Yes, for now the tables are C tables inside the library. This allows some
flexibility. Not all tables have the same format. But you can also express
more sophisticated umask constraints. For instance, offcore_response
where you need a request umask + a response umask on WSM. On
SNB is it even more complicated. Of course, the alternative it just
to have a flat list enumerating ALL the possible valid combinations.
But you have an explosion with WSM/SNB and offcore_response, though
maybe only few dozens do really make sense.

So I suspect for an extern text file, a flat list would make more sense even
though is would take up some memory to hold it. But then we'd have only
a couple of tables loaded per process (core, uncore, ...).

> I'm not too overly bothered by people specifying wrong events (at some
> point they really had better know wth they're doing anyway), so can we
> start with something that mostly works?
>
Yes, at the end of the day you need to understand how the hw works
and what the events actually measures, otherwise you won't accomplish
anything useful.

> I've already spoken with Jiri about adding this before all this parser
> stuff got implemented, so it should all be quite possible to add.
>
You mean as you suggested above?

I tried that 3 weeks ago and as I ran into a major issue having to do
with loss of semantic of the terms (event, umask) in the parser code.
Also I wonder how the syntax would accomodate umask combinations.
The problem is that you need to know that the keyword "event" refers
to the event code and "umask" to the event umask. Without that, hard
to lookup the event string and umask string properly. You need to know
you're looking for an event name vs. a umask name. That's regardless
of the fact that the table is internal to a library or in an external text file.
To work around that, you'd have to ass the content of the syfs format/event
file to machine specific code which would recognize the attr + bit field and
determine whether or not this corresponds to an event code or umask. That's
pretty ugly. Especially given that I have proof that you don't need to
go through
all of this to get something working.

The advantage of the library is that is can be used for
self-monitoring programs.
Therefore the event syntax would remain identical. It can work on
non-Linux systems
as well. The library is layered between HW and OS.

> Furthermore, once we have a common format, we could even ask Intel/AMD
> (and other vendors) to provide their data in this format.
>
I am already doing that.

I think having external event tables would make it easier for people
to customize.
It would also decrease the memory footprint but it would probably increase the
startup time as you'd have to identify the event table and load it (parse it) on
demand. It is definitively something I want to pursue. I already
started experimenting
with it using JSON for the external file syntax.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


robert.richter at amd

Apr 26, 2012, 6:12 AM

Post #9 of 17 (175 views)
Permalink
Re: [BUG] perf stat: useless output for raw events with new event parser [In reply to]

On 26.04.12 12:27:11, Peter Zijlstra wrote:
> Furthermore, once we have a common format, we could even ask Intel/AMD
> (and other vendors) to provide their data in this format.

I don't think that can be done with a reasonable effort. Trying to
abstract the pmu description in sysfs to let user space know about the
pmu? We will never fit everything. And we have duplicate code,
esp. bit masks etc in the kernel and userland. What has been simple
bit mask macros in the past are now tons of bit name and mask pairs
provided by sysfs, generated by the kernel and reassembled in
userland. Could it be any more complicated? Really this can't be
handled.

Why not simply pass an identifier for each kind of pmu and then only
add pmu specific code in userland? Much easier than all this sysfs
format thing, where the kernel tries to tell userland what to do,
which the kernel never can do exactly. And even if we can describe
everything with sysfs, kernel and userland code becomes bloated, it
actually is already, looking at the recent perf tool and kernel
updates.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


peterz at infradead

Apr 26, 2012, 7:03 AM

Post #10 of 17 (180 views)
Permalink
Re: [BUG] perf stat: useless output for raw events with new event parser [In reply to]

On Thu, 2012-04-26 at 14:53 +0200, Stephane Eranian wrote:

> > I'd like to use the event names, but really, when you do:
> >
> > cpu/event=instructions_retired,inv,cmask=16/
> >
> > is it still instructions_retired?
>
> Yes, to some extent. Remember this is not equivalent to
> unhalted_core_cycles.

Sure, but I think calling it instructions_retired isn't right either. So
I think that when you use cpu/event=instructions_retired/ without
further qualifications its fine to use that as default name, however
when you muck about with it reverting to either a raw or explicit name
is useful.

> I think this new syntax is too heavy. Again, all you need to know
> is the event name + umask + modifiers (inv, cmask, edge, u, k).
> You can structure the event string to make the tool auto detect
> which string is pmu vs. event name vs. umask vs. modifier.

On Intel yes, what about the rest of the world? The current stuff is
designed so it works with arbitrary bitmasks.

Ideally the other arch pmu maintainers would provide suitable format
descriptions as well where appropriate.

> > In SFO we spoke about exporting those event tables you have in libpfm4
> > in some structured file format so that we both might use them, the
> > primary difficulty in that (IIRC) was the umask constraints per event.

> Yes, for now the tables are C tables inside the library. This allows some
> flexibility. Not all tables have the same format. But you can also express
> more sophisticated umask constraints. For instance, offcore_response
> where you need a request umask + a response umask on WSM. On
> SNB is it even more complicated. Of course, the alternative it just
> to have a flat list enumerating ALL the possible valid combinations.
> But you have an explosion with WSM/SNB and offcore_response, though
> maybe only few dozens do really make sense.

Right, for SNB there's an explicit list of offcore events that's
'supported' which is significantly shorter than what the bit-encoding
allows for. See 19.3 table 19-4 (Intel SDM March 2012 edition).

> So I suspect for an extern text file, a flat list would make more sense even
> though is would take up some memory to hold it. But then we'd have only
> a couple of tables loaded per process (core, uncore, ...).

Right, but I suspect you can express most of it in JSON. You'd be able
to associate an event with a MESI umask, or some linear enumeration.


> > I've already spoken with Jiri about adding this before all this parser
> > stuff got implemented, so it should all be quite possible to add.
> >
> You mean as you suggested above?

Mostly in general trends without too much detail. From what my memory
still remembers I think we talked about something like:

If the event= thing has a string instead of number, go to some external
dictionary and get an event description back, this event description
might include umask,inv,etc.. bits. If the rest of the explicit event
specification includes bits that are also part of the event, these
override.

For example:

cpu/event=BR_MISP_RETIRED.ALL_BRANCHES,umask=1/

would, from the external dictionary, return: event=0xc5,umask=0, the
final event will look like: event=0xc5,umask=1 (aka.
BR_MISP_RETIRED.CONDITIONAL).

But suppose you did:

cpu/event=BR_MISP_RETIRED/

the dictionary might return a umask object (enumeration in this case),
if it would not include 0, the above might throw a warning to the user
about incomplete events. But it could also be used resolve:

cpu/event=BR_MISP_RETIRED,umask=CONDITIONAL/

[. now arguably my example above is somewhat unfortunate because both
0x00c5 and 0x04c5 are listed as BR_MISP_RETIRED.ALL_BRANCHES ]

> I tried that 3 weeks ago and as I ran into a major issue having to do
> with loss of semantic of the terms (event, umask) in the parser code.
> Also I wonder how the syntax would accomodate umask combinations.
> The problem is that you need to know that the keyword "event" refers
> to the event code and "umask" to the event umask. Without that, hard
> to lookup the event string and umask string properly. You need to know
> you're looking for an event name vs. a umask name.

I'm not entirely sure I get this, have the tables have the same names as
the bitfields described in the format.

{
"event" : {
"BR_MISP_RETIRED" : {
"value" : 0xc5,
"umask" : {
"ALL_BRANCHES" : 0x00
"CONDITIONAL" : 0x01
...
"NOT_TAKEN" : 0x10
"TAKEN" : 0x20
}
},
...
}
}

> That's regardless
> of the fact that the table is internal to a library or in an external text file.
> To work around that, you'd have to ass the content of the syfs format/event
> file to machine specific code which would recognize the attr + bit field and
> determine whether or not this corresponds to an event code or umask. That's
> pretty ugly. Especially given that I have proof that you don't need to
> go through
> all of this to get something working.

I'm really not sure what you mean. You need to know how to encode all
that, you have cpuid->encoding rules, we have sysfs encoding rules, how
does that matter?

> The advantage of the library is that is can be used for
> self-monitoring programs.

If we'd stick the perf parser into a library you get the same, that's
what libraries are for.. so?

> Therefore the event syntax would remain identical. It can work on
> non-Linux systems
> as well. The library is layered between HW and OS.

Uhm, no the library is layered on top of the OS since it runs in
userspace and needs to use OS provided interfaces to work, it simply
cannot do anything other. If you want to support other OSs thats fine,
but completely irrelevant.

If we get this event data in a usable format every OS/software on that
particular hardware can use the same names provided they use the same
data-set.

> > Furthermore, once we have a common format, we could even ask Intel/AMD
> > (and other vendors) to provide their data in this format.
> >
> I am already doing that.
>
> I think having external event tables would make it easier for people
> to customize.
> It would also decrease the memory footprint but it would probably increase the
> startup time as you'd have to identify the event table and load it (parse it) on
> demand. It is definitively something I want to pursue. I already
> started experimenting
> with it using JSON for the external file syntax.

You can further reduce the load-time by adding an index/cache on top of
whatever file-format you use. Also, even if you hard-code this in a C
file the resulting data+code still needs to be loaded, albeit by the OS.

For all I care someone writes a small compiler to compile the JSON into
a DSO which you can link to.. really I don't care. What I do care about
is getting that data in a common usable format.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


peterz at infradead

Apr 26, 2012, 7:24 AM

Post #11 of 17 (174 views)
Permalink
Re: [BUG] perf stat: useless output for raw events with new event parser [In reply to]

On Thu, 2012-04-26 at 15:12 +0200, Robert Richter wrote:
> On 26.04.12 12:27:11, Peter Zijlstra wrote:
> > Furthermore, once we have a common format, we could even ask Intel/AMD
> > (and other vendors) to provide their data in this format.
>
> I don't think that can be done with a reasonable effort.

I'm thinking you mis-understand, all we're talking about is a copy of
your event list (BKDG Fam 10h Rev 3.48, section 3.14) in a usable
format.

> Why not simply pass an identifier for each kind of pmu and then only
> add pmu specific code in userland? Much easier than all this sysfs
> format thing, where the kernel tries to tell userland what to do,
> which the kernel never can do exactly. And even if we can describe
> everything with sysfs, kernel and userland code becomes bloated, it
> actually is already, looking at the recent perf tool and kernel
> updates.

The format is only about encoding rules, it doesn't do constraints. All
it wants to convey is if you have an event-code of: 0x4E2 where to stick
it in perf_event_attr::config*. It doesn't want to tell you if that
event exists and if there's a particular umask that ought to go with it.

Its an aid to simplify constructing raw events, nothing more.

When I want to use funny events I'm staring at the Intel-SDM/AMD-BKDG
anyway and I find writing:

cpu/event=0x4e2,umask=0xf8/

A lot easier than:

r40000f8e2

because while I like numbers I cannot actually count and would get that
4 wrong half the time, furthermore I'd have to actually remember the
encoding rules; or something like:

cpu/event=L3_fills_caused_by_L2_evictions.modified_any_core/

Then again, some people are scared of numbers and prefer to wear down
their finger-tips writing silly names.

That said, a unique identifier in there might make sense, esp on things
like ARM that don't have CPUID and even if they do it doesn't actually
correlate to the PMU :-)

So I'd be perfectly ok with adding something
like /sys/bus/events/device/*/name or so.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


robert.richter at amd

Apr 26, 2012, 7:45 AM

Post #12 of 17 (177 views)
Permalink
Re: [BUG] perf stat: useless output for raw events with new event parser [In reply to]

On 26.04.12 16:24:33, Peter Zijlstra wrote:
> On Thu, 2012-04-26 at 15:12 +0200, Robert Richter wrote:
> > On 26.04.12 12:27:11, Peter Zijlstra wrote:
> > > Furthermore, once we have a common format, we could even ask Intel/AMD
> > > (and other vendors) to provide their data in this format.
> >
> > I don't think that can be done with a reasonable effort.
>
> I'm thinking you mis-understand, all we're talking about is a copy of
> your event list (BKDG Fam 10h Rev 3.48, section 3.14) in a usable
> format.

If it is not more complex than for libpfm4 am fine with it. I was
worried about describing pmu capabilities in sysfs which can better be
done with pmu specific code, once we know on which pmu we run.

> Its an aid to simplify constructing raw events, nothing more.
>
> When I want to use funny events I'm staring at the Intel-SDM/AMD-BKDG
> anyway and I find writing:
>
> cpu/event=0x4e2,umask=0xf8/
>
> A lot easier than:
>
> r40000f8e2

It is totally ok to have parser support for this. I simply do not see
why we need to put the encoding into sysfs. We somehow know on which
hardware we run and the parser should already know how to setup the
syscall. So parsing the above finally ends in calling of something
like:

setup_event_for_some_pmu(event, 0x4e2, 0xf8);

We don't need any description of bit masks in sysfs for this.

> So I'd be perfectly ok with adding something
> like /sys/bus/events/device/*/name or so.

Yes, something like this could be added to let userland know about the
pmu.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


peterz at infradead

Apr 26, 2012, 8:39 AM

Post #13 of 17 (174 views)
Permalink
Re: [BUG] perf stat: useless output for raw events with new event parser [In reply to]

On Thu, 2012-04-26 at 16:45 +0200, Robert Richter wrote:
> It is totally ok to have parser support for this. I simply do not see
> why we need to put the encoding into sysfs. We somehow know on which
> hardware we run and the parser should already know how to setup the
> syscall. So parsing the above finally ends in calling of something
> like:
>
> setup_event_for_some_pmu(event, 0x4e2, 0xf8);
>
> We don't need any description of bit masks in sysfs for this.

Its the kernel side decoding perf_event_attr, so it seems sensible to
also describe this encoding from the kernel.

Currently we mostly match the hardware encoding, but there's no strict
requirement to do so, we can already see some of that with the extra_reg
stuff, perf_event_attr::config1 can mean different things depending on
the event.

Keeping all this information in two places just seems like asking for it
to get out of sync.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


robert.richter at amd

Apr 26, 2012, 10:36 AM

Post #14 of 17 (175 views)
Permalink
Re: [BUG] perf stat: useless output for raw events with new event parser [In reply to]

On 26.04.12 17:39:32, Peter Zijlstra wrote:
> On Thu, 2012-04-26 at 16:45 +0200, Robert Richter wrote:
> > It is totally ok to have parser support for this. I simply do not see
> > why we need to put the encoding into sysfs. We somehow know on which
> > hardware we run and the parser should already know how to setup the
> > syscall. So parsing the above finally ends in calling of something
> > like:
> >
> > setup_event_for_some_pmu(event, 0x4e2, 0xf8);
> >
> > We don't need any description of bit masks in sysfs for this.
>
> Its the kernel side decoding perf_event_attr, so it seems sensible to
> also describe this encoding from the kernel.

For perfctr we have:

/sys/bus/event_source/devices/cpu/format/inv: config:23
/sys/bus/event_source/devices/cpu/format/edge: config:18
/sys/bus/event_source/devices/cpu/format/cmask: config:24-31
/sys/bus/event_source/devices/cpu/format/event: config:0-7,32-35
/sys/bus/event_source/devices/cpu/format/umask: config:8-15

The kernel does not en- or decode anything in the config value. It is
directly passed to the pmu with some validation of the values.
Everything else is in userland since it composes the syscall.

The kernel must now contain code like this:

PMU_FORMAT_ATTR(event, "config:0-7,32-35");
PMU_FORMAT_ATTR(umask, "config:8-15" );
PMU_FORMAT_ATTR(edge, "config:18" );
PMU_FORMAT_ATTR(inv, "config:23" );
PMU_FORMAT_ATTR(cmask, "config:24-31" );

Which is unrelated to anything else, duplicates the effort to maintain
bit masks and thus is more error-prone. Besides this there is no need
for it because the values are fix and do not change. We simply know
the format of the config value already, so the format entries are of
no use.

One could argue that feeding a generic pmu setup with the format
configuration reduces the need to modify userland, we have same code
for various archs. But if I have the choice I rather update my perf
tool chain than rebooting the kernel to update perf.

> Currently we mostly match the hardware encoding, but there's no strict
> requirement to do so, we can already see some of that with the extra_reg
> stuff, perf_event_attr::config1 can mean different things depending on
> the event.

Of course the config values of the syscall could be translated into a
different hardware configuration. But its layout is always spec'ed
somewhere and needs no description in sysfs.

> Keeping all this information in two places just seems like asking for it
> to get out of sync.

All this could reside in userland at one place too. Also, the syscall
definition is sufficient as interface description and both sides must
handle any differences of kernel or userland implementations.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


eranian at google

May 2, 2012, 4:14 AM

Post #15 of 17 (156 views)
Permalink
Re: [BUG] perf stat: useless output for raw events with new event parser [In reply to]

Peter,

I will try to experiment with: cpu/event=INST_RETIRED:ANY:c=1/
But for now with libpfm4 as it is. Then will try to come up with an
external file format that encapsulates what I already have in the
C-table. There is more than just what the hardware exports. You
have to encode certain constraints, e.g., how umasks can be
combined. I don't even use the same format for all arch. On
X86 the AMD and Intel formats are different.


On Mon, Apr 23, 2012 at 12:45 PM, Stephane Eranian <eranian [at] google> wrote:
> Hi,
>
> With the new event parser, one can express raw events field by field:
>
> $ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
>
> The problem with this is that the output of perf stat becomes useless:
>
> $ perf stat -e cpu/event=0x3c,umask=0x0/,cpu/event=0xc5,umask=0x0/ noploop 1
> noploop for 1 seconds
>
>  Performance counter stats for 'noploop 1':
>
>        2395038678 pmu
>            10787 pmu
>                       ^^^^^^
>       1.000802603 seconds time elapsed
>
> We lose the event names or encoding completely. Now for all events
> expressed via this
> new syntax , all we see is 'pmu'. That is pretty useless. It is hard
> to decrypt the results
> without some serious scripting.
>
> Not sure how to solve this given how the parser works. This looks like
> a regression to me.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


peterz at infradead

May 7, 2012, 5:42 AM

Post #16 of 17 (154 views)
Permalink
Re: [BUG] perf stat: useless output for raw events with new event parser [In reply to]

On Thu, 2012-04-26 at 19:36 +0200, Robert Richter wrote:
> On 26.04.12 17:39:32, Peter Zijlstra wrote:
> > On Thu, 2012-04-26 at 16:45 +0200, Robert Richter wrote:
> > > It is totally ok to have parser support for this. I simply do not see
> > > why we need to put the encoding into sysfs. We somehow know on which
> > > hardware we run and the parser should already know how to setup the
> > > syscall. So parsing the above finally ends in calling of something
> > > like:
> > >
> > > setup_event_for_some_pmu(event, 0x4e2, 0xf8);
> > >
> > > We don't need any description of bit masks in sysfs for this.
> >
> > Its the kernel side decoding perf_event_attr, so it seems sensible to
> > also describe this encoding from the kernel.
>
> For perfctr we have:
>
> /sys/bus/event_source/devices/cpu/format/inv: config:23
> /sys/bus/event_source/devices/cpu/format/edge: config:18
> /sys/bus/event_source/devices/cpu/format/cmask: config:24-31
> /sys/bus/event_source/devices/cpu/format/event: config:0-7,32-35
> /sys/bus/event_source/devices/cpu/format/umask: config:8-15
>
> The kernel does not en- or decode anything in the config value. It is
> directly passed to the pmu with some validation of the values.

For AMD and most Intel, yes. But there's no requirement for this to be
so. IIRC, Intel P4 doesn't.

Furthermore, extra data required for some events doesn't have a well
defined place in the one ::config and we use config1,2, but how do we
tell what goes where?

> Everything else is in userland since it composes the syscall.

Dependent on what you want to achieve, but with the format/ and event/
stuff you can get a whole way without userspace ever having to know what
particular hardware its running on.

Ideally we'd get all the way to where the user itself doesn't care, once
the user starts caring he/she will have to open up the BKDG/SDM/other
volume of arch magic. At which point they'd better be well qualified to
deal with all that brings.

> The kernel must now contain code like this:
>
> PMU_FORMAT_ATTR(event, "config:0-7,32-35");
> PMU_FORMAT_ATTR(umask, "config:8-15" );
> PMU_FORMAT_ATTR(edge, "config:18" );
> PMU_FORMAT_ATTR(inv, "config:23" );
> PMU_FORMAT_ATTR(cmask, "config:24-31" );
>
> Which is unrelated to anything else, duplicates the effort to maintain
> bit masks and thus is more error-prone. Besides this there is no need
> for it because the values are fix and do not change.

If it doesn't change there's no maintenance overhead is there?

> We simply know
> the format of the config value already, so the format entries are of
> no use.

And yet 'event' isn't the same for Intel and AMD, this means I'd have to
add all kinds of cpu detection code into the parser. By having the
kernel provide this information the parser doesn't have to care about
it.

> One could argue that feeding a generic pmu setup with the format
> configuration reduces the need to modify userland, we have same code
> for various archs. But if I have the choice I rather update my perf
> tool chain than rebooting the kernel to update perf.

You'll have to reboot at some point anyway, you can always frob
perf_event_attr::config* by hand without the aid of this sysfs stuff,
but when the kernel cannot handle the data or otherwise doesn't know how
to talk to the hardware you'll have to go reboot.

I really don't see the problem with reboots and wish people would stop
using that silly argument.

> > Currently we mostly match the hardware encoding, but there's no strict
> > requirement to do so, we can already see some of that with the extra_reg
> > stuff, perf_event_attr::config1 can mean different things depending on
> > the event.
>
> Of course the config values of the syscall could be translated into a
> different hardware configuration. But its layout is always spec'ed
> somewhere and needs no description in sysfs.

You mean there's a spec (kernel source excluded) that says that the
offcore response msr goes in perf_event_attr::config1 ?

> > Keeping all this information in two places just seems like asking for it
> > to get out of sync.
>
> All this could reside in userland at one place too.

Userland would first have to figure out what physical hardware it was
running on to determine if it has an offcore response msr, then it would
have to 'know' this value is supposed to go in perf_event_attr::config1.

By hard-coding this in userspace the kernel is then never allowed to
change it and you'd have to duplicate this knowledge everywhere you'd
want to use the syscall. You could 'optimize' this by using a library to
reduce the userland copies to 1, but the fact is that both kernel and
userland need to 'know' this independently.

Yet by having it in sysfs the kernel can tell userspace than it has an
offcore_rsp field and if we get a value for it it goes wherever it says.
Userspace doesn't need to know anything other than to look up stuff in
sysfs and there's only 1 copy of the knowledge needed -- in the kernel.

So if I try to set offcore_rsp on an AMD machine it'll error since they
don't provide that field.

Same with the event field, without having to have cpu checks, perf can
deal with 'cpu/event=0x4e3/'. On Intel it can tell us this won't go
since the event field isn't wide enough, on AMD it knows to stick that 4
in bits 32-35 of perf_event_attr::config.

> Also, the syscall
> definition is sufficient as interface description and both sides must
> handle any differences of kernel or userland implementations.

You've just posted a patch ([PATCH 4/7] perf/x86-ibs: Add support for
IBS pseudo events) that's the very counter example.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


robert.richter at amd

May 7, 2012, 9:58 AM

Post #17 of 17 (154 views)
Permalink
Re: [BUG] perf stat: useless output for raw events with new event parser [In reply to]

On 07.05.12 14:42:45, Peter Zijlstra wrote:
> On Thu, 2012-04-26 at 19:36 +0200, Robert Richter wrote:
> > On 26.04.12 17:39:32, Peter Zijlstra wrote:
> > > On Thu, 2012-04-26 at 16:45 +0200, Robert Richter wrote:
> > > > It is totally ok to have parser support for this. I simply do not see
> > > > why we need to put the encoding into sysfs. We somehow know on which
> > > > hardware we run and the parser should already know how to setup the
> > > > syscall. So parsing the above finally ends in calling of something
> > > > like:
> > > >
> > > > setup_event_for_some_pmu(event, 0x4e2, 0xf8);
> > > >
> > > > We don't need any description of bit masks in sysfs for this.
> > >
> > > Its the kernel side decoding perf_event_attr, so it seems sensible to
> > > also describe this encoding from the kernel.
> >
> > For perfctr we have:
> >
> > /sys/bus/event_source/devices/cpu/format/inv: config:23
> > /sys/bus/event_source/devices/cpu/format/edge: config:18
> > /sys/bus/event_source/devices/cpu/format/cmask: config:24-31
> > /sys/bus/event_source/devices/cpu/format/event: config:0-7,32-35
> > /sys/bus/event_source/devices/cpu/format/umask: config:8-15
> >
> > The kernel does not en- or decode anything in the config value. It is
> > directly passed to the pmu with some validation of the values.
>
> For AMD and most Intel, yes. But there's no requirement for this to be
> so. IIRC, Intel P4 doesn't.
>
> Furthermore, extra data required for some events doesn't have a well
> defined place in the one ::config and we use config1,2, but how do we
> tell what goes where?
>
> > Everything else is in userland since it composes the syscall.
>
> Dependent on what you want to achieve, but with the format/ and event/
> stuff you can get a whole way without userspace ever having to know what
> particular hardware its running on.
>
> Ideally we'd get all the way to where the user itself doesn't care, once
> the user starts caring he/she will have to open up the BKDG/SDM/other
> volume of arch magic. At which point they'd better be well qualified to
> deal with all that brings.
>
> > The kernel must now contain code like this:
> >
> > PMU_FORMAT_ATTR(event, "config:0-7,32-35");
> > PMU_FORMAT_ATTR(umask, "config:8-15" );
> > PMU_FORMAT_ATTR(edge, "config:18" );
> > PMU_FORMAT_ATTR(inv, "config:23" );
> > PMU_FORMAT_ATTR(cmask, "config:24-31" );
> >
> > Which is unrelated to anything else, duplicates the effort to maintain
> > bit masks and thus is more error-prone. Besides this there is no need
> > for it because the values are fix and do not change.
>
> If it doesn't change there's no maintenance overhead is there?
>
> > We simply know
> > the format of the config value already, so the format entries are of
> > no use.
>
> And yet 'event' isn't the same for Intel and AMD, this means I'd have to
> add all kinds of cpu detection code into the parser. By having the
> kernel provide this information the parser doesn't have to care about
> it.

I think we agree that it can be done in both ways, either providing
the information in sysfs or delivering a library like function to fill
in the data into attr. It is just a matter of taste with some pro and
cons for both.

> > One could argue that feeding a generic pmu setup with the format
> > configuration reduces the need to modify userland, we have same code
> > for various archs. But if I have the choice I rather update my perf
> > tool chain than rebooting the kernel to update perf.
>
> You'll have to reboot at some point anyway, you can always frob
> perf_event_attr::config* by hand without the aid of this sysfs stuff,
> but when the kernel cannot handle the data or otherwise doesn't know how
> to talk to the hardware you'll have to go reboot.
>
> I really don't see the problem with reboots and wish people would stop
> using that silly argument.

Some people may not reboot their system just to get an update of their
perf environment. And the other thing is, it is much harder to get a
change into a distro kernel than in a distro's perf package.

These are the main problems a user might face with. And simply because
you and me can reboot our machines as we want we actually can't ignore
this.

> > > Currently we mostly match the hardware encoding, but there's no strict
> > > requirement to do so, we can already see some of that with the extra_reg
> > > stuff, perf_event_attr::config1 can mean different things depending on
> > > the event.
> >
> > Of course the config values of the syscall could be translated into a
> > different hardware configuration. But its layout is always spec'ed
> > somewhere and needs no description in sysfs.
>
> You mean there's a spec (kernel source excluded) that says that the
> offcore response msr goes in perf_event_attr::config1 ?
>
> > > Keeping all this information in two places just seems like asking for it
> > > to get out of sync.
> >
> > All this could reside in userland at one place too.
>
> Userland would first have to figure out what physical hardware it was
> running on to determine if it has an offcore response msr, then it would
> have to 'know' this value is supposed to go in perf_event_attr::config1.
>
> By hard-coding this in userspace the kernel is then never allowed to
> change it and you'd have to duplicate this knowledge everywhere you'd
> want to use the syscall. You could 'optimize' this by using a library to
> reduce the userland copies to 1, but the fact is that both kernel and
> userland need to 'know' this independently.
>
> Yet by having it in sysfs the kernel can tell userspace than it has an
> offcore_rsp field and if we get a value for it it goes wherever it says.
> Userspace doesn't need to know anything other than to look up stuff in
> sysfs and there's only 1 copy of the knowledge needed -- in the kernel.
>
> So if I try to set offcore_rsp on an AMD machine it'll error since they
> don't provide that field.

Once a field is defined in attr you may not change this afterwards for
abi stability reasons. Otherwise you make sysfs part of the syscall
specification. This requires sysfs itself and parsing it for using
perf_open_event(). Didn't see it this way until now. Other
applications than perf might use the perf syscall too and assume
stable config* bit fields.

-Robert

> Same with the event field, without having to have cpu checks, perf can
> deal with 'cpu/event=0x4e3/'. On Intel it can tell us this won't go
> since the event field isn't wide enough, on AMD it knows to stick that 4
> in bits 32-35 of perf_event_attr::config.
>
> > Also, the syscall
> > definition is sufficient as interface description and both sides must
> > handle any differences of kernel or userland implementations.
>
> You've just posted a patch ([PATCH 4/7] perf/x86-ibs: Add support for
> IBS pseudo events) that's the very counter example.
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo [at] vger
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.