mark at hotpy
May 17, 2012, 3:38 AM
Post #26 of 34
Dag Sverre Seljebotn wrote:
> On 05/16/2012 10:24 PM, Robert Bradshaw wrote:
>> On Wed, May 16, 2012 at 11:33 AM, "Martin v.
>> Löwis"<martin [at] v> wrote:
>>>> Does this use case make sense to everyone?
>>>> The reason why we are discussing this on python-dev is that we are
>>>> looking for a general way to expose these C-level signatures within
>>>> the Python ecosystem. Dag's idea was to expose them as part of the
>>>> type object, basically as an addition to the current Python-level
>>>> tp_call() slot.
>>> The use case makes sense, yet there is also a long-standing solution
>>> to expose APIs and function pointers: the capsule objects.
>>> If you want to avoid dictionary lookups on the server side, implement
>>> tp_getattro, comparing addresses of interned strings.
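Martin's interned-string fast path can be sketched at the Python level. This is only an analogy (the real thing would live in a C tp_getattro slot and compare PyObject* addresses directly); the class and attribute name here are made up for illustration:

```python
import sys

# Hypothetical sketch of the interned-string fast path: compare object
# identity of an interned name instead of doing a full dict lookup.
_SIGNATURE = sys.intern("__signature__")

class FastAttr:
    def __getattribute__(self, name):
        # The identity check stands in for the C-level pointer comparison;
        # the equality check is a correctness fallback for non-interned names.
        if name is _SIGNATURE or name == "__signature__":
            return "double (double)"
        return object.__getattribute__(self, name)

obj = FastAttr()
print(obj.__signature__)  # attribute names in source code are interned,
                          # so the fast identity branch is taken
```

In C, the interesting part is that the comparison is a single pointer equality test, with no hashing and no probing of the type's dict.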
>> Yes, that's an idea worth looking at. The point about implementing
>> tp_getattro to avoid dictionary lookup overhead is a good one, worth
>> trying at least. One drawback is that this approach requires the GIL
>> (as does _PyType_Lookup).
>> Regarding the C function being faster than the dictionary lookup (or
>> at least close enough that the lookup is a significant fraction of the
>> total time), yes, this happens all the time. For example, one might be
>> solving differential equations where the "user input" is essentially a
>> set of (usually simple) double f(double) functions and their
>> derivatives.
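The "double f(double)" shape Robert describes can be demonstrated from pure Python with ctypes, which wraps a Python callable as a genuine C function pointer of that signature (the names here are illustrative, not part of any proposed spec):

```python
import ctypes
import math

# A C function-pointer type with the exact signature under discussion:
# double (*)(double)
DBLFUNC = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)

@DBLFUNC
def f(x):
    # A "usually simple" user-supplied scalar function.
    return math.sin(x)

# Calling through the wrapper round-trips via the C ABI: a C-level
# integrator handed this pointer could call it with no Python dispatch.
print(f(0.0))
```

The overhead being debated is exactly the cost of *discovering* such a pointer on an object at call time, versus paying a dict lookup on every call.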
> To underline how this is performance critical to us, perhaps a full
> Cython example is useful.
> The following Cython code is a real-world use case. It is not too
> contrived in the essentials, although simplified a little. For
> instance, undergrad engineering students could pick up Cython just to
> play with simple scalar functions like this.
> from numpy import sin
>
> # assume sin is a Python callable and that NumPy decides to support
> # our spec to also support getting a "double (*sinfuncptr)(double)".
> # Our mission: avoid having the user manually import "sin" from C,
> # but allow just using the NumPy object and still be fast.
>
> # define a function to integrate
> cpdef double f(double x):
>     return sin(x * x)  # guess on signature and use "fastcall"!
>
> # the integrator
> def integrate(func, double a, double b, int n):
>     cdef double s = 0
>     cdef double dx = (b - a) / n
>     for i in range(n):
>         # This is also a fastcall, but can be cached so doesn't
>         # matter...
>         s += func(a + i * dx)
>     return s * dx
>
> integrate(f, 0, 1, 1000000)
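For readers without Cython, the quoted example translates directly to plain Python (untyped, hence much slower, but numerically the same left Riemann sum):

```python
from math import sin

def f(x):
    # the function to integrate: sin(x^2)
    return sin(x * x)

def integrate(func, a, b, n):
    # left Riemann sum over [a, b] with n subintervals
    s = 0.0
    dx = (b - a) / n
    for i in range(n):
        s += func(a + i * dx)
    return s * dx

result = integrate(f, 0.0, 1.0, 1_000_000)
print(result)  # ~0.3103, the integral of sin(x^2) over [0, 1]
```

Every iteration pays both a dict lookup for `sin` inside `f` and dynamic dispatch on `func`, which is precisely the cost the proposal wants to make negligible.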
> There are two problems here:
> - The "sin" global can be reassigned (monkey-patched) between each call
> to "f", with no way for "f" to know. Even "sin" itself could do the
> reassignment. So you'd need to check for reassignment to cache safely...
Since Cython allows static typing, why not just declare that "f" can
treat "sin" as if it can't be monkey-patched?
Moving the load of a global variable out of the loop does seem to be a
rather obvious optimisation, if it were declared to be legal.
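In plain Python terms, the hoisting Mark describes looks like the following sketch; CPython performs no such optimisation itself, so today the programmer has to do it by hand (and it is only legal if "sin" is promised not to change):

```python
from math import sin

def f_global(x):
    return sin(x * x)            # LOAD_GLOBAL: dict lookup on every call

def make_f_hoisted():
    local_sin = sin              # bind the global once, outside the hot path
    def f(x):
        return local_sin(x * x)  # closure load: no per-call dict lookup
    return f

f_hoisted = make_f_hoisted()
assert f_global(0.5) == f_hoisted(0.5)
```

This is exactly the optimisation that becomes unsound the moment someone monkey-patches the module-level `sin` after `make_f_hoisted()` has run.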
> - The fastcall inside of "f" is separated from the loop in "integrate".
> And since "f" is often in another module, we can't rely on full static
> whole-program analysis.
> These problems with monkey-patching disappear if the lookup is negligible.
> Some rough numbers:
> - The overhead with the tp_flags hack is about 2 ns (something similar
> with a metaclass; the problem there is more how to synchronize that
> metaclass across multiple third-party libraries)
Does your approach handle subtyping properly?
> - Dict lookup 20 ns
Did you time _PyType_Lookup() ?
> - The sin function itself is about 35 ns, and "f" is probably only
> 2-3 ns; there could very easily be multiple such functions, defined in
> different modules, chained together to build up a formula.
Such micro-timings are meaningless, because the working set often tends
to fit in the hardware cache. A level-2 cache miss can take 100s of cycles.
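For what it's worth, the ~20 ns dict-lookup figure is easy to sanity-check; the number is machine-dependent and, per the caveat about caches, flattering, since the working set here is a single tiny dict:

```python
import timeit

# Measure the per-operation cost of one dict lookup, amortised over
# a million iterations. Expect a handful to a few tens of nanoseconds
# on typical hardware; treat the result as a rough lower bound.
d = {"sin": object()}
n = 1_000_000
per_call = timeit.timeit("d['sin']", globals={"d": d}, number=n) / n
print(f"~{per_call * 1e9:.1f} ns per dict lookup")
```

A real workload touching many types and modules would see worse numbers once lookups start missing in the L2 cache.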
Python-Dev mailing list
Python-Dev [at] python