Mailing List Archive: Python: Dev

Understanding the buffer API


"ja...py" at farowl

Aug 3, 2012, 4:34 PM

Post #1 of 14 (898 views)
Understanding the buffer API

I'm implementing the buffer API and some of memoryview for Jython. I
have read with interest, and mostly understood, the discussion in Issue
#10181 that led to the v3.3 re-implementation of memoryview and
much-improved documentation of the buffer API. Although Jython is
targeting v2.7 at the moment, and 1-D bytes (there's no Jython NumPy),
I'd like to lay a solid foundation that benefits from the recent CPython
work. I hope that some of the complexity in memoryview stems from legacy
considerations I don't have to deal with in Jython.

I am puzzled that PEP 3118 makes some specifications that seem
unnecessary and complicate the implementation. Would those who know the
API inside out answer a few questions?

My understanding is this: When a consumer requests a buffer from the
exporter it specifies using flags how it intends to navigate it. If the
buffer actually needs more apparatus than the consumer proposes, this
raises an exception. If the buffer needs less apparatus than the
consumer proposes, the exporter has to supply what was asked for. For
example, if the consumer sets PyBUF_STRIDES, and the buffer can only be
navigated by using suboffsets (PIL-style) this raises an exception.
Alternatively, if the consumer sets PyBUF_STRIDES, and the buffer is
just a simple byte array, the exporter has to supply shape and strides
arrays (with trivial values), since the consumer is going to use those
arrays.

Is there any harm in supplying shape and strides when they were not
requested? The PEP says: "PyBUF_ND ... If this is not given then shape
will be NULL". It doesn't stipulate that strides will be null if
PyBUF_STRIDES is not given, but the library documentation says so.
suboffsets is different since even when requested, it will be null if
not needed.

Similar, but simpler, the PEP says "PyBUF_FORMAT ... If format is not
explicitly requested then the format must be returned as NULL (which
means "B", or unsigned bytes)". What would be the harm in returning "B"?

One place where this really matters is in the implementation of
memoryview. PyMemoryView requests a buffer with the flags PyBUF_FULL_RO,
so even a simple byte buffer export will come with shape, strides and
format. A consumer (of the memoryview's buffer API) might specify
PyBUF_SIMPLE: according to the PEP I can't simply give it the original
buffer since required fields (that the consumer will presumably not
access) are not NULL. In practice, I'd like to: what could possibly go
wrong?
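
As a rough illustration of the point about memoryview, here is a minimal sketch for CPython 3.3 or later (not Jython). It inspects a memoryview over a plain bytes object; the variable names are illustrative only. Even for simple bytes, the view reports trivial shape, strides and format values.

```python
# A memoryview over simple bytes still reports full navigation data.
data = b"abcdef"
view = memoryview(data)

info = {
    "format": view.format,      # item type code
    "itemsize": view.itemsize,  # bytes per item
    "ndim": view.ndim,          # number of dimensions
    "shape": view.shape,        # items along each dimension
    "strides": view.strides,    # bytes to step along each dimension
    "nbytes": view.nbytes,      # total size in bytes
}
print(info)
```

The values are what a consumer asking only for simple bytes would have assumed anyway: one dimension, unit stride, unsigned bytes.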

Jeff Allen

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


stefan at bytereef

Aug 4, 2012, 2:11 AM

Post #2 of 14 (883 views)
Re: Understanding the buffer API [In reply to]

Jeff Allen <ja...py [at] farowl> wrote:
> I'd like to lay a solid foundation that benefits from the
> recent CPython work. I hope that some of the complexity in
> memoryview stems from legacy considerations I don't have to deal
> with in Jython.

I'm afraid not: PEP-3118 is really that complex. ;)


> My understanding is this: When a consumer requests a buffer from the
> exporter it specifies using flags how it intends to navigate it. If
> the buffer actually needs more apparatus than the consumer proposes,
> this raises an exception. If the buffer needs less apparatus than
> the consumer proposes, the exporter has to supply what was asked
> for. For example, if the consumer sets PyBUF_STRIDES, and the
> buffer can only be navigated by using suboffsets (PIL-style) this
> raises an exception. Alternatively, if the consumer sets
> PyBUF_STRIDES, and the buffer is just a simple byte array, the
> exporter has to supply shape and strides arrays (with trivial
> values), since the consumer is going to use those arrays.

Yes.


> Is there any harm in supplying shape and strides when they were not
> requested? The PEP says: "PyBUF_ND ... If this is not given then
> shape will be NULL". It doesn't stipulate that strides will be null
> if PyBUF_STRIDES is not given, but the library documentation says
> so. suboffsets is different since even when requested, it will be
> null if not needed.

You are right that the PEP does not explicitly state that rule for
strides. However, NULL always has an implied meaning:

format=NULL -> treat the buffer as unsigned bytes.

shape=NULL -> one-dimensional AND treat the buffer as unsigned bytes.

strides=NULL -> C-contiguous


I think relaxing the NULL rule for strides would complicate things,
since it would introduce yet another special case.


> Similar, but simpler, the PEP says "PyBUF_FORMAT ... If format is
> not explicitly requested then the format must be returned as NULL
> (which means "B", or unsigned bytes)". What would be the harm in
> returning "B"?

Ah, yes. The key here is this:

"This would be used when the consumer is going to be checking for what
'kind' of data is actually stored."


Conversely, if not requested, format=NULL indicates that the real
format may be e.g. 'L', but the consumer wants to treat the buffer
as unsigned bytes. This works because the 'len' field stores the
length of the memory area in bytes (for contiguous buffers at least).

The 'itemsize' field may be wrong though in this special case.

In general, format=NULL is a cast of a (possibly multi-dimensional)
C-contiguous buffer to a one-dimensional buffer of unsigned bytes.
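
A loose Python-level analogue of this cast, assuming CPython 3.3 or later, is memoryview.cast("B"). Unlike the format=NULL case in the C API, this explicit cast also updates itemsize and shape, so it is only a sketch of the idea that the same memory can be read as unsigned bytes while the length in bytes stays the same.

```python
from array import array

words = array("I", [1, 2, 3])   # unsigned ints; item size is platform dependent
typed = memoryview(words)       # view with the real format
raw = typed.cast("B")           # same memory, treated as unsigned bytes

print(typed.format, typed.itemsize, typed.shape, typed.nbytes)
print(raw.format, raw.itemsize, raw.shape, raw.nbytes)
```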


IMO only the following combinations make sense. These two are self explanatory:

1) shape=NULL, format=NULL -> e.g. PyBUF_SIMPLE

2) shape!=NULL, format!=NULL -> e.g. PyBUF_FULL


1) can break the invariant product(shape) * itemsize = len!


The next combination exists as part of PyBUF_STRIDED:

3) shape!=NULL, format=NULL.

It can break two invariants (product(shape) * itemsize = len,
calcsize(format) = itemsize), but since it's explicitly part of
PyBUF_STRIDED, memoryview_getbuf() allows it.


The remaining combination is disallowed, since the buffer is already assumed to
be unsigned bytes:

4) shape=NULL, format!=NULL.
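
The two invariants mentioned above can be checked from Python when full information is available, as it is through memoryview in CPython 3.3 or later. The helper name check_invariants is hypothetical, and this is only a sketch of the arithmetic, not part of any API.

```python
import struct

def check_invariants(view):
    """Return (len_ok, itemsize_ok) for a memoryview with full information."""
    count = 1
    for dim in view.shape:
        count *= dim
    # product(shape) * itemsize == len (length of the memory in bytes)
    len_ok = count * view.itemsize == view.nbytes
    # calcsize(format) == itemsize
    itemsize_ok = struct.calcsize(view.format) == view.itemsize
    return len_ok, itemsize_ok

int_size = struct.calcsize("i")
flat = memoryview(bytearray(6 * int_size))   # 1-D unsigned bytes
grid = flat.cast("i", [2, 3])                # 2-D native ints over the same memory

print(check_invariants(flat))
print(check_invariants(grid))
```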



> One place where this really matters is in the implementation of
> memoryview. PyMemoryView requests a buffer with the flags
> PyBUF_FULL_RO, so even a simple byte buffer export will come with
> shape, strides and format. A consumer (of the memoryview's buffer
> API) might specify PyBUF_SIMPLE: according to the PEP I can't simply
> give it the original buffer since required fields (that the consumer
> will presumably not access) are not NULL. In practice, I'd like to:
> what could possibly go wrong?

Because of all the implied meanings of NULL, I think the safest way is
to implement memoryview_getbuf() for Jython. After all the PEP describes
a protocol, so everyone should really be doing the same thing.


Whether the protocol needs to be that complex is another question.
Partially initialized buffers are a pain to handle on the C level
since it is necessary to reconstruct the missing values -- at least if
you want to keep your sanity :).


I think the protocol would benefit from changing the getbuffer rules to:

a) The buffer gets a 'flags' field that can store properties like
PyBUF_SIMPLE, PyBUF_C_CONTIGUOUS etc.

b) The exporter must *always* provide full information.

c) If a buffer can be exported as unsigned bytes but has a different
layout, the exporter must perform a full cast so that the above
mentioned invariants are kept.

The disadvantage of this is that the original layout is lost for
the consumer. I do not know if there is a use case that requires
the consumer to have the original layout information.



Stefan Krah




"ja...py" at farowl

Aug 4, 2012, 6:48 AM

Post #3 of 14 (878 views)
Re: Understanding the buffer API [In reply to]

Thanks for a swift reply: you're just the person I hoped would do so.

On 04/08/2012 10:11, Stefan Krah wrote:
> You are right that the PEP does not explicitly state that rule for
> strides. However, NULL always has an implied meaning:
>
> format=NULL -> treat the buffer as unsigned bytes.
>
> shape=NULL -> one-dimensional AND treat the buffer as unsigned bytes.
>
> strides=NULL -> C-contiguous
>
> I think relaxing the NULL rule for strides would complicate things,
> since it would introduce yet another special case.
... OK, I think I see how the absence of certain arrays is used to
deduce structural simplicity, over and above their straightforward use
in navigating the data. So although no shape array is (sort of)
equivalent to ndim==1, shape[0]==len, it also means I can call simpler
code instead of using the arrays for navigation.

I still don't see why, if the consumer says "I'm assuming 1-D unsigned
bytes", and that's what the data is, memoryview_getbuf could not provide
a shape and strides that agree with the data. Is the catch perhaps that
there is code (in abstract.c etc.) that does not know what the consumer
promised not to use/look at? Would it actually break, e.g. not treat it
as bytes, or just be inefficient?

> Because of all the implied meanings of NULL, I think the safest way is
> to implement memoryview_getbuf() for Jython. After all the PEP describes
> a protocol, so everyone should really be doing the same thing.
I'll look carefully at what you've written (snipped here) because it is
these "consumer expectations" that are most important. The Jython buffer
API is necessarily a lot different from the C one: some things are not
possible in Java (pointer arithmetic) and some are just un-Javan
activities (allocate a struct and have the library fill it in). I'm only
going for a logical conformance to the PEP: the same navigational and
other attributes, that mean the same things for the consumer.

When you say such-and-such is disallowed, but the PEP or the data
structures seem to provide for it, you mean memoryview_getbuf()
disallows it, since you've concluded it is not sensible?
> I think the protocol would benefit from changing the getbuffer rules to:
>
> a) The buffer gets a 'flags' field that can store properties like
> PyBUF_SIMPLE, PyBUF_C_CONTIGUOUS etc.
>
> b) The exporter must *always* provide full information.
>
> c) If a buffer can be exported as unsigned bytes but has a different
> layout, the exporter must perform a full cast so that the above
> mentioned invariants are kept.
>
Just like PyManagedBuffer mbuf and its sister view in memoryview? I've
thought the same things, but the tricky part is to do it compatibly.

a) I think I can achieve this. As I have interfaces and polymorphism on
my side, and a commitment only to logical equivalence to CPython, I can
have the preserved flags stashed away inside to affect behaviour. But
it's not as simple as saving the consumer's request, and I'm still
trying to work out what to do, e.g. when the consumer didn't ask for
C-contiguity, but in this case it happens to be true.

In the same way, functions you have in abstract.c etc. can be methods
that, rather than work out by inspection of a struct how to navigate the
data on this call, already know what kind of buffer they are in. So
SimpleBuffer.isContiguous(char order) can simply return true.

b) What I'm hoping can work, but maybe not.

c) Java will not of course give you raw memory it thinks is one thing,
to treat as another, so this aspect is immature in my thinking. I got as
far as accommodating multi-byte items, but have no use for them as yet.

Thanks again for the chance to test my ideas.
Jeff Allen


ncoghlan at gmail

Aug 4, 2012, 7:51 AM

Post #4 of 14 (881 views)
Re: Understanding the buffer API [In reply to]

On Sat, Aug 4, 2012 at 7:11 PM, Stefan Krah <stefan [at] bytereef> wrote:
> You are right that the PEP does not explicitly state that rule for
> strides. However, NULL always has an implied meaning:
>
> format=NULL -> treat the buffer as unsigned bytes.
>
> shape=NULL -> one-dimensional AND treat the buffer as unsigned bytes.
>
> strides=NULL -> C-contiguous
>
>
> I think relaxing the NULL rule for strides would complicate things,
> since it would introduce yet another special case.

I took Jeff's question as being slightly different and applying in the
following situations:

1. If the consumer has NOT requested format data, can the provider
return accurate format data anyway, if that's easier than returning
NULL but is consistent with doing so?

2. The consumer has NOT requested shape data, can shape data be
provided anyway, if that's easier than returning NULL but is
consistent with doing so?

3. The consumer has NOT requested strides data, can strides data be
provided anyway, if that's easier than returning NULL but is
consistent with doing so?

That's what I believe is Jeff's main question: is a provider that
always publishes complete information, even if the consumer doesn't
ask for it, in compliance with the API, so long as any cases where the
consumer's stated assumption (as indicated by the request flags) would
be violated are handled as errors instead of successfully populating
the buffer?

Cheers,
Nick.

--
Nick Coghlan | ncoghlan [at] gmail | Brisbane, Australia


storchaka at gmail

Aug 4, 2012, 8:25 AM

Post #5 of 14 (879 views)
Re: Understanding the buffer API [In reply to]

On 04.08.12 17:51, Nick Coghlan wrote:
> I took Jeff's question as being slightly different and applying in the
> following situations:
>
> 1. If the consumer has NOT requested format data, can the provider
> return accurate format data anyway, if that's easier than returning
> NULL but is consistent with doing so?
>
> 2. The consumer has NOT requested shape data, can shape data be
> provided anyway, if that's easier than returning NULL but is
> consistent with doing so?
>
> 3. The consumer has NOT requested strides data, can strides data be
> provided anyway, if that's easier than returning NULL but is
> consistent with doing so?

4. The consumer has NOT requested a writable buffer, can the readonly flag
of the provided buffer be false anyway?




stefan at bytereef

Aug 4, 2012, 8:25 AM

Post #6 of 14 (885 views)
Re: Understanding the buffer API [In reply to]

Jeff Allen <ja...py [at] farowl> wrote:
> I still don't see why, if the consumer says "I'm assuming 1-D
> unsigned bytes", and that's what the data is, memoryview_getbuf
> could not provide a shape and strides that agree with the data.

In most cases it won't matter. However, a consumer is entitled to rely
on shape==NULL in response to a PyBUF_SIMPLE request. Perhaps there
is code that tests for shape==NULL to determine C-contiguity.

This is an example that might occur in C. You hinted at the fact that not
all of this may be relevant for Java, but on that I can't comment.


> When you say such-and-such is disallowed, but the PEP or the data
> structures seem to provide for it, you mean memoryview_getbuf()
> disallows it, since you've concluded it is not sensible?

The particular request of PyBUF_SIMPLE|PyBUF_FORMAT, when applied to
any array that is not one-dimensional with format 'B' would lead to a
contradiction: PyBUF_SIMPLE implies 'B', but format would be set to
something else.

It is also a useless combination, since a plain PyBUF_SIMPLE suffices.


> >I think the protocol would benefit from changing the getbuffer rules to:
> >
> > a) The buffer gets a 'flags' field that can store properties like
> > PyBUF_SIMPLE, PyBUF_C_CONTIGUOUS etc.
> >
> > b) The exporter must *always* provide full information.
> >
> > c) If a buffer can be exported as unsigned bytes but has a different
> > layout, the exporter must perform a full cast so that the above
> > mentioned invariants are kept.
> >
> Just like PyManagedBuffer mbuf and its sister view in memoryview?
> I've thought the same things, but the tricky part is to do it
> compatibly.
>
> a) I think I can achieve this. As I have interfaces and polymorphism
> on my side, and a commitment only to logical equivalence to CPython,
> I can have the preserved flags stashed away inside to affect
> behaviour. But it's not as simple as saving the consumer's request,
> and I'm still trying to work out what to do, e.g. when the
> consumer didn't ask for C-contiguity, but in this case it happens to
> be true.
>
> In the same way, functions you have in abstract.c etc. can be
> methods that, rather than work out by inspection of a struct how to
> navigate the data on this call, already know what kind of buffer
> they are in. So SimpleBuffer.isContiguous(char order) can simply
> return true.

Avoiding repeated calls to PyBuffer_IsContiguous() was in fact the main
reason for storing flags in the new MemoryViewObject.

It would be handy to have these flags in the Py_buffer structure, but
that can only be considered for a future version of Python, perhaps no
earlier than 4.0. The same applies of course to all three points that
I made above.


Stefan Krah




ncoghlan at gmail

Aug 4, 2012, 8:35 AM

Post #7 of 14 (877 views)
Re: Understanding the buffer API [In reply to]

On Sun, Aug 5, 2012 at 1:25 AM, Stefan Krah <stefan [at] bytereef> wrote:
> In most cases it won't matter. However, a consumer is entitled to rely
> on shape==NULL in response to a PyBUF_SIMPLE request. Perhaps there
> is code that tests for shape==NULL to determine C-contiguity.
>
> This is an example that might occur in C. You hinted at the fact that not
> all of this may be relevant for Java, but on that I can't comment.

Think about trying to specify the buffer protocol using only C++
references rather than pointers. In Java, it's a lot easier to say
"this value must be a reference to 'B'" than it is to say "this value
must be NULL". (My Java is a little rusty, but I'm still pretty sure
you can only get NullPointerException by messing about with the JNI).

I think it's worth defining an "OR" clause for each of the current "X
must be NULL" cases, where it is legal for the provider to emit an
appropriate non-NULL value that would be consistent with the consumer
assuming that the returned value is consistent with what they
requested.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan [at] gmail | Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


stefan at bytereef

Aug 4, 2012, 8:39 AM

Post #8 of 14 (878 views)
Re: Understanding the buffer API [In reply to]

Nick Coghlan <ncoghlan [at] gmail> wrote:
> I took Jeff's question as being slightly different and applying in the
> following situations:

I think I attempted to answer the same thing. :)


> 1. If the consumer has NOT requested format data, can the provider
> return accurate format data anyway, if that's easier than returning
> NULL but is consistent with doing so?

No, this is definitely disallowed by the PEP (PyBUF_FORMAT):

"If format is not explicitly requested then the format must be returned as
NULL (which means "B", or unsigned bytes)."


> 2. The consumer has NOT requested shape data, can shape data be
> provided anyway, if that's easier than returning NULL but is
> consistent with doing so?

Also explicitly disallowed (PyBUF_ND):

"If this is not given then shape will be NULL."


> 3. The consumer has NOT requested strides data, can strides data be
> provided anyway, if that's easier than returning NULL but is
> consistent with doing so?

This is not explicitly disallowed, but IMO the intent is that strides
should also be NULL in that case. For example, strides==NULL might be
used for a quick C-contiguity test.
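
To make the idea concrete, here is a small Python model of such a test. It is only a sketch: None stands in for a NULL strides pointer, and the function name is_c_contiguous is hypothetical rather than a real API. When strides are present, the consumer has to walk the dimensions instead of taking the quick path.

```python
def is_c_contiguous(shape, strides, itemsize):
    """Model of a C-contiguity test; None stands in for a NULL strides pointer."""
    if strides is None:
        return True              # quick path: absent strides imply C-contiguity
    if 0 in shape:
        return True              # an empty buffer is trivially contiguous
    expected = itemsize          # stride expected for the last dimension
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim > 1 and stride != expected:
            return False
        expected *= dim          # the next dimension out steps over this one
    return True

print(is_c_contiguous((2, 3), None, 4))      # quick path
print(is_c_contiguous((2, 3), (12, 4), 4))   # full check, C order
print(is_c_contiguous((2, 3), (4, 8), 4))    # full check, Fortran order
```

An exporter that supplies trivial strides where none were requested would send this consumer down the slower path, but the answer would be the same.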


Stefan Krah





stefan at bytereef

Aug 4, 2012, 8:43 AM

Post #9 of 14 (876 views)
Re: Understanding the buffer API [In reply to]

Serhiy Storchaka <storchaka [at] gmail> wrote:
> 4. The consumer has NOT requested a writable buffer, can the readonly
> flag of the provided buffer be false anyway?

Yes, per the new documentation. This is not explicitly mentioned in the PEP
but was existing practice and greatly simplifies several things:

http://docs.python.org/dev/c-api/buffer.html#PyBUF_WRITABLE
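
This can be observed from Python in CPython 3.3 or later. As noted earlier in the thread, memoryview requests its buffer with PyBUF_FULL_RO, which does not ask for writability, yet the view of a bytearray still reports readonly as False. A minimal sketch:

```python
mutable = memoryview(bytearray(b"abc"))   # exporter is writable
frozen = memoryview(b"abc")               # exporter is read-only

print(mutable.readonly, frozen.readonly)

mutable[0] = ord("x")                     # allowed, because readonly is False
print(bytes(mutable))
```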


Stefan Krah





stefan at bytereef

Aug 4, 2012, 9:41 AM

Post #10 of 14 (881 views)
Re: Understanding the buffer API [In reply to]

Nick Coghlan <ncoghlan [at] gmail> wrote:
> Think about trying to specify the buffer protocol using only C++
> references rather than pointers. In Java, it's a lot easier to say
> "this value must be a reference to 'B'" than it is to say "this value
> must be NULL". (My Java is a little rusty, but I'm still pretty sure
> you can only get NullPointerException by messing about with the JNI).
>
> I think it's worth defining an "OR" clause for each of the current "X
> must be NULL" cases, where it is legal for the provider to emit an
> appropriate non-NULL value that would be consistent with the consumer
> assuming that the returned value is consistent with what they
> requested.

I think any implementation that doesn't use the Py_buffer struct directly
in a C-API should just always return a full buffer if a specific request
can be met according to the rules.
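
A minimal sketch of that approach, written as a Python model rather than as Jython or C-API code: every name here (REQ_WRITABLE, REQ_STRIDES, BufferInfo, ByteExporter) is hypothetical and the flag values are chosen only for this example. The exporter raises an error when the request cannot be met, and otherwise always returns full information.

```python
from dataclasses import dataclass

# Hypothetical flag bits, defined only for this sketch.
REQ_WRITABLE = 0x01
REQ_STRIDES = 0x10

@dataclass(frozen=True)
class BufferInfo:
    readonly: bool
    format: str
    itemsize: int
    shape: tuple
    strides: tuple

class ByteExporter:
    """Exports every `step`-th byte of `data` as 1-D unsigned bytes."""

    def __init__(self, data, step=1, readonly=False):
        self.data = data
        self.step = step
        self.readonly = readonly

    def get_buffer(self, flags):
        # Refuse requests whose stated assumptions would be violated.
        if (flags & REQ_WRITABLE) and self.readonly:
            raise BufferError("object is not writable")
        if self.step != 1 and not (flags & REQ_STRIDES):
            raise BufferError("consumer must be able to follow strides")
        # Otherwise always supply full information, even when trivial.
        count = len(range(0, len(self.data), self.step))
        return BufferInfo(self.readonly, "B", 1, (count,), (self.step,))

simple = ByteExporter(bytearray(10)).get_buffer(0)
strided = ByteExporter(bytearray(10), step=2).get_buffer(REQ_STRIDES)
print(simple)
print(strided)
```

In this model the flags only state what the consumer can cope with; they never reduce what the exporter reports.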


For the C-API, I would be cautious:

- The number of case splits in testing getbuffer flags is already
staggering. Defining an "OR" clause would introduce new cases.

- Consumers may simply rely on the status-quo.


As I said in my earlier mail, for Python 4.0, I'd rather see that buffers
have mandatory full information. Querying individual Py_buffer fields for
NULL should be replaced by a set of flags that would determine contiguity,
buffer "history" (has the buffer been cast to unsigned bytes?) etc.

It would also be possible to add new flags for things like byte order.


The main reason is that it turns out that in any general C function that
takes a Py_buffer argument one has to reconstruct full information anyway,
otherwise obscure cases *will* be overlooked (in the absence of a formal
proof that takes care of all case splits).



Stefan Krah





ncoghlan at gmail

Aug 4, 2012, 10:13 AM

Post #11 of 14 (880 views)
Re: Understanding the buffer API [In reply to]

On Sun, Aug 5, 2012 at 2:41 AM, Stefan Krah <stefan [at] bytereef> wrote:
> Nick Coghlan <ncoghlan [at] gmail> wrote:
>> Think about trying to specify the buffer protocol using only C++
>> references rather than pointers. In Java, it's a lot easier to say
>> "this value must be a reference to 'B'" than it is to say "this value
>> must be NULL". (My Java is a little rusty, but I'm still pretty sure
>> you can only get NullPointerException by messing about with the JNI).
>>
>> I think it's worth defining an "OR" clause for each of the current "X
>> must be NULL" cases, where it is legal for the provider to emit an
>> appropriate non-NULL value that would be consistent with the consumer
>> assuming that the returned value is consistent with what they
>> requested.
>
> I think any implementation that doesn't use the Py_buffer struct directly
> in a C-API should just always return a full buffer if a specific request
> can be met according to the rules.

Since Jeff is talking about an inspired-by API, rather than using the
C API directly, I think that's the way Jython should go: *require*
that those fields be populated appropriately, rather than allowing
them to be None.

> For the C-API, I would be cautious:
>
> - The number of case splits in testing getbuffer flags is already
> staggering. Defining an "OR" clause would introduce new cases.
>
> - Consumers may simply rely on the status-quo.
>
>
> As I said in my earlier mail, for Python 4.0, I'd rather see that buffers
> have mandatory full information. Querying individual Py_buffer fields for
> NULL should be replaced by a set of flags that would determine contiguity,
> buffer "history" (has the buffer been cast to unsigned bytes?) etc.

Making a switch to mandatory full information later suggests that we
need to at least make it optional now. I do agree with what you
suggest though, which is that, if a buffer chooses to always publish
full and accurate information it must do so for *all* fields. That
should reduce the combinatorial explosion.

It does place a constraint on consumers that they can't assume those
fields will be NULL just because they didn't ask for them, but I'm
struggling to think of any reason why a client would actually *check*
that instead of just assuming it. I guess the dodgy Py_buffer-copying
code in the old memoryview implementation only mostly works because
those fields are almost always NULL, but that approach was just deeply
broken in general.

> The main reason is that it turns out that in any general C function that
> takes a Py_buffer argument one has to reconstruct full information anyway,
> otherwise obscure cases *will* be overlooked (in the absence of a formal
> proof that takes care of all case splits).

Right, that's why I think we should declare it legal to *provide* full
information even if the consumer didn't ask for it, *as long as* any
consumer assumptions implied by the limited request (such as unsigned
byte data, a single dimension or C contiguity) remain valid. Consumers
that can't handle that correctly (which would likely include the
pre-3.3 memoryview) are officially broken.

As you say, we likely can't make providing full information mandatory
during the 3.x cycle, but we can at least pave the way for it.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan [at] gmail | Brisbane, Australia


"ja...py" at farowl

Aug 5, 2012, 3:08 AM

Post #12 of 14 (855 views)
Re: Understanding the buffer API [In reply to]

- Summary:

The PEP, or sometimes just the documentation, definitely requires that
features not requested shall be NULL.

The API would benefit from:

a. stored flags that tell you the actual structural features.
b. requiring exporters to provide full information (e.g. strides =
{1}, format = "B") even when trivial.

It could and possibly should work this way in Python 4.0.

Nick thinks we could *allow* exporters to behave this way (PEP change)
in Python 3.x. Stefan thinks not, because "Perhaps there is code that
tests for shape==NULL to determine C-contiguity."

Jython exporters should return full information unconditionally from the
start: "any implementation that doesn't use the Py_buffer struct
directly in a C-API should just always return a full buffer" (Stefan);
"I think that's the way Jython should go: *require* that those fields be
populated appropriately" (Nick).

- But what I now think is:

_If the only problem really is_ "code that tests for shape==NULL to
determine C-contiguity", or makes similar deductions, I agree that
providing unasked-for information is _safe_. I think the stipulation in
PEP/documentation has some efficiency value: on finding shape!=NULL the
code has to do a more complicated test, as in PyBuffer_IsContiguous(). I
have the option to provide an isContiguous that has the answer written
down already, so the risk is only from/to ported code. If it is only a
risk to the efficiency of ported code, I'm relaxed: I hesitate only to
check that there's no circumstance that logically requires nullity for
correctness. Whether it was safe was the key question.

In the hypothetical Python 4.0 buffer API (and in Jython) where feature
flags are provided, the efficiency is still useful, but complicated
deductive logic in the consumer should be deprecated in favour of
(functions for) interrogating the flags.

An example illustrating the semantics would then be:
1. consumer requests a buffer, saying "I can cope with strided arrays"
(PyBUF_STRIDED);
2. exporter provides a strides array, but in the feature flags
STRIDED=0, meaning "you don't need the strides array";
3. consumer (optionally) uses efficient, non-strided access.

_I do not think_ that full provision by the exporter has to be
_mandatory_, as the discussion has gone on to suggest. I know your
experience is that you have often had to regenerate the missing
information to write generic code, but I think this does not continue
once you have the feature flags. An example would be:
1. consumer requests a buffer, saying "I can cope with N-dimensional
but not strided arrays" (PyBUF_ND);
2. exporter sets strides=NULL, and the feature flag STRIDED=0;
3. consumer accesses the data, without reference to the strides array,
as it planned;
4. new generic code that respects the feature flag STRIDED=0, does not
reference the strides array;
5. old generic code, ignorant of the feature flags, finds the
strides=NULL and so does not dereference strides.
Insofar as it is not necessary, there is some efficiency in not
providing it. There would only be a problem with broken code that both
ignores the feature flag and uses the strides array unchecked. But this
code was always broken.

Really useful discussion this.
Jeff


stefan at bytereef

Aug 8, 2012, 3:47 AM

Post #13 of 14 (831 views)
Re: Understanding the buffer API [In reply to]

Nick Coghlan <ncoghlan [at] gmail> wrote:
> It does place a constraint on consumers that they can't assume those
> fields will be NULL just because they didn't ask for them, but I'm
> struggling to think of any reason why a client would actually *check*
> that instead of just assuming it.

Can we continue this discussion some other time, perhaps after 3.3 is out?
I'd like to respond, but need a bit more time to think about it than I have
right now (for this issue).


Stefan Krah





"ja...py" at farowl

Dec 18, 2012, 1:15 PM

Post #14 of 14 (682 views)
Re: Understanding the buffer API [In reply to]

On 08/08/2012 11:47, Stefan Krah wrote:
> Nick Coghlan<ncoghlan [at] gmail> wrote:
>> It does place a constraint on consumers that they can't assume those
>> fields will be NULL just because they didn't ask for them, but I'm
>> struggling to think of any reason why a client would actually *check*
>> that instead of just assuming it.
> Can we continue this discussion some other time, perhaps after 3.3 is out?
> I'd like to respond, but need a bit more time to think about it than I have
> right now (for this issue).
Those who contributed to the design of it through discussion here may be
interested in how this has turned out in Jython. Although Jython is
still at a 2.7 alpha, the buffer API has proved itself in a few parts of
the core now and feels reasonably solid. It works for bytes in one
dimension. There's a bit of description here:
http://wiki.python.org/jython/BufferProtocol

Long story short, I took the route of providing all information, which
makes the navigational parts of the flags argument unequivocally a
statement of what navigation the client is assuming will be sufficient.
(The exception if thrown says explicitly that it won't be enough.) It
follows that if two clients want a view on the same object, an exporter
can safely give them the same one. Buffers take care of export counting
for the exporter (as in the bytearray resize lock), and buffers can give
you a sliced view of themselves without help from the exporter. The
innards of memoryview are much simpler for all this and enable it to
implement slicing (as in CPython 3.3) in one dimension. There may be
ideas worth stealing here if the CPython buffer is revisited.
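
For comparison, the CPython behaviour referred to above (the bytearray resize lock and memoryview slicing) can be sketched as follows, assuming CPython 3.3 or later. This shows CPython, not the Jython implementation.

```python
buf = bytearray(b"hello world")
view = memoryview(buf)
part = view[6:]            # sliced view, shares memory with buf

try:
    buf.extend(b"!")       # resizing is refused while exports exist
    resized = True
except BufferError:
    resized = False

part[0] = ord("W")         # writes through to buf
part.release()
view.release()
buf.extend(b"!")           # allowed again once all views are released
print(resized, buf)
```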

N dimensional arrays and indirect addressing, while supported in
principle, have no implementation. I'm fairly sure multi-byte items, as
a way to export arrays of other types, make no sense in Java where type
security is strict and a parallel but type-safe approach will be needed.

Jeff Allen