
Mailing List Archive: Python: Bugs

[issue15573] Support unknown formats in memoryview comparisons

 

 



report at bugs

Aug 7, 2012, 4:55 AM

Post #1 of 66 (796 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons

New submission from Stefan Krah:

Continuing the discussion from #13072. I hit a snag here:

Determining in full generality whether two format strings describe
identical items is pretty complicated, see also #3132.


I'm attaching a best effort fmtcmp() function that should do the
following:

- recognize byte order specifiers at the start of the string.

- recognize if an explicitly specified byte order happens to
match the native byte order.

It won't catch:

- byte order specifiers anywhere in the string.

- C types that happen to be identical ('I', 'L' on a 32-bit
platform). I'm also not sure if that is desirable in the
first place.

- ???


So fmtcmp() will return false negatives (not equal), but should be
correct for *most* format strings that are actually in use.
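
A minimal Python sketch of the best-effort comparison described above. This is not the attached format.c: the name fmtcmp comes from the message, but the body and the helper _split are assumptions made for illustration. It only inspects a byte order character at the start of each string, and it deliberately ignores the fact that '@' also selects native sizes and alignment.

```python
import sys

# Explicit byte order characters; '@' and '=' fall back to the native order.
_ORDER = {"<": "little", ">": "big", "!": "big"}


def _split(fmt):
    """Return (byte_order, rest) for a struct-style format string."""
    if fmt and fmt[0] in "@=<>!":
        return _ORDER.get(fmt[0], sys.byteorder), fmt[1:]
    return sys.byteorder, fmt


def fmtcmp(s, t):
    """Best effort: equal byte order and identical remaining characters."""
    return _split(s) == _split(t)
```

As stated in the message, equivalent spellings such as '2c' and 'cc' still compare as "not equal" here, so the sketch errs on the side of false negatives.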


Mark, Meador: You did a lot of work on the struct module and of
course on issue #3132. Does this look like a reasonable compromise?
Did I miss obvious cases (see attachment)?

----------
assignee: skrah
components: Interpreter Core
files: format.c
messages: 167618
nosy: Arfrever, georg.brandl, haypo, mark.dickinson, meador.inge, ncoghlan, pitrou, python-dev, skrah
priority: release blocker
severity: normal
stage: needs patch
status: open
title: Support unknown formats in memoryview comparisons
type: behavior
versions: Python 3.3
Added file: http://bugs.python.org/file26720/format.c

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue15573>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

Aug 7, 2012, 6:06 AM

Post #2 of 66 (780 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Nick Coghlan added the comment:

I confess I was thinking of an even simpler "format strings must be identical" fallback, but agree your way is better, since it reproduces the 3.2 behaviour in many more cases where ignoring the format string actually did the right thing.

The struct docs for the byte order specifier specifically say "the first character of the format string can be used to indicate the byte order, size and alignment of the packed data", so treating format strings that include byte order markers elsewhere in the string as differing from each other if those markers are in different locations sounds fine to me.

----------



report at bugs

Aug 7, 2012, 12:40 PM

Post #3 of 66 (773 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Changes by Christian Heimes <lists [at] cheimes>:


----------
nosy: +christian.heimes



report at bugs

Aug 7, 2012, 7:52 PM

Post #4 of 66 (774 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Meador Inge added the comment:

I agree that the general case is complicated. It will get even more complicated if the whole of PEP 3118 gets implemented, since it then turns into a tree comparison. In general, I think you will probably have to compute some canonical form and then compare the canonical forms.

Here are a few more cases that don't work out in the attached algorithm:

1. Repeat characters - '2c' == 'cc'
2. Whitespace - 'h h' == 'hh'

Also, currently the byte order specifiers are always at the beginning of the string. We discussed in issue3132 scoping them per the nested structures, but decided to drop that unless somebody barks about it since it is fairly complicated without a clear benefit. So, I wouldn't worry about them being scattered through the string.

This seems like sort of a slippery slope. I need to think about it more, but my first impression is that coming up with some way to compare format strings is going to be nasty.
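
The two cases listed above can be checked with the struct module. This short sketch only shows that the differently spelled formats pack to the same bytes, while a plain string comparison treats them as different; the variable name spellings_differ is illustrative.

```python
import struct

# Repeat count versus repeated character: same size, same packed bytes.
assert struct.calcsize("2c") == struct.calcsize("cc")
assert struct.pack("2c", b"a", b"b") == struct.pack("cc", b"a", b"b")

# Whitespace between format codes is ignored by struct.
assert struct.calcsize("h h") == struct.calcsize("hh")
assert struct.pack("h h", 1, 2) == struct.pack("hh", 1, 2)

# A plain string comparison nevertheless sees different format strings.
spellings_differ = ("2c" != "cc") and ("h h" != "hh")
```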

----------



report at bugs

Aug 8, 2012, 4:32 AM

Post #5 of 66 (775 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Stefan Krah added the comment:

Right, byte order specifiers are always at the beginning of the string.
That is at least something. I wonder if we should tighten PEP-3118 to
demand a canonical form of format strings, such as (probably incomplete):

- Whitespace is disallowed.

- Except for 's', no zero count may be given.

- A count of 1 (redundant) is disallowed.

- Repeats must be specified in terms of count + single char.


That still leaves the '=I' != '=L' problem. Why are there two
specifiers describing uint32_t?


Anyway, as Meador says, this can get tricky and I don't think this
can be resolved before beta-2. I'm attaching a patch that should
behave well for the restricted canonical form at least.
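
The restricted canonical form listed above could be checked along these lines. This is a sketch, not the attached patch: the name is_canonical is hypothetical, and treating 'p' like 's' is an assumption, since the message only mentions 's'.

```python
import re

# One token: optional decimal count followed by a single format code.
_TOKEN = re.compile(r"(\d*)([A-Za-z?])")
_STRING_CODES = "sp"  # assumption: 'p' is treated like 's'


def is_canonical(fmt):
    """Check the (probably incomplete) canonical-form rules from the thread."""
    body = fmt[1:] if fmt[:1] in ("@", "=", "<", ">", "!") else fmt
    pos, prev = 0, None
    while pos < len(body):
        m = _TOKEN.match(body, pos)
        if m is None:
            return False  # whitespace or an unexpected character
        digits, code = m.groups()
        if digits and int(digits) == 1:
            return False  # a redundant count of 1
        if digits and int(digits) == 0 and code not in _STRING_CODES:
            return False  # zero count on a non-string code
        if code == prev and code not in _STRING_CODES:
            return False  # repeats must be written as count + single char
        prev, pos = code, m.end()
    return True
```

With these rules 'Q2h' passes, while 'hh', '1h' and 'h h' are all rejected.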

----------
keywords: +patch
Added file: http://bugs.python.org/file26725/issue15573.diff



report at bugs

Aug 9, 2012, 2:47 PM

Post #6 of 66 (770 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Martin v. Löwis added the comment:

Can someone please explain why this is a release blocker? What is the specific regression from 3.2 that this deals with?

----------
nosy: +loewis



report at bugs

Aug 9, 2012, 3:06 PM

Post #7 of 66 (770 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

STINNER Victor added the comment:

> What is the specific regression from 3.2 that this deals with?

I don't know if it must be called a regression, but at least the behaviour is different in Python 3.2 and 3.3. For example, a Unicode array is no longer equal to its memoryview:

Python 3.3.0b1 (default:aaa68dce117e, Aug 9 2012, 22:45:00)
[GCC 4.6.3 20120306 (Red Hat 4.6.3-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import array
>>> a=array.array('u', 'abc')
>>> v=memoryview(a)
>>> a == v
False

ned$ python3
Python 3.2.3 (default, Jun 8 2012, 05:40:07)
[GCC 4.6.3 20120306 (Red Hat 4.6.3-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import array
>>> a=array.array('u', 'abc')
>>> v=memoryview(a)
>>> a == v
True

----------



report at bugs

Aug 9, 2012, 11:29 PM

Post #8 of 66 (770 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Martin v. Löwis added the comment:

haypo: thanks for stating the issue.

ISTM that this classifies as an "obscure" bug: you have to use memoryviews, and you need to compare them for equality, and the comparison needs to be "non-trivial", where "trivial" is defined
by "both are 1D byte arrays".

While this is a bug, I think it still can be fixed in a bug fix
release of 3.3, so un-blocking.

I also think that as a first step, a specification needs to be
drafted defining when exactly a memory view should compare equal
with some other object. I can easily provide a specification that
makes the current implementation "correct".

----------
priority: release blocker -> high



report at bugs

Aug 10, 2012, 12:48 AM

Post #9 of 66 (772 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Stefan Krah added the comment:

> I can easily provide a specification that makes the current implementation "correct"

Yes, the current specification is: memoryview only attempts to
compare arrays with known (single character native) formats and
returns "not equal" otherwise.

The problem is that for backwards compatibility memoryview accepts
arrays with arbitrary format strings. In operations like tolist()
it's possible to raise NotImplementedError, but for equality comparisons
that's not possible.


Note that in 3.2 memoryview would return "equal" for arrays that
simply aren't equal, if those arrays happen to have the same bit
pattern.
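
A small illustration of the "same bit pattern" case, using only the array module. It does not reproduce the 3.2 memoryview behaviour itself; it just shows two arrays whose raw bytes match although their values differ.

```python
from array import array

signed = array("b", [-1])      # one signed byte with value -1
unsigned = array("B", [255])   # one unsigned byte with value 255

# Both arrays hold the single byte 0xff ...
same_bytes = signed.tobytes() == unsigned.tobytes()
# ... but the arrays themselves compare by value and are not equal.
same_values = signed == unsigned
```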


One way to deal with this is to demand a strict canonical form
of format strings for PEP-3118, see msg167687.

----------



report at bugs

Aug 10, 2012, 1:53 AM

Post #10 of 66 (770 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Martin v. Löwis added the comment:

> Note that in 3.2 memoryview would return "equal" for arrays that
> simply aren't equal, if those arrays happen to have the same bit
> pattern.

This is exactly my point. If a "memoryview" has the same "memory"
as another, why are they then not rightfully considered equal?
IOW, the 3.2 behavior looks fine to me.

You apparently have a vision that equality should mean something
different for memoryviews - please explicitly state what that
view is. A mathematical definition ("two memoryviews A and B
are equal iff ...") would be appreciated.

> One way to deal with this is to demand a strict canonical form
> of format strings for PEP-3118, see msg167687.

You are talking about the solution already - I still don't know
what the problem is exactly (not that I *need* to understand
the problem, but at a minimum, the documentation should state
what the intended behavior is - better if people would also
agree that the proposed behavior is "reasonable").

For 3.3, I see two approaches: either move backwards to the
3.2 behavior and defer this change to 3.4 - this would make
it release-critical indeed. Or move forward to a yet-to-be
specified equality operation which as of now may not be
implemented correctly, treating any improvement to it
as a bug fix.

----------



report at bugs

Aug 10, 2012, 2:53 AM

Post #11 of 66 (773 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Stefan Krah added the comment:

PEP-3118 specifies strongly typed multi-dimensional arrays. The existing
code in the 3.2 memoryview as well as numerous comments by Travis Oliphant
make it clear that these capabilities were intended from the start for
memoryview as well.

Perhaps the name "memoryview" is a misnomer, since the boundaries between
memoryview and NumPy's ndarray become blurry. In fact, the small
implementation of an ndarray in Modules/_testbuffer.c is *also* a memoryview
in some sense, since it can grab a buffer from an exporter and expose it in
the same manner as memoryview.

So what I implemented is certainly not only *my* vision. The PEP essentially
describes NumPy arrays with an interchange format to convert between NumPy
and PIL arrays.

It is perhaps unfortunate that the protocol was named "buffer" protocol,
since it is actually an advanced "array" protocol.

NumPy arrays don't care about the raw memory. It is always the logical array
that matters. For example, Fortran and C arrays have different bit patterns
in memory but compare equal, a fact that the 3.2 implementation completely
misses.

Arrays v and w are equal iff format and shape are equal and for all valid
indices allowed by shape

    memcmp((char *)PyBuffer_GetPointer(v, indices),
           (char *)PyBuffer_GetPointer(w, indices),
           itemsize) == 0.

Equal formats of course imply equal itemsize.
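
The definition above can be approximated in Python for the one-dimensional case. This is only a sketch: the name items_equal_1d is hypothetical, format equality is reduced to plain string equality, and one-item slices stand in for PyBuffer_GetPointer() plus memcmp().

```python
from array import array


def items_equal_1d(v, w):
    """Compare two 1-D memoryviews item by item, following the logical layout."""
    if v.ndim != 1 or w.ndim != 1:
        raise ValueError("this sketch only handles one-dimensional buffers")
    if v.format != w.format or v.shape != w.shape:
        return False
    # Each one-item slice yields the itemsize bytes of the logical item i,
    # regardless of the stride used by the underlying buffer.
    return all(v[i:i + 1].tobytes() == w[i:i + 1].tobytes()
               for i in range(v.shape[0]))


x = memoryview(array("B", [1, 2, 3, 4, 5]))
y = memoryview(array("B", [5, 4, 3, 2, 1]))
z = y[::-1]  # same logical items as x, stored in reverse order
```

Here x and z are logically equal although the memory of the two exporters differs, which is the point of comparing by indices rather than by raw memory.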

----------



report at bugs

Aug 10, 2012, 9:04 AM

Post #12 of 66 (771 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Martin v. Löwis added the comment:

Can you please elaborate on the specification:

1. what does it mean that the formats of v and w are equal?

2. Victor's clarification about this issue isn't about comparing two arrays, but an array with a string object. So: when is an array equal to some other (non-array) object?

----------



report at bugs

Aug 10, 2012, 9:46 AM

Post #13 of 66 (772 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Stefan Krah added the comment:

> 1. what does it mean that the formats of v and w are equal?

I'm using array and Py_buffer interchangeably since a Py_buffer struct
actually describes a multi-dimensional array. v and w are Py_buffer
structs.

So v.format must equal w.format, where format is a format string in
struct module syntax. The topic of this issue is to determine under
what circumstances two strings in struct module syntax are considered
equal.

> 2. Victor's clarification about this issue isn't about comparing
> two arrays, but an array with a string object. So: when is an
> array equal to some other (non-array) object?

>>> a=array.array('u', 'abc')
>>> v=memoryview(a)
>>> a == v
False

memoryview can compare against any object with a getbufferproc, in this
case array.array. memoryview_richcompare() calls PyObject_GetBuffer(other)
and proceeds to compare its own internal Py_buffer v against the obtained
Py_buffer w.

In the case of v.format == w.format the fix for unknown formats is trivial:
Just allow the comparison using v.itemsize == w.itemsize.

However, the struct module format string syntax has multiple representations
for the exact same formats, which makes a general fmtcmp() function tricky
to write.

Hence my proposal to demand a strict canonical form for PEP-3118 format
strings, which would be a proper subset of struct module format strings.

Example: "1Q 1h 1h 0c" must be written as "Q2h"

The attached patch should largely implement this proposal. A canonical form
is perhaps not such a severe restriction, since format strings should usually
come from the exporting object. E.g. NumPy must translate its own internal
format to struct module syntax anyway.

Another option is to commit the patch that misses "1Q 1h 1h 0c" == "Q2h"
now and aim for a completely general fmtcmp() function later.

IMO any general fmtcmp() function should also be reasonably fast.
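
For illustration, a reduction that maps "1Q 1h 1h 0c" to "Q2h" might look like the following. This is a sketch under assumptions, not the attached patch: the name canonical is hypothetical, 'p' is treated like 's', and zero counts on other codes are simply dropped, as the example above suggests. In native mode a zero count can affect alignment padding, which this sketch does not model.

```python
import re

# One token: optional count and a single format code, with optional
# whitespace around the token (but not between count and code).
_TOKEN = re.compile(r"\s*(\d*)([A-Za-z?])\s*")


def canonical(fmt):
    """Reduce a flat struct-style format string to count + single char form."""
    prefix = ""
    if fmt and fmt[0] in "@=<>!":
        prefix, fmt = fmt[0], fmt[1:]
    runs = []  # list of [count, code]
    pos = 0
    while pos < len(fmt):
        m = _TOKEN.match(fmt, pos)
        if m is None:
            raise ValueError("unsupported format string: %r" % fmt)
        pos = m.end()
        count = int(m.group(1)) if m.group(1) else 1
        code = m.group(2)
        if code in "sp":
            runs.append([count, code])   # the count is a length, never merged
        elif count == 0:
            continue                     # assumption: drop zero counts
        elif runs and runs[-1][1] == code:
            runs[-1][0] += count         # merge adjacent repeats
        else:
            runs.append([count, code])
    return prefix + "".join(code if count == 1 else "%d%s" % (count, code)
                            for count, code in runs)
```

Two format strings could then be compared with canonical(s) == canonical(t), at the cost of parsing both strings on every comparison.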

----------



report at bugs

Aug 10, 2012, 10:06 AM

Post #14 of 66 (770 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Changes by Stefan Krah <stefan-usenet [at] bytereef>:


----------
stage: needs patch -> patch review



report at bugs

Aug 10, 2012, 10:47 AM

Post #15 of 66 (774 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Martin v. Löwis added the comment:

> So v.format must equal w.format, where format is a format string in
> struct module syntax. The topic of this issue is to determine under
> what circumstances two strings in struct module syntax are considered
> equal.

And that is exactly my question: We don't need a patch implementing
it (yet), but a specification of what is to be implemented first.

I know when two strings are equal (regardless of their syntax):
if they have the same length, and contain the same characters in
the same order. Apparently, you have a different notion of "equal"
for strings in mind; please be explicit about what that notion is.

> memoryview can compare against any object with a getbufferproc, in this
> case array.array. memoryview_richcompare() calls PyObject_GetBuffer(other)
> and proceeds to compare its own internal Py_buffer v against the obtained
> Py_buffer w.

Can this be expressed on Python level as well? I.e. is it correct
to say: an array/buffer/memoryview A is equal to an object O iff
A is equal to memoryview(O)? Or could it be that these two equivalences
might reasonably differ?

> Hence my proposal to demand a strict canonical form for PEP-3118 format
> strings, which would be a proper subset of struct module format strings.

Can you kindly phrase this as a specification? Who is demanding what
from whom?

Proposal: two format strings are equal if their canonical forms
are equal strings. The canonical form C of a string S is created by ???

However, it appears that you may have something different in mind
where things are rejected/fail to work if the canonical form isn't
originally provided by somebody (whom?)

So another Proposal: two format strings are equal iff they are
both in canonical form and are equal strings.

This would imply that a format string which is not in canonical
form is not equal to any other strings, not even to itself, so
this may still not be what you want. But I can't guess what it
is that you want.

----------



report at bugs

Aug 10, 2012, 11:01 AM

Post #16 of 66 (771 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

STINNER Victor added the comment:

Can't we start with something simple (for Python 3.3?), and elaborate
later? In my specific example, both objects have the same format string and
the same content. So I expect that they are equal.

----------



report at bugs

Aug 10, 2012, 12:18 PM

Post #17 of 66 (770 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Martin v. Löwis added the comment:

> Can't we start with something simple (for Python 3.3?), and elaborate
> later?

Sure: someone would have to make a proposal what exactly that is.
IMO, the 3.2 definition *was* simple, but apparently it was considered
too simple. So either Stefan gets to define his view of equality, or
somebody else needs to make a (specific) counter-proposal.

----------



report at bugs

Aug 10, 2012, 1:05 PM

Post #18 of 66 (771 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Stefan Krah added the comment:

The ideal specification is:

1) Arrays v and w are equal iff format and shape are equal and for all valid
indices allowed by shape

    memcmp((char *)PyBuffer_GetPointer(v, indices),
           (char *)PyBuffer_GetPointer(w, indices),
           itemsize) == 0.

2) Two format strings s and t are equal if canonical(s) == canonical(t).

End ideal specification.

Purely to *facilitate* the implementation of a format comparison function,
I suggested:

3) An exporter must initialize the format field of a Py_buffer structure
with canonical(s).

If *all* exporters obey 3), a format comparison function can simply
call strcmp(s, t) (after sorting out the byte order specifier).

Specifically, if x and y are equal, then:

a) x == memoryview(x) == memoryview(y) == y

If x and y are equal and exporter x does *not* obey 3), but exporter y does,
then:

b) x == memoryview(x) != memoryview(y) == y

Under rule 3) this would be the fault of exporter x.

For Python 3.3 it is also possible to state only 1) and 2), with a caveat in
the documentation that case b) might occur until the format comparison function
in memoryview implements the reductions to canonical forms.

The problem is that reductions to canonical forms might get very complicated
if #3132 is implemented.

----------



report at bugs

Aug 10, 2012, 1:53 PM

Post #19 of 66 (771 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Stefan Krah added the comment:

Martin v. Loewis <report [at] bugs> wrote:
> Sure: someone would have to make a proposal what exactly that is.
> IMO, the 3.2 definition *was* simple, but apparently it was considered
> too simple.

It was simply broken in multiple ways. Example:

>>> from numpy import *
>>> x = array([1,2,3,4,5], dtype='B')
>>> y = array([5,4,3,2,1], dtype='B')
>>> z = y[::-1]
>>>
>>> x == z
array([ True, True, True, True, True], dtype=bool)
>>> memoryview(x) == memoryview(z)
False
Segmentation fault


I'm not even talking about the segfault here. Note that x == z, but
memoryview(x) != memoryview(z), because the logical structure is
not taken into account.

Likewise, one could construct cases where one array contains a float
NaN and the other an integer that happens to have the same bit pattern.

The arrays would not be equal, but their memoryviews would be equal.
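
The NaN case can be reproduced with the struct module alone. This sketch uses the standard-size codes '=f' and '=I' (both 4 bytes) and only demonstrates that identical bytes can stand for unequal values.

```python
import struct

nan = float("nan")
raw = struct.pack("=f", nan)           # 4 bytes holding a float NaN
as_int = struct.unpack("=I", raw)[0]   # the same 4 bytes read as an unsigned int

# The memory is identical ...
same_bytes = raw == struct.pack("=I", as_int)
# ... but the float and the integer are different values.
same_values = nan == as_int
```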



> So either Stefan gets to define his view of equality, or
> somebody else needs to make a (specific) counter-proposal.

The view is defined by the PEP that clearly models NumPy. I'm curious what
counter-proposal will work with NumPy and PIL.

----------



report at bugs

Aug 10, 2012, 9:16 PM

Post #20 of 66 (771 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Martin v. Löwis added the comment:

I find Stefan's proposed equality confusing. Why is it based on memcmp? Either it compares memory (i.e. internal representations), or it compares abstract values. If it compares abstract values, then it shouldn't be a requirement that the format strings are equal in any sense. Instead, the resulting values should be equal. So I propose this definition:

v == w iff v.shape == w.shape and v.tolist() == w.tolist();
if either operation fails with an exception, the objects are not equal.

Of course, the implementation doesn't need to literally call tolist; instead, behaving as-if it had been called is fine. However, as time
is running out, I would actually propose this to be the implementation
in 3.3.
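
Written out literally in Python, the definition above might look like this. It is a sketch only: the name values_equal is hypothetical, and shape is read as an attribute, which is how memoryview exposes it.

```python
from array import array


def values_equal(v, w):
    """Compare two memoryviews by shape and by their unpacked values."""
    try:
        return v.shape == w.shape and v.tolist() == w.tolist()
    except Exception:
        # Any failure, e.g. an unsupported format in tolist(), means "not equal".
        return False


a = memoryview(array("b", [1, 2, 3]))
b = memoryview(array("B", [1, 2, 3]))
c = memoryview(array("B", [1, 2]))
```

Under this definition a and b are equal although their format strings differ, and a and c are unequal because their shapes differ.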

In addition, I would propose to support the 'u' and 'w' codes in tolist, to resolve what Victor says the actual issue is.

I'm -1 on a definition that involves equivalence of format strings.

----------



report at bugs

Aug 11, 2012, 2:42 AM

Post #21 of 66 (771 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Nick Coghlan added the comment:

Stefan's proposed definition is correct. Shapes define array types, format
strings define entry types, then the actual memory contents define the
value.

It's not "Stefan's definition of equivalence", it's what a statically typed
array *means*.

The 3.2 way is completely broken, as it considers arrays containing
completely different data as equal if the memory layout happens to be the
same by coincidence.

3.3 is currently also broken, as it considers arrays that *do* contain the
same values to be different.

Stefan's patch fixes that by checking the shape and format first, and
*then* checking the value. It's exactly the same as doing an instance check
in an __eq__ method.

The requirement for a canonical format is for sanity's sake: the general
equivalence classes are too hard to figure out.

----------



report at bugs

Aug 11, 2012, 10:31 AM

Post #22 of 66 (768 views)
Permalink
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Martin v. Löwis added the comment:

Nick: I still disagree. Would you agree that array.array constitutes a "statically typed array"? Yet

py> array.array('b',b'foo') == array.array('B',b'foo')
True
py> array.array('i',[1,2,3]) == array.array('L', [1,2,3])
True

So the array object (rightfully) performs comparison on abstract values, not on memory representation. In Python, a statically typed array still conceptually contains abstract values, not memory blocks (this is also what Stefan asserts for NumPy in msg167862). The static typing only restricts the values you can store in the container, and defines the internal representation on the C level (plus it typically implies a value storage, instead of a reference storage).

With your and Stefan's proposed semantics, we would get the weird case that for two array.arrays a and b, it might happen that

a == b and memoryview(a) != memoryview(b)
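
A short way to check the case above on a given interpreter. The variable names are only illustrative. Under the format-first proposal the two results would differ; on interpreters that compare memoryviews by value (the behaviour documented for memoryview in 3.3 and later) they are expected to agree.

```python
from array import array

a = array('b', b'foo')
b = array('B', b'foo')

# array.array compares abstract values, so the differing type codes do not matter.
arrays_equal = (a == b)

# What memoryview does here is exactly the point under discussion.
views_equal = (memoryview(a) == memoryview(b))

print("a == b:", arrays_equal)
print("memoryview(a) == memoryview(b):", views_equal)
```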

----------



report at bugs

Aug 11, 2012, 10:41 AM

Post #23 of 66 (768 views)
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Martin v. Löwis added the comment:

"it might *still* happen", I should say, since this problem is exactly what Victor says this issue intends to solve (for comparison of two 'u' arrays).

----------



report at bugs

Aug 11, 2012, 11:13 AM

Post #24 of 66 (773 views)
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Stefan Krah added the comment:

So we have two competing proposals:

1) Py_buffers are strongly typed arrays in the ML sense (e.g. array of float,
array of int).

This is probably what I'd prefer on the C level; imagine a function
like PyBuffer_Compare(v, w).

Backwards compatibility problem for users who were thinking in terms of
value comparisons:

>>> x = array.array('b', [127])
>>> y = array.array('B', [127])
>>> x == y
True
>>> memoryview(x) == memoryview(y)
False

2) Compare by value, like NumPy arrays do:

>>> x = numpy.array([1, 2, 3], dtype='i')
>>> y = numpy.array([1, 2, 3], dtype='f')
>>> x == y
array([True, True, True], dtype=bool)

I concede that this is probably what users want to see on the Python level.

Backwards compatibility problem for users who were using complicated
structs:

>>> from _testbuffer import *
>>> x = ndarray([(1,1), (2,2), (3,3)], shape=[3], format='hQ')
>>> x == memoryview(x)
False

Reason: While _testbuffer.ndarray already implements tolist() in full
generality, memoryview does not:

>>> x.tolist()
[(1, 1), (2, 2), (3, 3)]

>>> memoryview(x).tolist()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NotImplementedError: memoryview: unsupported format hQ

So while I'm beginning to like Martin's proposal, the implementation is
certainly trickier and will always be quite slow for complicated format
strings.

It would be possible to keep a fast path for the primitive C types
and use the code from _testbuffer.tolist() for the slow cases.
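
A rough Python sketch of the fast path / slow path split, assuming 1-D buffers whose format strings the struct module understands. The helper name buffers_equal and the INTEGER_CODES set are hypothetical; this is neither the _testbuffer code nor the attached patch.

```python
import struct
from array import array

# Native integer codes: identical format plus identical bytes implies equal values.
# (Floats are excluded because NaN and signed zeros break that implication.)
INTEGER_CODES = set("bBhHiIlLqQnN")


def buffers_equal(v, w):
    """Hypothetical helper: compare two 1-D buffer exporters by value."""
    mv, mw = memoryview(v), memoryview(w)
    if mv.ndim != 1 or mw.ndim != 1 or mv.shape != mw.shape:
        return False
    # Fast path: same primitive integer type, so comparing raw bytes is enough.
    if mv.format == mw.format and mv.format in INTEGER_CODES:
        return mv.tobytes() == mw.tobytes()
    # Slow path: unpack every item with struct and compare the Python values.
    # Raises struct.error if a format is unknown to the struct module.
    items_v = struct.iter_unpack(mv.format, mv.tobytes())
    items_w = struct.iter_unpack(mw.format, mw.tobytes())
    return all(x == y for x, y in zip(items_v, items_w))


print(buffers_equal(array('b', [1, 2, 3]), array('L', [1, 2, 3])))
print(buffers_equal(array('b', [1, 2, 3]), array('b', [1, 2, 4])))
```

The slow path pays for one struct unpack per item, which is where the cost for complicated format strings comes from.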

----------



report at bugs

Aug 11, 2012, 11:22 AM

Post #25 of 66 (769 views)
[issue15573] Support unknown formats in memoryview comparisons [In reply to]

Martin v. Löwis added the comment:

Here is a patch doing the comparison on abstract values if the formats are different.

----------
Added file: http://bugs.python.org/file26766/abstractcmp.diff

