Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Python: Dev

PEP 0424: A method for exposing a length hint

 

 

First page Previous page 1 2 3 Next page Last page  View All Python dev RSS feed   Index | Next | Previous | View Threaded


alex.gaynor at gmail

Jul 14, 2012, 3:11 PM

Post #1 of 54 (453 views)
Permalink
PEP 0424: A method for exposing a length hint

Hi all,

I've just submitted a PEP proposing making __length_hint__ a public API for
users to define and other VMs to implement:

PEP: 424
Title: A method for exposing a length hint
Version: $Revision$
Last-Modified: $Date
Author: Alex Gaynor <alex.gaynor [at] gmail>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 14-July-2012
Python-Version: 3.4

Abstract
========

CPython currently defines an ``__length_hint__`` method on several types, such
as various iterators. This method is then used by various other functions (such
as
``map``) to presize lists based on the estimated returned by
``__length_hint__``. Types can then define ``__length_hint__`` which are not
sized, and thus should not define ``__len__``, but can estimate or compute a
size (such as many iterators).

Proposal
========

This PEP proposes formally documenting ``__length_hint__`` for other
interpreter and non-standard library Python to implement.

``__length_hint__`` must return an integer, and is not required to be accurate.
It may return a value that is either larger or smaller than the actual size of
the container. It may raise a ``TypeError`` if a specific instance cannot have
its length estimated. It may not return a negative value.

Rationale
=========

Being able to pre-allocate lists based on the expected size, as estimated by
``__length_hint__``, can be a significant optimization. CPython has been
observed to run some code faster than PyPy, purely because of this optimization
being present.

Open questions
==============

There are two open questions for this PEP:

* Should ``list`` expose a kwarg in it's constructor for supplying a length
hint.
* Should a function be added either to ``builtins`` or some other module which
calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.

Copyright
=========

This document has been placed into the public domain.

..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8




Alex

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


benjamin at python

Jul 14, 2012, 4:18 PM

Post #2 of 54 (452 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

2012/7/14 Alex Gaynor <alex.gaynor [at] gmail>:
>
> Proposal
> ========
>
> This PEP proposes formally documenting ``__length_hint__`` for other
> interpreter and non-standard library Python to implement.
>
> ``__length_hint__`` must return an integer, and is not required to be accurate.
> It may return a value that is either larger or smaller than the actual size of
> the container. It may raise a ``TypeError`` if a specific instance cannot have
> its length estimated. It may not return a negative value.

And what happens if you return a negative value?

>
> Rationale
> =========
>
> Being able to pre-allocate lists based on the expected size, as estimated by
> ``__length_hint__``, can be a significant optimization. CPython has been
> observed to run some code faster than PyPy, purely because of this optimization
> being present.
>
> Open questions
> ==============
>
> There are two open questions for this PEP:
>
> * Should ``list`` expose a kwarg in it's constructor for supplying a length
> hint.
> * Should a function be added either to ``builtins`` or some other module which
> calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.

Let's try to keep this as limited as possible for a public API.


--
Regards,
Benjamin
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


alex.gaynor at gmail

Jul 14, 2012, 4:21 PM

Post #3 of 54 (443 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

On Sat, Jul 14, 2012 at 4:18 PM, Benjamin Peterson <benjamin [at] python>wrote:

> 2012/7/14 Alex Gaynor <alex.gaynor [at] gmail>:
> >
> > Proposal
> > ========
> >
> > This PEP proposes formally documenting ``__length_hint__`` for other
> > interpreter and non-standard library Python to implement.
> >
> > ``__length_hint__`` must return an integer, and is not required to be
> accurate.
> > It may return a value that is either larger or smaller than the actual
> size of
> > the container. It may raise a ``TypeError`` if a specific instance
> cannot have
> > its length estimated. It may not return a negative value.
>
> And what happens if you return a negative value?
>
>
ValueError, the same as with len.


> >
> > Rationale
> > =========
> >
> > Being able to pre-allocate lists based on the expected size, as
> estimated by
> > ``__length_hint__``, can be a significant optimization. CPython has been
> > observed to run some code faster than PyPy, purely because of this
> optimization
> > being present.
> >
> > Open questions
> > ==============
> >
> > There are two open questions for this PEP:
> >
> > * Should ``list`` expose a kwarg in it's constructor for supplying a
> length
> > hint.
> > * Should a function be added either to ``builtins`` or some other module
> which
> > calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.
>
> Let's try to keep this as limited as possible for a public API.
>
>
Sounds reasonable to me! Should we just go ahead and strip those out now?


>
> --
> Regards,
> Benjamin
>

Alex

--
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero


alexandre.zani at gmail

Jul 14, 2012, 4:28 PM

Post #4 of 54 (445 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

On Sat, Jul 14, 2012 at 4:21 PM, Alex Gaynor <alex.gaynor [at] gmail> wrote:
>
>
> On Sat, Jul 14, 2012 at 4:18 PM, Benjamin Peterson <benjamin [at] python>
> wrote:
>>
>> 2012/7/14 Alex Gaynor <alex.gaynor [at] gmail>:
>> >
>> > Proposal
>> > ========
>> >
>> > This PEP proposes formally documenting ``__length_hint__`` for other
>> > interpreter and non-standard library Python to implement.
>> >
>> > ``__length_hint__`` must return an integer, and is not required to be
>> > accurate.
>> > It may return a value that is either larger or smaller than the actual
>> > size of
>> > the container. It may raise a ``TypeError`` if a specific instance
>> > cannot have
>> > its length estimated. It may not return a negative value.
>>
>> And what happens if you return a negative value?
>>
>
> ValueError, the same as with len.
>
>>
>> >
>> > Rationale
>> > =========
>> >
>> > Being able to pre-allocate lists based on the expected size, as
>> > estimated by
>> > ``__length_hint__``, can be a significant optimization. CPython has been
>> > observed to run some code faster than PyPy, purely because of this
>> > optimization
>> > being present.
>> >
>> > Open questions
>> > ==============
>> >
>> > There are two open questions for this PEP:
>> >
>> > * Should ``list`` expose a kwarg in it's constructor for supplying a
>> > length
>> > hint.
>> > * Should a function be added either to ``builtins`` or some other module
>> > which
>> > calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.
>>
>> Let's try to keep this as limited as possible for a public API.
>>
>
> Sounds reasonable to me! Should we just go ahead and strip those out now?

I'm +1 on not having a public API for this. Ultimately the contract
for a length hint will depend heavily upon what you need it for. Some
applications would require a length hint to be an "at least" others an
"at most" and others something else entirely. Given that the contract
here appears to be >=0, I don't think the length hint is particularly
useful to the public at large.

>
>>
>>
>> --
>> Regards,
>> Benjamin
>
>
> Alex
>
> --
> "I disapprove of what you say, but I will defend to the death your right to
> say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
> "The people's good is the highest law." -- Cicero
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev [at] python
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/alexandre.zani%40gmail.com
>
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


benjamin at python

Jul 14, 2012, 4:37 PM

Post #5 of 54 (441 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

2012/7/14 Alex Gaynor <alex.gaynor [at] gmail>:
>
>
> On Sat, Jul 14, 2012 at 4:18 PM, Benjamin Peterson <benjamin [at] python>
> wrote:
>>
>> 2012/7/14 Alex Gaynor <alex.gaynor [at] gmail>:
>> >
>> > Proposal
>> > ========
>> >
>> > This PEP proposes formally documenting ``__length_hint__`` for other
>> > interpreter and non-standard library Python to implement.
>> >
>> > ``__length_hint__`` must return an integer, and is not required to be
>> > accurate.
>> > It may return a value that is either larger or smaller than the actual
>> > size of
>> > the container. It may raise a ``TypeError`` if a specific instance
>> > cannot have
>> > its length estimated. It may not return a negative value.
>>
>> And what happens if you return a negative value?
>>
>
> ValueError, the same as with len.

CPython will probably have to updated to not ignore it if you return "melons".

>
>>
>> >
>> > Rationale
>> > =========
>> >
>> > Being able to pre-allocate lists based on the expected size, as
>> > estimated by
>> > ``__length_hint__``, can be a significant optimization. CPython has been
>> > observed to run some code faster than PyPy, purely because of this
>> > optimization
>> > being present.
>> >
>> > Open questions
>> > ==============
>> >
>> > There are two open questions for this PEP:
>> >
>> > * Should ``list`` expose a kwarg in it's constructor for supplying a
>> > length
>> > hint.
>> > * Should a function be added either to ``builtins`` or some other module
>> > which
>> > calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.
>>
>> Let's try to keep this as limited as possible for a public API.
>>
>
> Sounds reasonable to me! Should we just go ahead and strip those out now?

Certainly.


--
Regards,
Benjamin
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


tjreedy at udel

Jul 14, 2012, 6:03 PM

Post #6 of 54 (443 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

On 7/14/2012 6:11 PM, Alex Gaynor wrote:
...

Various thoughts:

"This method is then used by various other functions (such +as ``map``)
to presize lists"
-- map no longer produces lists. This only makes sense in 3.x if you
mean that map can pass along the value of its inputs.

"Types can then define ``__length_hint__`` which are not
+sized, and thus should not define ``__len__``,"
is awkwardly phrased. I think you mean
"Types that are not sized and should not define __len__ can then define
__length_hint__.

What do 'sized' and 'should' mean? Some iterators know exactly how many
items they have yet to yield. The main implication of having a __len__
versus __length_hint__ methods seems to be it bool() value when empty.

If lists were to get a new keyword arg, so should the other classes
based on one internal array. I see this has been removed.

Generator functions are the nicest way to define iterators in Python.
Generator instances returned from generator functions cannot be given a
length hint. They are not directly helped. However ...

Not addressed in the PEP: do consumers of __length_hint look for it (and
__length__ before or after calling iter(input), or both? If before, then
the following should work.

class gwlh: # generator with length hint
def __init__(self, gen, len):
self.gen = gen
self.len = len
def __iter__(self):
return self.gen
def __length_hint__(self):
return len

Do transformation iterators pass through hints from inputs? Does map(f,
iterable) look for len or hint on iterable? Ditto for some itertools,
like chain (add lengths). Any guidelines in the PEP

--
Terry Jan Reedy



_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


ncoghlan at gmail

Jul 14, 2012, 10:16 PM

Post #7 of 54 (437 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

On Sun, Jul 15, 2012 at 9:18 AM, Benjamin Peterson <benjamin [at] python> wrote:
>> Open questions
>> ==============
>>
>> There are two open questions for this PEP:
>>
>> * Should ``list`` expose a kwarg in it's constructor for supplying a length
>> hint.
>> * Should a function be added either to ``builtins`` or some other module which
>> calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.
>
> Let's try to keep this as limited as possible for a public API.

Length hints are very useful for *any* container implementation,
whether those containers are in the standard library or not. Just as
we exposed operator.index when __index__ was added, we should expose
an "operator.length_hint" function with the following semantics:

def length_hint(obj):
"""Return an estimate of the number of items in obj. This is
useful for presizing containers when building from an iterable.

If the object supports len(), the result will be exact.
Otherwise, it may over or underestimate by an arbitrary amount. The
result will be an integer >= 0.
"""
try:
return len(obj)
except TypeError:
try:
get_hint = obj.__length_hint__
except AttributeError:
return 0
hint = get_hint()
if not isinstance(hint, int):
raise TypeError("Length hint must be an integer, not
%r" % type(hint))
if hint < 0:
raise ValueError("Length hint (%r) must be >= 0" % hint)
return hint

There's no reason to make pure Python container implementations
reimplement all that for themselves.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan [at] gmail | Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


alex.gaynor at gmail

Jul 14, 2012, 10:20 PM

Post #8 of 54 (438 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

On Sat, Jul 14, 2012 at 10:16 PM, Nick Coghlan <ncoghlan [at] gmail> wrote:

> On Sun, Jul 15, 2012 at 9:18 AM, Benjamin Peterson <benjamin [at] python>
> wrote:
> >> Open questions
> >> ==============
> >>
> >> There are two open questions for this PEP:
> >>
> >> * Should ``list`` expose a kwarg in it's constructor for supplying a
> length
> >> hint.
> >> * Should a function be added either to ``builtins`` or some other
> module which
> >> calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.
> >
> > Let's try to keep this as limited as possible for a public API.
>
> Length hints are very useful for *any* container implementation,
> whether those containers are in the standard library or not. Just as
> we exposed operator.index when __index__ was added, we should expose
> an "operator.length_hint" function with the following semantics:
>
> def length_hint(obj):
> """Return an estimate of the number of items in obj. This is
> useful for presizing containers when building from an iterable.
>
> If the object supports len(), the result will be exact.
> Otherwise, it may over or underestimate by an arbitrary amount. The
> result will be an integer >= 0.
> """
> try:
> return len(obj)
> except TypeError:
> try:
> get_hint = obj.__length_hint__
> except AttributeError:
> return 0
> hint = get_hint()
> if not isinstance(hint, int):
> raise TypeError("Length hint must be an integer, not
> %r" % type(hint))
> if hint < 0:
> raise ValueError("Length hint (%r) must be >= 0" % hint)
> return hint
>
> There's no reason to make pure Python container implementations
> reimplement all that for themselves.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan [at] gmail | Brisbane, Australia
>

Sounds reasonable to me, the only issue with your psuedocode (err... I mean
Python ;)), is that there's no way for the __lenght_hint__ to specify that
that particular instance can't have a length hint computed. e.g. imagine
some sort of lazy stream that cached itself, and only wanted to offer a
length hint if it had already been evaluated. Without an exception to
raise, it has to return whatever the magic value for length_hint is (in
your impl it appears to be 0, the current _PyObject_LengthHint method in
CPython has a required `default` parameter). The PEP proposes using
TypeError for that.

Anyways that code looks good, do you want to add it to the PEP?

Alex

--
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero


steve at pearwood

Jul 15, 2012, 1:21 AM

Post #9 of 54 (434 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

Nick Coghlan wrote:
> On Sun, Jul 15, 2012 at 9:18 AM, Benjamin Peterson <benjamin [at] python> wrote:
>>> Open questions
>>> ==============
>>>
>>> There are two open questions for this PEP:
>>>
>>> * Should ``list`` expose a kwarg in it's constructor for supplying a length
>>> hint.
>>> * Should a function be added either to ``builtins`` or some other module which
>>> calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.
>> Let's try to keep this as limited as possible for a public API.
>
> Length hints are very useful for *any* container implementation,
> whether those containers are in the standard library or not. Just as
> we exposed operator.index when __index__ was added, we should expose
> an "operator.length_hint" function with the following semantics:
[...]

As given, length_hint gives no way of distinguishing between iterables and
non-iterables:

py> length_hint([])
0
py> length_hint(42)
0

nor does it give iterable objects a way to indicate that either they don't
know their length, or that they are infinite.

I suggest:

* object (and hence all other types that don't explicitly override it)
should have a __length_hint__ that raises TypeError;

* __length_hint__ should be allowed to return None to indicate "don't know"
or -1 to indicate "infinite".

Presumably anything that wishes to create a list or other sequence from an
object with a hint of -1 could then raise an exception immediately.




--
Steven
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


ncoghlan at gmail

Jul 15, 2012, 1:47 AM

Post #10 of 54 (434 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

On Sun, Jul 15, 2012 at 6:21 PM, Steven D'Aprano <steve [at] pearwood> wrote:
> I suggest:
>
> * object (and hence all other types that don't explicitly override it)
> should have a __length_hint__ that raises TypeError;

We can keep it simpler than that just by changing the order of the checks.

> * __length_hint__ should be allowed to return None to indicate "don't know"
> or -1 to indicate "infinite".
>
> Presumably anything that wishes to create a list or other sequence from an
> object with a hint of -1 could then raise an exception immediately.

I'm not seeing the value in returning None over 0 for the don't know
case - it just makes the API harder to use. Declaring negative results
as meaning "I'm infinite" sounds reasonable, though:

def length_hint(obj):
"""Return an estimate of the number of items in obj.

This is useful for presizing containers when building from an iterable.

If the object supports len(), the result will be exact. Otherwise,
it may over or underestimate by an arbitrary amount.
"""
try:
get_hint = obj.__length_hint__
except AttributeError:
return len(obj)
hint = get_hint()
if not isinstance(hint, int):
msg = "Length hint must be an integer, not %r"
raise TypeError(msg % type(hint))
if hint < 0:
raise ValueError("%r is an infinite iterator" % (obj,))
return hint

Cheers,
Nick.

--
Nick Coghlan | ncoghlan [at] gmail | Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


stefan_ml at behnel

Jul 15, 2012, 2:11 AM

Post #11 of 54 (437 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

Alex Gaynor, 15.07.2012 07:20:
> there's no way for the __lenght_hint__ to specify that
> that particular instance can't have a length hint computed. e.g. imagine
> some sort of lazy stream that cached itself, and only wanted to offer a
> length hint if it had already been evaluated. Without an exception to
> raise, it has to return whatever the magic value for length_hint is (in
> your impl it appears to be 0, the current _PyObject_LengthHint method in
> CPython has a required `default` parameter). The PEP proposes using
> TypeError for that.

Yes, that's a major issue. I've been planning to add a length hint to
Cython's generator expressions for a while, but the problem is really that
in most cases it is only known at runtime if the underlying iterable has a
length hint, so propagating it needs a way to say "sorry, I thought I might
know, but I don't". It would be even better if this way was efficient.
Since we're at a point of making this an official protocol, why not change
the current behaviour and return -1 (or even just 0) to explicitly state
that "we don't know"?

The problem with an exception here is that it might have been raised
accidentally inside of the __length_hint__() implementation that is being
asked. Swallowing it just because it happened to be a TypeError rather than
something else may end up covering bugs. We had a similar issue with
hasattr() in the past.

Also, it would be nice if this became a type slot rather than requiring a
dict lookup and Python function call.

Stefan

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Jul 15, 2012, 5:36 AM

Post #12 of 54 (435 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

On Sun, 15 Jul 2012 18:47:38 +1000
Nick Coghlan <ncoghlan [at] gmail> wrote:
>
> > * __length_hint__ should be allowed to return None to indicate "don't know"
> > or -1 to indicate "infinite".
> >
> > Presumably anything that wishes to create a list or other sequence from an
> > object with a hint of -1 could then raise an exception immediately.
>
> I'm not seeing the value in returning None over 0 for the don't know
> case - it just makes the API harder to use.

The point is that 0 is a legitimate value for a length hint. Simple
implementations of __length_hint__ will start returning 0 as a
legitimate value and you will wrongly interpret that as "don't know",
which kinds of defeat the purpose of __length-hint__ ;)

That said, I don't think a special value for "is infinite" is useful.
Just make -1 mean "I don't know".

Regards

Antoine.


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


steve at pearwood

Jul 15, 2012, 6:47 AM

Post #13 of 54 (432 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

Antoine Pitrou wrote:

> The point is that 0 is a legitimate value for a length hint. Simple
> implementations of __length_hint__ will start returning 0 as a
> legitimate value and you will wrongly interpret that as "don't know",
> which kinds of defeat the purpose of __length-hint__ ;)

> That said, I don't think a special value for "is infinite" is useful.
> Just make -1 mean "I don't know".

You've obviously never accidentally called list on an infinite iterator *wink*

It's not the (eventual) MemoryError that is the problem. On some systems, this
can cause the PC to become unresponsive as the OS tries to free an
ever-increasing amount of memory. Been there, done that, on a production
system. I had to do a hard reboot to fix it.

I think having a hint that says "there's no way this can succeed, fail
immediately" is more useful than caring about the difference between a hint of
0 and a hint of 1.



--
Steven
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


ncoghlan at gmail

Jul 15, 2012, 7:08 AM

Post #14 of 54 (434 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

Right, I agree on the value in being able to return something to say "this
cannot be converted to a concrete container".

I still haven't seen a use case where the appropriate response to "I don't
know" differs from the appropriate response to a hint of zero - that is,
you don't preallocate, you just start iterating.

Cheers,
Nick.

--
Sent from my phone, thus the relative brevity :)


mark at hotpy

Jul 15, 2012, 7:14 AM

Post #15 of 54 (429 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

Alex Gaynor wrote:
> Hi all,
>
> I've just submitted a PEP proposing making __length_hint__ a public API for
> users to define and other VMs to implement:

These seems back-to-front.
__length_hint__ is *used* by the VM, not provided by it.
It should be part of the object model, rather than the API.

>
> PEP: 424
> Title: A method for exposing a length hint
> Version: $Revision$
> Last-Modified: $Date
> Author: Alex Gaynor <alex.gaynor [at] gmail>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 14-July-2012
> Python-Version: 3.4
>
> Abstract
> ========
>
> CPython currently defines an ``__length_hint__`` method on several types, such
> as various iterators. This method is then used by various other functions (such
> as
> ``map``) to presize lists based on the estimated returned by

Don't use "map" as an example.
map returns an iterator so it doesn't need __length_hint__

> ``__length_hint__``. Types can then define ``__length_hint__`` which are not
> sized, and thus should not define ``__len__``, but can estimate or compute a
> size (such as many iterators).
>
> Proposal
> ========
>
> This PEP proposes formally documenting ``__length_hint__`` for other
> interpreter and non-standard library Python to implement.
>
> ``__length_hint__`` must return an integer, and is not required to be accurate.
> It may return a value that is either larger or smaller than the actual size of
> the container. It may raise a ``TypeError`` if a specific instance cannot have
> its length estimated. It may not return a negative value.

Rather than raising a TypeError, why not return NotImplemented?

>
> Rationale
> =========
>
> Being able to pre-allocate lists based on the expected size, as estimated by
> ``__length_hint__``, can be a significant optimization. CPython has been
> observed to run some code faster than PyPy, purely because of this optimization
> being present.
>
> Open questions
> ==============
>
> There are two open questions for this PEP:
>
> * Should ``list`` expose a kwarg in it's constructor for supplying a length
> hint.
> * Should a function be added either to ``builtins`` or some other module which
> calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.
>
> Copyright
> =========
>
> This document has been placed into the public domain.
>
> ..
> Local Variables:
> mode: indented-text
> indent-tabs-mode: nil
> sentence-end-double-space: t
> fill-column: 70
> coding: utf-8
>
>
>
>
> Alex
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev [at] python
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/mark%40hotpy.org

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Jul 15, 2012, 7:22 AM

Post #16 of 54 (432 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

On Mon, 16 Jul 2012 00:08:41 +1000
Nick Coghlan <ncoghlan [at] gmail> wrote:
> Right, I agree on the value in being able to return something to say "this
> cannot be converted to a concrete container".

Who would be able to return that, apart from trivial cases like
itertools.cycle()?

Regards

Antoine.


--
Software development and contracting: http://pro.pitrou.net


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


lists at cheimes

Jul 15, 2012, 7:33 AM

Post #17 of 54 (436 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

Am 15.07.2012 16:22, schrieb Antoine Pitrou:
> On Mon, 16 Jul 2012 00:08:41 +1000
> Nick Coghlan <ncoghlan [at] gmail> wrote:
>> Right, I agree on the value in being able to return something to say "this
>> cannot be converted to a concrete container".
>
> Who would be able to return that, apart from trivial cases like
> itertools.cycle()?

For example most numerical sequence iterators like Fibonacci generator,
prime number sequence generator and even trivial cases like even natural
number generator. IMO it's a good idea to have a notation for infinitive
iterators that can't be materialized as finite containers.

+1

Christian

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


mark at hotpy

Jul 15, 2012, 7:39 AM

Post #18 of 54 (430 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

Nick Coghlan wrote:
> Right, I agree on the value in being able to return something to say
> "this cannot be converted to a concrete container".
>
> I still haven't seen a use case where the appropriate response to "I
> don't know" differs from the appropriate response to a hint of zero -
> that is, you don't preallocate, you just start iterating.
>

There seem to be 5 possible classes values of __length_hint__ that an
iterator object can provide:

1. Don't implement it at all.

2. Implement __length_hint__() but don't want to return any value.
Either raise an exception (TypeError) -- As suggested in the PEP.
or return NotImplemented -- my preferred option.

3. Return a "don't know" value:
Returning 0 would be fine for this, but the VM might want to respond
differently to "don't know" and 0.
__length_hint__() == 0 container should be minimum size.
__length_hint__() == "unknown" container starts at default size.

4. Infinite iterator:
Could return float('inf'), but given this is a "hint" then
returning sys.maxsize or sys.maxsize + 1 might be OK.
Alternatively raise an OverflowError

5. A meaningful length. No problem :)

Also, what are the allowable return types?

1. int only
2. Any number (ie any type with a __int__() method)?
3. Or any integer-like object (ie a type with a __index__() method)?

My suggestion:

a) Don't want to return any value or "don't know": return NotImplemented
b) For infinite iterators: raise an OverflowError
c) All other cases: return an int or a type with a __index__() method.

Cheers,
Mark.

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


brett at python

Jul 15, 2012, 7:47 AM

Post #19 of 54 (432 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

On Sun, Jul 15, 2012 at 10:39 AM, Mark Shannon <mark [at] hotpy> wrote:

> Nick Coghlan wrote:
>
>> Right, I agree on the value in being able to return something to say
>> "this cannot be converted to a concrete container".
>>
>> I still haven't seen a use case where the appropriate response to "I
>> don't know" differs from the appropriate response to a hint of zero - that
>> is, you don't preallocate, you just start iterating.
>>
>>
> There seem to be 5 possible classes values of __length_hint__ that an
> iterator object can provide:
>
> 1. Don't implement it at all.
>
> 2. Implement __length_hint__() but don't want to return any value.
> Either raise an exception (TypeError) -- As suggested in the PEP.
> or return NotImplemented -- my preferred option.
>
> 3. Return a "don't know" value:
> Returning 0 would be fine for this, but the VM might want to respond
> differently to "don't know" and 0.
> __length_hint__() == 0 container should be minimum
> size.
> __length_hint__() == "unknown" container starts at default
> size.


> 4. Infinite iterator:
> Could return float('inf'), but given this is a "hint" then
> returning sys.maxsize or sys.maxsize + 1 might be OK.
> Alternatively raise an OverflowError
>

I am really having a hard time differentiating infinity with "I don't know"
since they are both accurate from the point of view of __length_hint__ and
its typical purpose of allocation. You have no clue how many values will be
grabbed from an infinite iterator, so it's the same as just not knowing
upfront how long the iterator will be, infinite or not, and thus not worth
distinguishing.


>
> 5. A meaningful length. No problem :)
>
> Also, what are the allowable return types?
>
> 1. int only
> 2. Any number (ie any type with a __int__() method)?
> 3. Or any integer-like object (ie a type with a __index__() method)?
>
> My suggestion:
>
> a) Don't want to return any value or "don't know": return NotImplemented
> b) For infinite iterators: raise an OverflowError
> c) All other cases: return an int or a type with a __index__() method.
>

I'm fine with (a), drop (b), and for (c) use what we allow for __len__()
since, as Nick's operator.length_hint pseudo-code suggests, people will
call this as a fallback if __len__ isn't defined.

-Brett



>
> Cheers,
> Mark.
>
>
> ______________________________**_________________
> Python-Dev mailing list
> Python-Dev [at] python
> http://mail.python.org/**mailman/listinfo/python-dev<http://mail.python.org/mailman/listinfo/python-dev>
> Unsubscribe: http://mail.python.org/**mailman/options/python-dev/**
> brett%40python.org<http://mail.python.org/mailman/options/python-dev/brett%40python.org>
>


lists at cheimes

Jul 15, 2012, 7:56 AM

Post #20 of 54 (432 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

Am 15.07.2012 16:39, schrieb Mark Shannon:
> 1. Don't implement it at all.
>
> 2. Implement __length_hint__() but don't want to return any value.
> Either raise an exception (TypeError) -- As suggested in the PEP.
> or return NotImplemented -- my preferred option.

How is this different from "don't know"? What's the use case for knowing
that the object doesn't want to say anything or doesn't know its
possible length.

> 3. Return a "don't know" value:
> Returning 0 would be fine for this, but the VM might want to respond
> differently to "don't know" and 0.

How about None? It's the logical choice, simple and easy to test for in
Python and C code.

0 is a valid number for "I know that's I'll return nothing".

> 4. Infinite iterator:
> Could return float('inf'), but given this is a "hint" then
> returning sys.maxsize or sys.maxsize + 1 might be OK.
> Alternatively raise an OverflowError

Too complex, hard to remember and even harder to check for. Since a
length is always positive or zero, -1 is a good return value for infinite.

> a) Don't want to return any value or "don't know": return NotImplemented

+1

> b) For infinite iterators: raise an OverflowError

-1, I'm for -1. ;) I'm not a fan of using exception for valid and
correct return values.

> c) All other cases: return an int or a type with a __index__() method.

+1

Christian

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Jul 15, 2012, 8:06 AM

Post #21 of 54 (432 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

On Sun, 15 Jul 2012 16:33:23 +0200
Christian Heimes <lists [at] cheimes> wrote:
> Am 15.07.2012 16:22, schrieb Antoine Pitrou:
> > On Mon, 16 Jul 2012 00:08:41 +1000
> > Nick Coghlan <ncoghlan [at] gmail> wrote:
> >> Right, I agree on the value in being able to return something to say "this
> >> cannot be converted to a concrete container".
> >
> > Who would be able to return that, apart from trivial cases like
> > itertools.cycle()?
>
> For example most numerical sequence iterators like Fibonacci generator,
> prime number sequence generator and even trivial cases like even natural
> number generator.

First, you can't implement __length_hint__ for a generator, which is the
preferred (the most practical) way of writing iterators in pure Python.

Second, not all iterators will implement __length_hint__ (because it's
optional and, really, of rather little use). So, as a user, you cannot
hope that `list(some_iterator)` will always raise instead of filling
your memory with an infinite stream of values: you have to be careful
anyway.

Even if __length_hint__ is implemented, its result may be wrong.
That's the whole point: it's a *hint*; an iterator might tell you it's
finite while it's infinite, or the reverse.


My conclusion is that an infinite iterator is a documentation issue.
Just tell the user that it doesn't stop, and let them shoot themselves
in the foot in they want to.

Regards

Antoine.


--
Software development and contracting: http://pro.pitrou.net


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


mark at hotpy

Jul 15, 2012, 8:08 AM

Post #22 of 54 (432 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

Brett Cannon wrote:
>
>
> On Sun, Jul 15, 2012 at 10:39 AM, Mark Shannon <mark [at] hotpy
> <mailto:mark [at] hotpy>> wrote:
>
> Nick Coghlan wrote:
>
> Right, I agree on the value in being able to return something to
> say "this cannot be converted to a concrete container".
>
> I still haven't seen a use case where the appropriate response
> to "I don't know" differs from the appropriate response to a
> hint of zero - that is, you don't preallocate, you just start
> iterating.
>
>
> There seem to be 5 possible classes values of __length_hint__ that an
> iterator object can provide:
>
> 1. Don't implement it at all.
>
> 2. Implement __length_hint__() but don't want to return any value.
> Either raise an exception (TypeError) -- As suggested in the PEP.
> or return NotImplemented -- my preferred option.
>
> 3. Return a "don't know" value:
> Returning 0 would be fine for this, but the VM might want to respond
> differently to "don't know" and 0.
> __length_hint__() == 0 container should be
> minimum size.
> __length_hint__() == "unknown" container starts at
> default size.
>
>
> 4. Infinite iterator:
> Could return float('inf'), but given this is a "hint" then
> returning sys.maxsize or sys.maxsize + 1 might be OK.
> Alternatively raise an OverflowError
>
>
> I am really having a hard time differentiating infinity with "I don't
> know" since they are both accurate from the point of view of
> __length_hint__ and its typical purpose of allocation. You have no clue
> how many values will be grabbed from an infinite iterator, so it's the
> same as just not knowing upfront how long the iterator will be, infinite
> or not, and thus not worth distinguishing.
>
>
>
> 5. A meaningful length. No problem :)
>
> Also, what are the allowable return types?
>
> 1. int only
> 2. Any number (ie any type with a __int__() method)?
> 3. Or any integer-like object (ie a type with a __index__() method)?
>
> My suggestion:
>
> a) Don't want to return any value or "don't know": return NotImplemented
> b) For infinite iterators: raise an OverflowError
> c) All other cases: return an int or a type with a __index__() method.
>
>
> I'm fine with (a), drop (b), and for (c) use what we allow for __len__()
> since, as Nick's operator.length_hint pseudo-code suggests, people will
> call this as a fallback if __len__ isn't defined.

So how does an iterator express infinite length?

What should happen if I am silly enough to do this:
>>> list(itertools.count())

This will fail; it should fail quickly.

Cheers,
Mark.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Jul 15, 2012, 8:14 AM

Post #23 of 54 (432 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

On Sun, 15 Jul 2012 16:08:00 +0100
Mark Shannon <mark [at] hotpy> wrote:
>
> What should happen if I am silly enough to do this:
> >>> list(itertools.count())
>
> This will fail; it should fail quickly.

Why should it? AFAIK it's not a common complaint. You said it yourself:
it's a silly thing to do.

Regards

Antoine.


--
Software development and contracting: http://pro.pitrou.net


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


alexandre.zani at gmail

Jul 15, 2012, 8:38 AM

Post #24 of 54 (421 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

On Sun, Jul 15, 2012 at 8:08 AM, Mark Shannon <mark [at] hotpy> wrote:
> Brett Cannon wrote:
>
>>
>>
>> On Sun, Jul 15, 2012 at 10:39 AM, Mark Shannon <mark [at] hotpy
>> <mailto:mark [at] hotpy>> wrote:
>>
>> Nick Coghlan wrote:
>>
>> Right, I agree on the value in being able to return something to
>> say "this cannot be converted to a concrete container".
>>
>> I still haven't seen a use case where the appropriate response
>> to "I don't know" differs from the appropriate response to a
>> hint of zero - that is, you don't preallocate, you just start
>> iterating.
>>
>>
>> There seem to be 5 possible classes values of __length_hint__ that an
>> iterator object can provide:
>>
>> 1. Don't implement it at all.
>>
>> 2. Implement __length_hint__() but don't want to return any value.
>> Either raise an exception (TypeError) -- As suggested in the PEP.
>> or return NotImplemented -- my preferred option.
>>
>> 3. Return a "don't know" value:
>> Returning 0 would be fine for this, but the VM might want to
>> respond
>> differently to "don't know" and 0.
>> __length_hint__() == 0 container should be
>> minimum size.
>> __length_hint__() == "unknown" container starts at
>> default size.
>>
>>
>> 4. Infinite iterator:
>> Could return float('inf'), but given this is a "hint" then
>> returning sys.maxsize or sys.maxsize + 1 might be OK.
>> Alternatively raise an OverflowError
>>
>>
>> I am really having a hard time differentiating infinity with "I don't
>> know" since they are both accurate from the point of view of __length_hint__
>> and its typical purpose of allocation. You have no clue how many values will
>> be grabbed from an infinite iterator, so it's the same as just not knowing
>> upfront how long the iterator will be, infinite or not, and thus not worth
>> distinguishing.
>>
>>
>> 5. A meaningful length. No problem :)
>>
>> Also, what are the allowable return types?
>>
>> 1. int only
>> 2. Any number (ie any type with a __int__() method)?
>> 3. Or any integer-like object (ie a type with a __index__() method)?
>>
>> My suggestion:
>>
>> a) Don't want to return any value or "don't know": return
>> NotImplemented
>> b) For infinite iterators: raise an OverflowError
>> c) All other cases: return an int or a type with a __index__() method.
>>
>>
>> I'm fine with (a), drop (b), and for (c) use what we allow for __len__()
>> since, as Nick's operator.length_hint pseudo-code suggests, people will call
>> this as a fallback if __len__ isn't defined.
>
>
> So how does an iterator express infinite length?
>
> What should happen if I am silly enough to do this:
>>>> list(itertools.count())
>
> This will fail; it should fail quickly.
>
>
> Cheers,
> Mark.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev [at] python
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/alexandre.zani%40gmail.com

The PEP so far says: "It may raise a ``TypeError`` if a specific
instance cannot have
its length estimated." In many ways, "I don't know" is the same as
this "specific instance cannot have its length estimated". Why not
just raise a TypeError?

Also, regarding the code Nick posted above, I'm a little concerned
about calling len as the first thing to try. That means that if I
implement both __len__ and __len_hint__ (perhaps because __len__ is
very expensive) __len_hint__ will never be used. It's relatively easy
to say:

try:
hint = len_hint(l)
except TypeError:
hint = len(l)
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


steve at pearwood

Jul 15, 2012, 8:55 AM

Post #25 of 54 (419 views)
Permalink
Re: PEP 0424: A method for exposing a length hint [In reply to]

Mark Shannon wrote:

> So how does an iterator express infinite length?

The suggestion was it should return -1.


> What should happen if I am silly enough to do this:
> >>> list(itertools.count())
>
> This will fail; it should fail quickly.

That depends on your OS. I've just tested it now on Linux Mint, and the Python
process was terminated within seconds.

I've also inadvertently done it on a Fedora system, which became completely
unresponsive to user-input (including ctrl-alt-delete) within a few minutes. I
let it run overnight (16 hours) before literally pulling the plug.

(I expect the difference in behaviour is due to the default ulimit under
Debian/Mint and RedHat/Fedora systems.)

Ignoring OS-specific features, the promise[1] of the language is that list
will try to allocate enough space for every item yielded by the iterator, or
fail with a MemoryError. No promise is made as to how long that will take: it
could take hours, or days, depending on how badly memory allocation
performance drops when faced with unreasonably large requests. You can't
expect it to fail either quickly or with an exception.

With a length hint, we could strengthen that promise:

"if __length_hint__ returns a negative number, list, tuple and set will fail
immediately with MemoryError"

which I think is a good safety feature for some things which cannot possibly
succeed, but risk DOSing your system. Does it prevent every possible failure
mode? No, of course not. But just because you can't prevent *every* problem
doesn't mean you should prevent the ones which you can.


[1] I think. I'm sure I read this somewhere in the docs, but I can't find it now.


--
Steven
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com

First page Previous page 1 2 3 Next page Last page  View All Python dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.