Sequence splitting


schickb at gmail

Jul 2, 2009, 7:56 PM

Post #1 of 35 (778 views)
Permalink
Sequence splitting

I have fairly often found the need to split a sequence into two groups
based on a function result. Much like the existing filter function,
but returning a tuple of true, false sequences. In Python, something
like:

def split(seq, func=None):
    if func is None:
        func = bool
    t, f = [], []
    for item in seq:
        if func(item):
            t.append(item)
        else:
            f.append(item)
    return (t, f)

The discussion linked to below has various approaches for doing this
now, but most traverse the sequence twice and many don't apply a
function to split the sequence.
http://stackoverflow.com/questions/949098/python-split-a-list-based-on-a-condition

Is there any interest in a C implementation of this? Seems too trivial
to write a PEP, so I'm just trying to measure interest before diving
in. This wouldn't really belong in itertools. Would it be best
implemented as a top-level built-in?
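
As a usage sketch of the function above (written for Python 3; the sample data and the even/odd predicate are illustrative assumptions, not part of the original post):

```python
def split(seq, func=bool):
    """Partition seq in a single pass into (truthy, falsy) lists."""
    t, f = [], []
    for item in seq:
        if func(item):
            t.append(item)
        else:
            f.append(item)
    return (t, f)

# Hypothetical example: separate even numbers from odd ones.
evens, odds = split(range(10), lambda n: n % 2 == 0)
print(evens)  # [0, 2, 4, 6, 8]
print(odds)   # [1, 3, 5, 7, 9]

# With the default predicate, items are split on their own truthiness.
print(split([0, 1, "", "a", None]))  # ([1, 'a'], [0, '', None])
```

Each item is tested exactly once, and the relative order of items is preserved within both result lists.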

-Brad
--
http://mail.python.org/mailman/listinfo/python-list


tn.pablo at gmail

Jul 2, 2009, 8:10 PM

Post #2 of 35 (744 views)
Permalink
Re: Sequence splitting [In reply to]

On Thu, Jul 2, 2009 at 21:56, schickb<schickb [at] gmail> wrote:
> I have fairly often found the need to split a sequence into two groups
> based on a function result. Much like the existing filter function,
> but returning a tuple of true, false sequences. In Python, something
> like:
>
> def split(seq, func=None):
>    if func is None:
>        func = bool
>    t, f = [], []
>    for item in seq:
>        if func(item):
>            t.append(item)
>        else:
>            f.append(item)
>    return (t, f)
>
> The discussion linked to below has various approaches for doing this
> now, but most traverse the sequence twice and many don't apply a
> function to split the sequence.
> http://stackoverflow.com/questions/949098/python-split-a-list-based-on-a-condition
>
> Is there any interest in a C implementation of this? Seems too trivial
> to write a PEP, so I'm just trying to measure interest before diving
> in. This wouldn't really belong in itertools. Would it be best
> implemented as a top-level built-in?
>
> -Brad
> --
> http://mail.python.org/mailman/listinfo/python-list
>

This sounds like it belongs to the python-ideas list. I suggest
posting there for better feedback, since the core developers check
that list more often than this one.


--
Pablo Torres N.


http://phr.cx at NOSPAM

Jul 2, 2009, 8:14 PM

Post #3 of 35 (749 views)
Permalink
Re: Sequence splitting [In reply to]

schickb <schickb [at] gmail> writes:
> def split(seq, func=None):
>     if func is None:
>         func = bool
>     t, f = [], []
>     for item in seq:
>         if func(item):
>             t.append(item)
>         else:
>             f.append(item)
>     return (t, f)

untested:

import itertools

def split(seq, func=bool):
    xs = zip(seq, itertools.imap(func, seq))
    t = list(x for (x,y) in xs if y)
    f = list(x for (x,y) in xs if not y)
    return (t, f)


tn.pablo at gmail

Jul 2, 2009, 8:17 PM

Post #4 of 35 (746 views)
Permalink
Re: Sequence splitting [In reply to]

On Jul 2, 9:56 pm, schickb <schi...@gmail.com> wrote:
> I have fairly often found the need to split a sequence into two groups
> based on a function result. Much like the existing filter function,
> but returning a tuple of true, false sequences. In Python, something
> like:
>
> def split(seq, func=None):
>     if func is None:
>         func = bool
>     t, f = [], []
>     for item in seq:
>         if func(item):
>             t.append(item)
>         else:
>             f.append(item)
>     return (t, f)
>
> The discussion linked to below has various approaches for doing this
> now, but most traverse the sequence twice and many don't apply a
> function to split the sequence. http://stackoverflow.com/questions/949098/python-split-a-list-based-o...
>
> Is there any interest in a C implementation of this? Seems too trivial
> to write a PEP, so I'm just trying to measure interest before diving
> in. This wouldn't really belong in itertools. Would it be best
> implemented as a top-level built-in?
>
> -Brad

This sounds like it belongs to the python-ideas list. I suggest
posting there for better feedback, since the core developers check
that list more often than this one.



schickb at gmail

Jul 2, 2009, 8:55 PM

Post #5 of 35 (747 views)
Permalink
Re: Sequence splitting [In reply to]

On Jul 2, 8:14 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> schickb <schi...@gmail.com> writes:
> > def split(seq, func=None):
> >     if func is None:
> >         func = bool
> >     t, f = [], []
> >     for item in seq:
> >         if func(item):
> >             t.append(item)
> >         else:
> >             f.append(item)
> >     return (t, f)
>
> untested:
>
>    def split(seq, func=bool):
>       xs = zip(seq, itertools.imap(func, seq))
>       t = list(x for (x,y) in xs if y)
>       f = list(x for (x,y) in xs if not y)
>       return (t, f)

In my testing that is 3.5x slower than the original solution (and less
clear imo). I fixed my version to take a bool default. Either way, I'm
not really looking for additional ways to do this in Python unless
I've totally missed something. What I am considering is writing it in
C, much like filter.
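
A comparison like this can be reproduced with the standard timeit module. The sketch below is a Python 3 port of the two versions; the names split_loop and split_zip, the test data and the predicate are assumptions chosen for illustration. The ratio depends on interpreter version, input size and predicate cost, so the 3.5x figure should not be expected to reproduce exactly.

```python
import timeit

def split_loop(seq, func=bool):
    # Single pass: each item is tested once and appended to one list.
    t, f = [], []
    for item in seq:
        if func(item):
            t.append(item)
        else:
            f.append(item)
    return (t, f)

def split_zip(seq, func=bool):
    # Python 3 port of the zip-based version: zip() is lazy here, so the
    # pairs are materialized once to allow two passes over them.
    xs = list(zip(seq, map(func, seq)))
    t = [x for (x, y) in xs if y]
    f = [x for (x, y) in xs if not y]
    return (t, f)

data = list(range(1000))        # hypothetical test input
is_even = lambda n: n % 2 == 0  # hypothetical cheap predicate

for fn in (split_loop, split_zip):
    seconds = timeit.timeit(lambda: fn(data, is_even), number=200)
    print(fn.__name__, round(seconds, 4))
```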

-Brad


schickb at gmail

Jul 2, 2009, 8:56 PM

Post #6 of 35 (748 views)
Permalink
Re: Sequence splitting [In reply to]

On Jul 2, 8:17 pm, "Pablo Torres N." <tn.pa...@gmail.com> wrote:
> On Jul 2, 9:56 pm, schickb <schi...@gmail.com> wrote:
>
> > I have fairly often found the need to split a sequence into two groups
> > based on a function result.
>
> This sounds like it belongs to the python-ideas list.  I suggest
> posting there for better feedback, since the core developers check
> that list more often than this one.

Thanks, I didn't know about that list.


http://phr.cx at NOSPAM

Jul 2, 2009, 9:08 PM

Post #7 of 35 (746 views)
Permalink
Re: Sequence splitting [In reply to]

Brad <schickb [at] gmail> writes:

> On Jul 2, 8:14 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> > schickb <schi...@gmail.com> writes:
> > > def split(seq, func=None):
> > >     if func is None:
> > >         func = bool
> > >     t, f = [], []
> > >     for item in seq:
> > >         if func(item):
> > >             t.append(item)
> > >         else:
> > >             f.append(item)
> > >     return (t, f)
> >
> > untested:
> >
> >    def split(seq, func=bool):
> >       xs = zip(seq, itertools.imap(func, seq))
> >       t = list(x for (x,y) in xs if y)
> >       f = list(x for (x,y) in xs if not y)
> >       return (t, f)
>
> In my testing that is 3.5x slower than the original solution (and less
> clear imo). I fixed my version to take a bool default. Either way, I'm
> not really looking for additional ways to do this in Python unless
> I've totally missed something. What I am considering is writing it in
> C, much like filter.

I'm a little skeptical that the C version will help much, if it's
evaluating a python function at every list element. Here's a variant
of your version:

def split(seq, func=bool):
    t, f = [], []
    ta, fa = t.append, f.append
    for item in seq:
        (ta if func(item) else fa)(item)
    return (t, f)

This avoids some dict lookups and copying. I wonder if that helps
significantly.


schickb at gmail

Jul 2, 2009, 9:34 PM

Post #8 of 35 (745 views)
Permalink
Re: Sequence splitting [In reply to]

On Jul 2, 9:08 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> Brad <schi...@gmail.com> writes:
> > On Jul 2, 8:14 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> > > schickb <schi...@gmail.com> writes:
> > > > def split(seq, func=None):
> > > >     if func is None:
> > > >         func = bool
> > > >     t, f = [], []
> > > >     for item in seq:
> > > >         if func(item):
> > > >             t.append(item)
> > > >         else:
> > > >             f.append(item)
> > > >     return (t, f)
>
> > > untested:
>
> > >    def split(seq, func=bool):
> > >       xs = zip(seq, itertools.imap(func, seq))
> > >       t = list(x for (x,y) in xs if y)
> > >       f = list(x for (x,y) in xs if not y)
> > >       return (t, f)
>
> > In my testing that is 3.5x slower than the original solution (and less
> > clear imo). I fixed my version to take a bool default. Either way, I'm
> > not really looking for additional ways to do this in Python unless
> > I've totally missed something. What I am considering is writing it in
> > C, much like filter.
>
> I'm a little skeptical that the C version will help much, if it's
> evaluating a python function at every list element.  

Perhaps true, but it would be a nice convenience (for me) as a built-
in written in either Python or C. Although the default case of a bool
function would surely be faster.

> Here's a variant of your version:
>
>  def split(seq, func=bool):
>      t, f = [], []
>      ta, fa = t.append, f.append
>      for item in seq:
>          (ta if func(item) else fa)(item)
>      return (t, f)
>
> This avoids some dict lookups and copying.  I wonder if that helps
> significantly.

Faster, but in tests of a few short sequences only 1% so.

-Brad


tn.pablo at gmail

Jul 2, 2009, 9:40 PM

Post #9 of 35 (746 views)
Permalink
Re: Sequence splitting [In reply to]

On Thu, Jul 2, 2009 at 23:34, Brad<schickb [at] gmail> wrote:
> On Jul 2, 9:08 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>> Brad <schi...@gmail.com> writes:
>> > On Jul 2, 8:14 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>> > > schickb <schi...@gmail.com> writes:
>> > > > def split(seq, func=None):
>> > > >     if func is None:
>> > > >         func = bool
>> > > >     t, f = [], []
>> > > >     for item in seq:
>> > > >         if func(item):
>> > > >             t.append(item)
>> > > >         else:
>> > > >             f.append(item)
>> > > >     return (t, f)
>>
>> > > untested:
>>
>> > >    def split(seq, func=bool):
>> > >       xs = zip(seq, itertools.imap(func, seq))
>> > >       t = list(x for (x,y) in xs if y)
>> > >       f = list(x for (x,y) in xs if not y)
>> > >       return (t, f)
>>
>> > In my testing that is 3.5x slower than the original solution (and less
>> > clear imo). I fixed my version to take a bool default. Either way, I'm
>> > not really looking for additional ways to do this in Python unless
>> > I've totally missed something. What I am considering is writing it in
>> > C, much like filter.
>>
>> I'm a little skeptical that the C version will help much, if it's
>> evaluating a python function at every list element.
>
> Perhaps true, but it would be a nice convenience (for me) as a built-
> in written in either Python or C. Although the default case of a bool
> function would surely be faster.
>
>> Here's a variant of your version:
>>
>>  def split(seq, func=bool):
>>      t, f = [], []
>>      ta, fa = t.append, f.append
>>      for item in seq:
>>          (ta if func(item) else fa)(item)
>>      return (t, f)
>>
>> This avoids some dict lookups and copying.  I wonder if that helps
>> significantly.
>
> Faster, but in tests of a few short sequences only 1% so.
>
> -Brad
> --
> http://mail.python.org/mailman/listinfo/python-list
>

If it is speed that we are after, it's my understanding that map and
filter are faster than iterating with the for statement (and also
faster than list comprehensions). So here is a rewrite:

def split(seq, func=bool):
    t = filter(func, seq)
    f = filter(lambda x: not func(x), seq)
    return list(t), list(f)

The lambda is kind of ugly, but I can't think of anything else.
Also, is it OK to return lists? Py3k saw a lot of APIs changed to
return iterables instead of lists, so maybe my function should have
'return t, f' as its last statement.


--
Pablo Torres N.


http://phr.cx at NOSPAM

Jul 2, 2009, 9:58 PM

Post #10 of 35 (747 views)
Permalink
Re: Sequence splitting [In reply to]

"Pablo Torres N." <tn.pablo [at] gmail> writes:
> def split(seq, func=bool):
>     t = filter(func, seq)
>     f = filter(lambda x: not func(x), seq)
>     return list(t), list(f)

That is icky--you're calling func (which might be slow) twice instead
of once on every element of the seq.
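
The doubled work can be made visible by counting predicate calls. This is a minimal Python 3 sketch; the names split_filter, split_loop and counted_is_odd are assumptions introduced for the demonstration.

```python
def split_filter(seq, func=bool):
    # Two-pass version: func runs once per item in each filter() pass.
    t = filter(func, seq)
    f = filter(lambda x: not func(x), seq)
    return list(t), list(f)

def split_loop(seq, func=bool):
    # One-pass version: func runs once per item.
    t, f = [], []
    for item in seq:
        (t if func(item) else f).append(item)
    return (t, f)

calls = {"n": 0}

def counted_is_odd(n):
    # Predicate that records how many times it has been invoked.
    calls["n"] += 1
    return n % 2 == 1

data = [1, 2, 3, 4, 5]

calls["n"] = 0
split_filter(data, counted_is_odd)
filter_calls = calls["n"]   # two passes over five items

calls["n"] = 0
split_loop(data, counted_is_odd)
loop_calls = calls["n"]     # one pass over five items

print(filter_calls, loop_calls)  # 10 5
```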


schickb at gmail

Jul 2, 2009, 11:31 PM

Post #11 of 35 (747 views)
Permalink
Re: Sequence splitting [In reply to]

On Jul 2, 9:40 pm, "Pablo Torres N." <tn.pa...@gmail.com> wrote:
>
> If it is speed that we are after, it's my understanding that map and
> filter are faster than iterating with the for statement (and also
> faster than list comprehensions).  So here is a rewrite:
>
> def split(seq, func=bool):
>         t = filter(func, seq)
>         f = filter(lambda x: not func(x), seq)
>         return list(t), list(f)
>

In my simple tests, that takes 1.8x as long as the original solution.
That is better than the itertools solution when "func" is short and fast. I
think this solution would be worse if func were more complex.

Either way, what I am still wondering is if people would find a built-
in implementation useful?

-Brad


gagsl-py2 at yahoo

Jul 2, 2009, 11:31 PM

Post #12 of 35 (745 views)
Permalink
Re: Sequence splitting [In reply to]

On Fri, 03 Jul 2009 01:58:22 -0300, <http://phr.cx [at] NOSPAM> wrote:

> "Pablo Torres N." <tn.pablo [at] gmail> writes:
>> def split(seq, func=bool):
>>     t = filter(func, seq)
>>     f = filter(lambda x: not func(x), seq)
>>     return list(t), list(f)
>
> That is icky--you're calling func (which might be slow) twice instead
> of once on every element of the seq.

In addition, this doesn't work if seq is an iterator instead of a sequence.
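
A short Python 3 sketch of the failure (the name split_filter and the sample data are assumptions for illustration): the first pass exhausts the iterator, so the second pass has nothing left to filter.

```python
def split_filter(seq, func=bool):
    t = filter(func, seq)
    f = filter(lambda x: not func(x), seq)
    return list(t), list(f)

is_odd = lambda n: n % 2 == 1

# A list can be traversed twice, so both halves are populated.
print(split_filter([1, 2, 3, 4], is_odd))        # ([1, 3], [2, 4])

# An iterator is exhausted by the first pass; the second pass sees nothing.
print(split_filter(iter([1, 2, 3, 4]), is_odd))  # ([1, 3], [])
```

A single-pass loop does not have this problem, because it only ever iterates over the input once.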

--
Gabriel Genellina



schickb at gmail

Jul 2, 2009, 11:34 PM

Post #13 of 35 (747 views)
Permalink
Re: Sequence splitting [In reply to]

On Jul 2, 8:17 pm, "Pablo Torres N." <tn.pa...@gmail.com> wrote:
>
> This sounds like it belongs to the python-ideas list.  I suggest
> posting there for better feedback, since the core developers check
> that list more often than this one.

I tried posting on python-ideas and received a "You are not allowed to
post to this mailing list" reply. Perhaps because I am posting through
Google groups? Or maybe one must be an approved member to post?

-Brad


ricli85 at gmail

Jul 2, 2009, 11:46 PM

Post #14 of 35 (746 views)
Permalink
Re: Sequence splitting [In reply to]

> I tried posting on python-ideas and received a "You are not allowed to
> post to this mailing list" reply. Perhaps because I am posting through
> Google groups? Or maybe one must be an approved member to post?

If you got an "awaiting moderator approval" message, your post might appear on
the list soon. Those messages can occur when a list is members-only
and you posted from a different address. I am not sure if that was the message
you got.

--
Rickard Lindberg


lie.1296 at gmail

Jul 3, 2009, 12:16 AM

Post #15 of 35 (744 views)
Permalink
Re: Sequence splitting [In reply to]

Brad wrote:
> On Jul 2, 9:40 pm, "Pablo Torres N." <tn.pa...@gmail.com> wrote:
>> If it is speed that we are after, it's my understanding that map and
>> filter are faster than iterating with the for statement (and also
>> faster than list comprehensions). So here is a rewrite:
>>
>> def split(seq, func=bool):
>>     t = filter(func, seq)
>>     f = filter(lambda x: not func(x), seq)
>>     return list(t), list(f)
>>
>
> In my simple tests, that takes 1.8x as long as the original solution.
> Better than the itertools solution, when "func" is short and fast. I
> think the solution here would worse if func was more complex.
>
> Either way, what I am still wondering is if people would find a built-
> in implementation useful?
>
> -Brad

A built-in/itertools should always try to provide the general solution
to be as useful as possible, something like this:

def group(seq, func=bool):
    ret = {}
    for item in seq:
        fitem = func(item)
        try:
            ret[fitem].append(item)
        except KeyError:
            ret[fitem] = [item]
    return ret

It definitely won't be faster, but it is a much more general solution.
Basically, the function allows you to group items based on the
result of func(item). It is similar to itertools.groupby() except that
it also groups non-contiguous items.
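
The same grouping can be written without the try/except by using collections.defaultdict. This is an alternative sketch in Python 3, not the poster's code; the sample input is an assumption.

```python
from collections import defaultdict

def group(seq, func=bool):
    # Missing keys are created on first access with an empty list.
    ret = defaultdict(list)
    for item in seq:
        ret[func(item)].append(item)
    # Return a plain dict so later lookups of absent keys raise KeyError.
    return dict(ret)

print(group(range(10), lambda n: n % 3))
# {0: [0, 3, 6, 9], 1: [1, 4, 7], 2: [2, 5, 8]}
```

The key values only need to be hashable; unlike a sort-based approach, they do not need to be orderable.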


lie.1296 at gmail

Jul 3, 2009, 12:31 AM

Post #16 of 35 (743 views)
Permalink
Re: Sequence splitting [In reply to]

Rickard Lindberg wrote:
>> I tried posting on python-ideas and received a "You are not allowed to
>> post to this mailing list" reply. Perhaps because I am posting through
>> Google groups? Or maybe one must be an approved member to post?
>
> If you got an "awaiting moderator approval" message you post might appear on
> the list soon. The reason for getting those can be that it is a member only
> list and you posted from another address. I am not sure if that was the message
> you got.
>

AFAIK, python-ideas is not moderated (I followed python-ideas). I've
never used Google Groups to access it though. Try subscribing to the
mailing list directly (instead of using Google Group's web-gateway)
here: http://mail.python.org/mailman/listinfo/python-ideas


steve at REMOVE-THIS-cybersource

Jul 3, 2009, 12:57 AM

Post #17 of 35 (744 views)
Permalink
Re: Sequence splitting [In reply to]

On Thu, 02 Jul 2009 22:10:14 -0500, Pablo Torres N. wrote:

> This sounds like it belongs to the python-ideas list. I suggest posting
> there for better feedback, since the core developers check that list
> more often than this one.

If you post to python-ideas, you'll probably be told to gather feedback
here first. The core-developers aren't hugely interested in arbitrary new
features unless they have significant community support.

I've never needed such a split function, and I don't like the name, and
the functionality isn't general enough. I'd prefer something which splits
the input sequence into as many sublists as necessary, according to the
output of the key function. Something like itertools.groupby(), except it
runs through the entire sequence and collates all the elements with
identical keys.

E.g.:

splitby(range(10), lambda n: n%3)
=> [ (0, [0, 3, 6, 9]),
     (1, [1, 4, 7]),
     (2, [2, 5, 8]) ]

Your split() would be nearly equivalent to this with a key function that
returns a Boolean.
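
One possible shape for such a function, as a Python 3 sketch: splitby is only a proposed name in this thread, and the implementation below, including the choice to return keys in order of first appearance, is an assumption. The second function illustrates how the two-way split() falls out as a special case.

```python
def splitby(seq, key):
    groups = {}
    for item in seq:
        groups.setdefault(key(item), []).append(item)
    # Dicts preserve insertion order (Python 3.7+), so keys come out in
    # order of first appearance in the input.
    return list(groups.items())

print(splitby(range(10), lambda n: n % 3))
# [(0, [0, 3, 6, 9]), (1, [1, 4, 7]), (2, [2, 5, 8])]

def split(seq, func=bool):
    # Coerce the key to a real bool so only True and False groups exist,
    # and fall back to an empty list when a group is absent.
    groups = dict(splitby(seq, lambda item: bool(func(item))))
    return groups.get(True, []), groups.get(False, [])

print(split([3, 0, 5, None, 7]))  # ([3, 5, 7], [0, None])
```

The "nearly" in "nearly equivalent" shows up in the two details handled in split(): truthy values must be coerced to True or False, and a group with no members has to be supplied as an empty list.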



--
Steven




http://phr.cx at NOSPAM

Jul 3, 2009, 1:02 AM

Post #18 of 35 (743 views)
Permalink
Re: Sequence splitting [In reply to]

Steven D'Aprano <steve [at] REMOVE-THIS-cybersource> writes:
> I've never needed such a split function, and I don't like the name, and
> the functionality isn't general enough. I'd prefer something which splits
> the input sequence into as many sublists as necessary, according to the
> output of the key function. Something like itertools.groupby(), except it
> runs through the entire sequence and collates all the elements with
> identical keys.

No really, groupby makes iterators, not lists, and you have to
develop quite a delicate sense of when you can use it without having
bugs caused by the different iterators it makes getting advanced at
the wrong times. The concept of a split function that actually works
on lists is useful. I'm neutral about whether it's worth having a C
version in the stdlib.


clp2 at rebertia

Jul 3, 2009, 1:12 AM

Post #19 of 35 (743 views)
Permalink
Re: Sequence splitting [In reply to]

On Thu, Jul 2, 2009 at 11:31 PM, Brad<schickb [at] gmail> wrote:
> On Jul 2, 9:40 pm, "Pablo Torres N." <tn.pa...@gmail.com> wrote:
>>
>> If it is speed that we are after, it's my understanding that map and
>> filter are faster than iterating with the for statement (and also
>> faster than list comprehensions).  So here is a rewrite:
>>
>> def split(seq, func=bool):
>>         t = filter(func, seq)
>>         f = filter(lambda x: not func(x), seq)
>>         return list(t), list(f)
>>
>
> In my simple tests, that takes 1.8x as long as the original solution.
> Better than the itertools solution, when "func" is short and fast. I
> think the solution here would worse if func was more complex.
>
> Either way, what I am still wondering is if people would find a built-
> in implementation useful?

FWIW, Ruby has Enumerable#partition, which does the same thing as
split() and has a better name IMHO.
http://www.ruby-doc.org/core/classes/Enumerable.html#M003130

Cheers,
Chris
--
http://blog.rebertia.com


steve at REMOVE-THIS-cybersource

Jul 3, 2009, 1:26 AM

Post #20 of 35 (741 views)
Permalink
Re: Sequence splitting [In reply to]

On Fri, 03 Jul 2009 01:02:56 -0700, Paul Rubin wrote:

> Steven D'Aprano <steve [at] REMOVE-THIS-cybersource> writes:
>> I've never needed such a split function, and I don't like the name, and
>> the functionality isn't general enough. I'd prefer something which
>> splits the input sequence into as many sublists as necessary, according
>> to the output of the key function. Something like itertools.groupby(),
>> except it runs through the entire sequence and collates all the
>> elements with identical keys.
>
> No really, groupby makes iterators, not lists, and it you have to
> develop quite a delicate sense of when you can use it without having
> bugs caused by the different iterators it makes getting advanced at the
> wrong times. The concept of a split function that actually works on
> lists is useful. I'm neutral about whether it's worth having a C
> version in the stdlib.

groupby() works on lists.

The difference between what I'm suggesting and what groupby() does is
that my suggestion would collate *all* the elements with the same key,
not just runs of them. This (as far as I can tell) requires returning
lists rather than iterators.

The most important difference between my suggestion and that of the OP is
that he limited the key function to something which returns a truth
value, while I'm looking for something more general which can split the
input into an arbitrary number of collated sublists.



--
Steven


http://phr.cx at NOSPAM

Jul 3, 2009, 1:39 AM

Post #21 of 35 (740 views)
Permalink
Re: Sequence splitting [In reply to]

Steven D'Aprano <steve [at] REMOVE-THIS-cybersource> writes:
> groupby() works on lists.

>>> a = [1,3,4,6,7]
>>> from itertools import groupby
>>> b = groupby(a, lambda x: x%2==1) # split into even and odd
>>> c = list(b)
>>> print len(c)
3
>>> d = list(c[1][1]) # should be [4,6]
>>> print d # oops.
[]

> The difference between what I'm suggesting and what groupby() does is
> that my suggestion would collate *all* the elements with the same key,
> not just runs of them. This (as far as I can tell) requires returning
> lists rather than iterators.

I guess that is reasonable.

> The most important difference between my suggestion and that of the OP is
> that he limited the key function to something which returns a truth
> value, while I'm looking for something more general which can split the
> input into an arbitrary number of collated sublists.

Also ok.


tsangpo.newsgroup at gmail

Jul 3, 2009, 2:03 AM

Post #22 of 35 (740 views)
Permalink
Re: Sequence splitting [In reply to]

Just a shorter implementation:

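from itertools import groupby
def split(lst, func):
    # groupby() only groups adjacent items and cannot be indexed, so
    # sort by the key first and collect the groups into a dict.
    key = lambda x: bool(func(x))
    gs = dict((k, list(g)) for k, g in groupby(sorted(lst, key=key), key))
    return gs.get(True, []), gs.get(False, [])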


"Lie Ryan" <lie.1296 [at] gmail> wrote in message news:nfi3m.2341$ze1.1151 [at] news-server
> Brad wrote:
>> On Jul 2, 9:40 pm, "Pablo Torres N." <tn.pa...@gmail.com> wrote:
>>> If it is speed that we are after, it's my understanding that map and
>>> filter are faster than iterating with the for statement (and also
>>> faster than list comprehensions). So here is a rewrite:
>>>
>>> def split(seq, func=bool):
>>>     t = filter(func, seq)
>>>     f = filter(lambda x: not func(x), seq)
>>>     return list(t), list(f)
>>>
>>
>> In my simple tests, that takes 1.8x as long as the original solution.
>> Better than the itertools solution, when "func" is short and fast. I
>> think the solution here would worse if func was more complex.
>>
>> Either way, what I am still wondering is if people would find a built-
>> in implementation useful?
>>
>> -Brad
>
> A built-in/itertools should always try to provide the general solution
> to be as useful as possible, something like this:
>
> def group(seq, func=bool):
>     ret = {}
>     for item in seq:
>         fitem = func(item)
>         try:
>             ret[fitem].append(item)
>         except KeyError:
>             ret[fitem] = [item]
>     return ret
>
> definitely won't be faster, but it is a much more general solution.
> Basically, the function allows you to group sequences based on the
> result of func(item). It is similar to itertools.groupby() except that
> this also group non-contiguous items.


steve at REMOVE-THIS-cybersource

Jul 3, 2009, 2:50 AM

Post #23 of 35 (742 views)
Permalink
Re: Sequence splitting [In reply to]

On Fri, 03 Jul 2009 01:39:27 -0700, Paul Rubin wrote:

> Steven D'Aprano <steve [at] REMOVE-THIS-cybersource> writes:
>> groupby() works on lists.
>
>>>> a = [1,3,4,6,7]
>>>> from itertools import groupby
>>>> b = groupby(a, lambda x: x%2==1) # split into even and odd
>>>> c = list(b)
>>>> print len(c)
> 3
>>>> d = list(c[1][1]) # should be [4,6]
>>>> print d # oops.
> []

I didn't say it worked properly *wink*

Seriously, this behaviour caught me out too. The problem isn't that the
input data is a list, the same problem occurs for arbitrary iterators.
From the docs:

[quote]
The operation of groupby() is similar to the uniq filter in Unix. It
generates a break or new group every time the value of the key function
changes (which is why it is usually necessary to have sorted the data
using the same key function). That behavior differs from SQL’s GROUP BY
which aggregates common elements regardless of their input order.

The returned group is itself an iterator that shares the underlying
iterable with groupby(). Because the source is shared, when the groupby()
object is advanced, the previous group is no longer visible. So, if that
data is needed later, it should be stored as a list
[end quote]

http://www.python.org/doc/2.6/library/itertools.html#itertools.groupby
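
Following the advice in the quoted documentation, each group can be copied into a list while groupby() is still positioned on it. The sketch below uses Python 3 and the same data as the interactive session quoted above; the variable names are assumptions.

```python
from itertools import groupby

a = [1, 3, 4, 6, 7]
is_odd = lambda x: x % 2 == 1

# Copy each group into a list before groupby() advances past it.
runs = [(k, list(g)) for k, g in groupby(a, is_odd)]
print(runs)  # [(True, [1, 3]), (False, [4, 6]), (True, [7])]

# Sorting by the key first collates all items with the same key, at the
# cost of a sort. sorted() is stable, so order within a group is kept.
collated = [(k, list(g)) for k, g in groupby(sorted(a, key=is_odd), is_odd)]
print(collated)  # [(False, [4, 6]), (True, [1, 3, 7])]
```

The first result shows the run-based behaviour (three groups, because the key changes twice); the second shows the SQL-style collation that the sort makes possible.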




--
Steven


p.f.moore at gmail

Jul 3, 2009, 4:01 AM

Post #24 of 35 (739 views)
Permalink
Re: Sequence splitting [In reply to]

2009/7/3 Brad <schickb [at] gmail>:
> Perhaps true, but it would be a nice convenience (for me) as a built-
> in written in either Python or C. Although the default case of a bool
> function would surely be faster.

The chance of getting this accepted as a builtin is essentially zero.
To be a builtin, as opposed to being in the standard library,
something has to have a very strong justification.

This suggestion may find a home in the standard library, although it's
not entirely clear where (maybe itertools, although it's not entirely
obvious that it's a fit there).

You'd have to justify this against the argument "not every 2-3 line
function needs to be built in". Personally, I'm not sure it passes
that test - sure, it's a useful function, but it's not that hard to
write when you need it. It may be better as a recipe in the cookbook.
Or if it's close enough to the spirit of the itertools, it may be
suitable as a sample in the itertools documentation (the "recipes"
section).
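
For illustration, a recipe-style version in the spirit of itertools might look like the following Python 3 sketch. It borrows the name partition from the Ruby method mentioned earlier in the thread; the exact signature and return order are assumptions made here.

```python
from itertools import filterfalse, tee

def partition(pred, iterable):
    """Return (true_items, false_items) as two lazy iterators."""
    # tee() gives two independent iterators over the same input, so this
    # also works when iterable is a one-shot iterator.
    t1, t2 = tee(iterable)
    return filter(pred, t1), filterfalse(pred, t2)

odds, evens = partition(lambda n: n % 2 == 1, iter(range(10)))
print(list(odds))   # [1, 3, 5, 7, 9]
print(list(evens))  # [0, 2, 4, 6, 8]
```

The trade-offs raised in this thread still apply: pred is called twice per element, and tee() buffers items until both result iterators have consumed them.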

Paul.


tn.pablo at gmail

Jul 3, 2009, 5:34 AM

Post #25 of 35 (739 views)
Permalink
Re: Sequence splitting [In reply to]

On Fri, Jul 3, 2009 at 06:01, Paul Moore<p.f.moore [at] gmail> wrote:
> 2009/7/3 Brad <schickb [at] gmail>:
>> Perhaps true, but it would be a nice convenience (for me) as a built-
>> in written in either Python or C. Although the default case of a bool
>> function would surely be faster.
>
> The chance of getting this accepted as a builtin is essentially zero.
> To be a builtin, as opposed to being in the standard library,
> something has to have a very strong justification.

That's right. Mr. schickb, I think what you need is a few concrete
examples as to where this function would be beneficial, so it can be
judged objectively.

--
Pablo Torres N.