Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Python: Python

Sequence splitting

 

 

First page Previous page 1 2 Next page Last page  View All Python python RSS feed   Index | Next | Previous | View Threaded


lie.1296 at gmail

Jul 3, 2009, 6:10 AM

Post #26 of 35 (374 views)
Permalink
Re: Sequence splitting [In reply to]

tsangpo wrote:
> Just a shorter implementation:
>
> from itertools import groupby
> def split(lst, func):
> gs = groupby(lst, func)
> return list(gs[True]), list(gs[False])
>

As you're replying to my post, I assume you meant a shorter
implementation my function. But it doesn't do the same thing. The idea
with my group() is similar to what Steven D'Aprano is describing in
another branch of this thread (i.e. splitting not only based on True and
False, but arbitrary groupings, e.g. 'tru', 'flash' or perhaps -1, 0, 1).

For example:
>>> def group(seq, func=bool):
... ret = {}
... for item in seq:
... fitem = func(item)
... try:
... ret[fitem].append(item)
... except KeyError:
... ret[fitem] = [item]
... return ret
...
>>> group(range(10), lambda x: x%3)
{0: [0, 3, 6, 9], 1: [1, 4, 7], 2: [2, 5, 8]}
>>> # the next one is the OP's split
>>> group(['true', '', [], [''], 'false'], bool)
{False: ['', []], True: ['true', [''], 'false']}
--
http://mail.python.org/mailman/listinfo/python-list


schickb at gmail

Jul 3, 2009, 10:03 AM

Post #27 of 35 (371 views)
Permalink
Re: Sequence splitting [In reply to]

On Jul 3, 12:57 am, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:
> I've never needed such a split function, and I don't like the name, and
> the functionality isn't general enough. I'd prefer something which splits
> the input sequence into as many sublists as necessary, according to the
> output of the key function.

That's not a bad idea, I'll have to experiment with the alternatives.
My thought process for this, however, was that filter itself already
splits the sequence and it would have been more useful had it not
thrown away "half" of what it discovers. It could have been written to
returned two sequences with very litter perf hit for all but very
large input sequences, and been useful in more situations. What I
*really* wanted was a way to make filter itself more useful, since it
seems a bit silly to have two very similar functions.

Maybe this would be difficult to get into the core, but how about this
idea: Rename the current filter function to something like "split" or
"partition" (which I agree may be a better name) and modify it to
return the desired true and false sequences. Then recreate the
existing "filter" function with a wrapper that throws away the false
sequence.

Here are two simplified re-creations of situations where I could have
used partition (aka split):

words = ['this', 'is', 'a', 'bunch', 'of', 'words']
short, long = partition(words, lambda w: len(w) < 3)

d = {1 : 'w', 2 : 'x' ,3 : 'y' ,4 : 'z'}
keys = [1, 3, 4, 9]
found, missing = partition(keys, d.has_key)


There are probably a dozen other approaches, but the existing "filter"
is fast, clear, and *almost* good enough. So when is this useful in
general: Whenever filter itself is useful, but you want to use both
sides of the partitioning work it already does.


-Brad
--
http://mail.python.org/mailman/listinfo/python-list


http://phr.cx at NOSPAM

Jul 3, 2009, 10:14 AM

Post #28 of 35 (371 views)
Permalink
Re: Sequence splitting [In reply to]

Brad <schickb [at] gmail> writes:
> Maybe this would be difficult to get into the core, but how about this
> idea: Rename the current filter function to something like "split" or
> "partition" (which I agree may be a better name) and modify it to
> return the desired true and false sequences. Then recreate the
> existing "filter" function with a wrapper that throws away the false
> sequence.

This isn't so attractive, since filter takes a sequence input but
returns a list. So if I have an iterator that produces a billion
elements of which I expect three to satisfy some predicate, then
xs = filter(func, seq)
as currently implemented will build a 3-element list and return it.
Under your suggestion, it would also build and throw away an (almost)
billion element list.
--
http://mail.python.org/mailman/listinfo/python-list


lie.1296 at gmail

Jul 3, 2009, 10:27 AM

Post #29 of 35 (371 views)
Permalink
Re: Sequence splitting [In reply to]

Brad wrote:
> On Jul 3, 12:57 am, Steven D'Aprano <st...@REMOVE-THIS-
> cybersource.com.au> wrote:
>> I've never needed such a split function, and I don't like the name, and
>> the functionality isn't general enough. I'd prefer something which splits
>> the input sequence into as many sublists as necessary, according to the
>> output of the key function.
>
> That's not a bad idea, I'll have to experiment with the alternatives.
> My thought process for this, however, was that filter itself already
> splits the sequence and it would have been more useful had it not
> thrown away "half" of what it discovers. It could have been written to
> returned two sequences with very litter perf hit for all but very
> large input sequences, and been useful in more situations. What I
> *really* wanted was a way to make filter itself more useful, since it
> seems a bit silly to have two very similar functions.

It's not that easy. filter is nearly by definition a lazy function. The
various split/group functions is impossible to be made an efficient
iterator, since they must traverse the whole list before being sure that
they have finished the group/part of the split (or even to be sure that
they have all the group keys).

> Maybe this would be difficult to get into the core, but how about this
> idea: Rename the current filter function to something like "split" or
> "partition" (which I agree may be a better name) and modify it to
> return the desired true and false sequences. Then recreate the
> existing "filter" function with a wrapper that throws away the false
> sequence.

filter has a very deep root in functional programming, I doubt it will
change any time soon. Also, consider infinite iterators. With the
"filter as wrapper to partition" scheme, it is impossible for
split/group/partition to use infinite iterator.
--
http://mail.python.org/mailman/listinfo/python-list


Scott.Daniels at Acm

Jul 3, 2009, 10:47 AM

Post #30 of 35 (371 views)
Permalink
Re: Sequence splitting [In reply to]

Steven D'Aprano wrote:
> I've never needed such a split function, and I don't like the name, and
> the functionality isn't general enough. I'd prefer something which splits
> the input sequence into as many sublists as necessary, according to the
> output of the key function. Something like itertools.groupby(), except it
> runs through the entire sequence and collates all the elements with
> identical keys.
>
> splitby(range(10), lambda n: n%3)
> => [ (0, [0, 3, 6, 9]),
> (1, [1, 4, 7]),
> (2, [2, 5, 8]) ]
>
> Your split() would be nearly equivalent to this with a key function that
> returns a Boolean.

Well, here is my go at doing the original with iterators:

def splitter(source, test=bool):
a, b = itertools.tee((x, test(x)) for x in source)
return (data for data, decision in a if decision), (
data for data, decision in b if not decision)

This has the advantage that it can operate on infinite lists. For
something like splitby for grouping, I seem to need to know the cases
up front:

def _make_gen(particular, src):
return (x for x, c in src if c == particular)

def splitby(source, cases, case):
'''Produce a dict of generators for case(el) for el in source'''
decided = itertools.tee(((x, case(x)) for x in source), len(cases))
return dict((c, _make_gen(c, src))
for c, src in zip(cases, decided))

example:

def classify(n):
'''Least prime factor of a few'''
for prime in [2, 3, 5, 7]:
if n % prime == 0:
return prime
return 0

for k,g in splitby(range(50), (2, 3, 5, 7, 0), classify).items():
print('%s: %s' % (k, list(g)))

0: [.1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
2: [.0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]
3: [3, 9, 15, 21, 27, 33, 39, 45]
5: [5, 25, 35]
7: [7, 49]

--Scott David Daniels
Scott.Daniels [at] Acm
--
http://mail.python.org/mailman/listinfo/python-list


mwilson at the-wire

Jul 3, 2009, 12:08 PM

Post #31 of 35 (374 views)
Permalink
Re: Sequence splitting [In reply to]

Steven D'Aprano wrote:

> The most important difference between my suggestion and that of the OP is
> that he limited the key function to something which returns a truth
> value, while I'm looking for something more general which can split the
> input into an arbitrary number of collated sublists.

Generally along the lines of

def scatter (seq, criterion):
lists = {}
for s in seq:
lists.setdefault (criterion (s), []).append (s)
return lists

modulo a defaultdict or some such ?

Mel.
>
>
>


--
http://mail.python.org/mailman/listinfo/python-list


tjreedy at udel

Jul 3, 2009, 12:23 PM

Post #32 of 35 (363 views)
Permalink
Re: Sequence splitting [In reply to]

Brad wrote:
> On Jul 2, 8:17 pm, "Pablo Torres N." <tn.pa...@gmail.com> wrote:
>> This sounds like it belongs to the python-ideas list. I suggest
>> posting there for better feedback, since the core developers check
>> that list more often than this one.
>
> I tried posting on python-ideas and received a "You are not allowed to
> post to this mailing list" reply. Perhaps because I am posting through
> Google groups? Or maybe one must be an approved member to post?

Spammers post thru Google groups

Either subscribe or access via news.gmane.org as gmane.comp.python.ideas.

As to your main question: this was discuss some years ago. There did not
seem enough use cases for 'bifilter' or 'distributor' to warrent
inclusion in the stdlib. Easy enough to do on one's own for those few
cases. Just filter twice.

Usually, one wants to do one thing or another to each item with the
original scan

for item in collection:
if pred(item):
proc_true(item)
else:
proc_false(item)

Having proc_true and proc_false both be append(item) is a special case.

Note that filter has been changed from returning a list to returning an
iterator. So a proposal for a function to return two lists would seem
backwards. A function that returns two iterators would be possible.
Something like itertools.tee except that each item would go on one tee
or the other instead of both. It would have two queues (deques) instead
of one. And it should be lazy -- items should only be pulled from the
original iterator when an item is requested from an empty queue.

I am not sure it is worth it. In the worst case, all items are say
'True' and you request 'False' items and all items get copied to the
True queue before the False iterator raised StopIteration.

The way to avoid calling keyfunc(item) twice on each item is to make a
new list first: newlist = list(map(keyfunc,iterable)). Then scan newlist
twice.

If you write such a function, first in Python, then C, you can register
code on PyPI and see how much interest it gets.

Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


tjreedy at udel

Jul 3, 2009, 12:27 PM

Post #33 of 35 (365 views)
Permalink
Re: Sequence splitting [In reply to]

Paul Rubin wrote:
> Brad <schickb [at] gmail> writes:
>> Maybe this would be difficult to get into the core, but how about this
>> idea: Rename the current filter function to something like "split" or
>> "partition" (which I agree may be a better name) and modify it to
>> return the desired true and false sequences. Then recreate the
>> existing "filter" function with a wrapper that throws away the false
>> sequence.
>
> This isn't so attractive, since filter takes a sequence input but
> returns a list.

In modern Python (py3), it returns an iterator.

--
http://mail.python.org/mailman/listinfo/python-list


http://phr.cx at NOSPAM

Jul 3, 2009, 12:41 PM

Post #34 of 35 (367 views)
Permalink
Re: Sequence splitting [In reply to]

Terry Reedy <tjreedy [at] udel> writes:
> > This isn't so attractive, since filter takes a sequence input but
> > returns a list.
>
> In modern Python (py3), it returns an iterator.

Still not much help in the proposed version that would return two
iterators.
--
http://mail.python.org/mailman/listinfo/python-list


python at mrabarnett

Jul 3, 2009, 1:07 PM

Post #35 of 35 (365 views)
Permalink
Re: Sequence splitting [In reply to]

Mel wrote:
> Steven D'Aprano wrote:
>
>> The most important difference between my suggestion and that of the OP is
>> that he limited the key function to something which returns a truth
>> value, while I'm looking for something more general which can split the
>> input into an arbitrary number of collated sublists.
>
> Generally along the lines of
>
> def scatter (seq, criterion):
> lists = {}
> for s in seq:
> lists.setdefault (criterion (s), []).append (s)
> return lists
>
> modulo a defaultdict or some such ?
>
Somehow this reminds me of the new Counter class, except that it's
making lists instead of counts.
--
http://mail.python.org/mailman/listinfo/python-list

First page Previous page 1 2 Next page Last page  View All Python python RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.