
Mailing List Archive: Python: Python

How to check if any item from a list of strings is in a big string?

 

 



matt.dubins at sympatico

Jul 9, 2009, 6:36 PM

Post #1 of 13
How to check if any item from a list of strings is in a big string?

Hi all,

For one of my projects, I came across the need to check if one of many
items from a list of strings could be found in a long string. I came
up with a pretty quick helper function to check this, but I just want
to find out if there's something a little more elegant than what I've
cooked up. The helper function follows:

def list_items_in_string(list_items, string):
    for item in list_items:
        if item in string:
            return True
    return False

So if you define a list x = ['Blah','Yadda','Hoohoo'] and a string y =
'Yip yip yippee Blah' and you run list_items_in_string(x, y), it
should return True.

Any ideas how to make that function look nicer? :)

Matt Dubins
--
http://mail.python.org/mailman/listinfo/python-list


clp2 at rebertia

Jul 9, 2009, 6:43 PM

Post #2 of 13
Re: How to check if any item from a list of strings is in a big string? [In reply to]

On Thu, Jul 9, 2009 at 6:36 PM, inkhorn<matt.dubins [at] sympatico> wrote:
> Hi all,
>
> For one of my projects, I came across the need to check if one of many
> items from a list of strings could be found in a long string.  I came
> up with a pretty quick helper function to check this, but I just want
> to find out if there's something a little more elegant than what I've
> cooked up.  The helper function follows:
>
> def list_items_in_string(list_items, string):
>    for item in list_items:
>        if item in string:
>            return True
>    return False
>
> So if you define a list x = ['Blah','Yadda','Hoohoo'] and a string y =
> 'Yip yip yippee Blah' and you run list_items_in_string(x, y), it
> should return True.
>
> Any ideas how to make that function look nicer? :)

any(substr in y for substr in x)

Note that any() was added in Python 2.5.
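For reference, here is Chris's one-liner dropped into Matt's original function. This is a runnable sketch that assumes Python 3 and uses the example strings from the first post. any() stops at the first true value, so it short-circuits exactly like the explicit loop's early return. One edge case is worth knowing: the empty string is a substring of every string, so an empty item in the list always produces True.

```python
def list_items_in_string(list_items, string):
    # any() stops at the first True, just like the loop's early "return True".
    return any(item in string for item in list_items)

x = ['Blah', 'Yadda', 'Hoohoo']
y = 'Yip yip yippee Blah'

print(list_items_in_string(x, y))         # True  ('Blah' is in y)
print(list_items_in_string(['Nope'], y))  # False
print(list_items_in_string([''], y))      # True: '' is in every string
print(list_items_in_string([], y))        # False: any() of nothing is False
```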

Cheers,
Chris
--
http://blog.rebertia.com


nobody at nowhere

Jul 9, 2009, 7:53 PM

Post #3 of 13
Re: How to check if any item from a list of strings is in a big string? [In reply to]

On Thu, 09 Jul 2009 18:36:05 -0700, inkhorn wrote:

> For one of my projects, I came across the need to check if one of many
> items from a list of strings could be found in a long string.

If you need to match many strings or very long strings against the same
list of items, the following should (theoretically) be optimal:

r = re.compile('|'.join(map(re.escape,list_items)))
...
result = r.search(string)
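The snippet above leaves out `import re` and the surrounding code. The following is a runnable sketch that assumes Python 3. The helper name build_matcher is hypothetical. The idea is to compile the pattern once and then reuse it against many strings. As a bonus, the match object tells you which item was found.

```python
import re

def build_matcher(list_items):
    # Hypothetical helper: escape each item so characters such as '.', '+'
    # or '(' are matched literally, then join them into one alternation.
    # Caveat: an empty list yields the empty pattern, which matches everything.
    return re.compile('|'.join(map(re.escape, list_items)))

r = build_matcher(['Blah', 'Yadda', 'Hoohoo'])  # compile once...
for string in ['Yip yip yippee Blah', 'nothing to see']:
    result = r.search(string)                   # ...reuse for many strings
    print(result.group() if result else None)   # prints Blah, then None
```

Python's re always returns the leftmost match in the string. It tries alternatives in order, so if two items could match at the same position, the one listed first in list_items wins.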



steven at REMOVE

Jul 9, 2009, 8:07 PM

Post #4 of 13
Re: How to check if any item from a list of strings is in a big string? [In reply to]

On Thu, 09 Jul 2009 18:36:05 -0700, inkhorn wrote:

> def list_items_in_string(list_items, string):
>     for item in list_items:
>         if item in string:
>             return True
>     return False
...
> Any ideas how to make that function look nicer? :)

Change the names. Reverse the order of the arguments. Add a docstring.

Otherwise looks pretty nice to me. Simple, straightforward, and correct.

If you're running Python 2.5 or better, then this is even shorter (and
probably faster):

def contains(s, targets):
    """Return True if any item of targets is in string s."""
    return any(target in s for target in targets)



--
Steven


sjmachin at lexicon

Jul 9, 2009, 8:07 PM

Post #5 of 13
Re: How to check if any item from a list of strings is in a big string? [In reply to]

On Jul 10, 12:53 pm, Nobody <nob...@nowhere.com> wrote:
> On Thu, 09 Jul 2009 18:36:05 -0700, inkhorn wrote:
> > For one of my projects, I came across the need to check if one of many
> > items from a list of strings could be found in a long string.
>
> If you need to match many strings or very long strings against the same
> list of items, the following should (theoretically) be optimal:
>
>         r = re.compile('|'.join(map(re.escape,list_items)))
>         ...
>         result = r.search(string)

"theoretically optimal" happens only if the search mechanism builds a
DFA or similar out of the list of strings. AFAIK Python's re module
doesn't.

Try this:
http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/


http://phr.cx at NOSPAM

Jul 9, 2009, 8:24 PM

Post #6 of 13
Re: How to check if any item from a list of strings is in a big string? [In reply to]

inkhorn <matt.dubins [at] sympatico> writes:
> def list_items_in_string(list_items, string):
>     for item in list_items:
>         if item in string:
>             return True
>     return False

You could write that as (untested):

def list_items_in_string(list_items, string):
    return any(item in string for item in list_items)

but there are faster algorithms you could use if the list is large and
you want to do the test on lots of long strings, etc.


matt.dubins at sympatico

Jul 10, 2009, 2:39 PM

Post #7 of 13
Re: How to check if any item from a list of strings is in a big string? [In reply to]

Thanks all!! I found the following to be most helpful:
any(substr in long_string for substr in list_of_strings)

This bang-for-your-buck is one of the many, many reasons why I love
Python programming :)

Matt Dubins


denis-bz-gg at t-online

Jul 13, 2009, 6:11 AM

Post #8 of 13
Re: How to check if any item from a list of strings is in a big string? [In reply to]

Matt, how many words are you looking for, in how long a string?
Were you able to time any(substr in long_string for substr in list_items)
against re.compile("|".join(list_items))?
(REs are my method of choice, but different inputs of course give
different times; search Google for regex speed
site:groups.google.com or site:stackoverflow.com.)
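Denis's timing question can be answered with the standard timeit module. The sketch below assumes Python 3. The word list and haystack are made-up test data, chosen so that nothing matches and both approaches have to scan the whole string. The numbers depend entirely on the inputs, so no result is claimed here.

```python
import re
import timeit

# Made-up test data: 200 search words, none of which occur in the haystack,
# so both approaches must scan the whole string (the worst case).
words = ['word%03d' % i for i in range(200)]
haystack = 'lorem ipsum dolor sit amet ' * 2000
pattern = re.compile('|'.join(map(re.escape, words)))

def with_any():
    return any(w in haystack for w in words)

def with_re():
    return pattern.search(haystack) is not None

assert with_any() == with_re()  # both approaches must agree before timing
for fn in (with_any, with_re):
    seconds = timeit.timeit(fn, number=5)
    print(f'{fn.__name__}: {seconds:.4f} s for 5 runs')
```

To get meaningful numbers, replace words and haystack with data that resembles the real workload.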

cheers
-- denis




gagsl-py2 at yahoo

Jul 13, 2009, 10:06 PM

Post #9 of 13
Re: How to check if any item from a list of strings is in a big string? [In reply to]

On Mon, 13 Jul 2009 10:11:09 -0300, denis <denis-bz-gg [at] t-online>
wrote:

> Matt, how many words are you looking for, in how long a string ?
> Were you able to time any( substr in long_string ) against re.compile
> ( "|".join( list_items )) ?

There is a known algorithm for exactly this problem (Aho-Corasick). A good
implementation should perform better than a regular expression, and better
than the generator expression, with the added advantage of returning WHICH
string matched.
There is a C extension somewhere implementing Aho-Corasick.
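To show how Aho-Corasick reports which string matched, here is a minimal pure-Python sketch of the algorithm, written for illustration and assuming Python 3. It is not the C extension mentioned above, nor the code at the Berkeley or Wikipedia links. The function names build_automaton and find_first are hypothetical. The sketch assumes non-empty patterns and returns the match that ends earliest in the text. Because it is pure Python, it may well be slower than `in` or `re` in practice; the point here is the structure (a trie plus failure links, scanned once over the text).

```python
from collections import deque

def build_automaton(patterns):
    """Build goto/fail/output tables for Aho-Corasick (non-empty patterns)."""
    goto = [{}]    # goto[state][char] -> next state: a trie over the patterns
    fail = [0]     # failure link for each state (state 0 is the root)
    output = [[]]  # patterns recognised on reaching each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pat)
    # Breadth-first pass: failure links always point to shallower states,
    # so their tables are complete before they are used.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit any patterns that end at the failure state.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output

def find_first(text, automaton):
    """Return (start_index, pattern) of the earliest-ending match, or None."""
    goto, fail, output = automaton
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if output[state]:
            pat = output[state][0]
            return i - len(pat) + 1, pat
    return None

ac = build_automaton(['Blah', 'Yadda', 'Hoohoo'])
print(find_first('Yip yip yippee Blah', ac))  # (15, 'Blah')
```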

--
Gabriel Genellina



denis-bz-gg at t-online

Jul 15, 2009, 3:46 AM

Post #10 of 13
Re: How to check if any item from a list of strings is in a big string? [In reply to]

> > Matt, how many words are you looking for, in how long a string ?

Sure, Aho-Corasick is fast for fixed strings; but without real
numbers or a concrete goal, a simple solution is good enough
("satisficing"). Matt asked "how to make that function look nicer?",
but "nice" has many dimensions: bicycles are nice for some tasks,
Ferraris for others.

By the way, http://en.wikipedia.org/wiki/Aho-Corasick_algorithm has a
link to a Python implementation;
also, http://en.wikipedia.org/wiki/Worse_is_Better is fun.



matt.dubins at sympatico

Jul 16, 2009, 6:17 AM

Post #11 of 13
Re: How to check if any item from a list of strings is in a big string? [In reply to]

Hi all,

This was more a question of programming aesthetics for me than one of
great practical significance. I was looking to perform a certain
function on files in a directory so long as those files weren't found
in certain standard directories. In other words, I was using os.walk()
to get multiple root directory strings, and the lists of files in
each directory. The function was to be performed on those files, so
long as certain terms weren't in the root directory string.
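Matt's actual use case, skipping os.walk() directories whose path contains certain terms, might look like the sketch below. It assumes Python 3. The function name files_outside and any example terms are hypothetical.

```python
import os

def files_outside(top, excluded_terms):
    """Yield file paths under top whose directory path contains no excluded term."""
    for root, dirs, files in os.walk(top):
        # Skip this directory if any excluded term appears in its path string.
        if any(term in root for term in excluded_terms):
            continue
        for name in files:
            yield os.path.join(root, name)
```

Usage would be something like `for path in files_outside('.', ['.svn', 'CVS']): print(path)`. Because the test is a plain substring check on the whole path, a term like 'CVS' would also exclude a directory named 'MyCVSNotes'. With `continue`, os.walk() still descends into excluded trees and skips each subdirectory individually. Pruning `dirs[:]` in place (with the default topdown=True) would stop it from descending at all.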

In actuality, I could have stuck with the helper function I created,
but I'm always curious to see how well multiple lines of code can turn
into fewer lines of code in Python and retain the same functional
value :)

Matt

On Jul 15, 6:46 am, denis <denis-bz...@t-online.de> wrote:
> Sure, Aho-Corasick is fast for fixed strings; but without real
> numbers / a concrete goal
>
> > > Matt, how many words are you looking for, in how long a string ?
>
> a simple solution is good enough, satisficing.  Matt asked "how to
> make that function look nicer?"
> but "nice" has many dimensions -- bicycles are nice for some tasks,
> Ferraris for others.
>
> By the way, http://en.wikipedia.org/wiki/Aho-Corasick_algorithm has a
> link to a Python implementation,
> also http://en.wikipedia.org/wiki/Worse_is_Better is fun.



tn.pablo at gmail

Jul 16, 2009, 9:02 AM

Post #12 of 13
Re: How to check if any item from a list of strings is in a big string? [In reply to]

On Thu, Jul 9, 2009 at 22:07, Steven
D'Aprano<steven [at] remove> wrote:
> On Thu, 09 Jul 2009 18:36:05 -0700, inkhorn wrote:
>
>> def list_items_in_string(list_items, string):
>>     for item in list_items:
>>         if item in string:
>>             return True
>>     return False
> ...
>> Any ideas how to make that function look nicer? :)
>
> Change the names. Reverse the order of the arguments. Add a docstring.
>

Why reverse the order of the arguments? Is there a design principle there?

I always make a mess out of the order of my arguments...

--
Pablo Torres N.


steve at REMOVE-THIS-cybersource

Jul 17, 2009, 1:06 AM

Post #13 of 13
Re: How to check if any item from a list of strings is in a big string? [In reply to]

On Thu, 16 Jul 2009 11:02:57 -0500, Pablo Torres N. wrote:

> On Thu, Jul 9, 2009 at 22:07, Steven
> D'Aprano<steven [at] remove> wrote:
>> On Thu, 09 Jul 2009 18:36:05 -0700, inkhorn wrote:
>>
>>> def list_items_in_string(list_items, string):
>>>     for item in list_items:
>>>         if item in string:
>>>             return True
>>>     return False
>> ...
>>> Any ideas how to make that function look nicer? :)
>>
>> Change the names. Reverse the order of the arguments. Add a docstring.
>>
>>
> Why reverse the order of the arguments? Is there a design principle
> there?

It's just a convention. Before strings had methods, you used the string
module, e.g.:

string.find(source, target)
=> find target in source

This became source.find(target).

In your function:

list_items_in_string(list_items, string)

"list_items" is equivalent to target, and "string" is equivalent to
source. It's conventional to write the source first.
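The source-first convention is still visible in Python 3. The method call source.find(target) is the same as calling the unbound method str.find(source, target), with the string being searched as the first argument. Here is a small illustration using the strings from the first post:

```python
source = 'Yip yip yippee Blah'
target = 'Blah'

# Method call and unbound-method call are equivalent; either way the
# string being searched (the source) comes first.
print(source.find(target))       # 15
print(str.find(source, target))  # 15
```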



--
Steven
