
Mailing List Archive: Python: Python

Remove empty strings from list

 

 



helvinlui at gmail

Sep 14, 2009, 6:49 PM

Post #1 of 15 (9455 views)
Permalink
Remove empty strings from list

Hi,

Sorry, I did not want to bother the group, but I really do not
understand this seemingly trivial problem.
I am reading from a textfile, where each line has 2 values, with
spaces before and between the values.
I would like to read in these values, but of course, I don't want the
whitespaces between them.
I have looked at documentation, and how strings and lists work, but I
cannot understand the behaviour of the following:
    line = f.readline()
    line = line.lstrip() # take away whitespace at the beginning of the readline.
    list = line.split(' ') # split the str line into a list

    # the list has empty strings in it, so now, remove these empty strings
    for item in list:
        if item is ' ':
            print 'discard these: ',item
            index = list.index(item)
            del list[index] # remove this item from the list
        else:
            print 'keep this: ',item
The problem is, when my list is : ['44', '', '', '', '', '',
'0.000000000\n']
The output is:
len of list: 7
keep this: 44
discard these:
discard these:
discard these:
So finally the list is: ['44', '', '', '0.000000000\n']
The code above removes all the empty strings in the middle, all except
two. My code seems to miss two of the empty strings.

Would you know why this is occurring?

Regards,
Helvin
--
http://mail.python.org/mailman/listinfo/python-list


clp2 at rebertia

Sep 14, 2009, 6:55 PM

Post #2 of 15 (9376 views)
Permalink
Re: Remove empty strings from list [In reply to]

On Mon, Sep 14, 2009 at 6:49 PM, Helvin <helvinlui [at] gmail> wrote:
> Hi,
>
> Sorry I did not want to bother the group, but I really do not
> understand this seeming trivial problem.
> I am reading from a textfile, where each line has 2 values, with
> spaces before and between the values.
> I would like to read in these values, but of course, I don't want the
> whitespaces between them.
> I have looked at documentation, and how strings and lists work, but I
> cannot understand the behaviour of the following:
>                        line = f.readline()
>                        line = line.lstrip() # take away whitespace at the beginning of the
> readline.
>                        list = line.split(' ') # split the str line into a list
>
>                        # the list has empty strings in it, so now,
> remove these empty strings
>                        for item in list:
>                                if item is ' ':
>                                        print 'discard these: ',item
>                                        index = list.index(item)
>                                        del list[index]         # remove this item from the list
>                                else:
>                                        print 'keep this: ',item
> The problem is, when my list is :  ['44', '', '', '', '', '',
> '0.000000000\n']
> The output is:
>    len of list:  7
>    keep this:  44
>    discard these:
>    discard these:
>    discard these:
> So finally the list is:   ['44', '', '', '0.000000000\n']
> The code above removes all the empty strings in the middle, all except
> two. My code seems to miss two of the empty strings.
>
> Would you know why this is occuring?

Block quoting from http://effbot.org/zone/python-list.htm
"""
Note that the for-in statement maintains an internal index, which is
incremented for each loop iteration. This means that if you modify the
list you’re looping over, the indexes will get out of sync, and you
may end up skipping over items, or process the same item multiple
times.
"""

That is why your code skips over some elements and does not remove them.
Moral: Don't modify a list while iterating over it. Use the loop to
create a separate, new list from the old one instead.
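
A minimal sketch of the effect described above, written in Python 3 syntax (the thread itself uses Python 2). The names `items`, `buggy` and `cleaned` are made up for illustration. The first loop deletes from the list it is iterating over and reproduces the leftover empty strings; the second builds a new list instead.

```python
# Illustrative only: sample data taken from the original question.
items = ['44', '', '', '', '', '', '0.000000000\n']

# Buggy pattern: removing from the list while iterating over it.
buggy = list(items)
for item in buggy:
    if item == '':
        # Removal shifts later elements left, but the loop's
        # internal index still advances, so elements get skipped.
        buggy.remove(item)

# Safer pattern: use the loop to build a separate, new list.
cleaned = []
for item in items:
    if item != '':
        cleaned.append(item)

print(buggy)    # two empty strings survive
print(cleaned)  # no empty strings
```

The loop over `buggy` stops early because the list shrinks underneath it, which is why two empty strings are never visited.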

Cheers,
Chris
--
http://blog.rebertia.com


technic.tec at gmail

Sep 14, 2009, 7:33 PM

Post #3 of 15 (9369 views)
Permalink
Re: Remove empty strings from list [In reply to]

Chris Rebert wrote:
> On Mon, Sep 14, 2009 at 6:49 PM, Helvin <helvinlui [at] gmail> wrote:
>> Hi,
>>
>> Sorry I did not want to bother the group, but I really do not
>> understand this seeming trivial problem.
>> I am reading from a textfile, where each line has 2 values, with
>> spaces before and between the values.
>> I would like to read in these values, but of course, I don't want the
>> whitespaces between them.
>> I have looked at documentation, and how strings and lists work, but I
>> cannot understand the behaviour of the following:
>> line = f.readline()
>> line = line.lstrip() # take away whitespace at the beginning of the
>> readline.
>> list = line.split(' ') # split the str line into a list
>>
>> # the list has empty strings in it, so now,
>> remove these empty strings
>> for item in list:
>> if item is ' ':
>> print 'discard these: ',item
>> index = list.index(item)
>> del list[index] # remove this item from the list
>> else:
>> print 'keep this: ',item
>> The problem is, when my list is : ['44', '', '', '', '', '',
>> '0.000000000\n']
>> The output is:
>> len of list: 7
>> keep this: 44
>> discard these:
>> discard these:
>> discard these:
>> So finally the list is: ['44', '', '', '0.000000000\n']
>> The code above removes all the empty strings in the middle, all except
>> two. My code seems to miss two of the empty strings.
>>
>> Would you know why this is occuring?
>
> Block quoting from http://effbot.org/zone/python-list.htm
> """
> Note that the for-in statement maintains an internal index, which is
> incremented for each loop iteration. This means that if you modify the
> list you’re looping over, the indexes will get out of sync, and you
> may end up skipping over items, or process the same item multiple
> times.
> """
>
> Thus why your code is skipping over some elements and not removing them.
> Moral: Don't modify a list while iterating over it. Use the loop to
> create a separate, new list from the old one instead.

or use filter
list=filter(lambda x: len(x)>0, list)

>
> Cheers,
> Chris
> --
> http://blog.rebertia.com


technic.tec at gmail

Sep 14, 2009, 7:38 PM

Post #4 of 15 (9377 views)
Permalink
Re: Remove empty strings from list [In reply to]

Helvin wrote:
> Hi,
>
> Sorry I did not want to bother the group, but I really do not
> understand this seeming trivial problem.
> I am reading from a textfile, where each line has 2 values, with
> spaces before and between the values.
> I would like to read in these values, but of course, I don't want the
> whitespaces between them.
> I have looked at documentation, and how strings and lists work, but I
> cannot understand the behaviour of the following:
> line = f.readline()
> line = line.lstrip() # take away whitespace at the beginning of the
> readline.
> list = line.split(' ') # split the str line into a list
>
> # the list has empty strings in it, so now,
> remove these empty strings
> for item in list:
> if item is ' ':
> print 'discard these: ',item
> index = list.index(item)
> del list[index] # remove this item from the list
> else:
> print 'keep this: ',item
> The problem is, when my list is : ['44', '', '', '', '', '',
> '0.000000000\n']
> The output is:
> len of list: 7
> keep this: 44
> discard these:
> discard these:
> discard these:
> So finally the list is: ['44', '', '', '0.000000000\n']
> The code above removes all the empty strings in the middle, all except
> two. My code seems to miss two of the empty strings.
>
> Would you know why this is occuring?
>
> Regards,
> Helvin

You can use the default argument of split:
list = line.split()

From the python documentation,

"If the optional second argument sep is absent or None, the words are
separated by arbitrary strings of whitespace characters (space, tab,
newline, return, formfeed)."

So it is suitable for most cases and does not introduce empty strings.
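
As a rough illustration of the difference (Python 3 syntax; the sample `line` is an assumption modelled on the data in the original question), compare splitting on a single space with splitting on arbitrary whitespace:

```python
# Assumed sample line: leading spaces, six spaces between the values,
# and a trailing newline as returned by readline().
line = '  44' + ' ' * 6 + '0.000000000\n'

# Splitting on exactly one space yields an empty string for every
# extra space between the two values.
by_space = line.lstrip().split(' ')

# Splitting with no argument treats any run of whitespace as one
# separator and ignores leading/trailing whitespace.
by_whitespace = line.split()

print(by_space)
print(by_whitespace)
```

With no argument, the trailing newline is dropped as well, so no separate strip step is needed.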


helvinlui at gmail

Sep 14, 2009, 7:47 PM

Post #5 of 15 (9370 views)
Permalink
Re: Remove empty strings from list [In reply to]

Thanks Chris! Thanks for the quick reply. Indeed this is the case! I have
now written out a new list, instead of modifying the list I am iterating
over.
Logged at my blog:
http://learnwithhelvin.blogspot.com/2009/09/python-loop-and-modify-list.html

Regards,
Helvin =)




--
Helvin

"Though the world may promise me more, I'm just made to be filled with the
Lord."


davea at ieee

Sep 14, 2009, 8:06 PM

Post #6 of 15 (9368 views)
Permalink
Re: Remove empty strings from list [In reply to]

Helvin wrote:
> Hi,
>
> Sorry I did not want to bother the group, but I really do not
> understand this seeming trivial problem.
> I am reading from a textfile, where each line has 2 values, with
> spaces before and between the values.
> I would like to read in these values, but of course, I don't want the
> whitespaces between them.
> I have looked at documentation, and how strings and lists work, but I
> cannot understand the behaviour of the following:
> line = f.readline()
> line = line.lstrip() # take away whitespace at the beginning of the
> readline.
> list = line.split(' ') # split the str line into a list
>
> # the list has empty strings in it, so now,
> remove these empty strings
> for item in list:
> if item is ' ':
> print 'discard these: ',item
> index = list.index(item)
> del list[index] # remove this item from the list
> else:
> print 'keep this: ',item
> The problem is, when my list is : ['44', '', '', '', '', '',
> '0.000000000\n']
> The output is:
> len of list: 7
> keep this: 44
> discard these:
> discard these:
> discard these:
> So finally the list is: ['44', '', '', '0.000000000\n']
> The code above removes all the empty strings in the middle, all except
> two. My code seems to miss two of the empty strings.
>
> Would you know why this is occuring?
>
> Regards,
> Helvin
>
>
(list is already a defined name, so you really should call it something
else.)


As Chris says, you're modifying the list while you're iterating through
it, and that's undefined behavior. Why not do the following?

mylist = line.strip().split(' ')
mylist = [item for item in mylist if item]

DaveA


gagsl-py2 at yahoo

Sep 14, 2009, 8:52 PM

Post #7 of 15 (9376 views)
Permalink
Re: Remove empty strings from list [In reply to]

En Mon, 14 Sep 2009 23:33:05 -0300, tec <technic.tec [at] gmail> escribió:

> or use filter
> list=filter(lambda x: len(x)>0, list)

For strings, len(x)>0 <=> len(x) <=> x, so the above statement is
equivalent to:

list=filter(lambda x: x, list)

which according to the documentation is the same as:

list=filter(None, list)

which is the fastest variant AFAIK.

(Of course, it's even better to use the right split() call so there are no
empty strings to filter out in the first place)
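
A small sketch of the three equivalent filter() calls, in Python 3 syntax. Note the assumption that this runs on Python 3, where filter() returns an iterator rather than a list, so each result is wrapped in list(); `words` is a made-up sample.

```python
words = ['44', '', '', '0.000000000']

# Explicit length test.
by_len = list(filter(lambda x: len(x) > 0, words))

# Truth value of the string itself: empty strings are false.
by_truth = list(filter(lambda x: x, words))

# Passing None as the function keeps the items that are true.
by_none = list(filter(None, words))

print(by_len, by_truth, by_none)
```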

--
Gabriel Genellina



steven at REMOVE

Sep 14, 2009, 8:58 PM

Post #8 of 15 (9372 views)
Permalink
Re: Remove empty strings from list [In reply to]

On Mon, 14 Sep 2009 18:55:13 -0700, Chris Rebert wrote:

> On Mon, Sep 14, 2009 at 6:49 PM, Helvin <helvinlui [at] gmail> wrote:
...
> > I have looked at documentation, and how strings and lists work, but I
> > cannot understand the behaviour of the following:
...
> >                        for item in list:
> >                                if item is ' ':
> >                                        print 'discard these: ',item
> >                                        index = list.index(item)
> >                                        del list[index]

...

> Moral: Don't modify a list while iterating over it. Use the loop to
> create a separate, new list from the old one instead.


This doesn't just apply to Python, it is good advice in every language
I'm familiar with. At the very least, if you have to modify a list
in place and you are deleting or inserting items, work *backwards*:

for i in xrange(len(alist) - 1, -1, -1):
    item = alist[i]
    if item == 'delete me':
        del alist[i]


This is almost never the right solution in Python, but as a general
technique, it works in all sorts of situations. (E.g. when varnishing a
floor, don't start at the doorway and varnish towards the end of the
room, because you'll be walking all over the fresh varnish. Do it the
other way, starting at the end of the room, and work backwards towards
the door.)
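
For illustration, a runnable Python 3 version of the backwards technique (range replaces the Python 2 xrange; the sample data is made up). The loop starts at the last valid index, len(alist) - 1, so deleting an item only shifts elements that have already been visited.

```python
alist = ['keep', 'delete me', 'delete me', 'keep', 'delete me']

# Walk the indexes from the last one down to 0.
for i in range(len(alist) - 1, -1, -1):
    if alist[i] == 'delete me':
        # Only items at higher indexes move, and those are done.
        del alist[i]

print(alist)
```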

In Python, the right solution is almost always to make a new copy of the
list. Here are three ways to do that:


newlist = []
for item in alist:
    if item != 'delete me':
        newlist.append(item)


newlist = [item for item in alist if item != 'delete me']

newlist = filter(lambda item: item != 'delete me', alist)



Once you have newlist, you can then rebind it to alist:

alist = newlist

or you can replace the contents of alist with the contents of newlist:

alist[:] = newlist


The two have a subtle difference in behavior that may not be apparent
unless you have multiple names bound to alist.
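
A short sketch of that difference, in Python 3 syntax with made-up names (`alias`, `blist`, `alias2`). Rebinding makes the name point at a new list and leaves other references to the old list untouched; slice assignment changes the existing list object, so every name bound to it sees the change.

```python
# Case 1: rebinding the name.
alist = ['a', 'delete me', 'b']
alias = alist                      # second name for the same list
alist = [x for x in alist if x != 'delete me']
# alias still refers to the old, unfiltered list.

# Case 2: replacing the contents in place.
blist = ['a', 'delete me', 'b']
alias2 = blist
blist[:] = [x for x in blist if x != 'delete me']
# alias2 and blist are still the same object, now filtered.

print(alist, alias)
print(blist, alias2)
```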



--
Steven


joinhack at gmail

Sep 14, 2009, 9:55 PM

Post #9 of 15 (9365 views)
Permalink
Re: Remove empty strings from list [In reply to]

Good solution, thanks!

2009/9/15 Steven D'Aprano <steven [at] remove>

> [snip]


bruno.42.desthuilliers at websiteburo

Sep 15, 2009, 1:16 AM

Post #10 of 15 (9355 views)
Permalink
Re: Remove empty strings from list [In reply to]

Helvin wrote:
> Hi,
>
> Sorry I did not want to bother the group, but I really do not
> understand this seeming trivial problem.
> I am reading from a textfile, where each line has 2 values, with
> spaces before and between the values.
> I would like to read in these values, but of course, I don't want the
> whitespaces between them.
> I have looked at documentation, and how strings and lists work, but I
> cannot understand the behaviour of the following:
> line = f.readline()
> line = line.lstrip() # take away whitespace at the beginning of the
> readline.

file.readline returns the line with the ending newline character (which
is considered whitespace by the str.strip method), so you may want to
use line.strip instead of line.lstrip
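
A quick sketch of the difference (Python 3 syntax; the sample `line` is an assumption modelled on the question's data):

```python
# Assumed sample, shaped like a line returned by readline().
line = '   44' + ' ' * 6 + '0.000000000\n'

left_only = line.lstrip()   # removes leading whitespace only
both_ends = line.strip()    # removes leading and trailing whitespace

print(repr(left_only))      # still ends with the newline
print(repr(both_ends))      # newline removed
```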

> list = line.split(' ')

Slightly OT, but: don't use builtin type or function names as
identifiers - this shadows the builtin object.

Also, the default behaviour of str.split is to split on whitespaces and
remove the delimiter. You would have better results not specifying the
delimiters here:

>>> " a  a  a  a ".split(' ')
['', 'a', '', 'a', '', 'a', '', 'a', '']
>>> " a  a  a  a ".split()
['a', 'a', 'a', 'a']
>>>

> # the list has empty strings in it, so now,
> remove these empty strings

A problem you could have avoided right from the start !-)

> for item in list:
> if item is ' ':

Don't use identity comparison when you want to test for equality. It
happens to kind of work in your above example but only because CPython
implements a cache for _some_ small strings, but you should _never_ rely
on such implementation details. A string containing accented characters
would not have been cached:
>>> s = 'éàè'
>>> s is 'éàè'
False
>>>


Also, this is surely not your actual code : ' ' is not an empty string,
it's a string with a single space character. The empty string is ''. And
FWIW, empty strings (like most empty sequences and collections, all
numerical zeros, and the None object) have a false value in a boolean
context, so you can just test the string directly:

for s in ['', 0, 0.0, [], {}, (), None]:
    if not s:
        print "'%s' is empty, so it's false" % str(s)
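
To make the equality / identity distinction concrete, here is a hedged sketch in Python 3 syntax; `built`, `literal`, `values` and `kept` are illustrative names. Whether two equal strings are the same object is an implementation detail, so the sketch prints that result without relying on it.

```python
# Two equal strings: one built at run time, one written as a literal.
built = ''.join(['4', '4'])
literal = '44'

print(built == literal)   # equality: compares the contents
print(built is literal)   # identity: implementation-dependent, do not rely on it

# Testing the string directly filters out empty strings only;
# a single space is a non-empty string and is kept.
values = ['44', '', ' ', '0.0']
kept = [v for v in values if v]
print(kept)
```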


> print 'discard these: ',item
> index = list.index(item)
> del list[index] # remove this item from the list

And then you do have a big problem : the internal pointer used by the
iterator is not in sync with the list anymore, so the next iteration
will skip one item.

As a general rule: *don't* add / remove elements to/from a sequence while
iterating over it. If you really need to modify the sequence while
iterating over it, do a reverse iteration - but there are usually better
solutions.

> else:
> print 'keep this: ',item
> The problem is,

Make it a plural - there's more than 1 problem here !-)

> when my list is : ['44', '', '', '', '', '',
> '0.000000000\n']
> The output is:
> len of list: 7
> keep this: 44
> discard these:
> discard these:
> discard these:
> So finally the list is: ['44', '', '', '0.000000000\n']
> The code above removes all the empty strings in the middle, all except
> two. My code seems to miss two of the empty strings.
>
> Would you know why this is occuring?


cf above... and below:

>>> alist = ['44', '', '', '', '', '', '0.000000000']
>>> for i, it in enumerate(alist):
...     print 'i : %s - it : "%s"' % (i, it)
...     if not it:
...         del alist[i]
...     print "alist is now %s" % alist
...
i : 0 - it : "44"
alist is now ['44', '', '', '', '', '', '0.000000000']
i : 1 - it : ""
alist is now ['44', '', '', '', '', '0.000000000']
i : 2 - it : ""
alist is now ['44', '', '', '', '0.000000000']
i : 3 - it : ""
alist is now ['44', '', '', '0.000000000']
>>>


Ok, now for practical answers:

1/ in the above case, use line.strip().split(), you'll have no more
problem !-)

2/ as a general rule, if you need to filter a sequence, don't try to do
it in place (unless it's a *very* big sequence and you run into memory
problems but then there are probably better solutions).

The common idioms for filtering a sequence are:

* filter(predicate, sequence):

the 'predicate' param is a callback function which takes an item from the
sequence and returns a boolean value (True to keep the item, False to
discard it). The following example will filter out even integers:

def is_odd(n):
    return n % 2

alist = range(10)
odds = filter(is_odd, alist)
print alist
print odds

Alternatively, filter() can take None as its first param, in which case
it will filter out items that have a false value in a boolean context, ie:

alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
result = filter(None, alist)
print result


* list comprehensions

Here you directly build the result list:

alist = range(10)
odds = [n for n in alist if n % 2]

alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
result = [item for item in alist if item]
print result



HTH


bruno.42.desthuilliers at websiteburo

Sep 15, 2009, 2:20 AM

Post #11 of 15 (9354 views)
Permalink
Re: Remove empty strings from list [In reply to]

Dave Angel wrote:
(snip)
>
> As Chris says, you're modifying the list while you're iterating through
> it, and that's undefined behavior. Why not do the following?
>
> mylist = line.strip().split(' ')
> mylist = [item for item in mylist if item]

Mmmm... because the second line is plain useless when calling
str.split() without a delimiter ?-)

>> mylist = line.strip().split()

will already do the RightThing(tm).



bruno.42.desthuilliers at websiteburo

Sep 15, 2009, 2:21 AM

Post #12 of 15 (9353 views)
Permalink
Re: Remove empty strings from list [In reply to]

Dennis Lee Bieber wrote:
(snip)
> All of which can be condensed into a simple
>
> for ln in f:
>     wrds = ln.strip()
>     # do something with the words -- no whitespace to be seen


I assume you meant:
wrds = ln.strip().split()

?-)


sion at viridian

Sep 15, 2009, 7:07 AM

Post #13 of 15 (9344 views)
Permalink
Re: Remove empty strings from list [In reply to]

Bruno Desthuilliers <bruno.42.desthuilliers [at] websiteburo> wrote:
> >> mylist = line.strip().split()
>
>will already do the RightThing(tm).

So will

mylist = line.split()

--
\S

under construction



rhodri at wildebst

Sep 15, 2009, 4:00 PM

Post #14 of 15 (9341 views)
Permalink
Re: Remove empty strings from list [In reply to]

On Tue, 15 Sep 2009 02:55:13 +0100, Chris Rebert <clp2 [at] rebertia> wrote:

> On Mon, Sep 14, 2009 at 6:49 PM, Helvin <helvinlui [at] gmail> wrote:
>> Hi,
>>
>> Sorry I did not want to bother the group, but I really do not
>> understand this seeming trivial problem.
>> I am reading from a textfile, where each line has 2 values, with
>> spaces before and between the values.
>> I would like to read in these values, but of course, I don't want the
>> whitespaces between them.
>> I have looked at documentation, and how strings and lists work, but I
>> cannot understand the behaviour of the following:
>> line = f.readline()
>> line = line.lstrip() # take away whitespace at
>> the beginning of the
>> readline.
>> list = line.split(' ') # split the str line into
>> a list
>>
>> # the list has empty strings in it, so now,
>> remove these empty strings
[snip]
>
> Block quoting from http://effbot.org/zone/python-list.htm
> """
> Note that the for-in statement maintains an internal index, which is
> incremented for each loop iteration. This means that if you modify the
> list you’re looping over, the indexes will get out of sync, and you
> may end up skipping over items, or process the same item multiple
> times.
> """
>
> Thus why your code is skipping over some elements and not removing them.
> Moral: Don't modify a list while iterating over it. Use the loop to
> create a separate, new list from the old one instead.

In this case, your life would be improved by using

l = line.split()

instead of

l = line.split(' ')

and not getting the empty strings in the first place.
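
Putting this together for the original task, here is a sketch in Python 3 syntax of reading two values per line. The helper name `read_pairs`, the in-memory sample file, the choice to skip lines that do not have exactly two fields, and the conversion to int and float are all assumptions for illustration; the thread does not say how the values are used.

```python
import io

def read_pairs(f):
    """Return a list of (int, float) pairs, one per well-formed line."""
    pairs = []
    for line in f:
        fields = line.split()      # no empty strings, no newline
        if len(fields) != 2:
            continue               # assumed policy: skip blank/odd lines
        pairs.append((int(fields[0]), float(fields[1])))
    return pairs

# io.StringIO stands in for the real text file here.
sample = io.StringIO('  44      0.000000000\n  45   1.5\n\n')
pairs = read_pairs(sample)
print(pairs)
```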

--
Rhodri James *-* Wildebeest Herder to the Masses


bruno.42.desthuilliers at websiteburo

Sep 16, 2009, 1:08 AM

Post #15 of 15 (9330 views)
Permalink
Re: Remove empty strings from list [In reply to]

Sion Arrowsmith wrote:
> Bruno Desthuilliers <bruno.42.desthuilliers [at] websiteburo> wrote:
>>>> mylist = line.strip().split()
>> will already do the RightThing(tm).
>
> So will
>
> mylist = line.split()
>
Yeps, it's at least the second time someone reminds me that the call to
str.strip is just useless here... Pity my poor old neuron :(

