
bruno.42.desthuilliers at websiteburo
Sep 15, 2009, 1:16 AM
Post #10 of 15
Helvin wrote:
> Hi,
>
> Sorry I did not want to bother the group, but I really do not
> understand this seeming trivial problem.
> I am reading from a textfile, where each line has 2 values, with
> spaces before and between the values.
> I would like to read in these values, but of course, I don't want the
> whitespaces between them.
> I have looked at documentation, and how strings and lists work, but I
> cannot understand the behaviour of the following:
>
>     line = f.readline()
>     line = line.lstrip() # take away whitespace at the beginning of the readline.

file.readline returns the line with its trailing newline character (which str.strip treats as whitespace), so you may want to use line.strip instead of line.lstrip.

>     list = line.split(' ')

Slightly off-topic, but: don't use the names of builtin types or functions as identifiers, because this shadows the builtin object (after this line, "list" no longer refers to the builtin type in that scope).

Also, when called without arguments, str.split splits on runs of whitespace and discards leading and trailing whitespace, so the result contains no empty strings. You would get better results by not specifying the delimiter here:

>>> " a  a  a  a ".split(' ')
['', 'a', '', 'a', '', 'a', '', 'a', '']
>>> " a  a  a  a ".split()
['a', 'a', 'a', 'a']
>>>

>     # the list has empty strings in it, so now, remove these empty strings

A problem you could have avoided right from the start !-)

>     for item in list:
>         if item is ' ':

Don't use identity comparison when you want to test for equality. It happens to sort of work in the example above, but only because CPython caches _some_ small strings, and you should _never_ rely on such implementation details. A string containing accented characters would not have been cached:

>>> s = 'é'
>>> s is 'é'
False
>>>

Also, this is surely not your actual code: ' ' is not an empty string, it's a string containing a single space character. The empty string is ''.
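A minimal sketch of the split() advice, written for Python 3 (where print is a function) rather than the Python 2 used in this thread. The helper name parse_line and the sample line are hypothetical, built only to reproduce the 7-item list from the question; it also uses == rather than "is" to look for empty strings.

```python
def parse_line(line):
    # Hypothetical helper. str.split() with no argument splits on runs of
    # whitespace and ignores leading/trailing whitespace (the trailing
    # newline included), so it gives the same result as line.strip().split().
    return line.split()


# Hypothetical input shaped like the line in the question: leading
# spaces, six spaces between the two values, and a trailing newline.
sample = "   44" + " " * 6 + "0.000000000\n"

# The approach from the question keeps empty strings and the newline.
naive = sample.lstrip().split(' ')

# Splitting on whitespace runs yields just the two values.
clean = parse_line(sample)

# Count empty strings with an equality test, never with `is`.
empty_count = sum(1 for field in naive if field == '')

print(naive)        # ['44', '', '', '', '', '', '0.000000000\n']
print(clean)        # ['44', '0.000000000']
print(empty_count)  # 5
```

Converting the two fields to numbers (for example with float()) is then straightforward, since no empty strings or newlines are left to get in the way.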
And FWIW, empty strings (like most empty sequences and collections, all numeric zeros, and the None object) have a false value in a boolean context, so you can just test the string directly:

    for s in ['', 0, 0.0, [], {}, (), None]:
        if not s:
            print "'%s' is empty, so it's false" % str(s)

>             print 'discard these: ',item
>             index = list.index(item)
>             del list[index] # remove this item from the list

And then you do have a big problem: the internal index used by the iterator is no longer in sync with the list, so the next iteration will skip one item. As a general rule: *don't* add or remove elements to/from a sequence while iterating over it. If you really need to modify the sequence while iterating over it, iterate in reverse, but there are usually better solutions.

>         else:
>             print 'keep this: ',item
>
> The problem is,

Make that plural: there's more than one problem here !-)

> when my list is : ['44', '', '', '', '', '', '0.000000000\n']
> The output is:
> len of list: 7
> keep this: 44
> discard these:
> discard these:
> discard these:
> So finally the list is: ['44', '', '', '0.000000000\n']
> The code above removes all the empty strings in the middle, all except
> two. My code seems to miss two of the empty strings.
>
> Would you know why this is occuring?

See above... and below:

>>> alist = ['44', '', '', '', '', '', '0.000000000']
>>> for i, it in enumerate(alist):
...     print 'i : %s - it : "%s"' % (i, it)
...     if not it:
...         del alist[i]
...     print "alist is now %s" % alist
...
i : 0 - it : "44"
alist is now ['44', '', '', '', '', '', '0.000000000']
i : 1 - it : ""
alist is now ['44', '', '', '', '', '0.000000000']
i : 2 - it : ""
alist is now ['44', '', '', '', '0.000000000']
i : 3 - it : ""
alist is now ['44', '', '', '0.000000000']
>>>

After three deletions the list has only four items, so the loop stops after index 3 and the two remaining empty strings are never visited.

Ok, now for practical answers:

1/ In the above case, use line.strip().split(), and you'll have no more problems !-)

2/ As a general rule, if you need to filter a sequence, don't try to do it in place (unless it's a *very* big sequence and you run into memory problems, but then there are probably better solutions). The common idioms for filtering a sequence are:

* filter(predicate, sequence): the 'predicate' param is a callback function that takes an item from the sequence and returns a boolean value (True to keep the item, False to discard it). The following example filters out even integers:

    def is_odd(n):
        return n % 2

    alist = range(10)
    odds = filter(is_odd, alist)
    print alist
    print odds

  Alternatively, filter() can take None as its first param, in which case it filters out items that have a false value in a boolean context, i.e.:

    alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
    result = filter(None, alist)
    print result

* list comprehensions: here you directly build the result list:

    alist = range(10)
    odds = [n for n in alist if n % 2]

    alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
    result = [item for item in alist if item]
    print result

HTH
--
http://mail.python.org/mailman/listinfo/python-list
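For readers on Python 3, here is a sketch of the same filtering idioms. There, filter() returns an iterator and range() is lazy, so both are wrapped in list(). It also shows the reverse-iteration approach mentioned above for the rare case where a list really must be changed in place; remove_empty_in_place is a hypothetical helper name, not a standard function.

```python
def is_odd(n):
    """Predicate for filter(): truthy (1) for odd integers, 0 for even."""
    return n % 2


# filter() with a predicate; in Python 3 it returns a lazy iterator.
odds = list(filter(is_odd, range(10)))

# filter(None, ...) keeps only the items that are true in a boolean context.
mixed = ['', 'a', 0, 1, [], [1], None, object, False, True]
truthy = list(filter(None, mixed))

# The equivalent list comprehensions build the result list directly.
odds_lc = [n for n in range(10) if n % 2]
truthy_lc = [item for item in mixed if item]


def remove_empty_in_place(items):
    """Hypothetical helper: delete empty strings from `items` in place.

    Walking the indices from the end means each deletion only shifts
    elements that have already been examined, so nothing is skipped.
    """
    for i in reversed(range(len(items))):
        if items[i] == '':
            del items[i]


fields = ['44', '', '', '', '', '', '0.000000000\n']
remove_empty_in_place(fields)

print(odds)    # [1, 3, 5, 7, 9]
print(truthy)
print(fields)  # ['44', '0.000000000\n']
```

If other code holds a reference to the same list and must see the change, slice assignment (items[:] = [x for x in items if x]) is another common way to filter in place without the skipping problem.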