

Trouble splitting strings with consecutive delimiters

 

 



deuteros at xrs

Apr 30, 2012, 9:50 PM

Post #1 of 8 (500 views)
Trouble splitting strings with consecutive delimiters

I'm using regular expressions to split a string using multiple delimiters.
But if two or more of my delimiters occur next to each other in the
string, it puts an empty string in the resulting list. For example:

re.split(':|;|px', "width:150px;height:50px;float:right")

Results in

['width', '150', '', 'height', '50', '', 'float', 'right']

Is there any way to avoid getting '' in my list without adding px; as a
delimiter?
--
http://mail.python.org/mailman/listinfo/python-list


jpiitula at ling

Apr 30, 2012, 11:14 PM

Post #2 of 8 (479 views)
Re: Trouble splitting strings with consecutive delimiters

deuteros writes:

> I'm using regular expressions to split a string using multiple
> delimiters. But if two or more of my delimiters occur next to each
> other in the string, it puts an empty string in the resulting
> list. For example:
>
> re.split(':|;|px', "width:150px;height:50px;float:right")
>
> Results in
>
> ['width', '150', '', 'height', '50', '', 'float', 'right']
>
> Is there any way to avoid getting '' in my list without adding px;
> as a delimiter?

You could split on a run of one or more such delimiters.

>>> re.split('(?::|;|px)+', "width:150px;height:50px;float:right")
['width', '150', 'height', '50', 'float', 'right']

Consider splitting twice instead: first into key-value substrings at
semicolons, and then each of those into a key-value pair at the colon.
Here the result is built as a dict. The units are better handled after
that.

>>> dict(kv.split(':') for kv in "width:150px;height:50px;float:right".split(';'))
{'width': '150px', 'float': 'right', 'height': '50px'}

You might also want to accept whitespace as part of the delimiters.
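
A minimal sketch of the two-step split with whitespace tolerated around
both delimiters. The helper name parse_style is made up for illustration,
and the code assumes each declaration contains at most one colon that
matters (everything after the first colon is treated as the value).

```python
def parse_style(style):
    """Split a 'key:value;key:value' string into a dict, tolerating whitespace."""
    result = {}
    for declaration in style.split(';'):
        declaration = declaration.strip()
        if not declaration:
            continue  # skip the empty piece left by a trailing ';'
        # partition splits at the first ':' only
        key, _, value = declaration.partition(':')
        result[key.strip()] = value.strip()
    return result


print(parse_style("width: 150px; height:50px ;float:right;"))
# {'width': '150px', 'height': '50px', 'float': 'right'}
```

The units stay attached to the values here, so they can be dealt with in
a separate step.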

(There might be a parser for such data formats somewhere in the
library already. CSV?)


__peter__ at web

May 1, 2012, 5:55 AM

Post #3 of 8 (480 views)
Re: Trouble splitting strings with consecutive delimiters

deuteros wrote:

> I'm using regular expressions to split a string using multiple delimiters.
> But if two or more of my delimiters occur next to each other in the
> string, it puts an empty string in the resulting list. For example:
>
> re.split(':|;|px', "width:150px;height:50px;float:right")
>
> Results in
>
> ['width', '150', '', 'height', '50', '', 'float', 'right']
>
> Is there any way to avoid getting '' in my list without adding px; as a
> delimiter?

That looks like a CSS style; to parse it you should use a tool that was
built for the job. The first one I came across (because it is included in
the linux distro I'm using and has "css" in its name, so this is not an
endorsement) is

http://packages.python.org/cssutils/

>>> import cssutils
>>> style = cssutils.parseStyle("width:150px;height:50px;float:right")
>>> for property in style.getProperties():
...     print property.name, "-->", property.value
...
width --> 150px
height --> 50px
float --> right

OK, so you still need to strip off the unit suffix manually:

>>> def strip_suffix(s, *suffixes):
...     for suffix in suffixes:
...         if s.endswith(suffix):
...             return s[:-len(suffix)]
...     return s
...
>>> strip_suffix(style.float, "pt", "px")
u'right'
>>> strip_suffix(style.width, "pt", "px")
u'150'




steve+comp.lang.python at pearwood

May 1, 2012, 8:12 AM

Post #4 of 8 (474 views)
Re: Trouble splitting strings with consecutive delimiters

On Tue, 01 May 2012 04:50:48 +0000, deuteros wrote:

> I'm using regular expressions to split a string using multiple
> delimiters. But if two or more of my delimiters occur next to each other
> in the string, it puts an empty string in the resulting list.

As I would expect. After all, there *is* an empty string between two
delimiters.


> For example:
>
> re.split(':|;|px', "width:150px;height:50px;float:right")
>
> Results in
>
> ['width', '150', '', 'height', '50', '', 'float', 'right']
>
> Is there any way to avoid getting '' in my list without adding px; as a
> delimiter?

Probably. But why not do it the easy way?


items = re.split(':|;|px', "width:150px;height:50px;float:right")
items = filter(None, items)

In Python 3, the second line will need to be list(filter(None, items)).
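
One difference between filtering and the "one or more delimiters" pattern
suggested earlier in the thread shows up when the string ends with a
delimiter. A small Python 3 sketch (the input with a trailing semicolon is
an assumed example, not from the original question):

```python
import re

text = "width:150px;height:50px;"

# Grouped pattern: consecutive delimiters collapse into one split point,
# but the delimiter run at the very end still leaves a trailing ''.
grouped = re.split(r'(?::|;|px)+', text)

# Filtering afterwards drops empty strings wherever they occur.
filtered = [item for item in re.split(r':|;|px', text) if item]

print(grouped)   # ['width', '150', 'height', '50', '']
print(filtered)  # ['width', '150', 'height', '50']
```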



--
Steven


emile at fenx

May 1, 2012, 10:06 AM

Post #5 of 8 (472 views)
Re: Trouble splitting strings with consecutive delimiters

> re.split(':|;|px', "width:150px;height:50px;float:right")

You could recognize that the delimiter you want to strip is in fact px;
and not px in and of itself.

So, try:

re.split(':|px;', "width:150px;height:50px;float:right")

Emile






lamialily at cleverpun

May 1, 2012, 10:13 AM

Post #6 of 8 (474 views)
Re: Trouble splitting strings with consecutive delimiters

>> re.split(':|;|px', "width:150px;height:50px;float:right")
>
>You could recognize that the delimiter you want to strip is in fact px;
>and not px in and of itself.
>
>So, try:
>
>re.split(':|px;', "width:150px;height:50px;float:right")
>
>Emile

That won't work at all outside of the example case. It'd choke on any
attribute separator that didn't end in px.
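
To make that concrete, here is a short Python 3 sketch. The second input,
with the declarations reordered so that a value without px comes first, is
an assumed example.

```python
import re

pattern = r':|px;'

# The original example: every ';' happens to be preceded by 'px'.
works = re.split(pattern, "width:150px;height:50px;float:right")

# Reordered: 'right;' is not followed by a recognized delimiter,
# and the final 'px' has no ';' after it.
breaks = re.split(pattern, "float:right;width:150px")

print(works)   # ['width', '150', 'height', '50', 'float', 'right']
print(breaks)  # ['float', 'right;width', '150px']
```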

Honestly I'd recommend recovering the size measurement anyway, since
there are pretty huge differences between each form of measurement in
CSS. Separating it from the number itself is fine and all since you
probably still need to turn it into a number Python can use, but I
wouldn't discard it outright.
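
A rough sketch of keeping the unit while still getting a usable number.
The helper split_unit and its regular expression are illustrative
assumptions, not a full CSS value parser: it only handles a plain decimal
number followed by an optional run of letters or a percent sign.

```python
import re

# Hypothetical pattern: optional sign, decimal number, then an optional unit.
_VALUE = re.compile(r'^([+-]?(?:\d+\.?\d*|\.\d+))([a-z%]*)$', re.IGNORECASE)


def split_unit(value):
    """Return (number, unit) for numeric values, (value, None) otherwise."""
    match = _VALUE.match(value.strip())
    if match is None:
        return value, None  # not numeric, e.g. 'right'
    number, unit = match.groups()
    return float(number), unit


print(split_unit("150px"))  # (150.0, 'px')
print(split_unit("1.5em"))  # (1.5, 'em')
print(split_unit("right"))  # ('right', None)
```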

~Temia
--
When on earth, do as the earthlings do.


emile at fenx

May 1, 2012, 10:36 AM

Post #7 of 8 (475 views)
Re: Trouble splitting strings with consecutive delimiters

On 5/1/2012 10:13 AM Temia Eszteri said...
>> re.split(':|px;', "width:150px;height:50px;float:right")
>>
>> Emile
>
> That won't work at all outside of the example case. It'd choke on any
> attribute separator that didn't end in px.

It would certainly choke on all delimiters that are not presented in the
argument. You're free to flavor to taste...

Emile




rustompmody at gmail

May 1, 2012, 11:37 PM

Post #8 of 8 (473 views)
Re: Trouble splitting strings with consecutive delimiters

On May 1, 9:50 am, deuteros <deute...@xrs.net> wrote:
> I'm using regular expressions to split a string using multiple delimiters.
> But if two or more of my delimiters occur next to each other in the
> string, it puts an empty string in the resulting list. For example:
>
>         re.split(':|;|px', "width:150px;height:50px;float:right")
>
> Results in
>
>         ['width', '150', '', 'height', '50', '', 'float', 'right']
>
> Is there any way to avoid getting '' in my list without adding px; as a
> delimiter?

Are you parsing CSS?
If so, have you tried something like cssutils (http://cthedot.de/cssutils/)?
[There are others, and I don't know which is best.]
