
Mailing List Archive: Python: Python

Yet another "split string by spaces preserving single quotes" problem

 

 



massi_srb at msn

May 13, 2012, 2:14 PM

Post #1 of 3 (467 views)
Yet another "split string by spaces preserving single quotes" problem

Hi everyone,
I know this question has been asked thousands of times, but in my case
I have an additional requirement to be satisfied. I need to handle
substrings in the form 'string with spaces':'another string with
spaces' as a single token; I mean, if I have this string:

s = "This is a 'simple test':'string which' shows 'exactly my' problem"

I need to split it as follows (the single quotes must be maintained in
the resulting list):

["This", "is", "a", "'simple test':'string which'", "shows",
"'exactly my'", "problem"]

Up to now I have written some ugly code which uses a regular
expression:

splitter = re.compile("(?=\s|^)('[^']+') | ('[^']+')(?=\s|$)")

temp = [t for t in splitter.split(s) if t not in [None, '']]
print temp
t = []
for i, p in enumerate(temp):
    # keep quoted chunks whole; split everything else on spaces
    for x in ([p] if (p[0] == "'" and p[-1] == "'") else p.split(' ')):
        t.append(x)

But it does not handle the "colon" case.
Any hints? Thanks in advance!
--
http://mail.python.org/mailman/listinfo/python-list


python.list at tim

May 14, 2012, 5:23 PM

Post #2 of 3 (442 views)
Re: Yet another "split string by spaces preserving single quotes" problem

On 05/13/12 16:14, Massi wrote:
> Hi everyone,
> I know this question has been asked thousands of times, but in my case
> I have an additional requirement to be satisfied. I need to handle
> substrings in the form 'string with spaces':'another string with
> spaces' as a single token; I mean, if I have this string:
>
> s = "This is a 'simple test':'string which' shows 'exactly my' problem"
>
> I need to split it as follows (the single quotes must be maintained in
> the resulting list):

The "quotes must be maintained" bit is what makes this different
from most common use-cases. Without that condition, using
shlex.split() from the standard library does everything else that
you need. Alternatively, one might try hacking csv.reader() to do
the splitting for you, though I had less luck than with shlex.

> Up to now I have written some ugly code which uses a regular
> expression:
>
> splitter = re.compile("(?=\s|^)('[^']+') | ('[^']+')(?=\s|$)")

You might try

r = re.compile(r"""(?:'[^']*'|"[^"]*"|[^'" ]+)+""")
print r.findall(s)

which seems to match your desired output. It doesn't currently
handle tabs, but once the pattern is broken out into its parts, it's
easy to modify (and the breakdown may help you understand what it's
doing):

>>> single_quoted = "'[^']*'"
>>> double_quoted = '"[^"]*"'
>>> other = """[^'" \t]+""" # added a "\t" tab here
>>> matches = '|'.join((single_quoted, double_quoted, other))
>>> regex = r'(?:%s)+' % matches
>>> r = re.compile(regex)
>>> r.findall(s)
['This', 'is', 'a', "'simple test':'string which'", 'shows',
"'exactly my'", 'problem']
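
For anyone reading this on Python 3, the same idea can be wrapped in a small function. This is only a minimal sketch and not part of the original reply: the names SINGLE_QUOTED, DOUBLE_QUOTED, OTHER, TOKEN and split_keeping_quotes are made up for illustration, and \s is swapped in for the explicit space and tab so that any whitespace separates tokens. Unbalanced quotes get no special treatment here.

```python
import re

# Building blocks of one token (names are illustrative, not from the thread).
SINGLE_QUOTED = r"'[^']*'"   # a complete '...' run, spaces allowed inside
DOUBLE_QUOTED = r'"[^"]*"'   # a complete "..." run, spaces allowed inside
OTHER = r"""[^'"\s]+"""      # anything that is neither a quote nor whitespace

# A token is one or more of the pieces above glued together, which is
# what lets 'a b':'c d' come back as a single item.
TOKEN = re.compile("(?:%s)+" % "|".join((SINGLE_QUOTED, DOUBLE_QUOTED, OTHER)))


def split_keeping_quotes(text):
    """Split on whitespace, keeping quoted runs (and their quotes) intact."""
    return TOKEN.findall(text)


s = "This is a 'simple test':'string which' shows 'exactly my' problem"
print(split_keeping_quotes(s))
```

Because the group is non-capturing, findall() returns each whole match, so the quotes stay in the result.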


Hope this helps,

-tkc








steve+comp.lang.python at pearwood

May 15, 2012, 6:15 AM

Post #3 of 3 (437 views)
Re: Yet another "split string by spaces preserving single quotes" problem

On Sun, 13 May 2012 14:14:58 -0700, Massi wrote:

> Hi everyone,
> I know this question has been asked thousands of times, but in my case I
> have an additional requirement to be satisfied. I need to handle
> substrings in the form 'string with spaces':'another string with spaces'
> as a single token; I mean, if I have this string:
>
> s ="This is a 'simple test':'string which' shows 'exactly my' problem"
>
> I need to split it as follows (the single quotes must be maintained in the
> resulting list):
>
> ["This", "is", "a", "'simple test':'string which'", "shows",
> "'exactly my'", "problem"]
>
> Up to now I have written some ugly code which uses a regular expression:

And now you have two problems *wink*


> Any hints? Thanks in advance!

>>> s = "This is a 'simple test':'string which' shows 'exactly my' problem"
>>> import shlex
>>> result = shlex.split(s, posix=True)
>>> result
['This', 'is', 'a', 'simple test:string which', 'shows', 'exactly my',
'problem']


Then do some post-processing on the result:

>>> ["'"+s+"'" if " " in s else s for s in result]
['This', 'is', 'a', "'simple test:string which'", 'shows',
"'exactly my'", 'problem']
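
Note that re-quoting after shlex.split() wraps the whole token, so the colon-joined item comes back as 'simple test:string which' rather than the 'simple test':'string which' asked for in the first post. The following is a small sketch, written for Python 3 and not part of the original reply, that makes the difference visible; the names wanted, requoted and differences are made up for illustration.

```python
import shlex

s = "This is a 'simple test':'string which' shows 'exactly my' problem"

# The list requested in the first post.
wanted = ["This", "is", "a", "'simple test':'string which'", "shows",
          "'exactly my'", "problem"]

# POSIX-mode splitting strips the quotes; then re-quote any token
# that contains a space, as in the post-processing step above.
tokens = shlex.split(s, posix=True)
requoted = ["'" + t + "'" if " " in t else t for t in tokens]

# Collect the positions where the re-quoted result differs from the request.
differences = [(got, want) for got, want in zip(requoted, wanted) if got != want]
print(differences)
```

Only the colon-joined token differs; the plain quoted token 'exactly my' round-trips as requested.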


--
Steven
