Parsing a path to components

 

 



eliben at gmail

Jun 6, 2008, 10:55 PM

Post #1 of 9 (10367 views)
Parsing a path to components

Hello,

os.path.split returns the head and tail of a path, but what if I want
all the components? I could not find a portable way to do this in the
standard library, so I've concocted the following function. It uses
os.path.split to be portable, at the expense of efficiency.

----------------------------------
import os

def parse_path(path):
    """ Parses a path to its components.

    Example:
        parse_path("C:\\Python25\\lib\\site-packages\\zipextimporter.py")

    Returns:
        ['C:\\', 'Python25', 'lib', 'site-packages',
         'zipextimporter.py']

    This function uses os.path.split in an attempt to be portable.
    It costs in performance.
    """
    lst = []

    # Peel one component off the right-hand end per iteration.
    while 1:
        head, tail = os.path.split(path)

        if tail == '':
            # No tail left: keep what remains (e.g. the root) and stop.
            if head != '': lst.insert(0, head)
            break
        else:
            lst.insert(0, tail)
            path = head

    return lst
----------------------------------

Did I miss something, and is there a standard way to do this?
Is this function valid, or are there cases that will confuse it?

Thanks in advance
Eli
--
http://mail.python.org/mailman/listinfo/python-list


s0suk3 at gmail

Jun 6, 2008, 11:57 PM

Post #2 of 9 (10274 views)
Re: Parsing a path to components

On Jun 7, 12:55am, eliben <eli...@gmail.com> wrote:
> Hello,
>
> os.path.split returns the head and tail of a path, but what if I want
> to have all the components ? I could not find a portable way to do
> this in the standard library, so I've concocted the following
> function. It uses os.path.split to be portable, at the expense of
> efficiency.
>
> ----------------------------------
> def parse_path(path):
>     """ Parses a path to its components.
>
>     Example:
>         parse_path("C:\\Python25\\lib\\site-packages\\zipextimporter.py")
>
>     Returns:
>         ['C:\\', 'Python25', 'lib', 'site-packages',
>          'zipextimporter.py']
>
>     This function uses os.path.split in an attempt to be portable.
>     It costs in performance.
>     """
>     lst = []
>
>     while 1:
>         head, tail = os.path.split(path)
>
>         if tail == '':
>             if head != '': lst.insert(0, head)
>             break
>         else:
>             lst.insert(0, tail)
>             path = head
>
>     return lst
> ----------------------------------
>
> Did I miss something and there is a way to do this standardly ?
> Is this function valid, or will there be cases that will confuse it ?
>

You can just split the path on `os.sep', which contains the path
separator of the platform on which Python is running:

components = pathString.split(os.sep)

Sebastian



bj_666 at gmx

Jun 7, 2008, 1:15 AM

Post #3 of 9 (10271 views)
Re: Parsing a path to components

On Fri, 06 Jun 2008 23:57:03 -0700, s0suk3 wrote:

> You can just split the path on `os.sep', which contains the path
> separator of the platform on which Python is running:
>
> components = pathString.split(os.sep)

That won't work on platforms with more than one path separator, or when
a separator is repeated. For example r'\foo\\bar/baz//spam.py' or:

In [140]: os.path.split('foo//bar')
Out[140]: ('foo', 'bar')

In [141]: 'foo//bar'.split(os.sep)
Out[141]: ['foo', '', 'bar']
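
The same behaviour can be reproduced on any machine with a small sketch that uses the `posixpath` and `ntpath` modules directly (the platform-specific implementations behind `os.path`). The sample strings are the ones from this post; the variable names are only for illustration.

```python
import ntpath      # Windows flavour of os.path, importable on any platform
import posixpath   # POSIX flavour of os.path

# A repeated separator leaves an empty component behind after a plain split,
# while posixpath.split() swallows the repeat.
naive_posix = 'foo//bar'.split(posixpath.sep)
print(naive_posix)                    # ['foo', '', 'bar']
print(posixpath.split('foo//bar'))    # ('foo', 'bar')

# Windows accepts two separators, so splitting on sep alone misses the '/'.
print(repr(ntpath.sep), repr(ntpath.altsep))
naive_nt = r'\foo\\bar/baz//spam.py'.split(ntpath.sep)
print(naive_nt)                       # ['', 'foo', '', 'bar/baz//spam.py']
```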

Ciao,
Marc 'BlackJack' Rintsch


eliben at gmail

Jun 7, 2008, 2:09 AM

Post #4 of 9 (10278 views)
Re: Parsing a path to components

On Jun 7, 10:15 am, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> On Fri, 06 Jun 2008 23:57:03 -0700, s0suk3 wrote:
> > You can just split the path on `os.sep', which contains the path
> > separator of the platform on which Python is running:
>
> > components = pathString.split(os.sep)
>
> Won't work for platforms with more than one path separator and if a
> separator is repeated. For example r'\foo\\bar/baz//spam.py' or:
>
> In [140]: os.path.split('foo//bar')
> Out[140]: ('foo', 'bar')
>
> In [141]: 'foo//bar'.split(os.sep)
> Out[141]: ['foo', '', 'bar']
>
> Ciao,
> Marc 'BlackJack' Rintsch

Can you recommend a generic way to achieve this?
Eli


s0suk3 at gmail

Jun 7, 2008, 2:15 AM

Post #5 of 9 (10274 views)
Re: Parsing a path to components

On Jun 7, 3:15am, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> On Fri, 06 Jun 2008 23:57:03 -0700, s0suk3 wrote:
> > You can just split the path on `os.sep', which contains the path
> > separator of the platform on which Python is running:
>
> > components = pathString.split(os.sep)
>
> Won't work for platforms with more than one path separator and if a
> separator is repeated. For example r'\foo\\bar/baz//spam.py' or:
>
> In [140]: os.path.split('foo//bar')
> Out[140]: ('foo', 'bar')
>
> In [141]: 'foo//bar'.split(os.sep)
> Out[141]: ['foo', '', 'bar']
>

But those are invalid paths, aren't they? If you have a jumble of a
path, I think the solution is to call os.path.normpath() before
splitting.
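
A minimal sketch of that idea, pinned to `posixpath` so it behaves the same on any machine. The name `split_normalized` is made up for illustration. Note that the root of an absolute path comes back as an empty string, so this is only a rough approximation of "components".

```python
import posixpath

def split_normalized(path):
    """Hypothetical helper: normalize first, then split on the POSIX separator."""
    return posixpath.normpath(path).split(posixpath.sep)

relative = split_normalized('foo//bar/./baz/')
absolute = split_normalized('/a/b')
print(relative)   # ['foo', 'bar', 'baz']
print(absolute)   # ['', 'a', 'b']  (the leading '' stands for the root)
```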

Sebastian


bj_666 at gmx

Jun 7, 2008, 2:34 AM

Post #6 of 9 (10274 views)
Re: Parsing a path to components

On Sat, 07 Jun 2008 02:15:07 -0700, s0suk3 wrote:

> On Jun 7, 3:15 am, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
>> On Fri, 06 Jun 2008 23:57:03 -0700, s0suk3 wrote:
>> > You can just split the path on `os.sep', which contains the path
>> > separator of the platform on which Python is running:
>>
>> > components = pathString.split(os.sep)
>>
>> Won't work for platforms with more than one path separator and if a
>> separator is repeated.  For example r'\foo\\bar/baz//spam.py' or:
>>
>> In [140]: os.path.split('foo//bar')
>> Out[140]: ('foo', 'bar')
>>
>> In [141]: 'foo//bar'.split(os.sep)
>> Out[141]: ['foo', '', 'bar']
>>
>
> But those are invalid paths, aren't they?

No. See `os.altsep` on Windows. And repeating separators is allowed too.

Ciao,
Marc 'BlackJack' Rintsch


M8R-yfto6h at mailinator

Jun 7, 2008, 7:19 AM

Post #7 of 9 (10264 views)
Re: Parsing a path to components

"eliben" <eliben [at] gmail> wrote in message
news:e5fd542f-56d2-4ec0-a3a7-aa1ee106c624 [at] a70g2000hsh
> On Jun 7, 10:15 am, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
>> On Fri, 06 Jun 2008 23:57:03 -0700, s0suk3 wrote:
>> > You can just split the path on `os.sep', which contains the path
>> > separator of the platform on which Python is running:
>>
>> > components = pathString.split(os.sep)
>>
>> Won't work for platforms with more than one path separator and if a
>> separator is repeated. For example r'\foo\\bar/baz//spam.py' or:
>>
>> In [140]: os.path.split('foo//bar')
>> Out[140]: ('foo', 'bar')
>>
>> In [141]: 'foo//bar'.split(os.sep)
>> Out[141]: ['foo', '', 'bar']
>>
>> Ciao,
>> Marc 'BlackJack' Rintsch
>
> Can you recommend a generic way to achieve this ?
> Eli

>>> import os
>>> from os.path import normpath,abspath
>>> x=r'\foo\\bar/baz//spam.py'
>>> normpath(x)
'\\foo\\bar\\baz\\spam.py'
>>> normpath(abspath(x))
'C:\\foo\\bar\\baz\\spam.py'
>>> normpath(abspath(x)).split(os.sep)
['C:', 'foo', 'bar', 'baz', 'spam.py']

-Mark



duncan.booth at invalid

Jun 9, 2008, 1:12 AM

Post #8 of 9 (10239 views)
Re: Parsing a path to components

"Mark Tolonen" <M8R-yfto6h [at] mailinator> wrote:

>> Can you recommend a generic way to achieve this ?
>> Eli
>
>>>> import os
>>>> from os.path import normpath,abspath
>>>> x=r'\foo\\bar/baz//spam.py'
>>>> normpath(x)
> '\\foo\\bar\\baz\\spam.py'
>>>> normpath(abspath(x))
> 'C:\\foo\\bar\\baz\\spam.py'
>>>> normpath(abspath(x)).split(os.sep)
> ['C:', 'foo', 'bar', 'baz', 'spam.py']

That gets a bit messy with UNC pathnames. With the OP's code the double
backslash lead-in is preserved (although arguably it has split one time
too many; '\\\\frodo' would make more sense as the first element):

>>> parse_path(r'\\frodo\foo\bar')
['\\\\', 'frodo', 'foo', 'bar']

With your code you just get two empty strings as the lead-in:

>>> normpath(abspath(r'\\frodo\foo\bar')).split(os.sep)
['', '', 'frodo', 'foo', 'bar']
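
For readers on a newer Python: the `pathlib` module (added in Python 3.4, well after this thread) exposes the components through `.parts` and treats the UNC server and share together as the drive. A sketch, using the pure path classes so it runs on any platform; the variable names are only for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# UNC path: server and share are reported together as the drive.
unc = PureWindowsPath(r'\\frodo\foo\bar')
print(unc.drive)    # \\frodo\foo
print(unc.parts)

# The example from the original post.
win = PureWindowsPath(r'C:\Python25\lib\site-packages\zipextimporter.py')
print(win.parts)    # ('C:\\', 'Python25', 'lib', 'site-packages', 'zipextimporter.py')

# Repeated and trailing separators are dropped.
posix = PurePosixPath('/a/b/c//d/')
print(posix.parts)  # ('/', 'a', 'b', 'c', 'd')
```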

--
Duncan Booth http://kupuguy.blogspot.com


Scott.Daniels at Acm

Jun 10, 2008, 5:55 AM

Post #9 of 9 (10215 views)
Re: Parsing a path to components

eliben wrote:
... a pretty good try ...
> def parse_path(path):
>     """..."""
By the way, the comment is fine. I am going for brevity here.
>     lst = []
>     while 1:
>         head, tail = os.path.split(path)
>         if tail == '':
>             if head != '': lst.insert(0, head)
>             break
>         else:
>             lst.insert(0, tail)
>             path = head
>     return lst
> ----------------------------------
>
> Did I miss something and there is a way to do this standardly ?
Nope, the requirement is rare.

> Is this function valid, or will there be cases that will confuse it ?
parse_path('/a/b/c//d/')
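
To see what that call does, here is the original loop with one change that is not in the post: the path module is passed in as a hypothetical `pathmod` argument (default `posixpath`), so the result does not depend on the machine running it.

```python
import posixpath

def parse_path_original(path, pathmod=posixpath):
    """eliben's loop; pathmod is an assumption added so the demo is repeatable."""
    lst = []
    while 1:
        head, tail = pathmod.split(path)
        if tail == '':
            if head != '':
                lst.insert(0, head)
            break
        else:
            lst.insert(0, tail)
            path = head
    return lst

# A trailing separator makes the very first tail empty, so the loop stops at once.
trailing = parse_path_original('/a/b/c//d/')
print(trailing)   # ['/a/b/c//d']

# Without the trailing separator the same path splits as intended.
plain = parse_path_original('/a/b/c//d')
print(plain)      # ['/', 'a', 'b', 'c', 'd']
```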

Try something like:
import os

def parse_path(path):
    '''...same comment...'''
    head, tail = os.path.split(path)
    result = []
    if not tail:
        if head == path:
            # Nothing can be split off (e.g. a bare root).
            return [head]
        # Perhaps result = [''] here to indicate ends-in-sep
        head, tail = os.path.split(head)
    # Collect components right to left, then reverse once at the end.
    while head and tail:
        result.append(tail)
        head, tail = os.path.split(head)
    result.append(head or tail)
    result.reverse()
    return result
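
A quick check of the function above, as a sketch. The only change is a hypothetical `pathmod` argument (not in the post) standing in for `os.path`, so that both the POSIX and Windows rules can be tried on one machine.

```python
import ntpath
import posixpath

def parse_path(path, pathmod=posixpath):
    """Same logic as above; pathmod is an assumption added for testing."""
    head, tail = pathmod.split(path)
    result = []
    if not tail:
        if head == path:
            return [head]
        head, tail = pathmod.split(head)
    while head and tail:
        result.append(tail)
        head, tail = pathmod.split(head)
    result.append(head or tail)
    result.reverse()
    return result

posix_parts = parse_path('/a/b/c//d/')
print(posix_parts)     # ['/', 'a', 'b', 'c', 'd']

relative_parts = parse_path('a/b')
print(relative_parts)  # ['a', 'b']

windows_parts = parse_path(r'C:\Python25\lib', ntpath)
print(windows_parts)   # ['C:\\', 'Python25', 'lib']
```

Appending and reversing once also avoids the repeated `insert(0, ...)` calls of the original, each of which shifts the whole list.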

--Scott David Daniels
Scott.Daniels [at] Acm