Mailing List Archive: Python: Bugs

[issue1140] re.sub returns str when processing empty unicode string

report at bugs

Sep 9, 2007, 11:37 PM

Post #1 of 14
[issue1140] re.sub returns str when processing empty unicode string

New submission from Beda Kosata:

While re.sub normally returns unicode strings when processing unicode,
it returns a normal string when dealing with an empty unicode string.

Example:
>>> print type( re.sub( "XX", "", u""))
<type 'str'>
>>> print type( re.sub( "XX", "", u"A"))
<type 'unicode'>

This inconsistency could lead to annoying bugs (at least it did for me :)
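
Until a fix is available, one possible workaround is to coerce the result back to the type of the input string. The sketch below is only an illustration: the helper name sub_preserving_type is hypothetical and is not part of the re module.

```python
import re

def sub_preserving_type(pattern, repl, string):
    """Hypothetical helper (not part of re): call re.sub and make sure the
    result has the same type as the input string."""
    result = re.sub(pattern, repl, string)
    if not isinstance(result, type(string)):
        # On an affected interpreter an empty unicode input comes back as
        # a plain str; convert it back to the input's type.
        result = type(string)(result)
    return result

# Usage: the result type follows the input type, even for empty input.
print(type(sub_preserving_type("XX", "", u"")) is type(u""))
```

On an interpreter without the bug the conversion branch is simply never taken.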

----------
components: Regular Expressions
messages: 55775
nosy: beda
severity: minor
status: open
title: re.sub returns str when processing empty unicode string
type: behavior
versions: Python 2.4, Python 2.5

__________________________________
Tracker <report[at]bugs.python.org>
<http://bugs.python.org/issue1140>
__________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

Sep 10, 2007, 10:14 AM

Post #3 of 14
[issue1140] re.sub returns str when processing empty unicode string [In reply to]

Guido van Rossum added the comment:

I agree. I wonder if it should return Unicode as soon as *any* of the
arguments are unicode???

----------
nosy: +gvanrossum



report at bugs

Sep 10, 2007, 11:25 AM

Post #4 of 14
[issue1140] re.sub returns str when processing empty unicode string [In reply to]

Beda Kosata added the comment:

I would certainly expect it to return unicode when either the "modified"
string or the replacement are unicode. I don't think that the type of
the replaced string should influence the type of the result.



report at bugs

Sep 10, 2007, 11:42 AM

Post #5 of 14
[issue1140] re.sub returns str when processing empty unicode string [In reply to]

Guido van Rossum added the comment:

Actually, it already implements the best possible rules, *except* for
the special case of an empty 3rd argument. (When there are no
substitutions, it normally returns the input unchanged; but somehow an
empty input is handled with a shortcut even before that point.) It ought
to be a simple fix.
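
The rule described above (no substitutions means the input comes back with its own type) can be checked with a small script. This is only a sketch for illustration: the function name check_type_preserved and the sample inputs are assumptions, and it is not the test that accompanies the patch.

```python
import re

def check_type_preserved(pattern, repl, samples):
    """Return (sample, result type name) pairs for every sample whose
    re.sub result has a different type than the sample itself."""
    mismatches = []
    for s in samples:
        result = re.sub(pattern, repl, s)
        if type(result) is not type(s):
            mismatches.append((s, type(result).__name__))
    return mismatches

# Includes the empty-input case from the original report.
print(check_type_preserved("XX", "", [u"", u"A", u"AXXB"]))
```

An interpreter that handles the empty input consistently should report no mismatches.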



report at bugs

Sep 10, 2007, 1:37 PM

Post #6 of 14
[issue1140] re.sub returns str when processing empty unicode string [In reply to]

Guido van Rossum added the comment:

Here's a patch.

----------
assignee: -> gvanrossum

Attachments: sre.diff (2.02 KB)


report at bugs

Sep 10, 2007, 2:40 PM

Post #7 of 14
[issue1140] re.sub returns str when processing empty unicode string [In reply to]

Guido van Rossum added the comment:

Here's a better patch that also fixes a few related issues.

Attachments: sre.diff (2.75 KB)


report at bugs

Sep 10, 2007, 2:40 PM

Post #8 of 14
[issue1140] re.sub returns str when processing empty unicode string [In reply to]

Guido van Rossum added the comment:

Fredrik, thoughts?

----------
assignee: gvanrossum -> effbot
nosy: +effbot



report at bugs

Sep 10, 2007, 2:54 PM

Post #9 of 14
[issue1140] re.sub returns str when processing empty unicode string [In reply to]

Fredrik Lundh added the comment:

Looks good to me. I still subscribe to the idea that
robust code should accept 8-bit *ASCII* strings anywhere
it accepts Unicode (especially when the 8-bit
string is empty), but that's me.

Feel free to check this in (or assign back to you if
you don't have the time).

----------
assignee: effbot -> gvanrossum
resolution: -> accepted



report at bugs

Sep 10, 2007, 2:56 PM

Post #10 of 14
[issue1140] re.sub returns str when processing empty unicode string [In reply to]

Fredrik Lundh added the comment:

(is there a way to just add a comment in the new tracker, btw, or is
everything a "change note", even if nothing has changed?)



report at bugs

Sep 10, 2007, 3:01 PM

Post #11 of 14
[issue1140] re.sub returns str when processing empty unicode string [In reply to]

Fredrik Lundh added the comment:

Well, I spent a minute hunting around for a "comment" field or an "add
comment" button. Guess this is a "you only need to learn this once"
thing...



report at bugs

Sep 10, 2007, 3:03 PM

Post #12 of 14
[issue1140] re.sub returns str when processing empty unicode string [In reply to]

Guido van Rossum added the comment:

Thanks, Fredrik.
Fixed in 2.6.
Committed revision 58098.
Someone else could backport to 2.5.
Shouldn't be merged into 3.0.



report at bugs

Sep 10, 2007, 3:04 PM

Post #13 of 14
[issue1140] re.sub returns str when processing empty unicode string [In reply to]

Changes by Fredrik Lundh:




report at bugs

Sep 17, 2007, 2:44 AM

Post #14 of 14
[issue1140] re.sub returns str when processing empty unicode string [In reply to]

Sean Reifschneider added the comment:

Applied as revision 58179 to 2.5 maintenance branch, passes tests.

----------
nosy: +jafo
priority: -> low
status: open -> closed
