Mailing List Archive: Perl: porters

[perl #70317] splitting accented words

 

 


perlbug-followup at perl

Nov 7, 2009, 3:16 PM


# New Ticket Created by mochan [at] em
# Please include the string: [perl #70317]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=70317 >



This is a bug report for perl from mochan [at] em,
generated with the help of perlbug 1.36 running under perl 5.10.0.


-----------------------------------------------------------------
[Please enter your report here]

The following code prints a list of words, one per line, and seems to
work correctly even when the words contain accented characters (my
locale is en_US.UTF-8):

$ echo aei aéi | perl -CS -ne 'print join "\n", split /\W/'
aei
aéi

However, the following almost identical code fails: it incorrectly
splits accented words at the accented characters:

$ echo aei aéi | perl -CS -ne 'print join "\n", split /[\W]/'
aei
a
i

Both commands fail if I remove the -CS switch, and both work if I
replace \W with \P{IsWord}. I suspect that the difference between \W
and [\W] is a bug.


[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=medium
---
Site configuration information for perl 5.10.0:

Configured by mochan at Thu Aug 20 22:55:52 CDT 2009.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
Platform:
osname=linux, osvers=2.6.26-2-amd64, archname=x86_64-linux
uname='linux em 2.6.26-2-amd64 #1 smp sun jul 26 20:35:48 utc 2009 x86_64 gnulinux '
config_args='-de'
hint=previous, useposix=true, d_sigaction=define
useithreads=undef, usemultiplicity=undef
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=define, use64bitall=define, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2',
cppflags='-fno-strict-aliasing -pipe -I/usr/local/include -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
ccversion='', gccversion='4.3.2', gccosandvers=''
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64
libs=-lnsl -ldl -lm -lcrypt -lutil -lc
perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
libc=/lib/libc-2.7.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.7'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib'

Locally applied patches:


---
@INC for perl 5.10.0:
/usr/local/lib/perl5/5.10.0/x86_64-linux
/usr/local/lib/perl5/5.10.0
/usr/local/lib/perl5/site_perl/5.10.0/x86_64-linux
/usr/local/lib/perl5/site_perl/5.10.0
.

---
Environment for perl 5.10.0:
HOME=/home/mochan
LANG=en_US.UTF-8
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/mochan/bin:/home/mochan/bin:/usr/local/bin:/usr/bin:/bin:/usr/games
PERL_BADLANG (unset)
SHELL=/bin/bash
