
perlbug-followup at perl
Nov 7, 2009, 3:16 PM
[perl #70317] splitting accented words
# New Ticket Created by mochan [at] em
# Please include the string: [perl #70317]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=70317 >

This is a bug report for perl from mochan [at] em,
generated with the help of perlbug 1.36 running under perl 5.10.0.

-----------------------------------------------------------------
[Please enter your report here]

The following code makes a list of words, one to a line, and seems to
work correctly even when the words contain accented characters (my
locale is en_US.UTF-8):

    $ echo aei aéi | perl -CS -ne 'print join "\n", split /\W/'
    aei
    aéi

However, the following almost identical code fails: it incorrectly
splits accented words at the accented characters.

    $ echo aei aéi | perl -CS -ne 'print join "\n", split /[\W]/'
    aei
    a
    i

Both versions fail if I remove the -CS switch, and both pass if I
replace \W by \P{IsWord}. I guess that the difference between \W and
[\W] is a bug.

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=medium
---
Site configuration information for perl 5.10.0:

Configured by mochan at Thu Aug 20 22:55:52 CDT 2009.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.26-2-amd64, archname=x86_64-linux
    uname='linux em 2.6.26-2-amd64 #1 smp sun jul 26 20:35:48 utc 2009 x86_64 gnulinux '
    config_args='-de'
    hint=previous, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -I/usr/local/include -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    ccversion='', gccversion='4.3.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64
    libs=-lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.7.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.7'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib'

Locally applied patches:

---
@INC for perl 5.10.0:
    /usr/local/lib/perl5/5.10.0/x86_64-linux
    /usr/local/lib/perl5/5.10.0
    /usr/local/lib/perl5/site_perl/5.10.0/x86_64-linux
    /usr/local/lib/perl5/site_perl/5.10.0
    .

---
Environment for perl 5.10.0:
    HOME=/home/mochan
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/mochan/bin:/home/mochan/bin:/usr/local/bin:/usr/bin:/bin:/usr/games
    PERL_BADLANG (unset)
    SHELL=/bin/bash