
brion at pobox
Feb 20, 2002, 3:37 AM
New case conversion functions
I've noticed that the traditional locale-based case conversion functions (ucfirst(), strtolower(), etc.) aren't too reliable for anything but English. Even when they do work, the result is very dependent on the system configuration, and thus isn't really transparently portable.

So, I've added new case conversion functions ucfirstIntl(), strtoupperIntl(), and strtolowerIntl(), which can more or less properly convert case in a system-independent manner.

For single-byte character encodings this is very simple, based on the PHP strtr() function: just define the strings $wikiUpperChars, containing all the uppercase characters, and $wikiLowerChars, containing all the lowercase characters in the corresponding order. (See the example for iso-8859-1 in wikiTextEn.php.)

For multibyte character sets it's a little more complex, using the same function in its array mode, which maps byte sequences to byte sequences. Most multibyte character sets are for Asian languages which don't have a case distinction, so it's not likely to come up often except for those using UTF-8. I've included conversion arrays for UTF-8 in utf8Case.php which should cover just about everything, so any future 'pedias that use UTF-8 need only include that file (as wikiTextEo.php does).

Also, it should be possible to extend ucfirstIntl() a bit to allow for multiple-character first letter sequences (for instance treating ij->IJ as one letter, which I believe is the officially correct behavior for Dutch).

-- brion vibber (brion @ pobox.com)
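For anyone who wants to see the idea in miniature, here is a rough sketch of both strtr() modes and of the digraph-aware first-letter conversion. It is not the code from wikiTextEn.php or utf8Case.php: the function names (upperSingleByteSketch, upperUtf8Sketch, ucfirstUtf8Sketch), the tiny tables, and the $digraphs parameter are all made up for illustration, and the real tables are much larger.

```php
<?php
// Illustrative tables only; the real ones live in wikiTextEn.php / utf8Case.php.
// ISO-8859-1: position N in one string corresponds to position N in the other.
$upperChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ\xC0\xC9\xD6"; // A-Z plus A-grave, E-acute, O-umlaut
$lowerChars = "abcdefghijklmnopqrstuvwxyz\xE0\xE9\xF6"; // a-z plus the lowercase forms

// Single-byte mode: strtr() swaps each byte for the byte at the same offset.
function upperSingleByteSketch($str, $lowerChars, $upperChars) {
    return strtr($str, $lowerChars, $upperChars);
}

// UTF-8: build a (hypothetical, very incomplete) lowercase => uppercase map.
$utf8LowerToUpper = array();
foreach (range('a', 'z') as $c) {
    $utf8LowerToUpper[$c] = chr(ord($c) - 32);   // ASCII letters
}
$utf8LowerToUpper["\xC3\xA9"] = "\xC3\x89";      // e-acute
$utf8LowerToUpper["\xC4\x89"] = "\xC4\x88";      // c-circumflex (Esperanto)
$utf8LowerToUpper["\xC5\x9D"] = "\xC5\x9C";      // s-circumflex (Esperanto)

// Array mode: strtr() replaces whole byte sequences, longest match first.
function upperUtf8Sketch($str, $lowerToUpper) {
    return strtr($str, $lowerToUpper);
}

// First-letter conversion that can treat sequences such as "ij" as one letter.
function ucfirstUtf8Sketch($str, $lowerToUpper, $digraphs = array()) {
    if ($str === '') {
        return $str;
    }
    // Multi-character first letters take priority.
    foreach ($digraphs as $lower => $upper) {
        if (strncmp($str, $lower, strlen($lower)) === 0) {
            return $upper . substr($str, strlen($lower));
        }
    }
    // Otherwise work out how many bytes the first UTF-8 character occupies.
    $b = ord($str[0]);
    $len = ($b < 0x80) ? 1 : (($b < 0xE0) ? 2 : (($b < 0xF0) ? 3 : 4));
    return strtr(substr($str, 0, $len), $lowerToUpper) . substr($str, $len);
}
```

A call like ucfirstUtf8Sketch("ijsselmeer", $utf8LowerToUpper, array("ij" => "IJ")) would capitalize both letters of the digraph, while leaving the digraph list empty falls back to the ordinary one-character behavior.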