
aw at ice-sa
Feb 25, 2011, 12:48 PM
Post #7 of 7
Thanks to Michael, Michael, Lloyd and Cees; your answers and insights have made things clearer for me.
I think I'll use a combination of all of that for this new application we're writing.

In other words, to program "defensively", I propose to do this:

When sending the HTML page with the <form>:
- create the page and save it as UTF-8
- have the proper charset indications in it
- include a hidden test field with some known UTF-8 sequence (e.g. "ÄÖÜ")
- make sure that the application and the webserver send out the page with the proper Content-Type and charset (HTTP headers)

But since we still don't know what the browser (and the user) will actually do with this, upon reception of the POST:
- get the test field and check how it was received:
  a) check if it has the "is_utf8()" flag set (probably not)
  b) if not (a), check if it at least has the correct UTF-8 bytes in it (6 bytes, not 3)
  c) if neither (a) nor (b), reject with an error (we don't know what it is then)
  d) if not (a), but (b), then set a flag 'must_decode'
- get the other parameters, and
  - if the 'must_decode' flag is not set, leave them 'as is'
  - if the flag is set, Encode::decode('utf8',..) all received parameters, except for file uploads (*)

That's of course in the hope that, some day, browsers will send multipart data with the proper charset indication, and that CGI.pm will take it into account and do the right thing.

(*) although a question then is how a Polish browser would send the filename attribute, assuming it is originally something like "Qualitätsübersicht.pdf"
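
The receiving side of the plan above could look roughly like the following. This is only a minimal sketch under some assumptions: the names classify_probe, decode_params, $PROBE_CHARS and $PROBE_BYTES are made up for illustration, the parameters are passed in as plain hashes rather than fetched through CGI.pm, and it uses the strict 'UTF-8' encoding name with FB_CROAK (instead of the laxer 'utf8') so that malformed input dies rather than being silently repaired.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical probe value: "ÄÖÜ" as characters, and as its 6 UTF-8 bytes
# (C3 84 C3 96 C3 9C). A Latin-1 submission would be only 3 bytes (C4 D6 DC).
my $PROBE_CHARS = "\x{C4}\x{D6}\x{DC}";
my $PROBE_BYTES = Encode::encode('UTF-8', $PROBE_CHARS);

# Classify the hidden test field as received.
# Returns 'decoded'     (case a: already a character string),
#         'must_decode' (case d: raw UTF-8 bytes, not yet decoded),
#         'reject'      (case c: anything else).
sub classify_probe {
    my ($received) = @_;
    return 'reject' unless defined $received;
    return 'decoded'
        if utf8::is_utf8($received) && $received eq $PROBE_CHARS;
    return 'must_decode'
        if !utf8::is_utf8($received) && $received eq $PROBE_BYTES;
    return 'reject';
}

# Decode every parameter except those named in %$upload_fields.
# $params is a hashref of name => raw value; both arguments are assumptions
# about how the application hands the data over.
sub decode_params {
    my ($params, $upload_fields) = @_;
    my %out;
    for my $name (keys %$params) {
        my $value = $params->{$name};
        $out{$name} = $upload_fields->{$name}
            ? $value    # file upload content: leave the bytes alone
            : Encode::decode('UTF-8', $value,
                  Encode::FB_CROAK() | Encode::LEAVE_SRC());
    }
    return \%out;
}

# Example flow with made-up field names.
my %received = (
    probe   => $PROBE_BYTES,
    title   => Encode::encode('UTF-8', "Qualit\x{E4}ts\x{FC}bersicht"),
    payload => "\x00\x01\xFF",    # pretend file upload content
);

my $state = classify_probe($received{probe});
die "Cannot tell how the form data was encoded\n" if $state eq 'reject';

my $params = $state eq 'must_decode'
    ? decode_params(\%received, { payload => 1 })
    : \%received;
```

Comparing against both the character form and the byte form of the probe keeps the three cases (a), (b)/(d) and (c) distinct, and a 3-byte Latin-1 submission ends up in the reject branch. What to do with the upload's filename attribute (the footnote above) is left open here.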