control characters and Util::clean_text()

David Miller justdave at bugzilla.org
Wed Dec 21 17:45:52 UTC 2005


Paulo Casanova wrote on 12/21/05 12:25 PM:

> If I'm not mistaken :) when you enter a character in a browser form, it will
> be encoded into UTF-8 and then sent to the web server, which will decode it;
> you will receive a UTF-8 string in Perl, so the sub would work correctly too,
> even if you have your pages in cp1251 and use Russian characters in your
> Bugzilla products.
> 
> Or maybe I'm just wrong :)

You are.  The browser will return it in whatever charset was specified
by the web server feeding it the form, or in the charset specified in
the accept-charset attribute of the <form> element, defaulting to
ISO-8859-1 if no charset was specified.  If you include a character
that's not representable in the selected charset, the browser will
entity-encode that character as a numeric character reference of the
form &#<Unicode code point in decimal>; (for example, &#8364; for the
euro sign).
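
The behavior described above can be approximated in a few lines. This
is only a rough sketch in Python, not Bugzilla code: the helper name
browser_form_encode is hypothetical, and a real browser would also
percent-encode the resulting bytes before sending them. It relies on
Python's built-in "xmlcharrefreplace" error handler, which replaces
unencodable characters with decimal numeric character references.

```python
def browser_form_encode(text: str, charset: str = "iso-8859-1") -> bytes:
    """Hypothetical helper: approximate how a browser serializes form text.

    Characters that exist in `charset` are encoded directly; anything else
    is replaced with a decimal numeric character reference (&#NNNN;).
    The ISO-8859-1 default mirrors the fallback described in this message.
    """
    return text.encode(charset, errors="xmlcharrefreplace")


# Page served as cp1251: Cyrillic is representable, so raw cp1251 bytes
# are sent. These are NOT valid UTF-8.
print(browser_form_encode("Привет", "cp1251"))

# No charset specified: ISO-8859-1 cannot represent Cyrillic, so every
# character is entity-encoded.
print(browser_form_encode("Привет"))

# Mixed case: "é" fits in ISO-8859-1, the euro sign does not.
print(browser_form_encode("café €"))
```

In none of these cases does the server receive UTF-8, so code that
assumes UTF-8 input would misread the bytes unless the page itself was
served as UTF-8.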

-- 
Dave Miller                                   http://www.justdave.net/
System Administrator, Mozilla Corporation      http://www.mozilla.com/
Project Leader, Bugzilla Bug Tracking System  http://www.bugzilla.org/


