control characters and Util::clean_text()

Paulo Casanova paulo.casanova at link.pt
Wed Dec 21 17:25:24 UTC 2005


Hi,

Bytes 0x00 through 0x7F are single-byte characters in both ASCII and UTF-8
(they have the same meaning in both standards), and UTF-8 never uses those
byte values inside a multibyte character. As such, I see no problem with the
stripping sub.
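Here is a minimal Python sketch of that argument. Bugzilla itself is written in Perl, and `strip_controls`, `clean_bytes` and this `clean_text` are hypothetical stand-ins, not Bugzilla's code. Removing bytes 0x00-0x1F and 0x7F from raw UTF-8 gives the same result as removing those characters from the decoded string, because every byte of a multibyte character is 0x80 or above:

```python
import re

# Hypothetical stand-in for Util::clean_text(): matches ASCII 0-31 and 127.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def strip_controls(text: str) -> str:
    """Remove control characters from an already-decoded string."""
    return CONTROL_CHARS.sub("", text)


def clean_text(text: str) -> str:
    """Strip control characters, then trim, like the updated clean_text()."""
    return strip_controls(text).strip()


def clean_bytes(data: bytes) -> bytes:
    """Remove bytes 0x00-0x1F and 0x7F from raw, still-encoded data."""
    return bytes(b for b in data if b >= 0x20 and b != 0x7F)


sample = "Версия\t1.0\x7f"
raw = sample.encode("utf-8")

# Cyrillic letters encode to two bytes, both >= 0x80, so they survive intact.
assert all(b >= 0x80 for b in "Версия".encode("utf-8"))
print(strip_controls(sample))            # Версия1.0
print(clean_bytes(raw).decode("utf-8"))  # Версия1.0
print(clean_text(" \x07Bug summary\n"))  # Bug summary
```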

If I'm not mistaken :), when you enter a character in a browser form, it is
encoded as UTF-8 and sent to the web server, which decodes it; you will
receive a UTF-8 string in Perl, so the sub would work correctly there too.
That holds even if your pages are in cp1251 and you use Russian characters
in your Bugzilla products.
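A sketch of that round trip in Python, assuming the form body arrives percent-encoded and that the server is told which encoding the browser used. `short_desc` is only an example field name, and `clean_text` is a hypothetical stand-in for the Perl sub:

```python
import re
from urllib.parse import parse_qs

# Hypothetical stand-in for Util::clean_text(): drop ASCII 0-31 and 127, then trim.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def clean_text(text: str) -> str:
    return CONTROL_CHARS.sub("", text).strip()


# Example form body, assumed to be percent-encoded UTF-8.
# %D0%91%D0%B0%D0%B3 is "Баг", "+" is a space, %07 is BEL, %0A is a newline.
body = "short_desc=%D0%91%D0%B0%D0%B3+2.0%07%0A"
summary = parse_qs(body, encoding="utf-8")["short_desc"][0]
print(clean_text(summary))  # Баг 2.0

# The same summary percent-encoded as cp1251 (Б=0xC1, а=0xE0, г=0xE3).
cp1251_body = "short_desc=%C1%E0%E3+2.0%07%0A"
cp1251_summary = parse_qs(cp1251_body, encoding="cp1251")["short_desc"][0]
print(clean_text(cp1251_summary))  # Баг 2.0
```

The cp1251 case decodes to the same string only because the correct encoding is passed to `parse_qs`; with the wrong encoding the bytes would be misread before any control-character stripping happens.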

Or maybe I'm just wrong :)

Paulo 

-----Original Message-----
From: developers-owner at bugzilla.org [mailto:developers-owner at bugzilla.org]
On Behalf Of Dennis Melentyev
Sent: Wednesday, 21 December 2005 15:43
To: developers at bugzilla.org
Subject: Re: control characters and Util::clean_text()

ASCII 127 is a *valid* Russian symbol in cp1251 (thanks to M$).
Also, what should we do with UTF-8 input?

On Wed, 21/12/2005 at 14:09 +0100, Frédéric Buclin wrote:
> Hello!
> 
> Bug 238780 added a new method, Util::clean_text($str), whose goal is to 
> remove control characters from the string $str (ASCII 0 through 31 and 
> ASCII 127). The idea was to prevent newlines and such characters in 
> fields such as the product version (bug 238780), the target milestone 
> (bug 177773) and the bug summary (bug 101380), among others.
> 
> As far as I know, only comments should allow such characters (well, 
> apart from newlines (ASCII 10 and 13) and maybe horizontal tabs (ASCII 
> 9), I don't see why we should allow other control characters in 
> comments). This brings us to the following problem: if we want to 
> filter *all* fields using clean_text(), we would have to change a large 
> part of the code, replacing most trim() calls with clean_text() 
> (clean_text(), in its updated version, already returns the trimmed 
> string). This is clearly not something I'm going to do or approve (6 
> patches with such changes are in my review queue, including one for 
> the 2.16 branch!). So why not update trim() to remove such characters 
> automatically everywhere?
> This solution would be much less invasive.
> 
> If nobody objects to my suggestion, that's what I would like to see 
> implemented. I could even imagine trick_taint() doing this kind of 
> cleanup itself.
> 
> Comments?
> 
> LpSolit
> -
> To view or change your list settings, click here:
> <http://bugzilla.org/cgi-bin/mj_wwwusr?user=dennis.melentyev@infopulse.com.ua>
