control characters and Util::clean_text()

Dennis Melentyev dennis.melentyev at infopulse.com.ua
Wed Dec 21 15:42:40 UTC 2005


ASCII 127 is a *correct* Russian symbol in cp1251 (thanks to M$).
Also, what should we do with UTF-8 input?
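
On the UTF-8 question, one fact is relevant: UTF-8 encodes every non-ASCII character using only bytes in the range 0x80-0xFF, so a byte in 0-31 or 127 inside UTF-8 data is always the ASCII control character itself and never part of a longer sequence. A small check of that, written in Python purely for illustration (the helper names and the sample summary are made up, not Bugzilla code):

```python
# Byte values 0-31 and 127, the range discussed for clean_text().
CONTROL_BYTES = set(range(0x00, 0x20)) | {0x7F}


def has_control_bytes(data: bytes) -> bool:
    """Return True if any byte of `data` is an ASCII control byte."""
    return any(b in CONTROL_BYTES for b in data)


def strip_control_bytes(data: bytes) -> bytes:
    """Drop ASCII control bytes, leaving every other byte untouched."""
    return bytes(b for b in data if b not in CONTROL_BYTES)


# Hypothetical Cyrillic bug summary, encoded as UTF-8.
summary = "Ошибка при входе"
encoded = summary.encode("utf-8")

# Same summary with a stray newline and a DEL character added.
dirty = "Ошибка\n при входе\x7f".encode("utf-8")
cleaned = strip_control_bytes(dirty).decode("utf-8")
```

Under these assumptions, byte-level stripping leaves valid UTF-8 text decodable; it says nothing about other encodings, which would need to be checked separately.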

On Wed, 21/12/2005 at 14:09 +0100, Frédéric Buclin wrote:
> Hello!
> 
> bug 238780 added a new method Util::clean_text($str) whose goal is to 
> remove control characters from the string $str (ASCII 0 through 31 and 
> ASCII 127). The idea was to prevent newlines and such characters in 
> fields such as the product version (bug 238780), the target milestone 
> (bug 177773) and the bug summary (bug 101380), among others.
> 
> As far as I know, only comments should allow such characters (well, 
> apart from newlines (ASCII 10 and 13) and maybe horizontal tabs (ASCII 
> 9), I don't see why we should allow other control characters in 
> comments). This brings us to the following problem: if we want to filter 
> *all* fields using clean_text(), we would have to change a large part of 
> the code, replacing most trim() calls with clean_text() (clean_text(), in
> its updated version, already returns the trimmed string). This is clearly
> not something I'm going to do or approve (6 patches about such changes are
> in my review queue, including one for the 2.16 branch!). So why
> not update trim() to automatically remove such characters everywhere?
> This solution would be much less invasive.
> 
> If nobody objects to my suggestion, that's what I would like to
> see implemented. I could even imagine trick_taint() doing this kind of
> cleanup itself.
> 
> Comments?
> 
> LpSolit
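
For reference, the behaviour described in the quoted message can be sketched as below. This is a Python illustration, not Bugzilla's Perl code: clean_text() here is written only from the description above (remove ASCII 0 through 31 and ASCII 127, then trim), and clean_comment() is a hypothetical name for the comment case, which keeps tabs (ASCII 9) and newlines (ASCII 10 and 13).

```python
import re

# ASCII 0-31 and 127: everything clean_text() is described as removing.
SINGLE_LINE_CONTROLS = re.compile(r"[\x00-\x1f\x7f]")

# Same range minus tab (9), LF (10) and CR (13), for comment text.
COMMENT_CONTROLS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def clean_text(text: str) -> str:
    """Remove all ASCII control characters, then trim whitespace."""
    return SINGLE_LINE_CONTROLS.sub("", text).strip()


def clean_comment(text: str) -> str:
    """Hypothetical variant for comments: keep tab, LF and CR."""
    return COMMENT_CONTROLS.sub("", text).strip()


version = clean_text("  2.16\r\n")
milestone = clean_text("mile\x00stone\x7f")
comment = clean_comment("line one\r\n\tline two\x07")
```

Folding the first function's filtering into trim(), as proposed, would amount to every trim() caller getting the clean_text() behaviour, so any field that needs to keep newlines would have to use a separate routine along the lines of the second function.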