control characters and Util::clean_text()

Dennis Melentyev dennis.melentyev at infopulse.com.ua
Thu Dec 22 11:14:16 UTC 2005


Wed, 21/12/2005 at 17:35 +0100, Emmanuel Seyman wrote:
> Dennis Melentyev wrote:
> >
> > ASCII 127 is a *correct* Russian symbol in cp1251 (thanks to M$).
> > Also, what to do with UTF-8 input?
> 
> 127 should be DELETE, no matter what charset you are using (since it's
> part of the ASCII charset). This is a non-printable character so it should
>  be trimmed.
Please pardon me. I misread it as 0xFF, not 0x7F. The latter has no
meaningful assignment in cp1251.
I was just reacting to a really frequent problem: 0xFF, which is the small
Cyrillic letter "YA" in cp1251, gets treated as a whitespace character.
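
To make the distinction between the two bytes concrete, here is a minimal Python sketch. It is not Bugzilla's Util::clean_text() (which is Perl); the helper name strip_controls and its default charset are assumptions made up for illustration. It only shows that decoding the bytes as cp1251 first, and filtering on Unicode control characters afterwards, drops 0x7F while keeping 0xFF.

```python
import unicodedata

# Decode the two bytes discussed above using Python's cp1251 codec.
del_char = bytes([0x7F]).decode("cp1251")   # U+007F, DELETE
ya_char = bytes([0xFF]).decode("cp1251")    # U+044F, CYRILLIC SMALL LETTER YA


def strip_controls(raw: bytes, charset: str = "cp1251") -> str:
    """Hypothetical helper: decode first, then drop control characters.

    Filtering happens on decoded characters (Unicode category "Cc"),
    so a byte such as 0xFF is judged by the letter it encodes,
    not by its raw byte value.
    """
    text = raw.decode(charset)
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


if __name__ == "__main__":
    print(repr(del_char), unicodedata.category(del_char))
    print(ya_char, unicodedata.name(ya_char))
    print(strip_controls(b"\xff\x7f\xff"))
```

Note that this sketch removes every "Cc" character, including tab and newline; a real cleanup routine would probably want to replace them with a space or keep some of them.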




