control characters and Util::clean_text()

Tom Emerson tree at basistech.com
Wed Dec 21 16:08:48 UTC 2005


Dennis Melentyev writes:
> ASCII 127 is a *correct* Russian symbol in cp1251 (thanks to M$).
> Also, what to do with UTF-8 input?

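What about UTF-8? It won't matter. 0x7F (DEL) is a control character in
Unicode and is a valid single-byte UTF-8 sequence. Every byte of a
multi-byte UTF-8 sequence has its high bit set, so 0x7F can never occur
inside one; stripping it won't corrupt anything.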
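To make the UTF-8 argument concrete, here is a minimal Python sketch. The
helpers `strip_del` and `multibyte_bytes_have_high_bit` are hypothetical
illustrations, not the project's actual Util::clean_text(). The sketch
removes every 0x7F byte from UTF-8 input and spot-checks that multi-byte
characters never contain a byte below 0x80, so the cleaned bytes still
decode as valid UTF-8.

```python
# Hypothetical helper for illustration; NOT the real Util::clean_text().
def strip_del(data: bytes) -> bytes:
    """Remove every 0x7F (DEL) byte from a UTF-8 byte string."""
    return data.replace(b"\x7f", b"")


def multibyte_bytes_have_high_bit(text: str) -> bool:
    """Spot check: every byte of each multi-byte UTF-8 character is >= 0x80."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


sample = "Привет\x7f, мир \u20ac\x7f"   # Cyrillic, the euro sign, and two DELs
raw = sample.encode("utf-8")
cleaned = strip_del(raw)

# The cleaned bytes still decode; only the DEL characters are gone.
print(cleaned.decode("utf-8"))
print(multibyte_bytes_have_high_bit(sample))
```

Because a byte-level replace only ever removes whole single-byte characters,
the multi-byte Cyrillic and euro-sign sequences pass through untouched.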

    -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Software Architect                                 http://www.basistech.com
 "You can't fake quality any more than you can fake a good meal." (W.S.B.)