control characters and Util::clean_text()

Benton, Kevin kevin.benton at amd.com
Wed Dec 21 17:34:30 UTC 2005


> -----Original Message-----
> From: developers-owner at bugzilla.org
> [mailto:developers-owner at bugzilla.org]
> On Behalf Of Tom Emerson
> Sent: Wednesday, December 21, 2005 9:09 AM
> To: developers at bugzilla.org
> Subject: Re: control characters and Util::clean_text()
> 
> Dennis Melentyev writes:
> > ASCII 127 is a *correct* Russian symbol in cp1251 (thanks to M$).
> > Also, what to do with UTF-8 input?
> 
> What about UTF-8? It won't matter. 0x7F is a control character in
> Unicode and is a valid UTF-8 single-byte value. Stripping it won't
> hurt anything.

Actually, it does.  If I put a character in and you strip it without
telling me, you've hurt my ability to get my "stuff" done.  We ran into
this with Perforce and Unicode files in encodings other than UTF-8,
which were getting corrupted on check-in.  Perforce corrupted the files
but didn't tell the user that it couldn't handle Unicode in any encoding
except UTF-8.  Users didn't know any better, so they went on with their
work, only to find out later, when they tried to use the files from a
different system, that their check-ins had been corrupted (not easy to
troubleshoot).

As a result, I think we ought to do one of two things: 1) treat
characters that are to be stripped as errors in the user's input, or 2)
warn users that their input was modified by removing unsupported
characters.  Acting as if nothing happened is (in my mind)
unacceptable.
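
To make the two options concrete, here is a rough sketch.  It is written
in Python rather than Perl for brevity, and it is not what
Util::clean_text() does today.  The function names, the exception class,
and the exact set of characters treated as "unsupported" (C0 controls
other than tab, newline and carriage return, plus 0x7F) are all
assumptions made for illustration.

```python
import re

# Assumption: "unsupported" means C0 control characters other than
# tab (0x09), newline (0x0A) and carriage return (0x0D), plus DEL (0x7F).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


class ControlCharacterError(ValueError):
    """Hypothetical error raised when input contains unsupported characters."""


def clean_text_strict(text):
    """Option 1: treat characters that would be stripped as an input error."""
    match = CONTROL_CHARS.search(text)
    if match:
        raise ControlCharacterError(
            "unsupported character U+%04X at offset %d"
            % (ord(match.group()), match.start())
        )
    return text


def clean_text_warn(text):
    """Option 2: strip the characters, but tell the caller what happened."""
    removed = CONTROL_CHARS.findall(text)
    cleaned = CONTROL_CHARS.sub("", text)
    warning = None
    if removed:
        warning = (
            "%d unsupported character(s) were removed from your input"
            % len(removed)
        )
    # The caller is expected to show the warning to the user.
    return cleaned, warning
```

Either way the user finds out at submission time: the strict version
rejects the input and says where the problem is, while the warning
version hands back a message that the caller can display next to the
modified text.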

---
Kevin Benton
Perl/Bugzilla Developer/Administrator, Perforce SCM Administrator
Personal Computing Systems Group
Advanced Micro Devices
 




