control characters and Util::clean_text()

Frédéric Buclin LpSolit at gmail.com
Fri Dec 23 11:04:57 UTC 2005


I have backed out bug 238780 (Edit versions should reject newline
characters) because I found a testcase which makes it to fail and also 
because we have to take a clear decision on how to handle these control
characters.

Here are the four choices we have (I see no others for the moment):

1) force checksetup.pl to scan the whole DB to remove such characters
(except for fields which are in some given whitelist) and prevent such
characters for all future values for fields which are not in this whitelist;
2) only prevent *new* values to have such characters (either new values
or renamed values), but let old (existing) values untouched;
3) do nothing (except for bug summaries?) and live with it;
4) force Util::perform_substs() to remove these control characters when
preparing the templates to be passed to your MTA, except for fields
being in some given whitelist, but don't alter the DB itself.


1. is a problem because scanning the whole DB (even only once) would 
probably take a huge time. Moreover, we could fall on the testcase I 
describe in bug 238780 comment 23 (duplicated entries in the DB). The 
reason to do these changes from checksetup.pl itself is that as long as 
they are not made by someone with enough privs, e.g. the bug summary by 
a user with editbugs privs, or a version value by a user with 
editcomponents privs, the existing "corrupted" values would still be 
used, and would still be sent as is per email. I reviewed a patch 
yesterday which automatically removes control characters from bug 
summaries when editing a bug, but if the user editing this bug is a user 
with no privs, process_bug.cgi is complaining that this powerless user 
tries to alter the bug summary (because control characters are removed) 
and so the user cannot edit bugs having such characters anymore. And so 
he would have to wait till a nice guy with editbugs privs edits the bug.
Argh!!

A workaround would be to call CheckCanChangeField() *before* removing
such characters so that it doesn't complain if that's really the only
change occuring with the bug summary. And this way the bug summary used
in the email header would be "clean" (because the new bug summary is 
used, the old one appearing in the body of the email, which is safe). 
But there are other fields involved when editing a bug, such as the 
product name itself, the component name, the product version, the target 
milestone, etc.. and such values can only be edited by someone with 
editcomponents privs. But as long as no such users go to edit*.cgi to 
change these values, the "corrupted" ones would still be used, again and 
again, and would still appear in emails (but by default, they do not 
appear in the email header, so that's not a security issue). That's why 
I suggested to let checksetup.pl do this cleanup itself, to be sure to 
have a clean and usable DB.


2) here, we keep existing values as is, but prevent to rename them to
values having control characters, and also prevent the user from 
creating new values with such characters. But we would have mixed 
values, some without and some with control characters in them. We could 
as well do nothing... this is precisely my point 3). Moreover, even this 
kind of approach can fail, see again my testcase in bug 238780 comment 23.


4) this last suggestion is to only filter values passed to emails, but
without altering values being in the DB. After all, Bugzilla lives
pretty well with such characters AFAIK. We would have to edit some email 
templates too, because not all scripts call perform_substs() (the ones 
calling BugMail::MessageToMTA() directly) and those have to be checked 
too, and "FILTER html" used in templates doesn't prevent control 
characters, so we would need a new "FILTER paranoid" or something like that.


As you can see, I have no good solution so far. Some are more invasive 
than others, some are prone to regressions, etc... Have you other 
suggestions? Or any preference among the ones I presented here?

What I would like, really, is a "Live" discussion on IRC at some given 
date and time where we could discuss such problems. A kind of weekly (?) 
"Bugzilla meeting", as someone suggested on IRC a few days ago.


LpSolit



More information about the developers mailing list