control characters and Util::clean_text()
Frédéric Buclin
LpSolit at gmail.com
Fri Dec 23 11:04:57 UTC 2005
I have backed out bug 238780 (Edit versions should reject newline
characters) because I found a testcase which makes it to fail and also
because we have to take a clear decision on how to handle these control
characters.
Here are the four choices we have (I see no others for the moment):
1) force checksetup.pl to scan the whole DB to remove such characters
(except for fields which are in some given whitelist) and prevent such
characters for all future values for fields which are not in this whitelist;
2) only prevent *new* values to have such characters (either new values
or renamed values), but let old (existing) values untouched;
3) do nothing (except for bug summaries?) and live with it;
4) force Util::perform_substs() to remove these control characters when
preparing the templates to be passed to your MTA, except for fields
being in some given whitelist, but don't alter the DB itself.
1. is a problem because scanning the whole DB (even only once) would
probably take a huge time. Moreover, we could fall on the testcase I
describe in bug 238780 comment 23 (duplicated entries in the DB). The
reason to do these changes from checksetup.pl itself is that as long as
they are not made by someone with enough privs, e.g. the bug summary by
a user with editbugs privs, or a version value by a user with
editcomponents privs, the existing "corrupted" values would still be
used, and would still be sent as is per email. I reviewed a patch
yesterday which automatically removes control characters from bug
summaries when editing a bug, but if the user editing this bug is a user
with no privs, process_bug.cgi is complaining that this powerless user
tries to alter the bug summary (because control characters are removed)
and so the user cannot edit bugs having such characters anymore. And so
he would have to wait till a nice guy with editbugs privs edits the bug.
Argh!!
A workaround would be to call CheckCanChangeField() *before* removing
such characters so that it doesn't complain if that's really the only
change occuring with the bug summary. And this way the bug summary used
in the email header would be "clean" (because the new bug summary is
used, the old one appearing in the body of the email, which is safe).
But there are other fields involved when editing a bug, such as the
product name itself, the component name, the product version, the target
milestone, etc.. and such values can only be edited by someone with
editcomponents privs. But as long as no such users go to edit*.cgi to
change these values, the "corrupted" ones would still be used, again and
again, and would still appear in emails (but by default, they do not
appear in the email header, so that's not a security issue). That's why
I suggested to let checksetup.pl do this cleanup itself, to be sure to
have a clean and usable DB.
2) here, we keep existing values as is, but prevent to rename them to
values having control characters, and also prevent the user from
creating new values with such characters. But we would have mixed
values, some without and some with control characters in them. We could
as well do nothing... this is precisely my point 3). Moreover, even this
kind of approach can fail, see again my testcase in bug 238780 comment 23.
4) this last suggestion is to only filter values passed to emails, but
without altering values being in the DB. After all, Bugzilla lives
pretty well with such characters AFAIK. We would have to edit some email
templates too, because not all scripts call perform_substs() (the ones
calling BugMail::MessageToMTA() directly) and those have to be checked
too, and "FILTER html" used in templates doesn't prevent control
characters, so we would need a new "FILTER paranoid" or something like that.
As you can see, I have no good solution so far. Some are more invasive
than others, some are prone to regressions, etc... Have you other
suggestions? Or any preference among the ones I presented here?
What I would like, really, is a "Live" discussion on IRC at some given
date and time where we could discuss such problems. A kind of weekly (?)
"Bugzilla meeting", as someone suggested on IRC a few days ago.
LpSolit
More information about the developers
mailing list