Do we want Bugzilla to be case-sensitive or

Thu Nov 3 21:10:25 UTC 2005

> -----Original Message-----
> From: developers-owner at bugzilla.org [mailto:developers-owner at bugzilla.org]
> On Behalf Of Max Kanat-Alexander
> Sent: Thursday, November 03, 2005 12:54 PM
> To: developers at bugzilla.org
> Subject: Re: Do we want Bugzilla to be case-sensitive or case-insensitive?
> 
> 	I think that perhaps the heart of the issue has been slightly missed
> in some of the discussion here.
> 
> 	When using the Search screen, Products, Components, Keywords,
> Summaries, *anything* that you type in the Boolean Charts -- all are
> *case sensitive* on PostgreSQL. In fact, any text that you type
> *anywhere* to match *anything* is case sensitive on PostgreSQL.
> 
> 	There is NO database-wide way around that.
> 
> 	The only way to fix it is to make EVERY SINGLE:
> 
> 	product = ?
> 
> 	into:
> 
> 	$dbh->sql_istrcmp('product', '?')
> 
> 	In ALL OF BUGZILLA. And not just for products, but for everything that
> we want to match, anywhere, ever.
> 
> 	At which point, on some DBs, we'd lose the value of our indexes, or on
> some others we'd have to start keeping track of a bunch of indexes on
> LOWER(product).
> 
> 	Since we're not going to do that, we made only the most critical thing
> (usernames) case-insensitive on PostgreSQL, and everything else is case
> sensitive.
> 
> 	If anybody comes up with a way to make an entire PostgreSQL database do
> case-insensitive comparisons, then I'm certainly open to it.

First, my disclaimer - I'm not a PostgreSQL expert so if I'm off-base,
please let me know (kindly) and then politely ignore the comments below.

For things like product and component names, in PostgreSQL, you could
decide to store each value in two forms, one already converted to lower
case and another in the original mixed case, and compare against the
lower-case copy.  Another way to sidestep much of the problem is to
reject inserts of records that differ from an existing record only by
case.  So, if someone inserts Xyz, and someone else then tries to insert
xyz or XYZ as a Product or Component, for example, you could reject the
insert given the proper settings.  That could probably be done fairly
easily through a trigger.  You could also do your comparisons through a
function that converts both field values and search strings to lower
case.
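
A rough sketch of the duplicate-rejection and lower-case-comparison
ideas, under some loud assumptions: it uses Python's built-in sqlite3
module as a stand-in rather than PostgreSQL, it uses a unique index on
LOWER(name) instead of a trigger to get the same rejection effect, and
the table, index, and function names are all made up for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical products table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT NOT NULL)")
    # Unique index on the lower-cased name: a second name that differs
    # only by case violates the index and is rejected.
    conn.execute(
        "CREATE UNIQUE INDEX products_lower_name ON products (LOWER(name))"
    )
    return conn


def add_product(conn, name):
    """Insert a product; return False if a case-variant already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO products (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


def find_product(conn, text):
    """Look up a product, lower-casing both the column and the search string."""
    row = conn.execute(
        "SELECT name FROM products WHERE LOWER(name) = LOWER(?)", (text,)
    ).fetchone()
    return row[0] if row else None
```

With this in place, adding Xyz succeeds, a later attempt to add XYZ is
refused, and a search for xyz still finds the stored mixed-case Xyz.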

From a more philosophical view...

As we incorporate new databases as back-ends for Bugzilla, we're going
to have to deal with the quirks of each database separately.  Since I'm
hearing you say that PostgreSQL doesn't support case-insensitive
searches globally, we need to find a way to address that quirk.  One
possible solution is to ask the PostgreSQL developers for that ability.
Since we haven't supported PostgreSQL until now, asking users to upgrade
to a very recent version seems fair to me.  Saying "just use DBI"
sounds appropriate, but it's not enough.  Not every database does
indexing the same way (as we know), and trying to deal with all
databases using one standard method seems impractical to me.  On the
other hand, creating a class to deal with all the databases, so that
developers who really don't care about the quirks of one over another
can work without caring, seems very wise.  Personally, I thought that
getting rid of DB.pm wasn't such a hot idea.  Why?  It moves us away
from abstracting interaction with the underlying back-end.  If we
utilize a module for DB interaction, it allows us to update just that
portion of the code when it comes time to deal with how to ask the
database for information and how to send it changes.  As we add support
for new back-ends like Oracle, MS-SQL, XML, or whatever else, it will be
much easier if we use a DB interaction class because the interaction
would be pre-defined.  Implementation would be left only in the DB.pm
module (or a module it imported).  In this model, only that module would
need to be updated in order to handle new databases (or new database
versions).
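
To make the idea concrete, here is a minimal sketch of such an
interaction class.  It is written in Python only for illustration; the
class names DB and Pg are hypothetical, and although the method name
sql_istrcmp is borrowed from the quoted message, the signature and
behaviour shown here are my assumptions, not Bugzilla's actual code.

```python
class DB:
    """Hypothetical base class: the one place that knows back-end quirks."""

    def sql_istrcmp(self, left, right, op="="):
        # Assumed default: the back-end already compares without regard
        # to case, so a plain comparison is enough.
        return f"{left} {op} {right}"


class Pg(DB):
    """Hypothetical subclass for a back-end with case-sensitive comparisons."""

    def sql_istrcmp(self, left, right, op="="):
        # Lower-case both sides so the comparison ignores case.
        return f"LOWER({left}) {op} LOWER({right})"


def product_query(db):
    """Calling code builds its SQL without knowing which back-end is in use."""
    return "SELECT id FROM products WHERE " + db.sql_istrcmp("name", "?")
```

Supporting another back-end would then mean adding one more subclass,
while callers such as product_query stay untouched.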

Do I think comparisons should be case insensitive?  From a process
management perspective, hands-down yes I think searches should be case
insensitive except in very rare circumstances.  If someone can type Xyz
and someone else can type XYZ in a search not knowing that they mean
something different, they're going to get confusing and different
results.  If Acme and acme are different, it seems to me that there's a
bigger underlying problem - people don't usually know that Acme != acme
!= ACME != aCME.  Having said that, I wouldn't want to take away the
ability for users to search in a method that's case sensitive or
insensitive.

This brings up a point about usability - while it's really cool to have
the ability to store things in case-sensitive formats, I think Microsoft
did do something right by deciding that filenames should be stored in
mixed case format, but will always match regardless of case.
Unfortunately for us, this is the mentality we have to deal with in a
large part of our user-bases.  Users simply don't expect that Acme is
different from acme.  No matter how we choose to tackle this, it seems
to me that we need the highest level of usability we can offer and in my
mind, that means we must be able to provide case-insensitive searches in
the Boolean search tables.  It also seems to me that cases where a
case-sensitive search is required over a case-insensitive one are very
rare.

Therefore, I think that we should be able to do all our searches without
regard for case first, then deal with the exceptions, possibly by giving
a user an option to do a case-sensitive search.
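
A small sketch of that default-plus-exception behaviour, again using
Python's sqlite3 module as a stand-in for a real back-end; the bugs
table, the search_summaries function, and its case_sensitive flag are
all hypothetical names chosen for the example.

```python
import sqlite3


def demo_db():
    """Build an in-memory table holding three case variants of one word."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bugs (summary TEXT)")
    conn.executemany(
        "INSERT INTO bugs VALUES (?)", [("Acme",), ("acme",), ("ACME",)]
    )
    return conn


def search_summaries(conn, text, case_sensitive=False):
    """Return summaries equal to text, ignoring case unless asked not to."""
    if case_sensitive:
        # The exception: an exact, case-sensitive match on request.
        sql = "SELECT summary FROM bugs WHERE summary = ? ORDER BY summary"
    else:
        # The default: lower-case both sides before comparing.
        sql = (
            "SELECT summary FROM bugs "
            "WHERE LOWER(summary) = LOWER(?) ORDER BY summary"
        )
    return [row[0] for row in conn.execute(sql, (text,))]
```

By default a search for acme returns all three variants; passing
case_sensitive=True narrows it to the exact spelling.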