FullText Searching: The Dilemma

Tue Jun 26 03:55:48 UTC 2007

Max,

I'm sure this may have been discussed before, however, I don't see any
reason why we can't split the bugs table into non-text fields stored
using InnoDB and a separate bugs_more table that holds text fields
stored in MyISAM.  This would have two benefits:  1) it minimizes risk
by splitting data across tables (versus leaving the entire table in
MyISAM), and 2) gains transactional integrity while retaining speed (the
bugs table would retain transactional status while the bugs_more table
would not).  Splitting the tables also gains a hidden benefit - it makes
it more likely that searches of non-bugs_more data will go much faster
because it's possible that records in the bugs table may be stored in a
fixed format even though we have variable-length records in the table.

Just a few thoughts... :-)

Kevin Benton
MySQL DBA #5739
Senior Software Developer
MSS Silicon Design Engineering
Advanced Micro Devices

The opinions stated in this communication do not necessarily reflect the
view of Advanced Micro Devices and have not been reviewed by management.
This communication may contain sensitive and/or confidential and/or
proprietary information.  Distribution of such information is strictly
prohibited without prior consent of Advanced Micro Devices.  This
communication is for the intended recipient(s) only.  If you have
received this communication in error, please notify the sender, then
destroy any remaining copies of this communication.

> -----Original Message-----
> From: developers-owner at bugzilla.org
[mailto:developers-owner at bugzilla.org]
> On Behalf Of Max Kanat-Alexander
> Sent: Monday, June 25, 2007 4:39 PM
> To: developers at bugzilla.org
> Subject: FullText Searching: The Dilemma
> 
> 	Full-text searching in Bugzilla is a bit of a problem.
> 
> 	In MySQL, InnoDB tables don't support FULLTEXT indexes. But
> MyISAM tables don't support transactions.
> 
> 	So, for 3.0 we removed bugs.short_desc from the FULLTEXT
> searching and now we just use a normal LIKE search. However, you can
> see that that makes large installations like bugzilla.mozilla.org very
> slow.
> 
> 	In addition, it's very difficult to write SQL (with the current
> way Search.pm works) that ranks a bug based on *all* of its comments
> combined, instead of just ranking it on each comment individually.
That
> is, if you search for "Java", a bug with a single comment that's just
> the word "Java" could actually be ranked higher than a bug with 10
> comments about Java. (To you SQL experts out there: I know it's
> possible, it's just not easy with the way Search.pm works.)
> 
> 	Also, fulltext engines are very different between databases,
> requiring us to re-implement fulltext separately for each one.
> 
> 	I've looked into using an external fulltext engine, like
> Lucene or something similar. The advantage is that fulltext would be
the
> same across every DB, and we could rank based on all comments
combined.
> 
> 	The problem would be that the fulltext search would no longer
> be combined with the SQL search. So we'd have to first do one, and
then
> the other. For example, we could first search all bugs for "Java" and
> then restrict them based on the other search criteria.
> 
> 	Does anybody have any ideas in this department, as to how we
> could do this whole full-text searching better? In a way that would
> perform well, be easy to develop, and work well with our current
search
> architecture?
> 
> 	-Max
> --
> http://www.everythingsolved.com/
> Competent, Friendly Bugzilla Services. And Everything Else, too.
> -
> To view or change your list settings, click here:
> <http://bugzilla.org/cgi-bin/mj_wwwusr?user=kevin.benton@amd.com>
>