Full-Text Searching: The Dilemma

Mon Jun 25 22:38:50 UTC 2007

	Full-text searching in Bugzilla is a bit of a problem.

	In MySQL, InnoDB tables don't support FULLTEXT indexes. But
MyISAM tables don't support transactions.

	So, for 3.0 we removed bugs.short_desc from the FULLTEXT
search, and now we just use a normal LIKE search on it. However, that
makes searches on large installations like bugzilla.mozilla.org very
slow.

	In addition, it's very difficult to write SQL (with the current
way Search.pm works) that ranks a bug based on *all* of its comments
combined, instead of just ranking it on each comment individually. That
is, if you search for "Java", a bug with a single comment that's just
the word "Java" could actually be ranked higher than a bug with 10
comments about Java. (To you SQL experts out there: I know it's
possible, it's just not easy with the way Search.pm works.)
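
	To make the ranking problem concrete, here is a minimal Python
sketch. It is not how Search.pm or any database scores things: the
comment_score function is a made-up toy relevance measure, and
rank_bugs is a hypothetical helper. It only shows how ranking a bug by
its best single comment differs from ranking it by all of its comments
combined.

```python
from collections import defaultdict

def comment_score(text, term):
    """Toy relevance (an assumption): fraction of words equal to the term."""
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)

def rank_bugs(comments, term, combine):
    """Rank bug IDs from (bug_id, text) pairs.

    combine=max ranks a bug by its best single comment;
    combine=sum ranks it by all of its comments together.
    """
    per_bug = defaultdict(list)
    for bug_id, text in comments:
        per_bug[bug_id].append(comment_score(text, term))
    totals = {bug_id: combine(scores) for bug_id, scores in per_bug.items()}
    return sorted(totals, key=totals.get, reverse=True)

# Bug 1 has one comment that is just "Java"; bug 2 has 10 comments about Java.
comments = [(1, "Java")] + [(2, "Java crashes on startup")] * 10

by_best_comment = rank_bugs(comments, "Java", max)  # bug 1 comes first
by_all_comments = rank_bugs(comments, "Java", sum)  # bug 2 comes first
```

	With per-comment ranking the one-word bug wins; once the scores
are combined per bug, the bug with 10 comments comes out on top.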

	Also, fulltext engines differ a lot between databases, so we
have to implement fulltext searching separately for each one.

	I've looked into using an external fulltext engine, like
Lucene or something similar. The advantage is that fulltext would be the
same across every DB, and we could rank based on all comments combined.

	The problem would be that the fulltext search would no longer
be combined with the SQL search. So we'd have to first do one, and then
the other. For example, we could first search all bugs for "Java" and
then restrict them based on the other search criteria.
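
	A rough Python sketch of that two-step search follows. Everything
in it is an assumption made for illustration: fulltext_search is a
stand-in for an external engine (it is not Lucene's API), the bugs
table is cut down to two columns, and SQLite is used only so the
example is self-contained.

```python
import sqlite3

def fulltext_search(index, term):
    """Stand-in for an external engine: IDs of bugs whose text has the term."""
    term = term.lower()
    return [bug_id for bug_id, text in index.items()
            if term in text.lower().split()]

def search(conn, index, term, status):
    # Step 1: ask the fulltext engine for candidate bug IDs.
    candidate_ids = fulltext_search(index, term)
    if not candidate_ids:
        return []
    # Step 2: restrict the candidates using the ordinary SQL criteria.
    placeholders = ",".join("?" * len(candidate_ids))
    rows = conn.execute(
        f"SELECT bug_id FROM bugs WHERE bug_id IN ({placeholders}) "
        "AND bug_status = ? ORDER BY bug_id",
        candidate_ids + [status],
    )
    return [row[0] for row in rows]

# Simplified, hypothetical data: three bugs and the text indexed for each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, bug_status TEXT)")
conn.executemany("INSERT INTO bugs VALUES (?, ?)",
                 [(1, "NEW"), (2, "RESOLVED"), (3, "NEW")])
index = {1: "Java applet crashes", 2: "Java plugin hangs",
         3: "Printing is broken"}

open_java_bugs = search(conn, index, "Java", "NEW")
```

	The cost is visible in step 2: every ID the fulltext engine
returns has to be passed into the SQL query, so a common word on a large
installation would mean a very long IN list.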

	Does anybody have any ideas on how we could do full-text
searching better, in a way that would perform well, be easy to develop,
and work well with our current search architecture?

	-Max