FullText Searching: The Dilemma

Tue Jun 26 13:57:18 UTC 2007

On 26/06/07, Gervase Markham <gerv at mozilla.org> wrote:
> Max Kanat-Alexander wrote:
> >       In MySQL, InnoDB tables don't support FULLTEXT indexes. But
> > MyISAM tables don't support transactions.
>
> Are either of these facts going to change in the near future?

I believe not. Theres the new storage engine (whose name I forget),
but I wouldn't count on that for production code for a while.

> Just to be clear: different tables in the same database can use
> different storage engines?

Yes, but you get into ickyness when you try to use transactions over
them. It means that bugzilla could never do BEGIN; try something;
COMMIT and rely on <something> failing doing a rollback, for example.

> Can we support transactions on a MyISAM table by having a shadow InnoDB
> table with 1:1 row correspondence but no content, whose rows we lock
> before accessing the real table and unlock afterwards?

No, because you can't do row-level locking. I've implemented something
similar in the past with GET_LOCK - you GET_LOCK("bz_bugX"). But you
then have to have everything grab that lock, and buglist.cgi would
have to lock the whole table since it doesn't know what rows to do in
advance.

> Another random thought: would it be possible to mirror the full text
> fields into a second MyISAM "cache" table? Again, this avoids the
> restructuring needed for a split. But perhaps it wouldn't make it any
> easier to write the search code.

You can possibly do something with replication. Since everything is
statement level replication (5.1 may be different), you could
replicate the single table into another db where the table type was
different. But since you can only have one replication master in
mysql, you'd actually probably have to do it in a different mysql
instance. Its possible, but....

> >       I've looked into using an external fulltext engine, like
> > Lucene or something similar. The advantage is that fulltext would be the
> > same across every DB, and we could rank based on all comments combined.
>
> It's Yet Another Dependency, of course. And it doesn't seem very good
> architecturally, for the reasons you mention.

The 'best' option would have to be something like the old shadowdb
code did - insert changes into a separate (innodb) table, then have a
separate script put the data into a side myisam table, with format
(bug_id, key, value) using REPLACE. Remember, its not just long_desc,
its the summary field too, at least (isn't it?)

That comes will all the problems that the shadowdb had, too, of course.

Anyone know how useful mysql's trigger support is for doing that
stuff? Especially without having its next key locking get in the way?

Or, of course, just recommend that large installs use postgres+tsearch? ;)

Bradley