FullText Searching: The Dilemma

Tue Jun 26 22:17:33 UTC 2007

On 27/06/07, Gervase Markham <gerv at mozilla.org> wrote:
> Bradley Baetz wrote:
> > On 26/06/07, Gervase Markham <gerv at mozilla.org> wrote:
> >> Are either of these facts going to change in the near future?
> >
> > I believe not. Theres the new storage engine (whose name I forget),
> > but I wouldn't count on that for production code for a while.
>
> I ask only because MySQL/InnoDB engineers were mumbling in emails about
> full text searching for InnoDB as far back as early '06.

Yeah.

> > Yes, but you get into ickyness when you try to use transactions over
> > them. It means that bugzilla could never do BEGIN; try something;
> > COMMIT and rely on <something> failing doing a rollback, for example.
>
> I can't quite parse that...

Its not ACID, so:

eval {
  $dbh->begin;
  UPDATE
  $dbh->commit;
};
if ($@) {
  ThrowCodeError(....)
}

may leave the secondary table modified but not the first. Or a power
failure in the middle may do the same. Theres no ACID for myiasm.

> >> Can we support transactions on a MyISAM table by having a shadow InnoDB
> >> table with 1:1 row correspondence but no content, whose rows we lock
> >> before accessing the real table and unlock afterwards?
> >
> > No, because you can't do row-level locking.
>
> I don't quite follow. Where am I going wrong?
>
> - Lock row 23 of InnoDB table
> - Edit row 23 of MyISAM table
> - Unlock row 23 of InnoDB table
>
> InnoDB supports row level locking...

Firstly, you can't explicitly lock a row.

Secondly, changes made to the myisam table will show up immediately,
but the innodb ones won't.

> > I've implemented something
> > similar in the past with GET_LOCK - you GET_LOCK("bz_bugX"). But you
> > then have to have everything grab that lock, and buglist.cgi would
> > have to lock the whole table since it doesn't know what rows to do in
> > advance.
>
> Why does buglist.cgi have to lock anything? All it does is display data.
> You might get some bugs being after one update, and other bugs being
> before it, but that's not the world's scariest thing.

> > Or, of course, just recommend that large installs use postgres+tsearch? ;)
>
> We could do that. How hard would it be to convert b.m.o.? Are there
> scripts, or do all conversions have to be done by hand? Do we know how
> MoCo IT would feel about supporting Postgres? Do we feel the Postgres
> support is stable enough? Are there other downsides?

The biggest issues are postgres upgrades (dump+restore). Theres also
the issue that if you use slony for replication, schema changes have
to happen via the slony tools, which may prove interesting for
checksetup. Probably not too hard now that the sql-level stuff has
been abstracted out, but would need a fair bit of tweaking.

OTOH, maybe bugzilla doesn't need replication for performance due to
MVCC, and can use DRBD for redundancy?

Bradley