Possibly moving to InnoDB

Sat Aug 5 11:55:16 UTC 2006

On Sat, 2006-08-05 at 13:07 +1000, Bradley Baetz wrote:
> Its *insanely* slow, although that may be because the on disk size is
> 2-3 times as large. You need to be careful tuning the innodb
> cache/buffer/etc sizes, too.

	Is that just compared to PostgreSQL? And is that on INSERTs or SELECTs,
and on what kind of load? Bugzilla's load is not exactly a typical
database load, since it's very few INSERTs and long-running complex
SELECTs with a lot of simultaneous simple SELECTs.

	I'd think that once we move to transactions, also, on something the
size of bmo anything would be faster than our current architecture,
which has to lock the whole table every time we want to do an INSERT.

	I'd be willing to take a very small performance degrade to get
referential integrity and transactions, but I wouldn't accept, say, a
50% degrade in Bugzilla's overall performance.

	I've also heard that the storage format improved in 5.0, and takes less
disk space.

	I know that LiveJournal and Wikipedia use InnoDB, and so do a lot of
other large-scale sites, and because of the reliability issues, they
mostly say that they'd choose InnoDB over MyISAM any day, except for
tables that need FULLTEXT indexing.

> Also, there are lots of deadlock problems (documented as 'features'),
> mostly to do with next key locking (which is also documented as a
> feature)
> [snip]

	Hrm. We don't have any multi-row INSERTs that a CGI would do, that I
can think of, all of our PKs are AUTO_INCREMENT integers, and we never
try to insert out-of-order items. DELETEs are very rare, also, in
Bugzilla, except in the logincookies table.

	Thankfully, we also never use SELECT FOR UPDATE or SELECT FOR INSERT,
which would also cause the problem, and we only rarely insert items out
of PK order.

	I might be concerned about this for the logincookies table; I'd have to
look into it. Perhaps I should write a multi-process Bugzilla
load-testing tool anyhow.

> They also have a really really annoying deadlock detection setup - if
> a query takes too long, its assumed to be due to a deadlock. So if you
> have any long running query (and buglist.cgi does!), a concurrent
> insert (which has to block if the select scans a row 'near' it, due to
> next-key locking) may assume its in a deadlock.

	Okay. In many cases, that would actually be fine for us, as long as the
timeout is long enough. MySQL would just be automatically killing the
long-running queries that we kill by hand in places like bmo.

> This is really painful in replication setups,

	Ah, I can imagine that could be difficult.

> so there are a couple of options for the
> replication thread to retry a few times (eww!) Which helps a fair bit,
> until you run a really really long query on the slave....

	There shouldn't be any really really long INSERT queries in Bugzilla,
though. And if the replicator has to kill a long-running buglist SELECT
statement in order to replicate an INSERT, that's fine by me.

> Also, be wary of mixing table types in queries, especially with locking.

	Right, I'll have to look into that more. In an ideal world, I'd like to
lock only the longdescs table and whatever table short_desc ends up in
(since they both need a FULLTEXT index, although perhaps short_desc
could live with a normal index now that it's a varchar), and let the DB
engine do row-level locks on the other tables. However, I suspect that
isn't possible...?

> On the flip side, I've yet to see corrupt tables with innodb. I've yet
> to see them with postgres either, though....

	Hahaha, yeah. Obviously none of this is a problem with PostgreSQL. :-)

	-Max
-- 
http://www.everythingsolved.com/
Everything Solved: Competent, Friendly Bugzilla and Linux Services