Possibly moving to InnoDB

Bradley Baetz bbaetz at acm.org
Sat Aug 5 12:29:41 UTC 2006

On 05/08/06, Max Kanat-Alexander <mkanat at bugzilla.org> wrote:
> On Sat, 2006-08-05 at 13:07 +1000, Bradley Baetz wrote:
> > It's *insanely* slow, although that may be because the on-disk size is
> > 2-3 times as large. You need to be careful tuning the InnoDB
> > cache/buffer/etc. sizes, too.

> > They also have a really really annoying deadlock detection setup - if
> > a query takes too long, it's assumed to be due to a deadlock. So if you
> > have any long-running query (and buglist.cgi does!), a concurrent
> > insert (which has to block if the select scans a row 'near' it, due to
> > next-key locking) may assume it's in a deadlock.
>         Okay. In many cases, that would actually be fine for us, as long as the
> timeout is long enough. MySQL would just be automatically killing the
> long-running queries that we kill by hand in places like bmo.

It can be tweaked; I forget what the default is. But it doesn't kill
the *running* query - it kills the blocking one. (All together now - ...)
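For reference, the knob in question is InnoDB's lock wait timeout, which defaults to 50 seconds; a my.cnf sketch (the value here is purely illustrative, tune for your workload):

```
# my.cnf - illustrative value, not a recommendation
[mysqld]
# How long a transaction waits on a row lock before giving up
# with "Lock wait timeout exceeded" (default: 50 seconds)
innodb_lock_wait_timeout = 120
```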

> > so there are a couple of options for the
> > replication thread to retry a few times (eww!) Which helps a fair bit,
> > until you run a really really long query on the slave....
>         There shouldn't be any really really long INSERT queries in Bugzilla,
> though. And if the replicator has to kill a long-running buglist SELECT
> statement in order to replicate an INSERT, that's fine by me.

Nope. Specifically, the case I see is a long-running reconciliation
script that basically does:

SELECT * FROM tab_a, tab_b WHERE tab_a.id=tab_b.a_id AND tab_a.foo = 2;
SELECT * FROM tab_a, tab_c WHERE tab_a.id=tab_c.a_id AND tab_a.foo = 2;

(Yes, it does make sense to do this)

in a transaction. This takes up to 15-20 minutes to run, pulling a few
hundred thousand rows each time.

At the same time, there's a replication statement doing:

UPDATE tab_a SET bar=1 WHERE id = 12345;

That blocks for the 15-20 minutes, then eventually dies, stopping
replication. SLAVE START is required to fix it.

The reason is that the various selects have grabbed a share lock on
the row, so the update can't run. That's fine (well, it's not really,
but...), except for the timeout.
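A minimal way to see the interaction from two mysql sessions, assuming the long selects really do take share locks (plain InnoDB SELECTs are normally non-locking MVCC reads, so this sketch forces a locking read; table and column names are the hypothetical ones from above):

```sql
-- Session 1: long-running transaction holding share locks
START TRANSACTION;
SELECT * FROM tab_a WHERE foo = 2 LOCK IN SHARE MODE;
-- ...still crunching rows for the next 15-20 minutes...

-- Session 2 (e.g. the replication SQL thread): blocks on the
-- share lock, and after innodb_lock_wait_timeout seconds fails:
--   ERROR 1205 (HY000): Lock wait timeout exceeded
UPDATE tab_a SET bar = 1 WHERE id = 12345;
```

On a slave, that error 1205 is what stops the SQL thread dead.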

See http://bugs.mysql.com/bug.php?id=8325:

  Fixed in 4.1.11 and 5.0.3, by introducing the SLAVE_TRANSACTION_RETRIES option
  (will be documented in the next days); the slave shall be started with
  --slave-transaction-retries=x where x>0 and then it will retry a failed
  transaction for x times before giving up.
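In my.cnf terms, the fix from that bug report looks something like this (the retry count is illustrative):

```
# my.cnf on the slave - illustrative value
[mysqld]
# Retry a replicated transaction this many times after a
# deadlock or lock wait timeout before stopping the SQL thread
slave-transaction-retries = 10
```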

Of course, setting it too high may just hide real problems.....

> > Also, be wary of mixing table types in queries, especially with locking.
>         Right, I'll have to look into that more. In an ideal world, I'd like to
> lock only the longdescs table and whatever table short_desc ends up in
> (since they both need a FULLTEXT index, although perhaps short_desc
> could live with a normal index now that it's a varchar), and let the DB
> engine do row-level locks on the other tables. However, I suspect that
> isn't possible...?

I think it's meant to be, but it may interact with manual LOCKs a bit
oddly - since you can't LOCK TABLE more than once per transaction, I'm
not sure if the 'implicit' lock works. It didn't used to work, but
that was a few years ago, and I may be misremembering it.

>         Hahaha, yeah. Obviously none of this is a problem with PostgreSQL. :-)

Oh, postgres has its own issues. They just fix them rather than
calling them features....


More information about the developers mailing list