Custom fields schema
Myk Melez
myk at mozilla.org
Fri Jan 21 19:09:05 UTC 2005
Sean McAfee wrote:
>Do you have a specific proposal that can be tested? And, er, I don't want
>to sound rude, but would you mind testing it yourself? I've already done
>quite a lot in that area.
>
>
Ok, I modified construct-tables.pl to add fulltext indexes to the text
columns in the tests, create "real-distributed" columns (real columns,
but in a separate table rather than the bugs table, which is how a "real
columns" custom fields implementation would handle sparse columns), and
make distributed and real-distributed versions of the status whiteboard
field.
I then wrote a script to automate testing of the queries in your earlier
email. It runs each query six times, discards the first result (which
is often inaccurate because it includes costs unrelated to the query
itself), averages the rest, and reports the results in a table similar
to the one you included in your earlier email along with each test's
details (the query, its query optimizer "explanation", and individual
run times).
I didn't modify your test queries except to fix some typos that caused
several of them not to run. I added additional tests, including
versions of yours that use fulltext indexes, a couple that join tables
the way Bugzilla does in its queries (labeled "bzlike" in the tests),
and one that runs an actual Bugzilla query (a search for "meta" in the
status whiteboard of all bugs) along with distributed and
real-distributed versions of it against a recent copy of the b.m.o database.
I ran the tests on two machines, one called holepunch with two 733Mhz
x86 processors and one called megalon with two hyperthreaded Intel Xeon
2.xGhz processors. holepunch did nothing but run the tests, but megalon
did other things at the same time, so its results may be subject to more
variance.
I've attached the modified construct-tables.pl script, the
test-performance.pl script that runs the tests, and both machines'
results, which show the real and real-distributed queries outperforming
the distributed queries by significant margins in every case, regardless
of indexing, but up to 10x faster when using fulltext indexes.
I think these results are pretty accurate. They validate database
theory (as bbaetz said of the distributed model, "it can usually never
be faster") and the consequencies of MySQL's "one index per table"
restriction. I'm sure there are still optimizations that could be done,
and the queries themselves could be made much more indicative of real
Bugzilla queries along the lines bbaetz suggested in his email.
It would also be useful to run these against a PostGreSQL database and
on additional machines. But I think the result would be the same: real
columns (either within the bugs table or in their own table if sparse)
would outperform distributed ones.
-myk
Notes:
* Make sure you're running at least MySQL 4.0 if you try these tests
yourself. Version 3.23 creates fulltext indexes too slowly on
datasets of the size generated by construct-tables.pl.
* Found records will vary between non-indexed and indexed queries
because the fulltext searches look for words starting with "meta"
instead of the "meta" substring. This limitation of fulltext
searching is well worth the advantages in query speed and
relevance ranking--most people most of the time will want to
fulltext search text.
* I made a single change to the construct-tables.pl script since
running it on the machines, removing generation of fulltext
indexes for the real distributed tables other than cf_aa (since
that one is actually used in a test, while the others aren't).
This shouldn't have any effect on test results.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.bugzilla.org/pipermail/developers/attachments/20050121/59d411c6/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: construct-tables.pl
Type: application/x-perl
Size: 5412 bytes
Desc: not available
URL: <http://lists.bugzilla.org/pipermail/developers/attachments/20050121/59d411c6/attachment.pl>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test-performance.pl
Type: application/x-perl
Size: 17245 bytes
Desc: not available
URL: <http://lists.bugzilla.org/pipermail/developers/attachments/20050121/59d411c6/attachment-0001.pl>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.bugzilla.org/pipermail/developers/attachments/20050121/59d411c6/attachment-0001.html>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.bugzilla.org/pipermail/developers/attachments/20050121/59d411c6/attachment-0002.html>
More information about the developers
mailing list