Custom fields schema

Myk Melez myk at mozilla.org
Fri Jan 21 19:09:05 UTC 2005


    Sean McAfee wrote:

>Do you have a specific proposal that can be tested?  And, er, I don't want
>to sound rude, but would you mind testing it yourself?  I've already done
>quite a lot in that area.
>  
>

Ok, I modified construct-tables.pl to add fulltext indexes to the text 
columns in the tests, create "real-distributed" columns (real columns, 
but in a separate table rather than the bugs table, which is how a "real 
columns" custom fields implementation would handle sparse columns), and 
make distributed and real-distributed versions of the status whiteboard 
field.

I then wrote a script to automate testing of the queries in your earlier 
email.  It runs each query six times, discards the first result (which 
is often inaccurate because it includes costs unrelated to the query 
itself), averages the rest, and reports the results in a table similar 
to the one you included in your earlier email along with each test's 
details (the query, its query optimizer "explanation", and individual 
run times).

I didn't modify your test queries except to fix some typos that caused 
several of them not to run.  I added additional tests, including 
versions of yours that use fulltext indexes, a couple that join tables 
the way Bugzilla does in its queries (labeled "bzlike" in the tests), 
and one that runs an actual Bugzilla query (a search for "meta" in the 
status whiteboard of all bugs) along with distributed and 
real-distributed versions of it against a recent copy of the b.m.o database.

I ran the tests on two machines, one called holepunch with two 733Mhz 
x86 processors and one called megalon with two hyperthreaded Intel Xeon 
2.xGhz processors.  holepunch did nothing but run the tests, but megalon 
did other things at the same time, so its results may be subject to more 
variance.

I've attached the modified construct-tables.pl script, the 
test-performance.pl script that runs the tests, and both machines' 
results, which show the real and real-distributed queries outperforming 
the distributed queries by significant margins in every case, regardless 
of indexing, but up to 10x faster when using fulltext indexes.

I think these results are pretty accurate.  They validate database 
theory (as bbaetz said of the distributed model, "it can usually never 
be faster") and the consequencies of MySQL's "one index per table" 
restriction.  I'm sure there are still optimizations that could be done, 
and the queries themselves could be made much more indicative of real 
Bugzilla queries along the lines bbaetz suggested in his email.

It would also be useful to run these against a PostGreSQL database and 
on additional machines.  But I think the result would be the same: real 
columns (either within the bugs table or in their own table if sparse) 
would outperform distributed ones.

-myk


Notes:

    * Make sure you're running at least MySQL 4.0 if you try these tests
      yourself.  Version 3.23 creates fulltext indexes too slowly on
      datasets of the size generated by construct-tables.pl.
    * Found records will vary between non-indexed and indexed queries
      because the fulltext searches look for words starting with "meta"
      instead of the "meta" substring.  This limitation of fulltext
      searching is well worth the advantages in query speed and
      relevance ranking--most people most of the time will want to
      fulltext search text.
    * I made a single change to the construct-tables.pl script since
      running it on the machines, removing generation of fulltext
      indexes for the real distributed tables other than cf_aa (since
      that one is actually used in a test, while the others aren't). 
      This shouldn't have any effect on test results.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.bugzilla.org/pipermail/developers/attachments/20050121/59d411c6/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: construct-tables.pl
Type: application/x-perl
Size: 5412 bytes
Desc: not available
URL: <http://lists.bugzilla.org/pipermail/developers/attachments/20050121/59d411c6/attachment.pl>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test-performance.pl
Type: application/x-perl
Size: 17245 bytes
Desc: not available
URL: <http://lists.bugzilla.org/pipermail/developers/attachments/20050121/59d411c6/attachment-0001.pl>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.bugzilla.org/pipermail/developers/attachments/20050121/59d411c6/attachment-0001.html>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.bugzilla.org/pipermail/developers/attachments/20050121/59d411c6/attachment-0002.html>


More information about the developers mailing list