<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

  <title></title>

</head>

<body bgcolor="#ffffff" text="#000000">

    Sean McAfee wrote:

<blockquote cite="mid20041224035205.4174BBC60@mail.schwag.org"

 type="cite">

  <pre wrap="">Do you have a specific proposal that can be tested?  And, er, I don't want

to sound rude, but would you mind testing it yourself?  I've already done

quite a lot in that area.

  </pre>

</blockquote>

<br>

Ok, I modified construct-tables.pl to add fulltext indexes to the text

columns in the tests, create "real-distributed" columns (real columns,

but in a separate table rather than the bugs table, which is how a

"real columns" custom fields implementation would handle sparse

columns), and make distributed and real-distributed versions of the

status whiteboard field.<br>

<br>

I then wrote a script to automate testing of the queries in your

earlier email.  It runs each query six times, discards the first result

(which is often inaccurate because it includes costs unrelated to the

query itself), averages the rest, and reports the

results in a table similar to the one you included in your earlier

email along with each test's details (the query, its

query optimizer "explanation", and individual run times).<br>

<br>

I didn't modify your test queries except to fix some typos that caused

several of them not to run.  I added additional tests, including

versions of yours that use fulltext indexes, a couple that join tables

the way Bugzilla does in its queries (labeled "bzlike" in the tests),

and one that runs an actual Bugzilla query (a search for "meta" in the

status whiteboard of all bugs) along with distributed and

real-distributed versions of it against a recent copy of the b.m.o

database.<br>

<br>

I ran the tests on two machines, one called holepunch with two 733Mhz

x86 processors and one called megalon with two hyperthreaded Intel Xeon

2.xGhz processors.  holepunch did nothing but run the tests, but

megalon did other things at the same time, so its results may be

subject to more variance.<br>

<br>

I've attached the modified construct-tables.pl script, the

test-performance.pl script that runs the tests, and both machines'

results, which show the real and real-distributed queries outperforming

the distributed queries by significant margins in every case,

regardless of indexing, but up to 10x faster when using fulltext

indexes.<br>

<br>

I think these results are pretty accurate.  They validate database

theory (as bbaetz said of the distributed model, "it can

usually never be faster") and the consequencies of MySQL's "one index

per table" restriction.  I'm sure there are still optimizations that

could be done, and the queries themselves could be made much more

indicative of real Bugzilla queries along the lines bbaetz suggested in

his email.<br>

<br>

It would also be useful to run these against a

PostGreSQL database and on additional machines.  But I think the result

would be the same: real columns (either within the bugs table or in

their own table if sparse) would outperform distributed ones.<br>

<br>

-myk<br>

<br>

<br>

Notes:<br>

<ul>

  <li>Make sure you're running at least MySQL 4.0 if you try these

tests yourself.  Version 3.23 creates fulltext indexes too slowly on

datasets of the size generated by construct-tables.pl.</li>

  <li>Found records will vary between non-indexed and indexed queries

because the fulltext searches look for words starting with "meta"

instead of the "meta" substring.  This limitation of fulltext searching

is well worth the advantages in query speed and relevance ranking--most

people most of the time will want to fulltext search text.</li>

  <li>I made a single change to the construct-tables.pl script since

running it on the machines, removing generation of fulltext indexes for

the real distributed tables other than cf_aa (since that one is

actually used in a test, while the others aren't).  This shouldn't have

any effect on test results.<br>

  </li>

</ul>

<br>

</body>

</html>