Custom fields schema

Mon Dec 20 19:34:50 UTC 2004

Bradley Baetz <bbaetz at acm.org> wrote:
>However, the biggest issue you run into is MySQL's use of at most one
>index per table per query. Try it again with product and component, and
>requirements on both, (joining the main table twice to the secondary
>table) and you'll see the difference. Then try it with a 'text table' and an
>'integer table' (so that you really have multiple tables).

OK, more hard data ahead...

Attached is a revised construct-table.pl program, which can create and
populates arbitrary numbers of string and integer fields (ten and two,
respectively, as currently written).  String-field columns in REALCOL are
called cf_aa, cf_ab, etc, and integer-field columns are cf_int1, cf_int2,
etc.  There are now two distributed-column tables, DISTCOL_STR and
DISTCOL_INT.  String fields are assigned field IDs going incrementally from
zero, and integer fields get field IDs going incrementally after the last
string field ID.  String fields are given random values taken from
/usr/share/dict/words, as before, with "meta" inserted randomly 1% of the
time.  Integer fields are given random values from 0 to 999.

All string-field conditions are INSTR searches for "meta", as before.  All
integer-field conditions are for numbers greater than 500.  The SQL
statements get pretty lengthy, so I've put those in another attachment,
sql-statements.txt.  I ran each query three times.

+--------------+---------------------+---------------------+------------+
| Query        | "Real" times        | "Distributed" times | Bugs found |
+--------------+---------------------+---------------------+------------+
| 1 str        | 8.96, 7.72, 7.74    | 19.71, 4.70, 4.73   |    4786    |
| 2 str        | 13.23, 13.39, 13.56 | 11.71, 6.68, 5.87   |      77    |
| 3 str        | 14.27, 14.49, 14.53 | 6.70, 5.79, 5.54    |       2    |
| 10 str       | 14.22, 14.56, 14.69 | 5.99, 4.84, 4.90    |       0    |
| 1 str, 1 int | 14.52, 14.42, 14.48 | 9.65, 8.52, 7.32    |    2371    |
| 2 str, 2 int | 13.81, 13.75, 14.57 | 12.44, 9.17, 7.77   |      16    |
+--------------+---------------------+---------------------+------------+

Both methods agreed on the number of bugs found in all cases.

Except for that first anomalous one-string distributed-column search, the
distributed-column search times are consistently significantly less than the
real-column search times, even with uncommonly large numbers of search
terms.  The real-column performance remains fairly flat, while the
distributed-column performance seems to benefit from repeated searches; I
assume this is due to cacheing.

Huff, puff...

Is this enough data to put this issue to rest?  Pretty please?

--Sean
-------------- next part --------------
A non-text attachment was scrubbed...
Name: construct-tables.pl
Type: application/x-perl
Size: 2355 bytes
Desc: not available
URL: <http://lists.bugzilla.org/pipermail/developers/attachments/20041220/c9bd94ee/attachment.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sql-statements.txt
URL: <http://lists.bugzilla.org/pipermail/developers/attachments/20041220/c9bd94ee/attachment.txt>