The Road to 2.18 - Customfields

Sean McAfee etzwane at schwag.org
Mon Mar 15 19:41:39 UTC 2004


Bradley Baetz <bbaetz at acm.org> wrote:
>On Mon, Mar 15, 2004 at 02:21:05AM -0500, Sean McAfee wrote:
>> Far from revealing a performance penalty, it appears that the more (scalar)
>> custom fields one queries against, the better the results.

>Thats not really likely, unless mysql is scanning fewer rows due to 
>additional constraints.

It is scanning fewer rows, thanks to the indices on the field_id column in
each of the custom field tables.  For example, here's the SQL that gets
generated (more or less) for the summary/integer field case:

SELECT b.bug_id
FROM bugs AS b, cf_integer AS cfi
WHERE (INSTR(LOWER(b.short_desc), 'astro') AND cfi.value >= 2700)
  AND cfi.field_id = 1
  AND b.bug_id = cfi.bug_id

Thanks to the index, the MySQL server knows to check only those rows in
CF_INTEGER where field_id is 1.  And because of the join condition on
bug_id, it apparently needs to test only the rows in BUGS that match, by
bug_id, the rows in CF_INTEGER.
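
For reference, here's a minimal sketch of the kind of table definition that
makes that plan possible.  The table and column names match the query above,
but the exact types and index layout are assumptions, not the real Bugzilla
DDL:

-- Hypothetical sketch of cf_integer; the actual schema may differ.
CREATE TABLE cf_integer (
    bug_id   MEDIUMINT NOT NULL,   -- joins back to bugs.bug_id
    field_id SMALLINT  NOT NULL,   -- which custom field this row belongs to
    value    INTEGER   NOT NULL,   -- the field's value for this bug
    INDEX (field_id),              -- lets MySQL scan only one field's rows
    INDEX (bug_id)                 -- speeds the join back to BUGS
);

With the field_id index in place, the WHERE clause above touches only the
slice of cf_integer belonging to field 1 rather than the whole table.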

>Run each query a few times, and throw the first run away. Mem caching 
>may account for that.

OK, OK.  How's this:

--------------------
#!/usr/bin/perl

use strict;

use Bugzilla::CustomFields::Query qw(bug_iterator);
use Benchmark;

do 'globals.pl';

my @summary     = ([ 'short_desc', 'casesubstring', 'Astro' ]);
my @onecustom   = (@summary, [ 'tlc_incident_num', '>=', 2700 ]);
my @twocustom   = (@onecustom, [ 'tlc_platform_serial_num', 'casesubstring',
                                 'n/a' ]);
my @threecustom = (@twocustom, [ 'tlc_reproducibility_details', 'casesubstring',
                                 'happens' ]);


timethese(25, {
    summary   => sub { count_bugs(@summary) },
    onecust   => sub { count_bugs(@onecustom) },
    twocust   => sub { count_bugs(@twocustom) },
    threecust => sub { count_bugs(@threecustom) },
});

sub count_bugs {
    my @where = @_;
    my $count = 0;
    my $iterator = bug_iterator(where => \@where, custom_fields => 0);
    while (my $bug = $iterator->()) {
        ++$count;
    }
    print "$count bugs found.\n";
}
--------------------

I replaced 'allwords' with 'casesubstring', which is more efficient.
This is the output (with duplicate output trimmed):

Benchmark: timing 25 iterations of onecust, summary, threecust, twocust...
11817 bugs found.
[...]
   onecust: 137 wallclock secs (59.54 usr +  4.03 sys = 63.57 CPU) @  0.39/s (n=25)
35552 bugs found.
[...]
   summary: 397 wallclock secs (115.32 usr +  6.94 sys = 122.26 CPU) @  0.20/s (n=25)
1010 bugs found.
[...]
 threecust: 106 wallclock secs (35.16 usr +  2.63 sys = 37.79 CPU) @  0.66/s (n=25)
2121 bugs found.
[...]
   twocust: 95 wallclock secs (37.37 usr +  2.52 sys = 39.89 CPU) @  0.63/s (n=25)


So yes, the trend I thought I had noted (more custom fields means less
time) does not always hold.  Two custom fields are still more efficient
than one, however.

Now can we dispense with the notion, once and for all, that storing custom
field data in multiple tables is bad for performance?


--Sean


