Why quicksearch sucks

Bradley Baetz bbaetz at student.usyd.edu.au
Fri Nov 15 10:56:57 UTC 2002


As you've probably noticed, quicksearch has been disabled on bmo due to
load issues. This is bug 179960.

The main problem is that quicksearch entries are inherently slow. We do
a substring search on the description OR status whiteboard for each
word. This means that the best we can get away with is a linear search
of all the bug in teh database.

However, we also do a substring match on the product/components. Before
the upgrade, this was just an extra string search. After the upgrade +
bug 43600, this is now a string search on the product/component table,
and an integer equality test on teh bugs table, via a JOIN/

Now, you'd think that that would be faster - comparing a number has to
beat a string subsearch on each bug entry. The problem is that QS uses
OR between items, so the database cannot use the product/component stuff
as a filter to start with. Instead, rather than the 'simple' traversal
of all the entries, because this is a join the db has to use a separate
table, to create the rows containt the bugs, product, _and_ component
entries. This table ends up having to be on disk for our 180000-odd
bugs, which is real slow.

This is basically the same problem as bug 127200, I think.

So, how do we fix it? The correct fix is to use a subselect for the
product id:

SELECT ... FROM bugs WHERE bugs.product_id IN (SELECT id FROM products
WHERE (instr(products.name, 'foo')))) OR ... OR instr(bugs.short_desc,
"foo"))

That then gets us back to the previous search, but probably slightly
faster because of the number checks.

Which is great, except that mysql doesn't support SUBSELECTs.

So the alternate solution is to do what was done in bug 21700, and
lookup stuff in joined tables first. I wasnt to do this via some sort of
warpper func which would use subselects for database servers, and a
separate select + IN for mysql. There are a couple of disadvantages with
this:

- the routine we use will be messy
- for small matches, (not using OR), this may be slower, because of the
extra perl<->db overhead
- for really large matches, this may be slower, not only because of the
overhead, but because matching most of the db will require the temp
table anyway.

Also:

- &debug=1 output won't show the full query. Tough. :)
- Bugzilla::Search won't really be abel to give you back the SQL per se;
it'd have to change to a ->runQuery method. Its probably going to have
to do that for other reasons, though, so thats not an issue.

I'm not sure how much the last two will affect things; the second isn't
the common case, and we really shouldn't be tuning to people seecting >
50000 bugs. The first is unlikly to be much - even if its (say) 20%
(which its not on localhost, but could be if your db server was a long
way away), 120% of 0.01 seconds isn't going to be noticable by anyone,
and it smore than balanced out by teh cases where it would matter.

There are issues with the max size for an IN; we can fix that by
breaking it up into subsections and then using OR on the bits.

Note that when we move to custfields, we'll have exactly the same
problem, so we do need a generic solution to this.

Thoughts?

Bradley



More information about the developers mailing list