Search.pm SQL queries

Thu Feb 26 02:56:21 UTC 2009

On Wed, 25 Feb 2009 14:40:49 -0800 David Marshall
<dmarshal at yahoo-inc.com> wrote:
> (I am working semi-furiously on
> catching us up and will expose whatever code I can as soon as
> possible.)

	Cool. We refactored Search.pm, so if you gave us a patch right
now it wouldn't apply, but the actual internal logic of all the search
operators hasn't really changed much, so it shouldn't be too hard to
port.

> I anecdotally suspect that for at least MySQL 4.1, when the number of
> possible query plans exceeds some number (around 24), MySQL gives up
> and just goes with some arbitrary join order, usually the order as
> written in the query.

	BTW, this makes me wonder--have you tried an experimental
import into PostgreSQL to see if it gives you better query plans/times?
I know that Yahoo has a big investment in MySQL and so wouldn't move
off it, but I'd be interested to know if there really is a significant
difference. I saw a fair difference in some of my testing a few years
back, but I didn't have a DB like you guys do.

> WHERE bugs.bug_id IN (list)

	Isn't that IN slow, though? I thought MySQL stopped using
indexes on IN clauses over a certain size, or something.

> Finally, what has made the biggest difference is based on MySQL 4.1's
> inability to use multiple indexes at once.

	Supposedly fixed in 5.1 or 5.0, though. I'm not sure I've seen
any query plans where it actually does that though.

> Problems with Search.pm have less to do with the transformation of
> some chart tuple to SQL and more to do with the philosophy on
> constructing one big query and hoping for the best.  Making this more
> complicated is that for some searches, it is better to just leave
> things alone and let the database figure it out!

	Yeah, although I'd like to think in an ideal world that the
database should always be the best at figuring these things out...or at
least that query optimizers will improve in the future.

> I used to say that Search.pm should only return a list of bug IDs
> that match the search criteria, and then buglist.cgi should get the
> data needed for display.  I'm wavering on that now, because as long
> as Search.pm has gone to the trouble of figuring everything out, why
> not just get the data?

	Yeah, I think eventually Search.pm will just return Bug
objects.

> The future direction of Search.pm for us is to improve the tree
> mechanism to produce faster queries. 

	And perhaps to upgrade to MySQL 5.x and see if that improves
things?

>  This will probably involve locking all the tables involved

	Probably unnecessary, since 3.2 uses InnoDB? Or you could do
READ UNCOMMITTED and see if that makes a difference?

> and doing possibly multiple
> queries to determine the list of bugs followed by one grand query to
> get all the data for those bugs,

	Yeah, that might be reasonable. I'd want to see what the actual
pathological queries are, though, if we're going to adjust things to
work better upstream. I'm sure you guys have a lot of data that would
be valuable to us in terms of optimization.

	I'd be opposed to any optimization before we finish refactoring
Search.pm architecturally, though, so that should happen first.

	-Max
-- 
http://www.everythingsolved.com/
Competent, Friendly Bugzilla and Perl Services. Everything Else, too.