How do we want CC list or commenter "does not match" to work??

Bradley Baetz bbaetz at acm.org
Sat Feb 7 12:45:20 UTC 2004


On Fri, Feb 06, 2004 at 06:28:45PM -0500, David Miller wrote:
> how do you do it from SQL?

subselects.

> I can't think of any SQL to produce that which doesn't involve subselects
> (and would probably be a major performance hit, even if we required a new
> enough version of MySQL to do it that way).

We used to use Joel's thing for groupset checking, but it was too slow. 
(Also, you need to use DISTINCT in general)

Its also painful when you want a bug where person a is on the cc list 
and person b is on the cc list, but not person c. You can combine them, 
but not given how Search.pm is set up.

You really, really want subselects. Here is PG 7.4.1 on a made up 
randomly distributed DB, with 100,000 bugs, 20000 people, and 1,000,000 
entries in the cc table. After running the queries a few times, so that 
its all in RAM:

bbaetz=> select bugs.bug_id from bugs left join cc on 
bugs.bug_id=cc.bug_id and cc.who = 123 where cc.who IS NULL;
Time: 464.632 ms

bbaetz=> select bug_id from bugs where bug_id not in (select bug_id from 
cc where who = 123);
Time: 254.640 ms

(Numbers vary by 10-15% each run)

If you look at the plans, both do an index scan on the cc table, to get 
all the bugs where person 123 is ccd.

The first plan takes those results, sorts them by cc.bug_id. It then 
does an outer merge join based on bug_id, filtering on the |who IS NULL| 
bit.

The second plan puts all the results from the cc subquery into a hash 
table, and then does a sequential scan of the bugs table, removing 
entries which are in that hashtable.

Now, theres a problem with the above. That assumes that we know the 
userid of the person we're looking for. Lets assume that we don't:

bbaetz=> explain analyze select bug_id from bugs where bug_id not in
(select bug_id from cc where who = (SELECT userid FROM profiles where 
login_name='QJEYECRS'));
Time: 261.747 ms

The best I can come up with without subselects is:

bbaetz=> explain analyze select bugs.bug_id from bugs left join cc on 
bugs.bug_id=cc.bug_id left join profiles on cc.who = profiles.userid and 
profiles.login_name='QJEYECRS' group by bugs.bug_id having 
count(login_name) = 0;
Time: 30372.595 ms

which swaps heavily. I don't think that that can be reliably extended to 
include other constraints.

MySQL takes 1 second on the first query, and 32 seconds on the second. 
Again, it swaps heavily on the second, so the numbers are lager than they
would be if it was all  in RAM. Saying 'get more RAM' wouldn't change 
the problem if you have more than one query going at once, though.

Bradley



More information about the developers mailing list