How do we want CC list or commenter "does not match" to work??
Bradley Baetz
bbaetz at acm.org
Sat Feb 7 12:45:20 UTC 2004
On Fri, Feb 06, 2004 at 06:28:45PM -0500, David Miller wrote:
> how do you do it from SQL?
subselects.
> I can't think of any SQL to produce that which doesn't involve subselects
> (and would probably be a major performance hit, even if we required a new
> enough version of MySQL to do it that way).
We used to use Joel's thing for groupset checking, but it was too slow.
(Also, you need to use DISTINCT in general)
Its also painful when you want a bug where person a is on the cc list
and person b is on the cc list, but not person c. You can combine them,
but not given how Search.pm is set up.
You really, really want subselects. Here is PG 7.4.1 on a made up
randomly distributed DB, with 100,000 bugs, 20000 people, and 1,000,000
entries in the cc table. After running the queries a few times, so that
its all in RAM:
bbaetz=> select bugs.bug_id from bugs left join cc on
bugs.bug_id=cc.bug_id and cc.who = 123 where cc.who IS NULL;
Time: 464.632 ms
bbaetz=> select bug_id from bugs where bug_id not in (select bug_id from
cc where who = 123);
Time: 254.640 ms
(Numbers vary by 10-15% each run)
If you look at the plans, both do an index scan on the cc table, to get
all the bugs where person 123 is ccd.
The first plan takes those results, sorts them by cc.bug_id. It then
does an outer merge join based on bug_id, filtering on the |who IS NULL|
bit.
The second plan puts all the results from the cc subquery into a hash
table, and then does a sequential scan of the bugs table, removing
entries which are in that hashtable.
Now, theres a problem with the above. That assumes that we know the
userid of the person we're looking for. Lets assume that we don't:
bbaetz=> explain analyze select bug_id from bugs where bug_id not in
(select bug_id from cc where who = (SELECT userid FROM profiles where
login_name='QJEYECRS'));
Time: 261.747 ms
The best I can come up with without subselects is:
bbaetz=> explain analyze select bugs.bug_id from bugs left join cc on
bugs.bug_id=cc.bug_id left join profiles on cc.who = profiles.userid and
profiles.login_name='QJEYECRS' group by bugs.bug_id having
count(login_name) = 0;
Time: 30372.595 ms
which swaps heavily. I don't think that that can be reliably extended to
include other constraints.
MySQL takes 1 second on the first query, and 32 seconds on the second.
Again, it swaps heavily on the second, so the numbers are lager than they
would be if it was all in RAM. Saying 'get more RAM' wouldn't change
the problem if you have more than one query going at once, though.
Bradley
More information about the developers
mailing list