Towards faster keyword searches
jfearn at redhat.com
Mon Sep 26 03:51:38 UTC 2016
On 26/09/2016 12:46, Dylan Hardison wrote:
> When you have a keyword on 50,000+ bugs it means you're building a sql query with a 50,000 list in the form of IN (50,000 ids).
> This is gets close to the limits mysql has on queries, at least and it's pretty slow.
Not only that, when you get to 65K you hit parameter limits in DBD::Pg :(
I've just had to fix a bug for this related to viewing bugs in a group.
I solved it by replacing the use of new_from_list with new_from_where
which takes the SQL used to generate the bug id list and uses it in the
I'm not very happy with it because it requires you knowing exactly how
the sql will be used and coupling is bad, mmmk.
A more general approach to doing sub-selects instead of ID lists would
> There are a whole class of these -- anything that is a 'multiselect' type search. I believe someone suggested embedding these queries (at least as an option)
> as a sub select and in general I think that's a good idea. An even nicer idea is to just let elastic search do the searching.
> One of these approaches I hope to explore with an outreachy intern.
The effect of using ID lists instead of sub selects is wider than just
searching, as the above patch demonstrates.
> However, for the moment I really need keyword searches to be fast, so I started looking at the search code.
> I started at the problem backwards -- first by looking at where we build search objects (the tree-like Bugzilla::Search::Clause::* stuff).
> For searches with a single keyword search, I thought it would be nice to turn that into a join.
> Then I wrote that, and it appears the code I wrote actually works generally for multiple keywords (although you then start wondering what the max number of joins is.)
> Anyway, it's not finished code but it does work on a test install.
> It's in a github branch: https://github.com/dylanwh/bugzilla/tree/fast-keywords,
> you can look at the diff here: https://github.com/dylanwh/bugzilla/commit/e88bdf7168c6723d9930bc771ea93c93c3dec6b0
> I will be polishing this up for a review, but I wanted to have other people look at it first.
> Here's an example of a query it builds, for single_keyword=batman AND single_keyword:frog
> SELECT bugs.bug_id AS bug_id, bugs.priority AS priority, bugs.bug_severity AS bug_severity
> FROM bugs
> LEFT JOIN bug_group_map AS security_map ON bugs.bug_id = security_map.bug_id
> LEFT JOIN cc AS security_cc ON bugs.bug_id = security_cc.bug_id AND security_cc.who = 1
> INNER JOIN priority AS map_priority ON bugs.priority = map_priority.value
> INNER JOIN bug_severity AS map_bug_severity ON bugs.bug_severity = map_bug_severity.value
> LEFT JOIN keywords AS keywords_1 ON bugs.bug_id = keywords_1.bug_id
> LEFT JOIN keyworddefs AS keyworddefs_1 ON keywords_1.keywordid = keyworddefs_1.id
> LEFT JOIN keywords AS keywords_2 ON bugs.bug_id = keywords_2.bug_id
> LEFT JOIN keyworddefs AS keyworddefs_2 ON keywords_2.keywordid = keyworddefs_2.id
> WHERE bugs.creation_ts IS NOT NULL
> AND ( (security_map.group_id IS NULL OR security_map.group_id IN (1,10,11,14,12,13,9,4,8,5,6,7,3,2))
> OR (bugs.reporter_accessible = 1 AND bugs.reporter = 1)
> OR (bugs.cclist_accessible = 1 AND security_cc.who IS NOT NULL)
> OR bugs.assigned_to = 1
> AND bugs.resolution IN ('') AND keyworddefs_1.name = 'batman' AND INSTR(keyworddefs_2.name, 'frog') = 0
> GROUP BY bugs.bug_id
> ORDER BY map_priority.sortkey, map_priority.value, map_bug_severity.sortkey, map_bug_severity.value
> LIMIT 500-
> To view or change your list settings, click here:
I ran this on a copy of our DB and it seems to perform well. I did have
to chop off the order by as it's not leagl in Pg to have a group by
clause and then order by things not in the group by clause or an
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 473 bytes
Desc: OpenPGP digital signature
More information about the developers