Towards faster keyword searches

Dylan Hardison dylan at mozilla.com
Mon Sep 26 02:46:35 UTC 2016


When you have a keyword on 50,000+ bugs it means you're building a sql query with a 50,000 list in the form of IN (50,000 ids).
This is gets close to the limits mysql has on queries, at least and it's pretty slow.

There are a whole class of these -- anything that is a 'multiselect' type search. I believe someone suggested embedding these queries (at least as an option)
as a sub select and in general I think that's a good idea. An even nicer idea is to just let elastic search do the searching.
One of these approaches I hope to explore with an outreachy intern.

However, for the moment I really need keyword searches to be fast, so I started looking at the search code[1].
I started at the problem backwards -- first by looking at where we build search objects (the tree-like Bugzilla::Search::Clause::* stuff).
For searches with a single keyword search, I thought it would be nice to turn that into a join.

Then I wrote that, and it appears the code I wrote actually works generally for multiple keywords (although you then start wondering what the max number of joins is.)

Anyway, it's not finished code but it does work on a test install.

It's in a github branch: https://github.com/dylanwh/bugzilla/tree/fast-keywords,
you can look at the diff here: https://github.com/dylanwh/bugzilla/commit/e88bdf7168c6723d9930bc771ea93c93c3dec6b0

I will be polishing this up for a review, but I wanted to have other people look at it first.

Here's an example of a query it builds, for single_keyword=batman AND single_keyword:frog

SELECT bugs.bug_id AS bug_id, bugs.priority AS priority, bugs.bug_severity AS bug_severity
  FROM bugs
LEFT JOIN bug_group_map AS security_map ON bugs.bug_id = security_map.bug_id
LEFT JOIN cc AS security_cc ON bugs.bug_id = security_cc.bug_id AND security_cc.who = 1
INNER JOIN priority AS map_priority ON bugs.priority = map_priority.value
INNER JOIN bug_severity AS map_bug_severity ON bugs.bug_severity = map_bug_severity.value
LEFT JOIN keywords AS keywords_1 ON bugs.bug_id = keywords_1.bug_id
LEFT JOIN keyworddefs AS keyworddefs_1 ON keywords_1.keywordid = keyworddefs_1.id
LEFT JOIN keywords AS keywords_2 ON bugs.bug_id = keywords_2.bug_id
LEFT JOIN keyworddefs AS keyworddefs_2 ON keywords_2.keywordid = keyworddefs_2.id
 WHERE bugs.creation_ts IS NOT NULL
   AND ( (security_map.group_id IS NULL OR security_map.group_id IN (1,10,11,14,12,13,9,4,8,5,6,7,3,2))
        OR (bugs.reporter_accessible = 1 AND bugs.reporter = 1)
        OR (bugs.cclist_accessible = 1 AND security_cc.who IS NOT NULL)
        OR bugs.assigned_to = 1
)
   AND  bugs.resolution IN ('')  AND keyworddefs_1.name = 'batman' AND INSTR(keyworddefs_2.name, 'frog') = 0
GROUP BY bugs.bug_id
ORDER BY map_priority.sortkey, map_priority.value, map_bug_severity.sortkey, map_bug_severity.value
LIMIT 500


More information about the developers mailing list