Towards faster keyword searches
dylan at mozilla.com
Mon Sep 26 02:46:35 UTC 2016
When you have a keyword on 50,000+ bugs it means you're building a sql query with a 50,000 list in the form of IN (50,000 ids).
This is gets close to the limits mysql has on queries, at least and it's pretty slow.
There are a whole class of these -- anything that is a 'multiselect' type search. I believe someone suggested embedding these queries (at least as an option)
as a sub select and in general I think that's a good idea. An even nicer idea is to just let elastic search do the searching.
One of these approaches I hope to explore with an outreachy intern.
However, for the moment I really need keyword searches to be fast, so I started looking at the search code.
I started at the problem backwards -- first by looking at where we build search objects (the tree-like Bugzilla::Search::Clause::* stuff).
For searches with a single keyword search, I thought it would be nice to turn that into a join.
Then I wrote that, and it appears the code I wrote actually works generally for multiple keywords (although you then start wondering what the max number of joins is.)
Anyway, it's not finished code but it does work on a test install.
It's in a github branch: https://github.com/dylanwh/bugzilla/tree/fast-keywords,
you can look at the diff here: https://github.com/dylanwh/bugzilla/commit/e88bdf7168c6723d9930bc771ea93c93c3dec6b0
I will be polishing this up for a review, but I wanted to have other people look at it first.
Here's an example of a query it builds, for single_keyword=batman AND single_keyword:frog
SELECT bugs.bug_id AS bug_id, bugs.priority AS priority, bugs.bug_severity AS bug_severity
LEFT JOIN bug_group_map AS security_map ON bugs.bug_id = security_map.bug_id
LEFT JOIN cc AS security_cc ON bugs.bug_id = security_cc.bug_id AND security_cc.who = 1
INNER JOIN priority AS map_priority ON bugs.priority = map_priority.value
INNER JOIN bug_severity AS map_bug_severity ON bugs.bug_severity = map_bug_severity.value
LEFT JOIN keywords AS keywords_1 ON bugs.bug_id = keywords_1.bug_id
LEFT JOIN keyworddefs AS keyworddefs_1 ON keywords_1.keywordid = keyworddefs_1.id
LEFT JOIN keywords AS keywords_2 ON bugs.bug_id = keywords_2.bug_id
LEFT JOIN keyworddefs AS keyworddefs_2 ON keywords_2.keywordid = keyworddefs_2.id
WHERE bugs.creation_ts IS NOT NULL
AND ( (security_map.group_id IS NULL OR security_map.group_id IN (1,10,11,14,12,13,9,4,8,5,6,7,3,2))
OR (bugs.reporter_accessible = 1 AND bugs.reporter = 1)
OR (bugs.cclist_accessible = 1 AND security_cc.who IS NOT NULL)
OR bugs.assigned_to = 1
AND bugs.resolution IN ('') AND keyworddefs_1.name = 'batman' AND INSTR(keyworddefs_2.name, 'frog') = 0
GROUP BY bugs.bug_id
ORDER BY map_priority.sortkey, map_priority.value, map_bug_severity.sortkey, map_bug_severity.value
More information about the developers