Regarding Duplicate Bug Report Detection in Bugzilla

amar.budhiraja1 at
Sat Oct 15 09:44:43 UTC 2016

So, I think I will develop a web app or Google Form that shows you groups of words, and you then label whether they belong to the same topic or not.

Also, I was hoping to have each set labelled by at least 3 people (preferably 5) in order to reduce the effect of differences in human judgement.
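
As a rough sketch of how labels from several people could be combined into one answer per set: the majority-vote rule below is only an assumption for illustration, nothing in this thread has settled on it, and the function name and set IDs are hypothetical.

```python
from collections import Counter


def majority_label(votes):
    """Return the label most annotators chose for one set of words.

    Assumes yes/no (boolean) votes from an odd number of annotators,
    so a tie cannot occur.
    """
    if len(votes) % 2 == 0:
        raise ValueError("use an odd number of annotators to avoid ties")
    return Counter(votes).most_common(1)[0][0]


# Hypothetical votes: True means "these 10 words belong to the same topic".
votes_by_set = {
    "set-001": [True, True, False],
    "set-002": [False, False, False],
}

labels = {set_id: majority_label(votes) for set_id, votes in votes_by_set.items()}
```

With 3 or 5 annotators per set, one person's mistaken label is outvoted by the others, which is the reason for asking more than one person per set.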

If each volunteer still does only 10 sets of 10 words each, then with about 600 sets I will need a minimum of 180 volunteers (3 labels per set) and ideally about 300 (5 labels per set). I was hoping the size of Mozilla's community would help there.
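
The volunteer counts follow from simple arithmetic, sketched below. The figure of 600 sets is the approximate number from my earlier mail, and the function name is just a placeholder.

```python
def volunteers_needed(num_sets, labels_per_set, sets_per_volunteer):
    """Smallest number of volunteers that covers every required label."""
    total_labels = num_sets * labels_per_set
    # Integer ceiling division: round up if the work does not divide evenly.
    return -(-total_labels // sets_per_volunteer)


minimum = volunteers_needed(num_sets=600, labels_per_set=3, sets_per_volunteer=10)
ideal = volunteers_needed(num_sets=600, labels_per_set=5, sets_per_volunteer=10)
print(minimum, ideal)  # 180 300
```

With a single label per set the same calculation gives 60 volunteers, which matches the estimate in the quoted reply below.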


On Saturday, October 15, 2016 at 8:59:17 AM UTC+5:30, Dylan Hardison wrote:
> > On Oct 14, 2016, at 21:03, amar.budhiraja1 at wrote:
> > 
> > Hi,
> > I am working on a research project on automatic detection of duplicate bug reports.
> > I am using the last 10 years of Mozilla bug reports (~750K) for the task, with the aim of making triage easier by ranking the duplicate bug report in the top 10.
> > The results look promising quantitatively and we want to publish the result in a tier 1 conference.
> > 
> > For this, we need Mozilla's help. We have about 600 sets of 10 words each, and we request Mozilla's help with the quantitative evaluation on those. Basically, for each set of 10 words, someone will have to say whether these words belong to the same topic.
> > 
> > We would appreciate it if Mozilla could help by asking its community to help with the labeling.
> > 
> > Hoping to hear back.
> > 
> This is very interesting. How will the labeling work? An online questionnaire or Google Form?
> Let me know exactly what's expected and I'll see what I can do. If we can find 60 volunteers, that's only ten sets of ten words
> each -- but I might be misunderstanding how you'd need to collect the answers
> (and how you control for humans reporting incorrectly).
> Kind regards,
> Dylan Hardison
dev-apps-bugzilla mailing list
dev-apps-bugzilla at