Bugzilla Contribution Process (was RE: New language discussion?)

Wed Oct 31 18:46:41 UTC 2007

All,

I hope that this discussion helps make others aware of some of the
challenges that more than one business is facing with contributing code
(sponsored code) back to the community - and if possible, that we can
work together to find ways to see more sponsored code make it into the
community releases.  That *is* my goal in this topic.

> Benton, Kevin wrote:
> > As we all know, Bugzilla is implemented in Perl and the codebase is
> > getting better as we move to a more object-oriented model where
> > persistence is done at the proper layers. 
> 
> Indeed. Several years ago, we asked the same question - rewrite or
> incrementally fix? The rewrite, Hixie's Bugzilla 3, didn't happen; Myk
> argued for incrementally fixing, and the code now is much better than it
> was. It seems to me that the arguments now are even more in favour of
> incremental improvement than they were then.

If you had asked us how we felt when evaluating this question against
2.20rc1, our answer would have been fork or re-write (it wound up being
fork at the time).  Had I known then what I know now, I might have
decided to switch tools.  Today, however, the code is *much* better
than it was and we're more likely to stick with what is already out
there.

> > There's a lot of work to be
> > done, but no matter what language is chosen, it's possible to wind up
> > with the same problems we have today if we don't change the standards by
> > which we code.  So, I suggest that we look at the real issue - how do we
> > design our code so that it doesn't look and feel like procedural
> > spaghetti?  Having SQL in the CGI, for example, is a great example of why
> > you don't want to repeat yourself again (:-).  Moving the SQL into
> > modules that deal directly with the database (DAO) makes the most sense
> > to me because then, there is only one place to maintain with regard to
> > the back-end and it's easy to change out if needed.
> 
> You see, five years ago, people weren't doing this - or it wasn't nearly
> as well known as best practice. Bugzilla is improving all the time as
> people's understanding of how to write web apps improves, and that can
> only be a good thing. But we'll never be perfect, because perfect is
> always changing.

We agree.  Best practices are getting better, though I don't know that
communicating those best practices is going as well as coding to them
is...  That may be because I haven't been looking for them in the right
places.  Max and others are doing a fantastic job of making the code
much more object oriented (not just having modules for the sake of
modules).  Hooks and the API are good things too...
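
To make the DAO point quoted above concrete, here is a minimal sketch.
It is written in Python with SQLite purely for brevity (Bugzilla itself is
Perl), and the class, table, and column names are hypothetical, not
Bugzilla's actual schema or API.  The idea is only that the SQL lives in
one module and the CGI layer never sees it:

```python
import sqlite3


class BugDAO:
    """Hypothetical data-access object: the only place that knows the SQL."""

    def __init__(self, conn):
        self.conn = conn

    def create(self, summary):
        # Insert a bug and return its generated id.
        cur = self.conn.execute(
            "INSERT INTO bugs (summary) VALUES (?)", (summary,)
        )
        return cur.lastrowid

    def summary(self, bug_id):
        # Return the summary, or None when the bug does not exist.
        row = self.conn.execute(
            "SELECT summary FROM bugs WHERE bug_id = ?", (bug_id,)
        ).fetchone()
        return row[0] if row else None


# Illustrative setup; the schema below is an assumption for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, summary TEXT NOT NULL)"
)
dao = BugDAO(conn)
new_id = dao.create("Crash on login")
```

A caller would use dao.summary(new_id) and never write a query itself, so
swapping the back-end means touching only this one class.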

> > We're (AMD) actively working on a radical shift in the way that "custom"
> > fields are being done by providing a field scheme model of implementing
> > "custom" fields.  We prefer to call those "custom" fields "add-on"
> > fields rather than custom.
> 
> How do they compare to the various other ways of implementing custom 
> fields that were proposed when we originally did them?

The "add-on" fields actually use a large part of the custom field code;
however, we're making sweeping changes so we utilize a small set of
routines to deal with field views and changes.  For example, we're
getting rid of is_(active|obsolete) and isobsolete, replacing each with
isactive, then updating the code so we don't have four different ways to
deal with whether or not a field or value is active or obsolete.  We're
also converting columns in the bugs table to store IDs rather than values
to enforce referential integrity.  This makes the code easier to maintain
because all that's required afterward is to make sure that an insert or
update was successful.  Once we have a more unified way of dealing with
fields generically, then we can deal with field schemes.  Until then, we
would have far too many exceptions to deal with.
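
As a rough illustration of storing IDs instead of values, the sketch below
uses Python and SQLite for brevity.  The bug_type table, its columns, and
the set_bug_type helper are assumptions made for this example, not the
actual schema or code.  With a foreign key in place, the database rejects
an unknown value, so the caller only has to check whether the insert
succeeded:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE bug_type (
    id       INTEGER PRIMARY KEY,
    value    TEXT NOT NULL UNIQUE,
    isactive INTEGER NOT NULL DEFAULT 1   -- one flag instead of four variants
);
CREATE TABLE bugs (
    bug_id      INTEGER PRIMARY KEY,
    bug_type_id INTEGER NOT NULL REFERENCES bug_type(id)
);
INSERT INTO bug_type (id, value) VALUES (1, 'defect'), (2, 'enhancement');
""")


def set_bug_type(conn, bug_id, type_id):
    """Hypothetical helper: True if the row was stored, False if rejected."""
    try:
        conn.execute(
            "INSERT INTO bugs (bug_id, bug_type_id) VALUES (?, ?)",
            (bug_id, type_id),
        )
        return True
    except sqlite3.IntegrityError:
        # The database refused an id that is not present in bug_type.
        return False


stored = set_bug_type(conn, 1, 2)     # valid type id
rejected = set_bug_type(conn, 2, 99)  # no such type id
```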

For those that don't know what field schemes are about, field schemes
determine what fields are displayed based on a bug's active product.
All values are stored in a bug regardless of field scheme, but this
allows product Foo to include all the default fields plus fields A and
B.  Product Bar doesn't care about field B, but cares about A and C, so
only A and C are displayed when Bar is the current product.  Field
schemes also set the list of available values on a per-scheme or
per-product basis depending on the scheme settings.  For example, field
schemes using "Bug Type" will share the same set of values regardless of
product, where field schemes using "First Rev Affected" will use a set
of values based on the product configuration (in the one-many and
many-many configurations).
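
A minimal sketch of the display side of field schemes follows, in Python
for brevity.  The dictionary layout, the default field names, and the
function names are all hypothetical, and the sketch assumes the default
fields apply to every product.  It shows only that the bug keeps every
value while the product's scheme decides what is displayed:

```python
# Fields every product shows (names are placeholders for this sketch).
DEFAULT_FIELDS = ["summary", "status", "priority"]

# Add-on fields per product, mirroring the Foo/Bar example above.
FIELD_SCHEMES = {
    "Foo": ["A", "B"],
    "Bar": ["A", "C"],
}


def visible_fields(product):
    """Default fields plus the add-on fields in the product's scheme."""
    return DEFAULT_FIELDS + FIELD_SCHEMES.get(product, [])


def displayed_values(bug, product):
    """Filter a bug's stored values down to what the scheme displays."""
    return {name: bug.get(name) for name in visible_fields(product)}


# The bug stores values for A, B and C regardless of its current product.
bug = {"summary": "Crash", "status": "NEW", "priority": "P1",
       "A": "a1", "B": "b1", "C": "c1"}
shown_for_bar = displayed_values(bug, "Bar")
```

Here shown_for_bar omits field B, yet bug["B"] is still stored, so moving
the bug back to product Foo would display it again.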

Workflow schemes are similar to and build on field schemes.  Workflow
schemes affect more than just the available options of state changes.
Workflow schemes also specify when a field is required or not, when a
field gets cleared automatically, and who is allowed to make certain
changes.  So, when an approval must be made to release code, workflow
schemes allow for a list of individuals who can grant approval (as a
state, not a flag) before a bug can progress to the Incorporated state
(where the code is incorporated into a release).
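
The following is a sketch of how such workflow rules could be expressed,
in Python for brevity.  Apart from "Incorporated", the state names, field
names, addresses, and the apply_transition function are invented for the
example and are not taken from the actual implementation.  Each
transition records which fields are required, which are cleared, and who
may make the change:

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Transition:
    required_fields: Set[str] = field(default_factory=set)
    cleared_fields: Set[str] = field(default_factory=set)
    allowed_users: Optional[Set[str]] = None  # None: anyone may do this


# Hypothetical workflow scheme keyed by (current state, new state).
WORKFLOW = {
    ("Awaiting Approval", "Approved"): Transition(
        allowed_users={"approver@example.com"},
        cleared_fields={"rejection_reason"},
    ),
    ("Approved", "Incorporated"): Transition(required_fields={"release"}),
}


def apply_transition(bug, new_state, user):
    """Move the bug to new_state if the workflow scheme permits it."""
    rule = WORKFLOW.get((bug["state"], new_state))
    if rule is None:
        raise ValueError(f"no transition from {bug['state']} to {new_state}")
    if rule.allowed_users is not None and user not in rule.allowed_users:
        raise PermissionError(f"{user} may not move a bug to {new_state}")
    missing = sorted(f for f in rule.required_fields if not bug.get(f))
    if missing:
        raise ValueError(f"required fields missing: {missing}")
    for name in rule.cleared_fields:
        bug[name] = None  # field is cleared automatically on this change
    bug["state"] = new_state
    return bug
```

Because approval is modeled as a state here, a bug cannot reach
Incorporated without first passing through a transition that only the
listed approvers may perform.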

For the purpose of searching, we've made some other improvements such as
adding a bug_values_cache table that stores a bug_id, and the values of
many-to-many relationships for each bug (such as the CC list members by
email and dependency relationships).  This makes it possible to display
columns of many-to-many relationship fields without expensive queries to
the database.  The downside is the very slight increase in time required
to create and update bugs where those fields are also involved.
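
To show the trade-off, here is a small sketch of such a cache, in Python
with SQLite for brevity.  Only the bug_values_cache name and its bug_id
column come from the description above; the cc table, the cc_list column,
and the helper functions are assumptions.  Writes do a little extra work
so that a bug list can read the CC column from one row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cc (bug_id INTEGER NOT NULL, email TEXT NOT NULL);
CREATE TABLE bug_values_cache (
    bug_id  INTEGER PRIMARY KEY,
    cc_list TEXT NOT NULL          -- flattened copy of the cc rows
);
""")


def add_cc(conn, bug_id, email):
    """Add a CC entry, then refresh the cached, flattened CC list."""
    conn.execute(
        "INSERT INTO cc (bug_id, email) VALUES (?, ?)", (bug_id, email)
    )
    rows = conn.execute(
        "SELECT email FROM cc WHERE bug_id = ? ORDER BY email", (bug_id,)
    ).fetchall()
    cc_list = ", ".join(r[0] for r in rows)
    # This is the extra write cost paid on every create or update.
    conn.execute(
        "INSERT OR REPLACE INTO bug_values_cache (bug_id, cc_list) "
        "VALUES (?, ?)",
        (bug_id, cc_list),
    )


def cc_column(conn, bug_id):
    """Read the display value from the cache without touching cc."""
    row = conn.execute(
        "SELECT cc_list FROM bug_values_cache WHERE bug_id = ?", (bug_id,)
    ).fetchone()
    return row[0] if row else ""


add_cc(conn, 1, "zed@example.com")
add_cc(conn, 1, "amy@example.com")
```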

We're also adding a description column to every table that doesn't
already have one for the purpose of improving the help system so it can
describe a table's values.  This provides guidance when users don't
understand when to use certain values versus others, especially when the
labels may not be clear enough.

> > Frankly, our number one concern with contributing code back to the
> > community is the length of time it takes to go from contribution to
> > approval in combination with the risk that our change set(s) won't be
> > accepted by the community.
> 
> Contributing has concerns; forking has other concerns (like increased 
> maintenance). You have to do what is right for you and your business. 
> That's fair enough.
> 
> > however, we've also seen resistance to our methods of
> > making changes because we need to package things in chunks that we feel
> > reviewers can review and approvers can approve.
> 
> Do you really think this is unreasonable?

There are times when it seems reasonable, yet there are times when we
feel it's not.  There are times when we feel we just want to give back
but we don't have the resources to integrate.  There are other times
when a feature set is so invasive to the core of the community code and
so critical to our day-to-day development and operations that we're
willing to make the extra effort.  There are also times when we feel we
just can't wait for the community to review and approve what we're
doing.

> It's always the case that one guy working on his own with complete
> freedom can work faster than when he's using a team process with checks
> and balances. It's one of the downsides of collaboration which have to
> be measured against the upsides of (hopefully) increased code quality
> and more future-proof design, etc.

That's a given, and up to a point, there are times when it's worth the
effort.  The problem isn't just the size of the chunks, however.  It's
the extra effort that must be given to limit the scope of work in those
chunks.  The corresponding question is this: do we wait for the review /
approval process to complete before developing a piece that builds on the
chunk we just submitted, hoping that it will be reviewed and approved
quickly, or do we keep moving forward and backport any differences into
our own code?  The review and approval process is a wildcard that we have
no control over.  Even though code may have completely passed our own
review and approval process, the community may inject its own requirements
that may or may not be compatible with ours.  That risk is a real cost of
doing business with open source communities and, like it or not, has a
negative impact on schedules.

From our perspective, the challenge is deciding if/how to contribute and
when.  There are pieces we've added to Bugzilla that are intellectual
property that we will not contribute back to the community.  There are
other pieces that we've developed that we clearly want to see included,
yet will take significant effort because someone has to synchronize that
code with the current tip.  If the review takes long enough that someone
else's patch makes it into the tip before our review completes, then we
end up being asked to take the burden of merging the code again.
Granted, I would never ask the community to stop development just
because we want to contribute, but at the same time, our current review
process doesn't offer a way to protect contributors from excessive
amounts of merging due to lag between submission and review.  When we're
done with our field scheme and workflow scheme code, it's likely to be
thousands of lines each of implementation.  This is a prime example of
the risk versus reward situation.  With large patches, a lot of work
must be done to review, thus the likelihood of needing to merge
increases as well.  Are these features desired by the community?  There
is no question in my mind that in both cases, a resounding yes is
appropriate.  The problem on our end is that management is requiring
that we complete our work long before I think the review process can
complete from within the community.  Once we're "done" with our
implementation internally, managers want to see us continue to move
forward with the next series of changes as they come to us.  They don't
understand when we tell them we have to go back to update our code so
it'll work with the community when we've already moved beyond a feature
set.  It's difficult to explain that we're dealing with forward
compatibility with Bugzilla.

I hope that others don't get the idea that I'm complaining for the sake
of complaining about this.  I hope it'll help bring a bit more awareness
of some of the issues corporate contributors face when deciding whether to
contribute or fork.  Unfortunately, I continue to get responses leading
me to believe that the answer often ends up being fork.  If this community
can help improve responsiveness by telling corporate contributors what
kinds of things will help reviews get done faster (such as submitting full
sets of test cases and Selenium test code to go with the developed code),
then I think you'll see more sponsored contributions coming.  Otherwise,
the risk of being stalled by the review and approval process will remain
high enough for many that it's cheaper to fork than contribute, and I
think that's when we all lose.

Kevin

---

Kevin Benton
MySQL DBA #5739
Senior Software Developer
CAD Global Infrastructure Flow Services
Advanced Micro Devices
2950 E Harmony Rd
Fort Collins, CO  80528
