Custom fields schema

Wed Jan 26 19:05:11 UTC 2005

I have pretty much kept my mouth shut regarding this debate lately until
now.  I think it's getting a little out of hand.  I think what really needs
to happen is for us to look at what we want out of the design, and more
importantly, to ask ourselves how will our proposal be used realistically?

Right now, I can tell you that I am really beating up MySQL 3.23.x doing
standard queries against the tables given with my statuscount.cgi.  I was
able to flood MySQL to the point where it took over a minute to get results
back from a query that basically asked - show me a list of bugs that have
been opened but never had a status change in it.  Until I added some new
indexes, that report was taking far too long to make it usable.

Now, we're talking about having the ability to add custom fields.  This is a
good thing, but let's be sure we do it wisely.  No matter which method we
choose, it needs to be easy to use, understand, and perform without a
serious hit to the DB.  To do that - we must consider some kinds of custom
fields that might be created and how that creation would happen.  If it's
something that can be handled without adding new code, then we should expect
ours to handle optimization reasonably.  If not, we should guide admins. on
how to properly optimize after adding new fields.

Conceptually, for me, it's a lot easier to understand fields as columns than
fields as rows.  Why?  A table contains a record.  That record contains
fields.  That's traditional design.  Creating a table for each field simply
doesn't make sense to me because the fields are no longer directly related
to one another.  Understanding how field A relates to field B relates to
field C can then become a real nightmare.

If we have Bugzilla administer the fields itself, there are optimization
considerations.  Because it's nearly impossible for us to figure out ahead
of time what fields will be added and how they will relate to each other, we
must also consider how to implement some kind of optimization aid on top of
it as well.  So, if new field 1 relates to existing field A, and existing
field B, the admin should be able to tell Bugzilla that they're related and
have it add the indexes.

Please note that I am not saying that this is how it should be done.  As an
administrator, developer, and report writer, I want to make sure that
anything we do does not kill performance while obtaining some form of
elegance or simplicity in coding.  We have to find the best middle ground
where performance is excellent, but code maintenance is reasonably easy as
well.

> -----Original Message-----
> From: developers-owner at bugzilla.org [mailto:developers-owner at bugzilla.org]
> On Behalf Of Sean McAfee
> Sent: Wednesday, January 26, 2005 11:37 AM
> To: developers at bugzilla.org
> Subject: Re: Custom fields schema
> 
> Responding to several messages at once:
> 
> >not to
> >mention six years of Bugzilla development techniques.
> 
> Custom fields are a completely new kind of beast.  There is no precedent
> for
> them in Bugzilla's development history.  (Not in the core distribution,
> anyway; some partial solutions are attached to bug 91037.  I don't think
> that's what you're talking about, though.)  There's no a priori reason to
> apply past Bugzilla development techniques to them.
> 
> Later:
> 
> >Custom fields are exactly what relational database columns were designed
> >for; they fit perfectly into the column metaphor, just as the standard
> >Bugzilla fields do.
> 
> What is this "column metaphor"?  Your design treats custom fields very
> differently than standard fields, applying more of a "table metaphor".
> 
> >They're modifiable via SQL statements just as
> >easily as the data within them is.  And while Bugzilla doesn't modify
> >its schema very much today, there's nothing inherently more dangerous
> >about it doing so.
> 
> But much less elegant.  To paraphrase Einstein, I think the schema ought
> to
> be as simple as possible, but no simpler.  Transmeta's Bugzilla
> installation
> has 187 custom fields.  A schema with in excess of 250 tables is not
> simple.
> Imagine trying to manage a schema of that size with a visual tool!
> 
> Later, responding to Christopher Hicks:
> 
> >That doesn't mean we should store all meta-data as data.  We should use
> >the right tool for the job, as we have already done with standard
> >fields, for which we rightly use columns.
> 
> Yes, that is right, because all bugs share the same standard fields.  That
> condition is violated by custom fields.  And, again, your proposal
> implements custom fields very differently from the way standard fields are
> implemented, anyway.
> 
> >> (4) Since queries involving custom fields are going to take a few
> >> database hits to figure out what the field names so the query could be
> >> written you end up with cases where 1 query turns into 4 queries.  If
> >> the database is across a WAN from the bugzilla instance the effect of
> >> multiple queries where there were one will be more noticable.
> 
> >Sure, more queries is slower.  But FAC would use less queries overall,
> >and simpler ones at that.
> 
> Simpler, I grant you.  Fewer, I don't think so:
> 
> >Consider a simple query on a single custom field "foo" where the user
> >wants bugs where foo=bar.  With FAC, we search for bugs where the "foo"
> >column contains "bar".  With FAD, we look up the field ID for "foo" and
> >then search for bugs where the "field_id" column contains that ID and
> >the "value" column contains "bar" (or do it in one query with a join).
> 
> Querying against N custom fields results in joins against N tables in your
> scheme.  In mine, it results in joins against T(N), the number of distinct
> datatypes among those N fields.  The total number of joins among those
> T(N) tables is N including repeated joins, but I suspect that it is still
> cheaper to access fewer tables.
> 
> Consider also simply retrieving custom field data.  For a bug with twenty
> custom short string fields, your design would require SELECTs against
> twenty different tables; mine requires only one.
> 
> >> Myk - it sounds like you're basing the decision on what way to go here
> >> totally based on performance and I think there's a lot more that
> >> should go into this decision.
> 
> >To the contrary, my proposal is based on much more than performance.  My
> >previous email was about performance only because that was Sean's
> >primary argument against it (he thought my proposal would be slower and
> >offered data to support his conclusion--I ran his tests myself and found
> >the opposite was true).
> 
> I haven't analyzed your tests in detail yet--it's been a hassle getting
> MySQL 4 to peacefully coexist with my previous MySQL 3 system.  (By the
> way,
> if my design ignores thirty years of database theory, as you assert, why
> does yours require a recent version of MySQL to best it?)
> 
> Can you describe exactly what was wrong with my test that it went from
> being
> 3-4 times better to being nearly an order of magnitude worse?  I frankly
> find that hard to believe.
> 
> >I think we should use real columns for custom fields because:
> >
> >   1. that's what they're there for;
> >
> >      Custom fields are no different from standard fields in how they're
> >      used (queried, displayed, updated, etc.), and columns were
> >      designed for this express purpose when database systems were
> >      developed.  Given that they've been used to represent "fields" of
> >      all kinds for decades, and that we've used them in Bugzilla to
> >      represent the standard fields for over five years, they're a
> >      mature and proven technology for doing what we want and likely to
> >      be better than any new mechanism we come up with which represents
> >      fields as data.
> 
> I posted to comp.databases yesterday seeking advice regarding the merits
> of
> our two designs.  The subject of the thread is "Best database schema for
> object fields".  To date, the only poster to offer substantial criticism
> has
> stated "They both stink", but he did provide the useful information that
> the
> model I implemented has a name, EAV, or "Entity-Attribute-Value".  Armed
> with that knowledge, I was able to Google several articles on the subject.
> A few were highly critical of EAV, but most were more balanced, listing
> advantages and disadvantages, and describing in what situations it's
> appropriate.  In all cases, though, the data model EAV was compared
> against
> was the classic all-columns-in-one-table approach; I could find no example
> of your one-table-per-field design.  (The crotchety poster in
> comp.databases
> described both of our designs as variants of EAV, but I can't really see
> how
> that's true.)  So, it's hard to accept your assertion that
> one-field-per-table is "what columns are for".  Can you refer me to any
> systems that use your design?
> 
> >   2. then they work the same as standard fields;
> >
> >      Custom fields and standard fields are both used (queried,
> >      displayed, updated) in much the same way, and using the same
> >      technology to store them means we can use the same code in many
> >      cases (and the same kind of code in others) to access and
> >      manipulate them, making the source simpler, more robust, and
> >      easier to develop.
> 
> If I'm not mistaken, the long-term plan is to migrate standard fields to
> custom fields, so short-term discrepancies are not really relevant.
> Besides--yet again!--your design treats custom fields very differently
> than
> standard fields.
> 
> >   3. it makes them significantly faster;
> >
> >      Per my tests and standard database design theory, real columns are
> >      much faster than data columns.
> 
> Again, I find this hard to believe.  I suspect either some flaw in your
> test
> program, or some unfair advantage in the limited nature of the tests.
> 
> 
> --Sean
> -
> To view or change your list settings, click here:
> <http://bugzilla.org/cgi-bin/mj_wwwusr?user=kevin.benton@amd.com>