Custom fields schema

Wed Jan 26 18:37:22 UTC 2005

Responding to several messages at once:

Myk:

>*sigh*, ok, I'll do some testing next week when I get back from 
>vacation, although I really think the burden of proof should be on you, 
>given that you're the one suggesting we overturn thirty years of RDBMS 
>and relational database theory, design, and practical usage,

OK, I take exception to this.  Am I really such a maverick?  Most of the
people who have stated an opinion on the subject have expressed approval
of my design.  Can we all be as naive as you say?

>not to 
>mention six years of Bugzilla development techniques.

Custom fields are a completely new kind of beast.  There is no precedent for
them in Bugzilla's development history.  (Not in the core distribution,
anyway; some partial solutions are attached to bug 91037.  I don't think
that's what you're talking about, though.)  There's no a priori reason to
apply past Bugzilla development techniques to them.

Later:

>Custom fields are exactly what relational database columns were designed 
>for; they fit perfectly into the column metaphor, just as the standard 
>Bugzilla fields do.

What is this "column metaphor"?  Your design treats custom fields very
differently than standard fields, applying more of a "table metaphor".

>They're modifiable via SQL statements just as 
>easily as the data within them is.  And while Bugzilla doesn't modify 
>its schema very much today, there's nothing inherently more dangerous 
>about it doing so.

But much less elegant.  To paraphrase Einstein, I think the schema ought to
be as simple as possible, but no simpler.  Transmeta's Bugzilla installation
has 187 custom fields.  A schema with in excess of 250 tables is not simple.
Imagine trying to manage a schema of that size with a visual tool!

Later, responding to Christopher Hicks:

>That doesn't mean we should store all meta-data as data.  We should use 
>the right tool for the job, as we have already done with standard 
>fields, for which we rightly use columns.

Yes, that is right, because all bugs share the same standard fields.  That
condition is violated by custom fields.  And, again, your proposal
implements custom fields very differently from the way standard fields are
implemented, anyway.

>> (4) Since queries involving custom fields are going to take a few 
>> database hits to figure out what the field names so the query could be 
>> written you end up with cases where 1 query turns into 4 queries.  If 
>> the database is across a WAN from the bugzilla instance the effect of 
>> multiple queries where there were one will be more noticable.

>Sure, more queries is slower.  But FAC would use less queries overall, 
>and simpler ones at that.

Simpler, I grant you.  Fewer, I don't think so:

>Consider a simple query on a single custom field "foo" where the user 
>wants bugs where foo=bar.  With FAC, we search for bugs where the "foo" 
>column contains "bar".  With FAD, we look up the field ID for "foo" and 
>then search for bugs where the "field_id" column contains that ID and 
>the "value" column contains "bar" (or do it in one query with a join).

Querying against N custom fields results in joins against N tables in your
scheme.  In mine, it results in joins against T(N), the number of distinct
datatypes among those N fields.  The total number of joins among those
T(N) tables is N including repeated joins, but I suspect that it is still
cheaper to access fewer tables.

Consider also simply retrieving custom field data.  For a bug with twenty
custom short string fields, your design would require SELECTs against
twenty different tables; mine requires only one.

>> Myk - it sounds like you're basing the decision on what way to go here 
>> totally based on performance and I think there's a lot more that 
>> should go into this decision.

>To the contrary, my proposal is based on much more than performance.  My 
>previous email was about performance only because that was Sean's 
>primary argument against it (he thought my proposal would be slower and 
>offered data to support his conclusion--I ran his tests myself and found 
>the opposite was true).

I haven't analyzed your tests in detail yet--it's been a hassle getting
MySQL 4 to peacefully coexist with my previous MySQL 3 system.  (By the way,
if my design ignores thirty years of database theory, as you assert, why
does yours require a recent version of MySQL to best it?)

Can you describe exactly what was wrong with my test that it went from being
3-4 times better to being nearly an order of magnitude worse?  I frankly
find that hard to believe.

>I think we should use real columns for custom fields because:
>
>   1. that's what they're there for;
>
>      Custom fields are no different from standard fields in how they're
>      used (queried, displayed, updated, etc.), and columns were
>      designed for this express purpose when database systems were
>      developed.  Given that they've been used to represent "fields" of
>      all kinds for decades, and that we've used them in Bugzilla to
>      represent the standard fields for over five years, they're a
>      mature and proven technology for doing what we want and likely to
>      be better than any new mechanism we come up with which represents
>      fields as data.

I posted to comp.databases yesterday seeking advice regarding the merits of
our two designs.  The subject of the thread is "Best database schema for
object fields".  To date, the only poster to offer substantial criticism has
stated "They both stink", but he did provide the useful information that the
model I implemented has a name, EAV, or "Entity-Attribute-Value".  Armed
with that knowledge, I was able to Google several articles on the subject.
A few were highly critical of EAV, but most were more balanced, listing
advantages and disadvantages, and describing in what situations it's
appropriate.  In all cases, though, the data model EAV was compared against
was the classic all-columns-in-one-table approach; I could find no example
of your one-table-per-field design.  (The crotchety poster in comp.databases
described both of our designs as variants of EAV, but I can't really see how
that's true.)  So, it's hard to accept your assertion that
one-field-per-table is "what columns are for".  Can you refer me to any
systems that use your design?

>   2. then they work the same as standard fields;
>
>      Custom fields and standard fields are both used (queried,
>      displayed, updated) in much the same way, and using the same
>      technology to store them means we can use the same code in many
>      cases (and the same kind of code in others) to access and
>      manipulate them, making the source simpler, more robust, and
>      easier to develop.

If I'm not mistaken, the long-term plan is to migrate standard fields to
custom fields, so short-term discrepancies are not really relevant.
Besides--yet again!--your design treats custom fields very differently than
standard fields.

>   3. it makes them significantly faster;
>
>      Per my tests and standard database design theory, real columns are
>      much faster than data columns.

Again, I find this hard to believe.  I suspect either some flaw in your test
program, or some unfair advantage in the limited nature of the tests.

--Sean