Custom fields schema

Tue Jan 25 00:23:59 UTC 2005

Christopher Hicks wrote:

> Bah.  Your conclusion of what's necessary here seems to be based on a 
> very selective view of the universe.  Several "necessities" stick out 
> when considering a custom fields implementation based on abstraction at 
> the database level:
>
> (1) People can add as many custom fields as they want without worrying 
> about reaching the maximum record size of their database

Under my fields-as-columns (FAC) proposal, fields can live in their own 
tables, and there is no limit to the number of tables per database in 
MySQL, so this isn't an issue between my proposal and Sean's 
fields-as-data (FAD) proposal.
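
A minimal sketch of what "fields in their own tables" could look like, using Python's sqlite3 module purely for illustration (the point above is about MySQL, and none of this is Bugzilla code). The table names cf_foo and cf_qux, the field names foo and qux, and the helper bug_with_custom_fields are all hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, short_desc TEXT);

    -- Hypothetical layout: each custom field is a real column in its own
    -- table, so adding a field never widens the bugs row.
    CREATE TABLE cf_foo (bug_id INTEGER PRIMARY KEY REFERENCES bugs(bug_id),
                         foo TEXT);
    CREATE TABLE cf_qux (bug_id INTEGER PRIMARY KEY REFERENCES bugs(bug_id),
                         qux TEXT);

    INSERT INTO bugs VALUES (1, 'crash on start'), (2, 'typo in docs');
    INSERT INTO cf_foo VALUES (1, 'bar');
    INSERT INTO cf_qux VALUES (2, 'quux');
""")

def bug_with_custom_fields(bug_id):
    """Fetch one bug plus its custom fields; unset fields come back as None."""
    return conn.execute(
        """SELECT b.bug_id, b.short_desc, f.foo, q.qux
             FROM bugs b
             LEFT JOIN cf_foo f ON f.bug_id = b.bug_id
             LEFT JOIN cf_qux q ON q.bug_id = b.bug_id
            WHERE b.bug_id = ?""",
        (bug_id,),
    ).fetchone()
```

Each new field costs one more table rather than more bytes in the bugs record, which is why the record-size limit does not come into play under this layout.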

> (2) Code for dealing with custom fields is going to need to have a 
> goodly portion of the abstraction tables for keeping track of stuff 
> anyways.

I'm not sure what you mean by abstraction tables, but both proposals 
will require some meta-data about fields to be stored in fielddefs, just 
as we already do for standard fields, and will store lists of possible 
values for some fields in separate tables, just as we already do for 
standard fields like component and should be doing for fields like op_sys.

That doesn't mean we should store all meta-data as data.  We should use 
the right tool for the job, as we have already done with standard 
fields, for which we rightly use columns.

> (3) Since queries that involve custom fields will now have to be 
> written on the fly they are less able to be optimized by using a 
> database that can deal with prepare() usefully.

How does FAC require queries to be written "on the fly" in a less 
optimizable way, reducing performance on prepare()-happy databases?  Can 
you demonstrate this?

> (4) Since queries involving custom fields are going to take a few 
> database hits to figure out the field names so the query can be 
> written, you end up with cases where 1 query turns into 4 queries.  If 
> the database is across a WAN from the bugzilla instance the effect of 
> multiple queries where there was one will be more noticeable.

Sure, more queries is slower.  But FAC would use fewer queries overall, 
and simpler ones at that.

Consider a simple query on a single custom field "foo" where the user 
wants bugs where foo=bar.  With FAC, we search for bugs where the "foo" 
column contains "bar".  With FAD, we look up the field ID for "foo" and 
then search for bugs where the "field_id" column contains that ID and 
the "value" column contains "bar" (or do it in one query with a join).
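
The two lookups just described can be sketched side by side. This is an illustration only, written against Python's sqlite3 module rather than Bugzilla's real schema; the table names bugs_fac and custom_values and the helpers search_fac and search_fad are assumptions, while fielddefs, field_id, and value follow the names used in this thread:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- FAC: the custom field "foo" is a real column.
    CREATE TABLE bugs_fac (bug_id INTEGER PRIMARY KEY, foo TEXT);
    INSERT INTO bugs_fac VALUES (1, 'bar'), (2, 'baz'), (3, 'bar');

    -- FAD: field definitions and field values are both stored as rows.
    CREATE TABLE fielddefs (field_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE custom_values (bug_id INTEGER, field_id INTEGER, value TEXT);
    INSERT INTO fielddefs VALUES (10, 'foo'), (11, 'other');
    INSERT INTO custom_values VALUES
        (1, 10, 'bar'), (2, 10, 'baz'), (3, 10, 'bar'), (2, 11, 'bar');
""")

def search_fac(value):
    """FAC: match directly on the "foo" column."""
    rows = conn.execute(
        "SELECT bug_id FROM bugs_fac WHERE foo = ? ORDER BY bug_id",
        (value,),
    )
    return [row[0] for row in rows]

def search_fad(field_name, value):
    """FAD: resolve the field name to its ID via a join, then match the value."""
    rows = conn.execute(
        """SELECT v.bug_id
             FROM custom_values v
             JOIN fielddefs f ON f.field_id = v.field_id
            WHERE f.name = ? AND v.value = ?
            ORDER BY v.bug_id""",
        (field_name, value),
    )
    return [row[0] for row in rows]
```

Both functions return the same bugs for foo=bar; the difference is that the FAD version needs the extra join (or a separate ID lookup) to get there.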

Even in the worst case, when you couldn't infer the column name from the 
form field name, FAC lookups would only be equal to, not worse than, FAD 
lookups.  And if these lookups mattered (which Sean claims they don't), 
the list of custom fields and associated identifiers would get cached 
under either proposal the same way we cache components and versions.
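
As a rough sketch of the caching mentioned above, assuming a fielddefs table that maps field names to identifiers: the FieldCache class, its db_hits counter, and the sample rows are hypothetical and only show that the lookup hits the database once, whichever proposal supplies the identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fielddefs (field_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    INSERT INTO fielddefs VALUES (10, 'foo'), (11, 'severity_detail');
""")

class FieldCache:
    """Loads field metadata once and answers later lookups from memory."""

    def __init__(self, connection):
        self._conn = connection
        self._by_name = None
        self.db_hits = 0  # counts how often we actually query the database

    def field_id(self, name):
        if self._by_name is None:
            self.db_hits += 1
            rows = self._conn.execute("SELECT name, field_id FROM fielddefs")
            self._by_name = dict(rows)  # each row is a (name, field_id) pair
        return self._by_name[name]

cache = FieldCache(conn)
```

After the first call, every further field_id() lookup is a dictionary access, so the extra round trips raised in point (4) would apply only to the first request.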

> (5) Custom fields should be able to be implemented without the 
> bugzilla user having database privs to alter tables.

FAD is even more insecure in this regard, since a compromised Bugzilla 
user account that wasn't allowed to alter tables would still be able to 
alter custom fields under that proposal (unless we implemented 
table/column-specific privileges for that account, which would be more 
work and complexity--and thus risk).

Nevertheless, note that the Bugzilla user account already has such 
privileges today (checksetup.pl uses them to set up and update the 
database schema), and even if we took them away, the Bugzilla 
administrators, to whom we will entrust the creation of custom fields, 
will certainly retain those privileges via a separate account.

> Myk - it sounds like you're basing the decision on which way to go here 
> entirely on performance, and I think there's a lot more that 
> should go into this decision.

To the contrary, my proposal is based on much more than performance.  My 
previous email was about performance only because that was Sean's 
primary argument against it (he thought my proposal would be slower and 
offered data to support his conclusion--I ran his tests myself and found 
the opposite was true).

I think we should use real columns for custom fields because:

   1. that's what they're there for;

      Custom fields are no different from standard fields in how they're
      used (queried, displayed, updated, etc.), and columns were
      designed for this express purpose when database systems were
      developed.  Given that they've been used to represent "fields" of
      all kinds for decades, and that we've used them in Bugzilla to
      represent the standard fields for over five years, they're a
      mature and proven technology for doing what we want and likely to
      be better than any new mechanism we come up with which represents
      fields as data.

   2. then they work the same as standard fields;

      Custom fields and standard fields are both used (queried,
      displayed, updated) in much the same way, and using the same
      technology to store them means we can use the same code in many
      cases (and the same kind of code in others) to access and
      manipulate them, making the source simpler, more robust, and
      easier to develop.

   3. it makes them significantly faster;

      Per my tests and standard database design theory, fields stored
      as real columns are much faster to query than fields stored as
      data.

-myk
