Custom fields schema

Tue Jan 25 05:28:39 UTC 2005

I agree with Myk. The FAC proposal makes more sense for me, from all 
points of view.

Vlad.

Myk Melez wrote:

> Christopher Hicks wrote:
>
>> Bah.  Your conclusion of what's necessary here seems to be based on a 
>> very selective view of the universe.  Several "necessities" stick out 
>> when considering a custom fields implemention based on abstraction at 
>> the database level:
>>
>> (1) People can add as many custom fields as they want without 
>> worrying about reaching the maximum record size of their database
>
> Under my fields-as-columns (FAC) proposal, fields can live in their 
> own tables, and there is no limit to the number of tables per database 
> in MySQL, so this isn't an issue between my proposal and Sean's 
> fields-as-data (FAD) proposal.
>
>> (2) Code for dealing with custom fields is going to need to have a 
>> goodly portion of the abstraction tables for keeping track of stuff 
>> anyways.
>
> I'm not sure what you mean by abstraction tables, but both proposals 
> will require some meta-data about fields to be stored in fielddefs, 
> just as we already do for standard fields, and will store lists of 
> possible values for some fields in separate tables, just as we already 
> do for standard fields like component and should be doing for fields 
> like op_sys.
>
> That doesn't mean we should store all meta-data as data.  We should 
> use the right tool for the job, as we have already done with standard 
> fields, for which we rightly use columns.
>
>> (3) Since queries that involve custom fields will now have to be 
>> written on the fly they are less able to be optimized by using a 
>> database that can deal with prepare() usefully.
>
> How does FAC require queries to be written "on the fly" in a less 
> optimizable way, reducing performance on prepare()-happy databases?  
> Can you demonstrate this?
>
>> (4) Since queries involving custom fields are going to take a few 
>> database hits to figure out what the field names so the query could 
>> be written you end up with cases where 1 query turns into 4 queries.  
>> If the database is across a WAN from the bugzilla instance the effect 
>> of multiple queries where there were one will be more noticable.
>
> Sure, more queries is slower.  But FAC would use less queries overall, 
> and simpler ones at that.
>
> Consider a simple query on a single custom field "foo" where the user 
> wants bugs where foo=bar.  With FAC, we search for bugs where the 
> "foo" column contains "bar".  With FAD, we look up the field ID for 
> "foo" and then search for bugs where the "field_id" column contains 
> that ID and the "value" column contains "bar" (or do it in one query 
> with a join).
>
> Even in the worst case, when you couldn't infer the column name from 
> the form field name, FAC lookups would only be equal to, not worse 
> than, FAD lookups.  And if these lookups mattered (which Sean claims 
> they don't), the list of custom fields and associated identifiers 
> would get cached under either proposal the same way we cache 
> components and versions.
>
>> (5) Custom fields should be able to be implemented without the 
>> bugzilla user having database privs to alter tables.
>
> FAD is even more insecure in this regard, since a compromised Bugzilla 
> user account that wasn't allowed to alter tables would still be able 
> to alter custom fields under that proposal (unless we implemented 
> table/column-specific privileges for that account, which would be more 
> work and complexity--and thus risk).
>
> Nevertheless, note that the Bugzilla user account already has such 
> privileges today (checksetup.pl uses them to set up and update the 
> database schema), and even if we took them away, the Bugzilla 
> administrators, to whom we will entrust the creation of custom fields, 
> will certainly retain those privileges via a separate account.
>
>> Myk - it sounds like you're basing the decision on what way to go 
>> here totally based on performance and I think there's a lot more that 
>> should go into this decision.
>
> To the contrary, my proposal is based on much more than performance.  
> My previous email was about performance only because that was Sean's 
> primary argument against it (he thought my proposal would be slower 
> and offered data to support his conclusion--I ran his tests myself and 
> found the opposite was true).
>
> I think we should use real columns for custom fields because:
>
>    1. that's what they're there for;
>
>       Custom fields are no different from standard fields in how
>       they're used (queried, displayed, updated, etc.), and columns
>       were designed for this express purpose when database systems
>       were developed.  Given that they've been used to represent
>       "fields" of all kinds for decades, and that we've used them in
>       Bugzilla to represent the standard fields for over five years,
>       they're a mature and proven technology for doing what we want
>       and likely to be better than any new mechanism we come up with
>       which represents fields as data.
>
>    2. then they work the same as standard fields;
>
>       Custom fields and standard fields are both used (queried,
>       displayed, updated) in much the same way, and using the same
>       technology to store them means we can use the same code in many
>       cases (and the same kind of code in others) to access and
>       manipulate them, making the source simpler, more robust, and
>       easier to develop.
>
>    3. it makes them significantly faster;
>
>       Per my tests and standard database design theory, real columns
>       are much faster than data columns.
>
> -myk
>