Custom fields schema
Vlad Dascalu
vladd at bugzilla.org
Tue Jan 25 05:28:39 UTC 2005
I agree with Myk. The FAC proposal makes more sense for me, from all
points of view.
Vlad.
Myk Melez wrote:
> Christopher Hicks wrote:
>
>> Bah. Your conclusion of what's necessary here seems to be based on a
>> very selective view of the universe. Several "necessities" stick out
>> when considering a custom fields implemention based on abstraction at
>> the database level:
>>
>> (1) People can add as many custom fields as they want without
>> worrying about reaching the maximum record size of their database
>
> Under my fields-as-columns (FAC) proposal, fields can live in their
> own tables, and there is no limit to the number of tables per database
> in MySQL, so this isn't an issue between my proposal and Sean's
> fields-as-data (FAD) proposal.
>
>> (2) Code for dealing with custom fields is going to need to have a
>> goodly portion of the abstraction tables for keeping track of stuff
>> anyways.
>
> I'm not sure what you mean by abstraction tables, but both proposals
> will require some meta-data about fields to be stored in fielddefs,
> just as we already do for standard fields, and will store lists of
> possible values for some fields in separate tables, just as we already
> do for standard fields like component and should be doing for fields
> like op_sys.
>
> That doesn't mean we should store all meta-data as data. We should
> use the right tool for the job, as we have already done with standard
> fields, for which we rightly use columns.
>
>> (3) Since queries that involve custom fields will now have to be
>> written on the fly they are less able to be optimized by using a
>> database that can deal with prepare() usefully.
>
> How does FAC require queries to be written "on the fly" in a less
> optimizable way, reducing performance on prepare()-happy databases?
> Can you demonstrate this?
>
>> (4) Since queries involving custom fields are going to take a few
>> database hits to figure out what the field names so the query could
>> be written you end up with cases where 1 query turns into 4 queries.
>> If the database is across a WAN from the bugzilla instance the effect
>> of multiple queries where there were one will be more noticable.
>
> Sure, more queries is slower. But FAC would use less queries overall,
> and simpler ones at that.
>
> Consider a simple query on a single custom field "foo" where the user
> wants bugs where foo=bar. With FAC, we search for bugs where the
> "foo" column contains "bar". With FAD, we look up the field ID for
> "foo" and then search for bugs where the "field_id" column contains
> that ID and the "value" column contains "bar" (or do it in one query
> with a join).
>
> Even in the worst case, when you couldn't infer the column name from
> the form field name, FAC lookups would only be equal to, not worse
> than, FAD lookups. And if these lookups mattered (which Sean claims
> they don't), the list of custom fields and associated identifiers
> would get cached under either proposal the same way we cache
> components and versions.
>
>> (5) Custom fields should be able to be implemented without the
>> bugzilla user having database privs to alter tables.
>
> FAD is even more insecure in this regard, since a compromised Bugzilla
> user account that wasn't allowed to alter tables would still be able
> to alter custom fields under that proposal (unless we implemented
> table/column-specific privileges for that account, which would be more
> work and complexity--and thus risk).
>
> Nevertheless, note that the Bugzilla user account already has such
> privileges today (checksetup.pl uses them to set up and update the
> database schema), and even if we took them away, the Bugzilla
> administrators, to whom we will entrust the creation of custom fields,
> will certainly retain those privileges via a separate account.
>
>> Myk - it sounds like you're basing the decision on what way to go
>> here totally based on performance and I think there's a lot more that
>> should go into this decision.
>
> To the contrary, my proposal is based on much more than performance.
> My previous email was about performance only because that was Sean's
> primary argument against it (he thought my proposal would be slower
> and offered data to support his conclusion--I ran his tests myself and
> found the opposite was true).
>
> I think we should use real columns for custom fields because:
>
> 1. that's what they're there for;
>
> Custom fields are no different from standard fields in how
> they're used (queried, displayed, updated, etc.), and columns
> were designed for this express purpose when database systems
> were developed. Given that they've been used to represent
> "fields" of all kinds for decades, and that we've used them in
> Bugzilla to represent the standard fields for over five years,
> they're a mature and proven technology for doing what we want
> and likely to be better than any new mechanism we come up with
> which represents fields as data.
>
> 2. then they work the same as standard fields;
>
> Custom fields and standard fields are both used (queried,
> displayed, updated) in much the same way, and using the same
> technology to store them means we can use the same code in many
> cases (and the same kind of code in others) to access and
> manipulate them, making the source simpler, more robust, and
> easier to develop.
>
> 3. it makes them significantly faster;
>
> Per my tests and standard database design theory, real columns
> are much faster than data columns.
>
> -myk
>
More information about the developers
mailing list