Custom fields schema
Myk Melez
myk at mozilla.org
Tue Jan 25 00:23:59 UTC 2005
Christopher Hicks wrote:
> Bah. Your conclusion of what's necessary here seems to be based on a
> very selective view of the universe. Several "necessities" stick out
> when considering a custom fields implemention based on abstraction at
> the database level:
>
> (1) People can add as many custom fields as they want without worrying
> about reaching the maximum record size of their database
Under my fields-as-columns (FAC) proposal, fields can live in their own
tables, and there is no limit to the number of tables per database in
MySQL, so this isn't an issue between my proposal and Sean's
fields-as-data (FAD) proposal.
> (2) Code for dealing with custom fields is going to need to have a
> goodly portion of the abstraction tables for keeping track of stuff
> anyways.
I'm not sure what you mean by abstraction tables, but both proposals
will require some meta-data about fields to be stored in fielddefs, just
as we already do for standard fields, and will store lists of possible
values for some fields in separate tables, just as we already do for
standard fields like component and should be doing for fields like op_sys.
That doesn't mean we should store all meta-data as data. We should use
the right tool for the job, as we have already done with standard
fields, for which we rightly use columns.
> (3) Since queries that involve custom fields will now have to be
> written on the fly they are less able to be optimized by using a
> database that can deal with prepare() usefully.
How does FAC require queries to be written "on the fly" in a less
optimizable way, reducing performance on prepare()-happy databases? Can
you demonstrate this?
> (4) Since queries involving custom fields are going to take a few
> database hits to figure out what the field names so the query could be
> written you end up with cases where 1 query turns into 4 queries. If
> the database is across a WAN from the bugzilla instance the effect of
> multiple queries where there were one will be more noticable.
Sure, more queries is slower. But FAC would use less queries overall,
and simpler ones at that.
Consider a simple query on a single custom field "foo" where the user
wants bugs where foo=bar. With FAC, we search for bugs where the "foo"
column contains "bar". With FAD, we look up the field ID for "foo" and
then search for bugs where the "field_id" column contains that ID and
the "value" column contains "bar" (or do it in one query with a join).
Even in the worst case, when you couldn't infer the column name from the
form field name, FAC lookups would only be equal to, not worse than, FAD
lookups. And if these lookups mattered (which Sean claims they don't),
the list of custom fields and associated identifiers would get cached
under either proposal the same way we cache components and versions.
> (5) Custom fields should be able to be implemented without the
> bugzilla user having database privs to alter tables.
FAD is even more insecure in this regard, since a compromised Bugzilla
user account that wasn't allowed to alter tables would still be able to
alter custom fields under that proposal (unless we implemented
table/column-specific privileges for that account, which would be more
work and complexity--and thus risk).
Nevertheless, note that the Bugzilla user account already has such
privileges today (checksetup.pl uses them to set up and update the
database schema), and even if we took them away, the Bugzilla
administrators, to whom we will entrust the creation of custom fields,
will certainly retain those privileges via a separate account.
> Myk - it sounds like you're basing the decision on what way to go here
> totally based on performance and I think there's a lot more that
> should go into this decision.
To the contrary, my proposal is based on much more than performance. My
previous email was about performance only because that was Sean's
primary argument against it (he thought my proposal would be slower and
offered data to support his conclusion--I ran his tests myself and found
the opposite was true).
I think we should use real columns for custom fields because:
1. that's what they're there for;
Custom fields are no different from standard fields in how they're
used (queried, displayed, updated, etc.), and columns were
designed for this express purpose when database systems were
developed. Given that they've been used to represent "fields" of
all kinds for decades, and that we've used them in Bugzilla to
represent the standard fields for over five years, they're a
mature and proven technology for doing what we want and likely to
be better than any new mechanism we come up with which represents
fields as data.
2. then they work the same as standard fields;
Custom fields and standard fields are both used (queried,
displayed, updated) in much the same way, and using the same
technology to store them means we can use the same code in many
cases (and the same kind of code in others) to access and
manipulate them, making the source simpler, more robust, and
easier to develop.
3. it makes them significantly faster;
Per my tests and standard database design theory, real columns are
much faster than data columns.
-myk
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.bugzilla.org/pipermail/developers/attachments/20050124/c8f80577/attachment.html>
More information about the developers
mailing list