<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

  <title></title>

</head>

<body bgcolor="#ffffff" text="#000000">

Sean McAfee wrote:

<blockquote cite="mid20050126183723.05ECDBC77@mail.schwag.org"

 type="cite">

  <pre wrap="">Myk:

  </pre>

  <blockquote type="cite">

    <pre wrap="">*sigh*, ok, I'll do some testing next week when I get back from 

vacation, although I really think the burden of proof should be on you, 

given that you're the one suggesting we overturn thirty years of RDBMS 

and relational database theory, design, and practical usage,

    </pre>

  </blockquote>

  <pre wrap=""><!---->

OK, I take exception to this.  Am I really such a maverick?  Most of the

people who have stated an opinion on the subject have expressed approval

of my design.  Can we all be as naive as you say?

  </pre>

</blockquote>

I didn't say you or your supporters were naive, nor do I think you

are.  I only said that your suggestion is contrary to proven general

and Bugzilla-specific database design principles, and thus the burden

is on you to prove its superiority.<br>

<br>

And I disagree that most people have approved of your design.  I count

only Joel Peshkin, Maxwell Kanat-Alexander, Shane H. W. Travis,

and Christopher Hicks as having expressed a clear opinion in support of

your solution in this thread, while Gervase Markham and Vlad Dascalu

both seem to oppose it, and John Fisher, Kevin Benton, Bradley Baetz,

and Nick Barnes have not expressed a clear position either way.<br>

<br>

But even if all of those people supported your position, the sample

size is too small for it to prove that your solution is preferable in

the face of credible contrary evidence from the field.  We're a small

group, and we could all be incorrect.<br>

<br>

<blockquote cite="mid20050126183723.05ECDBC77@mail.schwag.org"

 type="cite">

  <pre wrap="">Custom fields are a completely new kind of beast.  There is no precedent for

them in Bugzilla's development history.  (Not in the core distribution,

anyway; some partial solutions are attached to bug 91037.  I don't think

that's what you're talking about, though.)  There's no a priori reason to

apply past Bugzilla development techniques to them.

  </pre>

</blockquote>

Custom fields aren't very different from standard fields, and a number

of our standard fields will become custom fields or reuse custom fields

code once it's done.  But even if they weren't similar, there is plenty

of precedent for them, as installations have been adding custom fields

for years, often with real columns.<br>

<br>

<blockquote cite="mid20050126183723.05ECDBC77@mail.schwag.org"

 type="cite">

  <pre wrap="">What is this "column metaphor"?  Your design treats custom fields very

differently than standard fields, applying more of a "table metaphor".

  </pre>

</blockquote>

My design uses real columns to represent fields, accounting for

sparsity variance by putting dense columns into the bugs table (like

the standard "severity" field, which lives in the bugs.bug_severity

column) and sparse columns into their own tables (like the standard

"duplicate of bug #" field, which lives in the duplicates.dupe_of

column).  In both cases, however, each custom field is represented by

its own unique database column, so the term "column metaphor" is apropos.<br>

<br>
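To make that layout concrete, here is a minimal sketch in Python with
SQLite (not the Perl/MySQL code from my test scripts).  Only
bugs.bug_severity and duplicates.dupe_of are real Bugzilla columns; the
cf_build and cf_customer fields and the key column names are made up
for illustration.<br>
<pre>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bugs (
        bug_id       INTEGER PRIMARY KEY,
        bug_severity TEXT NOT NULL,  -- dense standard field (from the text)
        cf_build     TEXT            -- hypothetical dense custom field
    );
    CREATE TABLE duplicates (        -- sparse standard field (from the text)
        bug_id  INTEGER PRIMARY KEY, -- key column name is an assumption
        dupe_of INTEGER NOT NULL
    );
    CREATE TABLE cf_customer (       -- hypothetical sparse custom field
        bug_id   INTEGER PRIMARY KEY,
        customer TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO bugs VALUES (?, ?, ?)", [
    (1, "critical", "2005-01-20"),
    (2, "normal", "2005-01-21"),
    (3, "critical", "2005-01-22"),
])
conn.execute("INSERT INTO duplicates VALUES (2, 1)")  # bug 2 duplicates bug 1
conn.execute("INSERT INTO cf_customer VALUES (3, 'Acme')")

# A dense field needs no join at all.
dense = conn.execute(
    "SELECT bug_id FROM bugs WHERE bug_severity = 'critical' ORDER BY bug_id"
).fetchall()

# A sparse field costs one join, and only in queries that use it.
sparse = conn.execute("""
    SELECT bugs.bug_id FROM bugs
    JOIN cf_customer ON cf_customer.bug_id = bugs.bug_id
    WHERE cf_customer.customer = 'Acme'
""").fetchall()
print(dense, sparse)
```
</pre>
Either way, each field maps to exactly one real column; the only
choice is which table holds it.<br>
<br>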

<blockquote cite="mid20050126183723.05ECDBC77@mail.schwag.org"

 type="cite">

  <blockquote type="cite">

    <pre wrap="">They're modifiable via SQL statements just as 

easily as the data within them is.  And while Bugzilla doesn't modify 

its schema very much today, there's nothing inherently more dangerous 

about it doing so.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

But much less elegant.  To paraphrase Einstein, I think the schema ought to

be as simple as possible, but no simpler.  Transmeta's Bugzilla installation

has 187 custom fields.  A schema with in excess of 250 tables is not simple.

Imagine trying to manage a schema of that size with a visual tool!

  </pre>

</blockquote>

Elegance, in this case, truly seems to be in the eye of the beholder. 

To my mind, FAC is more elegant because it uses the database system as

it was designed to be used, and that makes it simpler (although not too

simple to be useful).  And while it might be difficult to manage a

schema with 250 tables using certain visual tools (e.g. an ER

diagrammer), it wouldn't be with others (e.g. a list representation of

tables).  Plus, at least it's possible to visualize FAC with standard

visual tools, while it's not possible for FAD.  And visual overload in

ER diagrammers can be overcome by limiting the scope of tables

visualized.<br>

<br>

<blockquote cite="mid20050126183723.05ECDBC77@mail.schwag.org"

 type="cite">

  <pre wrap="">Later, [Myk] responding to Christopher Hicks:

  </pre>

  <blockquote type="cite">

    <pre wrap="">That doesn't mean we should store all meta-data as data.  We should use 

the right tool for the job, as we have already done with standard 

fields, for which we rightly use columns.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Yes, that is right, because all bugs share the same standard fields.  That

condition is violated by custom fields.</pre>

</blockquote>

Actually some standard fields are used only by certain products on some

installations, and some installations never use certain standard

fields.  That hasn't prevented those fields from serving Bugzilla well,

so it is no reason to throw that storage model out (although it's worth

tweaking it to store sparse fields in separate tables and converting

frequently unused fields to custom fields).<br>

<br>

<blockquote cite="mid20050126183723.05ECDBC77@mail.schwag.org"

 type="cite">

  <pre wrap="">And, again, your proposal

implements custom fields very differently from the way standard fields are

implemented, anyway.

  </pre>

</blockquote>

Actually my proposal implements custom fields as columns within the

bugs table or within their own table, just as Bugzilla does today with

standard fields.<br>

<br>

<blockquote cite="mid20050126183723.05ECDBC77@mail.schwag.org"

 type="cite">

  <pre wrap="">Querying against N custom fields results in joins against N tables in your

scheme.  In mine, it results in joins against T(N), the number of distinct

datatypes among those N fields.  The total number of joins among those

T(N) tables is N including repeated joins, but I suspect that it is still

cheaper to access fewer tables.

  </pre>

</blockquote>

My tests, which include the queries you designed to showcase the

performance of your approach, indicate otherwise.<br>

<br>
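For readers following along, the two query shapes being compared look
roughly like this.  It's a sketch in Python with SQLite, assuming two
sparse short-string fields; the table names, field ids, and data are
all hypothetical, and it shows only the shape of the joins, not their
relative speed.<br>
<pre>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- FAC: each sparse field gets its own table with a real column.
    CREATE TABLE cf_customer (bug_id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE cf_vendor   (bug_id INTEGER PRIMARY KEY, value TEXT);
    -- FAD: one table per datatype; field_id says which field a row holds.
    CREATE TABLE cf_shortstring (
        bug_id INTEGER, field_id INTEGER, value TEXT,
        PRIMARY KEY (bug_id, field_id)
    );
""")
CUSTOMER, VENDOR = 1, 2  # hypothetical field ids for the FAD table
rows = [(1, "Acme", "Initech"), (2, "Acme", "Globex"), (3, "Umbrella", "Initech")]
for bug_id, customer, vendor in rows:
    conn.execute("INSERT INTO cf_customer VALUES (?, ?)", (bug_id, customer))
    conn.execute("INSERT INTO cf_vendor VALUES (?, ?)", (bug_id, vendor))
    conn.execute("INSERT INTO cf_shortstring VALUES (?, ?, ?)",
                 (bug_id, CUSTOMER, customer))
    conn.execute("INSERT INTO cf_shortstring VALUES (?, ?, ?)",
                 (bug_id, VENDOR, vendor))

# FAC: two fields, two different tables, one join.
FAC_QUERY = """
    SELECT c.bug_id FROM cf_customer c
    JOIN cf_vendor v ON v.bug_id = c.bug_id
    WHERE c.value = 'Acme' AND v.value = 'Initech'
"""
# FAD: two fields, one table joined to itself, still one join.
FAD_QUERY = """
    SELECT a.bug_id FROM cf_shortstring a
    JOIN cf_shortstring b ON b.bug_id = a.bug_id
    WHERE a.field_id = ? AND a.value = 'Acme'
      AND b.field_id = ? AND b.value = 'Initech'
"""
fac_result = conn.execute(FAC_QUERY).fetchall()
fad_result = conn.execute(FAD_QUERY, (CUSTOMER, VENDOR)).fetchall()
print(fac_result, fad_result)
```
</pre>
Both queries perform the same number of joins and return the same
bugs; the difference is whether the joins hit distinct small tables or
the same large table repeatedly.<br>
<br>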

<blockquote cite="mid20050126183723.05ECDBC77@mail.schwag.org"

 type="cite">

  <pre wrap="">Consider also simply retrieving custom field data.  For a bug with twenty

custom short string fields, your design would require SELECTs against

twenty different tables; mine requires only one.

  </pre>

</blockquote>

This isn't strictly accurate, since under my proposal only sparse

fields will live in separate tables.  But even in a worst-case

scenario, the cost of retrieving data for a single bug is insignificant

under both designs.<br>

<br>

<blockquote cite="mid20050126183723.05ECDBC77@mail.schwag.org"

 type="cite">

  <pre wrap="">I haven't analyzed your tests in detail yet--it's been a hassle getting

MySQL 4 to peacefully coexist with my previous MySQL 3 system.  (By the way,

if my design ignores thirty years of database theory, as you assert, why

does yours require a recent version of MySQL to best it?)

  </pre>

</blockquote>

First of all, I said your design contradicts thirty years of database

design theory, not performance optimization principles.  As I said from

the beginning, performance isn't my only consideration when choosing a

design.  Nevertheless, I expect a design conformant with standard

database design principles to perform better, too, and my tests show

that it does, even on MySQL 3.x (see below).<br>

<br>

Second of all, my design does not require a recent version of MySQL.  I

suggested MySQL 4.x+ because 3.x contains an inefficient fulltext

indexer, so the fulltext indexes take too long to create on tables of

the size in the test.  But you can run my tests on MySQL 3.x without

creating fulltext indexes, and the results are the same: FAC wins in

most cases, and with bigger margins.<br>

<br>

(I've attached new versions of construct-tables.pl and

test-performance.pl which don't create/use fulltext indexes by default

(specify "--fulltext" on the command line to turn them back on) and

work with MySQL 3.x, along with some test results from the machine

"myk".)<br>

<br>

Third of all, MySQL 3.x search results are relatively unimportant,

because MySQL AB no longer recommends 3.x except in special cases (they

now recommend 4.1, two generations newer than 3.x), Bugzilla will

require MySQL 4.x in the near future (probably before a custom fields

implementation lands), and 4.x is already preferable for Bugzilla today

due to stability and fulltext performance.<br>

<br>

<blockquote cite="mid20050126183723.05ECDBC77@mail.schwag.org"

 type="cite">

  <pre wrap="">Can you describe exactly what was wrong with my test that it went from being

3-4 times better to being nearly an order of magnitude worse?  I frankly

find that hard to believe.

  </pre>

</blockquote>

I cannot.  I took the exact queries you ran, fixed a number of syntax

errors in them that prevented them from running at all on my machines,

plugged them into an automated script that runs each query six times

(discarding the first result and averaging the rest), and flagged them

SQL_NO_CACHE to prevent the cache from skewing the results (but this

would not have affected comparisons between your tests and mine, since

MySQL version 3.x doesn't have a query cache).<br>

<br>
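The timing procedure amounts to something like the following sketch.
It is written in Python against SQLite rather than being the actual
test-performance.pl code, so the function names and sample table are
assumptions, and SQL_NO_CACHE (a MySQL keyword) has no counterpart
here.<br>
<pre>
```python
import sqlite3
import time

def average_after_warmup(timings):
    """Discard the first timing and average the rest."""
    rest = timings[1:]
    if not rest:
        raise ValueError("need at least two timings")
    return sum(rest) / len(rest)

def time_query(conn, sql, runs=6):
    """Run sql `runs` times; return the mean time of all but the first run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch all rows so the query fully runs
        timings.append(time.perf_counter() - start)
    return average_after_warmup(timings)

# Hypothetical sample data: every tenth bug is critical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, bug_severity TEXT)")
conn.executemany(
    "INSERT INTO bugs VALUES (?, ?)",
    [(i, "critical" if i % 10 == 0 else "normal") for i in range(1, 1001)],
)
elapsed = time_query(conn, "SELECT bug_id FROM bugs WHERE bug_severity = 'critical'")
print(f"average over 5 timed runs: {elapsed:.6f}s")
```
</pre>
The point of routing every query through one function is that FAC and
FAD queries are timed in exactly the same way.<br>
<br>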

I ran the script on two different machines (holepunch and megalon) and

reported their results.  I subsequently ran the script on a third

machine (myk) and got similar results.  I have since run the tests

against MySQL version 3.x on myk (skipping the fulltext index tests)

and got similar results (attached).<br>

<br>

So I cannot explain the variance, but I'm confident in the reliability

of my testing code and results.  My tests also appear to be more automated

than yours and have been run on more machines, so I think it's your results

that are errant.  In any case, I'm happy to provide all comers with

the scripts for duplicating my tests, and I look forward to any

additional results (and tests, and refinements of tests) anyone else

can provide.<br>

<br>

<blockquote cite="mid20050126183723.05ECDBC77@mail.schwag.org"

 type="cite">

  <pre wrap="">I posted to comp.databases yesterday seeking advice regarding the merits of

our two designs.  The subject of the thread is "Best database schema for

object fields".  To date, the only poster to offer substantial criticism has

stated "They both stink", but he did provide the useful information that the

model I implemented has a name, EAV, or "Entity-Attribute-Value".</pre>

</blockquote>

<a class="moz-txt-link-freetext" href="http://groups-beta.google.com/group/comp.databases/browse_frm/thread/aa5eca674b5a2073/a38a196ace5ef6b5#a38a196ace5ef6b5">http://groups-beta.google.com/group/comp.databases/browse_frm/thread/aa5eca674b5a2073/a38a196ace5ef6b5#a38a196ace5ef6b5</a><br>

<br>

Actually Celko said that both models are EAV, but he bases that claim

on an error in your explanation of FAC which I've corrected in a

followup.  In reality, FAC is a standard ER modeling approach.<br>

<br>

<blockquote cite="mid20050126183723.05ECDBC77@mail.schwag.org"

 type="cite">

  <pre wrap="">Armed

with that knowledge, I was able to Google several articles on the subject.

A few were highly critical of EAV, but most were more balanced, listing

advantages and disadvantages, and describing in what situations it's

appropriate.</pre>

</blockquote>

I did a similar search and found that EAV modeling has applicability in

some specialized scientific and medical problem domains which

experience extremely numerous, highly variable, and very sparse

fields.  It is, however, rarely used otherwise, while FAC, which models

dense attributes as bug table columns and sparse attributes as columns

in separate tables, employs the standard ER modeling approach that is

used by the majority of relational databases, especially those

like Bugzilla.<br>

<br>

<blockquote cite="mid20050126183723.05ECDBC77@mail.schwag.org"

 type="cite">

  <pre wrap="">In all cases, though, the data model EAV was compared against

was the classic all-columns-in-one-table approach; I could find no example

of your one-table-per-field design.  (The crotchety poster in comp.databases

described both of our designs as variants of EAV, but I can't really see how

that's true.)  So, it's hard to accept your assertion that

one-field-per-table is "what columns are for".  Can you refer me to any

systems that use your design?

  </pre>

</blockquote>

Bugzilla, Bonsai, and Despot, to name only some Mozilla apps.  But of

course now I'm talking about the ER modeling approach outlined above,

not some one-table-per-field approach which I have never advocated.<br>

<br>

<blockquote cite="mid20050126183723.05ECDBC77@mail.schwag.org"

 type="cite">

  <pre wrap="">If I'm not mistaken, the long-term plan is to migrate standard fields to

custom fields, so short-term discrepancies are not really relevant.

  </pre>

</blockquote>

Actually the long-term plan is to migrate some but not all standard

fields to custom fields (some fields cannot be custom because their

processing is too specialized, while others are common to [virtually]

all bug tracking and so should be standard), and which fields are

standard/custom can and will change over time (in both directions), so

discrepancies between the two will remain relevant in the long term as well.<br>

<br>

<blockquote cite="mid20050126183723.05ECDBC77@mail.schwag.org"

 type="cite">

  <blockquote type="cite">

    <pre wrap="">     Per my tests and standard database design theory, real columns are

     much faster than data columns.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Again, I find this hard to believe.  I suspect either some flaw in your test

program, or some unfair advantage in the limited nature of the tests.

  </pre>

</blockquote>

Perhaps.  I doubt the flaw is in my test program, since it uses the

same function to run and time all queries, whether FAC or FAD.  But the

tests it runs are indeed pretty limited (consisting only of your tests

and a few of my own), so it's quite possible for them to be unfairly

advantageous--in either direction.  I welcome any improvement in them,

but I still think the onus should be on you to prove EAV superiority,

not on me to disprove it, given its nonstandard approach, and

especially given documentation on the web about its niche value for

certain extreme applications.<br>

<br>

-myk<br>

<br>

</body>

</html>