The Interesting Installation Integer Problem

Max Kanat-Alexander mkanat at kerio.com
Mon Mar 7 09:32:25 UTC 2005


	I've come across an interesting theoretical problem today while
thinking about our new cross-platform installation.

	Basically, here's the problem:

	We have a new Bugzilla::DB::Schema module coming, which is almost
complete. (See
<https://bugzilla.mozilla.org/show_bug.cgi?id=bz-dbschema>). It maps
"generic" types to database-specific types. It's very cool.

	We, the developers, cannot really know how any given database will map
a "generic" type to a database-specific type. There's just no way, at
this point, to predict that for every database we might ever support.

	For example, for the PostgreSQL patch, all integers will probably be
the same size, namely, "integer."

	Now, guess how checksetup works: it checks to see whether fields have
changed their definition! So let's say we changed a field from a "small
integer" to a "large integer," and then wanted to trigger a change based
on that. On MySQL, the triggered change code would run fine. On
PostgreSQL, the triggered change would never run, because a "small
integer" and a "large integer" are the SAME SIZE.

	Now, what if we had a database system where all variable-length text
fields had the same underlying implementation? That is, where varchar ==
tinytext == mediumtext, etc. We couldn't trigger a change on moving
something from varchar to text.

	Thankfully, this is not an urgent issue. We currently have no other
databases besides MySQL to upgrade. Even when we implement PostgreSQL
support, we will only have to truly upgrade starting with 2.22, although
our first 2.21 release may also have some requirements in that area.

	There are a few choices I can think of that would be entirely
future-proof:

	(1) Store a version number somewhere in the database. When we
upgrade, read that version number and run the correct upgrade code.
(There's a sketch of this after the list.)
	Problem: This makes running checksetup against a CVS tree very
risky, whereas before it was no problem.

	(2) Store the entire current schema itself in the database. That way
we'll always know what the "generic" type of a field is SUPPOSED to be,
even if there's no way to read that information back out of the
database.
	Problem: That's a fair amount of work.
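	For the record, option (1) would look roughly like this (the
bz_schema_version table and the version numbers are invented for
illustration):

    # Option (1), sketched. Assumes $dbh is our already-connected
    # DBI database handle.
    my ($db_version) =
        $dbh->selectrow_array('SELECT version FROM bz_schema_version');

    if ($db_version < 2) {
        # ...run the version 1 -> 2 upgrade code here...
    }
    $dbh->do('UPDATE bz_schema_version SET version = 2');

    # The CVS problem: a checkout between releases has no single
    # version number that accurately describes its schema.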

	Really, to keep our current functionality, I think that #2 is
probably the best solution at this point. With Bugzilla::DB::Schema,
this is somewhat easier. In fact, the easiest way would just be to
store a Data::Dumper->Dump of the generic {schema} hash inside
Bugzilla::DB::Schema (see the sketch after the list below). However,
that requires that the internal data structures of Bugzilla::DB::Schema
either:

	(1) Always stay the same.
	(2) Always provide backwards-compatibility.
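	The Data::Dumper part itself is easy; something like this (assuming
the generic definition lives in a {schema} hash on the Schema object,
which is how I'd expect it to look):

    use Data::Dumper;

    # Dump the generic schema as Perl code that we can store in a
    # table and later eval() back into a hash.
    local $Data::Dumper::Sortkeys = 1;   # stable, diffable output
    my $stored_schema =
        Data::Dumper->Dump([$self->{schema}], ['schema']);

    # $stored_schema now looks like:  $schema = { ... };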

	Storing the schema in the database will also make my life in
implementing cross-db installation a LOT easier. Right now, in order to
get the definition of a table column in a cross-database fashion, I have
to do a lot of hocus-pocus with DBI.
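	For the curious, the hocus-pocus is along these lines --
column_info() is real DBI, but how much of this actually works depends
on the driver:

    # Asking DBI what a column looks like. Assumes $dbh is an
    # already-connected DBI handle; driver support for column_info()
    # is uneven, which is most of the hocus-pocus.
    my $sth  = $dbh->column_info(undef, undef, 'bugs', 'bug_severity');
    my $info = $sth ? $sth->fetchrow_hashref : undef;
    if ($info) {
        # And even then, this is the *database's* type, not our
        # "generic" type.
        print "$info->{TYPE_NAME}($info->{COLUMN_SIZE})\n";
    }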

	Now, any advice on how we can best accomplish all of this in a fully
maintainable and future-proof way is very welcome. :-)

	I think the structures of Bugzilla::DB::Schema (the next version,
not quite this version) should be fine. We'll have to implement
serialize() and deserialize() functions, and when we serialize the
data, we MUST store the version of the schema format along with it. I
think we should use a special DB::Schema version, as opposed to the
Bugzilla version itself, since we care about when the Schema internal
format changes, not when Bugzilla itself changes.
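	Something like this, in other words (a sketch; the names and the
exact stored format are not final):

    use Data::Dumper;

    # Bump this whenever the internal format of {schema} changes.
    use constant SCHEMA_VERSION => 1;

    sub serialize {
        my ($self) = @_;
        # Store the Schema version *with* the data, so deserialize()
        # knows which internal format it is looking at.
        return Data::Dumper->Dump([SCHEMA_VERSION, $self->{schema}],
                                  ['version', 'schema']);
    }

    sub deserialize {
        my ($class, $stored) = @_;
        my ($version, $schema);
        eval $stored;   # recreates $version and $schema
        die "Can't parse stored schema: $@" if $@;
        die "Stored schema version $version is newer than this code"
            if $version > SCHEMA_VERSION;
        # ...backwards-compatibility munging for old $version values
        # would go here...
        return $schema;
    }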

	-Max




