MySQL user's Conference

Fri Apr 18 02:26:08 UTC 2003

Last week I attended the MySQL user's conference.  I attended a number 
of technical sessions on features and how to use them.  I also attended 
sessions on open source licensing strategies and the MySQL development 
methodology and business model, and I got to demonstrate Bugzilla to 
some MySQL AB employees looking into replacements for their current bug 
tracking systems.  Here's a summary of important things I learned:

State of the Code & Future Plans

The project is working on four releases right now: 3.23 (the older 
stable release), 4.0 (the current stable release), 4.1 (in alpha, 
scheduled for beta later this year), and 5.0 (pre-alpha, scheduled for 
alpha soon after 4.1 goes beta).  The 3.23 stable release is still 
supported but will only be updated to fix critical bugs. 

3.23 introduced a number of major features including transactions (in 
the "max" distribution), replication (for mirroring databases on 
multiple machines), and full-text search (Google-like searches).

4.0 expands on these with transactions available in the default 
distribution, much easier replication, and boolean full-text search 
("+some_word -some_other_word") along with multi-table delete (DELETE 
FROM ... JOIN ...).

4.1 will include partial sub-query support, SSL encryption for 
connections to the server, full-text search optimizations, and prepared 
statements (increases efficiency of multiple syntactically-equivalent 
queries run consecutively).

5.0 will have full sub-query support and optimizations, referential 
integrity for MyISAM tables, and possibly stored procedures.  5.0 is the 
first release that will be targeted at the enterprise market.

Planning for 5.1 is underway.

Development Methodology & Goals

Their four "commitments" in order of highest to lowest priority are 
speed, reliability, ease of use, and economy.  Developers can't check in 
code without checking in documentation for it.  Their release cycle has 
four stages:

    * pre-alpha (anything goes);
    * alpha (stable, but features and API may change);
    * beta (no changes to features and API except by request of
      important paying customer);
    * production/stable (only bug fixes).

They have a 30 day development cycle with five objectives for each 
developer per month.  They are looking to shorten their release cycles.  
They use Bitkeeper for source code and revision control because Monty 
(the lead developer) thinks it's better than CVS.

Business

MySQL AB is the company developing MySQL.  They have 65 employees and 12 
full-time developers.  Innobase OY is the company developing the InnoDB 
storage engine.  They have one and a half employees, one of whom is the 
founder.  Because of their dual GPL/commercial license, they get very 
little community development, a common problem with such licenses.  They 
are a developer-driven company, but sales is gaining influence over time.

User-Defined Functions

User-defined functions are useful for things MySQL doesn't provide, 
especially when extracting data from the database, processing it, and 
then putting it back into the database is expensive.  Can be written in 
C++, Perl, or presumably other languages, but doesn't make much sense to 
write them in scripted languages; you might as well just do them in your 
app.  Simple API with f.e. Init(), Request() (called once per row), 
DeInit() functions.  Used by Slashdot to improve performance, f.e. by 
implementing customized high-performance full-text search. 
http://software.tangent.org/

Subqueries

There are two kinds of subqueries: dependent/correlated, which rely on 
tables/columns from parent query (and execute once for each row of the 
parent query), and independent/noncorrelated, which don't and tend to be 
more efficient.  Joins are generally more performant than subqueries at 
the moment in 4.1; subquery optimization is scheduled for 5.0.

Types & Examples (in pseudo-code):

    * derived tables: SELECT * FROM foo, (SELECT ...) (cannot be dependent)
    * expressions (single values or rows): SELECT * FROM addresses WHERE
      state = (SELECT abbr FROM states WHERE name = 'California')
    * booleans: EXISTS, IN, ALL, ANY, SOME (synonym for ANY)
    * UNIQUE, MATCH, and FOR are currently unimplemented

InnoDB

InnoDB is a table type with full ACID (transactional) compliance.  It 
was built for speed by a former mathematician who thought he could beat 
commercial enterprise-class databases at the game.  Unofficial 
benchmarks show up to 2-4 times improvement over commercial 
alternatives.  4.1 will bring performance improvements in edge cases, 
5.0 will bring transaction savepoints.  Developers may implement 
multiple tablespaces so tables can be spread across multiple disks, and 
other plans include BLOB prefix indexes, 2-phase commit, full-text 
indexes, and auto lockless failover in replication.  InnoDB apparently 
has some transaction isolation options PostGreSQL doesn't have (don't 
have details on this).

Perl DBI
(Tim Bunce)

Perl DBI was built for speed and is pretty efficient, but you have to 
know how to use it right.  For speed:

    * use prepare() (doesn't work until MySQL 4.1, but you should use it
      in preparation)
    * use prepare() and execute() in speed-critical loops instead of do()
    * select as reference, especially if columns increase
    * try fetchall_arrayref to see if it's faster
    * use RaiseError instead of checking return values
    * bind columns
    * beware copying data into functions
    * use operators (grep, map) where possible instead of loops
    * profile with # DBI_PROFILE=x test.pl where "x" is 1-6

For reliability:

    * use DBI->connect_cached
    * don't use DBD::mysql < 2.1026 with locks (because of its
      auto-reconnect feature, which you can't disable without recompiling)

Misc:

    * use $h->{TraceLevel} = x where x = 0,1,2,3 etc. to trace execution
    * use Taint, TaintIn, TaintOut settings to turn on/off tainting
    * use $dbh->quote_identifier(space, table, column, attributes) to
      quote "space.table.column" references
    * use DBD::Multiplex to deal with mirrors
    * use $dbh->clone for new database handles
    * finish() is most misunderstood method in API; really means
      discard_pending_results

Optimizing MySQL
(Jeremy Zawodny)

    * MySQL is tuned for small-medium data sets and systems by default. 
      If you have lots of memory you need to configure it to use it.
    * Use MySQL super smack for benchmarking.
    * Don't use bigger columns than you need, use NOT NULL (helps in
      edge cases) and fixed sizes where possible.
    * Compress data if possible.  gzip is quick, cheap, and makes things
      faster, but you can't use it if you want to search the data.
    * InnoDB can be slower with large data than MyISAM, but it's
      generally about as fast.
    * Understand the leftmost prefix rule for indexes, and note that
      MySQL will only ever use one index per table, so you need to
      create compound column indexes if you want it to use two indexes
      (f.e. how about product_id and bug_status in Bugzilla?)
    * Index hashes instead of large values (f.e. URLs).
    * Don't store computable data.
    * Keep clients near the server, since network latency can be a killer.
    * Keep primary keys short for InnoDB.
    * Pick the fastest driver.
    * Simplify WHERE clauses and don't compute in them.
    * The most important variables to tune are the query cache, table
      buffer, and table cache.
    * 2.4.10 - 2.4.15 of Linux kernel are bad for MySQL.
    * Use SQL_CALC_ROWS() and SELECT_FOUND_ROWS().
    * Use a UNION to break a query not using two indexes into two, each
      using one.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.bugzilla.org/pipermail/developers/attachments/20030417/264c3398/attachment.html>