<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
<title></title>
</head>
<body>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
<title></title>
Last week I attended the MySQL user's conference. I attended a number
of technical sessions on features and how to use them. I also attended
sessions on open source licensing strategies and the MySQL development
methodology and business model, and I got to demonstrate Bugzilla to
some MySQL AB employees looking into replacements for their current bug
tracking systems. Here's a summary of important things I learned:<br>
<br>
<br>
State of the Code & Future Plans<br>
<br>
The project is working on four releases right now: 3.23 (the older
stable release), 4.0 (the current stable release), 4.1 (in alpha,
scheduled for beta later this year), and 5.0 (pre-alpha, scheduled for
alpha soon after 4.1 goes beta). The 3.23 stable release is still
supported but will only be updated to fix critical bugs. <br>
<br>
3.23 introduced a number of major features including transactions (in
the "max" distribution), replication (for mirroring databases on
multiple machines), and full-text search (Google-like searches).<br>
<br>
4.0 expands on these with transactions available in the default
distribution, much easier replication, and boolean full-text search
("+some_word -some_other_word") along with multi-table delete (DELETE
FROM ... JOIN ...).<br>
<br>
4.1 will include partial sub-query support, SSL encryption for
connections to the server, full-text search optimizations, and prepared
statements (increases efficiency of multiple syntactically-equivalent
queries run consecutively).<br>
<br>
5.0 will have full sub-query support and optimizations, referential
integrity for MyISAM tables, and possibly stored procedures. 5.0 is
the first release that will be targeted at the enterprise market.<br>
<br>
Planning for 5.1 is underway.<br>
<br>
<br>
Development Methodology & Goals<br>
<br>
Their four "commitments" in order of highest to lowest priority are
speed, reliability, ease of use, and economy. Developers can't check
in code without checking in documentation for it. Their release cycle
has four stages:<br>
<ul>
<li>pre-alpha (anything goes);</li>
<li>alpha (stable, but features and API may change);</li>
<li>beta (no changes to features and API except by request of
important paying customer);</li>
<li>production/stable (only bug fixes).</li>
</ul>
They have a 30 day development cycle with five objectives for each
developer per month. They are looking to shorten their release
cycles. They use Bitkeeper for source code and revision control
because Monty (the lead developer) thinks it's better than CVS.<br>
<br>
<br>
Business<br>
<br>
MySQL AB is the company developing MySQL. They have 65 employees and
12 full-time developers. Innobase OY is the company developing the
InnoDB storage engine. They have one and a half employees, one of whom
is the founder. Because of their dual GPL/commercial license, they get
very little community development, a common problem with such licenses.
They are a developer-driven company, but sales is gaining influence
over time.<br>
<br>
<br>
User-Defined Functions<br>
<br>
User-defined functions are useful for things MySQL doesn't provide,
especially when extracting data from the database, processing it, and
then putting it back into the database is expensive. Can be written in
C++, Perl, or presumably other languages, but doesn't make much sense
to write them in scripted languages; you might as well just do them in
your app. Simple API with f.e. Init(), Request() (called once per
row), DeInit() functions. Used by Slashdot to improve performance,
f.e. by implementing customized high-performance full-text search.
<a class="moz-txt-link-freetext" href="http://software.tangent.org/">http://software.tangent.org/</a><br>
<br>
<br>
Subqueries<br>
<br>
There are two kinds of subqueries: dependent/correlated, which rely on
tables/columns from parent query (and execute once for each row of the
parent query), and independent/noncorrelated, which don't and tend to
be more efficient. Joins are generally more performant than subqueries
at the moment in 4.1; subquery optimization is scheduled for 5.0.<br>
<br>
Types & Examples (in pseudo-code):<br>
<ul>
<li>derived tables: SELECT * FROM foo, (SELECT ...) (cannot be
dependent)<br>
</li>
<li>expressions (single values or rows): SELECT * FROM addresses
WHERE state = (SELECT abbr FROM states WHERE name = 'California')<br>
</li>
<li>booleans: EXISTS, IN, ALL, ANY, SOME (synonym for ANY)<br>
</li>
<li>UNIQUE, MATCH, and FOR are currently unimplemented<br>
</li>
</ul>
<br>
<br>
InnoDB<br>
<br>
InnoDB is a table type with full ACID (transactional) compliance. It
was built for speed by a former mathematician who thought he could beat
commercial enterprise-class databases at the game. Unofficial
benchmarks show up to 2-4 times improvement over commercial
alternatives. 4.1 will bring performance improvements in edge cases,
5.0 will bring transaction savepoints. Developers may implement
multiple tablespaces so tables can be spread across multiple disks, and
other plans include BLOB prefix indexes, 2-phase commit, full-text
indexes, and auto lockless failover in replication. InnoDB apparently
has some transaction isolation options PostGreSQL doesn't have (don't
have details on this).<br>
<br>
<br>
Perl DBI<br>
(Tim Bunce)<br>
<br>
Perl DBI was built for speed and is pretty efficient, but you have to
know how to use it right. For speed:<br>
<ul>
<li>use prepare() (doesn't work until MySQL 4.1, but you should use
it in preparation)</li>
<li>use prepare() and execute() in speed-critical loops instead of
do()</li>
<li>select as reference, especially if columns increase</li>
<li>try fetchall_arrayref to see if it's faster</li>
<li>use RaiseError instead of checking return values</li>
<li>bind columns<br>
</li>
<li>beware copying data into functions</li>
<li>use operators (grep, map) where possible instead of loops</li>
<li>profile with # DBI_PROFILE=x test.pl where "x" is 1-6</li>
</ul>
For reliability:<br>
<ul>
<li>use DBI->connect_cached</li>
<li>don't use DBD::mysql < 2.1026 with locks (because of its
auto-reconnect feature, which you can't disable without recompiling)</li>
</ul>
Misc:<br>
<ul>
<li>use $h->{TraceLevel} = x where x = 0,1,2,3 etc. to trace
execution</li>
<li>use Taint, TaintIn, TaintOut settings to turn on/off tainting</li>
<li>use $dbh->quote_identifier(space, table, column, attributes)
to quote "space.table.column" references</li>
<li>use DBD::Multiplex to deal with mirrors</li>
<li>use $dbh->clone for new database handles</li>
<li>finish() is most misunderstood method in API; really means
discard_pending_results<br>
</li>
</ul>
<br>
<br>
Optimizing MySQL<br>
(Jeremy Zawodny)<br>
<br>
<ul>
<li>MySQL is tuned for small-medium data sets and systems by
default. If you have lots of memory you need to configure it to use it.</li>
<li>Use MySQL super smack for benchmarking.</li>
<li>Don't use bigger columns than you need, use NOT NULL (helps in
edge cases) and fixed sizes where possible.</li>
<li>Compress data if possible. gzip is quick, cheap, and makes
things faster, but you can't use it if you want to search the data.</li>
<li>InnoDB can be slower with large data than MyISAM, but it's
generally about as fast.</li>
<li>Understand the leftmost prefix rule for indexes, and note that
MySQL will only ever use one index per table, so you need to create
compound column indexes if you want it to use two indexes (f.e. how
about product_id and bug_status in Bugzilla?)</li>
<li>Index hashes instead of large values (f.e. URLs).</li>
<li>Don't store computable data.</li>
<li>Keep clients near the server, since network latency can be a
killer.</li>
<li>Keep primary keys short for InnoDB.</li>
<li>Pick the fastest driver.</li>
<li>Simplify WHERE clauses and don't compute in them.</li>
<li>The most important variables to tune are the query cache, table
buffer, and table cache.</li>
<li>2.4.10 - 2.4.15 of Linux kernel are bad for MySQL.</li>
<li>Use SQL_CALC_ROWS() and SELECT_FOUND_ROWS().</li>
<li>Use a UNION to break a query not using two indexes into two, each
using one.<br>
</li>
</ul>
<br>
</body>
</html>