<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

  <title></title>

</head>

<body>

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

<title></title>

Last week I attended the MySQL user's conference.  I attended a number

of technical sessions on features and how to use them.  I also attended

sessions on open source licensing strategies and the MySQL development

methodology and business model, and I got to demonstrate Bugzilla to

some MySQL AB employees looking into replacements for their current bug

tracking systems.  Here's a summary of important things I learned:<br>

<br>

<br>

State of the Code & Future Plans<br>

<br>

The project is working on four releases right now: 3.23 (the older

stable release), 4.0 (the current stable release), 4.1 (in alpha,

scheduled for beta later this year), and 5.0 (pre-alpha, scheduled for

alpha soon after 4.1 goes beta).  The 3.23 stable release is still

supported but will only be updated to fix critical bugs.  <br>

<br>

3.23 introduced a number of major features including transactions (in

the "max" distribution), replication (for mirroring databases on

multiple machines), and full-text search (Google-like searches).<br>

<br>

4.0 expands on these with transactions available in the default

distribution, much easier replication, and boolean full-text search

("+some_word -some_other_word") along with multi-table delete (DELETE

FROM ... JOIN ...).<br>

<br>

4.1 will include partial sub-query support, SSL encryption for

connections to the server, full-text search optimizations, and prepared

statements (increases efficiency of multiple syntactically-equivalent

queries run consecutively).<br>

<br>

5.0 will have full sub-query support and optimizations, referential

integrity for MyISAM tables, and possibly stored procedures.  5.0 is

the first release that will be targeted at the enterprise market.<br>

<br>

Planning for 5.1 is underway.<br>

<br>

<br>

Development Methodology & Goals<br>

<br>

Their four "commitments" in order of highest to lowest priority are

speed, reliability, ease of use, and economy.  Developers can't check

in code without checking in documentation for it.  Their release cycle

has four stages:<br>

<ul>

  <li>pre-alpha (anything goes);</li>

  <li>alpha (stable, but features and API may change);</li>

  <li>beta (no changes to features and API except by request of

important paying customer);</li>

  <li>production/stable (only bug fixes).</li>

</ul>

They have a 30 day development cycle with five objectives for each

developer per month.  They are looking to shorten their release

cycles.  They use Bitkeeper for source code and revision control

because Monty (the lead developer) thinks it's better than CVS.<br>

<br>

<br>

Business<br>

<br>

MySQL AB is the company developing MySQL.  They have 65 employees and

12 full-time developers.  Innobase OY is the company developing the

InnoDB storage engine.  They have one and a half employees, one of whom

is the founder.  Because of their dual GPL/commercial license, they get

very little community development, a common problem with such licenses. 

They are a developer-driven company, but sales is gaining influence

over time.<br>

<br>

<br>

User-Defined Functions<br>

<br>

User-defined functions are useful for things MySQL doesn't provide,

especially when extracting data from the database, processing it, and

then putting it back into the database is expensive.  Can be written in

C++, Perl, or presumably other languages, but doesn't make much sense

to write them in scripted languages; you might as well just do them in

your app.  Simple API with f.e. Init(), Request() (called once per

row), DeInit() functions.  Used by Slashdot to improve performance,

f.e. by implementing customized high-performance full-text search.

<a class="moz-txt-link-freetext" href="http://software.tangent.org/">http://software.tangent.org/</a><br>

<br>

<br>

Subqueries<br>

<br>

There are two kinds of subqueries: dependent/correlated, which rely on

tables/columns from parent query (and execute once for each row of the

parent query), and independent/noncorrelated, which don't and tend to

be more efficient.  Joins are generally more performant than subqueries

at the moment in 4.1; subquery optimization is scheduled for 5.0.<br>

<br>

Types & Examples (in pseudo-code):<br>

<ul>

  <li>derived tables: SELECT * FROM foo, (SELECT ...) (cannot be

dependent)<br>

  </li>

  <li>expressions (single values or rows): SELECT * FROM addresses

WHERE state = (SELECT abbr FROM states WHERE name = 'California')<br>

  </li>

  <li>booleans: EXISTS, IN, ALL, ANY, SOME (synonym for ANY)<br>

  </li>

  <li>UNIQUE, MATCH, and FOR are currently unimplemented<br>

  </li>

</ul>

<br>

<br>

InnoDB<br>

<br>

InnoDB is a table type with full ACID (transactional) compliance.  It

was built for speed by a former mathematician who thought he could beat

commercial enterprise-class databases at the game.  Unofficial

benchmarks show up to 2-4 times improvement over commercial

alternatives.  4.1 will bring performance improvements in edge cases,

5.0 will bring transaction savepoints.  Developers may implement

multiple tablespaces so tables can be spread across multiple disks, and

other plans include BLOB prefix indexes, 2-phase commit, full-text

indexes, and auto lockless failover in replication.  InnoDB apparently

has some transaction isolation options PostGreSQL doesn't have (don't

have details on this).<br>

<br>

<br>

Perl DBI<br>

(Tim Bunce)<br>

<br>

Perl DBI was built for speed and is pretty efficient, but you have to

know how to use it right.  For speed:<br>

<ul>

  <li>use prepare() (doesn't work until MySQL 4.1, but you should use

it in preparation)</li>

  <li>use prepare() and execute() in speed-critical loops instead of

do()</li>

  <li>select as reference, especially if columns increase</li>

  <li>try fetchall_arrayref to see if it's faster</li>

  <li>use RaiseError instead of checking return values</li>

  <li>bind columns<br>

  </li>

  <li>beware copying data into functions</li>

  <li>use operators (grep, map) where possible instead of loops</li>

  <li>profile with # DBI_PROFILE=x test.pl where "x" is 1-6</li>

</ul>

For reliability:<br>

<ul>

  <li>use DBI->connect_cached</li>

  <li>don't use DBD::mysql < 2.1026 with locks (because of its

auto-reconnect feature, which you can't disable without recompiling)</li>

</ul>

Misc:<br>

<ul>

  <li>use $h->{TraceLevel} = x where x = 0,1,2,3 etc. to trace

execution</li>

  <li>use Taint, TaintIn, TaintOut settings to turn on/off tainting</li>

  <li>use $dbh->quote_identifier(space, table, column, attributes)

to quote "space.table.column" references</li>

  <li>use DBD::Multiplex to deal with mirrors</li>

  <li>use $dbh->clone for new database handles</li>

  <li>finish() is most misunderstood method in API; really means

discard_pending_results<br>

  </li>

</ul>

<br>

<br>

Optimizing MySQL<br>

(Jeremy Zawodny)<br>

<br>

<ul>

  <li>MySQL is tuned for small-medium data sets and systems by

default.  If you have lots of memory you need to configure it to use it.</li>

  <li>Use MySQL super smack for benchmarking.</li>

  <li>Don't use bigger columns than you need, use NOT NULL (helps in

edge cases) and fixed sizes where possible.</li>

  <li>Compress data if possible.  gzip is quick, cheap, and makes

things faster, but you can't use it if you want to search the data.</li>

  <li>InnoDB can be slower with large data than MyISAM, but it's

generally about as fast.</li>

  <li>Understand the leftmost prefix rule for indexes, and note that

MySQL will only ever use one index per table, so you need to create

compound column indexes if you want it to use two indexes (f.e. how

about product_id and bug_status in Bugzilla?)</li>

  <li>Index hashes instead of large values (f.e. URLs).</li>

  <li>Don't store computable data.</li>

  <li>Keep clients near the server, since network latency can be a

killer.</li>

  <li>Keep primary keys short for InnoDB.</li>

  <li>Pick the fastest driver.</li>

  <li>Simplify WHERE clauses and don't compute in them.</li>

  <li>The most important variables to tune are the query cache, table

buffer, and table cache.</li>

  <li>2.4.10 - 2.4.15 of Linux kernel are bad for MySQL.</li>

  <li>Use SQL_CALC_ROWS() and SELECT_FOUND_ROWS().</li>

  <li>Use a UNION to break a query not using two indexes into two, each

using one.<br>

  </li>

</ul>

<br>

</body>

</html>