MySQL user's Conference

Fri Apr 18 02:59:59 UTC 2003

On Thu, Apr 17, 2003 at 07:26:08PM -0700, Myk Melez wrote:
> 
> 4.0 expands on these with transactions available in the default 
> distribution, much easier replication, and boolean full-text search 
> ("+some_word -some_other_word") along with multi-table delete (DELETE 
> FROM ... JOIN ...).

multi-table delete isn't really useful for us, bccase its not portable.
If you have subselects, then you can get teh same functionality
trivially.

> 
> 4.1 will include partial sub-query support, SSL encryption for 
> connections to the server, full-text search optimizations, and prepared 
> statements (increases efficiency of multiple syntactically-equivalent 
> queries run consecutively).

Prepared statements do sound interesting. Postgres has them, but not at
the protocol level, although thats planned for the next release.

> User-Defined Functions
> 
> User-defined functions are useful for things MySQL doesn't provide, 
> especially when extracting data from the database, processing it, and 
> then putting it back into the database is expensive.  Can be written in 
> C++, Perl, or presumably other languages, but doesn't make much sense to 
> write them in scripted languages; you might as well just do them in your 
> app.

thats not actually true. Its useful for abstraction, plus you can run
teh comparison at the DB level, eg as part of a VIEW column, or as a
constraint on the query. If its not speed criticial, then perl would be
a lot easier than c++. PL/SQL would be easier, mind you - are they
planning PL/SQL support (or some varient)? If it is speed critical, a
functional index would be nice.

> Subqueries
> 
> There are two kinds of subqueries: dependent/correlated, which rely on 
> tables/columns from parent query (and execute once for each row of the 
> parent query), and independent/noncorrelated, which don't and tend to be 
> more efficient.  Joins are generally more performant than subqueries at 
> the moment in 4.1; subquery optimization is scheduled for 5.0.

Hmm. That makes it sound like subqueries won't be usable until 5.0. This
is similar to the problems postges has (but which are fixed in CVS),
where it does a full nested loop join for subqueries.

> 
> Types & Examples (in pseudo-code):
> 
>    * derived tables: SELECT * FROM foo, (SELECT ...) (cannot be dependent)

Are they allowing SELECT a, (SELECT name FROM lookup_tab WHERE id=b)
FROM table? ie with the subselect in the from clause?

> InnoDB
> 
> InnoDB is a table type with full ACID (transactional) compliance.  It 
> was built for speed by a former mathematician who thought he could beat 
> commercial enterprise-class databases at the game.  Unofficial 
> benchmarks show up to 2-4 times improvement over commercial 
> alternatives.

> InnoDB apparently 
> has some transaction isolation options PostGreSQL doesn't have (don't 
> have details on this).

I suspect that they're talking about READ UNCOMMITTED mode, which means
that you can see uncommitted transactions which may end up rolled back
later. Its not exactly something which is very useful from a data
concurrency PoV.

> 
> Perl DBI was built for speed and is pretty efficient, but you have to 
> know how to use it right.  For speed:

<snip> We do most of that now, at least in the new code.

> For reliability:
> 
>    * use DBI->connect_cached

I'm planning to use Apache::DBI instead, mainly because it'll detect
dead connections on startup for you, and do ->rollback, plus you can set
up the connections when the server is forked. Its aso theoretically
possible to share connections if using a threaded mod_perl on apache2,
but thats not supported ATM by the DBI stuff, and we don't support
mod_perl, so its not really important ATM. There is talk of margine
Apache::DBI support into the main DBI code, and abstracting the 'pool'
interface.

Our current connect stuff is abstracted enough that its really really
easy to change, anyway.

>    * don't use DBD::mysql < 2.1026 with locks (because of its
>      auto-reconnect feature, which you can't disable without recompiling)

Hmmm. Do we need to up our requirements, or is that only an issue if you
use connect_cached/Apache::DBI ?

> Optimizing MySQL
> (Jeremy Zawodny)
> 
>    * MySQL is tuned for small-medium data sets and systems by default. 
>      If you have lots of memory you need to configure it to use it.

Is there tuning we can do? bmo has massive ammounts of RAM, right - its
just CPU bound.

>    * Compress data if possible.  gzip is quick, cheap, and makes things
>      faster, but you can't use it if you want to search the data.

That doesn't apply to us, except for attachments, and I don't think that
that woudl help anything we do.

>    * InnoDB can be slower with large data than MyISAM, but it's
>      generally about as fast.

I have that test case from a while back - someone want to point the
mysql folks to it?

http://bugzilla.org/cgi-bin/mj_wwwusr?list=developers&brief=on&func=archive-get-part&extra=200302/118

I am very interested to see if this can be tuned, or if 4.0 improves
that.

>    * Understand the leftmost prefix rule for indexes, and note that
>      MySQL will only ever use one index per table, so you need to
>      create compound column indexes if you want it to use two indexes
>      (f.e. how about product_id and bug_status in Bugzilla?)

We do too many ad-hoc queries for that to work. (Or, more precisly, its
the ad-hoc queries which are the ones with the problem) This is a very
serious limitation of mysql. Its not important if you only have one
join, but we have lots. The fact that mysql only appears to be doing
nested loop joins (at least in 3.23.x) also shows up as a problem on
some of the buglist.cgi stuff.

What happened to the plan to give developers access to the bmo data? I'd
love to run some real pg vs mysql comparision tests.

>    * Don't store computable data.

*cough* votes/keywords cache *cough*. Which, btw, is where a function to
get the sorted list of keywords from a bug_id (for display) would be
very useful. We can't use a join because that won't sort it, so we could
just SELECT bug_id, get_keyword_string(bug_id), .... where
get_keyword_string just concatentates SELECT name FROM keywords WHERE
bug_id=aBugId ORDER BY UPPER(name), or something along those lines.

>    * Use a UNION to break a query not using two indexes into two, each
>      using one.

We really can't do that for buglist.cgi. Plus UNION is a new 4.0 thing,
so we can't use it at all yet.

Bradley