Using DBI

Bradley Baetz bbaetz at acm.org
Fri Jan 17 14:12:41 UTC 2003


A few days ago, I checked in a reorg for the database code (bug 163290).
Combined with various other patches which went in over the past few
motnhs, this means that we can now use the DBI. All new code should use
the DBI methods; SendSQL and friends are now deprecated. (You need to
update DBI/DBD::mysql for this to work. checksetup should pick that up,
though)

|perldoc DBI| has more information than you ever want to know about the
DBI. There are also numerous docs on the web.

A quick summary:

- Don't use SendSQL (or {Push,Pop}GlobalSQLState/MoreSQLData/etc. They
won't work w/o SendSQL anyway)

- Use the DBI helper functions. They're easier to write, clearer to
understand, and some of them are written in XS (ie C) which makes them
faster. (Postgres doesn't use the C version, and you must have compiled
your DBD:: driver _after_ installing the updated DBI for the C version
to be used)

For example, consider get_product_id:

  sub get_product_id {
    my ($prod) = @_;
    PushGlobalSQLState();
    SendSQL("SELECT id FROM products WHERE name = " . SqlQuote($prod));
    my ($prod_id) = FetchSQLData();
    PopGlobalSQLState();
    return $prod_id;
  }

That routine can now become:

  sub get_product_id {
    my $prod = shift;
    my $dbh = $Bugzilla->instance->dbh; # Explained below
    return $dbh->selectrow_array("SELECT id FROM products WHERE name = "
                                 . SqlQuote($prod));
  }

(Note that this is not optimal; see below)

selectrow_array (and fetchrow_array) return a scalar if called in scalar
context. If there is more than one value in the SELECT, then its
undefined which is returned; so don't do that.

When selecting more than one row of data, there are also selectall_
variants which can put all the data into an arrayref/hashref for you. Be
careful about storing the arrayref somewhere. For really tight loops
where SQL is the main overhead, look into ->bind_columns. That stuff is
a readability vs speed tradoff, and is unlikly to be useful in teh
general case since we don't really hammer the DB in that way. (It should
be sufficient to use the ref stuff)

See DBI docs for details.

The $Bugzilla->instance->dbh stuff is needed in order to actually get
the database handle to use. See Bugzilla.pm for docs.

- Use placeholders

 sub get_product_id {
    my $prod = shift;
    my $dbh = $Bugzilla->instance->dbh;
    return $dbh->selectrow_array("SELECT id FROM products WHERE name=?",
                                 undef,
                                 $prod);
  }

This is not only useful for prepared statements (see below), but also
for quoting. Notice how there is no SQLQuote - DBI 'knows' that since
$prod is a string, it needs to be escaped. This also helps sybase, I
believe, since if we get a number DBI knows not to quote it.

There is one important difference, however. The DBI checks all incoming
values for tainted-ness. Before, we were mainly worried about tainted
data from an sql POV - 'can I put this string here?'. Now we have to
think about it from a logical point of view - 'is this query still OK no
matter what the user has given me?'. The _caller_ should trick_taint,
since it is in the best position to know what it wants. (I'm open for
discuission on that one though. It may end up being handled on a case by
case basis)

(Note that this does not apply to attachment uploads, mainly for
cross-db reasons. Gory details on the DBI list, in the DBD::Pg
discussions)

- Use prepared statements.

Creating a statement handle is relativly expensive. If we're going to
run an sql command lots of times, it makes sense to produce it once and
then just reuse the statement handle. If you have a nested loop, you can
use ->prepare. From within a sub, use prepare_cached:

  sub get_product_id {
    my $prod = shift;
    my $dbh = $Bugzilla->instance->dbh; # Explained below
    my $sth = $dbh->prepare_cached("SELECT id FROM products WHERE name=?");
    return $dbh->selectrow_array($sth, undef, $prod);
  }

Don't use prepare_cached unless the statement is going to be reused
multiple times from somewhere where you can't use prepare - there is DBI
overhead to match the SQL to the prepared handle. However, DBI handles
the 'firsttime' creation of the sth, including recreation/revalidation
if the database has disconnected in the meantime (eg via a subsequent
mod_perl connection)

Notice how we need placeholders for this to work. If we used
concatenation, then we'd have one statement handle for every value
passed in. It would still work, but would be ess efficient, due to the
_cached overhead, and the lack of any benefit.

However, when deciding whether to use _cached or not, remember that even
if most scripts only call get_product_id once, bugzilla will be running
multiple pages at some point, via mod_perl, so it will be a benefit.

As an example, I've attached a benchmark script. The results (+/- 5%)
are:

                  Rate SendSQL concat prepare_cached prepare_nobind prepare
SendSQL         5305/s      --    -4%           -45%           -46% -63%
concat          5540/s      4%     --           -43%           -43% -61%
prepare_cached  9709/s     83%    75%             --            -0% -32%
prepare_nobind  9756/s     84%    76%             0%             -- -31%
prepare        14184/s    167%   156%            46%            45% --

In other words, using prepare makes the SQL 2.67 times as fast as
SendSQL.

Some notes:

a) This obviously doesn't mean that bugzilla will be 2.67 times as fast.
Module load time is the vast majority of stuff.
b) prepare_nobind is faster than prepare_bind, but as mentioned above
its no help unelss the value is constand, in which case you shouldn't be
using a bind val anyway.
c) Where the SQL is complicated, the differences will obviously get
less.
d) The SendSQL sub has the ->finish because selectall_* does. IOW it was
used to make a fair comparison. Without it, SendSQL is marginally faster
than concat.

Comments? I'm sure I've forgotton stuff....

Bradley
-------------- next part --------------
#!/usr/bin/perl -w

use strict;

use Benchmark qw(cmpthese);
use DBI;

my $dbh = DBI->connect("DBI:mysql:host=localhost;database=bugs", "bugs", 
"bugged");

my $id = 1;
use constant QUERY => "SELECT bug_id FROM bugs WHERE bug_id = ";
my $sth = $dbh->prepare(QUERY . "?");
$dbh->prepare_cached(QUERY . "?");

cmpthese (20000, {
  'concat' => sub { my $res = $dbh->selectrow_array(QUERY . $id); },
  'prepare' => sub { my $res = $dbh->selectrow_array($sth, undef, $id);},
  'prepare_cached' => sub { my $sth = $dbh->prepare_cached(QUERY . "?");
                            my $res = $dbh->selectrow_array($sth, undef, $id);
                          },
  'prepare_nobind' => sub { my $sth = $dbh->prepare_cached(QUERY . "1");
                            my $res = $dbh->selectrow_array($sth);
                          },
  'SendSQL' => sub { my $sth = $dbh->prepare(QUERY . $id);
                     $sth->execute || die $dbh->errstr;
                     my @arr = $sth->fetchrow_array;
                     my $res = $arr[0];
                     $sth->finish;
                   },
});


More information about the developers mailing list