Pagination and efficiency

Aaron Trevena aaron.trevena at gmail.com
Sat May 12 14:20:30 UTC 2007


Hi all,

After getting that rant off my chest, hopefully we can move on to
discussing useful things :)

Private conversation with Max follows, as I'd like to move it forward
on the list, and maybe start sending patches and fixing stuff.

>> Aaron :
>> I had a look at Bugzilla::Search and adding pagination should be
>> trivial - either by lifting a bunch of code from Data::Pager and
>> Maypole's templates, or by using the cpan module as is. Which brings
>> me to my next question.. presumably there is a cookie for storing
>> session info in?

> Max :
> You could just make a new cookie for this particular session
> information. There is the Bugzilla_user (I think that's its name)
> cookie that handles session-based auth.

>> That could be useful in the case where we don't want
>> to repeatedly fetch every result from the db, but do want to know how
>> many results there are.. so the first query would not use a limit, but
>> only the first X records will be fetched, thereafter offset and limit
>> can be used, and the total number of results stored in a session or
>> something (with some simple bounds checking).

> Yeah, perhaps. We should also consider the problems with a
> result set changing underneath us, though.

That would be the same problem we already have with the entire result
set being retrieved in one go - in fact it would be an improvement, as
each page would be up to date, and the pagination can easily cope if
there are extra results at the end.

The number of results/pages won't change much as people page through,
and the stored total can always be refreshed by fetching everything
again if the last result looks stale.
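
Something along these lines is what I have in mind - untested, and the
table/column names and the session hash are just placeholders rather
than the real Bugzilla schema or session API:

    use strict;
    use warnings;
    use DBI;

    # Placeholder names throughout - not the real Bugzilla schema.
    sub fetch_page {
        my ($dbh, $session, $offset, $page_size) = @_;

        # Reuse the total stored in the session if we already have it,
        # so only the first request pays for the COUNT(*).
        my $total = $session->{search_total};
        if (!defined $total) {
            ($total) = $dbh->selectrow_array(
                "SELECT COUNT(*) FROM bugs WHERE resolution = ''");
            $session->{search_total} = $total;
        }

        # Fetch just this page with LIMIT/OFFSET.
        my $rows = $dbh->selectall_arrayref(
            "SELECT bug_id, short_desc FROM bugs
              WHERE resolution = ''
              ORDER BY bug_id
              LIMIT ? OFFSET ?",
            undef, $page_size, $offset);

        return ($rows, $total);
    }

The bounds checking (an offset past the end, a stale total, etc.) would
sit on top of that.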

>> My other thoughts about efficiency make me think you could lose a lot
>> of the hashes, by populating an iterator with fetchall_arrayref, and
>> using array based objects instead of hashes.

> I'm not sure the efficiency there would make up for the added
> inconvenience. We do use the object hashes to store non-DB data, also.

I don't think there would be any inconvenience - in fact, switching to
using objects to represent data fetched from the db (even if they are
simple read-only blessed pseudohashes) would provide a facade, allowing
you to use any type of object, such as RoseDB, DBIx::Class or CDBI,
later without having to change much code.
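
To illustrate the kind of facade I mean (class and column names are
made up for the example, and it's a plain blessed arrayref rather than
a real pseudohash, but the interface calling code sees is the same):

    package Bugzilla::Search::Row;
    use strict;
    use warnings;

    # Build one read-only accessor per column, each closing over its
    # array slot - calling code only ever sees $row->bug_id etc.
    sub make_accessors {
        my (@columns) = @_;
        my $slot = 0;
        for my $col (@columns) {
            my $i = $slot++;
            no strict 'refs';
            *{__PACKAGE__ . "::$col"} = sub { $_[0][$i] };
        }
    }

    sub new {
        my ($class, $arrayref) = @_;
        return bless [@$arrayref], $class;
    }

    package main;
    Bugzilla::Search::Row::make_accessors(qw(bug_id short_desc priority));
    my $row = Bugzilla::Search::Row->new([42, "example summary", "P1"]);
    print $row->bug_id, "\n";    # prints 42

Swap that class for a DBIx::Class (or similar) row object later and the
calling code doesn't need to change.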

The Class::DBI Iterator is fairly simple, and I believe I could create
something very simple based on it for Bugzilla - again this would
provide a facade, allowing you to use any similar iterator provided by
an ORM later.
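
Roughly this shape - just enough to show the interface calling code
would be written against, loosely modeled on Class::DBI's iterator
rather than copied from it:

    package Bugzilla::Iterator;
    use strict;
    use warnings;

    sub new {
        my ($class, $rows) = @_;  # arrayref, e.g. from fetchall_arrayref
        return bless { rows => $rows, pos => 0 }, $class;
    }

    sub count { scalar @{ $_[0]{rows} } }

    sub next {
        my $self = shift;
        return undef if $self->{pos} >= $self->count;
        return $self->{rows}[ $self->{pos}++ ];
    }

    sub reset { $_[0]{pos} = 0; return $_[0] }

    package main;
    my $it = Bugzilla::Iterator->new([ [1, 'first bug'], [2, 'second bug'] ]);
    while (my $row = $it->next) {
        print "$row->[0]: $row->[1]\n";
    }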

>> You could also use
>> Tree::Binary::Dictionary for all the smaller hashes, and reduce the
>> size of the over-all hash space, and so save some memory.

> I haven't seen Tree::Binary::Dictionary, but that could help.

buglist appears to use a bunch of hashes that could contain anything
from a few to a few thousand entries. Using slices of results or an
iterator would mean each of those hashes could be kept small enough to
be replaced with Tree::Binary::Dictionary objects without loss of
performance, while reducing the memory needed.
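
Usage would be something like the following - I'm writing the method
names from memory, so check them against the module's pod before taking
this literally:

    use strict;
    use warnings;
    use Tree::Binary::Dictionary;

    # A dictionary object standing in for one of buglist's small hashes.
    # Method names (new/add/get/keys) are from memory - check the pod.
    my $dictionary = Tree::Binary::Dictionary->new;
    $dictionary->add(bug_status => 'NEW');
    $dictionary->add(priority   => 'P1');

    print $dictionary->get('priority'), "\n";
    print join(', ', $dictionary->keys), "\n";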

>> Contrary to popular belief you *can* have a long running perl process
>> that doesn't hog memory - my scheduler is written in perl and runs for
>> weeks, never going above 30 MB, despite dealing with 000s of objects
>> and being 10s of 000s of LoC (excluding the framework and modules it
>> uses).

regards,

A.

-- 
http://www.aarontrevena.co.uk
LAMP System Integration, Development and Hosting


