Rewriting improved gnats converter
Daniel Berlin
dberlin at dberlin.org
Sat Apr 12 18:31:46 UTC 2003
Just so no one else attempts it (or if anyone wants the WIP), I'm
rewriting my vastly improved gnats converter.
The current perl version is in gcc's cvs. See
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/wwwdocs/bugzilla/contrib/gnats2bz.pl
I'm rewriting it in python, for a few reasons:
1. The perl version leaks memory like a sieve, with no internally caused
leaks visible. I've run all the perl memory leak checkers, and
nothing is marked as leaking. Yet the process grows to roughly half
the size of the gnats db it's converting. For GCC's 600 meg gnats
database, this means it grows to 300 meg, quite quickly, and stays
around there. Every variable is undef'd when it's done, every file is
closed. Still, no dice. The python version just doesn't leak at all.
2. It is badly in need of cleanup, and it's hard to modularize/OOify
it in a nice way in perl.
The python version has two main classes, GNATSbug and Bugzillabug. It
builds the GNATSbug from a file, then creates a Bugzillabug from it (the
Bugzillabug constructor does the conversion), then writes out the
Bugzillabug.
The perl version has all these pieces mixed in together.
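For anyone trying to picture the two-class layout described above, here is a
rough sketch of the shape, not the actual WIP code. The class names come from
the description; everything else (the simplified ">Field:" line layout, the
fields dict, the write() and convert() helpers, and the output format) is
assumed purely for illustration.

```python
import io
import re

# Assumed, simplified field marker: a line such as ">Synopsis: text".
FIELD_RE = re.compile(r"^>([A-Za-z-]+):\s*(.*)$")


class GNATSbug:
    """Holds the fields of one GNATS PR, parsed from a file-like object."""

    def __init__(self, fileobj):
        self.fields = {}
        current = None
        for line in fileobj:
            text = line.rstrip("\n")
            match = FIELD_RE.match(text)
            if match:
                # Start of a new field; the rest of the line is its value.
                current = match.group(1)
                self.fields[current] = match.group(2)
            elif current is not None:
                # Continuation line: grow the current field by string
                # concatenation. Lines before the first field are skipped.
                self.fields[current] += "\n" + text


class Bugzillabug:
    """Built from a GNATSbug; the constructor performs the conversion."""

    def __init__(self, gnats_bug):
        f = gnats_bug.fields
        # Hypothetical field mapping, chosen only to show the idea.
        self.bug_id = int(f["Number"])
        self.short_desc = f.get("Synopsis", "").strip()
        self.long_desc = f.get("Description", "").strip()

    def write(self, out):
        # Hypothetical output format; the real script's output may differ.
        out.write("bug_id: %d\n" % self.bug_id)
        out.write("short_desc: %s\n" % self.short_desc)
        out.write("long_desc: %s\n" % self.long_desc)


def convert(infile, outfile):
    """Parse one PR, convert it, and write it out, as three separate steps."""
    Bugzillabug(GNATSbug(infile)).write(outfile)
```

Keeping parsing, conversion, and output in separate places like this is the
kind of separation meant here, in contrast to having the pieces interleaved.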
3. The python version is actually 2x-3x faster (overall) than the perl
version (which was ~10x faster than the original gnats2bz.pl bugzilla
comes with) because it's 2x-3x faster (average) in parsing the GNATS
bugs. The code is the same in both versions (this part is a direct
copy/paste/convert) if you account for language syntax differences.
The gnats parsing is bounded by the speed of string concatenation in
both python and perl, and the python version is just faster at it.
One 21 meg PR takes 19 seconds in Perl to parse, and 2 seconds in
python.
The whole 600 meg, 10000 PR gnats db takes 3 minutes to convert with
the python script.
4. I'm a python person (have been for a long time), so I've just been
meaning to do this anyway for a while.
If anyone wants the WIP, let me know.
--Dan