Rewriting improved gnats converter

Daniel Berlin dberlin at dberlin.org
Sun Apr 13 01:57:22 UTC 2003



On Sat, 12 Apr 2003, J. Paul Reed wrote:

> On 12 Apr 2003 at 20:43:38, Daniel Berlin moved bits on my disk to say:
>
> > > Please 'splain.
> >
> > The re's run roughly the same speed in either language.
> > Besides the re's, the time is bounded by the time it takes
> > to add lines to multiline fields.
> > The python version is faster at this.
>
> Maybe I'm missing something here; could you show an example of string
> concatenation in perl that is slower than when done in Python?

Wish I could; I can't get smallprof to work anymore, so I'm guessing.
1.  I know the re's run at roughly the same speed: I ripped out the code
that actually stores the strings (i.e. left just the re's), and the times
are comparable for both languages. The re's are only ever run on the input
strings anyway in the speed tester.
2.  The only other operations left are a huge number of string
concatenations and a small number of string comparisons, since that's all
the parsing routine does.
3.  The perl code to simply do re + storing takes much more memory in perl
than in python for the one 21 meg PR.  Since the number of string
comparisons is too small to affect anything, the only thing left is
string concatenation.
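
For what it's worth, the kind of split measurement described in the list above could be sketched roughly like this. It is only an illustration, not the converter's code: the field pattern, the function names and the sample input are all made up, and it is written for current Python 3.

```python
import re
import time

# Hypothetical pattern for a line that starts a new field, e.g. ">Description:".
# The real converter's expressions are not shown in this thread.
FIELD_RE = re.compile(r"^>([A-Za-z-]+):\s*(.*)$")

def scan_only(lines):
    """Run the regex on every line but store nothing."""
    hits = 0
    for line in lines:
        if FIELD_RE.match(line):
            hits += 1
    return hits

def scan_and_store(lines):
    """Run the regex and also collect continuation lines per field."""
    fields = {}
    current = None
    for line in lines:
        m = FIELD_RE.match(line)
        if m:
            current = m.group(1)
            fields[current] = [m.group(2)]
        elif current is not None:
            fields[current].append(line)
    # Join each multiline field once, at the end.
    return {name: "\n".join(parts) for name, parts in fields.items()}

def timed(func, lines):
    """Return (result, elapsed seconds) for one call of func(lines)."""
    start = time.perf_counter()
    result = func(lines)
    return result, time.perf_counter() - start

# Made-up input: two fields, the second with many continuation lines.
sample = [">Synopsis: example", ">Description: first line"] + ["more text"] * 1000
hits, t_scan = timed(scan_only, sample)
fields, t_store = timed(scan_and_store, sample)
```

Comparing t_scan with t_store gives a crude idea of how much of the run is the re's and how much is the storing; a real profiler would be more trustworthy.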

It could also be the overhead of perl doing type-f*cking and all that jazz
it does behind the scenes when nobody is looking, that python doesn't do.

It could also be that cache effects from whatever the hell perl is doing
to allocate so much more memory, plus the larger overhead of most perl
operations, make the difference.

Note that I don't do string concatenation the same way in both languages,
however, since python strings are immutable (naive concatenation would be
deadly to performance on large PRs).
Generally, you append the pieces to a list and join the list together at
the end.
Or you use the array module to get a mutable array of chars to do this on.

I.e.

a = ""
for x in xrange(300000):
	a += 'b' * 80

will take many minutes, while

a = []
for x in xrange(300000):
	a.append('b' * 80)
string = "".join(a)

will take ~4 seconds.
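
For anyone trying this on a current interpreter, here is a rough Python 3 sketch of the list-and-join approach next to the array-module alternative mentioned earlier. The names PIECE, COUNT and friends are made up for the example. It assumes Python 3, where xrange has become range and the old 'c' typecode no longer exists, so the array holds unsigned bytes ('B') and is decoded at the end.

```python
from array import array

PIECE = "b" * 80   # one 80-character chunk, as in the example above
COUNT = 300000     # same iteration count as above

# List-and-join: collect the pieces, then build the final string once.
parts = []
for _ in range(COUNT):
    parts.append(PIECE)
joined = "".join(parts)

# Array module: a mutable buffer of unsigned bytes that grows in place.
buf = array("B")
piece_bytes = PIECE.encode("ascii")
for _ in range(COUNT):
    buf.frombytes(piece_bytes)
from_array = buf.tobytes().decode("ascii")
```

Both variants end up with the same 24,000,000-character string. The timings quoted above are from 2003; later CPython releases optimise some += cases in place, so the gap may be much smaller now.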

Not that any of this matters: the python code is just plain easier to
read, cleaner, and doesn't leak for no reason. The fact that it's faster
is just icing.


 >
> Later,
> Paul
> ------------------------------------------------------------------------
> J. Paul Reed -- 0xDF8708F8 || preed at sigkill.com || web.sigkill.com/preed
> To hold on to sanity too tight is insane.   -- Nick Falzone, Pushing Tin
>
> I use PGP; you should use PGP too... if only to piss off John Ashcroft