Tue, 19 Feb 2008
The Cost of Flexibility (and Cleanliness)
In the previously mentioned distributed computing project, I am trying to do something like the following code:
    sub parse_file {
        my $fh = shift;
        while (my $parsed_data = nontrivial_get_data_from($fh)) {
            handle($parsed_data);
        }
    }
The nontrivial_get_data_from($fh) code is indeed non-trivial (in terms of lines of code, not necessarily in terms of CPU time), while handle($parsed_data) is pretty straightforward. Now the problem is that I want to use this non-trivial code with different handle($parsed_data) routines (for example, one that prints out the $parsed_data for testing purposes). A natural way would be to implement a pure virtual class whose parse_file() method calls $self->handle($parsed_data), and which the programmer would subclass, providing different $self->handle() implementations.
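The subclassing approach described above could look roughly like the following. This is only a minimal sketch: the package names Parser::Base and Parser::Collector, the get_data() method (a trivial stand-in for nontrivial_get_data_from()), and the in-memory test input are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical base class: owns the parsing loop, leaves handle() to subclasses.
package Parser::Base;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Stand-in for the non-trivial parser: one chomped line per record.
sub get_data {
    my ($self, $fh) = @_;
    my $line = <$fh>;
    return undef unless defined $line;
    chomp $line;
    return { line => $line };
}

sub parse_file {
    my ($self, $fh) = @_;
    while (my $parsed_data = $self->get_data($fh)) {
        $self->handle($parsed_data);    # resolved via @ISA on every record
    }
    return $self;
}

# "Pure virtual": subclasses are expected to override this.
sub handle { die ref(shift) . " must implement handle()\n" }

# Hypothetical subclass that just remembers what it has seen.
package Parser::Collector;
our @ISA = ('Parser::Base');

sub handle {
    my ($self, $parsed_data) = @_;
    push @{ $self->{seen} }, $parsed_data->{line};
}

package main;

my $input = "alpha\nbeta\n";
open my $fh, '<', \$input or die "cannot open in-memory file: $!";
my $collector = Parser::Collector->new(seen => []);
$collector->parse_file($fh);
print join(',', @{ $collector->{seen} }), "\n";
```

A different program would provide its own subclass with another handle() and reuse parse_file() unchanged.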
I have found that using a subclassed method $self->handle() instead of putting the handling code directly into parse_file() costs about 14 % more run time (the dirty inlined code took 35 seconds on the test data set, while the nice and clean subclassed one took 40 seconds).
So, my dear Perl gurus, how would you implement this? I need to call different code in the innermost loop of the program, and just factoring it out into a subroutine (or a virtual method) costs me about 14 % of run time. Maybe some clever eval { } and precompiling different instances of parse_file()? In fact, I don't really need the flexibility of objects: I need only a single implementation of handle($parsed_data) in a single program run, but I want to be able to use different handle() code with the same parse_file() code base called from different programs.
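One way the "precompiled instances of parse_file()" idea might be tried is with a string eval that pastes the handler source directly into the loop body, so that no per-record subroutine call remains. This is a sketch under assumptions, not a measured solution: make_parse_file() and the simplified nontrivial_get_data_from() below are hypothetical, and the handler text must come from a trusted source because it is compiled as code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real, non-trivial parser.
sub nontrivial_get_data_from {
    my ($fh) = @_;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;
    return { line => $line };
}

# Build a parse_file() variant with the handler source inlined into the loop.
# $handler_src is Perl source text that may refer to $parsed_data.
sub make_parse_file {
    my ($handler_src) = @_;
    my $code = <<"END_PERL";
sub {
    my \$fh = shift;
    while (my \$parsed_data = nontrivial_get_data_from(\$fh)) {
        $handler_src
    }
}
END_PERL
    my $sub = eval $code;    # compiled once, before the hot loop runs
    die "cannot compile parse_file variant: $@" if $@;
    return $sub;
}

our @seen;
my $parse_file = make_parse_file('push @main::seen, $parsed_data->{line};');

my $input = "alpha\nbeta\n";
open my $fh, '<', \$input or die "cannot open in-memory file: $!";
$parse_file->($fh);
print "@seen\n";
```

Each program would call make_parse_file() once with its own handler text. Whether this actually recovers the lost time would have to be measured on the real data set.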
6 replies for this story:
Adelton wrote:
How about taking a reference to that handle function and passing it to the parsing function as an argument? That way you'd still have the liberty of using different handlers, while avoiding the ISA method resolution. Hmmm. I remember that back in 2000, Andreas K. was solving a similar problem (slow method dispatching) with Apache::HeavyCGI. And of course, there is DBI as an example of a module which avoids class inheritance and method dispatching for speed reasons.
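A minimal sketch of the code-reference suggestion, assuming a simplified nontrivial_get_data_from() and an in-memory input invented for the example. It skips method resolution, but it still pays for one subroutine call per record:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real, non-trivial parser.
sub nontrivial_get_data_from {
    my ($fh) = @_;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;
    return { line => $line };
}

# The handler is passed in as a code reference instead of being a method.
sub parse_file {
    my ($fh, $handle) = @_;
    while (my $parsed_data = nontrivial_get_data_from($fh)) {
        $handle->($parsed_data);
    }
}

my @lines;
my $input = "alpha\nbeta\n";
open my $fh, '<', \$input or die "cannot open in-memory file: $!";
parse_file($fh, sub { push @lines, $_[0]{line} });
print "@lines\n";
```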
Yenya wrote: Re: Adelton
As far as I can tell, the problem persists even when I use the handle() code as a static function. Just factoring out the handling code into a separate function instead of writing it directly into parse_file() causes this 14 % slowdown. Object inheritance here does not have measurable overhead compared to a plain function call.
Miroslav Suchý wrote: Param length
How big is $parsed_data? Can you use a reference instead? $a = 'some data' x 10000; for (1..1000000) { handle($a); } This takes 27 seconds, whereas for (1..1000000) { handle(\$a); } takes only one second.
errhm..hmm..yesihavesome wrote: I don't want to troll, but..
FOURTEEN PERCENT? Buy a faster CPU, and you're done.
Yenya wrote:
Well, ignoring 14 % in a single part of the code can easily lead to ten percent here, another ten percent there, and in the end the system would be unbearably slow. And there is an upper limit on how fast a CPU you can buy (not to mention other limits, such as memory bandwidth).
errhm..hmm..yesihavesome wrote: I don't want to troll, but..
Don't worry. 10% here + 10% there = 10% overall.