hype-free: Profile first!

Wednesday, July 01, 2009

Profile first!

I was using my code to parse a medium-sized CVS log and it was being slow (~1min). So, like an idiot I said: I know that the algorithm is quite slow, so I optimize it. Here is a version which is twice as fast as the original version:

   use strict;
use warnings;
use Test::More tests => 8;

is(cmp_cvs_tags('1.1', '1.1'),    0);
is(cmp_cvs_tags('1.1', '1.1.1'), -1);
is(cmp_cvs_tags('1.1', '1.2'),   -1);
is(cmp_cvs_tags('1.2', '1.2'),    0);
is(cmp_cvs_tags('1.2', '1.3'),   -1);

is(cmp_cvs_tags('1.1.1', '1.1'), 1);
is(cmp_cvs_tags('1.2',   '1.1'), 1);
is(cmp_cvs_tags('1.3',   '1.1'), 1);

sub cmp_cvs_tags {
  my ($a, $b) = @_;
  return 0 if ($a eq $b);
  
  # eliminate common prefix
  my ($alen, $blen) = (length($a), length($b));
  my $minlen = ($alen < $blen) ? $alen : $blen;
  for (my $i = 0; $i < $minlen; ++$i) {
    if (substr($a, $i, 1) ne substr($b, $i, 1)) {
      # first different character, cut until this point
      $a = substr($a, $i);
      $b = substr($b, $i);
      last;
    }
  }
  
  my @a_lst = split /\./, $a;
  my @b_lst = split /\./, $b;
  
  while (1) {
    my ($apart, $bpart) = (shift @a_lst, shift @b_lst);
    return -1 if (!defined $apart);
    return  1 if (!defined $bpart);
    next if ($apart == $bpart);
    return $apart <=> $bpart;
  } 
}

However, the whole script was still running slow. So I did which I should have done in the first place: run NYTProof on it. After looking at the reports I was clear that 99% of the time was being spent in the read loop. Dooh!