Lesson 11

De Bugger

So now you know how Perl works. But what do you do when it doesn't? Well, the first thing to do is not blame perl. Perl does have some bugs and misfeatures, but it's extremely unlikely that you've found a new one that's not in the docs. Given the chances are it's you that's buggered up, how can you find out where the problem is?

Well, the zeroth thing to do is make sure you don't make life difficult for yourself in the first place.

Plan your code before you start. Work out what you want to do, how you plan to achieve it, and then write bare-bones prototype code:

#!/usr/bin/perl
use strict;
use warnings;
my ( $input_file, $output_file ) = @ARGV;
my @fields = parse( $input_file );
open my $OUTPUT, '>', $output_file or die "Can't open '$output_file': $!\n";
foreach ( @fields ) { print $OUTPUT "$_\n" }
sub parse { print "Got to the parser\n"; return; }    # stub: flesh out later

You can flesh out these bare bones later, testing each new bit of functionality as you go. This is especially useful if you have many largely independent subroutines to write. It's easier to debug code when you know which block the mistake is in.

If you repeat a piece of code more than three lines long anywhere in your code, put it in a subroutine. Which would you rather: debugging one occurrence of a possible bug (and all code is a possible bug), or debugging eighty? The same applies to subroutines themselves. If you ever use a subroutine across more than one script, put it in a module. Really anal people may even want to put every subroutine they ever use in a module, although I don't.

In a similar vein, don't reinvent the wheel: check out CPAN and the standard perl distribution before you write a program that copies files (File::Copy), interfaces with a database (DBI), or traverses directory trees (File::Find). It's extremely unlikely you will do a better job by handrolling these functions yourself.
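As a minimal sketch of letting the core modules do the work, here are two small helpers built on File::Copy's copy and File::Find's find. The subroutine names backup_file and find_perl_scripts, and the choice of hunting for files ending in .pl, are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw( copy );
use File::Find qw( find );

# Copy $source to $target using the core File::Copy module,
# dying with a useful message if the copy fails.
sub backup_file {
    my ( $source, $target ) = @_;
    copy( $source, $target ) or die "Can't copy '$source' to '$target': $!\n";
    return $target;
}

# Walk the directory tree under $top with the core File::Find module and
# return the full path of every plain file whose name ends in .pl, sorted.
sub find_perl_scripts {
    my ( $top ) = @_;
    my @found;
    # find() calls this for every file and directory it meets:
    # $_ is the bare name, $File::Find::name is the full path
    find( sub { push @found, $File::Find::name if -f $_ && /\.pl\z/ }, $top );
    my @sorted = sort @found;
    return @sorted;
}
```

Calling print "$_\n" for find_perl_scripts( '.' ); would then list every .pl file under the current directory, without you having to write any directory-recursion code yourself.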

The other important thing is to make your code clean. Check out perldoc perlstyle for Larry's preferences on formatting, but be consistent no matter how you choose to write your code. Comment your code with #comments, and document your code with POD. This will make life easier for anyone using or modifying your code later, which will probably include your own good self at some point. Also, although perl's motto is TIMTOWTDI (There's More Than One Way To Do It), choose the most appropriate Way! Which of these would you prefer to debug?

(open A,"<$ARGV[0]")||die($!);
($a,$b,$c,$d,$e)=split/\//,<A>;
print "$_\n" for($a,$b,$c,$d,$e);

or:

#!/usr/bin/perl
use strict;
use warnings;
my $input = shift;
open my $INPUT, '<', $input or die "Can't open '$input': $!\n";
print "$_\n" foreach split m{ / }x, <$INPUT>;

or even:

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
my $input = shift; 
    # get the input file
open my $INPUT, '<', $input or die "Can't open '$input': $!\n";
    # open the input file
my $line = <$INPUT>; 
    # read a line from the input file
my @records = split m{ / }x, $line; 
    # split the records on slashes
foreach my $record ( @records ) { print "$record\n"; }
    # print them out \n delimited

These all do the same thing. The first is horrible: ugly formatting, leaning toothpicks /\//, no strict or warnings, $a, $b… where you really need an array (not to mention that using $a and $b is a bad idea because sort uses them), a nondescript A for a filehandle, and (blah)|| instead of the low-precedence blah or. The third is far too über-careful for my taste: it's obvious what it does, but note that I've used three lines of wankingly self-indulgent verbose code when the second shows you can do the same thing just as clearly in one. Well-written code shouldn't need many comments: it's already obvious what the third version does without echoing every line in English. Reserve comments for nasty things like regexes, ugly-but-necessary constructs, and summarising the gist of paragraphs of code. Verbosity isn't necessarily a good thing. It's quite obvious what this does (read it backwards, from split up to foreach):

print "$_\n"
  foreach
    reverse
      sort { lc $a cmp lc $b }
        grep { ! /^#/ }
          split m{ / }x,
            ( "usr/bin/perl/#comment/blah" );

whereas:

my $string   = "usr/bin/perl/#comment/blah";
my @splat    = split m{ / }x, $string;
my @grepped  = grep { ! /^#/ } @splat;
my @sorted   = sort { lc $a cmp lc $b } @grepped;
my @reversed = reverse @sorted;
foreach my $item ( @reversed ) { print "$item\n"; }

has rather more chance of bugging up, if only from misspellings.
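One way to keep a chain like the compact version easy to check (a suggestion, not the only way) is to wrap it in a subroutine that returns the list instead of printing it, so you can look at the result directly. The name visible_parts is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The same split/grep/sort/reverse chain as above, returning a list
# instead of printing it, so the result can be inspected or tested.
sub visible_parts {
    my ( $string ) = @_;
    my @parts = reverse
        sort { lc $a cmp lc $b }
        grep { ! /^#/ }
        split m{ / }x, $string;
    return @parts;
}

print "$_\n" foreach visible_parts( "usr/bin/perl/#comment/blah" );
# prints usr, perl, blah and bin, one per line
```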

The last thing to make your life easy is to ensure the first lines of any code you write look something like:

use strict;
use warnings;

These will catch some of the commonest mistakes: strict complains about undeclared variables (often misspellings), symbolic references and stray barewords, while warnings flags things like trying to write to filehandles opened only for reading, variables you only use once (probably misspellings), and so on. You may also find use diagnostics; helpful: it translates warnings into something more descriptive and gives you ideas on how to fix stuff.
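To watch strict catch a misspelling for yourself, you can compile a snippet inside a string eval, which traps the compile-time error in $@ instead of killing your script. A minimal sketch; strict_complaint is a made-up name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile (and, if it compiles, run) a snippet of code under 'use strict'.
# A string eval returns undef on failure and leaves the error in $@.
sub strict_complaint {
    my ( $code ) = @_;
    my $ok = eval "use strict; $code; 1";
    return $ok ? '' : $@;
}

# $cuont is a typo for $count, and was never declared with my
my $error = strict_complaint( 'my $count = 1; $cuont++' );
print "strict says: $error";
```

The message should mention Global symbol "$cuont" requires explicit package name; newer perls also ask whether you forgot to declare it with my. Without strict, the typo would silently create a brand new global variable.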

OK, so you've not made life difficult for yourself in the first place. And it's still not working. What next? Well, if you've done as you're told, you will know roughly where the cock up is. If not (or just to check), sprinkle some print statements around liberally in the general area:

my $var = "this";
if ( $var = "that" ) { print "TRUE\n"; }
TRUE

Oops. Sprinkle about print $suspect_variables:

my $var = "this";
if ( $var = "that" ) { print "$var\n"; print "TRUE\n"; }
that
TRUE

Ah. We have committed Perl goof number 1:

Perl goofs

Equality

Getting = (assignment) and == (numerical equality) mixed up:

$a = 20;
if ( $a = 2 ) { print "TRUE\n"; }

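$a is assigned the value 2, which returns the value 2, which isn't one of Perl's false values (undef, 0, '0' or the empty string), so it's TRUE. D'oh! (Under use warnings, perl will usually grumble "Found = in conditional, should be ==" for exactly this.) In a similar vein, don't mix up eq (string equality) and == (numerical equality), and don't mix up = and =~ when matching regexes.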
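To see why eq and == matter, here's a sketch comparing two strings both ways. compare_both is a made-up helper, and it deliberately switches off the "isn't numeric" warnings that use warnings would otherwise give you, which are exactly the hint you want when debugging for real:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compare two strings the wrong way (==) and the right way (eq).
# Returns two flags: ( numeric_equal, string_equal ).
sub compare_both {
    my ( $x, $y ) = @_;
    my $numeric;
    {
        # == turns non-numeric strings into 0, so "apple" == "orange" is true.
        # use warnings would normally complain "isn't numeric" here;
        # it's switched off only for this demonstration.
        no warnings 'numeric';
        $numeric = ( $x == $y ) ? 1 : 0;
    }
    my $string = ( $x eq $y ) ? 1 : 0;
    return ( $numeric, $string );
}

my ( $num, $str ) = compare_both( "apple", "orange" );
print "== says ", ( $num ? "equal" : "different" ), "\n";    # == says equal
print "eq says ", ( $str ? "equal" : "different" ), "\n";    # eq says different
```

It cuts both ways: "10" == "10.0" is true, but "10" eq "10.0" is false.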

Sprinkling about print statements may be augmented by the use of any of the following:

For things nastier than a single string, like objects or hashrefs:

use Data::Dumper;
print Dumper( $very_complex_data_structure );    # pass a reference: $object, \%hash, \@array
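A slightly fuller sketch: setting $Data::Dumper::Sortkeys makes hash keys come out in the same order every run (handy when comparing two dumps), and Dumper returns a string, so you can warn it to STDERR instead of mixing it into your normal output. The data structure is made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # hash keys in sorted order, so dumps are comparable
$Data::Dumper::Indent   = 1;    # more compact indentation than the default

# A made-up nested structure standing in for your real data
my $record = {
    name      => 'Einstein',
    interests => [ 'physics', 'violin' ],
    born      => { year => 1879, place => 'Ulm' },
};

my $dump = Dumper( $record );    # Dumper returns a string...
warn $dump;                      # ...so send it to STDERR, a log, wherever
```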

For redirecting output (you could also use the module D::Oh for this):

open STDOUT, '>', "stdout.txt" or die $!;
open STDERR, '>', "stderr.txt" or die $!;
print "output this";
warn "whinge about this";
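Another trick when output is scrolling past too fast: trap warnings yourself with a $SIG{__WARN__} handler, so you can inspect, count or log them. A sketch; collect_warnings is a made-up name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a block of code and return every warning it raised,
# instead of letting them scroll past on STDERR.
sub collect_warnings {
    my ( $code ) = @_;
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, $_[0] };    # restored on return
    $code->();
    return @caught;
}

my @warnings = collect_warnings( sub {
    warn "whinge about this\n";
    my $undefined;
    my $oops = "value: " . $undefined;    # triggers an 'uninitialized' warning
} );
print "caught: $_" for @warnings;
```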

For avoiding the uninformative Error 500 (which really just means 'You can't write Perl') when playing with CGI applications, have fatal errors sent to the browser instead:

use CGI::Carp qw( fatalsToBrowser );

And, whatever you do with CGI, don't forget to print the header, either with the CGI module, or directly, with:

print "Content-type: text/html\n\n";

Besides messing up = and ==, some other frequent cock-ups include:

Messing up pairs and semicolons

It's very easy to lose track of paired things like {} [] <> and (). Most text editors have a brace-matching function that will help you find missing braces and parentheses. Forgetting the ; at the end of a statement is also a common source of problems.

Quotes are even better at this, quotes in the general sense of " ", <<"HEREDOC"; HEREDOC\n, / /, s###, qx@@ and tr|||. Don't forget to escape quote characters if you have to embed them. Furthermore, be warned: perl may well seem to get the line wrong when it complains about things like this, as it may not realise you've made a cock up till it's too late:

$string = "forgot the quote at the end,;
# so perl thinks the remaining lines are still string
print qq:$_\n: for ( 1.. 10 ) # and forgot the semicolon here too
print " and only now does it realise something bad has happened";
String found where operator expected at D:\Steve\perl\t.pl line 4, 
    at end of line (Missing semicolon on previous line?)
Can't find string terminator '"' anywhere before EOF at D:\Steve\perl\t.pl line 4.

Arrays and lists

Index from 0, not 1. Don't forget that many Perl operators also return different things in list and scalar context: splice, localtime, each and arrays are common cases in point.
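Here's a small demonstration of the same things giving different answers in scalar and list context. It uses gmtime rather than localtime only so the result doesn't depend on your time zone; the two behave the same way with respect to context:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @items = ( 10, 20, 30 );

my $count     = @items;      # scalar context: the number of elements, 3
my ( $first ) = @items;      # list assignment: the first element, 10
my $last_idx  = $#items;     # index of the last element: 2, because we count from 0

my @copy    = @items;
my $removed = splice( @copy, 0, 2 );    # scalar context: last element removed, 20
my @rest    = @items;
my @gone    = splice( @rest, 0, 2 );    # list context: all elements removed, (10, 20)

my $when  = gmtime( 0 );     # scalar context: "Thu Jan  1 00:00:00 1970"
my @parts = gmtime( 0 );     # list context: (sec, min, hour, mday, mon, year, wday, yday, isdst)

print "$count $first $last_idx $removed @gone\n";    # prints "3 10 2 20 10 20"
print "$when / year field: $parts[5]\n";             # year counts from 1900, so 70
```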

Input, output and modules

Printing to a closed filehandle is a silent error unless you use warnings; and it's very easy to forget the > in:

open my $FH, '>', $f or die "Can't open '$f' for writing: $!\n";    # '>' for writing
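Since open, print and close can all fail quietly, one habit worth getting into is checking every one of them. A minimal sketch; write_lines is a made-up helper:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write each line to $file, checking every step: open can fail (bad path,
# no permission), and print and close can fail too (a full disk, say).
sub write_lines {
    my ( $file, @lines ) = @_;
    open my $FH, '>', $file or die "Can't open '$file' for writing: $!\n";
    print {$FH} map { "$_\n" } @lines or die "Can't write to '$file': $!\n";
    close $FH or die "Can't close '$file': $!\n";
    return scalar @lines;    # number of lines written
}
```

For example, write_lines( 'out.txt', 'one', 'two' ) creates out.txt containing two lines, and dies with a message naming the file if anything goes wrong.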

As far as modules go, bugs in your own modules are your problem: deal with them the same way as bugs in a script, by writing a small script that uses a bit of functionality from the module, and making sure each bit works individually. Other people's modules are usually quite well tested, but beware that some modules don't work on all systems (they'll warn you when you install them), and beware of using old scripts that may use old versions of modules, and vice versa: there's no point in writing a CGI script for HTML::Parser v3 when your ISP only has v2.
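If a script depends on a particular version of a module, it can check before it gets going. The simplest way is to ask for the version in the use line (use HTML::Parser 3; refuses to run against anything older). To check at run time instead, a sketch along these lines should work; have_module is a made-up name, and List::Util is used only because it ships with perl:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if $module can be loaded (and, if $min_version is given, is at
# least that version); return 0 otherwise.
sub have_module {
    my ( $module, $min_version ) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # List::Util -> List/Util.pm
    return eval {
        require $file;
        $module->VERSION( $min_version ) if defined $min_version;    # dies if too old
        1;
    } ? 1 : 0;
}

print "List::Util is ", ( have_module( 'List::Util', 1.0 ) ? "" : "not " ), "available\n";
```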

Maths

Exponentiation is written Fortran-style, as **, not Excel-style, as ^ (which in Perl is bitwise XOR). ** also binds more tightly than unary minus, hence -2**2 is -4, which matches the usual mathematical convention but not Excel, where -2^2 gives 4. Precedence issues are also sometimes a problem: if in doubt, add parentheses to make sure perl understands what you mean. If you're feeling brave, you could even try:
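A few lines to check these precedence rules for yourself:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $neg_square  = -2**2;     # -4: ** binds tighter than unary minus
my $square_neg  = (-2)**2;   # 4: parentheses do the negation first
my $xor         = 2 ^ 3;     # 1: ^ is bitwise XOR in Perl, not a power
my $right_assoc = 2**3**2;   # 512: ** is right-associative, i.e. 2**(3**2)

print "$neg_square $square_neg $xor $right_assoc\n";    # prints "-4 4 1 512"
```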

perl -MO=Deparse,-p script.pl

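This uses the perl backend modules B and O: perl compiles your script as usual, then B::Deparse turns the compiled opcodes back into Perl source without running the main program, so you see the code as perl actually understood it. It's instructive to run a script through the deparser to see what perl is really doing. The -p option (passed to Deparse after a comma, not as a separate perl switch, because perl's own -p would wrap your script in a read-and-print loop) gets the deparser to put in parentheses so you can see the precedence explicitly.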
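The deparser can also be driven from inside a script via B::Deparse's coderef2text method, which is handy for checking how perl parsed a single subroutine. A minimal sketch, with a made-up subroutine to deparse:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use B::Deparse;

# '-p' is the same option as in -MO=Deparse,-p: parenthesise (almost) everything
my $deparser = B::Deparse->new( '-p' );

# A made-up subroutine whose precedence we want to see
my $calculate = sub {
    my ( $x, $y, $z ) = @_;
    return $x + $y * $z;
};

my $source = $deparser->coderef2text( $calculate );
print "$source\n";    # the return line should show ($x + ($y * $z))
```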

And if all else fails

Now, if all this doesn't help, you still have all these places to try for help:

perldoc

perldoc or the HTML that comes with the ActiveState perl distribution is your friend. The literature that comes with perl is 'extensive', so use it. Read The Fabulous Manual (RTFM)…

perldoc -f function_you_may_be_using_the_wrong_syntax_for
perldoc perltrap

perltrap is the 'Traps for the unwary' documentation: the above are the commonest problems for people such as me who came to perl with no idea about other programming languages. If you are a hard-core C programmer or Python hacker, your problems (how do I take an address? What're all these braces for?) may be different.

perl.com

The perl website (well, one of them). This would be what is termed Searching The Frabjous Web (STFW). Full of links to other helpful documentation, CPAN, and most importantly to:

perlmonks.org

The perlmonks' website is lovely: it has a FAQ for common 'how do I do this', a tutorial (like you'd need it. Pah!) and trawling through the archives is a good way of picking up tips. You can also post requests for help, which are almost always answered with grace and helpfulness, although remember to RTFM before you waste someone's time with a spurious question about why this:

$a = "foo"; print "TRUE" if $a = "bar";

doesn't work.

dev.perl.org

dev.perl.org is where to go for what's new (perl 6, parrot, perl 5.8.0, perl 1, etc.).

Some other things I've never done

I have never posted to comp.lang.perl.misc, for fear of being laughed at. I have never used the perl debugger (perl -d) since sprinkling print about always seems to fix my problems. Maybe I'm missing a trick. Feel free to do either of these things!

Big things

Perl is a language for dirty little hacks, shell scripts and for confusing the hell out of maintenance staff with all Th@t_L!Ne_\n0i$e, or so I'm told. Hopefully, all the nagging about use strict, POD documentation, comments, the /x modifier for regexes and the importance of modules, classes and debugging means that your dirtier hacks are not something you'd want to release onto the world. Here are some things that may help you if you move from writing little scripts for yourself that don't do anything very important, to programming bigger, portable applications that have to interact with users, databases, libraries, servers and other programs. Various tiresome models of how you should program things are bandied about (waterfalls and spirals and design patterns and eXtreme Programming, etc.), but basically, programming anything is some mixture of analysis (what's it supposed to do?), design (how will I make it do it?), implementation (oh god, what's the syntax for sprintf?), and testing (does the damn thing work after all that?).

Larry Wall (creator of Perl) says the most important attributes of a programmer are laziness, impatience and hubris. That is not to say you should be bone-idle, stroppy and arrogant. Lazy means reducing overall effort on a project: that means writing it well and documenting it well, so you don't end up squandering hours trying to understand it again later. Impatient means not putting up with crap from the computer. Perl helps greatly with this: write it and run it rather than write it, compile it, link it, run it, swear at it, etc., etc. Do your bit too by making your scripts robust and responsive to users' needs. Although you may not want, or be able to justify, building in functionality that you're not going to need, it's still worth keeping an eye on the future extensibility of your code. Hubris means taking pride in your work. It's much better to write clean scripts that people like using and don't mind maintaining, rather than writing twisty, messy guff that eventually ends up as some unmaintainable but irremovable ball of mud with a cargo cult built up around it ("we know it works, we just don't know how it works"). Would you rather people think you're a good programmer, or bitch behind your back about the evil spaghetti you've written that they now have to maintain? Anyway, here are a few things that you might like to aim for:

The user (no matter how computer un-savvy) should like it, i.e. the interface is pretty, functional, forgiving and functioning. If you're writing something for the computer illiterate, write it so it aims at their level (this sounds like teaching!): think web browser interface, not command line switches. Even better, make it dual purpose, and make sure it puts up gracefully with silly things like taking both / or \ as path delimiters. Get it to print usage instructions, or even its own POD, if it gets called wrongly:

unless ( defined $ARGV[0] ) {    # no file given: show our own POD and stop
    system( 'perldoc', $0 ) == 0 or warn "Usage: $0 [-v] file\n";
    exit 1;
}

and provide sensible defaults:

use Getopt::Long;
my $DEFAULT_OUTPUT = "$ENV{HOME}/plops.txt";    # open() won't expand "~" for you
my $output;
GetOptions( "output=s" => \$output ) or die "Usage: $0 [--output file] file\n";
$output ||= $DEFAULT_OUTPUT;
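If you'd rather be able to test your option handling, Getopt::Long also provides GetOptionsFromArray, which parses any array you give it instead of @ARGV. A sketch under some assumptions: the parse_args name, the --verbose switch and the fallback to the current directory when $ENV{HOME} isn't set are all illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw( GetOptionsFromArray );

# Illustrative default: open() won't expand "~", so build the path from
# $ENV{HOME}, falling back to the current directory if HOME isn't set.
my $DEFAULT_OUTPUT = ( $ENV{HOME} // '.' ) . '/plops.txt';

# Parse an explicit list of arguments (instead of @ARGV) so this can be tested.
# Returns a hashref of options plus whatever non-option arguments are left.
sub parse_args {
    my @args = @_;
    my %opt  = ( output => $DEFAULT_OUTPUT, verbose => 0 );    # sensible defaults
    GetOptionsFromArray( \@args, \%opt, 'output=s', 'verbose|v' )
        or die "Usage: $0 [-v] [--output file] file\n";
    return ( \%opt, @args );
}
```

In a real script you'd call my ( $options, @files ) = parse_args( @ARGV ); and the defaults apply to anything the user didn't specify.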

if that's sensible. Make sure you get feedback about whether you are coding the right thing, and implementing the right interface, as you go along. You don't want to code a search engine if you're supposed to be programming a shopping cart. Clients are fickle, and probably don't really know what they want, so keep getting feedback as you implement your code: sometimes a trivial change in what the user wants will require significant refactoring of your code (or not, if you're lucky and conscientious).

As well as the user, the computing science types should like it too, i.e. you could show it to Edsger Dijkstra (may he rest in peace) and he would find nothing harmful in it (besides the fact you're using Perl and not Assembler). That means avoiding dirty hacks where possible, no goto (obviously), well thought out, correct algorithms, preferably ones that scale better than O(N²), and using appropriate tools for appropriate tasks. If you're wondering about the O(N) thing, it's called 'big-O' notation, and it describes how your program's running time grows as a function of how much data it has to manipulate. Generally, any algorithm whose time of execution increases more rapidly than the square of the number of data items is unusable on any sizeable data set. The thing to look out for here is nested loops: if you have anything that looks like:

for my $first ( @all_the_data_items )
{
    for my $second ( @all_the_data_items )
    {
        print "Match!\n" if $first eq $second;
    }
}

you are skirting the borders of unusability: for every ($first) item in @all_the_data_items, you make a comparison to every ($second) item in @all_the_data_items. Hence you will make N² comparisons, where N is scalar @all_the_data_items, and the script's time of execution will increase with the square of N. Whatever you do, don't put another loop in the inner loop that goes over all the data, or you'll likely be dead by the time the script has dealt with more than ten thousand items (seriously: if it takes 1 ms to do the innermost loop thing, it will take (10000**3)/(1000*60*60*24*365), or roughly 32, years). The perl module Benchmark may help you decide on issues of speed and optimisation, but 'make it work, make it right, then make it fast' is a traditional warning against the dangers of premature optimisation.
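Often the question behind a loop like that is simply "which items occur more than once?", and a hash answers that in a single pass over the data, i.e. O(N). This is a swapped-in technique rather than a drop-in replacement (the nested loop above also matches every item with itself); find_duplicates is a hypothetical name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Find items that occur more than once using a hash of counts: one pass
# over the data instead of comparing every item with every other item.
sub find_duplicates {
    my @items = @_;
    my %seen;
    $seen{$_}++ for @items;    # count each item: O(N)
    # keep items seen more than once; sorted only so the output is predictable
    my @dupes = sort { $a cmp $b } grep { $seen{$_} > 1 } keys %seen;
    return @dupes;
}

my @data = qw( apple pear apple plum pear fig );
print "Duplicate: $_\n" for find_duplicates( @data );    # apple, then pear
```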

Finally, your successor should like it, i.e. whoever ends up maintaining the code knows how it works, why it works, and how to fix it if it breaks or needs extending. Think consistent formatting, comments, documentation, $VERSIONing, $variables_that_mean_something, built-in debugging, tests (preferably ones that can be configured to use a test arena where it doesn't matter if your script accidentally deletes everything) that s/he can run to check the thing still works if they have to alter anything, and modularisation, i.e. using loosely coupled modules and classes, avoiding global variables, and if there are a load of configurations, putting them in a separate file, especially if they are accessed by many scripts, etc.

A final, final word: testing. When you use h2xs to create the skeleton of a module Blah, it will provide you with a test script Blah.t (or 1.t under older perls) in the t directory, which looks something like this:

# Before `make install' is performed this script should be runnable with
# `make test'. After `make install' it should work as `perl Blah.t'
use Test::More tests => 1;
BEGIN { use_ok('Blah') };
# Insert your test code below, the Test::More module is use()ed here so read
# its man page ( perldoc Test::More ) for help writing this test script.

Inside a test script, you can probe the output for known inputs and check that your module is working correctly. For example, if Blah exports a function called greet, you might well like to check that:

use Test::More tests => 2;
    # Note that you need to tell Test::More how many tests you intend to run
BEGIN { use_ok('Blah') };
    # Checks that the module compiles correctly
ok( greet( "Einstein" ) eq "Hello, Mr. Einstein" );
    # ok checks that the comparison made is TRUE.

The ok function from Test::More checks whether the argument it is given is true or not. Although this sounds somewhat trivial, if you ever come to write a gigantic module from Hell, and then decide to rewrite (refactor) it from scratch, it's essential to know whether or not your new version of the module behaves in the same way as the previous version. If you write tests that cover the code sufficiently well (checking every branch in the logic), then you will have a much better idea of whether your refactored module can be a drop-in replacement. Even better, you will be told immediately (by running make test) which tests are failing (and hopefully, therefore, where the code is still broken). Computer programs are largely black boxes to their users, and the main thing they are interested in is not the neatness of the box's contents (which they'll never look at), but the fact that no matter what's in the box, when you give it input A, it always produces output B, and not a recipe for cheesecake or a segfault. Even if you are writing a brand new module, it is worth writing the tests as you go along (or, if you're a real eXtreme Programming devotee, before you start coding), so that as you tinker with new functions, you know you are not buggering up the old ones you have so carefully coded minutes before.
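Test::More also has is() and like(), which report what they expected and what they actually got when a test fails, and done_testing(), which saves you counting tests by hand. A sketch, with a stand-in greet() defined inline since there is no real Blah module here:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;

# Stand-in for the greet() that Blah might export. In a real t/Blah.t you
# would load the module instead, with BEGIN { use_ok('Blah') }.
sub greet {
    my ( $name ) = @_;
    return "Hello, Mr. $name";
}

# is() compares got vs expected and prints both if they differ,
# which ok( ... eq ... ) cannot do because it only sees true or false.
is( greet( "Einstein" ), "Hello, Mr. Einstein", 'greet() adds the title' );

# like() checks the result against a regex
like( greet( "Einstein" ), qr/^Hello, /, 'greeting starts with Hello' );

# Declares that all the tests have run, instead of a plan at the top
done_testing();
```

Run it with perl or prove, and a failing is() prints both the got and expected values, which is usually half the debugging done for you.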

HTH.
