De Bugger
So now you know how Perl works. But what do you do when it doesn't? Well, the first thing to do is not blame perl. Perl does have some bugs and misfeatures, but it's extremely unlikely that you've found a new one that's not in the docs. Given the chances are it's you that's buggered up, how can you find out where the problem is?
Well, the zeroth thing to do is make sure you don't make life difficult for yourself in the first place.
Plan your code before you start. Work out what you want to do, how you plan to achieve it, and then write bare-bones prototype code:
#!/usr/bin/perl
use strict;
use warnings;
my ( $input_file, $output_file ) = @ARGV;
my @fields = parse ( $input_file );
open my $OUTPUT, ">", $output_file or die "Can't open '$output_file': $!\n";
foreach ( @fields ) { print $OUTPUT "$_\n" }
sub parse { print "Got to the parser\n"; }
You can flesh out these bare bones later, testing each new bit of functionality as you go. This is especially useful if you have many largely independent subroutines to write. It's easier to debug code when you know which block the mistake is in.
If you repeat a piece of code more than three lines long anywhere in your code, put it in a subroutine. Which would you rather: debugging one occurrence of a possible bug (and all code is a possible bug), or debugging eighty? The same applies to subroutines themselves. If you ever use a subroutine across more than one script, put it in a module. Really anal people may even want to put every subroutine they ever use in a module, although I don't.
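A minimal sketch of what moving a subroutine into a module looks like. The names My::Utils and trim_space are made up for illustration, and the package is kept in the same file here only so the example runs on its own; in real life it would live in My/Utils.pm.

```perl
# Normally this package would live in its own file, My/Utils.pm
# (My::Utils and trim_space are hypothetical names).
package My::Utils;
use strict;
use warnings;
use Exporter 'import';
our @EXPORT_OK = qw( trim_space );

# Strip leading and trailing whitespace: written once, debugged once.
sub trim_space
{
    my ( $string ) = @_;
    $string =~ s/^\s+|\s+$//g;
    return $string;
}

package main;
use strict;
use warnings;

# With a real file you would write: use My::Utils qw( trim_space );
My::Utils->import( 'trim_space' );

my $tidy = trim_space( "  padded  " );
print "[$tidy]\n";
```

Every script that needs the function then pulls it in with one use line, and a fix made in the module is a fix made everywhere.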
In a similar vein, don't reinvent the wheel: check out CPAN and the standard perl distribution
before you write a program that copies files (File::Copy),
interfaces with a database (DBI), or traverses directory
trees (File::Find). It's extremely unlikely you will do a
better job by hand-rolling these functions yourself.
The other important thing is to make your code clean. Check out
perldoc perlstyle for Larry's preferences on formatting, but
be consistent no matter how you choose to write your code. Comment your
code with #comments, and document your code with POD. This
will make life easier for anyone using or modifying your code later,
which will probably include your own good self at some point. Also,
although perl's motto is TIMTOWTDI, choose the most appropriate Way! Which
of these would you prefer to debug?
(open A,"<$ARGV[0]")||die($!); ($a,$b,$c,$d,$e)=split/\//,<A>; print "$_\n" for($a,$b,$c,$d,$e);
or:
#!/usr/bin/perl
use strict;
use warnings;
my $input = shift;
open my $INPUT, '<', $input or die "Can't open '$input': $!\n";
print "$_\n" foreach split m{ / }x, <$INPUT>;
or even:
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
my $input = shift;
# get the input file
open my $INPUT, '<', $input or die "Can't open '$input': $!\n";
# open the input file
my $line = <$INPUT>;
# read a line from the input file
my @records = split m{ / }x, $line;
# split the records on slashes
foreach my $record ( @records ) { print "$record\n"; }
# print them out \n delimited
These all do the same thing: the first is horrible: ugly formatting,
leaning toothpicks /\//, no strict, $a,
$b…where you really need an array (not to mention that
using $a and $b is a bad idea because of
sort), a nondescript A for a filehandle,
(blah)|| instead of low precedence blah or. The
third is far too über-careful for my taste: it's obvious what it does,
but note that I've used three lines of wankingly self-indulgent verbose
code when the second shows you can do the same thing just as clearly in
one. Well written code shouldn't need many comments: it's obvious already
what the third code does without echoing every line in English. Reserve
comments for nasty things like regexes, ugly-but-necessary constructs,
and commenting the gist of paragraphs of code. Verbosity isn't
necessarily a good thing. It's quite obvious what this does (read it
backwards from split to foreach):
print "$_\n"
foreach
reverse
sort { lc $a cmp lc $b }
grep { ! /^#/ }
split m{ / }x,
( "usr/bin/perl/#comment/blah" );
whereas:
my $string = "usr/bin/perl/#comment/blah";
my @splat = split m{ / }x, $string;
my @grepped = grep { ! /^#/ } @splat;
my @sorted = sort { lc $a cmp lc $b } @grepped;
my @reversed = reverse @sorted;
foreach my $item ( @reversed ) { print "$item\n"; }
has rather more chances of bugging up, if only from misspellings.
The last thing to make your life easy is to ensure the first lines of any code you write look something like:
use strict; use warnings;
These will catch some of the commonest mistakes, like trying to write
to read-only filehandles, variables you only use once (probably
misspellings), and so on. You may also find use diagnostics;
helpful: it translates warnings into something more descriptive and gives
you ideas on how to fix stuff.
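To see what strict buys you, here is a minimal sketch of it catching a misspelt variable. The variable names are invented, and the string eval is there only so the example can trap the compile-time error and print it instead of dying.

```perl
use strict;
use warnings;

# The typo ($totl for $total) sits in a string eval purely so that this
# demonstration can catch the compile-time error rather than die from it.
my $compiled = eval q{
    my $total = 10;
    $totl = $total + 1;
    1;
};
my $complaint = $@;

print "strict says: $complaint" unless $compiled;
```

In a real script there is no eval: perl simply refuses to run the code and tells you which line the undeclared variable is on.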
OK, so you've not made life difficult for yourself in the first place.
And it's still not working. What next? Well, if you've done as you're
told, you will know roughly where the cock up is. If not (or just to
check), sprinkle some print statements around liberally in
the general area:
my $var = "this";
if ( $var = "that" ) { print "TRUE\n"; }
TRUE
Oops. Sprinkle about print $suspect_variables:
my $var = "this";
if ( $var = "that" ) { print "$var\n"; print "TRUE\n"; }
that
TRUE
Ah. We have committed Perl goof number 1:
Perl goofs
Equality
Getting = (assignment) and == (numerical
equality) mixed up:
$a = 20;
if ( $a = 2 ) { print "TRUE\n"; }
$a is assigned the value 2, which returns
the value 2, which isn't undef or 0, so it's
TRUE. D'oh! In a similar vein, don't get eq and
== mixed up, and don't get = and
=~ mixed up in regexes.
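The eq and == mix-up is sneakier than the assignment one, because nothing gets clobbered and the test just quietly gives the wrong answer. A minimal sketch, with made-up variable names (the numeric warning is switched off inside the do block only so the example runs quietly):

```perl
use strict;
use warnings;

my $fruit = "apple";

# == compares numerically: both strings numify to 0, so this is TRUE.
# With warnings on, perl would grumble that the arguments aren't numeric.
my $numeric_match = do { no warnings 'numeric'; $fruit == "orange" };

# eq compares as strings, which is what was meant.
my $string_match = $fruit eq "orange";

print "== says: ", ( $numeric_match ? "TRUE" : "FALSE" ), "\n";
print "eq says: ", ( $string_match  ? "TRUE" : "FALSE" ), "\n";
```

The first line printed says TRUE and the second FALSE, which is one more reason to leave use warnings; switched on.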
Sprinkling about print statements may be augmented by the
use of any of the following:
For things nastier than a single string, like objects or hashrefs:
use Data::Dumper;
print Dumper( \$very_complex_data_structure );
For redirecting output (you could also use the module
D::Oh for this):
open STDOUT, '>', "stdout.txt" or die $!;
open STDERR, '>', "stderr.txt" or die $!;
print "output this";
warn "whinge about this";
For avoiding Error 500: You can't write Perl when playing
with CGI applications:
use CGI::Carp qw( fatalsToBrowser );
And, whatever you do with CGI, don't forget to print the header, either with the CGI module, or directly, with:
print "Content-type: text/html\n\n";
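Of the aids above, Data::Dumper is probably the one you will reach for most often. A minimal sketch, assuming a made-up nested structure ($config and its keys are invented for illustration); $Data::Dumper::Sortkeys is a real setting that keeps hash keys in a stable order:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order, easier to eyeball and diff

# Hypothetical nested structure standing in for whatever you are debugging.
my $config = {
    name  => "plops",
    paths => [ "/usr/bin", "/usr/local/bin" ],
    opts  => { verbose => 1 },
};

my $dump = Dumper( $config );
print $dump;
```

The dump is itself valid Perl, so you can see at a glance whether that hashref really contains what you thought it did.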
Besides messing up = and ==, some other
frequent cock-ups include:
Messing up pairs and semicolons
It's very easy to lose track of paired things like {} []
<> and (). Most text editors have a
brace-matching function that will help you find missing braces and
parentheses. Whether or not you've remembered to put a ; at
the end of every line is also a common source of problems.
Quotes are even better at this, quotes in the general sense of "
", <<"HEREDOC"; HEREDOC\n, / /,
s##!! , qx@@ , and tr||| . Don't
forget to escape quotes if you have to embed them. Furthermore, be
warned: perl may well seem to get the line wrong when it complains about
things like this, as it may not realise you've made a cock up till it's
too late:
$string = "forgot the quote at the end,; # so perl thinks the remaining lines are still string
print qq:$_\n: for ( 1.. 10 ) # and forgot the semicolon here too
print " and only now does it realise something bad has happened";
String found where operator expected at D:\Steve\perl\t.pl line 4,
at end of line (Missing semicolon on previous line?)
Can't find string terminator ' " ' anywhere before EOF at D:\Steve\perl\t.pl line 4.
Arrays and lists
Index from 0, not 1. Don't forget many Perl
operators will return different things in list or scalar context too,
splice, localtime, each and arrays
being common cases in point.
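A minimal sketch of the context business, using an invented array and localtime( 0 ) as the example function:

```perl
use strict;
use warnings;

my @letters = qw( a b c d );

my $count   = @letters;    # scalar context: the number of elements
my ($first) = @letters;    # list context: the first element

my $stamp = localtime( 0 );    # scalar context: one human-readable string
my @parts = localtime( 0 );    # list context: sec, min, hour, mday, mon, year, ...

print "count=$count first=$first\n";
print "localtime gave ", scalar( @parts ), " fields in list context\n";
```

The only difference between the first two assignments is a pair of parentheses around the variable, which is exactly why this one bites so often.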
Input, output and modules
Printing to a closed filehandle is a silent error unless you use
warnings; and it's very easy to forget the >
in:
open my $FH, '>', $f; # for writing
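A minimal sketch of both halves of this: checking that open worked, and what happens when you print to a handle that was opened for reading. It uses File::Temp for a scratch file so nothing real gets trampled; the io warning is silenced in the do block only to keep the example's output tidy.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

my ( $tmp_fh, $file ) = tempfile( UNLINK => 1 );    # scratch file for the demo
close $tmp_fh;

# Write mode: note the '>' and the check on open's return value.
open my $OUT, '>', $file or die "Can't write '$file': $!\n";
print $OUT "hello\n";
close $OUT or die "Can't close '$file': $!\n";

# Read mode: printing to this handle fails, and print returns false.
open my $IN, '<', $file or die "Can't read '$file': $!\n";
my $wrote = do { no warnings 'io'; print $IN "oops\n" };
my $line  = <$IN>;
close $IN;

print "print to a read-only handle ", ( $wrote ? "worked" : "failed" ), "\n";
print "file still contains: $line";
```

Without warnings switched on, the only clue that the print went nowhere is its return value, which almost nobody checks.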
As far as modules go, bugs in your own modules are your problem, deal
with them in the same way as using a script: write a small script that
uses a bit of functionality from the module, and make sure each bit works
individually. Other people's modules are usually quite well tested, but
beware that some modules don't work on all systems (they'll warn you when
you install them), and beware of using old scripts that may use old
versions of modules, and vice versa: there's no point in
writing a CGI script for HTML::Parser v3 when your ISP only
has v2.
Maths
Exponents are written like Fortran-esque **, not like M$
Excel-esque ^, and bind more tightly than unary
minus, hence -2**2 is -4, because perl reads it as -(2**2). That is
what maths says too, even if it's not what you expected: write (-2)**2 if you want 4.
Precedence issues are also sometimes a problem: if in doubt, add
parentheses to make sure perl understands what you mean. If you're
feeling brave, you could even try:
perl -MO=Deparse,-p script.pl
This uses the perl backend modules B and O
to parse your script into opcodes, then reassemble them into the code
that perl actually executes: it's instructive to run a script through the
deparser to see what perl is really doing. The -p gets perl
to put in parentheses so you can see the precedence explicitly.
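A minimal sketch of the precedence points above, with invented variable names:

```perl
use strict;
use warnings;

my $minus_last  = -2**2;          # parsed as -(2**2)
my $minus_first = (-2)**2;        # parentheses make the minus bind first
my $mult_first  = 2 + 3 * 4;      # * binds tighter than +
my $plus_first  = (2 + 3) * 4;    # unless you say otherwise

print "$minus_last $minus_first $mult_first $plus_first\n";
```

This prints -4 4 14 20. The parentheses cost nothing, and they save the next reader from having to look up the precedence table in perldoc perlop.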
And if all else fails
Now, if all this doesn't help, you still have all these places to try for help:
perldoc
perldoc or the HTML that comes with the ActiveState perl
distribution is your friend. The literature that comes with perl is
'extensive', so use it. Read The Fabulous Manual (RTFM)…
perldoc -f function_you_may_be_using_the_wrong_syntax_for
perldoc perltrap
perltrap is the 'Traps for the unwary' documentation: the
above are the commonest problems for people such as me who came to perl
with no idea about other programming languages. If you are a hard-core C
programmer or Python hacker, your problems (how do I take an address?
What're all these braces for?) may be different.
perl.com
The perl website (well, one of them). This would be what is termed Searching The Frabjulous Web (STFW). Full of links to other helpful documentation, CPAN, and most importantly to:
perlmonks.org
The perlmonks' website is lovely: it has a FAQ for common 'how do I do this', a tutorial (like you'd need it. Pah!) and trawling through the archives is a good way of picking up tips. You can also post requests for help, which are almost always answered with grace and helpfulness, although remember to RTFM before you waste someone's time with a spurious question about why this:
$a = "foo"; print "TRUE" if $a = "bar";
doesn't work.
dev.perl.org
dev.perl.org is where to go for what's new (perl 6, parrot, perl 5.8.0, perl 1, etc.).
Some other things I've never done
I have never posted to comp.lang.perl, for fear of being
laughed at. I have never used the perl debugger (perl -d)
since sprinkling print about always seems to fix my
problems. Maybe I'm missing a trick. Feel free to do either of these
things!
Big things
Perl is a language for dirty little hacks, shell scripts and for
confusing the hell out of maintenance staff with all
Th@t_L!Ne_\n0i$e, or so I'm told. Hopefully, all the nagging
about use strict; POD documentation, comments, the
/x modifier for regexes and the importance of modules,
classes and debugging means that your dirtier hacks are not something
you'd want to release onto the world. Here are some things that may help
you if you move from writing little scripts for yourself that don't do
anything very important, to programming bigger, portable applications
that have to interact with users, databases, libraries, servers and other
programs. Various tiresome models of how you should program
things are bandied about (waterfalls and spirals and design patterns and
eXtreme Programming, etc.), but programming anything
is basically some mixture of analysis (what's it supposed to do?), design
(how will I make it do it?), implementation (oh god, what's the syntax
for sprintf?), and testing (does the damn thing work after
all that?).
Larry Wall (creator of Perl) says the most important attributes of a programmer are laziness, impatience and hubris. That is not to say you should be bone-idle, stroppy and arrogant. Lazy means reducing overall effort on a project: that means writing it well and documenting it well, so you don't end up squandering hours trying to understand it again later. Impatient means not putting up with crap from the computer. Perl helps greatly with this: write it and run it rather than write it, compile it, link it, run it, swear at it, etc., etc. Do your bit too by making your scripts robust and responsive to users' needs. Although you may not want, or be able to justify, building in functionality that you're not going to need, it's still worth keeping an eye on the future extensibility of your code. Hubris means taking pride in your work. It's much better to write clean scripts that people like using and don't mind maintaining, rather than writing twisty, messy guff that eventually ends up as some unmaintainable but irremovable ball of mud with a cargo-cult built up around it ("we know it works, we just don't know how it works"). Would you rather people think you're a good programmer, or bitch behind your back about the evil spaghetti you've written that they now have to maintain? Anyway, here're a few things that you might like to aim for:
The user (no matter how computer un-savvy) should like it,
i.e. the interface is pretty, functional, forgiving and
functioning. If you're writing something for the computer
illiterate, write it so it aims at their level (this sounds like
teaching!): think web browser interface, not command line switches. Even
better, make it dual purpose, and make sure it puts up gracefully with
silly things like taking both / or \ as path
delimiters. Get it to print usage instructions, or even its
own POD, if it gets called wrongly:
unless ( defined $ARGV[0] ) # show our own POD, or a usage line if perldoc fails, then stop
{ system( "perldoc", $0 ) == 0 or warn "Usage: $0 [-v] file\n"; exit 1 }
and provide sensible defaults:
use Getopt::Long;
my $DEFAULT_OUTPUT = "$ENV{HOME}/plops.txt"; # perl's open won't expand ~ for you
my $output;
GetOptions( "output=s" => \$output );
$output ||= $DEFAULT_OUTPUT;
if that's sensible. Make sure you get feedback about whether you are coding the right thing, and implementing the right interface, as you go along. You don't want to code a search engine if you're supposed to be programming a shopping cart. Clients are fickle, and probably don't really know what they want, so keep getting feedback as you implement your code: sometimes a trivial change in what the user wants will require significant refactoring of your code (or not, if you're lucky and conscientious).
As well as the user, the computing science types should like it too,
i.e. you could show it to Edsger Dijkstra (may he rest in peace)
and he would find nothing harmful in it (besides the fact you're using
Perl and not Assembler). That means avoiding dirty hacks where possible,
no goto (obviously), well thought out, correct and
O(<N²) algorithms, and using appropriate tools for
appropriate tasks. If you're wondering about the O(N) thing
it's called 'big-O' notation, and it describes how long your
program takes to run as a function of how much data it has to manipulate.
Generally, any algorithm whose time of execution increases more rapidly
than the square of the number of data items is unusable. The thing to
look out for here is embedded loops: if you have anything that looks
like:
for my $first ( @all_the_data_items )
{
for my $second ( @all_the_data_items )
{
print "Match!\n" if $first eq $second;
}
}
you are skirting the borders of unusability: for every
($first) item in @all_the_data_items, you make a
comparison to every ($second) item in
@all_the_data_items. Hence you will make N²
comparisons, where N is scalar @all_the_data_items.
Hence, the script's time of execution will increase with the square of
N. Whatever you do, don't put another loop in the inner loop
that goes over all the data, or you'll likely be dead by the time the
script has dealt with more than ten thousand items (seriously: if it
takes 1 ms to do the innermost loop thing, it will take
(10000**3)/(1000*60*60*24*365) = 32 years). The perl module
Benchmark may help you decide on issues of speed and
optimisation, but 'make it work, make it right, then make it fast' is a
traditional warning against the dangers of premature optimisation.
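When the nested loop is only there to find matching items, a hash will usually do the job in a single pass. A minimal sketch, assuming the aim is to list the values that occur more than once (the data and the %seen and @duplicates names are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical data: find the values that occur more than once.
my @all_the_data_items = qw( cat dog eel cat fox dog );

# One pass over the data: each hash lookup takes roughly constant time,
# so the work grows with N rather than with N squared.
my %seen;
my @duplicates;
for my $item ( @all_the_data_items )
{
    $seen{ $item }++;
    push @duplicates, $item if $seen{ $item } == 2;    # report each duplicate once
}

print "Duplicate: $_\n" foreach @duplicates;
```

This reports cat and dog after looking at each item exactly once, where the nested loops would have made thirty-six comparisons for the same six items.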
Finally, your successor should like it, i.e. whoever ends up
maintaining the code knows how it works, why it works, and how to fix it
if it breaks or needs extending. Think consistent formatting, comments,
documentation, $VERSIONing,
$variables_that_mean_something, built-in debugging, tests
(preferably ones that can be configured to use a test arena where it
doesn't matter if your script accidentally deletes everything) that s/he
can run to check the thing still works if they have to alter anything,
and modularisation, i.e. using loosely coupled modules and
classes, avoiding global variables, and if there are a load of
configurations, putting them in a separate file, especially if they are
accessed by many scripts, etc.
A final, final word: testing. When you write a module
Blah using h2xs, it will provide you with a
test script Blah.t (or 1.t under older perls),
in the t directory, which looks something like this:
# Before `make install' is performed this script should be runnable with
# `make test'. After `make install' it should work as `perl Blah.t'
use Test::More tests => 1;
BEGIN { use_ok('Blah') };
# Insert your test code below, the Test::More module is use()ed here so read
# its man page ( perldoc Test::More ) for help writing this test script.
Inside a test script, you can probe the output for known inputs and
check that your module is working correctly. For example, if
Blah exports a function called greet, you might
well like to check that:
use Test::More tests => 2;
# Note that you need to tell Test::More how many tests you intend on running
BEGIN { use_ok('Blah') };
# Checks that the module compiles correctly
ok( greet( "Einstein" ) eq "Hello, Mr. Einstein" );
# ok checks that the comparison made is TRUE.
The ok function from Test::More checks
whether the argument it is given is true or not. Although this sounds
somewhat trivial, if you ever come to write a gigantic module from Hell, and then decide
to rewrite (refactor) it from scratch, it's essential to know whether or
not your new version of the module behaves in the same way as the
previous version. If you write tests that cover the code sufficiently
well (checking every branch in the logic), then you will have a much
better idea of whether your refactored module can be a drop-in
replacement. Even better, you will be told immediately (by running
make test) which tests are failing (and hopefully,
therefore, where the code is still broken). Computer programs are largely
black boxes to their users, and the main thing they are interested in is
not the neatness of the box's contents (which they'll never look at), but
the fact that no matter what's in the box, when you give it input A, it
always produces output B, and not a recipe for cheesecake or a segfault.
Even if you are writing a brand new module, it is worth writing the tests
as you go along (or if you're a real eXtreme Programming devotee, before
you start coding), so that as you tinker with new functions, you know you
are not buggering up the old ones you have so carefully coded minutes
before.
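Test::More has friendlier functions than plain ok, which tell you what they got and what they expected when a test fails. A minimal sketch, in which greet is a hypothetical stand-in defined in the test file itself so the example runs on its own; in a real Blah.t it would come from use_ok('Blah') or use Blah:

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical stand-in for a function that Blah might export.
sub greet
{
    my ( $name ) = @_;
    return "Hello, Mr. $name";
}

# is() compares two values and, on failure, reports both of them.
is( greet( "Einstein" ), "Hello, Mr. Einstein", "greets a single name" );

# is_deeply() does the same for nested data structures.
is_deeply(
    [ map { greet( $_ ) } qw( Bohr Dirac ) ],
    [ "Hello, Mr. Bohr", "Hello, Mr. Dirac" ],
    "greets a list of names",
);

# like() checks a value against a regex.
like( greet( "Pauli" ), qr/^Hello,/, "starts with a greeting" );
```

Giving each test a name, as in the last argument here, makes the output of make test much easier to read when something does break.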
HTH.
