Bondage, discipline and subroutines
You may (although it's unlikely) have noticed a little thing I slipped
into the last script: the keyword my in the
chomp line. my is a very important keyword, although
it doesn't seem to make any difference if you delete it and run the
program. What my does is pin a variable to a particular part
of your program, so that it can't be seen from elsewhere. This may not
seem very useful at the moment, but is exceedingly important as your
programs get bigger. Such as here:
#!/usr/bin/perl
use strict;
use warnings;
my @peas = qw/chick mushy split/;
while ( my $type = pop @peas )
{
print "$type peas are ", flavour( $type ), ".\n";
}
sub flavour
{
my $query = shift @_;
my @peas = qw/chick garbanzo/;
foreach ( @peas )
{
if ( $query eq $_ )
{
return "delicious";
}
}
return "disgusting";
}
Many new things here, so we'll take them a bit at a time. Most Perl tutorials
I've read leave my until the very end, but it's not really
very difficult, and in the interests of getting you into good habits
early, we'll take it on now. I've read scripts written for the servers at
my university that don't use my, which makes me worry about
how well the scripts are coded in other ways. The first way to write well
behaved scripts is to bung this at the top:
use strict;
This turns on perl's bondage and discipline mode. In
strict mode, if you do not use my (or its big
brother, our) on all variables (and therefore
safely pin them down to particular bits of your code), perl will barf.
Why should you want bondage and discipline? Why should you want to pin
variables down to specific places? Well, on little throwaway scripts, you
might not, and it's fine not to bother. But on big things, with lots of
user defined functions (subroutines), it's essential. We'll get onto
exactly what my does in a little
while.
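If you'd like to see what the barfing looks like without wrecking a whole script, here's a minimal sketch. The helper strict_error() and the variables $total and $totl are made up for illustration, and the string eval is only there so the error can be caught and printed rather than killing the program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compile and run a snippet of Perl held in a string,
# returning the error message (or an empty string if all went well).
sub strict_error
{
    my ( $code ) = @_;
    eval "use strict; $code; 1" and return "";
    return $@;
}

# Declared with my: strict is happy.
print "declared:   ", ( strict_error( 'my $total = 1; $total++' ) || "no error\n" );

# Typo ($totl instead of $total): strict refuses to compile the snippet.
print "undeclared: ", strict_error( 'my $total = 1; $totl++' );
```

The second line printed should be a complaint that the global symbol $totl requires an explicit package name, which is strict's way of saying "you never declared this".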
The next part of the code goes:
my @peas = qw/chick mushy split/;
i.e. create an array called @peas containing the
obvious items. Note the random choice of quoting characters, and ignore
the my for the second. Then:
while ( my $type = pop @peas )
{ print "$type peas are ", flavour( $type ), ".\n"; }
Three new things here, the while loop, the
pop and the flavour(). We'll take these in
turn. while is another loop control, like for
and foreach. It has the
general form:
while ( THIS_IS_TRUE ) { DO_SOMETHING; }
So when is:
my $type = pop @peas
TRUE then? Well, perl considers anything TRUE apart from
undefined values, the number zero, the string "0" and the empty string.
pop is an array operator, which pulls the last member out of
an array and returns it (shortening the array by one). Here the popped
member is captured each time into the variable $type. Since
"chick", "mushy" and "split" are
not the number zero, and are most clearly defined as
something, $type is TRUE until perl tries to
pop a non-existent, undefined, fourth item out
of the array, whereupon the loop exits. Which is all very obvious
really:
while ( there are still things to pop out of the array ) { DO_SOMETHING; }
So all this loop does is iterate over the array, just like
foreach, but empties the array from the end in so doing.
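Here's a small sketch for checking what perl counts as TRUE, and for seeing one consequence for this style of loop: a FALSE item in the array stops it early. The helpers truth() and pop_while_true() are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report whether perl treats a value as TRUE or FALSE.
sub truth
{
    my ( $value ) = @_;
    return $value ? "TRUE" : "FALSE";
}

print "undef  is ", truth( undef ),  "\n";   # FALSE
print "0      is ", truth( 0 ),      "\n";   # FALSE
print "'0'    is ", truth( '0' ),    "\n";   # FALSE
print "''     is ", truth( '' ),     "\n";   # FALSE
print "'0.0'  is ", truth( '0.0' ),  "\n";   # TRUE
print "'mung' is ", truth( 'mung' ), "\n";   # TRUE

# Hypothetical helper: pop items until one of them is FALSE,
# and return the items that were seen on the way.
sub pop_while_true
{
    my @items = @_;
    my @seen;
    while ( my $item = pop @items )
    {
        push @seen, $item;
    }
    return @seen;
}

my @seen = pop_while_true( "chick", 0, "mushy", "split" );
print "Saw: @seen\n";   # stops at the 0, so "chick" is never reached
```

So the while/pop idiom is fine for a list of pea names, but think twice before using it on an array that might contain zeros or empty strings.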
Perl has several other sorts of loop, in addition to while,
for and foreach loops. This one should be
fairly obvious too:
until ( THIS_IS_TRUE ) { DO_SOMETHING; }
We'll get onto loop control (exiting loops prematurely) later.
Perl has plenty of types of loop.
It also has plenty of array manipulators. As you now know,
pop will pop out the last member of an array.
If you want to pull values out of the front end, you'll need
shift, which returns the first member of an array,
shortening the array by one from the front. If you want to add things to
an array, you'll want to use push or unshift,
which add things to the end or beginning of an array respectively. For
example:
#!/usr/bin/perl
use warnings;
@peas = ( "chick", "mushy", "split" );
print "\@peas contains ( @peas ).\n";
$foo = pop @peas;
# $foo contains "split", @peas now contains ("chick", "mushy")
print "$foo was popped, ( @peas ) are left in \@peas.\n";
$bar = shift @peas;
# $bar contains "chick", @peas now contains just ("mushy")
print "$bar was shifted, ( @peas ) is left in \@peas.\n";
push @peas, "garbanzo";
# @peas now contains ("mushy", "garbanzo")
print "garbanzo was pushed, now \@peas contains ( @peas ).\n";
unshift @peas, "marrowfat";
# @peas now contains ("marrowfat", "mushy", "garbanzo")
print "marrowfat was unshifted, now \@peas contains ( @peas ).\n";
push @peas, $foo, $bar;
# @peas now contains ("marrowfat", "mushy", "garbanzo", "split", "chick")
print "( $foo $bar ) were pushed, now \@peas contains ( @peas ).\n";
@peas contains ( chick mushy split ).
split was popped, ( chick mushy ) are left in @peas.
chick was shifted, ( mushy ) is left in @peas.
garbanzo was pushed, now @peas contains ( mushy garbanzo ).
marrowfat was unshifted, now @peas contains ( marrowfat mushy garbanzo ).
( split chick ) were pushed, now @peas contains ( marrowfat mushy garbanzo split chick ).
push and unshift are list operators, and
will add an entire list of things to the array. Bearing in mind an array
is just a posh sort of list:
#!/usr/bin/perl
use warnings;
@peas = ( "chick", "mushy", "split" );
@beans = ( "adzuki", "haricot", "mung" );
push @peas, @beans, "and this too";
print "@peas\n";
chick mushy split adzuki haricot mung and this too
will shove the entire contents of @beans onto the end of
@peas, followed by the string "and this
too".
The least popular array operator is
splice. Although splice can do everything
pop, push, shift and
unshift can do and more, it has a rather difficult
syntax.
splice @ARRAY, START_INDEX, THIS_MANY, LIST;
will remove THIS_MANY items starting from START_INDEX, and replace
them with the contents of LIST. Incidentally, splice is one
of the context sensitive operators: in list context, it will return all
the spliced out items, but if you call it in scalar context, it returns
just the last item removed from the array, rather than the whole list of
them. So:
@all_removed = splice ...;       # list context, because there's an @rray to capture what splice returns
$last_one_removed = splice ...;  # scalar context, because there's only a $calar to capture the output of splice
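THIS_MANY and LIST are optional. If you leave out LIST, nothing is
inserted in place of the removed items; if you leave out THIS_MANY as
well, everything from START_INDEX to the end of the array is removed.
pop @things;
and
splice( @things, -1 );
mean the same thing: both remove everything from the
last (-1) member of an array (@things) onwards, which is
just the one item, and put nothing in its place. (Beware that
undef is not 'nothing' here: splice( @things, -1, 1, undef )
would swap the last item for an undef rather than remove it.)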
pop is more intuitive though. Another useful array operator
is reverse:
@backward_peas = reverse @peas;
reverse leaves @peas itself
unchanged, but returns the array in reversed order, here to be captured
in @backward_peas. If you want to reverse an array
in situ, use:
@array = reverse @array;
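Since splice is the one with the difficult syntax, here's a sketch of it standing in for the four friendlier operators, plus one swap in the middle of an array. The pea names are just the ones used earlier; nothing here is special to them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @peas = qw/chick mushy split/;

my $last = splice( @peas, -1 );                # like pop @peas
print "$last removed, left with ( @peas )\n";

my $first = splice( @peas, 0, 1 );             # like shift @peas
print "$first removed, left with ( @peas )\n";

splice( @peas, scalar @peas, 0, "garbanzo" );  # like push @peas, "garbanzo"
splice( @peas, 0, 0, "marrowfat" );            # like unshift @peas, "marrowfat"
print "now ( @peas )\n";

# Replace one item in the middle with two others; in list context
# splice returns everything that was removed.
my @removed = splice( @peas, 1, 1, "petit-pois", "yellow-gram" );
print "swapped out ( @removed ), now ( @peas )\n";
```

The only thing splice can do that the others can't is work on the middle of an array, so that's usually the only time it's worth reaching for.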
Note that some of these operators will only work on arrays, but not on
lists. The distinction between an array and a list is similar to that
between a scalar and a value: an array is something you can name, like
@bits, whereas a list is just a comma-separated list of
values in a script. Likewise, $that is a scalar, but
'this' is just a value.
You can slice lists in the same way as you slice arrays:
my @bits = ( 'this', 'is', 'a', 'list', 'not', 'an', 'array' )[ 0 .. 1, 5 .. 6 ];
print "@bits";
However, you cannot pop a list:
my $word = pop ( 'this', 'is', 'a', 'list', 'not', 'an', 'array' );
print $word;
Type of arg 1 to pop must be array (not list).
Execution aborted due to compilation errors.
The reason for this is that although it makes sense that you can
slice, or even reverse a list:
print reverse ( qw( t s i l ) );
you cannot remove the last item from a list, because a list is not a
variable: to pop a value from the list would be equivalent
to taking an eraser to the text of your script, and that is
nonsensical.
Giving something back
Anyway, back to the point. The only other new thing in the code we were examining:
while ( my $type = pop @peas )
{ print "$type peas are ", flavour( $type ), ".\n"; }
is the function flavour(). Although Perl has some
bizarrely named operators (like chomp, pop,
getgrent and dump), flavour is not
amongst them. flavour() is a user defined function, or
subroutine, which is the next thing to look at. To create a
subroutine you need to write something like:
sub NAME { DO_SOMETHING; }
And to call it, you simply need to write
NAME( ARGUMENT_LIST );
The flavour subroutine is called by the body of the
program to determine how the three peas of interest taste. Subroutines
frequently need to return things to the main part of the
program: in this case, flavour() returns what the subroutine
thinks about certain sorts of pea. So let's look at how
flavour() does this:
sub flavour
{
my $query = shift @_;
my @peas = qw/chick garbanzo/;
foreach ( @peas )
{
if ( $query eq $_ )
{
return "delicious";
}
}
return "disgusting";
}
Now, the first new thing here is another of perl's infamous
punctuation variables, @_. @_
contains a list of all the arguments passed to the subroutine, in this
case, whatever the value of $type was when the subroutine
was called in the body of the program. For the sake of argument, let's
say this is "chick". @_ is just an array, so
shift will pull the first member out as it would with any
array. So $query will end up containing
"chick". Like $_, @_ is assumed by
certain operators: in a subroutine, shift will assume
@_ if you don't tell it otherwise, hence:
sub blah { $arg = shift @_; }
sub blah { $arg = shift; }
sub blah { ( $arg ) = @_; }
are more-or-less
equivalent. I always use the last one, since it's easier to add extra
arguments later. In the last one, we have assigned @_ to a
[one item long] list (in parentheses):
( $name, $date, $error, @other_things ) = @_;
( $arg ) = @_;
which allows you to refer to the arguments with pretty names, rather than the perfectly valid, but rather painful:
$_[0]; $_[1]; ...
Note that you can't just say:
$arg = @_;
if there's only one argument, since the $arg forces scalar context, as we've seen before,
and arrays tell you how big they are, not what's in them
in this context. The parentheses are required, unless (of course), you
actually want to know how many arguments were passed,
rather than what arguments were passed. Which is unlikely.
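The difference the parentheses make is easy to see side by side. A minimal sketch, with two made-up subroutines, how_many() and first_one():

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subroutine: scalar assignment, so we get the COUNT of arguments.
sub how_many
{
    my $count = @_;
    return $count;
}

# Hypothetical subroutine: list assignment, so we get the FIRST argument.
sub first_one
{
    my ( $first ) = @_;
    return $first;
}

print how_many( "chick", "mushy", "split" ), "\n";    # 3
print first_one( "chick", "mushy", "split" ), "\n";   # chick
```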
The subroutine flavour()
defines a list of peas ("chick" and
"garbanzo"), called @peas. And
this is where my comes in.
flavour's @peas has exactly the same
name as the @peas in the main body of the program. How is
perl supposed to know the difference? What
my does is prevent the @peas
in the subroutine from trashing the @peas in the main body
of the program. Try this out:
#!/usr/bin/perl
use warnings;
@peas = qw/chick mushy/;
# The body of the program contains an array called @peas
print "In the body of the program, \@peas contains @peas.\n";
trasher();
# Call the subroutine, no need for arguments
print "Oh dear, it appears that \@peas in the body of the program has been trashed, "
. "and now contains @peas.\n";
print "This is because \@peas in the subroutine overwrites the \@peas in main.\n";
sub trasher
{
@peas = qw/petit-pois yellow-gram/;
# Because we haven't pinned this @peas down with 'my',
# it refers to the same array as that in the body of the program
print "In the subroutine trasher, \@peas contains @peas.\n";
}
In the body of the program, @peas contains chick mushy.
In the subroutine trasher, @peas contains petit-pois yellow-gram.
Oh dear, it appears that @peas in the body of the program has been trashed, and now contains petit-pois yellow-gram.
This is because @peas in the subroutine overwrites the @peas in main.
And note that without the my to pin down the two separate
@peas to their proper places, subroutines have free rein to
overwrite variables in the body of the program. This is a Bad Thing:
subroutines can change the value of variables in the body of the
program, but that doesn't mean they should be allowed to! In
general, a good subroutine is a black box: you feed it values, and it
feeds values back. That way, people can use your subroutines and
functions (as they would if you packaged them up into a nice module), without worrying what they might do to the
variables in their program, or indeed, what their program might do to
yours. Sometimes, you really will want a subroutine to change a
'global' variable, that is one in the body of a program, but more often
than not, you don't, and my is the way to stop it, thus:
#!/usr/bin/perl
use warnings;
@peas = qw/chick mushy/;
print "In the body of the program, \@peas contains @peas.\n";
well_behaved( );
print "Using my, we have avoided trashing \@peas in the body of the program\n";
print "\tIt still contains @peas.\n";
sub well_behaved
{
my @peas = qw/petit-pois yellow-gram/;
print "In the subroutine well_behaved, \@peas contains its own values, @peas.\n";
}
In the body of the program, @peas contains chick mushy.
In the subroutine well_behaved, @peas contains its own values, petit-pois yellow-gram.
Using my, we have avoided trashing @peas in the body of the program
It still contains chick mushy.
So what exactly does my do? It stops a variable being
visible outside the block in which it is created
(declared). Blocks are things enclosed in { }
braces:
BODY OF PROGRAM HERE
START OF OUTER BLOCK {
OUTER BLOCK'S SCOPE EXTENDS FROM HERE
start of inner block {
inner block's scope
} end of inner block
TO HERE AND INCLUDES THE INNER BLOCK'S SCOPE TOO
} END OF OUTER BLOCK
The 'scope' is basically what is enclosed in a block. If you created a
my variable in the inner block, only things in the scope of
the inner block could see it. The outer block would not
be able to see it (or trash it) at all. If you created a my
variable in the outer block, only things in the outer block's scope could
see it (but this happens to include the inner block too!). The
BODY OF PROGRAM couldn't see either. A subroutine is just a particular
case of this:
BODY OF PROGRAM HERE
START OF SUBROUTINE BLOCK {
SUBROUTINE'S SCOPE EXTENDS FROM HERE
start of inner block {
inner block's scope
} end of inner block
TO HERE AND INCLUDES THE INNER BLOCK'S SCOPE TOO
} END OF SUBROUTINE BLOCK
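The nesting in these diagrams can be tried out directly with bare blocks. This is only a sketch: it reuses one name, $pea, at three levels purely to show which one is visible where, which is not something you'd want to do in real code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $pea = "chick";                  # visible from here to the end of the file
print "body:  $pea\n";

{
    my $pea = "mushy";              # a new $pea, private to the outer block
    print "outer: $pea\n";
    {
        print "inner sees the outer block's: $pea\n";
        my $pea = "split";          # yet another $pea, private to the inner block
        print "inner: $pea\n";
    }
    print "outer again: $pea\n";    # the inner $pea has gone out of scope
}

print "body again:  $pea\n";        # still "chick"
```

Each my creates a fresh variable that hides any outer one of the same name until its block ends, at which point the outer one is visible again, untouched.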
So the @peas declared in the subroutine
well_behaved() is only visible (and is the first variable of
that name that is visible) within the braces that surround the
subroutine:
sub well_behaved
{
my @peas = qw/petit-pois yellow-gram/;
print "In the subroutine well_behaved, \@peas contains its own values, @peas.\n";
}
Outside this scope, my @peas is invisible, to
both the body of the program, and to any other subroutines you might
create. A my variable is only visible from the place
it's created to the end of the innermost enclosing block. There
are a few quasi-exceptions to this:
foreach my $pea ( @peas ) { print $pea; }
DWIMs (does what I mean): the $pea belongs to the inner block, the rest of
the program can't see it, even though it seems to be declared in the
scope of the program, not the foreach block. This is a Good
Thing. One thing to be careful of is if you want to use a loop to stuff
things into a my variable:
foreach ( @a ) { my @b; push @b, $_; } # WRONG
my @b; foreach ( @a ) { push @b, $_; } # RIGHT
The first one will create a new
@b on each pass of the loop, and when the loop exits,
@b goes out of scope, so you can't see it anyway! Waste of
time. Use the second one. While we're on the subject of
foreach loops, you should know that the loop variable stands
for the actual variable from the list you're looping over, so mucking
with it will muck with the original list:
#!/usr/bin/perl
use warnings;
my @bits = qw/ b c m t /;
print "@bits\n";
foreach my $bit ( @bits ) { $bit .= "ap" };
print "@bits\n";
b c m t
bap cap map tap
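If you'd rather leave the original array alone, one simple way (a sketch, not the only option) is to loop over a copy instead. The name @words is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bits = qw/ b c m t /;

# Loop over a copy of the array, so the original is left alone.
my @words = @bits;
foreach my $word ( @words ) { $word .= "ap" }

print "@bits\n";    # b c m t
print "@words\n";   # bap cap map tap
```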
To be very good, and to allow the program to pass with use
strict; we must also put my on variables in the
body of the program. These will still be visible to
subroutines (since the scope of the body includes all its subroutines),
and subroutines can still change them, but they will stop
use strict; from barfing. It also has some other advantages
when we get to playing with modules.
The penultimate bit of the program we were originally discussing was this:
foreach ( @peas )
{
if ( $query eq $_ )
{
return "delicious";
}
}
This part compares the type of pea the subroutine was passed with all
the peas in its own @peas, and if it matches any of them,
the subroutine returns 'delicious'. Furthermore, you have just met perl's
most important conditional statement, if:
if ( THIS_IS_TRUE ) { DO_SOMETHING; }
which is analogous to:
while ( THIS_IS_TRUE ) { DO_SOMETHING; }
The equivalent of:
until ( THIS_IS_TRUE ) { DO_SOMETHING; }
is:
unless ( THIS_IS_TRUE ) { DO_SOMETHING; }
The actual comparison the if statement makes is:
$query eq $_
The eq tests to see if
two strings are identical. Perl has two sets of comparisons: numerical
and string. The 'equal to' test is eq for strings, and
== for numbers (that's two = signs). Perl goof number one is getting ==
comparison and = assignment mixed up.
In addition to 'equal to' comparisons, Perl also has greater than,
less than, greater than or equal to, less than or equal to, and not equal
to comparisons. For numbers these are >,
<, >=, <=, and
!= respectively. The equivalents for strings are
gt, lt, ge, le, and
ne.
The reason Perl makes a distinction between numerical and string
comparisons is because "2" and "2.0" are numerically equal, but not
stringily equal: "2" == "2.0" is TRUE because 2 and 2.0 are
the same numerically (I don't want to hear any mathematicians whining
about reals and integers either). However, "2" eq "2.0" is
FALSE, because they are clearly not the same string of
characters. Just remember you want the maths symbols to compare things as
numbers, and the language symbols to compare them as strings.
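A short sketch of the two families of comparison at work. The helper compare() is invented for the example, and it's only fed strings that look like numbers, since == on something like "chick" would earn you a warning.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: describe how two values compare, both ways.
sub compare
{
    my ( $x, $y ) = @_;
    my $as_numbers = ( $x == $y ) ? "equal" : "different";
    my $as_strings = ( $x eq $y ) ? "equal" : "different";
    return "as numbers: $as_numbers, as strings: $as_strings";
}

print compare( "2", "2.0" ), "\n";   # as numbers: equal, as strings: different
print compare( "10", "9" ), "\n";    # as numbers: different, as strings: different

# Ordering differs too: 10 is bigger than 9 as a number,
# but "10" sorts before "9" as a string, since "1" comes before "9".
print "10 >  9\n" if 10 > 9;
print "10 lt 9\n" if "10" lt "9";
```

Both of the last two lines get printed, which is exactly why sorting numbers with the string operators gives such odd results.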
if statements can be optionally followed by any number of
elsif statements, and an optional else
statement, so:
if ( THIS_IS_TRUE )
{
DO_THIS_THING;
}
elsif ( THIS_OTHER_THING_IS_TRUE )
{
DO_THIS_OTHER_THING;
}
else
{
DO_THE_DEFAULT_THING;
}
Which is all very simple and obvious. You can also nest
if's inside other if's to a gazillion degrees,
which is a perfect way of making code unreadable, but will be necessary
from time to time.
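For something concrete to run, here's the same shape with real conditions filled in. The subroutine portion() and its thresholds are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subroutine: describe a number of peas in words.
sub portion
{
    my ( $peas ) = @_;
    if ( $peas == 0 )
    {
        return "none";
    }
    elsif ( $peas < 10 )
    {
        return "a few";
    }
    else
    {
        return "plenty";
    }
}

print portion( 0 ), "\n";     # none
print portion( 7 ), "\n";     # a few
print portion( 300 ), "\n";   # plenty
```

The tests are tried in order and only the first one that is TRUE gets its block run, so 0 is reported as "none" even though it is also less than 10.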
Anyway, the upshot for the code we're looking at:
foreach ( @peas )
{
if ( $query eq $_ )
{
return "delicious";
}
}
is that if the type of pea flavour() gets passed matches
anything in flavour()'s own @peas, it will
return "delicious", using:
return "delicious";
return simply returns the list of things you give it
(here the list is just one item long). So if we pass
flavour() the value 'chick', which is in
flavour()'s list of delicious peas,
flavour('chick') will be 'delicious' and this is exactly
what is printed out by the body of the program. However, if what we pass
doesn't match any of flavour()'s preferences, the
foreach loop will end naturally, and we come across:
return "disgusting";
which it duly does.
If you come from a C background, you may be wondering if Perl has a
switch statement (if you don't: it's basically a shorthand for a
very long if...elsif...elsif...elsif...else statement). Perl
has never had a stable one: a given/when construct arrived in Perl 5.10,
but it was later marked experimental and is best avoided. Perl 6 (since
renamed Raku) does have one. For the moment, you'll have to make do with:
for ( $arg )
{
/^quit$/ && do { exit 0; } ;
/^help$/ && do { system "perldoc $0" };
}
Which you'll probably not understand until you've covered regexes anyway!
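In the meantime, here's a version of the same idiom you can run safely, as a sketch: it returns a description instead of actually exiting or calling perldoc. The subroutine action_for() and the default case are additions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subroutine: turn a command word into a description of what
# would be done. The command names are made up for illustration.
sub action_for
{
    my ( $arg ) = @_;
    for ( $arg )                        # makes $_ stand for $arg inside the block
    {
        /^quit$/ && do { return "quitting" };
        /^help$/ && do { return "showing help" };
        return "unknown command '$_'";  # the default, if nothing matched
    }
}

print action_for( "help" ), "\n";   # showing help
print action_for( "quit" ), "\n";   # quitting
print action_for( "peas" ), "\n";   # unknown command 'peas'
```

The for loop only ever runs once: it's there purely to put $arg into $_, which is what the /.../ tests look at by default. A plain if...elsif...else chain using eq does the same job with less magic.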
Summary
That's largely all there is to subroutines: create (declare) them with a:
sub blah { DO_SOMETHING; }
use (call) them with a:
blah( LIST_OF_ARGUMENTS );
blah( $calar, @nd_an_array_too, @nd_another_array );
blah(); # if blah doesn't need telling what to do
All the arguments - including any items from arrays passed as arguments - will be flattened into a single long list, which is passed to the subroutine, and available for manipulation within the subroutine inside the default array:
@_
which you can get at using any array operator (or assigning it to a list).
my $arg1 = shift @_;
my $arg2 = pop @_;
my $arg3 = shift; # defaults to @_
my( $arg4, @args5 ) = @_;
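The flattening is worth seeing once, because it means a subroutine can't tell where one array argument stopped and the next began. A minimal sketch, with a made-up subroutine describe():

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subroutine: take one scalar, then soak up everything else.
sub describe
{
    my ( $label, @items ) = @_;
    return "$label: " . scalar( @items ) . " items ( @items )";
}

my @peas  = qw/chick mushy/;
my @beans = qw/adzuki haricot mung/;

# Both arrays arrive as one flat list of five items.
print describe( "pulses", @peas, @beans ), "\n";
# pulses: 5 items ( chick mushy adzuki haricot mung )
```

This is also why an array in a list assignment like my ( $label, @items ) = @_ has to come last: it swallows everything that's left.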
Exit the subroutine with:
return ( "something\n", 'and maybe another', $thing, @or_things );
return; # or just exit without returning anything at all
Subroutines will return without an explicit
return with the value they last evaluated. I always use
return as I like to be explicit. You can capture what is
returned in the usual way: if blah() takes a list of
arguments, and returns just one thing:
$thing_returned_by_blah = blah( $argument, @other_arguments );
or if blah takes no arguments at all but returns a
list:
@lot_of_things = blah();
etc., etc.
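A sketch of capturing return values, including what a bare return hands back. The subroutines pair() and nothing() are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subroutine: return a list of two things.
sub pair    { return ( "chick", "garbanzo" ); }

# Hypothetical subroutine: a bare return.
sub nothing { return; }

my @both    = pair();       # an array captures the whole list
my ( $one ) = pair();       # a one-item list captures just the first item

my @none = nothing();       # a bare return gives an empty list...
my $none = nothing();       # ...or undef, if a scalar is doing the capturing

print "both: @both\n";                                  # both: chick garbanzo
print "one:  $one\n";                                   # one:  chick
print "none: ", scalar( @none ), " items\n";            # none: 0 items
print "none is ", ( defined $none ? "defined" : "undefined" ), "\n";
```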
Finally, be warned that:
use strict;
if ( $you_do_not_use eq "my variables" )
{
my @variables;
my $pinned_down;
print "you'll trash variables of the same name in the program body.\n";
print "and strict will kill you";
}
Test yourself
See if you can write a script that does the following:
- Write a subroutine that converts the names of trees to a sentence
about their uses using a hash, and provides a default if the tree is
unknown. Make sure it runs under
strict. Make it respond to user input until the user types 'STOP'. You may need to use the exit function, which exits a perl program.
#!/usr/bin/perl
use strict;
use warnings;
print "Please enter the names of trees you want to find out about...\n";
while ( my $tree = <STDIN> )
{
chomp $tree;
exit if $tree eq 'STOP';
my $uses = uses( $tree );
if ( $uses )
{
print "The products of $tree include $uses.\n";
}
else
{
print "I don't have any information about this tree.\n";
}
}
sub uses
{
my $tree = shift;
my %uses =
(
oak => "wood, acorns",
apple => "apples, jam, juice",
orange => "oranges, neroli oil",
pine => "pallet boards",
);
return $uses{ $tree };
}
