Lesson 8

Classy

You are now an accomplished Perl programmer, and could happily hack away all day writing throwaway scripts. However, sooner or later, you will come to realise that there's more to life than little throwaway scripts. Suddenly, you find that you are copying and pasting reams of code from old scripts into new scripts. Then you find a big old bug in the code you've merrily pasted into forty scripts, and spend a day finding all occurrences of the bug to fix them.

There is another way.

If you ever use the same bit of code two or more times in a single script, you should put it in a subroutine. That way you only have to worry about debugging it once. Likewise, if you ever find yourself using the same subroutine or snippet of code in more than one script, bung it into a module.

Modules are Perl's not-so-secret weapon. If you've not been to CPAN yet, go there now. It's always a good idea (essential, I would argue), to have a look on CPAN before you start any significant project, as the chances are, someone else will have been there before you, written the code, worried about it, debugged it, put fifteen bells and twelve whistles onto it, and released it for all and sundry to use. Don't reinvent the wheel if you don't have to! (Although sometimes it's worth half-reinventing the wheel to prove you can do it for your own satisfaction). So, let's find out how to write a module, which we will imaginatively title MyModule.

There's a lot of things you can mess up if you're writing a module from scratch, so the best way to do it, even for 'personal' modules you have no intention of unleashing on the world, is to use a utility called h2xs. Change to a directory you don't mind creating a directory called MyModule in, and type:

h2xs -AXn MyModule

at the command prompt. The A and X switches tell h2xs to make a vanilla module, not a weird-ass XS C-extension. The n switch tells h2xs the name of your module. If all goes well, you will now have a directory called MyModule containing the files (newer versions of h2xs tuck the module away in lib/ and the test script in t/):

Changes
Makefile.PL
MANIFEST
MyModule.pm
README
test.pl

You needn't worry about anything other than MyModule.pm unless you actually plan on unleashing your module on the world. The meat of the module distribution is MyModule.pm (pm is 'perl module'), which will contain a template something along the lines of (commenty bits removed):

package MyModule;
use 5.008;
use strict;
use warnings;
require Exporter;
our @ISA = qw(Exporter);
our %EXPORT_TAGS = ( 'all' => [ qw( ) ] );
our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );
our @EXPORT = qw( );
our $VERSION = '0.01';
# Preloaded methods go here.
1;
__END__

=head1 NAME

MyModule - Perl extension for blah blah blah

=head1 SYNOPSIS

  use MyModule;

=head1 DESCRIPTION

Stub documentation for MyModule, created by h2xs.

=head2 EXPORT

None by default.

=head1 AUTHOR

A. U. Thor, E<lt>a.u.thor@a.galaxy.far.far.awayE<gt>

=head1 SEE ALSO

L<perl>.

=cut

Let's take this a bit at a time, so we can find out how to edit this template to do our bidding. Later on, I would recommend either subverting the output of h2xs, writing your own boilerplate from scratch, or using Module::Starter, but for the moment, we'll look at the dirty details of creating modules without the syntactic sugar…

package MyModule;

The first thing that should be at the top of any module is a package statement. If you read the bit on symbol tables, you may have a vague idea about what a package is. A package (or name-space) is a way of letting you use the same names for variables and subroutines in different parts of a program. For example:

#!/usr/bin/perl
package Foo;
$e = "hello";
print "In package Foo, \$e is $e\n";
package Bar;
$e = "goodbye";
print "But in package Bar, \$e is $e\n";
print "You can still see \$e in package Foo if you fully qualify it...\n";
print "\$Foo::e is still $Foo::e\n";

which prints:

In package Foo, $e is hello
But in package Bar, $e is goodbye
You can still see $e in package Foo if you fully qualify it...
$Foo::e is still hello

In the same way that a command shell will assume you mean the file.ext in the current working directory, perl assumes you mean the variable called $e in the current package. The reason you've not seen the word package at the top of every script so far is that perl automatically assumes you are working in package main; unless you tell it otherwise explicitly. Think of main as your home package if you like. If you want to fiddle with things from other packages, you'll need to 'fully qualify' their names with :: double colons, which are similar to the / delimiter in the shell. If you think of package like chdir, and :: as /, all will become clear. So the package variable:

$e

in package Foo is called:

$Foo::e

and the subroutine:

function()

in package Foo::Parp is called:

&Foo::Parp::function()

if you have to fully qualify them. Note in the second case that you can have subpackages (of a sort) with more than one :: double colon. The reason we create modules in new packages is that if we wrote this:

# my module
$x = "blah";
# my script
$x = "bobble";

then when we used the module, our script would overwrite the module's definition of $x, because they would share the same namespace. When you create modules, you create a new namespace where you can make and manipulate variables to your heart's content without having to worry about trashing other people's variables and subroutines of the same name in other packages. (Note that lexical variables don't suffer from this problem, which is another reason to use strict;).
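
To see the two namespaces side by side, here is a minimal single-file sketch. The two blocks merely stand in for MyModule.pm and for the script (in real life they would be separate files), and our is used so that the example still runs under use strict;.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package MyModule;      # stands in for the contents of MyModule.pm
    our $x = "blah";       # this is really $MyModule::x
}

{
    package main;          # stands in for the script
    our $x = "bobble";     # this is really $main::x
}

# Neither assignment has trampled on the other
print "The module's \$x is still $MyModule::x\n";
print "The script's \$x is $main::x\n";
```

Both values survive because each $x lives in its own package's symbol table.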

That's pretty largely all there is to packages. You can define several in one file, or spread one over several files, but the 'natural' size of one package is one file. If you create a file called MyModule.pm, and let it contain the package MyModule, then perl will be very happy, and the rest of this tutorial will be nice and easy. Otherwise you're on your own!

Next up:

use 5.008;
use strict;
use warnings;

For the sake of paranoia, use 5.008; means 'die if the version of perl you're running is less than 5.008'. This may be important if you're using something new for perl, like Unicode support, that old versions of perl don't support. use strict; is something you ought to have been doing for a while now, and use warnings; is the newer and better way of saying -w that we've been using for a while now. Incidentally, if you hadn't realised, every time you've written use strict; or use ANYTHING; at the top of a script, you've been using other people's modules. Modules written with lowercase names like strict are often called pragmata or pragma modules: they generally affect how perl deals with your script itself, rather than giving you extra functionality.

require Exporter;

Now we get into the nitty gritty. For the moment, we'll look at the require keyword: Exporter is just a perl module that exports things (like subroutines) from one package to another. require is very similar to use in that it loads in the contents of a module, so that you have access to its functions from your scripts.

The difference between require and use is that require doesn't import any functions into your package, and it does what it does at run-time rather than compile-time.

Qué?! Well, if you were writing a script (which by default would define itself in package main), and you wanted to use the function parse() from package MyModule, you have two ways of doing it.

You can require MyModule; and then call the function with 'fully qualified' names (the :: double colon syntax):

# we're in package main if we don't say we're not
require MyModule;
my ( @parsed ) = MyModule::parse( @things_to_parse );

Alternatively, you can use MyModule; which (if suitably set up) will export the function parse() from package MyModule into package main (or wherever you're working), so you can use it more easily:

# use exports the functions from package MyModule to package main
use MyModule;
my ( @parsed ) = parse( @things_to_parse );

No need to fully qualify the function name. When you require Exporter; you are asking perl to read in the Exporter module, but not to import any functions from it. As we don't actually want to import functions from the Exporter module, we require, not use it. The other thing about use is that it does its thing at compile-time, rather than run-time: this means that when your script is compiled by perl, it will check to see if you have all the requisite modules before executing anything, and if you don't have them all, it will die. require doesn't do this compile-time checking.

You may be able to guess therefore, that use MyModule; is exactly equivalent to:

BEGIN { require MyModule; MyModule->import(); }

BEGIN{} is a special block that perl automagically runs the moment it has finished compiling it: it makes things happen at the very beginning of compiling a script (END{} is similar: it's executed just before your program ends). This effects the compile-time checking.

import() is just a subroutine in the MyModule.pm file that tells perl which functions to import into the caller's namespace (i.e. the package, probably main, that the script use-ing the module is working in). This effects the function importing.
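
A small sketch of that ordering, if you want to convince yourself. The @order array is purely illustrative; the point is that the BEGIN block gets its oar in first even though it comes second in the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our @order;    # illustrative log of what ran when

push @order, "ordinary statement";      # happens at run-time
BEGIN { push @order, "BEGIN block" }    # happens while the script is still compiling
END   { print "END block runs last\n" } # happens just before the program exits

print "$_\n" for @order;
```

The BEGIN entry is listed before the ordinary statement, which is exactly why use can check for modules before anything else in your script has run.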

Now it's all very well saying perl will import functions from one package to another, but where does perl look for these packages in the first place? Well, when you create a perl module, you need to save it somewhere perl can find it. Use this:

#!/usr/bin/perl
print "$_\n" foreach @INC;

to list the places in your computer's filesystem that perl will search for modules in. @INC is like the PATH environment variable for perl. On older perls you'll notice that ".", the current working directory (CWD), is one of the places on the list (since perl 5.26 it no longer is by default, so there you will need to add it yourself, for instance with the -I switch). So if you put MyModule.pm in your CWD, and the CWD is in @INC, it will be found and used by perl when a script says use MyModule;. What about that Package::Subpackage business? If you create a directory called MyModule in the CWD (say D:/Steve/), then create a file in it called Subpackage.pm, perl would look for the package MyModule::Subpackage in D:/Steve/MyModule/Subpackage.pm. See what I mean about :: being like the path delimiter / ?
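
The package-name-to-path mapping can be sketched in a few lines. module_to_path is a made-up helper for illustration, not something perl provides; it just mimics the translation perl performs when it goes hunting through @INC.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a package name into the relative path perl looks for
sub module_to_path {
    my ( $module ) = @_;
    ( my $path = $module ) =~ s{::}{/}g;   # :: becomes the path delimiter
    return "$path.pm";
}

my $relative = module_to_path( "MyModule::Subpackage" );
print "perl looks for $relative in each of:\n";
print "  $_\n" foreach grep { !ref $_ } @INC;   # skip any code-ref hooks in @INC
```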

So now you know how to go about writing a module: you simply need to write some functions, and write a subroutine called import that exports these functions from one package to another. The latter is a simple matter of setting a typeglob in the caller's symbol table to a reference to the subroutine you wish to export.
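
For the curious, here is roughly what such a hand-rolled import might look like. This is a single-file sketch, so the BEGIN block stands in for the use MyModule; line, and parse() is a toy function invented for the occasion.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package MyModule;

# Toy function to export: upper-cases whatever it is given
sub parse { return map { uc $_ } @_ }

# A hand-rolled import: alias &parse into whichever package called us
sub import {
    my $class  = shift;
    my $caller = caller;                  # the package that use-d us
    no strict 'refs';                     # symbolic typeglob access needs this
    *{ "${caller}::parse" } = \&parse;    # set the caller's typeglob to our coderef
}

package main;

BEGIN { MyModule->import }    # what "use MyModule;" does once the file is loaded

my @parsed = parse( qw( foo bar ) );      # no MyModule:: needed any more
print "@parsed\n";
```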

Erm, yeah. In fact, almost no-one rolls their own import function. Almost everyone just borrows the one in Exporter, which is what:

our @ISA = qw( Exporter );

is for. @ISA (that's @rray 'is a') is where you can put the names of modules that you want perl to search in, to find functions you can't be bothered to define. So, if you can't be bothered to define import() yourself, you can tell perl to look for this function in Exporter.pm instead, hence:

our @ISA = qw( Exporter );
# MyModule IS A Exporter, and inherits functions 
# I can't be bothered to define from it

So now, when a script use-s MyModule, it will use the import() method from the Exporter module to furnish the script with whatever functions you chose to export from MyModule.pm. Hope this is all clear!

Well, perhaps not entirely clear, if you're wondering what our does. As you may have guessed, our is related to my. When you use strict; all variables have to be nailed down to a particular lexical scope with my, and will disappear from the symbol table, making them inaccessible from other scopes and packages. If they're not nailed down, perl will barf. This you know. However, what happens if you do want someone to be able to see the value of a variable in your module? For example, in the module File::Find, the variable $dir contains the current directory being processed, which is a useful bit of information for scripts using the module. But if you make $dir a lexically scoped my variable, it will be invisible outside of the scope in which it is created. For modules, this means invisible outside of the module itself. Oops.

This is what our is for. our explicitly allows you to share nasty global variables, which is exactly what strict doesn't like. our allows you to circumvent strict for variables you really do want to be accessible from anywhere using the $Package::variable or @MyModule::ISA notation. Since @ISA needs to be visible outside the scope in which it is defined (perl consults it when it goes looking for import), we must our it, not my it.
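
A quick sketch of the difference, using an invented package called Larder with one our variable and one my variable. The names are hypothetical; only the visibility rules matter.

```perl
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'once';    # we deliberately mention some qualified names only once

{
    package Larder;                    # hypothetical module
    our $shared  = "tin of tuna";      # package variable: $Larder::shared
    my  $private = "secret biscuit";   # lexical: invisible outside this block
    sub peek { return $private }       # code inside the block can still see it
}

print "Shared:  $Larder::shared\n";
print "Private: ",
    ( defined $Larder::private ? $Larder::private : "not visible from out here" ),
    "\n";
print "Via a function: ", Larder::peek(), "\n";
```

The my variable never makes it into Larder's symbol table, so $Larder::private is simply undefined, while the our variable is there for all to see.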

That's the worst bit over. The rest of it is just prettification of the interface. The next lines of MyModule tell the Exporter module which functions to export from the module if someone use-s it.

our %EXPORT_TAGS = ( 'all' => [ qw( ) ] );
our @EXPORT_OK   = ( @{ $EXPORT_TAGS{'all'} } );
our @EXPORT      = qw( );
our $VERSION     = '0.01';

$VERSION is obvious. Like use 5.008; you can also use MyModule 0.02; This makes perl die if the version of MyModule you have is older than the version you want to use.

@EXPORT is the easiest way of exporting functions. If your module contained three functions sublime(), boil() and melt(), and you wanted to export all of them to the caller's namespace:

our @EXPORT = qw( sublime boil melt );

would do just that. However, people usually prefer to selectively import functions, and the use of @EXPORT is discouraged unless your module is just one or two functions (like File::Find or File::Path). This is what @EXPORT_OK is for. Ignore the @{ $EXPORT_TAGS{'all'} } bit for the minute. If you wanted people to be able to import these three functions selectively, you could do this:

our @EXPORT_OK = qw( sublime boil melt );

Then users of your module could:

use MyModule "sublime", "boil"; # or
use MyModule qw( sublime boil ); # avoid all those quotes

if they had no interest in importing the melt() function and polluting their namespace.

Finally, the %EXPORT_TAGS is very useful: it allows you to define groups of functions to export (see CGI for an example). Say you want people to be able to import your three functions as a lump without having to go to all the trouble of writing three whole things:

use MyModule qw( sublime boil melt );

you can create an export tag called all, which contains all three functions. %EXPORT_TAGS is just a hash of key/value pairs. The keys are the names of the tags you want to define, and the values are an arrayref of the functions you want to dump in the tag:

our %EXPORT_TAGS = ( 'all' => [ qw( sublime boil melt ) ] ); # or
our %EXPORT_TAGS = ( 'all' => [ "sublime", "boil", "melt" ] );

With this defined, you can:

use MyModule qw( :all );

and Exporter will conveniently translate the tag :all into the list of three functions you have defined with the all key in the %EXPORT_TAGS hash. If you do define an :all tag, which is probably good practice, you can then use it in @EXPORT_OK:

our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );

That is, it's OK to export all the functions listed under the all key of %EXPORT_TAGS. Note the @{ } dereferencing syntax from the last lesson.
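
Here is the whole Exporter arrangement as a single-file sketch. Because everything lives in one file, the module is wrapped in a BEGIN block and import is called by hand where a real script would say use MyModule ...;. The three functions are toys and the package Other is invented to show the :all tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

BEGIN {
    package MyModule;
    require Exporter;
    our @ISA         = qw( Exporter );
    our %EXPORT_TAGS = ( all => [ qw( sublime boil melt ) ] );
    our @EXPORT_OK   = ( @{ $EXPORT_TAGS{all} } );
    our @EXPORT      = ();

    sub sublime { return "$_[0] sublimes" }
    sub boil    { return "$_[0] boils" }
    sub melt    { return "$_[0] melts" }
}

# Equivalent of: use MyModule qw( sublime boil );
BEGIN { MyModule->import( qw( sublime boil ) ) }

{
    package Other;                          # hypothetical second user
    BEGIN { MyModule->import( ':all' ) }    # equivalent of: use MyModule qw( :all );
    sub report { return melt( "Ice" ) }
}

my @lines = (
    sublime( "Dry ice" ),
    boil( "Water" ),
    Other::report(),
    main->can( "melt" ) ? "main imported melt" : "main did not import melt",
);
print "$_\n" for @lines;
```

Package main only gets the two functions it asked for, while Other picks up all three through the tag.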

After all the package, exportation and global variables nonsense, we finally get onto the beef:

# Preloaded methods go here.
1;
__END__

This bit is just a perl program. Go write it in the space # Preloaded methods go here. Mostly, you'll only be defining subroutines here, since these are what you usually want to export. The 1; is needed because all modules have to return TRUE when they load: this ensures they do. The __END__ token is a signal to perl to stop reading, since after this comes the documentation for the module, and this is of interest only to perldoc, not to perl itself.

Perldocumenting yourself

Talking of which:

=head1 NAME

MyModule - Perl extension for blah blah blah

=head1 SYNOPSIS

  use MyModule;

=head1 DESCRIPTION

Stub documentation for MyModule, created by h2xs.

=head2 EXPORT

None by default.

=head1 AUTHOR

A. U. Thor, E<lt>a.u.thor@a.galaxy.far.far.awayE<gt>

=head1 SEE ALSO

L<perl>.

=cut

Perl documentation is written in POD (plain old documentation) format, which is a markup language like HTML, but simpler. perldoc can read and display the POD embedded in a module, which makes it the perfect tool for documenting your module so you don't forget how it works, and so others can use it without getting up close and personal with the source code. Things starting = are processing directives. I think you can guess what head1 and head2 do. =cut is the signal for the end of the POD. Some other useful directives are:

=over 4

and

=back

=over indents the text by some amount (here 4 spaces), and =back restores the indent to 0. You'll notice that if you want a newline in your POD, you need a blank line: POD is otherwise newline-insensitive.

=item * function()

is used to create itemised lists, with a pretty * as a bullet point. Like HTML, POD uses angle brackets to mark up certain bits of text, but unlike HTML/XML (with its <open-tag> </close-tag> syntax), the thing you want to italicise, or whatever, goes inside the brackets:

I<text>

will put text in italics. B<text> does bold, C<blah> does code, L<foobar> does links (here L<perl> links to the perl manpages), and E<> does escapes like E<lt> and E<gt> for < and >. Documenting your code is essential if you want people to use it: don't fall into the trap of assuming a) everyone's stupid and you're going to let them wallow in it or b) everyone will know how to use your code by osmosing it in. Documentation is extremely important: if you have a memory like mine, you won't remember how to use your own scripts in six months' time, so write the documentation now, so you don't have to remember the entire script later. Nuff rant. The easiest way to learn POD is to use perldoc to read some prettily formatted documentation, then look at the module itself to see what it looks like in code. It's not very difficult. Just do it.

My first module

The hello world module. I think this should all be very obvious (srand seeds perl's random number generator, rand(NUMBER) generates a random number between 0 and NUMBER, and ||= is the assignment version of ||, which is a perl idiom for 'default': A ||= B is the shorthand for A = A || B, which means 'A equals B unless A already holds a true value, i.e. something other than 0, the empty string or undef'):

package Hello;
use 5.006;
use strict;
use warnings;
require Exporter;
our @ISA = qw( Exporter );
    #no need for export tags or for export_ok in a single function module
our @EXPORT = qw( hello );
our $VERSION = '0.01';
srand;
sub hello
{
    my $name = shift;
    $name ||= "you";
    my $message = rand(1) > 0.5 ? "a waste of time" : "a lot of fun";
    return "Hello, $name, isn't this $message?\n";
}
1; # Magical TRUE value that all modules must return when they are loaded
__END__

=head1 NAME

Hello - Perl extension for printing a stupid message

=head1 SYNOPSIS

  use Hello;
  $msg = hello( "Steve" );
  print $msg;

=head1 DESCRIPTION

Stub documentation for Hello, created by h2xs. It looks like the author
of the module took careful note of the importance of documentation,
and here it is:

=head2 EXPORTED FUNCTIONS

=over 4

=item * hello( $arg )

Returns one of two stupid messages for $arg, which should be a name,
but will default to 'you'.

=back

=head1 AUTHOR

Steve Cook, E<lt>steve@steve.gb.comE<gt>

=head1 SEE ALSO

L<perl>.

=cut

Then all we need to do is save the module in the root of one of the directories in @INC (i.e. the CWD, or similar) and:

#!/usr/bin/perl
use strict;
use warnings;
use Hello;
print hello( "Perl novice" ); # hello() only returns the message, so print it

Object orientation

Well, that's how to write a perl module that exports some functions that others might find useful. What about you Java programmers who just have to encapsulate everything into an object? For those who have no idea what an object is, think of Windows, or the Gnome desktop: object oriented programming (OOP) doesn't really have anything to do with graphical user interfaces, but they are similar in that they abstract the implementation from the interface: it doesn't matter how icky the goo of code and data under the bonnet is, all you get to see is the shiny buttons and pretty output.

Some definitions: an object is a thingy (in perl, objects generally are thingies, i.e. a gelatinous mass of references), containing data which has some associated methods, which do something to the data when you call them. Object oriented programming has a lot of pretentious terminology, so keep your eye out for highfalutin words for simple ideas. The main idea of OOP is to keep data and the functions that manipulate that data together in an otherwise opaque object. Simple as that.

In object oriented programming, everything starts by creating an object of a particular class, usually with the new method:

my $cat = Cat->new(); # create new object $cat of class Cat

and continues by making the object do things to itself, such as with a method called feed:

$cat->feed( "Mechanically recovered meat sludge" ); 
    # invoke method feed on object $cat

If you were writing this with 'normal' non-OO perl, you might create a hashref called $cat:

$cat = { stomach => "empty" };

and write a function called feed():

sub feed
{
    my ( $cat, $food ) = @_;
    $cat->{ stomach } = $food;
    # better start getting used to these reference thingies
}

so you could call:

feed( $cat, "Mechanically recovered meat sludge" );

to feed the cat. However, in OOP, the data and the functions (methods) are incestuously tied up with each other. This is bad (as it makes OO programs chunkier and slower) and good (because it hides all the implementation under the bonnet, and keeps the data within the object, rather than cluttering up your program with lots of variables). Although the non-OO program above with $cat and sub feed works fine, you have to worry about the $cat, what its keys and values are, what the return values of sub feed are, and the fact that cat is a hashref (not an arrayref), and so on. And so would anyone else trying to write new functions for the cat such as worm and spay. In OOP, the object is the centre of all data and manipulations thereof. OO encapsulates all the details of what is going on, so the user doesn't have to see the code's innards, and presents them in a black box with big shiny buttons called methods. All you need to know is which buttons to press (see the Windows analogy); you need know nothing about what is going on inside.

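As a user of the code anyway. If you want to write the code, you'll have to know the guts intimately. Objects are implemented by simple modules in perl 5. In fact, let's dump the terminology for a minute: in Perl, a class is just a package, an object is just a reference that knows which class it belongs to, and a method is just a subroutine that expects its class or object as its first argument.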

Classes are actually easier to write than vanilla modules at first. Here is the start of an OO Cat module:

package Cat;
use 5.008;
use strict;
use warnings;
our $VERSION = '0.01';
our @ISA = ();
#We'll fill in the gaps here presently
1;
__END__

There's no need to worry about exporting functions, as the whole point of objects is that objects look after their own functions (methods) themselves. Hence, no @EXPORT, etc. @ISA takes on a special importance in OO programming. As we said earlier, @ISA contains places to look if you can't find a function in the module itself. In OO programming, looking somewhere else is called inheriting methods. We'll cover this presently.

Now, as you may have gathered, Cat is actually a class, not an object. An object is a particular instance of a class: the object Steve Cook is a particular instance of class Human, perhaps. The class (module) provides the code to generate new objects, hence every class needs something to make new objects with, a 'class method' called a constructor that instantiates new objects. In perl, you can call this method anything you like, but it's best to stick with common parlance and call it new like everyone else:

sub new
{
    my $class = shift;
    my $self = { stomach => "empty" }; # lovely hashrefs
    bless $self, $class;
    return $self;
}

This method can be called in two equivalent ways in a script:

use Cat;
my $mr_tibbles = new Cat;
my $mrs_tibbles = Cat->new();

I prefer the latter (the former can lead to some nasty syntactic ambiguities). The new method is just a subroutine, a factory for making objects of class Cat. When you create an OO module, you need to be aware of one extremely important fact: the name of the class, or the object you call a method on, is the first thing in the @_ of the subroutine that implements it. So:

Cat->new();

will do something along the lines of calling the function new( "Cat" ); in package Cat. This seems fairly obvious, but wait till we get to 'object methods'. So the new() method we wrote gets "Cat" when it is called and it shifts this into $class. So it will know what sort of an object it should make. As an aside, don't be tempted to hardcode the class, as in:

$class = "Cat";

because this will break should anyone want to make a 'subclass' out of your class: if someone wants to implement a class called Tabby and inherit your new() constructor, the hardcoded new() will merrily make objects of the wrong class (i.e. Cat, not Tabby). This is a Bad Thing.

Next, the constructor creates the data the object needs. This is conventionally called $self, but doesn't have to be. This is conventionally a hashref, but doesn't have to be. TIMTOWTDI. Here, we are implementing the Cat as a hash. Almost. Perl objects are scalars, so we'll actually use an anonymous hashref. In this we put our 'stomach' stuff. Then comes the important bit. We know our class. We have our data. We need to glue these together to form an object. bless does this:

bless $self, $class;

makes the data in $self an instance of class $class. And it returns this blessed hashref, to be captured by our user's script in $mr_tibbles. That is all there is to constructing an object: in fact, all a constructor need really do is:

sub new { bless {}, $_[0] } # perl's smallest constructor

Now, if you wanted to see what $mr_tibbles actually looks like on the inside, you can investigate him using the dereferencing operator ->, so:

$contents = $mr_tibbles->{ stomach };

will get you 'empty'. To be really clever:

use Data::Dumper;
print Dumper( $mr_tibbles );

will spray $mr_tibbles's guts out all over the screen. However, such direct dissection is generally considered extremely bad OO form. The only way to investigate $mr_tibbles should be via the object methods (big shiny buttons ahoy) that we can call on him:

sub feed
{
    my ( $self, $food ) = @_;
    $self->{ stomach } = $food if defined $food;
    return $self->{ stomach }; # always report the current stomach contents
}

feed() is such a method. You call the method with a -> (which is the same as . for most OO languages, and due to mutate in Perl 6):

$mr_tibbles->feed( "Mechanically recovered meat sludge" );

The -> here is being used not to dereference a reference, but to call a method on $mr_tibbles. This dual use for -> confused the life out of me at first, but if you're careful to note the brackets, you'll be OK:

$thing->{ key };
    # hashref dereference, note the {}
$thing->[ index ];
    # arrayref dereference, note the []
$thing->( args );
    # coderef dereference, note the ()
$thing->method( args );
    # method call on object $thing, optional arguments in ()

Now, remember what I said: the object ($mr_tibbles) you call a method on is the first thing passed in @_. So to the method feed, @_ is ( $mr_tibbles, "Mechanically recovered meat sludge" ). These are assigned to $self and $food respectively. Then, if the $food is defined, it's put into $mr_tibbles's stomach with $self->{ stomach } = $food;

If no food is passed:

$contents = $mr_tibbles->feed();

does nothing to $mr_tibbles: the stomach contents are unchanged. However, via return $self->{ stomach }; the method can both alter (mutate) $mr_tibbles's stomach contents and just report (access) what he's eaten. How useful.

The ref operator will usually return what a reference refers to (ARRAY, SCALAR, HASH, etc.), as you know. However, if we call it on an object, it will return the class the object belongs to. So:

ref ( $mr_tibbles );

returns Cat rather than HASH. This can be useful for debugging. We will now add some more object methods:
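
Putting the constructor, feed() and ref together in one runnable file looks something like this. It is a sketch in which the class and the script share a file for convenience, and in which feed() is written to return the stomach contents.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Cat;

sub new {
    my $class = shift;
    my $self  = { stomach => "empty" };
    return bless $self, $class;       # bless returns the blessed reference
}

sub feed {
    my ( $self, $food ) = @_;
    $self->{ stomach } = $food if defined $food;   # mutate only if fed
    return $self->{ stomach };                     # always report
}

package main;

my $mr_tibbles = Cat->new();
my $plain_hash = { stomach => "empty" };           # same data, never blessed

my $before = $mr_tibbles->feed();                  # accessor use
my $after  = $mr_tibbles->feed( "Mechanically recovered meat sludge" );

print "Before: $before\n";
print "After:  $after\n";
print "An object is a: ", ref( $mr_tibbles ), "\n";
print "A hashref is a: ", ref( $plain_hash ), "\n";
```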

sub hairball
{
    my ( $self ) = @_;
    my $vomit = $self->feed();
    $self->feed( "empty" );
    return $vomit;
}

This demonstrates that you could (and probably should) use methods even within the class. You could've written:

sub hairball
{
    my ( $self ) = @_;
    my $vomit = $self->{ stomach };
    $self->{ stomach } = "empty";
    return $vomit;
}

and manipulated the cat's innards directly, but using the first version protects you from your own changes to your own code: let your methods do everything for you and it will save you a lot of grief when you decide to rearrange the innards of the cat later.

A little earlier, we mentioned inheritance, but what is it? To see, let's implement a rudimentary Tabby class that inherits from Cat.

package Tabby;
use strict;
   # blah blah blah,
our @ISA = qw( Cat );
sub miaow
{
    my( $self ) = @_;
    print "Miaow\n";
}
1;

When you:

my $tabitha = new Tabby;

you'll get a new Tabby cat. Even though there's no method called new() in package Tabby. That's where @ISA comes in: a Tabby IS A Cat, and if perl can't find the relevant method in Tabby, it'll search the packages in @ISA (i.e. Cat) to find the method instead. So Tabby does exactly what Cat does, only you can make her miaow. Wow. If you wanted to be more practical, you could define your own new:

package Tabby;
use strict;
    # blah blah blah,
our @ISA = qw( Cat );
sub new
{
    my ( $class ) = @_;
    my $self = $class->SUPER::new();
        # inherit cattiness from SUPER-class, i.e. Cat, by calling the
        # superclass's constructor
    $self->{ breed } = "Tabby";
    bless $self, $class; # re-bless the cat into a tabby
    return $self;
}
sub breed
{
    my ( $self ) = @_;
    return $self->{ breed };
}
1;

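When we make a new Tabby, we're actually getting Cat's constructor to do the work, by calling the SUPER::new() method. We then shove an extra bit of information into the $self hashref, and rebless it. (Strictly speaking the re-bless is belt and braces: Cat's constructor blesses into whatever $class it is handed, and we hand it Tabby, so the object is already a Tabby when it comes back.)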
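
A runnable single-file sketch of the two classes together, for checking that a Tabby really does inherit from Cat. Both packages share one file here for convenience, the re-bless is left out since Cat's two-argument bless already does the job, and feed() is written to return the stomach contents.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Cat;

sub new {
    my $class = shift;
    return bless { stomach => "empty" }, $class;   # blesses into the caller's class
}

sub feed {
    my ( $self, $food ) = @_;
    $self->{ stomach } = $food if defined $food;
    return $self->{ stomach };
}

package Tabby;

our @ISA = qw( Cat );    # a Tabby IS A Cat

sub new {
    my ( $class ) = @_;
    my $self = $class->SUPER::new();   # Cat's new(), but called on class Tabby
    $self->{ breed } = "Tabby";
    return $self;
}

sub breed { return $_[0]->{ breed } }

package main;

my $tabitha = Tabby->new();
my @report  = (
    "Class: " . ref( $tabitha ),
    "Breed: " . $tabitha->breed(),
    "Fed:   " . $tabitha->feed( "Tuna" ),           # inherited from Cat
    "Is a Cat? " . ( $tabitha->isa( "Cat" ) ? "yes" : "no" ),
);
print "$_\n" for @report;
```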

What happens if you want your whole class to have some data (rather than each individual object)? Say you want to know how many cats you have created:

package Cat;
my $census = 0;
sub new
{
    my $class = shift;
    my $self = { stomach => "empty", _census => \$census };
    bless $self, $class;
    ++ ${ $self->{ _census } };
    return $self;
}
sub census
{
    my $self = shift;
    return ref $self ? ${ $self->{ _census } } : $census;
}
sub DESTROY
{
    my $self = shift;
    -- ${ $self->{ _census } };
}

The part that initially implements the census is:

my $self = { stomach => "empty", _census => \$census };

Now, you may be wondering why we're using some tortuous scalar reference, and then having to do some horrible backflips to increase the census by one later:

++ ${ $self->{ _census } };

and to retrieve it in the method census(), (note this can and should be able to be called as a class or object method):

return ref $self ? ${ $self->{ _census } } : $census;

and to decrement it in the destructor method DESTROY, which is automatically called when an object is destroyed:

--${ $self->{ _census } };

The reason for the scalar reference is that if we don't take a reference to $census, and instead try to decrement $census directly in DESTROY, we could end up decrementing the wrong $census if our object methods (e.g. DESTROY) were inherited. For example, if Tabby inherited Cat's DESTROY method and you directly decremented $census in this method, you'd end up decrementing the Cat census, not the Tabby census. This is because the $census that DESTROY can see is the one defined in package Cat. Technically 'object methods execute in the context in which they were defined (i.e. package Cat), not in the context that invoked them (i.e. package Tabby)'. Decrementing the Cat census when a Tabby is DESTROYed is probably not what you want to do (or it might be: either way, you need to think about it). However, by always using a reference to $census, we ensure that if Tabby inherits DESTROY, but supplies its own $census class data and new constructor, then DESTROY will decrement the Tabby $census, not the Cat $census. (Read that again until it makes sense!).
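
If reading it again doesn't help, running it might. In this sketch Tabby supplies its own $census, new and census but inherits DESTROY from Cat. Each package sits in its own block so that the two my $census variables don't collide; that layout is a choice made for this single-file example, as normally each class would be in its own file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package Cat;
    my $census = 0;                    # Cat's class data
    sub new {
        my $class = shift;
        my $self  = bless { stomach => "empty", _census => \$census }, $class;
        ++ ${ $self->{ _census } };
        return $self;
    }
    sub census {
        my $self = shift;
        return ref( $self ) ? ${ $self->{ _census } } : $census;
    }
    sub DESTROY {
        my $self = shift;
        -- ${ $self->{ _census } };    # decrements whichever census the object points at
    }
}

{
    package Tabby;
    our @ISA = qw( Cat );              # DESTROY is inherited from Cat
    my $census = 0;                    # Tabby's own class data
    sub new {
        my $class = shift;
        my $self  = bless { stomach => "empty", _census => \$census }, $class;
        ++ ${ $self->{ _census } };
        return $self;
    }
    sub census {
        my $self = shift;
        return ref( $self ) ? ${ $self->{ _census } } : $census;
    }
}

my $tom     = Cat->new();
my $tabitha = Tabby->new();
my $tilly   = Tabby->new();
my @before  = ( Cat->census(), Tabby->census() );

undef $tilly;    # last reference gone, so the inherited DESTROY runs now
my @after   = ( Cat->census(), Tabby->census() );

print "Before: $before[0] cat(s), $before[1] tabby(ies)\n";
print "After:  $after[0] cat(s), $after[1] tabby(ies)\n";
```

Destroying a Tabby leaves the Cat census alone, even though the code doing the decrementing lives in package Cat.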

You may also be wondering why the underscore in _census. The reason for the underscore is that _hashkeys and _methods look special to C++ programmers, since they indicate that the data are private. In OO perl, it's considered bad form for a script to mess with the insides of an object (like the value of $self->{ stomach }) directly. It's considered unforgivably bad form to mess with a private $self->{ _underscored } value. You can do it (unlike in C++, where private means private, with razor wire), but there's probably a very good reason why you shouldn't.

There's loads more to OO programming. If you're interested, try perldoc perltoot (or perldoc perlootut and perldoc perlobj on newer perls, where perltoot has been retired), which'll tell you even more of the gory details of method inheritance, multiple inheritance (a class can inherit from more than one parent), the SUPER and UNIVERSAL classes, and AUTOLOAD. A brief word on the last: when you write OO perl, you will soon get bored of creating a thousand methods all of the form:

sub X
{
    my ( $self, $x ) = @_;
    $self->{ X }  = $x if defined $x;
    return $self->{ X };
}

to access the data in the object. Autoloading allows perl to mimic methods for these, so you don't have to. Autoloading is a dirty hack though so don't use it. ☺.
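
So that you at least know what you're avoiding, here is a minimal sketch of an autoloaded accessor. It assumes every key already in the object's hash should get an accessor of the same name, which is a simplification chosen for this example; the name field and the tail() call are invented to exercise it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Cat;

our $AUTOLOAD;    # perl puts the fully qualified name of the missing method here

sub new {
    my $class = shift;
    return bless { stomach => "empty", name => "Mr Tibbles" }, $class;
}

# Called by perl whenever a method can't be found in the class
sub AUTOLOAD {
    my ( $self, $value ) = @_;
    ( my $field = $AUTOLOAD ) =~ s/.*:://;    # Cat::stomach becomes stomach
    return if $field eq "DESTROY";            # never autoload the destructor
    die "No such method: $field\n"
        unless ref( $self ) && exists $self->{ $field };
    $self->{ $field } = $value if defined $value;
    return $self->{ $field };
}

package main;

my $cat = Cat->new();
my $was = $cat->stomach();            # no sub stomach anywhere: AUTOLOAD steps in
my $now = $cat->stomach( "Tuna" );
my $who = $cat->name();
my $ok  = eval { $cat->tail(); 1 };   # no such field, so this dies
my $err = $ok ? "" : $@;

print "Was: $was\nNow: $now\nWho: $who\nError: $err";
```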
