We've come a long way…
…from the 'Hello, world' script. I guess by now,
you should be able to do most of the following with your eyes shut:
- Install and run perl from files or from the command line.
- Manipulate and compare scalars, hashes, arrays and slices.
- Write loops, conditionals and subroutines.
- Understand what the symbol table is, and how to manipulate typeglobs.
- Use simple IO, filehandles, dirhandles and pipes.
- Use strict, lexical (my) and dynamic (local) scoping.
- Use regular expressions and the grep, sort, split, join and map functions.
- Create complex data structures with references and use coderefs and return closures.
- Use perl to invoke other programs via system, and perl itself via eval.
- Create and use classes and objects.
- Write, document, install and use modules.
- Know what happens when you invoke perl, and what Perl 6 and Parrot will be like.
- Debug programs, use perldoc and the wealth of other perl resources on the Internet.
- Use some of the Windows-specific bits of perl, should you be so cursed.
- Write servers, clients and forking programs.
That's really quite impressive! Just to bring you back down to Earth, we're going to start back at the very beginning all over again.
Hello, world
You may recognise this from an earlier lesson or three:
#include <stdio.h>
int main( void )
{
    printf( "Hello, world\n" );
    return 0;
}
This is the benighted script in C. Why, in a Perl tutorial, should you
give a damn about programming in this centuries-old glorified Assembler
language ;) ? Well, there's one very good reason: perl itself is
written in C, and in this lesson we will be delving a little into perl's
guts, and messing with them. The old way of doing this was via
the XS mechanism, whereby you wrote a module in Perl and a module written
in a macro-language called XS (a sort of bastard love-child of C, Perl,
English and pain). You then wrote a makefile for the modules,
make-d and compiled them, and then got bored and decided to
implement it in pure Perl anyway as your head hurt. No longer. We will be
using the Inline modules instead, which have nearly all the
power of XS, but without the grief of having to actually do anything.
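To install Inline, all you need to do is:
ppm install Inline
or
perl -MCPAN -e "install Inline"
(or start the interactive shell with perl -MCPAN -e shell and type install Inline at its cpan> prompt). Newer releases ship Inline::C as a separate distribution, so you may need to install Inline::C the same way.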
However, as I mentioned before, there is a problem: if you are running
the ActiveState port of perl under Windows, you will need
cl.exe, the C/C++ compiler (and its libraries and linker)
from Microsoft Visual C++ 6.0 (this is also the case if you want to
compile XS extensions). You'll also need nmake. To play with
Inline under WinNT, I'd strongly recommend installing the
(free) Cygwin environment, with the Cygwin ports of perl,
gcc (the GNU C compiler with which Cygwin perl is compiled)
and make, then install Inline for this binary
instead. You can then invoke your Inline-d Perl scripts from
the Cygwin bash shell (make sure your shebang is correct
though).
Let's look at the Inline::C version of the world's most
famous program:
#!/usr/bin/perl
use strict;
use Inline 'C';
hello();
__END__
__C__
void hello()
{
    /* void: nothing to hand back to perl, so just print and return */
    printf( "Hello, world\n" );
}
To use Inline, you need to tell it which language you
want to use, i.e. 'C', then include the program in
the Perl script somehow. There are several ways to do this: we'll use
this one, where we just dump the C-code after the __END__
marker in a section starting __C__.
If your C programming experience is non-existent, then the rest of this lesson may be a little confusing. You might want to check out a C tutorial first.
To execute the script, all you need to do is save it as
script.pl, and run it:
script.pl
...time passes...
Hello, world
Whoohoo! There are a number of things that can go wrong, the most
obvious being buggy code; the other common one is
Inline not finding a place to build the C components of the
script. If you have the latter problem, try creating an environment
variable called PERL_INLINE_DIRECTORY with the value
c:/cygwin/tmp/inline or similar (you'll need to actually
create this directory, obviously).
Now, try running the script again:
script.pl
...very little time passes...
Hello, world
You may notice the whole thing is rather quicker this time. This is
because the first time you invoke an Inline-d script,
Inline does all the nasty building (compiling, assembling,
linking and installing) that is required to get the Perl/C interface to
work, i.e. stripping the C code out of your script,
transforming it into an XS module that binds your C functions to perl
subroutines, writing a Makefile.PL, running it, running
make to compile and link the module, and finally running
make install to put the result in Inline's build directory. This takes a while. However, after this,
Inline will realise it has already compiled the code, and
doesn't go through the rigmarole the second time: it just uses the
extension it has already built.
So far so easy
Now the good stuff. Creating a script that just printfs
something dull isn't very useful. What happens if we want to send data to
and from the subroutine? Unfortunately, you'll need to know a little
about how perl actually works to do this, because the fundamental data
types of perl and C are quite different. A simple example first:
#!/usr/bin/perl
use strict;
use Inline 'C';
chomp( my $name = <STDIN> );
my $size = count( $name );
print "Your name is $size letters long\n";
__END__
__C__
#include <string.h>
int count( char *name )
{
int length = strlen( name );
return length;
}
This time, we grab a string from STDIN, and pass it to
the C function count, which returns the length of the
string. We then print this out. Now, if you're hazy on C, you need to
realise the following: C has no built-in string type; strings are just
arrays of characters terminated by a null character \0, and are
manipulated with library functions. Furthermore, C is
statically typed, i.e. there is no generic 'scalar' like in perl:
it needs to know whether what you want to store or return is a
character, integer, floating point
number, double precision float, long integer,
etc. This is not a C tutorial, but we'll take this one a bit at
a time.
The first thing we do is include the standard C library's string
header (by #include-ing string.h), which declares a function called
strlen() that returns the length of a string (not counting the
trailing \0). There's actually no need for this
#include, as Inline automatically
#includes perl's own headers (such as perl.h and XSUB.h),
which in turn pull in common standard C headers like stdio.h and string.h:
the sharp-eyed among you may have noticed the lack of #include
<stdio.h> in the hello world script.
Then we define a function called count, which takes a
char* argument (we'll explain this in a minute), which it
will call name, and returns an integer. All C
functions look something like:
RETURN_TYPE function_name( ARG1_TYPE ARG1_NAME, ARG2_TYPE ARG2_NAME, ... )
The RETURN_TYPE can be any of the types mentioned above
(int, char, etc.), or
void if the function doesn't actually return anything.
However, C cannot return an array, and as strings are just arrays of
chars in C, it cannot return a string either. For similar
reasons, it cannot easily receive a string as an argument. So how can we
pass count the string whose length we want to find? The
answer is to pass a pointer, which is very similar to passing a
reference in Perl. You can't pass several arrays to a subroutine directly
in Perl (without their being 'flattened' to a list), so you pass
references to them. You can't pass an array/string directly in C, so you
pass pointers to them. The pointer is (literally) a number that says
where the first element of an array lives in memory. So char*
means 'a pointer (*) to a char', here the first of an array of
characters, i.e. a string. The first (and only) argument to
count() is therefore a char* pointer. The
rest of the function is obvious: strlen() takes a
character pointer and returns the integer length.
You can easily pass ints, longs,
doubles, and char* pointers to and from C
subroutines. A file called typemap (usually in the
lib/ExtUtils directory) provides the glue that shows
Inline how to convert between C's types and perl's types, in
the above case, ensuring that the perl scalar value containing your name
gets appropriately converted into a C-style pointer to an array of
characters. And this is where we begin to delve into the insides of
perl:
Inline and XS allow you to directly manipulate perl's own internal
data structures. The most important of these is the pointer to a scalar value (SV),
SV*. SV*s are pointers to C
structs (a little like Perl's objects), and represent the
basic internal data type that perl uses to store scalar variables like
$v. An SV* contains the data you stored when
you create a scalar like $v. The insides of an
SV* can hold a variety of other values (such as an IV, an
integer value, an NV, a floating point value, and a PV, a string/pointer
value), depending on whether perl thought you wanted to store an
integer, a float, a string, etc. Various functions can be used
to assign to, manipulate and otherwise torture SV*s, and
this is what perl itself does when using $v in numerical
($x=2+$v), boolean (exit if $v) or string
($v.=" percent") context: the data stored in the scalar
value is retrieved as doubles or as pointers to arrays of
chars, etc: whatever is required by the
interpreter. There are also AVs and HVs (no prizes for guessing what
these are), themselves composed of SVs. When you passed
$name to the count() function earlier,
Inline implicitly converted the SV containing "Steve" or
whatever into the C 'string' (char*) that the
count() function wanted.
However, there's no reason why you shouldn't pass pointers to SVs and
torture them as you see fit. The functions you can use to
manipulate them are documented in perldoc perlapi. If you
replace the C code in the previous example with that below:
int count( SV *name )
{
int length = strlen( SvPV( name, PL_na ) );
return length;
}
Nothing changes when you run this: it does exactly the same as the
last bit of code, but you are doing the conversion explicitly: the
function SvPV is the one perl (and Inline) uses
to extract a pointer value (char*) from a scalar value (SV).
It returns a C 'string' (char*), and takes two arguments,
the first is a pointer to an SV (SV*), here
name, the second is a variable into which the length of the
string is put: if you don't care about this, the API (application
programming interface) provides a convenience junk variable called
PL_na, which we use here. In fact, that makes no sense at
all, as the length of the string is exactly what we are after! A better
idea would be:
int count( SV *name )
{
    STRLEN length;  /* SvPV stores the length in a STRLEN, not an int */
    char *string = SvPV( name, length );  /* string itself is unused here */
    return length;
}
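An even better idea would be to use the SvCUR function,
which does exactly what we want (i.e. get the length of
the string stored in an SV) without pointlessly returning that char
*string. One caveat: SvCUR just reads the length perl has already recorded, so it is only meaningful when the SV really holds a string (SvPOK is true), as our chomped line from STDIN does; SvPV, by contrast, will stringify a number for you first: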
int count( SV *name )
{
return SvCUR( name );
}
Stack hackery
There are hundreds of other functions you can use to manipulate SVs,
AVs and HVs from within C, all documented in perldoc
perlapi. For the next example, we'll look at how to pass and
return an indefinitely long list of SVs. To do this, we'll need to become
acquainted with the perl Stack, which is what perl uses to pass
multiple arguments to and from C functions, which are inherently
incapable of doing this alone. So, the Stack is the pile of
SV*s that perl uses to pass and retrieve values to and from
a subroutine. When you call a perl subroutine with arguments ($foo,
$bar), the corresponding SV*s for $foo
and $bar are pushed onto the Stack. The subroutine then pops
them off the Stack as required. In the previous examples, you left it to
perl and Inline to pop SV* name off the Stack, and push [the
SV* corresponding to] int length onto the
Stack. However, Inline provides a number of macros for
manipulating the Stack directly:
#!/usr/bin/perl
use strict;
use Inline 'C';
my @numbers = qw( 1 2 3 4 5 6 7 8 9 10 );
my @pairwise_sums = sum( @numbers );
print "The pairwise sums are @pairwise_sums\n";
__END__
__C__
void sum( int num1, ... )
{
    Inline_Stack_Vars;
    int i;
    int j = 0;
    /* Create a C array called sum, half the size of the Stack (+1 so it is never zero-sized) */
    int sum[Inline_Stack_Items / 2 + 1];
    /* Iterate over the Stack two at a time; an odd trailing item is ignored */
    for (i = 0; i + 1 < Inline_Stack_Items; i += 2)
    {
        /*
        Each item on the Stack is an SV*.
        We use SvIV to extract the integer value from the SV*.
        Then we sum them and dump them in the C array sum.
        */
        sum[j++] = SvIV(Inline_Stack_Item(i)) + SvIV(Inline_Stack_Item(i+1));
    }
    Inline_Stack_Reset;
    for (i = 0; i < j; i++)
    {
        /*
        Here we iterate over the sum array,
        creating new perl SV*s with the newSViv function.
        sv_2mortal marks each new SV* as temporary, so perl frees it
        once the caller has copied the value (otherwise it would leak).
        Then we push these new SV*s onto the Stack.
        */
        Inline_Stack_Push(sv_2mortal(newSViv(sum[i])));
    }
    Inline_Stack_Done;
}
(apologies to anyone who thinks my C is rubbish!). Here we have
written a C function called sum that takes a list of
integers and returns another list of integers that are the pairwise sums
of the input list (i.e. 1+2, 3+4, 5+6, etc.). The
syntax for receiving a list of arguments is:
RETURN_TYPE function_name ( dummy_type dummy_var, ... )
To receive a variable size list of arguments, we use the ... ellipsis
notation. C traditionally requires at least one named argument before the ellipsis, so we provide
it with a dummy variable int num1, which we never intend to
use, and don't. Instead we manipulate the perl Stack directly. The first
thing we need to do is initialise the Inline Stack handling
functions, we do this with:
Inline_Stack_Vars;
This should be at the top of any function manipulating the Stack, as it declares the local variables that the following macros rely on:
- Inline_Stack_Items, which is the number of items on the Stack.
- Inline_Stack_Item( n ), which returns the n-th SV* on the Stack, counting from 0.
- Inline_Stack_Reset, which resets the Stack pointer (i.e. 'clears' the Stack ready to push new values onto it).
- Inline_Stack_Push( foobar ), which pushes the SV* called foobar onto the Stack.
- Inline_Stack_Done, which tells Inline you've finished manipulating the Stack.
- Inline_Stack_Void, which tells Inline you really want to return nothing from the subroutine, i.e. push nothing onto the Stack.
The function sum itself works by iterating over the
Stack, grabbing out a pair of SV*s with
Inline_Stack_Item( i ), and using the perl API function
SvIV to extract the integer value of the SV*.
It then sums these and dumps the result in a C array called
sum. Then we iterate over this C array, creating new
SV*s using the newSViv function, which creates
a perl SV* from a C int. These are then pushed
onto the reset Stack, and returned. NB: the
RETURN_TYPE of a function using Stack manipulation directly
should be void, or perl will get terribly
confused.
Here are some other perl API functions that may come in handy for manipulating scalar values (I'll leave arrays and hashes for your own edification):
SV *newstring = newSVpvf( "Create a new SV with %s or %s semantics",
"printf", "sprintf" );
sv_setpvn( newstring, "Or just overwrite one", 21 );
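newSVpvf creates a brand new SV from a printf-style format string and its arguments.
The sv_setpvn function can then modify the string inside an SV:
the three arguments are the SV* to torture (here
newstring), the string to put into the SV, and the length of
the string you're putting into the SV (here exactly the 21 characters of "Or just overwrite one").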
The usefulness of embedding C code into Perl may not seem obvious at
the moment, but it allows you to include external C libraries (such as
your favourite blah library you use all the time in C) and call
them directly from Perl. XS and Inline also allow you to
write C extensions that may run faster than perl does for simple tasks
(e.g. if you wanted to quickly count all the As, Gs, Ts and Cs in a
nucleotide string, it might be quicker to use C than to use a
foreach( split //, $nucl ){ blah } Perl construct,
and the overhead that entails). Finally, the Inline
mechanism has even been extended so you can embed C++, Java, Python,
Ruby, Awk, BASIC, Tcl and even Perl, the last using the entirely silly
Acme::Inline::PERL
module. There's now even more than more than one way to do it!
