Save it for later
Well, hacking on the symbol table is all well and good, but let's get back to practicalities. How do you mess about with files in Perl? Well, messing with files and directories is dead easy. A simple example:
#!/usr/bin/perl
use strict;
use warnings;
open my $INPUT, "<", "C:/autoexec.bat"
or die "Can't open C:/autoexec.bat for reading $!\n";
open my $OUTPUT, ">", "C:/copied.bat"
or die "Can't open C:/copied.bat for writing $!\n";
while ( <$INPUT> )
{
print "Writing line $_";
print $OUTPUT "$_";
}
Here we open two files, one to read from, one to write to. The
$INPUT and $OUTPUT are filehandles, just like
STDIN was, only we have created these two ourselves with
open. It's a good idea to give filehandles uppercase names,
as these are less likely to conflict with perl keywords (we don't want to
try reading from a filehandle called print for example).
Note that it's also possible to write the above in the following way:
#!/usr/bin/perl
use strict;
use warnings;
open INPUT, "C:/autoexec.bat"
or die "Can't open C:/autoexec.bat for reading $!\n";
open OUTPUT, ">C:/copied.bat"
or die "Can't open C:/copied.bat for writing $!\n";
while ( <INPUT> )
{
print "Writing line $_";
print OUTPUT "$_";
}
Note three things.
- You can miss off the $ sigil on the filehandles. Although this will work fine, modern Perl usage is to use a lexically scoped filehandle (except for the standard input, output and error handles that are opened automatically for you). You will see the old style filehandles in code, but you should avoid them if you are running under perl versions > 5.8, as they rely on dodgy global variables.
- You can miss off the < on calls to open, and perl will assume you mean 'to read'. It's better practice to explicitly state what you mean with the three argument form.
- You can also combine the read/write bit into the filename. However, both this and missing out the < on opening to read can be the cause of subtle bugs, so you'd be better to avoid them unless you really know what you're doing. Since you're reading this, I assume you don't…
The open command always needs at least two arguments: a filehandle
and a string containing the name of a file to open (the three argument
form puts the mode in between them). So the first
line:
open my $INPUT, "<", "C:/autoexec.bat"
or die "Can't open C:/autoexec.bat for reading $!\n";
means 'open the file C:/autoexec.bat for
reading, and attach it to filehandle $INPUT'. Now, if this
works, everything will be fine, the open function will
return TRUE, and the stuff after or will never be executed.
However, if something does go wrong (like the file doesn't exist, as it
won't if you're running on Linux or MacOS), the open
function will return FALSE, and the thing after the or
will be executed. die causes the Perl program to
terminate, with the message you give it (think of it as a suicidal
print). When something goes wrong, like problems opening
files, the Perl special variable $! is set with an error
message, which will tell you what went wrong. So this die
tells you what you couldn't do, followed by $!, which'll
probably contain 'No such file or directory' or similar.
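To watch the return value of open and the contents of $! without actually killing the script, here is a minimal sketch. The sub name try_open and the file name are made up for illustration, and the exact wording of the error message will depend on your operating system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# try_open is a hypothetical helper: it returns an empty string on success,
# or a message containing $! if open fails.
sub try_open
{
    my ( $path ) = @_;
    open my $FH, "<", $path
        or return "Can't open $path for reading: $!";
    close $FH;
    return "";
}

# A file name that should not exist anywhere (an assumption, naturally)
my $error = try_open( "no_such_dir_for_this_demo/no_such_file.txt" );
print $error ? "$error\n" : "Opened fine\n";
```

Returning the message rather than dying is only for the sake of the demonstration: in a real script, or die is usually what you want.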
A word of advice before we go any further. On Windows, paths are delimited using the \ backslash. On Unix, paths are delimited using the / forward-slash. On MacOS < X, I have no idea (colon?). Perl will happily accept either \ or / when running under Windows, but bear in mind \ is an escape, so to write it in a double-quoted string, you'll have to escape it, thusly:
$file = "C:/autoexec.bat";
$file = "C:\\autoexec.bat";
I'd go with the first one in the name of portability and legibility,
although if you ever need to call an external program from perl (using system, more later), you'll probably
have to convert the / to \ with a s/\//\\/g (the g makes sure every slash gets converted, not just the first one).
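That slash conversion could be sketched like this. The sub name to_backslashes is invented for the example, and using braces as the delimiters of s/// is just one way of avoiding a forest of leaning toothpicks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# to_backslashes is a hypothetical helper name
sub to_backslashes
{
    my ( $path ) = @_;
    ( my $converted = $path ) =~ s{/}{\\}g;   # g: convert every slash, not just the first
    return $converted;
}

print to_backslashes( "C:/windows/system32" ), "\n";   # prints C:\windows\system32
```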
The second line:
open my $OUTPUT, ">", "C:/copied.bat"
or die "Can't open C:/copied.bat for writing $!\n";
is very similar to the first, but here we are opening a file for
writing. The difference is the >:
open my $READ, "<C:/autoexec.bat"; # explicit < for reading
open my $READ, "<", "C:/autoexec.bat"; # three argument version is safer
open my $WRITE, ">C:/autoexec.bat"; # open for writing with >
open my $WRITE, ">", "C:/autoexec.bat"; # safer
open my $APPEND, ">>C:/autoexec.bat"; # open for appending with >>
open my $APPEND, ">>", "C:/autoexec.bat"; # safer
open my $READ, "C:/autoexec.bat"; # perl will assume you mean 'read'
The > means open the file for writing. If you do this
the file will be erased and then written to. If you
don't want to wipe the file first, use >>,
which opens the file for writing, but doesn't clobber the contents first.
The three argument versions are generally safer. Consider whether you
want this to work:
chomp( my $file_name = <STDIN> ); # user types ">important_file"
open my $FILE, $file_name; # the writer assumes for reading, but the > the user enters overrides this. Oops.
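If you'd like to see the damage without risking a real file, here is a sketch that plays the whole thing out in a scratch directory. The file name and its contents are invented, and File::Temp is used only to get a directory that cleans up after itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Work in a scratch directory so nothing real gets damaged
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/important_file";

open my $SETUP, ">", $path or die "Can't create $path: $!\n";
print $SETUP "precious data\n";
close $SETUP or die "Can't close $path: $!\n";

my $user_input = ">$path";   # what a malicious or clumsy user might type

# Three-argument open: the > is treated as part of the file name,
# no such file exists, so the open simply fails
my $three_arg_ok = open my $SAFE, "<", $user_input;
my $size_after_three_arg = ( -s $path ) || 0;

# Two-argument open: the > is read as a mode, and the file is truncated
my $two_arg_ok = open my $UNSAFE, $user_input;
close $UNSAFE if $two_arg_ok;
my $size_after_two_arg = ( -s $path ) || 0;

print "Three-arg open ", ( $three_arg_ok ? "worked" : "failed" ),
      ", file is $size_after_three_arg bytes\n";
print "Two-arg open ", ( $two_arg_ok ? "worked" : "failed" ),
      ", file is $size_after_two_arg bytes\n";
```

The failed three-argument open is the good outcome here: the precious data is still there afterwards.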
The next bit is easy:
while ( <$INPUT> )
{
print "Writing line $_";
print $OUTPUT "$_";
}
Remember the line reading angle
brackets <> ? As in:
chomp ( $name = <STDIN> );
This is the same, but here we are reading lines from our own
filehandle, $INPUT. A line is defined as stuff up to and
including a newline character (just as it was when you were reading
things from the keyboard). [And you also know this is strictly a fib:
<> and chomp deal with lines delimited by
whatever is in $/ currently.] Conveniently:
while ( <$INPUT> )
is a shorthand for:
while ( defined ( $_ = <$INPUT> ) )
i.e. while there are lines to read, read them into
$_. The defined will eventually return FALSE
when it gets to the end of the file (don't test for eof
explicitly!), and then the while loop will terminate.
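Here is a small sketch of both points: the implicit defined, and the fact that $/ decides what counts as a line. The sub read_records is a made-up helper, and it reads from an in-memory file (a reference to a string handed to open) purely so the example needs no files on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# read_records is a hypothetical helper: it reads "lines" from a string,
# using whatever separator you hand it as $/
sub read_records
{
    my ( $text, $separator ) = @_;
    local $/ = $separator;               # local: $/ is put back when the sub returns
    open my $IN, "<", \$text             # a reference to a scalar acts as an in-memory file
        or die "Can't open in-memory file: $!\n";
    my @records;
    while ( <$IN> )                      # implicitly: while ( defined( $_ = <$IN> ) )
    {
        chomp;                           # chomp removes whatever $/ currently holds
        push @records, $_;
    }
    close $IN;
    return @records;
}

my @by_newline = read_records( "one\ntwo\n0", "\n" );   # last line is a bare 0, no newline
my @by_colon   = read_records( "a:b:c", ":" );

print scalar( @by_newline ), " lines, ", scalar( @by_colon ), " colon-separated records\n";
```

The final line of the first string is just 0, which is FALSE as far as perl is concerned, but it still gets read because the loop tests for defined-ness rather than truth.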
However, while there really is stuff to read, perl will
print to the command line "Writing line blah…", then
print it to the $OUTPUT filehandle too using:
print $OUTPUT "$_";
Note that there is no comma between the filehandle and the thing
to print. A normal print:
print "Hello\n";
is actually shorthand for:
print STDOUT "Hello\n";
where STDOUT is the standard output (i.e. the
screen), like STDIN was the standard input (i.e.
the keyboard). To print to a filehandle other than the default
STDOUT, you need to tell print the filehandle
name explicitly.
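A quick sketch of printing to something other than STDOUT. To keep it self-contained the filehandle is attached to an in-memory string called $buffer (a name chosen just for this example) rather than a real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $buffer = "";
open my $OUT, ">", \$buffer or die "Can't open in-memory file: $!\n";

print $OUT "no comma after the filehandle\n";
print {$OUT} "braces make the filehandle stand out\n";   # same thing, arguably clearer
print STDOUT "and this goes to the screen as usual\n";

close $OUT or die "Can't close in-memory file: $!\n";
print "The buffer holds:\n$buffer";
```

The braces around the filehandle are optional for a simple scalar like $OUT, but some people find they make the missing comma look less like a typo.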
What else can we do with filehandles? As well as opening them to read
and write files, we can also open them as 'pipes' to external programs,
using the | symbol, rather than > or
<.
open my $PIPE_FROM_ENV, "-|", "env" or die $!;
print $_ while ( <$PIPE_FROM_ENV> ); # each line already ends in its own newline
This should (as long as your operating system has a program called
env) print out your environment variables. The
open command:
open my $PIPE_FROM_ENV, "-|", "env" or die $!;
means 'open a filehandle called
$PIPE_FROM_ENV, and attach it to the output of the
command env run from the command line'. You can then read
lines from the output of 'env' using the
<> as usual.
You can also pipe stuff into an external program like this:
open my $PIPE_TO_X, "|-", "some_program" or die $!;
print $PIPE_TO_X "Something that means something useful to some_program";
Note the or die $!:
it's always important to check the return value of functions that deal with the outside world,
like open, to make sure something funny isn't going on. Get
into the habit early: it's surprising how often the file that can't
possibly be missing actually is…
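Here is a pipe you can try even if you have no env. It is only a sketch: the external command is another copy of perl (found via the special variable $^X, which holds the path of the perl running the script), and the command is passed as a list, which I believe needs a reasonably recent perl if you are on Windows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $^X is the perl running this script, so the "external program"
# here is just another perl that prints two lines.
my @command = ( $^X, "-e", 'print "first\n"; print "second\n";' );

open my $PIPE, "-|", @command
    or die "Can't run @command: $!\n";

my @lines;
while ( <$PIPE> )
{
    chomp;
    push @lines, $_;
}

# For a pipe, close waits for the command to finish and returns FALSE
# if it exited unhappily; the exit code is in the high byte of $?
close $PIPE or warn "Command failed with exit code ", $? >> 8, "\n";

print "Got: @lines\n";
```

Checking the close as well as the open matters more for pipes than for plain files, because the open can succeed even when the command then goes on to fail.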
An even more common way of executing external programs is to use
system. system is useful for running external
programs that do something with some data that perl has just created, and
for running other external programs:
system "DIR";
Will run the program DIR from the shell, should it exist.
Given it doesn't exist on anything but Windows (please tell me no-one out
there still has a computer running nothing but MS-DOS), there's no point
in running it unless the OS is correct. Perl has the OS name (sort of) in
a punctuation variable. Try running:
print $^O;
MSWin32
to find out what perl thinks your OS is called.
system is a weird command: it generally returns FALSE
when it works. Hence:
#!/usr/bin/perl
use strict;
use warnings;
if ( $^O eq "MSWin32") { system "dir" or warn "Couldn't run dir $!\n" }
else { print "Not a Windows machine.\n" }
will give spurious warnings. Here we have used warn
instead of die: warn does largely the same
thing as die, but doesn't actually exit: it
just prints a warning. [As you may guess from my 'coding' the word
exit, if you want to kill a perl program happily (rather
than unhappily, with die), use exit.]
print "Message to STDOUT\n";
warn "Message to STDERR\n";
exit 0; # exits program gracefully with return code 0
die "Whinge to STDERR\n"; # exits program with an error message
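To see where those spurious warnings come from, it can help to look at the number system actually hands back. The sketch below assumes nothing about your OS beyond being able to run perl itself (via $^X); the helper name run_and_report is invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# run_and_report is a hypothetical helper. It runs a command (given as a
# list, so no shell gets involved) and returns the command's exit code,
# or -1 if the command could not be started at all.
sub run_and_report
{
    my @command = @_;
    my $status = system @command;
    return -1 if $status == -1;     # could not start: this is the case where $! is meaningful
    return $status >> 8;            # the exit code lives in the high byte of the status
}

my $good = run_and_report( $^X, "-e", "exit 0" );
my $bad  = run_and_report( $^X, "-e", "exit 3" );
print "good command: $good, bad command: $bad\n";   # prints good command: 0, bad command: 3
```

A command that works returns 0, which perl considers FALSE, and that is the whole source of the confusion. Writing system( ... ) == 0 or warn ... is one way of saying what you mean without the logic looking upside down.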
What you actually need for system is the utterly
bizarre:
system "dir" and warn "Couldn't run dir $!\n";
a (historically explicable, but still bizarre) wart that will be fixed
in Perl 6. By the way, perl actually
opens three filehandles when it starts up: STDIN,
STDOUT and STDERR. You've met the first two
already. STDERR is the filehandle warnings, dyings and other
whingings are printed to: it is also connected to the screen by default,
just like STDOUT, but is actually a different
filehandle:
warn "bugger";
and
print STDERR "bugger";
have largely the same effect. There's no reason why you can't close and re-open a filehandle, even one of the three default ones:
#!/usr/bin/perl
use strict;
use warnings;
close STDERR;
open STDERR, ">>errors.log";
warn "You won't see this on the screen, but you'll find it in the error log";
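If you'd like STDERR back afterwards, you can duplicate it before closing it. This is a sketch under a few assumptions: the log lives in a scratch directory made by File::Temp, and the handle names $SAVED_ERR and $LOG are my own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw( tempdir );

my $dir = tempdir( CLEANUP => 1 );      # scratch directory, removed at exit
my $log = "$dir/errors.log";            # hypothetical log file name

# Duplicate STDERR first, so that we can put it back afterwards
open my $SAVED_ERR, ">&", \*STDERR or die "Can't duplicate STDERR: $!\n";

close STDERR;
open STDERR, ">>", $log or die "Can't reopen STDERR to $log: $!\n";
warn "This goes to the log, not the screen\n";
close STDERR;

# Restore the original STDERR from the saved copy
open STDERR, ">&", $SAVED_ERR or die "Can't restore STDERR: $!\n";

open my $LOG, "<", $log or die "Can't open $log for reading: $!\n";
my @logged = <$LOG>;
close $LOG;
print "The log contains: @logged";
```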
You have now met two of Perl's logical operators, or and
and. Perl has several others, including not and
xor. It also has a set stolen from C that look like
line-noise: ||, && and !,
which also mean 'or', 'and' and 'not', but bind more tightly to their
operands. Hence:
open my $FILE, "<", "C:/file.txt" or die "oops";
will work fine, because the precedence of or
(and all the wordy logic operators) is very low, i.e.
perl thinks this means:
open( my $FILE, "<", "C:/file.txt" ) or die "oops";
because or has an even lower precedence than the
comma that separates the items of the list. However, perl thinks
that:
open my $FILE, "<", "C:/file.txt" || die "oops";
means
open my $FILE, "<", ( "C:/file.txt" || die "oops" );
because || has a much higher precedence than the
comma. Since "C:/file.txt" is TRUE (it's
defined, and not the number 0), perl will never see
'die "oops"'. The logical operators like
&&, or and || return
whatever they last evaluated, here C:/file.txt, so perl will
try and open this file, but if it doesn't exist, there is nothing
more to do and you will get no warning that something has gone
wrong. The upshot: don't use || when you should use
or, or make sure you put in the brackets yourself:
open( my $FILE, "<", "C:/file.txt" ) || die "oops";
Operator precedence is boring, but important. If you are worried, bung in parentheses to ensure it does what you mean. Generally perl DWIMs (particularly if you're a C programmer), but don't always count on it, especially if you're doing something complicatedly line-noisy.
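You can watch the two precedences at work without opening any files at all. In this sketch glue is a made-up sub standing in for open: all it does is join its arguments with a +, so you can see exactly which arguments it was given.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# glue is a made-up list operator: it joins its arguments with +
sub glue { return join "+", @_ }

# || binds tighter than the comma, so it only sees the last argument:
# this is glue( "a", ( "" || "b" ) )
my $tight = glue "a", "" || "b";

# or binds looser than the comma, so it sees the whole call:
# this is glue( "a", "" ) or "b"
my $loose = ( glue "a", "" or "b" );

print "with ||: $tight\n";   # prints with ||: a+b
print "with or: $loose\n";   # prints with or: a+
```

In the || version the "b" sneaks into the argument list, which is just what happens to die "oops" in the open example.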
One last way of executing things from the shell is to use `
` backticks. These work just like the quote operators, and will
happily interpolate variables (as will system "$blah @args"
for that matter), but they actually capture the output into a
variable:
my $output = `ls`;
print $output;
Like qq() and q() and qw(),
there is also a qx() (quote execute) operator, which is just
like backticks, only you choose your own quotes:
my @output = qx:ls:;
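The difference between assigning to a scalar and to an array is worth a quick look. This sketch assumes your shell has an echo command and understands && (both sh and cmd.exe should), and the variable names are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumes a shell with an echo command and && chaining
my $command = "echo one && echo two";

my $as_scalar = `$command`;      # scalar context: all the output as one string
my @as_list   = qx{$command};    # list context: one element per line of output

my $line_count = @as_list;
print "Scalar context gave ", length( $as_scalar ), " characters\n";
print "List context gave $line_count lines\n";
```

The lines in the array keep their trailing newlines, so chomp them if you want them bare.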
Handling directories is as simple as handling files:
#!/usr/bin/perl
use strict;
use warnings;
opendir my $DIR, ".";
while ( defined( $_ = readdir $DIR ) )
{
print "$_\n";
}
- The opendir command takes a directory handle, and a directory to open, which can be something absolute, like C:/winnt, or something relative, like . (the current working directory, CWD) or ../parp (the directory parp in the parent directory of the CWD).
- Rather than using the <> line reader, you must use the command readdir to read the contents of a directory. I've used the defined explicitly, as you never know what idiot is going to create a file or dir called 0 in the directory you're reading.
- When you get to the end of a directory listing using readdir, you will need to use rewinddir to get back to the beginning, should you need to read the contents in again.
- To change the current working directory, you use the command chdir.
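The points above can be sketched in one go. To make sure there is something to read, including a file awkwardly named 0, the example first builds a scratch directory with File::Temp; the file names in it are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Build a scratch directory with known contents (the names are arbitrary)
my $dir = tempdir( CLEANUP => 1 );
for my $name ( "0", "parp.txt" )
{
    open my $FH, ">", "$dir/$name" or die "Can't create $dir/$name: $!\n";
    close $FH or die "Can't close $dir/$name: $!\n";
}

opendir my $DIR, $dir or die "Can't opendir $dir: $!\n";

my @first_pass;
while ( defined( my $entry = readdir $DIR ) )   # defined, so the file called 0 is not skipped
{
    next if $entry eq "." or $entry eq "..";    # readdir hands these two back as well
    push @first_pass, $entry;
}

rewinddir $DIR;                                 # back to the start for a second read
my @second_pass = grep { $_ ne "." and $_ ne ".." } readdir $DIR;

closedir $DIR or die "Can't closedir $dir: $!\n";

print "First pass:  @{[ sort @first_pass ]}\n";
print "Second pass: @{[ sort @second_pass ]}\n";
```

Note that readdir in list context (as in the grep) returns everything that is left in one go, and that the order entries come back in is up to the filesystem, hence the sort.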
Here's a program that changes to a new directory, and spews out stuff
about the contents to a file called ls.txt in the new
directory.
#!/usr/bin/perl
use strict;
use warnings;
my $dir = shift @ARGV;
chdir $dir or die "Can't change to $dir: $!";
opendir my $DIR, "."
or die "Can't opendir $dir: $!\n"; # the new CWD, to which we changed
open my $OUTPUT, ">", "ls.txt" or die "Can't open ls.txt for writing: $!";
while ( defined ( $_ = readdir $DIR ) )
{
if ( -d $_ ) { print $OUTPUT "directory $_\n" }
elsif ( -f $_ ) { print $OUTPUT "file $_\n" }
}
close $OUTPUT or die "Can't close ls.txt: $!\n";
# pedants will want to use an 'or die' here
closedir $DIR or die "Can't closedir $dir: $!";
# perl will close things itself, but it doesn't hurt to be explicit
There are a few new things here. @ARGV you may recognise
from the symbol table programs. This
is another special perl variable, like $_ and
$a. It contains the arguments you passed to the program on
the command line. Hence to run this program you will need to type:
perl thing.pl d:/some/directory/or/other
@ARGV will contain a list of the single value
d:/some/directory/or/other, which you can get out using any
array operator of your choice. In fact, pop and
shift will automatically assume @ARGV in the
body of the program, so you could equally well write:
my $dir = shift;
and get the same effect. This should remind you of subroutines: the only
difference is that array operators default to @ARGV in the
body, and @_ in a sub. The V stands for 'vector' if you're
interested; it's a hangover from C.
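The two defaults side by side, as a sketch. Setting @ARGV by hand is only there so the example behaves the same however you run it, and first_argument is a made-up sub.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pretend the script was run as: perl thing.pl d:/some/directory extra
# (overriding @ARGV by hand like this is only for the sake of the demo)
@ARGV = ( "d:/some/directory", "extra" );

sub first_argument
{
    return shift;          # inside a sub, shift works on @_
}

my $dir       = shift;     # in the body of the program, shift works on @ARGV
my $from_sub  = first_argument( "sub argument" );
my $remaining = @ARGV;     # one argument is left after the shift

print "From \@ARGV: $dir\n";
print "From \@_: $from_sub\n";
print "Left in \@ARGV: $remaining\n";
```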
The rest of the program is self explanatory, except for the
-f and -d. Not too surprisingly, these are
'file test' operators. -f tests to see if a file is a file,
and -d tests to see if a file is a directory. So:
-f "C:/autoexec.bat"
will return TRUE, as will:
-d "C:/windows"
as long as they exist! Perl has a variety of other file test
operators, such as -T, which tests to see if a file is a
plain text file, -B, which tests for binary-ness, and
-M, which returns the age of a file in days at the time the
script started. The others can be found using perldoc.
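A few of the file tests strung together, as a sketch. The sub describe is hypothetical, -e (exists) and -z (zero size) are two more operators from the same family, and the scratch directory comes from File::Temp so the example has something safe to poke at.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# describe is a hypothetical helper that strings a few file tests together
sub describe
{
    my ( $path ) = @_;
    return "missing"    unless -e $path;             # -e: does it exist at all?
    return "directory"  if -d $path;
    return "empty file" if -f $path and -z $path;    # -z: zero size
    return "file"       if -f $path;
    return "something else";
}

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/parp.txt";
open my $FH, ">", $file or die "Can't create $file: $!\n";
print $FH "parp\n";
close $FH or die "Can't close $file: $!\n";

print "$dir: ", describe( $dir ), "\n";
print "$file: ", describe( $file ), "\n";
print "$dir/nothing: ", describe( "$dir/nothing" ), "\n";
```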
RTFPD: read the perldoc
perldoc is perl's own command line manual: if you
type:
perldoc -f sort
at the command prompt, perldoc will get all the
documentation for the perl function sort (the
-f is a switch for f(unction) documentation), and display it
for you. Likewise:
perldoc -f -X
will get you information on file test operators (generically called
'-X' functions). For really general stuff:
perldoc perl
will get you general information on perl itself, and:
perldoc MODULE_NAME
e.g.:
perldoc strict
will extract internal documentation from modules (including
pragma modules like strict) to tell you how to use
them. This internal documentation is written
in POD (plain old documentation) format, which we'll cover when we
get onto writing modules. Lastly:
perldoc -h
or amusingly:
perldoc perldoc
will tell you how to use perldoc itself, which contains
all the other information for its correct use I can't be bothered to
write out here.
Summary
Next up, regexes, but first a quick summary. Opening files looks like:
open my $FILEHANDLE, $RW, $file_to_open; # note the commas
If $RW is "<", the file will be opened
for reading; if ">", for writing; if
">>", for appending. If it is "-|", $file_to_open is
run as an external command and opened as a pipe from it, and if
"|-", as a pipe to it.
You should always check return values of open to make
sure the file exists, with or die $! or similar, which
prints to the STDERR filehandle, as does warn.
External commands can also be run with system (don't forget
the counterintuitive 'and die $!'), backticks, or the
qx() quotes. Read from files with the
<$FILEHANDLE> angle brackets, print to them with:
print $FILEHANDLE "parp"; # note the lack of comma
and close them with close.
Use opendir, readdir,
rewinddir, chdir and closedir to
investigate directories (with or die as appropriate), and
the file-test operators -X to investigate files and
directories. And if in doubt, use the perldoc.
Test yourself
See if you can write a script that does the following:
- Use perldoc to find the usage of the function
mkdir. Use this to create a directory called "environment" containing a file called "list.txt" containing a listing of the user's environment variables, gathered from the shell.
#!/usr/bin/perl
use strict;
use warnings;
mkdir "environment", 0777 or die "Can't mkdir 'environment': $!\n";
open my $FILE, ">", "environment/list.txt"
or die "Can't open 'list.txt' for writing: $!\n";
my $env = `env`;
print $FILE $env or die "Can't print to 'list.txt': $!\n";
# can't be too careful
close $FILE or die "Can't close file: $!\n";
