Previous Section | Table of Contents | Next Section

Running a guestbook

As the final installment of this tutorial, we'll create a guestbook script that lets visitors to your site sign in and leave a pithy comment for others to read.

This section of the tutorial covers:

The guestbook script
Taint mode
Guestbook preliminaries
Regular expressions
Untainting with backreferences
File locking
Array variables
Guestbook file permissions
Turning the guestbook upside-down
Preventing duplicate guestbook entries

The guestbook script

Here's the script:

#!/usr/bin/perl -Tw

# guestbook.cgi - a simple guestbook script

# This program is copyright 1999 by John Callender.

# This program is free and open software. You may use, copy, modify,
# distribute and sell this program (and any modified variants) in any way
# you wish, provided you do not restrict others from doing the same.

$data_file = '/w1/l/lies/begperl/data/guestbook.txt';

$max_entries = 50; # how many guestbook entries to save?
                   # set to '0' (zero) for infinite entries...

use CGI;
use Fcntl;
$query = new CGI;

unless ($action = $query->param('action')) {
    $action = 'none';
}

print <<"EndOfText";
Content-type: text/html

<HTML>
<HEAD>
<TITLE>My guestbook</TITLE>
</HEAD>
<BODY>
<H1>My guestbook</H1>

<P>Here's my guestbook. You can <A HREF="#form">add
your own comment</A> using the form at the bottom
of the page.</P>
<HR>
EndOfText

if ($action eq 'Add comment') {

    # process the form submission
    # and assemble the guestbook entry

    $name = $query->param('name');
    $city = $query->param('city');
    $state = $query->param('state');
    $country = $query->param('country');
    $comment = $query->param('comment');

    # clean up and fiddle with $name

    unless ($name) {
        $name = 'Anonymous';
    }
    if (length($name) > 50) {
        $name = 'Someone with a really long name';
    }

    # disable all HTML tags
    $name =~ s/</&lt;/g;

    # untaint variable
    unless ($name =~ /^([^<]*)$/) {
        die "couldn't untaint name: $name\n";
    }
    $name = $1;

    # clean up and fiddle with $from_where

    $from_where = "$city, $state, $country";
    $from_where =~ s/, , /, /;        # remove duplicate ', '
    $from_where =~ s/^, //;           # remove initial ', '
    $from_where =~ s/, $//;           # remove final ', '
    if ($from_where =~ /^[,\s]+$/) {
        # nothing but commas and whitespace
        $from_where = 'parts unknown';
    }

    if (length($from_where) > 75) {
        $from_where = 'somewhere with a really long name';
    }

    # disable HTML tags
    $from_where =~ s/</&lt;/g;

    # untaint variable
    unless ($from_where =~ /^([^<]*)$/) {
        die "couldn't untaint from_where: $from_where\n";
    }
    $from_where = $1;

    # clean up and fiddle with $comment

    if (length($comment) > 32768) {
        $comment = '...more than I feel like posting in my guestbook.';
    }
    unless ($comment) {
        $comment = '...nothing to speak of.';
    }

    # fix line-endings
    $comment =~ s/\r\n?/\n/g;

    # lose HTML tags
    $comment =~ s/</&lt;/g;

    # untaint variable
    unless ($comment =~ /^([^<]*)$/) {
        die "couldn't untaint comment: $comment\n";
    }
    $comment = $1;

    # assemble finished guestbook entry

    $entry = <<"EndOfText";
<P><STRONG>$name</STRONG> <EM>from $from_where wrote:</EM><BR>
<BLOCKQUOTE>$comment</BLOCKQUOTE></P>
<HR>
EndOfText

    # open non-destructively, read old entries, write out new

    sysopen(ENTRIES, "$data_file", O_RDWR)
                             or die "can't open $data_file: $!";
    flock(ENTRIES, 2)        or die "can't LOCK_EX $data_file: $!";
    while(<ENTRIES>) {
        $all_entries .= $_;
    }
    $all_entries .= $entry;

    if ($max_entries) {

        # lop the head off the guestbook, if necessary

        @all_entries = split(/<HR>/i, $all_entries);
        $entry_count = @all_entries - 1;

        while ($entry_count > $max_entries) {
            shift @all_entries;
            $entry_count = @all_entries - 1;
        }

        $all_entries = join('<HR>', @all_entries);

    }

    # now write out to $data_file

    seek(ENTRIES, 0, 0)        or die "can't rewind $data_file: $!";
    truncate(ENTRIES, 0)       or die "can't truncate $data_file: $!";
    print ENTRIES $all_entries or die "can't print to $data_file: $!";
    close(ENTRIES)             or die "can't close $data_file: $!";

}

# display the guestbook

open (IN, "$data_file") or die "Can't open $data_file for reading: $!";
flock(IN, 1)            or die "Can't get LOCK_SH on $data_file: $!";
while (<IN>) {
    print;
}
close IN                or die "Can't close $data_file: $!";

# display the form

print <<"EndOfText";
<A NAME="form"><H2>Add a comment to the guestbook (no HTML):</H2></A>

<FORM METHOD="POST" ACTION="guestbook.cgi">
<TABLE>

<TR>
<TD ALIGN="right"><STRONG>Name:</STRONG></TD>
<TD><INPUT NAME="name" SIZE=30></TD>
</TR>

<TR>
<TD ALIGN="right"><STRONG>City:</STRONG></TD>
<TD><INPUT NAME="city" SIZE=30></TD>
</TR>

<TR>
<TD ALIGN="right"><STRONG>State:</STRONG></TD>
<TD><INPUT NAME="state" SIZE=30></TD>
</TR>

<TR>
<TD ALIGN="right"><STRONG>Country:</STRONG></TD>
<TD><INPUT NAME="country" SIZE=30></TD>
</TR>

<TR>
<TD ALIGN="right"><STRONG>Comment:</STRONG></TD>
<TD>
<TEXTAREA NAME="comment" ROWS=5 COLS=30 WRAP="virtual"></TEXTAREA>
</TD>
</TR>

<TR><TD COLSPAN=2> </TD></TR>
<TR>
<TD> </TD>
<TD><INPUT TYPE="submit" NAME="action" VALUE="Add comment"></TD>
</TR>
</TABLE>

</FORM>
</BODY>
</HTML>
EndOfText

Return to the top of the page

Taint mode

The first thing you'll notice about this script is the -T shebang-line switch we've added:

#!/usr/bin/perl -Tw

This switch (which is lumped together with the -w warnings switch we covered earlier) is a very useful tool for making your CGI scripts more secure. The -T switch turns on Perl's built-in "taint mode," which has the following effects:

any data obtained by your script from the outside world (meaning anywhere other than the script itself) is considered "tainted"
any variable that is modified in an expression that contains tainted data becomes tainted itself
tainted data may not be used by your script to take certain kinds of actions that would affect the outside world (writing to a file, for example)

In essence, taint mode makes your script paranoid. It doesn't trust anything anybody tells it, and won't take certain kinds of actions based on that information. Instead, it will die with an error message if it catches you trying to do something it considers unsafe. The hoops you have to jump through to overcome this tainting process are a bit of a pain, but they are in fact a Really Good Thing, since they force you to pay attention to the things bad guys could do with your script by feeding it bogus data.

Basically, in order for your scripts to run successfully under taint mode, you must "untaint" each and every piece of data that comes from an outside source. (In some cases, though not in this script, you must do some other things, too, like explicitly setting the $ENV{'PATH'} variable.)

Perl trivia: The motto of the Perl community is "There's more than one way to do it" (written TMTOWTDI, and supposedly pronounced "tim toady", but I tend not to hang out with people geeky enough to actually say things like that). Anyway, untainting data is something of an exception to the TMTOWTDI principle: There is, in fact, only one (1) way to untaint your data (at least, there's only one way I'm going to teach you in this tutorial): by using a regular expression match, which you're going to read about in just a minute.

Return to the top of the page

Guestbook preliminaries

Here's a guided tour of what's going on at the top of the guestbook script (after the copyright notice and terms-of-use statement). If you've been working through the tutorial in order, all this should be reasonably familiar:

$data_file = '/w1/l/lies/begperl/data/guestbook.txt';

This is the location where we're going to save guestbook entries between invocations of the script. Choose your own path and filename and specify them here. More about permissions for this file and directory later on.

$max_entries = 50; # how many guestbook entries to save?
                   # set to '0' (zero) for infinite entries...

To prevent someone from using up all your disk space, set this $max_entries variable to a number representing the number of entries you're going to save in the guestbook. Once that number is reached, the oldest entry will be deleted every time a new entry is added. Set $max_entries to zero (0) in order to disable this feature, and keep infinite entries (but don't say I didn't warn you).

More Perl trivia: Notice, by the way, that I didn't bother to use single- or doublequotes to delimit the 50 on the right side of the assignment. That's because I'm assigning a number, rather than a text string, so I don't need to worry about quoting.

use CGI;
use Fcntl;
$query = new CGI;

Here I'm pulling in the CGI module, as well as a module called Fcntl that we'll be using later for some opening and locking of files. I'm also initializing my CGI object with the "$query = new CGI" line (which you don't need to understand; just do it).

unless ($action = $query->param('action')) {
    $action = 'none';
}

The $action variable gets set to whatever is in the 'action' parameter. If someone clicks the submit button in the guestbook form (which you'll be seeing in a minute), then $action ends up containing 'Add comment', because that's the value attribute I use for the submit button in the form. Otherwise (that is, if they invoke the script, but not by clicking on the submit button in the form), $action will be empty. In that case, I assign 'none' to it to avoid a warning that the -w shebang-line switch will otherwise cause the script to spit out when I try to use the $action variable later on.

print <<"EndOfText";
Content-type: text/html

<HTML>
<HEAD>
<TITLE>My guestbook</TITLE>
</HEAD>
<BODY>
<H1>My guestbook</H1>

<P>Here's my guestbook. You can <A HREF="#form">add
your own comment</A> using the form at the bottom
of the page.</P>
<HR>
EndOfText

Here I'm printing out the top of my HTML page. Notice that I've gotten lazy with the "Content-type: text/html\n\n" header, and have just thrown it in at the beginning of my multi-line here-document. This works fine; you just have to be careful, as always, that nothing is output before the header, and that it is delivered just like it was in the single-line version, with a completely blank line following it. Putting "\n\n" in a double-quoted string does the same thing as hitting the Enter key twice in a here-document.

if ($action eq 'Add comment') {

    # process the form submission
    # and assemble the guestbook entry

This part of the script features a long if statement that runs only if the script was invoked by someone submitting the guestbook form. If the script is invoked without the form having been submitted, this whole section gets skipped.

What this section does, as the comment indicates, is to format and assemble the newly submitted guestbook entry.

    $name = $query->param('name');
    $city = $query->param('city');
    $state = $query->param('state');
    $country = $query->param('country');
    $comment = $query->param('comment');

Here we're just grabbing the submitted data out of our form. Note: All of this data, and hence all of the variables that we just assigned, are tainted. We won't be able to write this stuff out to our guestbook file until we've untainted them.

    # clean up and fiddle with $name

    unless ($name) {
        $name = 'Anonymous';
    }
    if (length($name) > 50) {
        $name = 'Someone with a really long name';
    }

The first part here checks to make sure that a name parameter was actually submitted (using an unless statement, which works exactly like an if statement, only with the meaning reversed, if you get my meaning). The second part checks to make sure the submitted name was not too long for convenient formatting on the guestbook page.

This is a good example, by the way, of the kind of suspicious thinking you need to cultivate when you write CGI scripts. There's nothing to prevent someone from feeding your script the complete text of War and Peace in that name field; the length function (which evaluates to the number of characters in whatever you feed to it) will let you catch and correct such hooliganism.

Return to the top of the page

Regular expressions

Regular expressions represent a deep and powerful magic at the heart of Perl. As such, they are responsible for much of the language's majesty and mystery. Approach them with awe, trepidation, and a pure heart, and they will reward you richly. Disregard them at your peril!

Heh. Just kidding. All I'm really trying to say is that for a beginning programmer, regular expressions are going to seem weird and obtuse and needlessly complicated. But stick with them, because they're important.

In a nutshell, regular expressions are a tool for matching and replacing text patterns. If you've used the "Find" or "Search and Replace" function in a word processor, you have some idea of what regular expressions do, but Perl's regular expressions are much more powerful than that. Their rich (that is to say, confusingly complex) syntax allows you to specify with astonishing precision exactly what patterns you are looking for and what you want done to them.

A full explanation of regular expressions is beyond the scope of this tutorial. All I'm going to do here is give some examples and explain a bit of what's going on.

    # disable all HTML tags
    $name =~ s/</&lt;/g;

So, here's our first regular expression. The =~ operator is used to tie the variable containing the text we want to work on ($name) to the expression that's going to do the work ( s/</</g ).

This happens to be the "substitution" flavor of regular expression, which takes the form:

s/expression to match/expression to replace it with/optional modifiers

In this case, looking carefully for the / characters that divide up the different parts of the expression, we can see the following:

s/</&lt;/g
  ^ ^^^^ ^
  |  |   |
  |  |    ----------- optional character that modifies the expression
  |  |
  |   ------ the expression to replace it with
  |
   -- the expression to match

The expression we're trying to match is '<', the less-than symbol, and the expression we're going to replace it with is '<', which HTML gurus will recognize as the entity for encoding a less-than symbol. That is, '<' is what you put in an HTML document in order to display the literal symbol '<'. You can't use the '<' character to do so, of course, because it would be interpreted as the start of a tag.

Why are we doing this? Because I've decided not to let users of my guestbook insert their own HTML code into my page. There are several reasons I chose to do this:

It's prettier this way, because my script can format the entries consistently.
It's easier this way, because disabling all markup is simpler than checking for user-supplied markup and passing only certain types through.
It's safer this way, because it prevents malicious users from doing things like using server-side includes to steal the Web server's password file, or using Javascript to do nasty things to visitors' browsers, or inlining pictures of Barney from some other Web site.

Oh, I almost forgot: the optional character at the end of the regular expression.

Putting 'g' after your regular expression makes the expression "global," in the sense that after it finds one match (or performs one search-and-replace operation, as in this case), it continues looking for more matches. The default behavior, which you get if you don't supply the trailing 'g', is to just do one match and then quit.

So, enough analysis. What's the bottom line?

The bottom line is that after the $name =~ s/</</g; line in the script has done its work, any HTML markup that was submitted in the form's 'name' field will have been disabled in the $name variable. For example, if the user submits the name "John <STRONG>Wild Man</STRONG> Callender", the line above will convert it into "John <STRONG>Wild Man</STRONG> Callender".

Return to the top of the page

Untainting with backreferences

    # untaint variable
    unless ($name =~ /^([^<]*)$/) {
        die "couldn't untaint name: $name\n";
    }
    $name = $1;

So, let's untaint some data. The way you untaint is to use a feature of regular expresssions called "backreferences". When you put parentheses around a part of your regular expression, the text that matches that part of the expression is remembered, and stored in the special variable $1. (If you have several pairs of parentheses inside your regular expression, the pair that begins furthest to the left stores its contents in $1, the next in $2, and so on.)

The way this ties in with tainting is that the contents of $1 are not considered tainted, even if the variable that the regular expression was matching against was tainted. Thus, regular expression backreferences let you "launder" your tainted data.

In the section of the guestbook script given above, we're using a regular expression match (a conventional match this time, not a substitution-type expression) to untaint the $name variable. Let's pick apart the regular expression so you know what's going on.

Matching regular expressions (as opposed to substitution regular expressions), take the following basic form:

/expression to match/optional modifiers

So, in this case, we're matching the $name variable's contents against the following expression:

^([^<]*)$

Here's an explanation of what each part of the expression is doing:

`^`	The initial caret symbol ^ serves as a "beginning-of-the-string anchor." It doesn't match anything itself, but it means that whatever follows can only match at the very beginning of the string being matched against.
`( )`	The parentheses don't have any effect at all on the matching operation, other than to capture whatever matches inside of them for later backreferencing via the $1 variable.
`[^<]`	The square brackets [ ] allow us to create what is called a "character class." The stuff inside the brackets is a list of characters, any of which can match. Except in this case, there's an initial caret symbol ^ inside the square brackets, and inside the square brackets of a character class an initial caret doesn't serve as a beginning-of-the-string anchor, but instead is a "negation" symbol, meaning that what follows is not a list of characters to be matched, but a list of characters not to be matched. That is, `[^<]` means "match any character whatsoever, except for the < character." It's a bit like the rules of the card game "Fizzbin"; remember that Star Trek episode?
`*`	The asterisk means that whatever character comes before it (or whatever character class comes before it, as in this example), can match any number of times, including zero times. In regular expressions we call the * symbol a "quantifier." Don't fall into the trap of thinking that the * symbol works like it does when you use it as a wildcard in DOS or Windows filenames, by the way. It's similar, but not quite the same. When you are wildcarding filenames, the * stands for any character or any string of characters (including no characters at all). In regular expressions, the * doesn't mean anything at all by itself; it only modifies whatever character or character class came immediately before it, telling you how many times it may be repeated (specifically, any number of times from zero to infinity). Because regular expression quantifiers are "greedy," they will try to match as much as they can when they see that asterisk. (Actually, it's a little more complicated than that, but the effect is usually to match as much as possible.)
`$`	Finally, the trailing dollar sign is another anchor, in this case an "end-of-string anchor." It doesn't match anything itself, but says that whatever comes immediately before it can only match successfully if it occurs at the very end of the string.

So, putting all those pieces together, what the ^([^<]*)$ regular expression says is, "try to do a match in which you start at the very beginning of the string, match a whole bunch of characters that are anything except '<', and end up at the end of the string" - and while you're at it, save whatever gets matched in $1 for later backreferencing. Or, to put it another way, this expression says, "Match the whole string, but only if the string has no < characters in it. If it has any < characters, don't match anything."

Whew. That's a whole lot of meaning for something that looks like a chunk of line-noise, or a comic-book curse word. See what I meant about regular expressions?

We use the unless statement to have the guestbook script die if this expression doesn't match, but we can be reasonably sure it will match, since we previously used the substitution expression to replace all the < characters with <. Then we just assign the backreferenced $1 to the $name variable, and voila, we've laundered the variable, and Perl's tainting mechanism no longer cares what we do with it.

    # clean up and fiddle with $from_where

    $from_where = "$city, $state, $country";
    $from_where =~ s/, , /, /;        # remove duplicate ', '
    $from_where =~ s/^, //;           # remove initial ', '
    $from_where =~ s/, $//;           # remove final ', '
    if ($from_where =~ /^[,\s]+$/) {
        # nothing but commas and whitespace
        $from_where = 'parts unknown';
    }
    if (length($from_where) > 75) {
        $from_where = 'somewhere with a really long name';
    }

    # disable HTML tags
    $from_where =~ s/</&lt;/g;

    # untaint variable
    unless ($from_where =~ /^([^<]*)$/) {
        die "couldn't untaint from_where: $from_where\n";
    }
    $from_where = $1;

Now we process the $city, $state, and $country variables in a similar fashion, sticking them all into a properly punctuated $from_where variable, replacing them with "parts unknown" if it turns out the user didn't supply any of them, and "somewhere with a really long name" if $from_where ends up being more than 75 characters long.

The only part of this that's different from the $name processing we did previously is this line:

    if ($from_where =~ /^[,\s]+$/) {

This regular expression is testing to see if our $from_where variable contains nothing but commas and "whitespace," which is how programmers refer to any character that produces space on the screen (tabs and spaces, basically). In Perl's regular expressions, you represent whitespace with the \s construction - that's a backslash (above the Enter key on your keyboard) followed by an 's'. Putting that and a comma inside square brackets makes a character class (as you no doubt remember), and putting a plus sign after the character class means that one of the characters from the class must match at least once, though they can match as many times as they want to (and with Perl's regular expression greediness, they will want to as many times as they can). Thus, /^[,\s]+$/ will match the entire string, but only if the string contains nothing but whitespace and commas (and it needs to have at least one; an empty string won't match, because we used a plus sign instead of an asterisk for our quantifier).

The rest of this chunk of code is just like the previous chunk: replacing all the < symbols with entities, and untainting the variable with a regular expression backreference.

    # clean up and fiddle with $comment

    if (length($comment) > 32768) {
        $comment = '...more than I feel like posting in my guestbook.';
    }
    unless ($comment) {
        $comment = '...nothing to speak of.';
    }

    # fix line-endings
    $comment =~ s/\r\n?/\n/g;

    # lose HTML tags
    $comment =~ s/</&lt;/g;

    # untaint variable
    unless ($comment =~ /^([^<]*)$/) {
        die "couldn't untaint comment: $comment\n";
    }
    $comment = $1;

The next section should look pretty familiar to you by now; we're cleaning up and formatting the user-supplied "comment" field. The only part that you haven't seen before is this one:

    # fix line-endings
    $comment =~ s/\r\n?/\n/g;

What's going on here is this: Some visitors to your site will enter their comments in the comments field with carriage-return/linefeed pairs (\r\n) at the end of each line, rather than just linefeed (\n) characters. This is one of those PC-vs-Unix incompatibilities; it happens to be the same one that makes your scripts break if you write them on your PC and then FTP them to the Unix server in binary, rather than text, mode.

If the person posting a comment is using a Mac, their lines will probably end with a solo carriage return (\r), with no linefeed at all.

The question mark (?) in the regular expression above makes the preceding item optional; that is, it is a quantifier that says the preceding item can match either 0 or 1 times. (Being greedy, it will always try to match the preceding item if it can.) Thus, the substitution expression in this case says, "take any carriage return characters, optionally followed by linefeed characters, and turn them into solo linefeed characters."

This doesn't make any real difference to a person viewing your guestbook, because Web browsers are smart enough to turn all these various line-ending sequences into a single space character. But I decided to go ahead and use the above substitution expression to make all the guestbook line endings conform to the Unix line-ending sequence, because if you want to go in and edit the guestbook file manually on the Unix server (say, to delete a comment you don't like, or add a response to a user's question), those various line-ending sequences will make things look ugly in your text editor.

    # assemble finished guestbook entry

    $entry = <<"EndOfText";
<P><STRONG>$name</STRONG> <EM>from $from_where wrote:</EM><BR>
<BLOCKQUOTE>$comment</BLOCKQUOTE></P>
<HR>
EndOfText

Nothing tricky going on here. We're just taking the formatted guestbook entry and putting it into the $entry variable. Notice, by the way, how I'm using here-document quoting to delimit what gets assigned to the variable. Here-documents aren't just for print statements; you can use them anywhere you would use a quote-delimited string.

Return to the top of the page

File locking

Now that we've got our guestbook entry looking how we want it, we need to combine it with the earlier entries, and save the whole kit and kaboodle in a file on the server.

Which raises an interesting issue. Depending on how popular your site is, it's entirely possible that two people will try to view the guestbook, or worse, add new entries to it, at the same time. This could create a problem; if two separate copies of your guestbook script are both writing data to the same guestbook file at the same time, you can end up with some really weird-looking guestbook entries.

We get around this problem by using something called "file locking." This is a way of marking a file as "yours," at least temporarily. Assuming the other programs that want to access your file are written to respect file locking, they will wait patiently for you to finish what you're doing before they do whatever it is they want to do.

This is a pretty simple concept; anyone who can figure out how to get three preschoolers to play happily with a single Tickle-Me Elmo doll should have no trouble with it.

    # open non-destructively, read old entries, write out new

    sysopen(ENTRIES, "$data_file", O_RDWR)
                             or die "can't open $data_file: $!";

Here we're opening a filehandle to the guestbook data file (the path and filename of which we defined at the top of the script). This is going to let us read from and write to the file. We're using the "sysopen" command, rather than the more commonly seen "open" command, because sysopen lets us open the filehandle "nondestructively," meaning we can open the filehandle, then check to make sure we've got the lock on the file before making changes to it.

    flock(ENTRIES, 2)        or die "can't LOCK_EX $data_file: $!";

Here's the file-locking part. We "flock" the filehandle we just opened, and specify '2' as the kind of lock. Typically you're going to flock as 1 (which gives you a "shared" lock on the file), or as 2 (which gives you an "exclusive" lock). Any number of scripts can have a shared lock at the same time; you use this when you're just going to be reading data from the file. An exclusive lock (which is the kind we got in this case) is what you want when you're going to be modifying the file. In order to get an exclusive lock, your script will have to wait until all other locks (shared or exclusive) on the file have been released before it gets the lock and continues with whatever comes next.

    while(<ENTRIES>) {
        $all_entries .= $_;
    }

This while (<ENTRIES>) thing lets us define a loop, with the script going through the loop once for each line in the file. Each time it goes through the loop, the current line being read from the file is automatically placed in a special Perl variable called "$_". All the loop does is to append (concatenate, for you polysyllabic types) the current line to the $all_entries variable. When we reach the end of the file the while ( ) thingy returns a false value, and we exit from the loop and continue with whatever comes next.

    $all_entries .= $entry;

What comes next is that we take the $entry variable (which contains the new guestbook entry just submitted) and append it to the $all_entries variable.

    if ($max_entries) {

        # lop the head off the guestbook, if necessary

Remember at the top of the script when we defined the $max_entries variable? Here's where we test whether it was set to a nonzero value. If it was, we need to check to see if we need to chop off some of the oldest guestbook entries.

Return to the top of the page

Array variables

        @all_entries = split(/<HR>/i, $all_entries);

Here's your first exposure to the second of Perl's three main variable types: the array variable. Indicated here by the @ sign in front of the variable name "@all_entries", an array variable doesn't hold a single thingy, like a scalar variable; instead, it holds a whole list of thingies. You access the thingies in the array by position; that is, you say to Perl, "Give me the 3rd thingy in the array," and Perl gives you the third thingy. (Actually, Perl programmers typically say "element" instead of "thingy," and they actually use an "offset" into the array, rather than an ordinal number like first, second, etc., such that they would say "$array[0]" to mean "the first element in the array," "$array[1]" to mean "the second element in the array," and so on. But it's the same thing, basically.)

We don't have to worry about accessing individual array elements here, though, because we're dealing with the whole array at once, assigning all the elements in one fell swoop via Perl's "split" function, which converts a scalar (the $all_entries scalar variable, in this case) into a list of elements by splitting the scalar everywhere that a particular regular expression matches.

In this case, the regular expression we're using to split the guestbook entries is /<HR>/, which means we're going to split the guestbook everywhere an <HR> tag appears. (The trailing 'i' modifier makes the match case-insensitive, such that <hr> would match, too.) Since our script formats the guestbook entries with an <HR> tag between each one, @all_entries ends up holding a list of all the guestbook entries. The <HR>'s get thrown away by the split operation, but that's okay, because we're going to put them back in in a second.

        $entry_count = @all_entries - 1;

Here's a bit of Perl trickiness: We need to see how many guestbook entries there are currently (so we know if we need to delete one to make room for the new one). In Perl, you can find out how many elements are in an array by sticking the array into what Perl programmers call a "scalar context." By assigning the array (or an expression containing the array, as in this case) to a scalar variable, you force Perl to evaluate the array as the number of elements it contains.

I threw in the " - 1" thing, because when I first tested this script I realized that by splitting on the <HR> tag, I actually ended up creating an empty element at the end of the @all_entries array (because there's an <HR> after the last guestbook entry). I need to subtract 1 from the number of elements in @all_entries so the math in my next step works the way I originally intended.

The technical term for this kind of ugly, after-the-fact retrofitting of a program is a "kludge." Remember that the next time you're playing Jeopardy and the category "Hacker Slang" comes up.

        while ($entry_count > $max_entries) {
            shift @all_entries;
            $entry_count = @all_entries - 1;
        }

Here we're using a while loop to test whether the number of guestbook entries is greater than the maximum we defined at the top of the script. If it is, we use the shift function, which removes an element from the top (or front, depending on how you visualize things) of the array, shortening the array by one element. In this case, we're throwing away the oldest guestbook entry. Then we reset the $entry_count variable for the next time through the loop.

The while loop will continue cycling through the loop until its test expression evaluates to false, which in this case will happen as soon as the number of entries is no longer greater than the maximum allowed. At that point we exit from the loop and go on with whatever comes next in the script.

        $all_entries = join('<HR>', @all_entries);

    }

In this case, what comes next is the re-assembly of the list of guestbook entries into a scalar. We do this using Perl's join function, which is the polar opposite, so to speak, of the split function: It lets you specify some text that will be used to join the elements of a list into a scalar. In this case, we join up the guestbook entries with <HR>, and we're back where we started (except we've made sure that we only have $max_entries entries).

The closing curly bracket ( } ) ends the block that was conditional on the test for whether $max_entries had been set to a non-zero value.

    # now write out to $data_file

    seek(ENTRIES, 0, 0)        or die "can't rewind $data_file: $!";
    truncate(ENTRIES, 0)       or die "can't truncate $data_file: $!";
    print ENTRIES $all_entries or die "can't print to $data_file: $!";
    close(ENTRIES)             or die "can't close $data_file: $!";

}

Now we do a little razzle dazzle with the ENTRIES filehandle: "seeking" to the beginning of the file, "truncating" the file (basically, deleting everything in it), printing the $all_entries variable to it, and closing the filehandle (which is the step at which the file actually gets written out to disk). The close statement also releases the exclusive lock we had on the file, so if other copies of the guestbook script have been waiting to get at it, now they can.

The closing curly brace ( } ) ends the block that began way back up near the top of the script, where we tested whether the "action" parameter had been set to "Add comment". That is, we're now done with the part of the script that runs only for cases in which someone invoked the script by submitting a new guestbook entry.

# display the guestbook

open (IN, "$data_file") or die "Can't open $data_file for reading: $!";
flock(IN, 1)            or die "Can't get LOCK_SH on $data_file: $!";
while (<IN>) {
    print;
}
close IN                or die "Can't close $data_file: $!";

This part opens $data_file for reading, using Perl's open function. We flock the filehandle, but this time flock it as '1', meaning we are requesting a shared, rather than an exclusive, lock. Then we just read a line at a time from the file with the while function, which you will recall sticks each line from the file into the special $_ variable. Each time through the loop we just use the print command, which by default prints out whatever is in the $_ variable. In essence, this just takes the whole contents of the file whose name is stored in $data_file and prints it out in the browser of whomever invoked our guestbook script.

Once the current contents of the guestbook have been printed out, the script prints out the form visitors can use to add their own guestbook entry:

# display the form

print <<"EndOfText";
<A NAME="form"><H2>Add a comment to the guestbook (no HTML):</H2></A>

<FORM METHOD="POST" ACTION="guestbook.cgi">
<TABLE>

<TR>
<TD ALIGN="right"><STRONG>Name:</STRONG></TD>
<TD><INPUT NAME="name" SIZE=30></TD>
</TR>

<TR>
<TD ALIGN="right"><STRONG>City:</STRONG></TD>
<TD><INPUT NAME="city" SIZE=30></TD>
</TR>

<TR>
<TD ALIGN="right"><STRONG>State:</STRONG></TD>
<TD><INPUT NAME="state" SIZE=30></TD>
</TR>

<TR>
<TD ALIGN="right"><STRONG>Country:</STRONG></TD>
<TD><INPUT NAME="country" SIZE=30></TD>
</TR>

<TR>
<TD ALIGN="right"><STRONG>Comment:</STRONG></TD>
<TD>
<TEXTAREA NAME="comment" ROWS=5 COLS=30 WRAP="virtual"></TEXTAREA>
</TD>
</TR>

<TR><TD COLSPAN=2> </TD></TR>
<TR>
<TD> </TD>
<TD><INPUT TYPE="submit" NAME="action" VALUE="Add comment"></TD>
</TR>
</TABLE>

</FORM>
</BODY>
</HTML>
EndOfText

Return to the top of the page

Guestbook file permissions

Any time you want a CGI script to write to a file on the Web server, you have some security-related issues to consider. Because the CGI script typically runs as "nobody" or "www" or some other unprivileged user, it can't normally make changes to a file owned by you. There are at least three alternatives for solving this problem:

1) Have the guestbook file be owned by the same user as the Web server process: "nobody" or "www" or whatever it is in the case of your server.

You could ask your system administrator to do this for you, or you could make the directory in which the data file is going to be created world writeable (using chmod to set the permissions to 777), then run a special CGI script to create the guestbook file. After that you should reset the directory's permissions to something more reasonable, like 755, which will still allow the guestbook script to add entries to the data file, but will prevent everyone else in the world from creating new files in the directory.

One problem with this approach is that it makes it harder for you to edit the guestbook file manually. You might want to add a reply to one of the guestbook entries, for example, or delete a specific entry. If the guestbook file is owned by the same account as the Web server, and isn't world-writeable, you won't be able to make changes to it easily.

2) Have the guestbook CGI script run in "setuid" mode.

By chmod'ing your script to permissions mode 4755, rather than just 755, you turn on setuid mode, which means the script will run as if its owner (you) were running it, even if it is started by some other user (like the Web server; i.e., user "nobody"). The benefit of this is that the script will be able to access the guestbook file even if it is owned and writeable only by you.

Here's a screen capture of what this would look like. Note the "s" in the permissions part of the ls -l listing; this is what tells you the script has been made setuid. That setuid setting probably will turn itself off automatically whenever you modify the script, by the way, which means you'll have an additional step to perform (chmodding it back to 4755) every time you make any sort of change to it.

catlow:/w1/l/lies/begperl> chmod 4755 guestbook.cgi
catlow:/w1/l/lies/begperl> ls -l guestbook.cgi
-rwsr-xr-x   1 lies     lies         4724 Oct 10 14:01 guestbook.cgi

Now you should create the directory that's going to hold the data file, chmod the directory to 755, cd to the directory, and create a blank guestbook file with the Unix "touch" command. After the guestbook file is created, you can chmod it to 644 (actually, you could chmod it to 600 if you are really paranoid), and you should be in business.

catlow:/w1/l/lies/begperl> mkdir data
catlow:/w1/l/lies/begperl> chmod 755 data
catlow:/w1/l/lies/begperl> cd data
catlow:/w1/l/lies/begperl/data> touch guestbook.txt
catlow:/w1/l/lies/begperl/data> chmod 644 guestbook.txt
catlow:/w1/l/lies/begperl/data> ls -l
total 0
-rw-r--r--   1 lies     lies            0 Oct 10 14:09 guestbook.txt

The downside to the setuid approach is that if someone succeeds in subverting your script, they will be able to do anything with it that you would have permission to do on the Web server, e.g., delete or modify all your Web pages, trash your home directory, and so on. A subverted script running as "nobody" probably would not be able to do as much damage. Another downside to the setuid approach is that your ISP may simply not allow it. On the other hand, some ISPs actually require it, using "script wrappers" like CGIWrap.

For my money, the setuid approach is the best one of the three, assuming you can use it.

3) Make the guestbook file itself world-writeable.

This is pretty much the lowest-common-denominator approach. By making the guestbook file itself world-writeable, you allow the Web server (that is to say, the guestbook CGI script) to write changes to it.

Start by creating the directory where the data file will be stored, and chmodding it to mode 755.

catlow:/w1/l/lies/begperl> mkdir data
catlow:/w1/l/lies/begperl> chmod 755 data

Then cd to the directory, and create a new, blank file to hold your guestbook entries by using the Unix "touch" command:

catlow:/w1/l/lies/begperl/data> cd data
catlow:/w1/l/lies/begperl/data> touch guestbook.txt
catlow:/w1/l/lies/begperl/data> ls -l
total 0
-rw-r--r--   1 lies     lies            0 Oct 10 14:09 guestbook.txt

Now chmod the guestbook.txt file to mode 666 (world writeable).

catlow:/w1/l/lies/begperl/data> chmod 666 guestbook.txt
catlow:/w1/l/lies/begperl/data> ls -l guestbook.txt
-rw-rw-rw-   1 lies     lies          0 Oct 10 14:10 guestbook.txt

The main benefit of this approach is that it's easy. The downside is that anyone else with access to the Web server's filesystem (that is to say, anyone else who has an account on the Web server, or who subverts your or someone else's CGI script) will be able to modify your guestbook file however they like: deleting it, adding bogus entries, sticking in arbitrary Javascript or server-side includes...

It's not a pretty picture, which is why, on balance, I prefer the setuid approach outlined above. But whichever of the three approaches you've decided on, go ahead and set everything up and enjoy your new guestbook!

Return to the top of the page

Turning the guestbook upside-down

Since I first wrote this, a number of people have asked how to turn the guestbook upside-down. That is, they want the newest entries to appear at the top, rather than the bottom. Here's an email in which I explain one way you might do that. Knock yourselves out!

Return to the top of the page

Preventing duplicate guestbook entries

And now, here's another requested guestbook enhancement (or at least, some pointers on how to implement such a thing yourself): A way to prevent duplicate entries from appearing in the guestbook. Good luck!

Return to the top of the page

Previous Section | Table of Contents | Next Section

John Callender
jbc@jbcsystems.com
http://www.lies.com/jbc/