FAQ: Frequently Asked Questions about CGI Programming

From: Nick Kew <nick.kew@pobox.com>
Date: 23 Aug 1996 00:45:55 GMT
Newsgroups: comp.infosystems.www.authoring.cgi

Frequently Asked Questions on CGI programming

Table of Contents
=================

0.   Preamble
0.1. Changes
0.2. Notice and Disclaimer
0.3. Where to get this document
0.4. Omissions
0.5. Credits

1.   Basic Questions
1.1. What is CGI?
1.2. Is it a script or a program?
1.3. When do I need to use CGI?
1.4. Should I use CGI or JAVA?
1.5. Should I use CGI or SSI?
1.6. Should I use CGI or an API?
1.7. What do I absolutely need to know?
1.8. Does CGI create new security risks?
1.9. Do I need to be on Unix?
1.10. Do I have to use Perl?
1.11. Do I have to put it in cgi-bin?
1.12. Do I have to call it *.cgi?  *.pl?

2.   HTTP Headers and NPH Scripts
2.1. What is HTTP (HyperText Transfer Protocol)?
2.2. What HTTP request headers can I use?
2.3. What HTTP response headers do I need to know about?
2.4. What is NPH?
2.5. Must/should/can I write nph scripts?
2.6. Do I have to call it nph-*?

3.   Techniques: "How do I..."
3.1. Can I get information about who is visiting?
3.2. Can I get the email of visitors?
3.3. 	"But I saw some.kool.site display my email address..."
3.4. Can I get browser details and return different pages?
3.5. Can I trace where a user has come from/is going to?
3.6. Can I launch a long process and return a page before it's finished?
3.7. Can I launch a long process which the user interacts with?
3.8. Can I password-protect my pages?
3.9. Can I identify users/sessions without password protection?
3.10. Can I redirect users to another page?
3.11. Can I run a CGI script without returning a new page to the browser?
3.12. Can I write output to a different Netscape frame?
3.13. Can I write output to several frames at once?
3.14. Can I use a CGI script to generate both text and inline images?

4.   Applications: Is there an existing script to ...
4.1. Where to look for free scripts for my application?
4.2. Discussion group/bulletin board
4.3. CSCW/Groupware
4.4. Database

5.   Troubleshooting a CGI application

6.   Further Reading
6.1. Other FAQs/collections
6.2. Reference Pages

=============================================================

Section 0.   Preamble

----------------

Subject: 0.1 Changes

Last Modified: August 22nd 1996:
* General update from original draft.
* Added "production version" preamble.
* Chopped entries from Section 4 which were controversial or useless.
* Added Web Database references from a post by Matthew Healy.

----------------

Subject: 0.2 Notice and Disclaimer

Copyright 1996 Nick Kew.

You are free to copy or distribute this document in whole or in part
for any purpose and on any medium you choose, provided: 

      You DON'T do so for profit.
      You DO include this notice and disclaimer in full.

Disclaimer: This information is offered in good faith and in the hope
that it may be of use, but is not guaranteed to be correct, up to date
or suitable for any particular purpose.   The author accepts no liability
in respect of this information or its use.

----------------

Subject: 0.3 Where to get this document

This is a new document, and does not yet have a stable home.   By the time
I post, I will at least have made it available from my email autoresponder.
Send blank email to satfaq@pobox.com (autoresponder) for details.

----------------

Subject: 0.4 Omissions

Obviously I'm not trying to deal with every question: just some common ones.
Some major areas which are covered neither here nor in the other FAQs
referenced are:
   *	cgiwrap
   *	Anything about CGI programming on non-Unix platforms
I'm not going to attempt these: I have no knowledge of either of them.
An entry on cgiwrap would certainly be of value, if anyone cares to
contribute one.   The question of other platforms probably deserves
a separate FAQ in its own right.

----------------

Subject: 0.5 Credits

Thanks to Nathan Neulinger and Maurice L. Marvin for their very helpful
comments and criticisms on the original draft.

=============================================================

Section 1.   Basic Questions

----------------

Subject: 1.1 What is CGI?

[from the CGI reference http://hoohoo.ncsa.uiuc.edu/cgi/overview.html]

The Common Gateway Interface, or CGI, is a standard for external
gateway programs to interface with information servers such as HTTP servers.
A plain HTML document that the Web daemon retrieves is static,
which means it exists in a constant state: a text file that doesn't change.
A CGI program, on the other hand, is executed in real-time, so that it
can output dynamic information.

----------------

Subject: 1.2 Is it a script or a program?

The distinction is semantic.   Traditionally, compiled executables
(binaries) are called programs, and interpreted programs are usually
called scripts.   In the context of CGI, the distinction has become
even more blurred than before.   The words are often used interchangeably
(including in this document).   Current usage favours the word "scripts"
for CGI programs.

----------------

Subject: 1.3 When do I need to use CGI?

There are innumerable caveats to this answer, but basically any
Webpage containing a form will require a CGI script or program
to process the form inputs.

----------------

Subject: 1.4 Should I use CGI or JAVA?

[This answer to a non-question is an attempt to reduce the noise level of
the recurrent "CGI vs JAVA" threads.]

CGI and JAVA are fundamentally different, and for most applications
are NOT interchangeable.   Nor are the two mutually exclusive: you could
in principle write a CGI program in JAVA, although it is hard to
think of an instance where this would be the best choice.

CGI is a mechanism for running programs on a WWW server.
Typical applications include accessing a database, submitting
an order, or posting messages to a bulletin board.
JAVA enables programs to run on the Client machine, and is
suited to such tasks as detailed manipulation of an image.
Alternatives to JAVA may include the X windows client/server
protocol, use of browser plugins and helper applications, and
other clientside languages such as SafeTCL and perl/penguin.

In certain instances the two may be combined in a single application:
for example a JAVA applet to define a region of interest from a
geographical map, together with a CGI script to process a query
for the area defined.

----------------

Subject: 1.5 Should I use CGI or SSI?

CGI and SSI (Server-Side Includes) are often interchangeable, and the
choice may be no more than a matter of personal preference.   Here are
a few guidelines:
  1) CGI is a common standard agreed and supported by all major HTTPDs.
     SSI is NOT a common standard, but an innovation of NCSA's HTTPD
     which has been widely adopted in later servers.   CGI has the
     greatest portability, if this is an issue.
  2) If your requirement is sufficiently simple that it can be done
     by SSI without invoking an exec, then SSI will probably be
     more efficient.   A typical application would be to include
     sitewide 'house styles', such as toolbars, netscapeised <body>
     tags or embedded CSS stylesheets.
  3) For more complex applications - like processing a form -
     where you need to exec (run) a program in any case, CGI
     is usually the best choice.

----------------

Subject: 1.6 Should I use CGI or an API?

APIs are proprietary programming interfaces supported by particular
platforms.   By using an API, you lose all portability.   If you know
your application will only ever run on one platform (OS and HTTPD),
and it has a suitable API, go ahead and use it.   Otherwise stick to CGI.

----------------

Subject: 1.7 What do I absolutely need to know?

If you're already a programmer, CGI is extremely straightforward, and just
three resources should get you up to speed in the time it takes to read them:
  1) Installation notes for your HTTPD.   Is it configured to run CGI
     scripts, and if so how does it identify that a URL should be executed?
     (Check your manuals, READMEs, ISP webpages/FAQs, and if you still can't
     find it ask your server administrator).
  2) The CGI specification at NCSA tells you all you need to know
     to get your programs running as CGI applications.
     http://hoohoo.ncsa.uiuc.edu/cgi/interface.html
  3) WWW Security FAQ.   This is not required to 'get it working', but
     is essential reading if you want to KEEP it working!
     http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html

If you're NOT already a programmer, you'll have to learn.   If you would
find it hard to write, say, a 'grep' or 'cat' utility to run from the
commandline, then you will probably have a hard time with CGI.   Make
sure your programs work from the commandline BEFORE trying them with CGI,
so that at least one possible source of errors has been dealt with.
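
One way to follow this advice is to fake the CGI environment by hand.
The following is only a sketch, assuming a Bourne-compatible shell; the
function names run_as_get and run_as_post, and the script name in the
usage note, are made up for illustration and are not part of any standard.

```shell
#!/bin/sh
# Hypothetical helpers for exercising a CGI program from the commandline.

# Simulate a GET request: $1 = CGI program, $2 = URL-encoded query string.
run_as_get() {
    REQUEST_METHOD=GET QUERY_STRING="$2" "$1"
}

# Simulate a POST request: $1 = CGI program, $2 = URL-encoded body.
# CONTENT_LENGTH is the length of the body (assumes plain ASCII data).
run_as_post() {
    printf '%s' "$2" |
        REQUEST_METHOD=POST \
        CONTENT_TYPE=application/x-www-form-urlencoded \
        CONTENT_LENGTH=${#2} \
        "$1"
}
```

For example, run_as_get ./myscript.cgi 'name=fred' should show you the
same header and body that the HTTPD would receive from the script.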

----------------

Subject: 1.8 Does CGI create new security risks?

Yes.   Period.
There is a lot you can do to minimise these.   The most important thing
to do is read and understand Lincoln Stein's excellent WWW security
FAQ, at http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html.

----------------

Subject: 1.9 Do I need to be on Unix?

No, but it helps.   The Web, along with the Internet itself, C, Perl,
and almost every other Good Thing in the last 20 years of computing,
originated in Unix.   At the time of writing, this is still the
most mature and best-supported platform for Web applications.

----------------

Subject: 1.10 Do I have to use Perl?

No - you can use any programming language you please.   Perl is simply
today's most popular choice for CGI applications.   Some other widely-
used languages are C, TCL, BASIC and - for simple tasks - even shell scripts.

Reasons for choosing Perl include its powerful text manipulation
capabilities (in particular regular expressions) and the fantastic
WWW support modules available.

----------------

Subject: 1.11 Do I have to put it in cgi-bin?

See the answer to 1.12 below, which covers both questions.

----------------

Subject: 1.12 Do I have to call it *.cgi?  *.pl?

Maybe.   It depends on your server installation.

Directory names such as cgi-bin and filename suffixes such as .cgi or .pl
are commonly used conventions - no more.
It is up to the server administrator whether or not CGI scripts are
enabled, and (if so) what conventions tell the server to run or
to print them.

If you are running your own server, read the manual.
If you're on ISP or other rented webspace, check their webpages for
information or FAQs.   As a last resort, ask the server administrator.

=============================================================

Section 2.   HTTP Headers and NPH Scripts

----------------

Subject: 2.1 What is HTTP (HyperText Transfer Protocol)?

HTTP is the protocol of the Web, by which Servers and Clients (typically
browsers) communicate.  An HTTP transaction comprises a Request sent by
the Client to the Server, and a Response returned from the Server to
the Client.
Every HTTP request and response includes a message header, describing
the message.   These are processed by the HTTPD, and may often be
mostly ignored by CGI applications (but see below).
A message body may also be included.   For requests:
  1) A HEAD or GET request sends only a header.   Any form data is encoded
     in the request URL after a "?", and is made available to the CGI
     program in the environment variable QUERY_STRING.
  2) A POST request sends both header and body.   The body typically
     comprises data entered by a user in a form, and is passed to the
     CGI program on its standard input.
And for responses:
  3) A HEAD request does not expect a body in the response.
  4) A GET or POST request will accept a response with or without a body,
     according to the header.   The body of a response is typically an
     HTML document.
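
The difference between the request types shows up in where a CGI program
finds its input.   A minimal sketch, assuming a Bourne shell and the
standard CGI variables REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH;
the function name read_form_data is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the raw (still URL-encoded) form data.
read_form_data() {
    case "${REQUEST_METHOD:-GET}" in
        POST)
            # The body is CONTENT_LENGTH bytes on standard input.
            dd bs=1 count="${CONTENT_LENGTH:-0}" 2>/dev/null
            ;;
        *)
            # GET and HEAD: the data arrived after the "?" in the URL.
            printf '%s' "${QUERY_STRING:-}"
            ;;
    esac
}

echo "Content-type: text/plain"
echo
echo "Raw form data: $(read_form_data)"
```

The data printed is still URL-encoded; decoding it into name/value
pairs is a separate step.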

----------------

Subject: 2.2 What HTTP request headers can I use?

Most HTTP request headers are passed to the CGI script as environment
variables.   Some are guaranteed by the CGI spec.   Others are server,
browser and/or application dependent.

To see what _your_ browser and server are telling each other, just use
a trivial little CGI script to print out the environment.   In Unix:
	#!/bin/sh
	echo "Content-type: text/plain"
	echo
	set

This enables you to see at-a-glance what useful server variables are set.
Note that dumping the environment like this within a more complex
script can be a useful debugging technique.

For details, see the CGI Environment Variables specification at
http://hoohoo.ncsa.uiuc.edu/cgi/env.html
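
As a rule of thumb, the CGI spec turns a request header into an
environment variable by upper-casing its name, changing "-" to "_" and
adding an HTTP_ prefix (Content-type and Content-length are exceptions:
they arrive as CONTENT_TYPE and CONTENT_LENGTH).   A small sketch of
that mapping, with a made-up function name:

```shell
#!/bin/sh
# Hypothetical helper: print the CGI variable name for a request header.
header_to_env() {
    printf 'HTTP_%s\n' "$1" |
        tr 'abcdefghijklmnopqrstuvwxyz-' 'ABCDEFGHIJKLMNOPQRSTUVWXYZ_'
}

# List the variable names to look for in the environment dump.
for h in User-Agent Referer Accept; do
    echo "$h -> $(header_to_env "$h")"
done
```

Whether a given variable is actually set still depends on your browser
and server, so check the dump rather than assuming.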

----------------

Subject: 2.3 What HTTP response headers do I need to know about?

Unless you are using NPH, the HTTPD will insert necessary response
headers on your behalf, always provided it is configured to do so.

However, it is conventional for servers to insert the Content-type header
based on a page's filename, and for CGI scripts it will often be absent or
wrong.  Hence the usual advice is to print an explicit Content-type header.

Some other headers you may wish to use explicitly are:
Location	(to redirect the user to another URI, which may or may
		not be on your own server)
Set-cookie	(Netscape/Nonstandard) Set a cookie
Refresh		(Netscape/Nonstandard) Clientpull

You can also use general MIME headers: eg "Keywords" for the benefit of
indexers (although in this instance some major search robots have
regrettably introduced a new protocol to do the same thing).

The 'official' list of HTTP response headers is at
http://www.w3.org/pub/WWW/Protocols/HTTP/Object_Headers.html
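
A parsed-header response is therefore just the header lines, a blank
line, and then the body.   A minimal sketch, assuming a Bourne shell;
print_html_page is a made-up name, and the title and text are assumed
to be safe to embed in HTML as they stand:

```shell
#!/bin/sh
# Hypothetical helper: send a complete parsed-header CGI response.
# $1 = page title, $2 = body text.
print_html_page() {
    echo "Content-type: text/html"
    echo                                   # blank line ends the header
    printf '%s\n' "<html><head><title>$1</title></head>"
    printf '%s\n' "<body><p>$2</p></body></html>"
}

print_html_page "Hello" "Generated at $(date)"
```

Forgetting the blank line is a classic mistake: the server then treats
the start of your page as a malformed header.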

----------------

Subject: 2.4 What is NPH?

NPH = Non-Parsed Headers.   The script undertakes to print the entire
HTTP response including all necessary header fields.   The HTTPD
is thereby instructed not to parse the headers (as it would normally do)
and add any which are missing.

----------------

Subject: 2.5 Must/should/can I write nph scripts?

Generally, no.   It is usually better to save yourself hassle by letting
the HTTPD produce the headers for you.

If you are going to use NPH, be sure to read and understand the HTTP spec at
http://www.w3.org/pub/WWW/Protocols/HTTP/1.1/draft-ietf-http-v11-spec-01.html

Your headers should be complete and accurate, because you're instructing
the HTTPD not to correct them or insert what's missing.

The most common reason for using NPH is to set an HTTP response status
that cannot be inferred from the rest of your header (eg 204 No Content).
Servers that honour the "Status:" header defined in the CGI specification
let an ordinary (parsed-header) script do the same.   See:
http://www.w3.org/pub/WWW/Protocols/HTTP/HTRESP.html

----------------

Subject: 2.6 Do I have to call it nph-*?

According to NCSA's reference pages, this is the standard for telling
the server that your script is NPH, so this should be a fully portable
convention.

=============================================================

Section 3.   Techniques: "How do I..."

----------------

Subject: 3.1 Can I get information about who is visiting?

You can get some limited information from the environment variables
which the server sets from the browser's request.   Relatively few of
these are guaranteed to be available, and some may be misleading.
For particular types of information, see below.   For full details, see
NCSA's reference pages.

----------------

Subject: 3.2 Can I get the email of visitors?

Why do you want to do this?

The best information available is the REMOTE_ADDR and REMOTE_HOST,
which tell you nothing about the user.   Techniques such as "finger@"
are not reliable, are widely disliked, and generally serve only to
introduce long delays in your CGI.   Better - as well as more polite -
just to ask your users to fill in a form.

----------------

Subject: 3.3 	"But I saw some.kool.site display my email address..."

Some sites will play party tricks, which can get *some* users' email
addresses.   Possible tell-tale signs of this are inordinate delays
loading a page (fingering @REMOTE_HOST - doesn't often work but
probably can't be detected from the webpage), or a submit button that
appears to do nothing at all (a mailto: link - works quite well but
trivially detectable).   As a "snoop" party trick that's fine, but
if you find someone abusing these facilities (eg they send you
junkmail), alert their service provider!

----------------

Subject: 3.4 Can I get browser details and return different pages?

Why do you want to do this?

Well-written HTML will display correctly in any browser, so the correct
answer to this question is to design a template for your output in good
HTML, and make sure your output is correct.

If you insist on a different answer, you can use the HTTP_USER_AGENT
environment variable.  This requires care, and can lead to unexpected
results.   For example, checking for "Mozilla" and serving a frameset
to it ensures that you *also* serve the frameset to early (Non-Frame)
Netscapes, me-too browsers (notably Microsoft) and others who have
chosen to lie to you about their browser.
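
If you do go down this road, keep a safe default for agents you don't
recognise.   A sketch only, assuming a Bourne shell; pick_page and the
three file names are made up, and the patterns are examples rather than
a reliable way to identify any browser:

```shell
#!/bin/sh
# Hypothetical example: choose a page variant from the User-Agent string.
# $1 = value of HTTP_USER_AGENT; prints the (made-up) file name to serve.
pick_page() {
    case "$1" in
        Lynx*)    echo "text.html" ;;
        Mozilla*) echo "frames.html" ;;  # also matches many non-Netscape agents
        *)        echo "plain.html" ;;   # safe default for unknown agents
    esac
}

echo "Content-type: text/plain"
echo
echo "Would serve: $(pick_page "${HTTP_USER_AGENT:-}")"
```

Note that the "Mozilla*" branch has exactly the weakness described
above: it cannot tell a frame-capable Netscape from anything else that
uses the same name.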

Note also that not every User Agent is a browser.   Your page may be
read by a user agent you've never heard of, and then displayed by
100 different browsers.   Or retrieved by different browsers from
a cache.   Another reason to write good HTML, and not try to
devise a clever or koool substitute.

----------------

Subject: 3.5 Can I trace where a user has come from/is going to?

HTTP_REFERER might or might not tell you anything.   By all means
use it to collect partial statistics if you participate in (say)
an advertising banner scheme.   But it is not always set, and may
be meaningless (eg if a user has accessed your page from a bookmark).
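
Collecting such partial statistics can be as simple as appending a line
to a log on each request.   A minimal sketch, assuming a Bourne shell;
log_referer and the log file path are made-up names, and the file must
be writable by whatever user the HTTPD runs your script as:

```shell
#!/bin/sh
# Hypothetical helper: append one line per request to a referer log.
# $1 = log file.
log_referer() {
    # HTTP_REFERER is often unset, so record "-" in that case.
    printf '%s\t%s\n' "${HTTP_REFERER:--}" "${SCRIPT_NAME:--}" >> "$1"
}

# In a real script:   log_referer /path/to/referer.log
```

A quick summary is then available with something like
cut -f1 referer.log | sort | uniq -c | sort -rn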

You cannot trace outgoing links at all.   If you really must try,
point all the external links to your HTTPD and use its redirection
facility (which gives you generally-reliable logs).   This is much
more efficient than using a CGI script.

BTW: don't even think about asking Javascript to send you information
on some event: it's a violation of privacy which Netscape fixed as
soon as complaints about its abuse started coming in.

----------------

Subject: 3.6 Can I launch a long process and return a page before it's finished?

[UNIX]
You have to fork/spawn the long-running process, eg from Perl or C via
system() with a trailing "&" (a plain exec would replace your script, so
the page would never be printed).
The important thing to remember is to close all its file descriptors;
otherwise nothing will be returned to the browser until it's finished.
The standard trick to accomplish this is redirection to/from /dev/null:

        system ("long_process < /dev/null > /dev/null 2>&1 &");
        # ... then print the HTML page as usual

----------------

Subject: 3.7 Can I launch a long process which the user interacts with?

This does not fit well with the basic mechanics of the Web, in which
each transaction comprises a single request and response.
If your processing can be done on the Client machine, you can use
a clientside application; for example a Java applet.

For processing on the server, one trick that works well for Clients
running an X server (and far, far more efficient than a JAVA solution) is:
  if ( fork() ) {
    print HTML page explaining what's going on and advising about xhost
  } else {
    exec ("xterm -display THEIR_DISPLAY -title MY_APP -e MY_PROG ARGS
        < /dev/null > /dev/null 2>&1 &") ;
  }
NOTE: THEIR_DISPLAY is not necessarily the same as REMOTE_HOST or REMOTE_ADDR.
You have to ask users to supply their display (set REMOTE_HOST as default).

----------------

Subject: 3.8 Can I password-protect my pages?

Yes.   Use your HTTPD's authentication, just as you would for a basic
HTML page.   Alternatively, you can DIY by printing a "401" header
if the server hasn't passed you an (acceptable) REMOTE_USER.

----------------

Subject: 3.9 Can I identify users/sessions without password protection?

The most usual (but browser-dependent) way to do this is to set a cookie.
If you do this, you are accepting that not all users will have a 'session'.

An alternative is to pass a session ID in every GET URL, and in hidden
fields of POST requests.   This can be a big overhead unless _every_ page
requires CGI in any case.

Another alternative is the Hyper-G solution of encoding a session-id in
the URLs of pages returned:
	http://hyper-g.server/session_id/real/path/to/page
This has the drawback of making the URLs very confusing, and causes any
bookmarked pages to generate old session_ids.

Note that a session ID based solely on REMOTE_HOST (or REMOTE_ADDR)
will NOT work, as multiple users may access your pages concurrently
from the same machine.
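
The cookie approach might be sketched as follows.   This assumes a
Bourne shell, a browser that returns cookies (seen by the script as
HTTP_COOKIE) and a system with /dev/urandom; the function names and the
cookie name "session" are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helpers for cookie-based sessions.

# Print the value of cookie $1 from HTTP_COOKIE, if present.
get_cookie() {
    printf '%s\n' "${HTTP_COOKIE:-}" | tr ';' '\n' |
        sed -n "s/^ *$1=//p" | sed -n 1p
}

# Print a random 16-hex-digit ID (assumes /dev/urandom exists).
new_session_id() {
    od -A n -t x1 -N 8 /dev/urandom | tr -dc '0123456789abcdef'
}

sid=$(get_cookie session)
if [ -z "$sid" ]; then
    # First visit (or a browser without cookies): start a new session.
    sid=$(new_session_id)
    echo "Set-cookie: session=$sid; path=/"
fi
echo "Content-type: text/plain"
echo
echo "Session: $sid"
```

A browser that ignores cookies will simply get a fresh ID on every
request, which is the "not all users will have a session" case.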

----------------

Subject: 3.10 Can I redirect users to another page?

For permanent and simple redirection, use the HTTPD configuration file:
it's much more efficient than doing it yourself.   Some servers enable
you to do this using a file in your own directory (eg Apache) whereas
others use a single configuration file (eg CERN).

For more complicated cases (eg process form inputs and conditionally
redirect the user), use the "Location:" response header.
If the target of the redirection is itself a CGI script, it is easy to
pass parameters to it URL-encoded in a GET request, but don't forget to
escape the URL!
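
Escaping and redirecting might look like this.   It is a sketch only,
assuming a Bourne shell and plain ASCII parameter values; urlencode and
redirect_to are made-up names, and www.example.com stands in for the
real target:

```shell
#!/bin/sh
# Hypothetical helpers: escape a parameter and redirect to another script.

# Percent-encode $1 for use in a query string (assumes plain ASCII input).
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}              # everything after the first character
        c=${s%"$rest"}           # the first character itself
        case "$c" in
            [A-Za-z0-9._~-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}

# Print a Location header and the blank line that ends the header.
redirect_to() {
    printf 'Location: %s\n\n' "$1"
}

redirect_to "http://www.example.com/cgi-bin/search?q=$(urlencode 'fish & chips')"
```

Without the escaping, the "&" in the value would be read by the target
script as the start of a second parameter.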

----------------

Subject: 3.11 Can I run a CGI script without returning a new page to the browser?

Yes, but think carefully first:  How are your readers going to know
that their "submit" has succeeded?   They may hit 'submit' many times!

The usual way is an NPH script printing the 204 status code:

	#!/bin/sh
	# do processing (or launch it as background job)
	echo "HTTP/1.0 204 No Content"
	echo
(if it's not NPH, the HTTPD will normally insert a "200" on your behalf).

One instance where this is used is for inactive areas of an imagemap.

----------------

Subject: 3.12 Can I write output to a different Netscape frame?

Yep.   The fact you're using CGI makes no difference: use
"target=" in your links as usual.   Alternatively, the script
can print a "Window-target:" header.   Read Netscape's pages
for detail: these answer all the questions about things like
"getting rid of" or "breaking out of" frames, too.

----------------

Subject: 3.13 Can I write output to several frames at once?

A single CGI script can only ever print to one frame.

However, this limitation may be overcome by using more than one script.
The first script (the URL of the "submit" button) prints a frameset,
typically to a "_parent" or "_top" target.   The sources for one or
more of the frames thus generated may also be CGI scripts, to which
you can easily pass parameters (eg encoded in URLs with method GET).
This hack is definitely not recommended.   If you find yourself wanting
to update several frames from a single user event, it probably means
you should review the design of your application at a higher level.

Warnings:
 1. Don't forget to escape your URLs.
 2. This technique results in your server being hit by multiple 
    concurrent CGI requests.   You'll need LOTS of memory, especially
    if you use a memory-hog like Perl.   It can be a good recipe
    for bringing a server to its knees.

Javascript is often a valid alternative here, but note just how silly
it can (and often does) look in a different browser.

----------------

Subject: 3.14 Can I use a CGI script to generate both text and inline images?

Not directly.   One script generates one response to one request.

If you want to generate a dynamic page including dynamic images
(say, a report including graphs, all of which depend on user input)
then your primary script will print the usual
   <img src="[script-to-generate-image]" alt="[what you asked for]">
and, just as in the multiple frames case, you can pass data to the
image-generating program encoded in a GET URL.   Of course, the same
caveats apply: see above.
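
The primary script might be sketched as below, assuming a Bourne shell.
The function name, the image script graph.cgi and its "from" and "to"
parameters are all made up, and the two values are assumed to be plain
numbers that have already been validated:

```shell
#!/bin/sh
# Hypothetical primary script: prints a page whose graph is drawn by a
# second CGI program, which receives its data in a GET URL.
# $1, $2 = numeric range chosen by the user.
print_report() {
    echo "Content-type: text/html"
    echo
    echo "<html><body><h1>Report for $1 to $2</h1>"
    # "&" must be written "&amp;" inside an HTML attribute.
    echo "<img src=\"/cgi-bin/graph.cgi?from=$1&amp;to=$2\" alt=\"Graph for $1 to $2\">"
    echo "</body></html>"
}

print_report 1990 1996
```

The image script then runs as a separate request, reads its parameters
from QUERY_STRING, and prints an image Content-type followed by the
image data.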

=============================================================

Section 4.   Applications: Is there an existing script to ...

[note: if you ask one of these questions on a newsgroup, you may
get a lot of people pushing their own system at you].

There are a lot of applications available.   For all the tasks
listed here, there are free systems you can download and install
yourself (at least if you're on UNIX).   Many are excellent.

Before ever *buying* software, do a Net search on what you want and
check what freeware is available.   Does the commercial system you
had in mind *really* have any advantages?   If you can't follow
the jargon they use to explain the merits of their system, insist
on some clarification (hey, that's not just for Web software :-)

Most questions under this heading are probably best answered by
reference to appropriate review sites on the Web (in many cases,
Thomas Boutell's WWW FAQ).   In cases where I know of one or more
good sites, I've referenced them.
I'll add others as and when I find them.   Or maybe not: these questions
are mostly well-covered in the other FAQs.

----------------

Subject: 4.1 Where to look for free scripts for my application?

Some popular places to look for a wide range of free CGI applications are:

Selena Sol's Public Domain CGI Scripts
http://www2.eff.org/~erict/Scripts/scripts.html

Matt Wright's Script Archive
http://www.worldwidemart.com/scripts/

Dale Bewley has a much longer list of script archives
(along with his own scripts) at
http://www.engr.iupui.edu/~dbewley/perl/

----------------

Subject: 4.2 Discussion group/bulletin board

David R Woolley maintains a list (currently around 100 systems) at
http://freenet.msp.mn.us/~drwool/webconf.html

----------------

Subject: 4.3 CSCW/Groupware

There are several overview sites for this.   A few are:

The CSCW Yellow Pages, at
http://www11.informatik.tu-muenchen.de/cscw/yp/YP-index-type.html

NCSA Web Collaboration pages, at
http://union.ncsa.uiuc.edu/HyperNews/get/www/collaboration.html

----------------

Subject: 4.4 Database

This subject deserves its own FAQ.   When someone recently asked about one,
Matthew.Healy@yale.edu (Matthew D. Healy) posted this answer (slightly chopped):

> : Is there a CGI and Database FAQ available?
> : If so, could someone tell me where can I get it?
> 
> Dunno about a FAQ on that.  I can recommend a couple of published
> works, however:
> 
> 1. I wrote a chapter about CGI/Database work for the book
> {Special Edition Using CGI}.  Fulltext is online at the
> publisher's WWW site:
> 
> http://www.mcp.com/que/et/se_cgi/  The book
> http://www.mcp.com/que/et/se_cgi/Cgi13fi.htm  My chapter on WWW/DBMS
> 
> 2. Jeff Rowe wrote an excellent book, {Building Internet Database
> Servers With CGI}.  URL for more info:
> 
> http://cscsun1.larc.nasa.gov/~beowulf/db/existing_products.html
> 
> Jeff's WWW site has scads of useful information on WWW/DBMS programming,
> and pointers to lots more sites.

=============================================================

Section 5.   Troubleshooting a CGI application

Sorry, not covered here.

This is covered in several other documents:

Tom Christiansen's "Idiot's guide to solving Perl/CGI problems" is
a slightly tongue-in-cheek list of common problems, and how to track
them down.   Much of what Tom covers is not specifically Perl, but
applies equally to CGI programming in other languages.

Marc Hedlund's CGI FAQ and Thomas Boutell's WWW FAQ also deal with
this subject.

See "Further Reading" below (if you don't already know where to find
these documents).

=============================================================

Section 6.   Further Reading

----------------

Subject: 6.1 Other FAQs/collections

****	Lincoln Stein's FAQ is probably the most	****
****	important WWW document you will ever read.	****

For general WWW issues, the World Wide Web FAQ by Thomas Boutell
http://www.boutell.com/faq/

Another CGI FAQ, by Marc Hedlund
http://www.best.com/~hedlund/cgi-faq/

Perl/CGI programming FAQ, by Shishir Gundavaram and Tom Christiansen
http://www.perl.com/perl/faq/perl-cgi-faq.html

The Idiot's Guide to solving Perl/CGI problems by Tom Christiansen
http://www.perl.com/perl/faq/idiots-guide.html

The WWW Security FAQ by Lincoln Stein
http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html

The WWW Virtual Library
http://WWW.Stars.com/Vlib/

----------------

Subject: 6.2 Reference Pages

The Common Gateway Interface (CGI)
http://hoohoo.ncsa.uiuc.edu/cgi/interface.html

HyperText Transfer Protocol (HTTP)
http://www.w3.org/pub/WWW/Protocols/HTTP/1.1/draft-ietf-http-v11-spec-01.html

HyperText Markup Language (HTML)
http://www.w3.org/pub/WWW/MarkUp/