CSCI A348/548
Lecture Notes Eight

Fall 2000


Review. Definition of CGI. GET and POST. ReadParse, a feedback form, and more Perl.

I started with a review. A variable in Perl is written like this:

$x
The name is x; the dollar sign indicates that it is a scalar (it has no dimension). A variable is just a location that is accessible by name. Not all data structures are that simple.

You can have lists, sequences of locations, indexed by their position in the sequence. If the name of the list is x I can refer to the entire list as follows:

@x
The list could be empty or could have one or more elements in it. Let's say $i is a variable that stores an integer, then
$x[$i]
means the element with index $i in the list. Remember that the first element in the list has index 0 (zero) while the last element in the list @x can be accessed as $x[$#x].

We discussed assignment statements. The symbol for assignment is

=
and it splits the assignment into two parts.

On the right hand side of the assignment we have expressions and values. On the left hand side we have locations. So in

$i = $i + 1;
the variable $i is used for its value on the right, and for its location (or address) on the left. (The result, of course, is that $i is incremented by one.)

It's the same with elements of lists, since they're also variables. Their names are a bit more complex, since they are constructed from the name of the list and the index in the list, and we need to use the brackets, but other than that they're names just as much as the names we're used to (the so-called identifiers).

So

$x[$#x] = $x[$#x] + 1; 
will increment the value of the last element in the list @x by one.
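
Here is a minimal sketch that puts the list notation together. The sample values are made up for illustration, and the variable names other than @x are my own.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my @x = (10, 20, 30);

my $first      = $x[0];       # the first element has index 0
my $last_index = $#x;         # index of the last element
my $count      = scalar @x;   # number of elements in the list

# The element is read on the right of = and written on the left.
$x[$#x] = $x[$#x] + 1;
my $last = $x[$#x];

print "first=$first last_index=$last_index count=$count last=$last\n";
```

Note that $#x is always one less than the number of elements, since indexing starts at zero.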

Hash tables (or hashes, or associative arrays) are just like lists, except that the indices are not numbers: strings of characters are used to index the stored values.

The indices must be unique and they are called keys.

To refer to a hash table as a whole we use

%x
and to get the individual elements we index using a $key.

If $key contains a string and %x is a hash table, then anything associated with the value of $key in %x can be retrieved or designated with

$x{$key}
while if there is no association we obtain either an undefined value (when the expression appears on the right of the = assignment operator) or a place to store a value for this key (when it appears on the left).
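
A short sketch of the two cases, assuming an empty hash to begin with. The helper variables ($before, $was_defined and so on) are only there to record what happens.

```perl
use strict;
use warnings;

my %x;                 # empty hash table
my $key = "jordan";

# On the right of =, a missing key yields the undefined value;
# merely reading it does not create the key.
my $before      = $x{$key};
my $was_defined = defined($before) ? 1 : 0;
my $existed     = exists($x{$key}) ? 1 : 0;

# On the left of =, the same expression names a place to store into.
$x{$key} = "bulls";
my $exists_now = exists($x{$key}) ? 1 : 0;

print "defined before: $was_defined, existed before: $existed, exists now: $exists_now\n";
```

The built-in exists tells you whether a key is present at all, which is not the same as asking whether its value is defined.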

Here's an example. Assume %x is empty to start with. Then

$x{"jordan"} = "bulls"; 
builds a first association.

$x{"miller"} = "pacers"; 
builds another, while
$x{"jordan"} = "hornets"; 
will change the value previously associated with jordan.

In general you can obtain the list of all keys in a hash table this way:

@theKeys = keys %x;
where %x is the hashtable.

Then you can use a foreach to go over all of them, for whatever processing purposes you may have in mind:

foreach $e (@theKeys) {
  $x{$e} = $x{$e} . " (nba)"; 
} 
The code above will add (nba) to each one of the values stored in the hashtable (since the . (dot) operator is used for concatenation in Perl).

So if you print $x{"miller"} now it would read pacers (nba). That was the first part of the Perl review we needed.
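
One thing worth knowing when you go over a hash this way: keys returns the keys in no particular order. The sketch below repeats the loop on the same sample data but sorts the keys, so that the printed output is stable from run to run.

```perl
use strict;
use warnings;

my %x = (jordan => "hornets", miller => "pacers");

# keys gives no guaranteed order, so sort for predictable output.
foreach my $e (sort keys %x) {
    $x{$e} = $x{$e} . " (nba)";   # . (dot) concatenates strings
}

my @lines = map { "$_ --> $x{$_}" } sort keys %x;
print "$_\n" for @lines;
```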

We then said that we had been able to write a script (called hello) which we placed in cgi-bin and whose output was the same as what we got when we accessed the hello.html file on the web.

hello.html was in htdocs.

hello was in your script directory.

The difference between them was that the script was entirely responsible for the output and so it had to start it with its MIME type:

"Content-type: text/html\n\n"
was the first thing that the script was supposed to write. Note the two newline characters: an empty line is required after the MIME type.

We took the script and changed the output a little, to make it display an image.

Then we wondered whether we could make it display something new every time. So we introduced a bit of randomness, and the output now varies from one call to the next: most of the time, though not always, it changes.

To implement the change in output we created a list of names of images. Then every time the script is called a random number that represents an index in the list of names of images will be produced and the image with that index will appear in the output.
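
A minimal sketch of that idea follows. The image file names and the assumption that the images sit at the top of htdocs are mine, made up for illustration; use whatever images you actually have.

```perl
use strict;
use warnings;

# Hypothetical image names; the real list is up to you.
my @images = ("one.gif", "two.gif", "three.gif");

# rand(@images) returns a fractional number from 0 up to (but not
# including) the number of images; int truncates it to a valid index.
my $index = int(rand(@images));
my $image = $images[$index];

print "Content-type: text/html\n\n";
print qq{<html><body><img src="/$image"></body></html>\n};
```

Since the index is drawn independently each time, the same image can come up twice in a row, which is why the output changes only most of the time.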

That's an improvement: the output changes, but not in a predictable way. Is there any way to make the user participate, and maybe choose the output? (That, in fact, is one of the problems for assignment 2.) Can the user talk to the script, instead of just starting it?

We said the answer was "yes" and to explain that we introduced a short script by the name of printenv. Each one of our servers had this script in their cgi-bin directories after installation. It looked like this:

#!/usr/bin/perl

print "Content-type: text/html\n\n<html><body><pre>"; 

foreach $elem (keys %ENV) {
  print $elem, " --> ", $ENV{$elem}, "\n"; 
} 

print "</pre></body></html>"; 
The hash %ENV is built by the system: the browser, the server, and the host operating system all contribute to it, and the information is passed to the script. One of the keys in this hash table is called QUERY_STRING. If we put a ? (question mark) after the name of the program (when we invoke its URL), the string that follows, up to the first blank space, will be placed in
$ENV{"QUERY_STRING"}
We also noted that there was an entry in %ENV for REQUEST_METHOD. The value associated was GET.

Next we talked a bit about forms, and we even created a very simple one, that looked like this:

<form method="GET" action="/cgi-bin/printenv"> 
<input type=text name=fieldOne> <p> 
<input type=text name=fieldTwo> <p> 
<input type=text name=fieldThree> <p> 
<input type=submit> <p> </form>
Using this form we were able to call our script, and even pass spaces to it. But we noticed a conversion process. It was happening with other characters too, such as slashes (/). So we decided to clarify what that meant.

We started the lecture proper (after this review) by saying that CGI is, in fact, the transfer of information from the browser, through the server, to the script. The transfer can be done in two ways, identified by the keywords GET and POST.

Regardless of the method, the transfer always involves encoding special characters in a particular way. The purpose of this lecture was to clarify the encoding scheme, as well as how the script can retrieve the information.

The encoding involves turning special characters into hexadecimal codes. To retrieve them you need to know the encoding scheme, and to use substitutions, as described in the lecture the day before.

The scheme is that every encoded character is turned into % followed by the two hexadecimal characters that make up the ASCII code of the character.

An example: A has ASCII code 65 (in base 10).

In base 16 this is 41.

We discussed how we compute the base 10 equivalent of a number in base 16 and that we have 16 symbols that we could use to write numbers in this base: 0-9, and a-f.
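
To make the arithmetic concrete: 41 in base 16 is 4 * 16 + 1 = 65 in base 10, and ff in base 16 is 15 * 16 + 15 = 255. The sketch below checks the same conversions with Perl's built-in ord, sprintf and hex functions; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $code   = ord("A");                # character to its numeric code
my $hex    = sprintf("%02x", $code);  # code to two hexadecimal digits
my $back   = hex($hex);               # hexadecimal string back to a number
my $manual = 4 * 16 + 1;              # the same conversion done by hand
my $max    = hex("ff");               # largest two-digit value: 15 * 16 + 15

print "$code $hex $back $manual $max\n";
```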

There are 256 character codes, so two hexadecimal digits are enough to represent them all (from 0 all the way up to ff in base 16, which is 255 in base 10).

If the user has a form that specifies GET as the transmission method, then all the data will be put together in one long string, encoded as described above, and placed where the script will find it: in $ENV{"QUERY_STRING"}.

To decode it one would do the following:

$input = $ENV{"QUERY_STRING"};
$input =~ s/%(..)/chr(hex($1))/ge; 
I think in class we used pack("c", hex($1)) but using chr should be simpler.

In both cases a character is produced out of an ASCII code.
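
To see the scheme working in both directions, here is a small round-trip sketch. The function names encode and decode are my own, and the encoder is deliberately simplified: it encodes every character that is not a letter or a digit, whereas a real browser leaves a few more characters alone and, for form data, writes spaces as +.

```perl
use strict;
use warnings;

# Simplified encoder: anything that is not a letter or digit becomes %XX.
sub encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Decoder: each %XX becomes the character with that hexadecimal code.
sub decode {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

my $original = "a/b c";
my $encoded  = encode($original);
my $decoded  = decode($encoded);

print "$encoded\n$decoded\n";
```

The decoder here matches two hexadecimal digits explicitly rather than any two characters, so a stray % followed by something else is left untouched.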

If the method is POST then the info no longer comes through QUERY_STRING; instead the script receives it through its standard input (STDIN). So the read process is somewhat different:

read(STDIN, $input, $ENV{"CONTENT_LENGTH"}); 
We read from the standard input, into a buffer called $input and we need to specify how many characters we want to read. Fortunately this number is available to us in the %ENV hash table, associated with the CONTENT_LENGTH key.
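
You can try the read call without a web server. The sketch below stands in for the server: the sample data string is made up, CONTENT_LENGTH is set by hand, and an in-memory filehandle takes the place of STDIN. All of that is an assumption for illustration; under a real server you would read from STDIN exactly as above.

```perl
use strict;
use warnings;

# Stand-in for what the server would provide on a POST request.
my $body = "thoughts=hello+world&email=a%40b.edu";
$ENV{CONTENT_LENGTH} = length $body;

# An in-memory filehandle plays the role of the real STDIN.
open(my $fake_stdin, '<', \$body) or die "cannot open in-memory handle: $!";

my $input = '';
my $got = read($fake_stdin, $input, $ENV{CONTENT_LENGTH});
close($fake_stdin);

print "read $got characters: $input\n";
```

read returns the number of characters it actually read, so comparing it with CONTENT_LENGTH is a cheap way to notice a truncated request.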

So now we can write a script that detects which way the info is coming:

#!/usr/bin/perl

&printHeader;

if    ($ENV{REQUEST_METHOD} eq 'GET' ) { 
  print "Called with GET." ; 
} elsif ($ENV{REQUEST_METHOD} eq 'POST') { 
  print "Called with POST."; 
} else {
  print "Method not supported.\n"; 
} 

&printTrailer; 

sub printHeader { print "Content-type: text/html\n\n<html><body>"; } 

sub printTrailer { print "</body></html>"; }
Our next step was to print a form when called for the first time (with GET), and then to print all the contents of all the fields back in reply to a subsequent POST call. So we tried something like this:

#!/usr/bin/perl
       
&printHeader;
       
if    ($ENV{"REQUEST_METHOD"} eq 'GET' ) { 
  $me = $ENV{"SCRIPT_NAME"}; 
  print qq{ 
    <form method=POST action=$me> 
    Please write your thoughts below: <p> 
    <textarea name="thoughts" rows=5 cols=60></textarea> <p> 
    Also please write your e-mail address here: <input type="text" name="email"> <p>     
    <input type="submit"> 
    </form> 
  };  
} elsif ($ENV{REQUEST_METHOD} eq 'POST') { 
  print "Called with POST.";
} else {
  print "Method not supported.\n"; 
} 
       
&printTrailer; 
       
sub printHeader  { print "Content-type: text/html\n\n<html><body>"; } 

sub printTrailer { print "</body></html>"; }
The next step is a significant leap: we want to read the data and print it back.

#!/usr/bin/perl
       
&printHeader;

&readParse; 
       
if    ($ENV{"REQUEST_METHOD"} eq 'GET' ) { 
  $me = $ENV{"SCRIPT_NAME"}; 
  print qq{ 
    <form method=POST action=$me> 
    Please write your thoughts below: <p> 
    <textarea name="thoughts" rows=5 cols=60></textarea> 
    <p> Also please write your e-mail address here: 
    <input type="text" name="email"> <p>     
    <input type="submit"> 
    </form> 
  };  
} elsif ($ENV{"REQUEST_METHOD"} eq 'POST') { 
  print "Called with POST.<pre>";
  foreach $k (keys %in) {
      print $k, " --> ", $in{$k}, "\n"; 
  } 
  print "</pre>";
} else {
  print "Method not supported.\n"; 
} 
       
&printTrailer; 
       
sub printHeader  { print "Content-type: text/html\n\n<html><body>"; } 

sub printTrailer { print "</body></html>"; }

sub readParse {
    if      ($ENV{"REQUEST_METHOD"} eq 'GET' ) {
	$input = $ENV{"QUERY_STRING"}; 
    } elsif ($ENV{"REQUEST_METHOD"} eq 'POST') {
	read (STDIN, $input, $ENV{"CONTENT_LENGTH"}); 
    } else {
	print "Unsupported method."; 
	&printTrailer; 
	exit; 
    } 

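    @input = split(/\&/, $input); 
    foreach $elem (@input) {
        # Split on the first = only, and before decoding, so that an
        # encoded = or + inside a value is not mistaken for a separator.
        ($key, $value) = split(/\=/, $elem, 2); 
        foreach ($key, $value) {
            s/\+/ /g;                  # a + stands for a space
            s/%(..)/chr(hex($1))/ge;   # %XX stands for one character
        }
        $in{$key} = $value; 
    } 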
} 
In class we really didn't try this, but we wrote it on the board.
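
Since we did not run it in class, here is a standalone sketch of just the parsing step that you can try from the command line. It is written as a function (parseInput, a name of my choosing) that takes the encoded string as an argument instead of reading %ENV or STDIN, and the sample input is made up. It splits each pair on the first = before decoding anything, so that an encoded = or + inside a value survives.

```perl
use strict;
use warnings;

# Parse an encoded string such as "a=1&b=x+y" into a hash.
sub parseInput {
    my ($input) = @_;
    my %fields;
    foreach my $pair (split /&/, $input) {
        next if $pair eq '';                      # skip empty pieces
        my ($key, $value) = split /=/, $pair, 2;  # first = only
        $value = '' unless defined $value;
        foreach ($key, $value) {
            tr/+/ /;                               # + stands for a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX is one character
        }
        $fields{$key} = $value;
    }
    return %fields;
}

my %in = parseInput("thoughts=1%2B1%3D2&email=a%40b.edu&note=hi+there");
print "$_ --> $in{$_}\n" for sort keys %in;
```

If a field name appears more than once in the input, the last value wins, because each assignment to the same key replaces the previous one.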

In class we actually did the following enhancement:

#!/usr/bin/perl
       
&printHeader;

&readParse; 
       
if    ($ENV{"REQUEST_METHOD"} eq 'GET' ) { 
  $me = $ENV{"SCRIPT_NAME"}; 
  print qq{ 
    <form method=POST action=$me> 
    Please write your thoughts below: <p> 
    <textarea name="thoughts" rows=5 cols=60></textarea> 
    <p> Also please write your e-mail address here: 
    <input type="text" name="email"> <p>     
    <input type="submit"> 
    </form> 
  };  
} elsif ($ENV{"REQUEST_METHOD"} eq 'POST') { 
  print "Called with POST.";
  open (MAIL, "| mailx dgerman\@indiana.edu"); 
  foreach $k (keys %in) {
      print MAIL $k, " --> ", $in{$k}, "\n"; 
  } 
  close(MAIL); 
} else {
  print "Method not supported.\n"; 
} 
       
&printTrailer; 
       
sub printHeader  { print "Content-type: text/html\n\n<html><body>"; } 

sub printTrailer { print "</body></html>"; }

sub readParse {
    if      ($ENV{"REQUEST_METHOD"} eq 'GET' ) {
	$input = $ENV{"QUERY_STRING"}; 
    } elsif ($ENV{"REQUEST_METHOD"} eq 'POST') {
	read (STDIN, $input, $ENV{"CONTENT_LENGTH"}); 
    } else {
	print "Unsupported method."; 
	&printTrailer; 
	exit; 
    } 

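    @input = split(/\&/, $input); 
    foreach $elem (@input) {
        # Split on the first = only, and before decoding, so that an
        # encoded = or + inside a value is not mistaken for a separator.
        ($key, $value) = split(/\=/, $elem, 2); 
        foreach ($key, $value) {
            s/\+/ /g;                  # a + stands for a space
            s/%(..)/chr(hex($1))/ge;   # %XX stands for one character
        }
        $in{$key} = $value; 
    } 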
} 
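
One way to make the mailing step easier to check is to build the message text first and send it afterwards. The sketch below does only the first half: formatMessage is a hypothetical helper, the sample fields and the example.com address are made up, and the actual sending is left commented out so that the sketch runs on a machine without a mail program.

```perl
use strict;
use warnings;

# Build the message text, one "key --> value" line per field,
# with the keys sorted so the order is the same every time.
sub formatMessage {
    my (%fields) = @_;
    return join('', map { "$_ --> $fields{$_}\n" } sort keys %fields);
}

my %in = (email => 'someone@example.com', thoughts => 'Nice notes.');
my $message = formatMessage(%in);
print $message;

# Sending, left commented out (assumes mailx is installed):
# open(my $mail, '|-', 'mailx', 'someone@example.com')
#     or die "cannot start mailx: $!";
# print $mail $message;
# close($mail) or warn "mailx reported a problem";
```

Checking the result of open matters here, because if the mail program cannot be started the script would otherwise report success to the user while sending nothing.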
Next time we will explore the nature of CGI scripts a bit more.


Last updated on September 24, 2000, by Adrian German for A348/A548.