Spring Semester 2002


Building a CGI Processor (Part I)

Here's how HTTP works.

tucotuco.cs.indiana.edu% pwd
/nfs/paca/home/user1/dgerman
tucotuco.cs.indiana.edu% telnet burrowww.cs.indiana.edu 36000
Trying 129.79.245.98...
Connected to burrowww.cs.indiana.edu.
Escape character is '^]'.
GET /index.html HTTP1.1

HTTP/1.1 200 OK
Date: Tue, 29 Jan 2002 01:19:20 GMT
Server: Apache/1.3.22 (Unix)
Last-Modified: Thu, 17 Jan 2002 23:46:23 GMT
ETag: "4aab1e-1ca-3c47624f"
Accept-Ranges: bytes
Content-Length: 458
Connection: close
Content-Type: text/html

<html><head><title></title></head><body bgcolor=white>

Here's what you need for homework assignment one: Include a clear 
picture of yourself on your page and write your name at the top, or 
somwhere in the page, like this. This is Adrian's server. <p> 

Here's a picture of me with my daughter Celina, in Indianapolis, last summer. 

<p> 

<img src="http://www.cs.indiana.edu/classes/a348/spr2002/labs/pava.jpg">

<p>

<img src="pic.jpg"> 

</body></html>
Connection closed by foreign host.
tucotuco.cs.indiana.edu% 

Let's make a few comments:

We'll come back to this a bit later too.
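
To make the exchange above a bit more concrete, here's a minimal Perl sketch that takes a raw response like the one telnet printed and separates the status line, the headers, and the body. The helper name parse_response is made up for this example, and the response text is an abbreviated copy of the session above. It is a sketch of the message layout, not a full HTTP client.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a raw HTTP response into status line, headers and body.
# parse_response is a hypothetical helper, not part of the lecture code.
sub parse_response {
    my ($raw) = @_;
    # The headers end at the first empty line.
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    my ($status, @lines) = split /\r?\n/, $head;
    my %headers;
    foreach my $line (@lines) {
        my ($name, $value) = split /:\s*/, $line, 2;
        $headers{lc $name} = $value;      # header names are case-insensitive
    }
    return ($status, \%headers, $body);
}

# An abbreviated copy of the response seen in the telnet session.
my $raw = join "\n",
    "HTTP/1.1 200 OK",
    "Server: Apache/1.3.22 (Unix)",
    "Content-Length: 458",
    "Connection: close",
    "Content-Type: text/html",
    "",
    "<html><head><title></title></head><body bgcolor=white>";

my ($status, $headers, $body) = parse_response($raw);
print "Status:       $status\n";
print "Content type: $headers->{'content-type'}\n";
print "Body starts:  ", substr($body, 0, 6), "\n";
```

The empty line is what matters here: everything before it describes the document, everything after it is the document.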

Let's now review some of the Perl needed for part two of assignment two.

How would you write this?

I'll let you think about it.

Let me build two Perl programs and you tell me which is closer to the one above.

Version One:

#!/usr/bin/perl

print "Calc> "; 

$line = <STDIN>; 
$count = 0; 

while (! ($line =~ /^bye$/i)) {
  if ($line =~ /^add/i) {
    $count += 1; 
    print "Your call has number: ", $count, "\n"; 
  }
  print "Calc> "; 
  $line = <STDIN>;    
}

Here's how it works:

burrowww.cs.indiana.edu% ./one
Calc> add
Your call has number: 1
Calc> add
Your call has number: 2
Calc> add
Your call has number: 3
Calc> add
Your call has number: 4
Calc> bye
burrowww.cs.indiana.edu% 

Version Two:

#!/usr/bin/perl

$param = $ARGV[0]; $me = $0; 

($name, $value) = split(/=/, $param); 

if ($name eq "arg") {
  $arg = $value + 1; 
  print "Your call has number: ", $arg, "\n"; 
  print "Note: please call with ($me arg=", $arg, ") next time.\n"; 
}

Here's how this runs.

burrowww.cs.indiana.edu% ./two
burrowww.cs.indiana.edu% ./two arg=0
Your call has number: 1
Note: please call with (./two arg=1) next time.
burrowww.cs.indiana.edu% ./two arg=1
Your call has number: 2
Note: please call with (./two arg=2) next time.
burrowww.cs.indiana.edu% ./two arg=2
Your call has number: 3
Note: please call with (./two arg=3) next time.
burrowww.cs.indiana.edu%

Well, which one is closer?

Actually, the second one.

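The reason is: HTTP is stateless (connectionless): every request arrives on its own.

That is, the server has no recollection of who you are from one request to the next, so whatever needs to be remembered has to travel with the request, the way arg=... does in version two.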

Here's how you might start the script discussed above.

#!/usr/bin/perl

&printTop;

$me = $ENV{SCRIPT_NAME}; 

$value = $ENV{QUERY_STRING}; 

$num = $value + 1; 

print qq{ 

  Your call has number: <font size=+5>$num</font> <p> 

  <a href="$me?$num">Click here</a> for more. <p> 

};

&printBottom; 

sub printTop {
  print "Content-type: text/html\n\n"; 
  print "<html><head><title>Some title</title>", 
        "</head><body bgcolor=\"white\">";
}

sub printBottom {
  print "</body></html>"; 
}

Here's my version of it.

Next we will discuss the various form elements, for example:

To display:                   Use:                                   Attributes:
A form                        <form> ... HTML form info </form>      method, action, enctype
Single-line text field        <input type=text>                      name, value, maxlength, size
Single-line password field    <input type=password>                  name, value, maxlength, size
Multiple-line text area       <textarea></textarea>                  name, cols, rows, wrap
Checkbox                      <input type=checkbox>                  name, value, checked
Radio buttons                 <input type=radio>                     name, value, checked
List of choices               <select> items in list... </select>    name, multiple, size
Items in a <select> list      <option>                               value, selected
Clickable image               <input type=image>                     name, align, src
File upload                   <input type=file>                      name, accept
Hidden field                  <input type=hidden>                    name, value
Reset button                  <input type=reset>                     value
Submit button                 <input type=submit>                    name, value
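
Here's a small sketch of how a script might print a form that uses several of these elements at once. The helper name form_html, the field names (who, secret, comments, and so on) and the option labels are all made up for the illustration; only the elements and attributes come from the table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a form that uses several of the elements from the table.
# form_html and every field name below are illustrative only.
sub form_html {
    my ($action) = @_;
    return qq{
<form method="POST" action="$action">
  Name: <input type="text" name="who" size="30" maxlength="60"> <p>
  Password: <input type="password" name="secret" size="10"> <p>
  Comments: <p>
  <textarea name="comments" rows="5" cols="60"></textarea> <p>
  <input type="checkbox" name="subscribe" value="yes" checked> Subscribe <p>
  <input type="radio" name="level" value="a348" checked> A348
  <input type="radio" name="level" value="a548"> A548 <p>
  <select name="lab" size="1">
    <option value="one" selected>Lab One</option>
    <option value="two">Lab Two</option>
  </select> <p>
  <input type="hidden" name="visit" value="1">
  <input type="submit" value="Send"> <input type="reset" value="Clear">
</form>
};
}

# Point the form at printenv to see what each element sends.
print form_html("/cgi-bin/printenv");
```

In a real script the Content-type line would have to be printed before the form.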

Then we will finally look at pattern matching.

We're using the =~ operator with a substitution on its right-hand side: the letter s, followed by a slash-delimited pattern (the text to be matched), a replacement string, and a closing slash, as in s/pattern/replacement/. When the pattern matches, the matched text is replaced by the replacement string.

There are several rules and exceptions and we will summarize those that we care for here, while the examples can be found in the lecture notes of last Thursday.
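
Here are a few substitutions to experiment with, as a minimal sketch. The helper names and the sample strings are made up for this illustration; the modifiers shown (g, i, e) are the ones we will need for decoding form data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each helper copies its argument, applies one substitution, and returns
# the result. The helper names are made up for this illustration.

sub replace_first {            # no modifier: only the first match changes
    my ($s) = @_;
    $s =~ s/tea/coffee/;
    return $s;
}

sub replace_all {              # g: every match, i: ignore case
    my ($s) = @_;
    $s =~ s/tea/coffee/gi;
    return $s;
}

sub swap_names {               # $1 and $2 hold what the parentheses matched
    my ($s) = @_;
    $s =~ s/(\w+), (\w+)/$2 $1/;
    return $s;
}

sub double_numbers {           # e: the replacement is Perl code to evaluate
    my ($s) = @_;
    $s =~ s/(\d+)/$1 * 2/ge;
    return $s;
}

print replace_first("I like tea. Tea is good."), "\n";   # I like coffee. Tea is good.
print replace_all("I like tea. Tea is good."), "\n";     # I like coffee. coffee is good.
print swap_names("German, Adrian"), "\n";                # Adrian German
print double_numbers("3 apples and 4 pears"), "\n";      # 6 apples and 8 pears
```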

A few other useful things: We now want to build a generic CGI processor.

We also need to come up with a definition of CGI.

For this purpose let's again review what we have done so far in terms of CGI.

  1. We started with a hello.html in Lab Two, placed in htdocs.

  2. We then wrote a script (called hello), placed it in cgi-bin, and saw that its output was the same as what we got when accessing hello.html on the web. hello.html was in htdocs; hello was in your script (cgi-bin) directory.

  3. The difference between them was that the script was entirely responsible for the output and so it had to start it with its MIME type:
    "Content-type: text/html\n\n"
    was the first thing that the script was supposed to write. Note the two newline characters: an empty line is required after the MIME type. We then took the script and changed the output a little, to make it display an image.

  4. Then we asked whether we could make it display something new every time. We introduced a bit of randomness, so that the output changes from one call to the next most of the time (the same image can still come up twice in a row).

    To implement the change in output we created a list of names of images. Then every time the script is called a random number that represents an index in the list of names of images will be produced and the image with that index will appear in the output.
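
    A minimal sketch of the random choice follows. The helper names pick_image and image_page are made up, and logo.gif is a placeholder file name; any list of image files that your server can reach would do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A list of image names; logo.gif is a placeholder for this sketch.
my @images = ("pava.jpg", "pic.jpg", "logo.gif");

# Return one name from the list, chosen at random.
sub pick_image {
    my @names = @_;
    # rand(@names) is a fraction in [0, number of names); int() cuts it
    # down to a valid index.
    return $names[ int(rand(@names)) ];
}

# Build the complete output of the script for one image.
sub image_page {
    my ($name) = @_;
    return "Content-type: text/html\n\n"
         . qq{<html><body><img src="$name"></body></html>};
}

print image_page(pick_image(@images)), "\n";
```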

  5. That's an improvement: the output is changing, but not predictably. Is there any way to make the user participate, and maybe choose the output? Can the user talk to the script (instead of just starting it)?

    We said the answer was "yes" and to explain that we introduced a short script by the name of printenv. Each one of our servers had this script in its cgi-bin directory after installation. It looked like this:

    #!/usr/bin/perl
    
    print "Content-type: text/html\n\n<html><body><pre>"; 
    
    foreach $elem (keys %ENV) {
      print $elem, " --> ", $ENV{$elem}, "\n"; 
    } 
    
    print "</pre></body></html>"; 

  6. The hash %ENV is built by the system: the browser, the server, and the host operating system all contribute to it, and the info is passed to the script. One of the keys in this hash table is called QUERY_STRING. If we put a ? (question mark) after the name of the program (when we invoke its URL) the string that follows, up to the first blank space, will be placed in
    $ENV{"QUERY_STRING"}
    We also noted that there was an entry in %ENV for REQUEST_METHOD. The value associated with $ENV{REQUEST_METHOD} was GET (please confirm that through your own experiments).

    OK, that was the review.

  7. Now we need to talk a bit about forms, and we create a very simple one, that looks like this:

    <form method="GET" action="/cgi-bin/printenv"> 
    <input type=text name=fieldOne> <p> 
    <input type=text name=fieldTwo> <p> 
    <input type=text name=fieldThree> <p> 
    <input type=submit> <p> </form>

    Using this form we should be able to call our script, and even pass spaces to it.

  8. But we notice a conversion process: the spaces we type arrive in QUERY_STRING as + signs.

  9. It is happening with other characters too, such as slashes (/), which arrive as %2F.

  10. So we decide to clarify what this means.

    CGI is, in fact, the transfer of information

    And the transfer can be done in two ways, which are identified by the keywords GET and POST.

  11. Regardless of the method (be it GET or POST) the transfer always involves the encoding of special characters in a particular way. It is the purpose of this lecture to clarify the encoding scheme as well as how one can access that information (that is passed to the script) inside the script.

  12. The encoding involves turning special characters into hexadecimal codes. To retrieve them you need to know the encoding scheme, and to use substitutions.

  13. The scheme is that every encoded character is turned into % followed by the two hexadecimal characters that make up the ASCII code of the character.

    An example: A has ASCII code 65 (base 10).

    In base 16 this is: 41.

  14. We discussed how we compute the base 10 equivalent of a number in base 16 and that we have 16 symbols that we could use to write numbers in this base: 0-9, and a-f.

    There are 256 character codes, so two hexadecimal digits would be enough to represent them all (from 0 all the way up to ff in base 16, which is 255 in base 10).
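
    The arithmetic can be checked in Perl with ord, hex and sprintf. The sketch below also goes in the encoding direction, to show what the browser does before the data reaches us. The name url_encode is made up, and the set of characters it leaves alone is an assumption (browsers do not all agree on it).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The character A, its code in base 10, and the same code in base 16.
printf "A is %d in base 10 and %X in base 16\n", ord("A"), ord("A");
print "hex('41') = ", hex("41"), "\n";     # 4 * 16 + 1  = 65
print "hex('ff') = ", hex("ff"), "\n";     # 15 * 16 + 15 = 255

# Encode a string the way a form submission does.
# url_encode is a hypothetical helper; the "safe" characters kept as-is
# (letters, digits, _ . -) are an assumption for this sketch.
sub url_encode {
    my ($text) = @_;
    # Every other character except the space becomes % plus two hex digits.
    $text =~ s/([^A-Za-z0-9_.\- ])/sprintf("%%%02X", ord($1))/ge;
    # Spaces become plus signs.
    $text =~ tr/ /+/;
    return $text;
}

print url_encode("a/b c"), "\n";           # a%2Fb+c
```

    Note that a real + in the input comes out as %2B, which is how it can be told apart from an encoded space.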

  15. If the user has a form that specifies GET as the transmission mode, then all the data will be put together in one long string, encoded as described above, and placed such that the script will find it in $ENV{"QUERY_STRING"}.

  16. To decode it one would do the following:
    $input = $ENV{"QUERY_STRING"};
    $input =~ s/%(..)/chr(hex($1))/ge; 
    Now, this second line will have to be clarified, but this is not as hard as it may appear.

  17. And that's because we have already explained it (only in stages).
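
    Here are the stages once more, as a minimal sketch that can be run from the command line. The name url_decode is made up. Unlike the one-line version in step 16, it also turns + back into a space, and it does that before decoding the %XX codes so that an encoded plus sign (%2B) survives.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stage by stage: two hex digits -> a number -> a character.
my $code   = "41";
my $number = hex($code);       # 65
my $char   = chr($number);     # A
print "%$code -> $number -> $char\n";

# The same thing for a whole string.
# url_decode is a hypothetical helper name.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;          # a plus sign stands for a space
    # For every % followed by two hex digits ($1), evaluate (e)
    # chr(hex($1)) and put the result in its place, everywhere (g).
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

print url_decode("a%2Fb+c%21"), "\n";      # a/b c!
```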

  18. If the method is POST then the info no longer comes through the QUERY_STRING and instead the script is receiving it through a channel that it identifies as its standard input (STDIN). So the read process will be somewhat different:
    read(STDIN, $input, $ENV{"CONTENT_LENGTH"}); 

  19. We read from the standard input, into a buffer called $input and we need to specify how many characters we want to read. Fortunately this number is available to us in the %ENV hash table, associated with the CONTENT_LENGTH key.
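
    To try the read without a web server, one can stand in for STDIN with an in-memory filehandle. This is only a sketch: read_body is a made-up helper, the posted string is invented sample data, and the length check is an extra precaution that the one-line version above does not have.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read exactly $length characters from the filehandle $fh.
# read_body is a hypothetical helper name.
sub read_body {
    my ($fh, $length) = @_;
    my $input = '';
    # Nothing to read if the length is missing or not a positive number.
    return $input unless defined $length && $length =~ /^\d+$/ && $length > 0;
    read($fh, $input, $length);
    return $input;
}

# Simulate a POST: the string plays the role of what the server would
# send on STDIN, and its length plays the role of CONTENT_LENGTH.
my $posted = "thoughts=hello+world&email=me%40example.com";
open(my $fh, '<', \$posted) or die "cannot open in-memory handle: $!";
my $input = read_body($fh, length $posted);
close $fh;

print "Read ", length($input), " characters: $input\n";
```

    Inside a CGI script the call would be read_body(\*STDIN, $ENV{"CONTENT_LENGTH"}).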

  20. So now we can write a script that can read info (and that regardless of how the info comes):

  21. We start from:
    #!/usr/bin/perl
    
    &printHeader;
    
    if    ($ENV{REQUEST_METHOD} eq 'GET' ) { 
      print "Called with GET." ; 
    } elsif ($ENV{REQUEST_METHOD} eq 'POST') { 
      print "Called with POST."; 
    } else {
      print "Method not supported.\n"; 
    } 
    
    &printTrailer; 
    
    sub printHeader { print "Content-type: text/html\n\n<html><body>"; } 
    
    sub printTrailer { print "</body></html>"; }

  22. Our next step is to print a form when called for the first time (with GET), and to print the contents of all the fields in reply to any subsequent POST call.

  23. So we should try something like this:

    #!/usr/bin/perl
           
    &printHeader;
           
    if    ($ENV{"REQUEST_METHOD"} eq 'GET' ) { 
      $me = $ENV{"SCRIPT_NAME"}; 
      print qq{ 
        <form method=POST action=$me> 
        Please write your thoughts below: <p> 
        <textarea name="thoughts" rows=5 cols=60></textarea> <p> 
        Also please write your e-mail address here: 
           <input type="text" name="email"> <p>     
        <input type="submit"> 
        </form> 
      };  
    } elsif ($ENV{REQUEST_METHOD} eq 'POST') { 
      print "Called with POST.";
    } else {
      print "Method not supported.\n"; 
    } 
           
    &printTrailer; 
           
    sub printHeader  { print "Content-type: text/html\n\n<html><body>"; } 
    
    sub printTrailer { print "</body></html>"; }

  24. The next step is a significant leap: we want to read the data and print it back.

    #!/usr/bin/perl
           
    &printHeader;
    
    &readParse; 
           
    if    ($ENV{"REQUEST_METHOD"} eq 'GET' ) { 
      $me = $ENV{"SCRIPT_NAME"}; 
      print qq{ 
        <form method=POST action=$me> 
        Please write your thoughts below: <p> 
        <textarea name="thoughts" rows=5 cols=60></textarea> 
        <p> Also please write your e-mail address here: 
        <input type="text" name="email"> <p>     
        <input type="submit"> 
        </form> 
      };  
    } elsif ($ENV{"REQUEST_METHOD"} eq 'POST') { 
      print "Called with POST.<pre>";
      foreach $k (keys %in) {
          print $k, " --> ", $in{$k}, "\n"; 
      } 
      print "</pre>"; 
    } else {
      print "Method not supported.\n"; 
    } 
           
    &printTrailer; 
           
    sub printHeader  { print "Content-type: text/html\n\n<html><body>"; } 
    
    sub printTrailer { print "</body></html>"; }
    
    sub readParse {
        if      ($ENV{"REQUEST_METHOD"} eq 'GET' ) {
            $input = $ENV{"QUERY_STRING"}; 
        } elsif ($ENV{"REQUEST_METHOD"} eq 'POST') {
            read (STDIN, $input, $ENV{"CONTENT_LENGTH"}); 
        } else {
            print "Unsupported method."; 
            &printTrailer; 
            exit; 
        } 
    
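        # Split on & and = first and decode afterwards, so that an encoded
        # = (%3D) or + (%2B) inside a value is not mistaken for a separator
        # or for a space.
        @input = split(/\&/, $input); 
        foreach $elem (@input) {
            ($key, $value) = split(/\=/, $elem, 2); 
            foreach $part ($key, $value) {
                $part =~ s/\+/ /g;                               # + stands for a space
                $part =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # %XX is a character code
            }
            $in{$key} = $value; 
        } 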
    } 

    In class we need to explain this very thoroughly.

  25. We have in fact discussed some of it last time so it shouldn't be too hard.
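
For experimenting outside the web server, here's a sketch of the parsing step alone, written as a function that takes the encoded string and returns a reference to a hash. The name parse_query is made up, and so is one design choice: each name maps to a list of values, so that a form with several fields of the same name does not lose data. The readParse above keeps only the last value instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn "name=value&name=value..." into a hash of lists of values.
# parse_query is a hypothetical helper, not the readParse from step 24.
sub parse_query {
    my ($input) = @_;
    my %in;
    foreach my $pair (split /&/, $input) {
        next unless length $pair;                 # skip empty pieces
        my ($key, $value) = split /=/, $pair, 2;  # split at the first = only
        $value = '' unless defined $value;
        foreach ($key, $value) {
            tr/+/ /;                                  # + stands for a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;      # %XX is a character code
        }
        push @{ $in{$key} }, $value;
    }
    return \%in;
}

my $in = parse_query("fieldOne=a+b&fieldOne=c%2Fd&email=x%3Dy");
foreach my $key (sort keys %$in) {
    print $key, " --> ", join(", ", @{ $in->{$key} }), "\n";
}
```

Run from the command line this prints email --> x=y followed by fieldOne --> a b, c/d.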


Last updated: Jan 29, 2002 by Adrian German for A348/A548