CSCI A348/548
Relevant chapters:
Chapter 1: What Is a Web Server? (pp. 1-39)
Chapter 6: Log Files (pp. 193-221)

Chapter 17: HTTP (pp. 379-411)
Chapter 1: Introduction (pp. 1-9)
Chapter 2: HTML Overview (pp. 9-16)

Chapter 1: Apache and the Internet (pp. 1-33)
Chapter 2: Getting Started with Apache (pp. 35-63)
These readings are not mandatory, but skimming through them is highly informative.
At the end of this week you will understand that:
a CGI script must begin its output with a Content-type: header which identifies
the Media Type of the script output. The header must be followed
by a blank line.
We will go over the Apache directory structure.
Then we will review a bit of Perl. Here are the highlights.
Use pico, emacs or another editor and create a file called one with the following contents:
#!/usr/bin/perl
print "Hello, world!";
This is a complete Perl program, but can you run it already?
Make the file executable and run it:
tucotuco.cs.indiana.edu% ./one
Hello, world!tucotuco.cs.indiana.edu%
Change the program as indicated below:
#!/usr/bin/perl
print "Hello, world!\n";
And run it again:
tucotuco.cs.indiana.edu% ./one
Hello, world!
tucotuco.cs.indiana.edu%
and now we are ready to start our Perl tutorial.
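The first line of any Perl script that is run directly, as above, must start with a hash sign
(#) followed by an exclamation sign (!)
and then by the absolute path of the perl interpreter
(the command which perl prints it).
Perl is located in /usr/bin/perl on
burrow and on most other Unix systems; the transcripts below come from a machine where it is installed as /usr/local/bin/perl.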
We further provide an introduction to Perl through a set of examples.
The empty program is a valid program.
tucotuco.cs.indiana.edu% vi empty
tucotuco.cs.indiana.edu% cat empty
#!/usr/local/bin/perl
tucotuco.cs.indiana.edu% chmod +x empty
tucotuco.cs.indiana.edu% ./empty
tucotuco.cs.indiana.edu%
Scalar variables are identified by a special prefix,
the dollar sign. If we wanted to write
a program that computes the value of
3 + 5 and stores that in
a variable with the name of x it could look like this:
tucotuco.cs.indiana.edu% vi two
tucotuco.cs.indiana.edu% cat two
#!/usr/local/bin/perl
$x = 3 + 5;
tucotuco.cs.indiana.edu% chmod +x two
tucotuco.cs.indiana.edu% ./two
tucotuco.cs.indiana.edu%
Of course this program does not communicate much.
To print the value of x we use the print command.
tucotuco.cs.indiana.edu% vi two
tucotuco.cs.indiana.edu% cat two
#!/usr/local/bin/perl
$x = 3 + 5;
print $x;
tucotuco.cs.indiana.edu% ./two
8tucotuco.cs.indiana.edu%
print takes a list of arguments, separated by commas, evaluates
them, and then prints the results to the screen in the order in which they
appear as arguments. $x evaluates to an integer, gets printed,
and then the program terminates. Control returns to the
operating system which prompts the user for more input.
Other things that we could print are characters and strings. Let's look at strings first. We could start by saying that strings are sequences of characters that appear in between double quotes. Thus
"perl"
is a string, and so is
"two words"
or
"      "
the last one being a string of exactly 6 blanks.
So we can change the program that computes
3 + 5 to add a blank space after the result.
tucotuco.cs.indiana.edu% vi two
tucotuco.cs.indiana.edu% cat two
#!/usr/local/bin/perl
$x = 3 + 5;
print $x, " ";
tucotuco.cs.indiana.edu% ./two
8 tucotuco.cs.indiana.edu%
This way the result (8)
doesn't get as cluttered as before by the prompt.
In strings delimited by double quotes certain combinations of characters have a special,
clearly determined meaning. For example the following group of two characters:
\n
stands for a newline. So if we change the script that computes the result of
3 + 5 and
prints it out to print
"\n"
instead of " " after the result
#!/usr/local/bin/perl
$x = 3 + 5;
print $x, "\n";
the output appears on a line of its own:
tucotuco.cs.indiana.edu% ./two
8
tucotuco.cs.indiana.edu%
Lists can be stored in variables that are prefixed by the symbol
@.
Here's a program that assigns a list of integers to a variable
a. Using the foreach construct it
goes over the entire list of values and adds them up, printing the result at the end.
tucotuco.cs.indiana.edu% vi three
tucotuco.cs.indiana.edu% cat three
#!/usr/local/bin/perl
@a = (1, 2, 3, 4);
foreach $a (@a) {
$sum += $a;
}
print "The sum is: ", $sum, "\n";
tucotuco.cs.indiana.edu% chmod +x three
tucotuco.cs.indiana.edu% ./three
The sum is: 10
tucotuco.cs.indiana.edu%
The variable $sum is never explicitly initialized. By the principle of least surprise,
an undefined scalar is treated as 0 in a numeric context, so $sum behaves as 0
the first time it's used. The foreach loop uses a variable $a
that takes every value in the list @a
in turn (the scalar $a and the list @a are distinct variables that happen to share a name); each such value is added to $sum, the statement
$sum += $a;
being a short form of $sum = $sum + $a;
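The same loop can be written with an explicit starting value. The following is a minimal sketch, not part of the original program three: it assumes the strict and warnings pragmas are available, and the loop variable name $item is chosen here only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same list as in program three.
my @a = (1, 2, 3, 4);

# Explicit initialization: no reliance on an undefined value acting as 0.
my $sum = 0;

# $item takes every value of @a in turn ($item is an illustrative name).
foreach my $item (@a) {
    $sum += $item;    # short form of $sum = $sum + $item
}

print "The sum is: ", $sum, "\n";
```

With use strict in effect every variable has to be declared (here with my), which catches misspelled variable names before the program runs.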
The elements of a list are kept in a certain order and are indexed by their position in the list.
For example the first element of the list @a has index 0 and can
be referred to as $a[0]. Here's program three modified again to
print the value of the third element of the list, $a[2]. (Note that since the first
element has index 0, the third one has index 2.)
tucotuco.cs.indiana.edu% vi three
tucotuco.cs.indiana.edu% cat three
#!/usr/local/bin/perl
@a = (1, 2, 3, 4);
print "The third element has value: ", $a[2], "\n";
tucotuco.cs.indiana.edu% ./three
The third element has value: 3
tucotuco.cs.indiana.edu%
A special (and perhaps intimidating) construction gives the index of the last element in list
@a: $#a
This is very useful if the list changes with time (although we won't have such examples
in this tutorial). Here's a modified version of three that also prints out the
number of elements in the list by using the index of the last element:
tucotuco.cs.indiana.edu% vi three
tucotuco.cs.indiana.edu% cat three
#!/usr/local/bin/perl
@a = (1, 2, 3, 4);
print "The third element has value: ", $a[2], "\n";
print "The list has ", $#a + 1, " elements.\n";
tucotuco.cs.indiana.edu% ./three
The third element has value: 3
The list has 4 elements.
tucotuco.cs.indiana.edu%
If a list
@a is empty, then $#a evaluates to -1.
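The relationship between the last index and the number of elements can be checked directly. The sketch below is only an illustration: the list @empty is a name introduced here, and scalar(@a), which evaluates a list in scalar context to get its element count, is standard Perl that the tutorial does not otherwise use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @a     = (1, 2, 3, 4);
my @empty = ();            # illustrative name: a list with no elements

# $#a is the index of the last element; scalar(@a) is the element count.
print "Last index of a:      ", $#a,        "\n";
print "Elements in a:        ", scalar(@a), "\n";
print "Last element of a:    ", $a[$#a],    "\n";

# For an empty list the last index is -1 and the count is 0.
print "Last index of empty:  ", $#empty,        "\n";
print "Elements in empty:    ", scalar(@empty), "\n";
```

In both cases the count is the last index plus one, which is why $#a + 1 works in program three.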
A special list @_ is used to hold parameters passed to
functions. Here's the program that uses a subroutine add to add
3 to 5 and then prints the result.
tucotuco.cs.indiana.edu% vi four
tucotuco.cs.indiana.edu% cat four
#!/usr/local/bin/perl
$x = &add(3, 5);
print $x, "\n";
sub add {
local ($a, $b) = @_;
return $a + $b;
}
tucotuco.cs.indiana.edu% chmod +x four
tucotuco.cs.indiana.edu% ./four
8
tucotuco.cs.indiana.edu%
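The subroutine is invoked with & (in current versions of Perl the & is optional when the call is written with parentheses).
The definition of the subroutine starts with sub.
The two parameters of the function are called $a and
$b. The local declaration gives them temporary values that last only for the duration of the call to the add
subroutine; current Perl code would usually declare them with my instead. The subroutine simply returns the sum of its parameters.
The list of parameters passed to the function can be found in @_
which is a list (@) with a curious name: _ (underscore).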
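Because @_ is an ordinary list, a subroutine is not limited to a fixed number of parameters. Here is a small sketch under that assumption; add_all is a hypothetical name introduced for illustration, and both subroutines use my rather than the local of program four.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two named parameters, copied out of @_ as in program four.
sub add {
    my ($first, $second) = @_;
    return $first + $second;
}

# Any number of parameters: walk over @_ and accumulate.
# (add_all is an illustrative name, not part of the original tutorial.)
sub add_all {
    my @terms = @_;
    my $total = 0;
    foreach my $term (@terms) {
        $total += $term;
    }
    return $total;
}

print add(3, 5), "\n";
print add_all(1, 2, 3, 4), "\n";
```

Calling add_all with no arguments leaves @_ empty, so the loop body never runs and the subroutine returns 0.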
So much for passing parameters to a subroutine. Let's now talk about the way one
passes parameters to the entire program. A special list holds those values, and its
name is @ARGV. $ARGV[0] is the first command line argument and the index of the last one
is $#ARGV, as expected. Here's a Perl program that prints back its command-line arguments:
tucotuco.cs.indiana.edu% vi six
tucotuco.cs.indiana.edu% cat six
#!/usr/local/bin/perl
foreach $a (@ARGV) {
print $a, "\n";
}
tucotuco.cs.indiana.edu% chmod +x six
tucotuco.cs.indiana.edu% ./six a b c
a
b
c
tucotuco.cs.indiana.edu% ./six
tucotuco.cs.indiana.edu% ./six 1 2 3 4 5 6
1
2
3
4
5
6
tucotuco.cs.indiana.edu%
Try ./six -d . for the fun of it.
A simple exercise would be to modify the six program
to distinguish and signal the situation when there are no command-line parameters
passed to the program at all.
tucotuco.cs.indiana.edu% vi six
tucotuco.cs.indiana.edu% cat six
#!/usr/local/bin/perl
if ($#ARGV >= 0) {
foreach $a (@ARGV) {
print $a, "\n";
}
} else {
print "No arguments.\n";
}
tucotuco.cs.indiana.edu% ./six 4 3 2
4
3
2
tucotuco.cs.indiana.edu% ./six
No arguments.
tucotuco.cs.indiana.edu%
Let's close our tour of Perl by introducing associative arrays. They are a very natural
way of associating values with a set of distinct keys. For example we have associated port numbers with
usernames. No two usernames in A348/A548 are identical. Each one identifies a unique person.
Each person has been assigned a unique port number. Here are some fictitious assignments:
LBIRD 10000
MJORDAN 10001
SPIPPEN 10002
TKUKOC 10003
Here's a program that creates an associative array indexed by usernames and, when invoked with a username on the command line, prints the port assignment for the owner of that username.
tucotuco.cs.indiana.edu% vi seven
tucotuco.cs.indiana.edu% cat seven
#!/usr/local/bin/perl
%portnumbers = (
LBIRD => 10000,
MJORDAN => 10001,
SPIPPEN => 10002,
TKUKOC => 10003
);
if ($#ARGV >= 0) {
print $ARGV[0],
"'s web server uses port #",
$portnumbers{$ARGV[0]}, "\n";
} else {
print "No username specified.\n";
}
tucotuco.cs.indiana.edu% chmod +x seven
tucotuco.cs.indiana.edu% ./seven LBIRD
LBIRD's web server uses port #10000
tucotuco.cs.indiana.edu%
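Program seven prints an empty port number when the username is not in the associative array. One possible refinement is sketched below, using the same fictitious assignments; describe_port and the username NOBODY are names made up for this illustration, while exists, keys and sort are standard Perl built-ins.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The fictitious assignments from above.
my %portnumbers = (
    LBIRD   => 10000,
    MJORDAN => 10001,
    SPIPPEN => 10002,
    TKUKOC  => 10003,
);

# Return a one-line description, or a notice if the key is missing.
# (describe_port is an illustrative name.)
sub describe_port {
    my ($username) = @_;
    if (exists $portnumbers{$username}) {
        return "${username}'s web server uses port #" . $portnumbers{$username};
    }
    return "No port assigned to $username";
}

# keys returns the usernames in no particular order, so sort them.
foreach my $user (sort keys %portnumbers) {
    print $user, " ", $portnumbers{$user}, "\n";
}

print describe_port("LBIRD"), "\n";
print describe_port("NOBODY"), "\n";
```

The braces in ${username}'s matter: inside double quotes Perl would otherwise read the apostrophe as part of the variable name.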
We have reached the end of the Perl tutorial highlights.
"This is not the end, not even the beginning of the end; but
it might be the end of the beginning." -- Winston Churchill
Note: do not use vi unless you're extremely fond of it. Use emacs
or pico instead.
The goal this week will be to write and understand CGI scripts.
To understand how scripts interact with web browsers and servers we need to begin
by reviewing a simpler interaction: how static HTML files are requested and
displayed by users. Let's say that I have the following simple, basic HTML file
in my DocumentRoot and its name is hello.html.
<html>
<head>
<title> Hello world! </title>
</head>
<body>
<h1> Hello world! </h1>
<p> How are you doing? </p>
</body>
</html>
The path to the file is
/u/dgerman/apache/apache_1.3.14/htdocs/hello.html
and it has to be made readable by the world for it to be accessible over the web:
burrowww.cs.indiana.edu% pwd
/nfs/paca/home/user1/dgerman/apache/apache_1.3.14/htdocs
burrowww.cs.indiana.edu% emacs hello.html
burrowww.cs.indiana.edu% cat hello.html
<html>
<head>
<title> Hello world! </title>
</head>
<body>
<h1> Hello world! </h1>
<p> How are you doing? </p>
</body>
</html>
burrowww.cs.indiana.edu% ls -ld hell*
-rw-r--r-- 1 dgerman faculty 127 Jan 15 17:44 hello.html
burrowww.cs.indiana.edu%
Once we've created the HTML text, it may seem that the process of delivering it to a web browser should be a trivial task. But serving even a simple page like this one requires that a lot of coordination occur between the browser and the web server on which the page is stored.
By web server we mean a program residing on a host machine that uses the
Hypertext Transfer Protocol (HTTP) to communicate with the browser. The
program stored in my burrowww account in
/u/dgerman/apache/apache_1.3.14/bin/httpd
is such a program, and you are using similar servers. The web is
based on a client-server model. This means that there is a
server (that provides resources) and a client (which requests them). Here's what we need to keep in mind about them:
There are thousands of web servers throughout the world (wide web) but they are all accessible from any browser because they have all agreed to use a common protocol, the Hypertext Transfer Protocol (HTTP). HTTP is based on an exchange of requests and responses.
Each request can be thought of as a command, or action, which is sent by the browser to the server to be carried out. The server performs the requested service and returns its answer in the form of a response.
The components of a simple WWW interaction are the user, the client, and the server. The client acts as an intermediary between the user and the server.
Steps 1-7 detail the basic information flow in a simple HTTP transaction. Essentially the client requests a file and the server delivers it. The entire HTTP process takes place as a result of simple transactions of requests and responses.
The user finds the URL
http://burrowww.cs.indiana.edu:20006/hello.html
and clicks the hyperlink or types the URL into
the browser.
The URL (clicked, or typed in, for example through
Open Page in Netscape) says that the computer
burrowww.cs.indiana.edu
needs to be contacted on port 20006 and that the
/hello.html file is needed. For this the browser
sends the HTTP GET command to the server (not shown
here; we'll look at how this works when we simulate this request
process using telnet). (The path to the requested file
is relative to the server's document root.)
The browser sends the
GET request to the server,
indicating what file
it needs. This request travels over the Internet, going from
computer to computer until it reaches the web server's host:
burrowww.cs.indiana.edu
of the CSCI's burrow cluster. There's a network security aspect here that we will need to address later.
The server looks at the file's extension
(.html) to determine the type of information
in the file. The .html means that it will send the file back to
the browser but it will first say:
this file's Content-type: text/html
You do not have to write this in the file; it is inferred by the server from the file's extension. But the server does send this information to the browser as part of the header, followed by the data (the actual file) as explained below.
The server sends back the headers, among them
Content-type: text/html
The headers are then followed by (a blank line and then by)
the HTML data itself.
The
Content-type: part of the header tells the
browser that the data is text formatted in HTML, so the browser renders
the text appropriately, highlighting hyperlinks, etc.
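To make the exchange above more concrete, here is a rough sketch of the text that travels in each direction. It is a simplification under stated assumptions: the two subroutine names are made up for illustration, the status line HTTP/1.0 200 OK is what a successful response starts with, and a real server such as Apache sends additional headers that are omitted here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Text of a minimal GET request: a request line followed by a blank line.
# (build_get_request is an illustrative name.)
sub build_get_request {
    my ($path) = @_;
    return "GET $path HTTP/1.0\r\n\r\n";
}

# Text of a minimal response: status line, one header, blank line, data.
# (build_response is an illustrative name; real servers send more headers.)
sub build_response {
    my ($media_type, $data) = @_;
    return "HTTP/1.0 200 OK\r\n"
         . "Content-type: $media_type\r\n"
         . "\r\n"
         . $data;
}

print build_get_request("/hello.html");
print build_response("text/html", "<h1> Hello world! </h1>\n");
```

The blank line is the only separator between the headers and the data, which is why it cannot be left out.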
When the server receives a request to access the database it passes the request to a gateway program which does whatever is necessary to get the data and return the results to the server.
The server then repackages the information from the script, and forwards the information back to the client. (In a sense the server acts as a sort of translator, taking data from either a file or script and providing it to the browsers in a consistent and uniform manner).
We make two observations now:
The CGI protocol
So the process of servicing the
http://burrowww.cs.indiana.edu:20006/cgi-bin/hello
request is different, because by the shape of the request the server
realizes that it needs to execute the script specified by that address
(or path) rather than simply retrieving the file. Before starting the script, the server provides it with a variety of potentially useful information (such as the name of the machine from which the request originated, type of browser used, etc.). What follows is of no concern to the server, other than the output of the script, which the server will send back to the requesting browser.
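The information the server provides reaches the script as environment variables, which Perl exposes through the associative array %ENV. The sketch below prints a few of the variables defined by the CGI convention (REQUEST_METHOD, REMOTE_ADDR, REMOTE_HOST, HTTP_USER_AGENT); whether each one is actually set depends on the server and the request, so the hypothetical helper env_or_unknown substitutes a placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the value of an environment variable, or a placeholder if unset.
# (env_or_unknown is an illustrative name.)
sub env_or_unknown {
    my ($name) = @_;
    return defined $ENV{$name} ? $ENV{$name} : "(not set)";
}

# A script's output must start with the header and a blank line.
print "Content-type: text/plain\n\n";

foreach my $name ("REQUEST_METHOD", "REMOTE_ADDR", "REMOTE_HOST", "HTTP_USER_AGENT") {
    print $name, " = ", env_or_unknown($name), "\n";
}
```

Run from the command line rather than through the server, most of these variables will be reported as not set.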
You take a lot of responsibility this way if you're writing the script.
While the server doesn't care how the script generates its output, it does
need to know the format of the output - the script's output is, after all,
the server's input (on the path back to the user). Recall that when
the web server delivers a static file to the browser, it uses a filename
extension to determine what to return in the Content-type
header.
This technique doesn't work for scripts, because a script's
filename is unrelated to the type of information it returns.
A script named getpic, for example, may return
an image as its data (Content-type: image/gif)
while the similarly named getinfo might return HTML
text (Content-type: text/html). It is even
possible for a single script to output different sorts of data
depending upon the context in which it is called. Therefore it
is absolutely essential that the script notify the server of the
type of data it is generating, so that the server can pass this
information on to the client.
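The difference between the two cases can be sketched with a lookup table. This is only an illustration of the idea, not how Apache is actually configured: the table %media_types and the subroutine guess_media_type are hypothetical, and the table holds just a handful of common types.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A tiny, illustrative extension-to-media-type table.
my %media_types = (
    html => "text/html",
    txt  => "text/plain",
    gif  => "image/gif",
    jpg  => "image/jpeg",
);

# Guess the media type of a static file from its extension.
# Returns undef when there is no extension the table knows about.
sub guess_media_type {
    my ($filename) = @_;
    if ($filename =~ /\.([^.\/]+)$/ && exists $media_types{lc $1}) {
        return $media_types{lc $1};
    }
    return undef;
}

foreach my $name ("hello.html", "logo.gif", "getpic", "getinfo") {
    my $type = guess_media_type($name);
    print $name, " -> ", (defined $type ? $type : "unknown from the name alone"), "\n";
}
```

For getpic and getinfo the name says nothing about the output, so the type has to come from the script itself.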
The hello script is presented below:
#!/usr/bin/perl
print qq{Content-type: text/html\n\n<html>
<head>
<title> Hello world! </title>
</head>
<body>
<h1> Hello world! </h1>
<p> How are you doing? </p>
</body>
</html>
};
The program has one statement only, which prints a few lines.
The first line starts by specifying the media type that identifies the data
in the body. The \n\n that follow the header information are
translated by Perl into newlines. The first ends the line with the
content type, while the second inserts a (mandatory) blank line that separates
the header from the rest of the message.
The remainder of the script simply
outputs HTML text that looks suspiciously similar to the contents of the
static hello.html shown earlier, beginning with the familiar
<html> tag and ending with </html>.
All of this text output by the print statement is sent to
the server which executed the script.
The server captures the output,
constructs a set of HTTP message headers (including the
Content-type: returned from the script),
and sends these headers and the rest of the script's output
to the browser. Upon receiving and interpreting the data, the
browser is left with the HTML part of the code fragment
above (everything after the blank line).
This is rendered by the receiving browser in the exact same way as
the hello.html file that we started with. So if we were
to look only at the output we couldn't tell any difference
between the two approaches.
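The role of the blank line can be illustrated by splitting a script's output the way a server has to. This is a sketch of the idea only, not Apache's implementation; split_cgi_output is a hypothetical name, and the sample output is a shortened version of what the hello script prints.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split script output at the first blank line into (headers, body).
# (split_cgi_output is an illustrative name.)
sub split_cgi_output {
    my ($output) = @_;
    my ($headers, $body) = split /\r?\n\r?\n/, $output, 2;
    return ($headers, $body);
}

# A shortened version of the hello script's output.
my $output = "Content-type: text/html\n\n<html>\n<body> Hello world! </body>\n</html>\n";

my ($headers, $body) = split_cgi_output($output);
print "Headers: ", $headers, "\n";
print "Body:\n", $body;
```

If the script forgets the blank line, everything it prints ends up in the header part and there is no body left to display.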
CGI scripts are quite flexible precisely because the server itself is not really involved in the process. The server's primary responsibilities are to
To summarize, the information flow is as follows:
The user requests the URL
http://burrowww.cs.indiana.edu:20006/cgi-bin/hello
A
GET message requesting
/cgi-bin/hello is sent to
burrowww.cs.indiana.edu on
port 20006
The server looks at the path of the request; from its shape (it starts with
/cgi-bin/) it determines it should
run the script instead of simply retrieving the file.
The script prints a
Content-type:
header to indicate the format of the data to the server, for example:
Content-type: text/html
The headers are then followed by the HTML text generated by the script. From
here on it's the same story as before:
The headers (including
Content-type:) and data go
directly from program to the server.
The server sends the browser its HTTP headers, including the
Content-type: header from the script. Following
the headers is the actual HTML text.
The
Content-type header
tells the browser that the data is HTML, so the
browser formats and renders the text appropriately,
including highlighting links.
Knowing this, we return to the brief Perl primer and explain the following script that comes with the server:
#!/usr/local/bin/perl
print "Content-type: text/html\n\n";
while (($key, $val) = each %ENV) {
print "$key = $val<BR>\n";
}
Its name is printenv and it is our entry point to CGI.
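A variation on printenv is sketched below; it is not the script that ships with the server. It assumes two changes are desirable: the keys are sorted so the output is easier to scan, and the characters &, < and > are escaped so that a value containing them cannot disturb the HTML. The subroutine escape_html is a name introduced here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace the characters that are special in HTML with their entities.
# (escape_html is an illustrative name.)
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first, before new & are introduced
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

print "Content-type: text/html\n\n";

# Same idea as printenv, but in alphabetical order of the keys.
foreach my $key (sort keys %ENV) {
    print escape_html("$key = $ENV{$key}"), "<BR>\n";
}
```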