http://hostname.cs.indiana.edu:portnumber
where hostname and portnumber
were listed at
http://www.cs.indiana.edu/l/www/classes/a348/students.html
which also indicates which servers are already running. For example, the server that we use for class demos can be reached at:
http://tucotuco.cs.indiana.edu:19800
What gets retrieved in response to such a request from a web browser? The complete answer to this question is this:
Based on the shape of the URL it must be the index.html file in the DocumentRoot directory of the web server that runs on tucotuco, servicing port #19800. But we can get even more specific about this: since tucotuco:19800 is the demo server we know it's administered by dgerman, as the web course page says. Because dgerman has followed the conventions that everybody had to follow, and since we know the burrow environment like the back of our hands, the path of the file that gets retrieved is this:
/u/dgerman/httpd/htdocs/index.html
We are now ready to start our prelude to CGI.
We start by describing the relationship between a web server and a web browser. We detail the interaction that goes on behind the scenes and show what happens when a browser requests a web page from a remote server. We then introduce the CGI (Common Gateway Interface) and explain how the browser-server interaction changes with the addition of CGI scripts.
To understand how scripts interact with web browsers and servers we begin
by reviewing a simpler interaction: how static HTML files are requested by
and displayed to users. Let's say you have the following simple, basic HTML
file in your DocumentRoot called hello.html:
<html>
<head>
<title> Hello world! </title>
</head>
<body>
<h1> Hello world! </h1>
<p> How are you doing? </p>
</body>
</html>
Let's now assume that you put this file in your DocumentRoot and make it readable by the world.
tucotuco.cs.indiana.edu% pwd
/nfs/paca/home/user2/dgerman/httpd/htdocs
tucotuco.cs.indiana.edu% vi hello.html
tucotuco.cs.indiana.edu% cat hello.html
<html>
<head>
<title> Hello world! </title>
</head>
<body>
<h1> Hello world! </h1>
<p> How are you doing? </p>
</body>
</html>
tucotuco.cs.indiana.edu% ls -l hell*
-rw-r--r-- 1 dgerman students 126 Sep 5 18:39 hello.html
tucotuco.cs.indiana.edu%
Once we've created the HTML text, it may seem that the process of delivering it to a web browser should be a trivial task.
But serving even a simple page like this one requires that a lot of coordination occur between the browser and the web server on which the page is stored.
By web server we mean a program residing on a host machine that uses the Hypertext Transfer Protocol (HTTP) to communicate with the browser. Your
/u/username/httpd/httpd
is such a program. The web is based on a client-server model. This means that there is a server (that provides resources) and a client (which requests them). We need to keep this in mind about them:
There are thousands of web servers throughout the world (wide web) but they are all accessible from any browser because they have all agreed to use a common protocol - the Hypertext Transfer Protocol (HTTP). HTTP is based on an exchange of requests and responses.
Each request can be thought of as a command, or action, which is sent by the browser to the server to be carried out. The server performs the requested service and returns its answer in the form of a response.
[figure1]
The components of a simple WWW interaction are the user, the client, and the server. The client acts as an intermediary between the user and the server.
Steps 1-7 detail the basic information flow in a simple HTTP transaction. Essentially the client requests a file and the server delivers it. The entire HTTP process takes place as a result of simple transactions of requests and responses.
The user finds a reference to http://tucotuco.cs.indiana.edu:19800/hello.html and clicks the hyperlink or types the URL into the browser.
The browser determines from the URL that tucotuco.cs.indiana.edu needs to be contacted on port 19800 and that the hello.html file is needed. It does so by sending the HTTP GET command to the server (which you don't see here, just yet - we'll see how this works when we simulate this with telnet).
The request is received by the server on tucotuco. There's a security aspect here that we will discuss later.
The server looks at the file's extension (.html) to determine the type of information in the file. The .html extension means that before sending the file back to the browser the server first announces the file's type: Content-type: text/html. You do not have to write this in the file; it is inferred by the server from the file's extension.
The server sends its headers to the browser, including Content-type: text/html. The headers are followed by the HTML data itself.
The Content-type header tells the browser that the data is HTML, so the browser formats and renders the text appropriately, including highlighting hyperlinks.
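The request and response in this exchange can be sketched in a few lines of Python (used here only for illustration). The sketch builds the text of the GET request a browser might send for hello.html and splits a server response into its headers and data. The function names and the sample response are assumptions made for the example; a real server sends more headers than the one shown.

```python
def build_get_request(host, port, path):
    """Return the bytes a browser might send to ask for `path` (HTTP/1.0 style)."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}:{port}",
        "",   # the empty line marks the end of the request headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Split a raw HTTP response into (status line, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


request = build_get_request("tucotuco.cs.indiana.edu", 19800, "/hello.html")

# A hand-written sample response (an assumption, not captured from the server).
sample = (b"HTTP/1.0 200 OK\r\n"
          b"Content-type: text/html\r\n"
          b"\r\n"
          b"<html> ... </html>")
status, headers, body = split_response(sample)
print(request.decode("ascii"))
print(status, headers["content-type"])
```

The blank line plays the same role in both directions: it ends the request headers sent by the browser and separates the response headers from the HTML data sent by the server.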
When the server receives a request to access the database it passes the request to a gateway program which does whatever is necessary to get the data and return the results to the server.
The server then repackages the information from the script and forwards it back to the client. (In a sense the server acts as a sort of translator, taking data from either a file or a script and providing it to browsers in a consistent and uniform manner.)
We make two observations now:
So the process of servicing the
http://tucotuco.cs.indiana.edu:19800/cgi-bin/hello
request is different for the server, because by the shape of the request it
realizes that it needs to execute the script specified by that address
(or path). Upon starting the script, the server provides it with a variety
of potentially useful information (such as the name of the machine from which
the request originated, the type of browser used, etc.). Additional data may be
passed by the server to the script but we'll cover that later. What follows is of no concern to the server, other than the output of the script, which the server will send back to the requesting browser.
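As a small illustration of the information handed to a script, the Python sketch below reads two of the standard CGI environment variables, REMOTE_HOST and HTTP_USER_AGENT, and reports them. The function name and the sample values are made up for the example, and a server may leave either variable unset, so the sketch falls back to placeholder text.

```python
import os


def describe_request(env=os.environ):
    """Build a script response that reports who is asking and with which browser."""
    host = env.get("REMOTE_HOST", "unknown host")
    browser = env.get("HTTP_USER_AGENT", "unknown browser")
    return ("Content-type: text/plain\n"
            "\n"
            f"Request from {host} using {browser}\n")


# Sample values (hypothetical); under a real server they come from os.environ.
sample_env = {
    "REMOTE_HOST": "client.example.edu",
    "HTTP_USER_AGENT": "ExampleBrowser/1.0",
}
print(describe_request(sample_env), end="")
```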
You take a lot of responsibility this way if you're writing the script.
While the server doesn't care how the script generates its output, it does
need to know the format of the output - the script's output is, after all,
the server's input. Recall that when the web server delivers a static file
to the browser, it uses a filename extension to determine what to return in
the Content-type header.
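The extension lookup that the server performs for static files can be sketched as follows. This is a minimal Python illustration, not the server's actual code: the table holds only three entries, and the fallback type is an assumption chosen for the example.

```python
import os

# A tiny extension-to-media-type table (illustrative; real servers use a much longer one).
MEDIA_TYPES = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".gif": "image/gif",
}


def content_type_for(filename, default="application/octet-stream"):
    """Pick the Content-type header value from the file's extension."""
    extension = os.path.splitext(filename)[1].lower()
    return MEDIA_TYPES.get(extension, default)


print("Content-type:", content_type_for("hello.html"))
```

A script named hello has no extension for such a lookup to use, and in any case the server cannot tell from a script's name what the script will print.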
The hello script:
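#!/usr/bin/perl
# The interpreter path above is an assumption; adjust it for your system.
# The blank line after the Content-type header ends the headers.
print qq{Content-type: text/html

<html>
<head>
<title> Hello world! </title>
</head>
<body>
<h1> Hello world! </h1>
<p> How are you doing? </p>
</body>
</html>};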
[figure 2 here]
The information flow is as follows:
The user requests http://tucotuco.cs.indiana.edu:19800/cgi-bin/hello by clicking a hyperlink or typing the URL into the browser.
The browser's request for /cgi-bin/hello is sent to tucotuco.cs.indiana.edu on port 19800.
The server receives the request, and from the shape of the path (/cgi-bin/) it determines it should run the script.
The script starts its output with a Content-type header to indicate the format of the data to the server, for example: Content-type: text/html. The headers are then followed by the HTML text generated by the script.
The headers (Content-type) and data go directly from the program to the server.
The server passes on to the browser the Content-type header from the script. Following the headers is the actual HTML text from the script.
The Content-type header tells the browser that the data is HTML, so the browser formats and renders the text appropriately, including highlighting links.
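The decision the server makes from the shape of the request can be sketched as a small Python function. This is only an illustration under stated assumptions: the document root is the one given earlier for dgerman, while the location of the scripts directory (CGI_DIR) is a guess, and the function name is hypothetical.

```python
import posixpath

DOC_ROOT = "/u/dgerman/httpd/htdocs"     # document root given earlier in the text
CGI_DIR = "/u/dgerman/httpd/cgi-bin"     # assumed location of the scripts directory


def resolve(url_path):
    """Return ("script" or "static", file system path) for a request path."""
    if url_path.startswith("/cgi-bin/"):
        name = url_path[len("/cgi-bin/"):]
        return "script", posixpath.join(CGI_DIR, name)
    if url_path.endswith("/"):
        url_path += "index.html"          # default document for a directory
    return "static", DOC_ROOT + url_path


for path in ("/", "/hello.html", "/cgi-bin/hello"):
    print(path, "->", resolve(path))
```

A "static" result means the file is read and sent back as is; a "script" result means the program at that path is run and its output is sent back instead. The sketch leaves out everything else a real server must handle, such as access checks.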
The script must output a Content-type header which identifies
the Media Type of the script output. The header must be followed
by a blank line.
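To see why the blank line matters, here is a minimal Python sketch of how script output might be split into headers and data. It is not the server's actual code; the function name and the error messages are assumptions made for the example.

```python
def split_script_output(output):
    """Split a script's output into (headers dict, body) at the first blank line."""
    head, separator, body = output.partition("\n\n")
    if not separator:
        raise ValueError("no blank line after the headers")
    headers = {}
    for line in head.split("\n"):
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip().lower()] = value.strip()
    if "content-type" not in headers:
        raise ValueError("missing Content-type header")
    return headers, body


good = "Content-type: text/html\n\n<html> <body> <h1> Hello world! </h1> </body> </html>\n"
headers, body = split_script_output(good)
print(headers["content-type"])

# Without the blank line there is no way to tell where the headers end.
bad = "Content-type: text/html\n<html> <body> no blank line </body> </html>\n"
try:
    split_script_output(bad)
except ValueError as error:
    print("rejected:", error)
```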
We have identified the conf directory and three configuration files:
httpd.conf, which contains information that the server needs at startup
srm.conf, which contains information that the server needs while it runs
access.conf, which describes which users have access to which files on your site
httpd.conf
srm.conf)
access.conf
for the document root and scripts directories.
We described how the server can be started and stopped.
We mentioned that we need to be able to restart the server
automatically if we need to be on-line continuously, and we have
said we will describe ways to do that using cron.
The process id (pid) of the http daemon is located
in a file httpd.pid in the logs
directory.
The same directory contains a file where accesses are logged and a file where errors are recorded. We looked at them and described them briefly.