Project 3: Network Path Diagnostics (NPD)
Assigned: 11/05/09, Due: 11/24/09
Project Goal
The traceroute utility is commonly used for network path
diagnostics and topology inference. In this project, you will build a toolkit
that leverages traceroute to provide various pieces of information
about network paths.
Project Specification
Input
Your program, npd, must be written in C or C++.
It must run on CS Linux machines and accept the following four kinds
of inputs:
- a host name
- an IP address
- a file containing multiple host names, one on each line
- a file containing multiple IP addresses, one on each line
For each type of input, your program must confirm that the destination IP
or host name is syntactically valid (composed only of letters, numbers, the
".", and the "-" characters).
Output
If the input is a host name or an IP address, your program should
execute the traceroute command and report on the following
statistics:
- Traceroute complete: Determine if the traceroute
reached its destination by the following criteria:
- If the last entry of the traceroute has the same name
or IP address as in the request, report that the
traceroute was complete.
- Many hosts on the Internet have multiple names,
and traceroute may not report the one you expect. If you started
with a host name and the host name reached by the traceroute does
not match, you should get the IP address for the host name using
the gethostbyname function. If the IP address from this
function matches that in the last entry of the traceroute, report
that the traceroute was complete.
- If neither of the above criteria are met, report that the
traceroute was incomplete.
- Extent of reachability: If the traceroute does
not complete, infer the extent of reachability using the following criteria,
which help determine if the reachability was hurt closer to the destination or
somewhere in the core of the Internet:
- Reaches BGP prefix: The traceroute may
reach the organization but not the exact intended host. Find the IP
address of the last hop of the traceroute where one was provided (the
last non-starred hop). Modify your trie code from Project 2 to return the prefix an
address is in instead of its next hop. Run the longest prefix
match algorithm on both the destination IP address and the one
actually reached using the prefixes from Project
2's load.txt file. If the BGP Prefixes match, report that the
traceroute successfully reached the BGP prefix of the destination.
- Reaches /24: If the input IP address and the IP address of
the last non-starred hop in the traceroute output share a /24, report
that the traceroute reaches the /24.
- Reaches domain: If the input was an IP address,
find the corresponding host name using the gethostbyaddr
function. Then compare the domain name in the last
non-starred hop of the traceroute to the domain name extracted from
the input host name. If they are the same, report that the
traceroute reaches the domain. Recall that extracting the
domain name is not as simple as extracting the last two
labels from a host name. For example, for host name
www.foo.com, the domain name indeed is the last two labels,
foo.com, but for www.foo.co.uk, it is
foo.co.uk. In order to extract the correct domain name, you
first have to extract the correct effective top-level domain
(TLD). Once you have that, composing the domain name is simply a
matter of prepending the next label to the left. For example, the
effective TLD for www.foo.com is .com and that for
www.foo.co.uk is .co.uk. Use the provided file to infer effective
TLDs.
- Host, network, protocol unreachable or communication
prohibited: The traceroute may not complete if the host,
network, or the protocol are unreachable or if the communication is prohibited.
These are indicated by the occurrence of !H, !N, !P, and !X respectively.
Report on cases where these reasons prevent the traceroute from
completing.
- Hops exceeded: The traceroute may not
complete if the maximum number of hops are exceeded. When running
traceroute, make sure that you set the maximum hops to 64. Report on
cases where this number of hops is reached.
- Number of hops: Report on the number of hops traversed
irrespective of whether the traceroute was successful or not. This
statistic gives you an idea of average path length in the Internet.
- Starred hops: For both successful and unsuccessful
traceroutes, report on the percentage of hops that only returned *s.
This statistic gives you an idea of how often traceroute is blocked or
incorrectly implemented.
- Number of unique networks traversed: Using the IP
addresses and host names contained in the traceroute output, find out
the unique networks traversed in the following three ways:
- Number of unique BGP prefixes traversed: Using the IP
addresses contained in each row of the traceroute output and longest
prefix match on the BGP routing tables as before, find the lower
bound on the number of unique BGP prefixes traversed. Compare this number
to total number of hops and number of un-starred hops respectively.
- Number of unique /24s traversed: Using the IP addresses
contained in each row of the traceroute output, find the lower
bound on the number of unique /24s traversed. Compare this number to
total number of hops and number of un-starred hops respectively.
- Number of unique domains traversed: Using the host names
contained in each row of the traceroute output, find the lower
bound on the number of unique domains traversed. If the host name
corresponding to an IP address is missing, ignore that entry. Compare this
number to total number of hops and number of un-starred hops respectively.
- Time: Report on the total time taken for the
traceroute to finish, both for successful as well as unsuccessful
traceroutes. This time should be parsed from the traceroute output,
not measured by your program. Due to the possibility of *s, this time would be
a lower bound. Ignore '!' if it shows up after a valid time entry in a row of
traceroute output.
Statistics for files
If the input is a file containing host names or IP addresses, your program
should still report on the same statistics as you would report for individual
host names and IP addresses, except that you should now provide
aggregate statistics, separately for failed and successful
traceroutes. For example, instead of reporting on the number of hops
traversed for individual destinations, report on how many successful and
unsuccessful traceroutes went how many hops. Be sure to quote the
total number each of successful and unsuccessful traceroutes.
Guidelines and Resources:
- Familiarize yourself with the traceroute utility. Public
traceroute servers are available at http://www.traceroute.org/. You can
also run traceroute from the command line on most CS machines.
- Consult traceroute man pages to learn about the utility. An
example of such a man page is http://www.zytek.com/traceroute.man.html.
Feel free to avail other resources on the Web and consultations with
colleagues. However, make sure to write all code independently and cite all
resources you use, including web tutorials and discussions with individuals
outside of your group.
- Your program should simply execute the traceroute
utility on the supplied destination and display the required
statistics. You should NOT develop your own traceroute
implementation. You can use the popen function to run
traceroute and read its output into your program. You should
not display the traceroute output.
Extra Credit
For 10% extra credit, add the following statistics to your output for all
types of input.
- Hops traversed at edges: To determine how the
total hops in the traceroute output compare with those close
to source and destination, find out the number of hops at the same BGP
prefix, /24, and domain as the source and destination IP address and
domain. For unsuccessful traceroutes, the number matching the
destination may be zero.
- When computing the total time traceroute takes to complete, also
report on how it compares with to the following times:
- Time taken to traverse the edges: Use the same three
notions of edge as used in finding out the hops traversed at the
edges. Compare this to the time taken to traverse the rest of the hops.
- Time taken to traverse the slowest/congested/longest link:
Report on the IP address and host name, as given in the traceroute
output, to find the link that takes the most time to traverse, either because
it is slow, congested, or simply long. If you encounter cases where a certain
hop appears to take negative time, ignore them.
Deliverables and Grading
The grading will be based on the following three deliverables, all of which
are due by 11:55 pm on the due date:
- Project report: Prepare a 2-page report detailing your
implementation. Be sure to state any assumptions you made in your program and
all of its known limitations, including portions that are not implemented or do
not work correctly. The report should use a 10-point font, with margins on
either of the four sides not exceeding 1 inch. Submit the report in
pdf format with your code (more details below). No other format,
e.g., doc, will be accepted. Also, print your project report
double-sided and bring the hard copy to the demo.
- Code submission and demo: Submit your code,
and project report files as a single archive file
(.tar or .tar.gz file formats only) via OnCourse. The demo slots will be
posted by the AIs on the web board soon after the submission deadline.
Schedule an appointment with them to demonstrate your project. Failure
to attend your scheduled demonstration timeslot will result in a 10
point reduction in your grade. Each group is required to complete a
demonstration in order to receive credit for the project.
- Partner evaluation: Evaluate your project
partner. Send your evaluation to the
email address. Failure to submit a
partner evaluation by the project submission deadline will result in a 10 point
reduction in your grade.