Project 2: Wiretap
Assigned: 10/10/11, Due: 10/28/11
Project Goal
Programs such as tcpdump (http://www.tcpdump.org/) and Wireshark
(http://www.wireshark.org/)
allow the interception and analysis of packets being transmitted or
received over a LAN. One prominent use of this information is in
troubleshooting network configuration and reachability. In this
project, we will provide you with data captured using these tools.
Your task is to write analysis routines similar to those provided
by tcpdump and Wireshark. Your
program must be written in C/C++ and should work on the Burrow or
Sharkestra Linux machines.
Project Specification
Your program, wiretap, should take a file containing
tcpdump data as its input and output the statistics detailed
later in this document. Since this data is not in human-readable format, you
will have to use the Packet Capture Library, libpcap.a, and
the functions in its header file, pcap.h (found in /usr/include on the
CS Linux machines) to read the data. When compiling your program, link against
the pcap library by passing -lpcap to your GNU compiler after the source files,
since the linker resolves libraries in command-line order. For example,
gcc -o wiretap wiretap.c -lpcap will compile a C program with pcap library
support. For C++, simply change the compiler from gcc to g++. The other steps
you should follow are:
- Open an input file using function pcap_open_offline().
- Check that the data you are provided has been captured from Ethernet using
function pcap_datalink().
- Read packets from the file using function pcap_loop(). Note
that this function needs to be called only once. It takes 4 arguments. Of
these, the second and the third arguments are of most interest to you. The
second argument lets you specify how many packets to read from the file. The
third argument, pcap_handler callback, is where most of the action
happens. Here, callback is the function you write to process data
from each packet.
You can pass the callback function to the pcap_loop()
function simply by giving its name as the appropriate argument to
pcap_loop(). The callback function must be a void function
that takes three arguments, of the types u_char *, const struct
pcap_pkthdr *, const u_char *. The callback is called by
pcap_loop() once for each packet. The second argument
to the callback is the special libpcap header, which can be used to
extract the entire packet length and the packet arrival time (see the
pcap_pkthdr structure in /usr/include/pcap.h). The
third argument contains the contents of a single packet (from the
Ethernet packet header onward).
- Close the file using function pcap_close().
Packet format
Each packet in the file(s) provided to you is in
tcpdump format. It contains a tcpdump-specific header, an
Ethernet header, followed by network layer headers and their payloads. At the
network layer, the captures will have IP and other protocols. TCP and UDP will
both be present at the transport layer. We expect you to process Ethernet, IP,
TCP, and UDP headers. The packets will also contain application data, but you
do not need to process those headers. You will need to understand each of the
header formats to accomplish this task. You are encouraged to reuse the
structures from the relevant Ethernet, IP, TCP, and UDP header files in the
/usr/include/net and /usr/include/netinet
directories on the CS Linux machines. These files contain structures
used by the Linux operating system for actual packets. They can be
used to greatly simplify the process of parsing the packets. However,
you are still free to implement the structures on your own, if you
wish.
Extra Functionality Required of P538 Students: In addition
to the above, your program will be tested on statistics based on processing ARP
headers at the network layer and DHCP at the application layer.
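For the ARP part, one possible approach is to read the fields by byte offset, as in the sketch below. It assumes ARP over Ethernet and IPv4 (a 28-byte ARP header that follows the 14-byte Ethernet header) and extracts only the sender's addresses; the names arp_participant and parse_arp_sender are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

#define ARP_OFFSET 14  /* ARP follows the 14-byte Ethernet header */
#define ARP_LEN    28  /* ARP header size for Ethernet + IPv4 */

/* Hypothetical record for one ARP sender. */
struct arp_participant {
    char mac[18];        /* hex-colon format */
    char ip[16];         /* a.b.c.d notation */
    unsigned operation;  /* 1 = request, 2 = reply */
};

/* Returns 1 if the frame is an Ethernet/IPv4 ARP packet, 0 otherwise. */
int parse_arp_sender(const unsigned char *frame, size_t len,
                     struct arp_participant *out)
{
    const unsigned char *arp;

    if (len < ARP_OFFSET + ARP_LEN)
        return 0;
    if (frame[12] != 0x08 || frame[13] != 0x06)  /* EtherType 0x0806 */
        return 0;
    arp = frame + ARP_OFFSET;
    /* Hardware type 1 (Ethernet), protocol type 0x0800 (IPv4),
     * hardware address length 6, protocol address length 4. */
    if (arp[0] != 0x00 || arp[1] != 0x01 ||
        arp[2] != 0x08 || arp[3] != 0x00 ||
        arp[4] != 6 || arp[5] != 4)
        return 0;

    out->operation = (unsigned)((arp[6] << 8) | arp[7]);
    snprintf(out->mac, sizeof out->mac, "%02x:%02x:%02x:%02x:%02x:%02x",
             arp[8], arp[9], arp[10], arp[11], arp[12], arp[13]);
    snprintf(out->ip, sizeof out->ip, "%d.%d.%d.%d",
             arp[14], arp[15], arp[16], arp[17]);
    return 1;
}
```

The target addresses sit in the ten bytes after the sender fields (arp[18] through arp[27]) and could be extracted the same way.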
Program output
The callback function should gather statistics from each packet to
enable your program to print the following on standard out:
- Start date and time, total duration, and total number of packets in the
packet capture
- Average, minimum, and maximum packet sizes. Here, packet refers to
everything beyond the tcpdump header.
- Unique Ethernet addresses found as both sources and destinations, along
with the total number of packets containing each address. Represent Ethernet
addresses in hex-colon format.
- Unique Network layer protocols seen, and how many packets use them. Report
them by protocol number, except for IP.
- Unique source and destination IP addresses, along with the total number of
packets containing each address. Represent IPv4 addresses in the standard
a.b.c.d notation. Ignore IPv6 addresses.
- Unique Transport layer protocols seen, and how many packets use
them. Report them by protocol number, except for TCP and UDP.
- Unique source and destination TCP and UDP ports, along with the total
number of packets containing each port number.
- Determine the number of UDP packets with a correct checksum, an incorrect
checksum, and those that omit checksum calculations. You may not use a checksum
program available online. Write your own.
- For TCP, report the number of packets containing each flag.
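For the TCP flag counts, one possible approach is sketched below. It relies on the standard TCP header layout, in which the flag bits occupy the byte at offset 13; the names tcp_flag_names and count_tcp_flags are hypothetical, and the caller is assumed to pass a pointer to the start of the TCP header.

```c
#include <stddef.h>

enum { TCP_FLAG_COUNT = 8, TCP_MIN_HEADER = 20 };

/* Flag names indexed by bit position within the flags byte
 * (bit 0 is the least significant bit). */
const char *const tcp_flag_names[TCP_FLAG_COUNT] = {
    "FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR"
};

/* Adds one to counts[i] for every flag bit i set in this header.
 * Returns 1 if the header was long enough to inspect, 0 otherwise. */
int count_tcp_flags(const unsigned char *tcp, size_t len,
                    unsigned long counts[TCP_FLAG_COUNT])
{
    unsigned char flags;

    if (len < TCP_MIN_HEADER)
        return 0;
    flags = tcp[13];  /* byte 13 of the TCP header holds the flag bits */
    for (int i = 0; i < TCP_FLAG_COUNT; i++) {
        if (flags & (1u << i))
            counts[i]++;
    }
    return 1;
}
```

After the capture has been processed, printing tcp_flag_names[i] next to counts[i] for each index gives the per-flag report.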
Extra Statistics Required of P538 Students:
- Unique ARP participants, their associated MAC addresses, and IP
addresses.
- Unique DHCP clients and servers, listed separately. Indicate the
number of packets transmitted by each client or server.
- Report the number of DHCP packets that are DHCPDISCOVER,
DHCPOFFER, DHCPREQUEST, and DHCPACK.
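For classifying DHCP packets, the message type is carried in option 53 of the options area, which begins after the 236-byte fixed header and the 4-byte magic cookie. The sketch below walks the options to find it. The function name dhcp_message_type is hypothetical, the caller is assumed to pass a pointer to the start of the UDP payload, and options overloaded into the sname or file fields are not handled.

```c
#include <stddef.h>
#include <string.h>

#define DHCP_COOKIE_OFFSET  236  /* size of the fixed BOOTP-style header */
#define DHCP_OPTIONS_OFFSET 240  /* fixed header plus 4-byte magic cookie */

/* Returns the DHCP message type (1 = DHCPDISCOVER, 2 = DHCPOFFER,
 * 3 = DHCPREQUEST, 5 = DHCPACK, ...) or 0 if option 53 is not found. */
int dhcp_message_type(const unsigned char *dhcp, size_t len)
{
    static const unsigned char cookie[4] = {0x63, 0x82, 0x53, 0x63};
    size_t i = DHCP_OPTIONS_OFFSET;

    if (len < DHCP_OPTIONS_OFFSET ||
        memcmp(dhcp + DHCP_COOKIE_OFFSET, cookie, sizeof cookie) != 0)
        return 0;

    while (i < len) {
        unsigned char code = dhcp[i];
        unsigned char optlen;

        if (code == 255)        /* end option */
            break;
        if (code == 0) {        /* pad option: a single byte, no length */
            i++;
            continue;
        }
        if (i + 1 >= len)       /* length byte missing */
            break;
        optlen = dhcp[i + 1];
        if (i + 2 + optlen > len)  /* option data runs past the packet */
            break;
        if (code == 53 && optlen == 1)
            return dhcp[i + 2];
        i += 2 + (size_t)optlen;
    }
    return 0;
}
```

A counter array indexed by the returned value is one simple way to tally DHCPDISCOVER, DHCPOFFER, DHCPREQUEST, and DHCPACK packets.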
Test files
The following are two packet captures you should use for testing purposes.
Note that your program should work on packet captures that have additional
protocols your program does not understand. Additional test files will be
used during the demo.
Miscellaneous
- If you have root access on any machine, install
Wireshark and experiment with it.
- Resource: http://www.tcpdump.org/pcap.htm
- Use the ntohs and ntohl functions as appropriate to
read values that span multiple bytes. This ensures that the bytes are in
proper byte order (Endianness) for use at this host.
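As a small illustration of the byte-order point, the sketch below reads the 16-bit UDP header fields with ntohs and the 32-bit TCP sequence number with ntohl. The names udp_fields, read_udp_fields, and read_tcp_seq are hypothetical; the offsets follow the standard UDP and TCP header layouts.

```c
#include <arpa/inet.h>  /* ntohs, ntohl */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for the UDP fields, in host byte order. */
struct udp_fields {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;
};

/* Reads the first three 16-bit fields of a UDP header.
 * Returns 1 on success, 0 if fewer than 8 bytes are available. */
int read_udp_fields(const unsigned char *udp, size_t len,
                    struct udp_fields *out)
{
    uint16_t raw[3];

    if (len < 8)
        return 0;
    memcpy(raw, udp, sizeof raw);  /* copy to avoid unaligned access */
    out->src_port = ntohs(raw[0]);
    out->dst_port = ntohs(raw[1]);
    out->length   = ntohs(raw[2]);
    return 1;
}

/* Reads the 32-bit sequence number at offset 4 of a TCP header.
 * The caller must ensure at least 8 bytes are available. */
uint32_t read_tcp_seq(const unsigned char *tcp)
{
    uint32_t raw;

    memcpy(&raw, tcp + 4, sizeof raw);
    return ntohl(raw);
}
```

Without the conversion, a source port of 53 stored as the bytes 0x00 0x35 would be read as 13568 on a little-endian host.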
Deliverables and Grading
Submit your code and project files as a single archive file (.tar or
.tar.gz file formats only) via OnCourse.
Shortly after the submission deadline, demo slots will be posted on the
Demonstration Scheduling System (a reminder will be posted on the Web Board).
You must schedule an appointment to demonstrate your project.
Groups that fail to demonstrate their project will not receive credit for the
project. If a group member fails to attend his or her scheduled demonstration
time slot, that member's grade will be reduced by 10 points.
In addition to testing your code for various test cases, the AIs will be
explicitly evaluating the contributions of individual project partners. In
cases where they determine that partners have not contributed equally,
differential grading will be used. The instructor and the AIs reserve the
right to determine an appropriate penalty in such cases.