Distributed Honeynet Mini-Howto

Camilo Viecco


This howto explains how to set up a set of 'roo' honeywalls to share data in a distributed fashion. There is no distributed management of the system.
The data that can be shared this way includes the hflow database (sebek data, argus data, p0f, processes) and the pcap data.
There is no sharing of the iptables system logs.

In particular, this minihowto explains how to use several scripts that allow such data distribution.

Objectives

While Dave Dittrich had some interesting ideas when initially developing the idea of distributed honeyports, we diverged from them because we are initially concerned only with distributed data and not with command and control. Two objectives were in mind for this solution:

  1. Database unification: Use one single walleye interface to monitor several running honeywalls.
  2. Honeywall decommission: Keep the data from a honeywall available when that honeywall needs to be reinstalled.

We also wanted the following properties:

  1. Data integrity: The data should always follow the most authoritative version available.
  2. Data consistency: Repeating an operation should always leave the data in a consistent state that is at least as accurate as before.
  3. Performance: We would like economy in terms of volume moved around the network.
  4. Flexibility: The solution should address both objectives stated above.

Model and how it is done

To achieve this, two types of systems were defined: roos and lumpies. A 'roo' is a honeywall system and holds data for only one sensor (one honeywall); a 'lumpy' is a system that contains data for more than one sensor. A lumpy requests data from roos or other lumpies; a roo only sends data. The data distribution is modeled as a directed acyclic graph (DAG) in which each node with no outgoing links is a roo and every other node (a node with at least one outgoing link) is a lumpy. Data flows from the roos to the lumpies, so nodes closer to a roo are more authoritative and more current in terms of data synchronization. The farther a lumpy is from the roos in the hierarchy, the more aggregate, but the less authoritative, its data.
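The model above can be sketched in a few lines of Python. This is only an illustration, not part of the distributed scripts: it assumes that a link points from a lumpy to each system it requests data from, and the node names and the function classify are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def classify(pulls_from):
    """Label every node as 'roo' or 'lumpy'.

    pulls_from maps a node to the set of nodes it requests data from
    (its outgoing links). A node with no outgoing links is a roo.
    """
    nodes = set(pulls_from)
    for targets in pulls_from.values():
        nodes |= set(targets)
    try:
        # Forcing the iteration makes graphlib check for cycles.
        tuple(TopologicalSorter(pulls_from).static_order())
    except CycleError as exc:
        raise ValueError("the pull graph must be acyclic") from exc
    return {n: ("lumpy" if pulls_from.get(n) else "roo") for n in nodes}


# Hypothetical topology: one top-level lumpy above a lumpy and a roo.
topology = {
    "lumpy_top": {"lumpy_a", "roo_3"},
    "lumpy_a": {"roo_1", "roo_2"},
}
roles = classify(topology)
for name in sorted(roles):
    print(name, roles[name])
```

Here lumpy_a is closer to roo_1 and roo_2 than lumpy_top is, so its copy of their data would be the more authoritative one.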

Roo and lumpy data hierarchy

A lumpy system can also work as a data repository for decommissioned honeywalls.

The tradeoff between data transfer volume and 'currency' (with respect to time) of the data is handled by requiring each lumpy to keep a copy of the database data and to proxy pcap_data requests to the next (lower) level in the hierarchy. If the lumpy is itself the pcap_data repository, as in the case of a decommissioned honeywall, the pcap request is served by the lumpy system.
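The routing rule for pcap requests can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic of the distributed pcap_api.pl: the function name and its arguments are hypothetical, and access_via is borrowed from the column of the same name in the sensor table.

```python
def route_pcap_request(sensor_id, local_pcap_sensors, access_via):
    """Decide where a pcap request for sensor_id is answered.

    local_pcap_sensors: sensor ids whose pcap data is stored on this lumpy
                        (for example, decommissioned honeywalls).
    access_via:         maps a sensor id to the address of the next (lower)
                        system in the hierarchy.
    """
    if sensor_id in local_pcap_sensors:
        # This lumpy is the pcap repository: serve the request itself.
        return ("local", None)
    # Otherwise forward the request one level down the hierarchy.
    return ("proxy", access_via[sensor_id])


# Hypothetical sensor ids and address.
print(route_pcap_request(1001, {1001}, {1002: "10.0.0.2"}))
print(route_pcap_request(1002, {1001}, {1002: "10.0.0.2"}))
```

Only the database rows are copied upward; the bulky pcap data stays where it is until someone asks for it, which is what keeps the transferred volume low.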

The data integrity tradeoff is solved by having each lumpy request data from a system lower in the hierarchy. Thus these scripts work in a "pull" fashion, and network connectivity is required between each lumpy and each roo or lumpy it collects data from.
Data consistency is achieved by requiring each roo in the hierarchy to have a unique sensor id. A stock roo does not guarantee this, but there are helpers to achieve it (see the roo setup section below).

Requirements

For each 'roo':

  1. A running honeywall setup. Each 'roo' MUST have a unique sensor id.
  2. A copy of Dbcopy3.pl placed in walleye's root directory.

For each 'Lumpy':

  1. A running MySQL server version 4.1 or higher
  2. A database with the walleye_0_3 schema and the distributed patch (here).
  3. A working walleye data interface.
  4. A copy of dbget2.pl
  5. A distributed version of pcap_api.pl
  6. (optional) an active cron system.

Setting up the systems

Common roo setup

I consider a roo to be an 'incarnation' of a roo system, where an 'incarnation' is a roo setup in which the DB system has been initialized. Therefore, a roo installed on 10.0.0.1 on January 10 and another roo installed on the same system with the same information on February 13 are considered two different roos.
This ensures database consistency and allows the data of the two incarnations to be uniquely identified.

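The roo's setup is made of three steps:

  1. Give the roo a unique sensor id.
  2. Copy the DBcopy3.pl script into walleye's root directory.
  3. Add the IP address of the lumpy so that it can access the data.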

However, step 1 is not so easy, as the current roo uses its management interface IP address as its sensor_id. This information is used by hflow and sebekd to insert correct values into the database. To generate an approximately unique sensor id, I have developed the following script: makesensor_id.pl.
This script takes one input, the current management IP address, and generates a sensor id (a 32-bit unsigned integer) with the following layout (counting bits from 0 to 31, with bit 31 the most significant):
  bit 31: zero (0). This allows manually chosen sensor ids that are guaranteed not to duplicate the generated ones.
  bits 30-16: the number of days since the epoch.
  bits 15-8: a hash of the management interface IP address.
  bits 7-0: a random number.
A more detailed explanation of this logic is on the honeynet wiki, but it has to do with ensuring uniqueness between incarnations of the same system and with the birthday paradox.
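The bit layout can be illustrated with a short Python sketch. It is not a port of makesensor_id.pl: the text does not say which hash the script applies to the IP address, so the XOR of the four octets used here is purely an assumption, and the names ip_hash8 and make_sensor_id are hypothetical.

```python
import random
import time


def ip_hash8(ip):
    """Fold a dotted-quad address into 8 bits.

    ASSUMPTION: XOR of the octets. The hash used by makesensor_id.pl
    is not described in this howto.
    """
    h = 0
    for octet in ip.split("."):
        h ^= int(octet)
    return h & 0xFF


def make_sensor_id(manager_ip, now=None, rand8=None):
    """Pack days since the epoch, an IP hash and a random byte into 31 bits."""
    seconds = time.time() if now is None else now
    days = int(seconds // 86400) & 0x7FFF          # bits 30-16 (15 bits)
    rnd = random.getrandbits(8) if rand8 is None else rand8 & 0xFF
    # Bit 31 is never set, so it stays zero.
    return (days << 16) | (ip_hash8(manager_ip) << 8) | rnd


sensor_id = make_sensor_id("10.0.0.1")
print(sensor_id, format(sensor_id, "032b"))
```

The day count separates incarnations installed on different days, the hash separates different machines installed on the same day, and the random byte lowers the chance of a collision when both of those coincide.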

To run this script for a 'roo' whose management interface is 10.0.0.1, do:
%>./makesensor_id.pl -i 10.0.0.1

The generated sensor id is stored in /hw/conf/HwSENSOR_ID.

Once this file exists with a unique sensor id, replace the hflowd and sebekd startup scripts with the following versions (the only difference is that they use HwSENSOR_ID in the initialization parameters, replacing the old HwMANAGER_IP):
hflowd sebekd

This is best done at system setup time (just after installing, on the first boot of the system as a roo).

After all this is done, copy the DBcopy3.pl script into walleye's root directory. This script is used to copy the database data (incrementally) to the lumpy.
Finally, add the IP address of the lumpy so that it can access the data.

Common Lumpy setup

I have set up a 'Lumpy' only on Linux systems; however, there is no intrinsic limitation to Linux. Here are the steps common to both setups.

  1. Install a MySQL server version 4.1 or above (go to the MySQL website and follow the instructions for your system).
  2. Install the honeywall's hflow schema in the system.
  3. Update the schema with the distributed DB patch (here).
  4. Create the directory /var/log/pcap. Make sure it is readable and writable by the appropriate apache user.
  5. Copy the walleye perl modules into your system
  6. Copy the walleye web scripts into the system
  7. Verify you can access the walleye interface.
  8. Copy dbget2.pl to a "safe" location.
  9. Copy a distributed version of pcap_api.pl into the root directory of walleye.
  10. Insert the sensor id information for remote systems.
    mysql> use walleye_0_3;
    mysql> insert into sensor(sensor_id,state,name,access_via,login,passwd)
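    values(REMOTE_ID,2,NAME,IPADDRESSOFSENSOR,WALLEYE_LOGIN_FOR_ROO,
    WALLEYE_PWD_FOR_ROO);
    (Replace the upper-case placeholders with the values for the remote roo; string values must be quoted.)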
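If you prefer to register sensors from a script, the same insert can be issued with bound parameters so that quoting is handled for you. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for the MySQL server, the column types of the stand-in table are assumptions (only the column names and the state value 2 come from the statement above), and all values are hypothetical. A MySQL driver would typically use %s instead of ? as the placeholder.

```python
import sqlite3


def create_stand_in_table(conn):
    # ASSUMPTION: column types are guessed; the real table is defined
    # by the walleye_0_3 schema plus the distributed patch.
    conn.execute(
        "CREATE TABLE sensor (sensor_id INTEGER PRIMARY KEY, state INTEGER,"
        " name TEXT, access_via TEXT, login TEXT, passwd TEXT)"
    )


def register_sensor(conn, sensor_id, name, access_via, login, passwd):
    """Insert one remote sensor, mirroring the manual statement (state = 2)."""
    conn.execute(
        "INSERT INTO sensor (sensor_id, state, name, access_via, login, passwd)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (sensor_id, 2, name, access_via, login, passwd),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_stand_in_table(conn)
# Hypothetical values for one remote roo.
register_sensor(conn, 851970000, "roo-lab", "10.0.0.1", "walleye_user", "secret")
print(conn.execute("SELECT sensor_id, state, name FROM sensor").fetchall())
```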

Now, depending on whether the lumpy will hold local pcap data, follow one of the two procedures below to start importing data.

Setting up a lumpy with no local pcap data

Just run dbget2.pl. If you want, you can add it to the cron system so that it runs periodically.

Setting up a lumpy with some pcap data

Copy all the pcap data from the roo's data repository into a per-sensor directory on the lumpy:

  1. Make a directory "/var/log/pcap/sensor_SENSOR_ID_FOR_SENSOR".
  2. Copy the remote sensor data into "/var/log/pcap/sensor_SENSOR_ID_FOR_SENSOR", so that everything found under the remote "/var/log/pcap/" ends up under "/var/log/pcap/sensor_SENSOR_ID_FOR_SENSOR".
  3. Run dbget2.pl.
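The directory mapping in step 2 can be expressed as a small Python helper. This is a sketch under assumptions: the function name and the example subdirectory are hypothetical, and the sensor id is assumed to appear in decimal in the directory name.

```python
from pathlib import PurePosixPath

PCAP_ROOT = PurePosixPath("/var/log/pcap")


def local_pcap_path(sensor_id, remote_path):
    """Map a path under the roo's /var/log/pcap to its place on the lumpy."""
    # relative_to raises ValueError if remote_path is not under PCAP_ROOT.
    rel = PurePosixPath(remote_path).relative_to(PCAP_ROOT)
    return PCAP_ROOT / f"sensor_{sensor_id}" / rel


# Hypothetical sensor id and file name.
print(local_pcap_path(851970000, "/var/log/pcap/20060110/log"))
```

The relative layout below the remote "/var/log/pcap/" is preserved, only one per-sensor directory level is added on the lumpy.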

Coda

That is all. The complete package is here, and more information can be found in the readme. I forgot to mention that dbget2.pl has some unusual dependencies on a few Perl modules; please read the notes inside dbget2.pl while I update the readme.

I hope this is sufficiently clear.

Camilo Viecco


