Camilo Viecco
This howto explains how to set up a group of 'roo' honeywalls so that
they share data in a distributed fashion. There is no distributed
management of the system.
The data that can be shared this way includes the hflow database
(sebek data, argus data, p0f, processes) and the pcap data.
There is no sharing of the iptables system logs.
In particular, this mini-howto explains how to use several scripts
that make such data distribution possible.
While Dave Dittrich had some interesting ideas when he first developed the idea of distributed honeypots, we diverged from them, since we are initially concerned only with distributed data and not with command and control. Two objectives were in mind for this solution:
We also wanted the following properties:
To achieve this, two types of systems were defined: roos and lumpies.
A 'roo' is a honeywall system that holds data for only one sensor
(one honeywall); a 'lumpy' is a system that holds data for more than
one sensor. A lumpy requests data from roos or other lumpies; a roo
only sends data. The data distribution is modeled as a directed
acyclic graph (DAG) in which each node with no outgoing links is a
roo and every other node (a node with at least one outgoing link) is
a lumpy. Data flows from the roos to the lumpies, so nodes closer to
a roo are more authoritative and more current in terms of data
synchronization. The farther a lumpy sits from the roos (the higher
it is in the hierarchy), the more aggregate data it contains, but the
less authoritative that data is.
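The roo/lumpy roles follow directly from the graph structure. The Python sketch below (3.9+, for graphlib) is only an illustration and is not one of the scripts in this package: it assumes the topology is written as a dictionary mapping each node to the nodes it pulls data from (its outgoing links), and all node names are hypothetical. It rejects cyclic topologies, labels every node without outgoing links as a roo, and computes which roos' data each lumpy ends up aggregating.

```python
from graphlib import TopologicalSorter, CycleError
from typing import Dict, List, Set, Tuple

def classify(links: Dict[str, List[str]]) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Return (role of each node, set of roos whose data each node holds)."""
    # Every node mentioned anywhere, including roos that only appear as sources.
    nodes = set(links) | {src for srcs in links.values() for src in srcs}
    graph = {n: list(links.get(n, [])) for n in nodes}

    # static_order() yields sources before the nodes that pull from them,
    # and raises CycleError if the topology is not a DAG.
    order = list(TopologicalSorter(graph).static_order())

    # A node with no outgoing links is a roo; every other node is a lumpy.
    roles = {n: "roo" if not graph[n] else "lumpy" for n in nodes}

    # Sources come first, so each lumpy can union the coverage of its sources.
    covers: Dict[str, Set[str]] = {}
    for n in order:
        if roles[n] == "roo":
            covers[n] = {n}
        else:
            covers[n] = set().union(*(covers[s] for s in graph[n]))
    return roles, covers

if __name__ == "__main__":
    topology = {
        "lumpyA": ["roo1", "roo2"],      # lumpyA pulls from two roos
        "lumpyB": ["lumpyA", "roo3"],    # lumpyB pulls from a lumpy and a roo
    }
    roles, covers = classify(topology)
    print(roles["roo1"], roles["lumpyB"])   # roo lumpy
    print(sorted(covers["lumpyB"]))        # ['roo1', 'roo2', 'roo3']
```

Note how lumpyB, one level higher than lumpyA, aggregates data from all three roos, but only second-hand for roo1 and roo2.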

A lumpy system can also work as a data repository for decommissioned
honeywalls.
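The tradeoff between data transfer volume and the 'currency'
(timeliness) of the data is handled by requiring each lumpy to hold a
copy of the database data and to proxy pcap_data requests to the next
(lower) level in the hierarchy. The database data is thus replicated
upward, while the bulkier pcap data stays at the lower levels and is
fetched only on demand.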
If the lumpy is itself the pcap_data repository, as in the case of a
decommissioned honeywall, the lumpy serves the pcap request directly.
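A rough model of this pcap request handling, in Python: each node serves pcap data for the sensors whose captures it stores locally (a roo stores its own; a lumpy may store a decommissioned honeywall's) and otherwise forwards the request to the lower-level node that leads to that sensor. This is a hypothetical sketch; the class, method names, and sensor ids are illustrative, and it returns the path a request takes instead of actual pcap bytes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Node:
    name: str
    local_pcap: Set[int] = field(default_factory=set)      # sensor ids whose pcap is stored here
    children: List["Node"] = field(default_factory=list)   # lower-level nodes this one pulls from

    def reaches(self, sensor_id: int) -> bool:
        """True if this node or any node below it stores pcap for sensor_id."""
        return sensor_id in self.local_pcap or any(c.reaches(sensor_id) for c in self.children)

    def serve_pcap(self, sensor_id: int, path: Optional[List[str]] = None) -> List[str]:
        """Return the chain of nodes a request visits, ending at the node that serves it."""
        path = (path or []) + [self.name]
        if sensor_id in self.local_pcap:
            return path                                    # served locally
        for child in self.children:
            if child.reaches(sensor_id):
                return child.serve_pcap(sensor_id, path)   # proxy to the next (lower) level
        raise LookupError(f"no pcap data for sensor {sensor_id}")

roo1 = Node("roo1", local_pcap={101})
roo2 = Node("roo2", local_pcap={102})
# lumpyA also holds the pcap archive of a decommissioned honeywall (sensor 200).
lumpyA = Node("lumpyA", local_pcap={200}, children=[roo1, roo2])
lumpyB = Node("lumpyB", children=[lumpyA])

print(lumpyB.serve_pcap(102))   # ['lumpyB', 'lumpyA', 'roo2']
print(lumpyB.serve_pcap(200))   # ['lumpyB', 'lumpyA']
```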
The data integrity tradeoff is addressed by having each lumpy request
data from systems lower in the hierarchy. These scripts therefore work
in a "pull" fashion, and network connectivity is required between each
lumpy and each system (roo or lumpy) it collects data from.
Data consistency is achieved by requiring each roo in the hierarchy to
have a unique sensor id. The stock roo does not guarantee this, but
there are helpers to achieve it (see the roo setup below).
For each 'roo':
For each 'Lumpy':
I treat each roo as an 'incarnation' of a roo system, where an
'incarnation' is a roo installation whose database has been
initialized. Therefore, a roo installed on 10.0.0.1 on January 10 and
another roo installed on the same system with the same configuration
on February 13 are considered two different roos.
This ensures database consistency and allows the data of the two
incarnations to be uniquely identified.
Setting up a roo takes three steps:
However, step 1 is not so easy, as the current roo uses its
management interface IP address as its sensor_id. This value is used
by hflow and sebekd to insert the correct values into the database.
To generate an approximately unique sensor id to use in place of the
IP address, I have developed the following script: makesensor_id.pl.
This script takes one input, the current management IP address, and
generates a sensor id (a 32-bit unsigned integer) as follows (bits are
numbered from 0 to 31, with bit 31 the most significant):
bit 31: zero (0), so that manually assigned sensor ids with bit 31 set
are guaranteed never to duplicate the generated ones.
bits 30-16: the number of days since the epoch.
bits 15-8: a hash of the management interface IP address.
bits 7-0: a random number.
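The bit layout above comes down to a few shifts and masks. The following Python sketch is not makesensor_id.pl itself: the 8-bit IP hash used here (XOR of the four address octets) is an assumption chosen only for illustration, since the howto does not say which hash the script uses. The day count and random byte can be passed in explicitly to make the result reproducible.

```python
import ipaddress
import random
import time
from typing import Optional, Tuple

def make_sensor_id(manager_ip: str, days: Optional[int] = None, rand: Optional[int] = None) -> int:
    """Pack a 32-bit sensor id: bit 31 = 0, bits 30-16 = days, 15-8 = IP hash, 7-0 = random."""
    if days is None:
        days = int(time.time()) // 86400          # whole days since the Unix epoch
    if not 0 <= days < (1 << 15):
        raise ValueError("day count does not fit in 15 bits")
    # ASSUMPTION: 8-bit hash formed by XOR-ing the four octets of the address.
    ip_hash = 0
    for octet in ipaddress.IPv4Address(manager_ip).packed:
        ip_hash ^= octet
    if rand is None:
        rand = random.randrange(256)              # 8 random bits
    # Bit 31 is never set, leaving ids >= 2**31 free for manual assignment.
    return (days << 16) | (ip_hash << 8) | (rand & 0xFF)

def split_sensor_id(sensor_id: int) -> Tuple[int, int, int]:
    """Unpack (days, ip_hash, random) from a generated sensor id."""
    return (sensor_id >> 16) & 0x7FFF, (sensor_id >> 8) & 0xFF, sensor_id & 0xFF

sid = make_sensor_id("10.0.0.1", days=13000, rand=42)
print(split_sensor_id(sid))     # (13000, 11, 42)
```

Two incarnations on the same address installed on different days differ in bits 30-16, so they always get different ids; two installs on the same day can only be told apart by the random byte.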
A more detailed explanation of this logic is on the Honeynet wiki; in
short, it has to do with ensuring uniqueness between incarnations of
the same system and with the birthday paradox.
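To see why the birthday paradox matters: the chance that at least two of n ids drawn uniformly from N = 2^k values coincide is 1 - (1 - 0/N)(1 - 1/N)...(1 - (n-1)/N). The Python sketch below simply evaluates that standard formula; treating the generated bits as uniformly random is an assumption. Reinstalls of the same system on the same day share the day and IP-hash fields, so only the 8 random bits separate them, while different systems installed on the same day have 16 bits in which to differ, provided the IP hash spreads addresses evenly.

```python
def collision_probability(n: int, bits: int) -> float:
    """Probability that at least two of n uniformly random `bits`-bit values are equal."""
    space = 1 << bits
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= 1 - i / space    # value i must avoid the i values drawn before it
    return 1 - p_distinct

def installs_for_even_odds(bits: int) -> int:
    """Smallest n whose collision probability reaches 50%."""
    n = 1
    while collision_probability(n, bits) < 0.5:
        n += 1
    return n

# Same system reinstalled twice on the same day: only the 8 random bits differ.
print(collision_probability(2, 8))      # 0.00390625, i.e. 1/256
# Different systems installed on the same day: roughly 300 before a clash is likely.
print(installs_for_even_odds(16))
```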
To run this script on a roo whose management interface is 10.0.0.1,
do:
%>./makesensor_id.pl -i 10.0.0.1
The sensor id is stored in /hw/conf/HwSENSOR_ID.
Once this file exists and holds a unique sensor id, replace the
hflowd and sebekd startup scripts with the following versions (the
only difference is that they use HwSENSOR_ID as an initialization
parameter in place of the old HwMANAGER_IP):
hflowd sebekd
This is best done at system setup time (just after installing, on the
first boot of the system as a roo).
After all this is done, copy the DBcopy3.pl script into the walleye
root directory. This script copies the database data incrementally to
the lumpy.
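The howto does not describe how DBcopy3.pl decides what to copy, so the following is only a generic illustration of incremental ("high-water mark") copying, written in Python with in-memory SQLite databases standing in for the roo and lumpy databases. The flow table, its columns, and the sensor id are hypothetical, and the sketch assumes row ids only increase and existing rows never change. It also shows why unique sensor ids matter: rows from different sensors end up in one table on the lumpy, keyed by (sensor_id, id).

```python
import sqlite3

# Hypothetical table; the real hflow schema is different and much larger.
SCHEMA = """CREATE TABLE IF NOT EXISTS flow (
    sensor_id INTEGER NOT NULL,
    id        INTEGER NOT NULL,
    summary   TEXT,
    PRIMARY KEY (sensor_id, id))"""

def pull_increment(roo: sqlite3.Connection, lumpy: sqlite3.Connection, sensor_id: int) -> int:
    """Copy rows for sensor_id that the lumpy does not have yet; return how many were copied."""
    # High-water mark: the largest id already copied for this sensor (0 if none).
    (mark,) = lumpy.execute(
        "SELECT COALESCE(MAX(id), 0) FROM flow WHERE sensor_id = ?", (sensor_id,)).fetchone()
    rows = roo.execute(
        "SELECT sensor_id, id, summary FROM flow WHERE sensor_id = ? AND id > ? ORDER BY id",
        (sensor_id, mark)).fetchall()
    lumpy.executemany("INSERT INTO flow (sensor_id, id, summary) VALUES (?, ?, ?)", rows)
    lumpy.commit()
    return len(rows)

roo = sqlite3.connect(":memory:")
lumpy = sqlite3.connect(":memory:")
for db in (roo, lumpy):
    db.execute(SCHEMA)

SENSOR = 12345   # hypothetical sensor id of the roo
roo.executemany("INSERT INTO flow VALUES (?, ?, ?)",
                [(SENSOR, i, f"flow {i}") for i in range(1, 4)])
print(pull_increment(roo, lumpy, SENSOR))   # 3: the first pull copies everything
roo.execute("INSERT INTO flow VALUES (?, ?, ?)", (SENSOR, 4, "flow 4"))
print(pull_increment(roo, lumpy, SENSOR))   # 1: only the new row
print(pull_increment(roo, lumpy, SENSOR))   # 0: nothing new
```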
Finally, add the IP address of the lumpy so it can access the data.
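The howto does not say where the lumpy's address is registered on the roo. Whatever the actual mechanism, the idea is an allowlist check on the requesting address, sketched here in Python with the standard ipaddress module; the function name and the example addresses are hypothetical.

```python
import ipaddress
from typing import Iterable

def lumpy_allowed(client_ip: str, allowed: Iterable[str]) -> bool:
    """True if client_ip matches any allowed entry (a single address or a CIDR network)."""
    addr = ipaddress.ip_address(client_ip)
    # A bare address becomes a /32 network; strict=False also tolerates
    # entries such as "192.168.50.7/24" that have host bits set.
    return any(addr in ipaddress.ip_network(entry, strict=False) for entry in allowed)

ALLOWED_LUMPIES = ["10.0.1.5", "192.168.50.0/24"]   # hypothetical lumpy addresses
print(lumpy_allowed("10.0.1.5", ALLOWED_LUMPIES))     # True
print(lumpy_allowed("10.0.1.6", ALLOWED_LUMPIES))     # False
```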
I have set up lumpies only on Linux systems; however, there is no
intrinsic limitation on this. Here is the setup common to both cases.
Now, depending on your case, you are ready to start importing data.
Just run dbget.pl. If you want, you can add it to cron so that it runs periodically.
From the roo data repository copy all the pcap data into a
directory.
That is all. The complete package is
here, and more information can be found in the README. I forgot to mention
that dbget2.pl has some unusual dependencies on Perl modules; please read the
notes inside dbget2.pl while I update the README.
Hope this is sufficiently clear.
Camilo Viecco