Danko Antolovic, Development of a Real-Time Vision System for an Autonomous Model Airplane, Master's Thesis, Indiana University Computer Science Department, October 2001.
The goal of the project is to build a vision system that follows an object in a relatively simple (uncluttered) scene, in live video and in real time, during the flight of a model airplane. The long-term objective is to use this vision system to steer the airplane into a holding pattern above a selected (possibly moving) feature on the ground.
We have used a two-pronged approach to object tracking, taking into account the flight motion and the graphic "signatures" of the objects. A relatively simple procedure, placed at an early stage in the data flow, tracks the motion of the entire scene, and the camera moves to compensate for it.
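The thesis abstract does not say how the scene motion is measured, so the following is only an illustrative sketch of one simple way to track the motion of an entire scene: collapse each frame into row and column brightness profiles and find the shift that best aligns the profiles of two consecutive frames. The function names (`column_profile`, `row_profile`, `best_shift`, `scene_shift`) and the `max_shift` search range are hypothetical and are not taken from the original system.

```python
def column_profile(frame):
    """Sum the pixel values in each column of a frame (a list of rows)."""
    return [sum(row[x] for row in frame) for x in range(len(frame[0]))]


def row_profile(frame):
    """Sum the pixel values in each row of a frame."""
    return [sum(row) for row in frame]


def best_shift(prev, curr, max_shift):
    """Return the shift s that minimizes the mean |prev[i] - curr[i + s]|.

    Only the overlapping part of the two profiles is compared, and the
    cost is averaged so that short overlaps are not unfairly favored.
    """
    n = len(prev)
    best, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, -s), min(n, n - s)
        if hi <= lo:
            continue  # no overlap at this shift
        cost = sum(abs(prev[i] - curr[i + s]) for i in range(lo, hi)) / (hi - lo)
        if cost < best_cost:
            best, best_cost = s, cost
    return best


def scene_shift(prev_frame, curr_frame, max_shift=3):
    """Estimate the (dx, dy) displacement of the whole scene between frames."""
    dx = best_shift(column_profile(prev_frame), column_profile(curr_frame), max_shift)
    dy = best_shift(row_profile(prev_frame), row_profile(curr_frame), max_shift)
    return dx, dy
```

In a setup like this, the camera would be driven opposite to the estimated displacement so that the scene stays roughly fixed in the field of view. How the displacement maps to servo commands depends on the optics and the gimbal, and is not covered here.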
Object recognition requires several stages in which redundant and non-essential visual information is filtered out, until we are left with a selection of well-defined objects. The stages implemented in our system are black-and-white thresholding, edge detection and segmentation into connected components. They constitute the typical "funnel" of vision processing, where the volume of data decreases but the depth of calculation increases at each stage. To identify a target object within the scene, we use the second moments about the object's principal axes of inertia as the "signature." Second moments are invariant under rotations and translations, which makes them suitable for the analysis of the ground view from a plane at steady altitude. They are also fairly easy to calculate, and work well in simple scenes. As somewhat peripheral features, the vision system also has an operator-controlled digital zoom, and a process which sets the black-and-white threshold according to current conditions.
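As a minimal sketch of this funnel, the code below thresholds a small grayscale image, segments it into connected components, and computes the second moments about the principal axes of each component. Several things here are assumptions made for illustration rather than details from the thesis: the edge-detection stage is omitted, connectivity is taken to be 4-neighbor, the moments are left unnormalized (not divided by area), and all function names are hypothetical. The principal-axis moments are obtained as the two eigenvalues of the 2x2 matrix of central second moments.

```python
import math
from collections import deque


def threshold(image, level):
    """Binarize a grayscale image (a list of rows): 1 at or above level, else 0."""
    return [[1 if pixel >= level else 0 for pixel in row] for row in image]


def connected_components(binary):
    """Group foreground pixels into 4-connected components of (x, y) pairs."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y0 in range(height):
        for x0 in range(width):
            if not binary[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, pixels = deque([(x0, y0)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            components.append(pixels)
    return components


def principal_moments(pixels):
    """Return (larger, smaller) second moments about the principal axes."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    mxx = sum((x - cx) ** 2 for x, _ in pixels)
    myy = sum((y - cy) ** 2 for _, y in pixels)
    mxy = sum((x - cx) * (y - cy) for x, y in pixels)
    # Eigenvalues of the symmetric matrix [[mxx, mxy], [mxy, myy]].
    half_trace = (mxx + myy) / 2
    root = math.hypot((mxx - myy) / 2, mxy)
    return half_trace + root, half_trace - root


image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 9, 9, 0, 0],
    [0, 9, 9, 9, 9, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
(blob,) = connected_components(threshold(image, 5))
rotated = [(5 - y, 3 + x) for x, y in blob]  # 90-degree turn plus a shift
print(principal_moments(blob), principal_moments(rotated))
```

The 4-by-2 blob and its rotated, translated copy produce the same pair of moments, which is the invariance the signature relies on. Because the moments here are not normalized, they would still change with apparent object size, consistent with the remark about steady altitude.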
Camera: the "eye" of the system is a small digital camera, producing a grayscale NTSC video signal. The camera is mounted on a gimbal driven by two servo motors, each with a 90-degree range of motion.
Auxiliaries: the NTSC video signal contains three synchronization signals, which are decoded by means of a video sync separator chip (National Semiconductor LM1881). For the digitization of the video signal, we use a pipelined 10-bit converter (Analog Devices AD876).
ASICs: image sampling, thresholding and threshold calculation, motion detection and zoom are implemented as digital designs on a synchronous pair of FPGAs (Xilinx XC4010s), running at 33.3 MHz.
Microprocessor: the entire object recognition is implemented in software running on a 90 MHz ColdFire integrated microprocessor (Motorola MCF5307). We are using a commercial evaluation board (SBC5307) with 8 megabytes of DRAM, an expansion bus and communication ports.
Memory: the ASICs and the ColdFire share image data through a 32K dual-port SRAM (Cypress Semiconductor CY7C007AV). Data access is implemented as a hardware-controlled round-robin procedure; the objective is to speed up the high-volume data transfer that occurs in the early stages of the vision process.