Documentation for Bex, Version 0.9

by Fred Cummins (revised Jan 22/99)

[Figure: Bex logo, a cartoon of an amplitude envelope with overlaid beats]

What is Bex?

Bex stands for "beat extraction". We assume that it is in principle possible to associate a beat with certain auditory, and hence audio, events. As originally implemented, Bex identifies beats in the speech signal in order to study rhythmicity in speech. The process can, however, be applied to virtually any audio signal, and may well be of use in diverse musical applications. There is not, as yet, any equivalent for 2-D optical signals, though that too would be interesting.

What you have here is a set of programs which collectively allow the identification of beats. Neither the programs nor their author can give the slightest guarantee that these "beats" correspond to anything that humans perceive. There is, however, a good deal of evidence suggesting that if we supply these programs with reasonable parameters (such as the default ones), the resulting beats correspond rather well with perceived beats. Furthermore, there is now a good deal of experimental evidence suggesting that the beats extracted by this program suite can tell us some interesting things about speech. References for both of these claims are given at the end. For now I will assume that if you have this code, you know why you want it.

How does it work?

The figure above illustrates the essence of the program. We start with an audio signal, which in this case is a recording of the phrase "big for a duck" repeated twice. First we do some jiggery-pokery (see below) to produce a smooth envelope describing the signal energy within a restricted frequency range. This is handled by the program sndtoenv. We then look at this envelope and identify a beat in the middle of each local rise in intensity. The program envtobeats takes care of that, and produces a list of the times (and optionally the strengths) of the beats. Another utility, beatstosnd, lets you actually listen to the beats, either alone or overlaid on the original signal. Finally, so that you don't have to remember all this, there is a single superordinate program called Bex (note the capital letter) which provides an interface to all of these programs. To use the program sensibly, however, you will inevitably need to get to know the component programs in at least a little detail; they are described below. First, though, we need to address the basic questions of file formats and installation.

Sound File Formats

This code is being released in two versions: one for Sun SPARC workstations and one for Silicon Graphics (SGI) workstations. (An unofficial version for Linux may be available on request from me; it will inevitably be rougher.) If you have the Sun version, your audio files need to be in .au format, with 16-bit linear sample encoding, at any sampling rate. If this is gobbledygook to you, don't feel bad; no one should ever have to know this rubbish. However, you do now, so take a look at the Audio File Format FAQ before going much further (if the link is out of date, try Yahoo or some such). If you have an SGI machine, you will use AIFF files, again with 16-bit linear encoding. Both of these are fairly standard formats. If your data is not in the required form, you might consider using SoX (on the Sun) or sfconvert (on the SGI) to get what you need.

To reiterate:

Machine   Format   Encoding        Sampling rate
Sun       AU       16-bit linear   any
SGI       AIFF     16-bit linear   any
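For the Sun version, a quick way to check whether a file meets these requirements is to read its .au header. The sketch below is not part of Bex, and the function names are hypothetical. It relies on the standard Sun/NeXT header layout: six big-endian 32-bit fields (magic number ".snd", data offset, data size, encoding, sample rate, channel count), where encoding 3 means 16-bit linear PCM.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit unsigned integer (all .au header fields are big-endian). */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper, not part of Bex.
   Returns 1 if the 24-byte .au header describes 16-bit linear PCM, 0 otherwise.
   On success the sample rate and channel count are stored for information. */
int au_is_16bit_linear(const unsigned char *hdr, size_t len,
                       uint32_t *sample_rate, uint32_t *channels)
{
    if (len < 24)
        return 0;                                /* too short to be a header */
    if (read_be32(hdr) != 0x2e736e64u)
        return 0;                                /* magic number ".snd" */
    if (read_be32(hdr + 12) != 3u)
        return 0;                                /* encoding 3 = 16-bit linear PCM */
    *sample_rate = read_be32(hdr + 16);          /* any rate is acceptable */
    *channels = read_be32(hdr + 20);
    return 1;
}
```

The sampling rate is returned only for information, since any rate is acceptable. The documentation does not say whether multi-channel files are supported, so the sketch reports the channel count rather than checking it.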

Installation

The current SGI and Sun versions of Bex are here.

Once you have ftp'd one or the other of the tar files, unpack it with something like the following commands (shown here for the SGI file; substitute the name of the file you downloaded):

tar xzvf bex0.9-sgi.tar.gz
cd bex0.9
Bex

If your machine barfs, and says it doesn't understand the -z option to tar, then do:

gunzip bex0.9-sgi.tar.gz
tar xvf bex0.9-sgi.tar
cd bex0.9
Bex

You can remake the code by typing "make" if you like. If you do, and you need to change stuff to get it to work, please email me.

The SGI code was compiled on an Indigo running IRIX 5.3, using CC and cc. The Sun code was compiled on a Sun running SunOS 5.5 (Solaris) (that's gummy, if you are from the IU Computer Science Dept), using gcc for the C++ code and cc for the plain C code. The compilation was not entirely straightforward, as the C++ code (sndtoenv) is based on a library of DSP functions called spkit. I have pre-compiled this and provide it as libsp.a. I do not provide the source for it, however, merely the header files required for compilation. If you need to recompile libsp.a you can get the source here. Future releases of Bex should dispense with spkit entirely, with a view to keeping everything in C and entirely original. Meanwhile, a big thank you to Kai for his code.

Envelope generation: sndtoenv

The audio signal is bandpass filtered, then rectified by making all values positive, and then smoothed. Sndtoenv thus requires three parameters (the center frequency and bandwidth of the bandpass filter, and the cutoff frequency of the smoothing filter), which can be set from the envelope menu. The parameters are stored in sndtoenv.par, which looks like this (default parameters shown):

### parameters for sndtoenv
### generated automatically by Bex
1000    Center frequency
600     Bandwidth
20  Cutoff for smoothing
./soundfiles/lim.au  Input filename
./soundfiles/lim.env.au  Output envelope file

Don't edit this file by hand; it is generated on the fly by Bex.

Beat extraction: envtobeats

Beats are placed halfway through each local rise in the amplitude envelope computed by sndtoenv. Optionally, a rough strength is associated with each beat. Strengths lie between 0 and 1, where 1 corresponds to the signal maximum. Beats which fall below a threshold, expressed as a proportion of the signal maximum (so the default of 0.1 means one tenth of the maximum), are excluded. Optionally, a .benv file is produced which contains the amplitude envelope with the beats overlaid (see the logo for a cartoon example). This can be inspected visually in your favorite sound file editor to check that the beats are sensible. The parameter file looks like this:

### parameters for envtobeats
### generated automatically by Bex
au Audio file type
./soundfiles/lim  Input filename stem
0 Do not generate envelope + beat file
0.1 Threshold, as % of signal max
1 Calculate beat strengths
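The beat-placement rule can be sketched in C as follows. This is an illustration, not the envtobeats source. Two points are our assumptions: "halfway through a local rise" is taken to mean the first sample at which the envelope reaches the midpoint between a trough and the following peak, and a beat's strength is taken to be the height of its rise divided by the envelope maximum, which keeps it between 0 and 1. Beats whose strength falls below the threshold are dropped. The names Beat and find_beats are hypothetical.

```c
#include <stddef.h>

/* Hypothetical beat record: envelope sample index and strength in [0, 1]. */
typedef struct {
    size_t index;
    double strength;
} Beat;

/* Sketch only, not the envtobeats source. Finds each local rise (trough to
   next peak), places a beat where the envelope first reaches the midpoint of
   the rise, and keeps it if (rise height / envelope maximum) >= threshold.
   Writes at most max_beats beats to out and returns how many were written. */
size_t find_beats(const double *env, size_t n, double threshold,
                  Beat *out, size_t max_beats)
{
    double env_max = 0.0;
    for (size_t i = 0; i < n; i++)
        if (env[i] > env_max)
            env_max = env[i];
    if (env_max <= 0.0)
        return 0;

    size_t count = 0;
    size_t i = 0;
    while (i + 1 < n) {
        /* Descend (or cross a plateau) to the next local minimum. */
        while (i + 1 < n && env[i + 1] <= env[i])
            i++;
        size_t trough = i;
        /* Climb to the next local maximum. */
        while (i + 1 < n && env[i + 1] > env[i])
            i++;
        size_t peak = i;
        if (peak == trough)
            break;                              /* no further rise */

        double lo = env[trough], hi = env[peak];
        double strength = (hi - lo) / env_max;
        if (strength < threshold)
            continue;                           /* too weak: skip this rise */

        double mid = lo + 0.5 * (hi - lo);
        size_t j = trough;
        while (env[j] < mid)                    /* stops by peak, since env[peak] >= mid */
            j++;
        if (count < max_beats) {
            out[count].index = j;
            out[count].strength = strength;
            count++;
        }
    }
    return count;
}
```

The returned indices are envelope sample positions; dividing by the envelope's sampling rate gives beat times in seconds. Because a rise ends at the first non-increasing sample, the threshold discards tiny ripples, but a ripple partway up a large rise would split it in two, so a more robust version would need some hysteresis.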

Audible beats: beatstosnd

As a final bonus, you can listen to the series of beats, either as a bare series of beeps or overlaid on the original audio file. Again, you can optionally scale the loudness of the beeps to match the computed beat strengths.

Who is responsible for all this?

This code is produced and maintained by Fred Cummins, currently fred@idsia.ch. It is released in the hope that it will be of some use in research and play. It is not guaranteed to do anything at all, and is used entirely at your own risk. Use it freely for non-commercial purposes, and pass it on. However, if you modify it in any interesting way I would appreciate hearing about it. If you do modify it and pass it on, please rename the code, as I would like to keep control of the original code. You do not have permission to sell this code or to use it in any commercial product whatsoever. Contact me if you are interested in commercial development.

References

Two claims were made above. For the first, that Bex produces something close to perceived beats, see especially Scott (1993). For the second, that something interesting can be learned about speech from the study of these beats, please see my 1997 Indiana University thesis in Linguistics and Cognitive Science, "Rhythmic Coordination in English Speech: An Experimental Study", or my paper "Rhythmic constraints on stress timing in English", Journal of Phonetics, 26(2):145-171, which is available in postscript form.