Sunday, August 14, 2005

Audio Forensics for Beginners

Posted by Alecks Pabico 
PCIJ

OUR source, an independent audio expert, has offered us a basic audio forensics lesson that we'd like to share with readers. One widely used method for analyzing the authenticity of an audio clip, he says, is to break the clip down into its time-frequency spectrum by means of an algorithm called the Fast Fourier Transform (FFT) and look for discontinuities (changes in background noise), magnetic head start/stop/pause signatures, changes in the subject's voice frequency signatures, and recording equipment noise.

The FFT algorithm breaks the audio clip down into its frequency components, which are then plotted on a frequency-time image called a spectrograph (more commonly known as a spectrogram). The horizontal axis of the graph represents time (t) and the vertical axis represents frequency, with the lowest frequency at the bottom of the graph. The volume (or decibel level) is usually represented by changes in the color or intensity of the plot.
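To make the idea concrete, here is a minimal sketch in Python of how such a plot can be computed. It is not the tool our source used; the function names, the frame size of 256 samples, the hop of 128 samples, and the synthetic 1 kHz test tone are all assumptions chosen for illustration.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def spectrogram(signal, frame_size=256, hop=128):
    """Return a list of time frames; each frame lists the dB level of every frequency bin."""
    # A Hann window reduces leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1))
              for i in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_size)]
        spectrum = fft(chunk)[: frame_size // 2]  # keep positive frequencies only
        frames.append([20 * math.log10(abs(c) + 1e-12) for c in spectrum])
    return frames

# Illustrative input only: a 1 kHz sine tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(2048)]
spec = spectrogram(tone)

# The loudest bin of the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 256
```

Each inner list is one vertical slice of the picture (one moment in time), and each number in it would be drawn as one colored pixel. Real tools do the same thing much faster and with more careful windowing.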

fft-plate1-1.jpg

Plate 1 shows an example of an FFT Spectrograph analysis, this one done on the three-hour recordings, particularly on the "yung dagdag" portion.

The top portion of the image shows the actual waveform of the clip, while the bottom part is the result of the FFT Spectrograph analysis. The forensic examiner will first look for discontinuities in the clip by checking each frequency band (from background noise to equipment noise) and seeing whether the plot colors change abruptly (in this case, an increase in audio volume is depicted by dark red pixels, a decrease by light red pixels).
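The visual check described above can also be approximated in code. The sketch below assumes the spectrograph is already available as a list of time frames of decibel levels; the function name, the 10 dB threshold, and the toy data are hypothetical, and a real recording would need a band where speech does not dominate.

```python
def find_discontinuities(spec_db, band, threshold_db=10.0):
    """Return frame indices where the mean level in `band` jumps by more than threshold_db.

    spec_db: list of time frames, each a list of dB levels per frequency bin.
    band:    (low_bin, high_bin) half-open range of bins to inspect.
    """
    low, high = band
    # Average level of the chosen band in every frame.
    levels = [sum(frame[low:high]) / (high - low) for frame in spec_db]
    # Flag every frame whose band level differs sharply from the previous frame.
    return [i for i in range(1, len(levels))
            if abs(levels[i] - levels[i - 1]) > threshold_db]

# Hypothetical toy spectrograph: 6 frames x 4 bins, in dB.
# The top two bins (background hiss) drop out abruptly at frame 3.
toy = [
    [-20, -22, -50, -52],
    [-21, -20, -51, -50],
    [-19, -21, -49, -51],
    [-20, -22, -90, -91],
    [-21, -20, -89, -90],
    [-20, -21, -91, -90],
]
high_band_hits = find_discontinuities(toy, band=(2, 4))  # -> [3]
low_band_hits = find_discontinuities(toy, band=(0, 2))   # -> []
```

A flagged frame is only a place to look more closely, not proof of an edit, since ordinary events such as a door closing can also change the background abruptly.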

The next step would be voice-print analysis. In this particular case, the task does not need to be performed because the person in the clip (Pres. Gloria Macapagal-Arroyo) has already publicly declared whose voice is on the tape.

Plate 1 shows no discontinuities in the spectrograph analysis, which led our source to conclude that the clip is unaltered and pretty much authentic.

Case of the Bunye "Splicer"

fft-plate2-1.jpg

Plate 2 shows an FFT Spectrograph analysis of the Bunye "unaltered/original" version.

Here you will find strong indications of discontinuities, particularly in the middle-top portion of the spectrograph. This is actually Pres. Arroyo's background noise, which disappears when "Gary" (political operative Edgar "Bong" Ruado) is speaking. The forensic examiner can simply end the test here, because the clip has already failed the background discontinuity test.

Now here is the interesting part: the lower-mid frequencies (lower-mid portion of the spectrograph) show only subtle discontinuities between Arroyo's and Gary's segments. This indicates that the "splicer" knew enough about background noise to "induce" one across the whole clip. Still, the work is not perfect: the splicer "forgot" to filter the noise in the higher frequencies (Arroyo's background noise in particular), which made the analysis much easier to execute.
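The band-by-band comparison described here can be sketched as follows. This is an illustration only: the function names, the use of a low percentile as a stand-in for the background noise floor, and the toy numbers are assumptions, not measurements from the Bunye clip.

```python
def noise_floor(frames_db, band, percentile=10):
    """Estimate the background level of a band as a low percentile of its frame levels."""
    low, high = band
    levels = sorted(sum(f[low:high]) / (high - low) for f in frames_db)
    index = (len(levels) - 1) * percentile // 100
    return levels[index]

def floor_gap(spec_db, segment_a, segment_b, band):
    """Difference in dB between the noise floors of two frame ranges (start, end)."""
    a = noise_floor(spec_db[segment_a[0]:segment_a[1]], band)
    b = noise_floor(spec_db[segment_b[0]:segment_b[1]], band)
    return a - b

# Hypothetical toy spectrograph: frames 0-3 are speaker A, frames 4-7 are speaker B.
# Bins 0-1 stand for the lower-mid frequencies, bins 2-3 for the higher ones.
toy = [
    [-30, -32, -50, -52],
    [-20, -22, -51, -49],
    [-31, -31, -50, -50],
    [-18, -20, -49, -51],
    [-31, -31, -90, -92],
    [-19, -21, -91, -89],
    [-30, -32, -90, -90],
    [-22, -20, -89, -91],
]
low_gap = floor_gap(toy, (0, 4), (4, 8), band=(0, 2))   # floors match: 0.0 dB
high_gap = floor_gap(toy, (0, 4), (4, 8), band=(2, 4))  # floors differ: 40.0 dB
```

In this toy case the lower band looks continuous across both speakers while the upper band does not, which is the pattern the paragraph above describes.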

Case of the Chavit X-Tape "Splicer"

fft-plate3-1.jpg

Plate 3 shows an FFT Spectrograph analysis of track 6 of the "Chavit X-Tapes."

The FFT Spectrograph shows very strong indications of discontinuities, particularly before and after former president Joseph Estrada's waveform.

fft-plate4-1.jpg 

Plate 4 shows a zoomed version of these discontinuities.

Here we also see "very consistent/very repetitive" clean background noise frequencies, particularly under the voice of the other person. This is typical of recordings made with "professional" tuned microphones and recording equipment in a quiet room with the doors and windows closed, or in a "recording studio." On the discontinuity test alone, a forensic expert will readily conclude that the clip is very much "spliced."

fft-plate5-1.jpg 

Plate 5 shows an FFT Spectrograph analysis of track 3 of the "Chavit X-Tapes."

No apparent discontinuities were found here but it seems the higher frequencies were cleaned out — filtered either via software or some solid-state recording equipment.
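One rough way to check for "cleaned out" high frequencies is to find the highest frequency that still carries meaningful energy. The sketch below is an assumption-laden illustration: the function name, the 60 dB cutoff, and the toy values are hypothetical and are not taken from the actual tracks.

```python
def effective_bandwidth_hz(spec_db, sample_rate, frame_size, drop_db=60.0):
    """Return the highest frequency whose average level is within drop_db of the loudest bin."""
    n_bins = len(spec_db[0])
    # Average each frequency bin over all time frames.
    avg = [sum(frame[k] for frame in spec_db) / len(spec_db) for k in range(n_bins)]
    peak = max(avg)
    top_bin = max(k for k in range(n_bins) if avg[k] >= peak - drop_db)
    # Each bin is sample_rate / frame_size hertz wide.
    return top_bin * sample_rate / frame_size

# Hypothetical toy spectrograph: 8 bins of 500 Hz each (8 kHz sampling, 16-sample frames).
# Everything above bin 4 is essentially silent, as if it had been filtered away.
toy = [
    [-20, -25, -30, -35, -40, -120, -120, -120],
    [-20, -25, -30, -35, -40, -120, -120, -120],
]
cutoff = effective_bandwidth_hz(toy, sample_rate=8000, frame_size=16)  # -> 2000.0
```

A sharp cutoff like this only shows that something limited the bandwidth; by itself it does not say whether software or the recording chain was responsible.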

The "splicer" who worked on tracks 3 and 6 probably forgot to read "Audio Splicing 101 for Dummies," or whoever released the tracks forgot to ask the experts whether the "splice" would pass. They probably forgot that a "phone" tap should have at least some natural, continuous background noise on it. This is a case of too much exposure to expensive "noise-free" audio equipment.