Monday, March 24, 2008

Sound Kills

This weekend I threw myself into the Java sound world to figure out how to get the pitch (frequency) of a wave.
I read and read and read, and after that I read some more! There is a lot of information on the web about this. I found out that the best way of getting the frequency is by applying the Fast Fourier Transform (FFT). This I had known for a while, but this time I took a deeper look at what the FFT is and, oh my god, I thought I was good at maths, but it took me soooo long to figure out. I mean, some pages made it sound so easy at the beginning and then complicated things one step at a time, to the point where I felt I was reading gibberish.
One of the best sites I found explaining the FFT even gave me the code and a running application in Visual C++ that implements it. As I said, it takes patience, pen and paper to understand, but in the end I think I did get it.
Really, the FFT is just a faster way to compute the Discrete Fourier Transform (DFT).
Figure I shows the DFT equation.

[Figure I: the DFT equation]
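
For reference, the usual textbook form of the DFT for N samples x_0 ... x_{N-1} is written out here. This is my own transcription, so the notation may differ from the figure:

```latex
X_k = \sum_{n=0}^{N-1} x_n \, e^{-i 2 \pi k n / N}, \qquad k = 0, 1, \dots, N-1
```

Computing every X_k directly from this sum takes on the order of N^2 operations; the FFT produces the same values in on the order of N log N operations.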

Getting FFT code for Java was easy: just google "FFT" and "Java" and you will find many versions. I chose one that also implements a Complex class, which makes it easier to understand.

Once I had this, I needed to figure out how to join the audio reading part with the FFT.
Most pages talk about sampling the sound and then feeding the values to the FFT, but none of them say how to do the sampling.
After a lot of reading I came to the conclusion that the samples are simply the values read from the audio file. In other words, sampling is what happens when I do:
"nBytesRead =, 0, abData.length);"
If you want to read more about how to do this and see some sample code, check this page out.
I believe that is the one that helped me the most for this part, and now I am getting the frequencies for each block of samples I read.
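
One detail worth spelling out: the read call fills a byte array, while an FFT wants numbers. Below is a minimal sketch of that conversion, assuming the audio is 16-bit signed, little-endian, mono PCM (other formats need a different conversion). The class name PcmSamples and the method toSamples are hypothetical names of mine, and a ByteArrayInputStream stands in for the real audio stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class PcmSamples {
    /**
     * Converts 16-bit signed little-endian mono PCM bytes into
     * sample values in the range [-1.0, 1.0).
     */
    static double[] toSamples(byte[] abData, int nBytesRead) {
        int count = nBytesRead / 2;               // two bytes per sample
        double[] samples = new double[count];
        for (int i = 0; i < count; i++) {
            int lo = abData[2 * i] & 0xFF;        // low byte, treated as unsigned
            int hi = abData[2 * i + 1];           // high byte, keeps the sign
            samples[i] = ((hi << 8) | lo) / 32768.0;
        }
        return samples;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for an audio stream: four 16-bit samples.
        byte[] raw = {0x00, 0x00, (byte) 0xFF, 0x7F, 0x00, (byte) 0x80, 0x00, 0x40};
        InputStream in = new ByteArrayInputStream(raw);

        byte[] abData = new byte[8];
        int nBytesRead = in.read(abData, 0, abData.length);

        for (double s : toSamples(abData, nBytesRead)) {
            System.out.println(s);
        }
    }
}
```

The resulting double array is what would be handed to the FFT, for example as the real parts of the Complex values.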

My current problem is that I don't know how to choose the size of the sample array that is fed to the FFT method. At the moment I am using 4096.
As far as I understand now, the array length should be a power of 2, that is 2, 4, 8, 16, ..., 1024, 2048, 4096, etc.
The thing is, how do I relate the size of the array to the time in milliseconds that we want (that is, 10 ms) for every sampling window? And does the length depend on the file format (*.raw, *.wav, *.mp3, etc.)? The latter is probably not true, but I can neither prove nor disprove it.
I will try to find that out, but if you have any ideas, please tell me.
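
A rough sketch of the arithmetic, as far as I can tell: the number of samples in a time window is the sample rate multiplied by the duration, and the number of bytes is that count multiplied by the frame size. So the window length depends on the sample rate and sample size of the decoded PCM data rather than on the file extension. The format below (44.1 kHz, 16-bit, mono) is an assumption for illustration, and FftWindow, framesForMillis and nextPowerOfTwo are hypothetical names.

```java
import javax.sound.sampled.AudioFormat;

class FftWindow {
    /** Number of sample frames that span the given duration. */
    static int framesForMillis(float sampleRate, int millis) {
        return Math.round(sampleRate * millis / 1000f);
    }

    /** Smallest power of two that is >= n, for n >= 1. */
    static int nextPowerOfTwo(int n) {
        int p = 1;
        while (p < n) {
            p <<= 1;
        }
        return p;
    }

    public static void main(String[] args) {
        // Assumed format: 44.1 kHz, 16-bit, mono, signed, little-endian PCM.
        AudioFormat format = new AudioFormat(44100f, 16, 1, true, false);

        int frames = framesForMillis(format.getSampleRate(), 10);
        int fftSize = nextPowerOfTwo(frames);
        int bytesToRead = frames * format.getFrameSize();

        System.out.println(frames);      // prints 441
        System.out.println(fftSize);     // prints 512
        System.out.println(bytesToRead); // prints 882
    }
}
```

Since 441 is not a power of 2, one option is to pad the 441 samples with zeros up to 512 before the FFT. The trade-off is resolution: the spacing between FFT bins is the sample rate divided by the FFT size, so 512 points at 44.1 kHz gives about 86 Hz per bin, while 4096 points gives about 10.8 Hz per bin but covers roughly 93 ms of audio.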

I was also looking at the audio format we should use, and I think it should be MP3. I was hoping WAV would work, but for some reason, when I record my voice using Audacity, export it as a WAV file and then play it back with my Java program, I get a "mark/reset not supported" error.
At first I thought it was a problem with my program, but it seems to be a bug in the Java Sound API. The reported bug ID is 6408764.
Don't get me wrong, I have some WAV files that do work; it is just that I cannot generate a WAV file with Audacity that works with the Java API, and because of this I had to rule out the WAV format. If you know of another audio recording application that can generate WAV files without triggering the mark/reset error, then we can consider it again, but for now MP3 is a good option: it can be generated with Audacity and it does work with the Java API.

For now I think I can say I found the frequency for a sampling array of 4096. And it is not that slow!
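
For the record, here is a small sketch of how a transform result can be turned into a frequency in Hz: find the bin k with the largest magnitude and compute k * sampleRate / N. To keep it short and self-contained it uses the direct DFT sum instead of the FFT code I actually downloaded; the bin-to-frequency step is the same either way. The class and method names, the 8000 Hz sample rate and the 1000 Hz test tone are all assumptions for illustration.

```java
class PeakFrequency {
    /** Magnitudes of DFT bins 0..N/2, computed with the direct O(N^2) sum. */
    static double[] magnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n / 2 + 1];
        for (int k = 0; k < mag.length; k++) {
            double re = 0;
            double im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im -= x[t] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    /** Frequency in Hz of the strongest bin, ignoring bin 0 (the DC offset). */
    static double peakFrequency(double[] x, double sampleRate) {
        double[] mag = magnitudes(x);
        int best = 1;
        for (int k = 2; k < mag.length; k++) {
            if (mag[k] > mag[best]) {
                best = k;
            }
        }
        return best * sampleRate / x.length;
    }

    /** Generates n samples of a sine tone at the given frequency. */
    static double[] sine(double frequency, double sampleRate, int n) {
        double[] x = new double[n];
        for (int t = 0; t < n; t++) {
            x[t] = Math.sin(2 * Math.PI * frequency * t / sampleRate);
        }
        return x;
    }

    public static void main(String[] args) {
        // 1000 Hz tone, 8000 Hz sample rate, 512 samples: lands exactly on bin 64.
        double[] x = sine(1000.0, 8000.0, 512);
        System.out.println(peakFrequency(x, 8000.0)); // prints 1000.0
    }
}
```

Feeding in a generated tone of known frequency like this is also one way to check whether the values coming out of the real FFT code are correct.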

Next on my list:
  1. Playback changing the frequency!
  2. Making the application run under Ubuntu.
  3. Writing about my findings for the thesis.


And as a reminder for myself, these are the questions still pending for this part:
  1. Are the frequency values that I am getting correct, or at least coherent? I need a reference to compare them against.
  2. How do I relate milliseconds to the number of bytes that should be sampled (read)?
  3. When reading a WAV, MP3, WMA or any other audio format with the Java Audio API (JMF), will it give me the same values? The question comes up because each format has a different level of compression (different file sizes), so when the API reads a file, does it decompress it and return the same, or almost the same, values as the other formats would?

1 comment:

Unknown said...

It seems that 6408764 is not a bug in Java Sound. This exception is described in the Javadoc of getAudioInputStream():
* Obtains an audio input stream from the input stream provided. The stream
* must point to valid audio file data. In general, audio file readers may
* need to read some data from the stream before determining whether they
* support it. These parsers must be able to mark the stream, read enough
* data to determine whether they support the stream, and, if not, reset the
* stream's read pointer to its original position. If the input stream does
* not support this, this method may fail with an {@code IOException}.

To fix it in your code, you need to wrap the InputStream in a stream that supports mark/reset, for example a BufferedInputStream.
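
A minimal sketch of that suggestion, assuming the file is opened through a plain FileInputStream (which does not support mark/reset). The class name MarkResetSafeOpen and the method open are hypothetical, and I have not verified this against the Audacity files from the post.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

class MarkResetSafeOpen {
    /**
     * Opens an audio stream, wrapping the input in a BufferedInputStream
     * when the raw stream cannot mark/reset on its own.
     */
    static AudioInputStream open(InputStream raw)
            throws UnsupportedAudioFileException, IOException {
        InputStream in = raw.markSupported() ? raw : new BufferedInputStream(raw);
        return AudioSystem.getAudioInputStream(in);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java MarkResetSafeOpen <audio file>");
            return;
        }
        // args[0] is a path to an audio file, e.g. a WAV exported from Audacity.
        try (AudioInputStream ais = open(new FileInputStream(args[0]))) {
            System.out.println(ais.getFormat());
        }
    }
}
```

The wrapper lets the audio file readers look ahead at the header and rewind, which is what the quoted Javadoc says they need to be able to do.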