Sound Recognition

Hi,
After some "vacations" (not really, just poor time management), I am back and working on the thesis again.

SEARCH FOR A DICTIONARY
I started looking at some of the dictionaries on the CMU site (http://www.speech.cs.cmu.edu/cgi-bin/cmudict).
I took the one called cmudict.0.1, which is the "smallest" one; it has only around 100,000 words. I am cutting it down, and I hope my first draft will stay under 2000 words, although I think I should cut it down to 1000 in the end.
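
A minimal Java sketch of the cutting-down step. It assumes the usual cmudict line layout (the word, then its phonemes separated by whitespace, comment lines starting with ";;;", alternate pronunciations written as WORD(1)); that layout should be checked against the actual file, and CmuDictSubset, parseLine and load are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: keeps the first `limit` words of a CMU-style dictionary.
// Assumed line layout: "WORD  PH1 PH2 ...", comments start with ";;;",
// alternate pronunciations look like "WORD(1)".
class CmuDictSubset {

    // Splits one line into [word, phoneme, phoneme, ...]; null for blank or comment lines.
    static String[] parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith(";;;")) {
            return null;
        }
        return trimmed.split("\\s+");
    }

    // Reads entries in file order until `limit` distinct words are collected.
    static Map<String, List<String>> load(Reader source, int limit) {
        Map<String, List<String>> dict = new LinkedHashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while (dict.size() < limit && (line = in.readLine()) != null) {
                String[] parts = parseLine(line);
                if (parts == null || parts.length < 2 || parts[0].matches(".*\\(\\d+\\)")) {
                    continue; // comment, malformed line, or alternate pronunciation
                }
                dict.put(parts[0], new ArrayList<>(Arrays.asList(parts).subList(1, parts.length)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dict;
    }
}
```

For example, load(new FileReader("cmudict.0.1"), 2000) would keep the first 2000 words in file order. That order is alphabetical, so on its own it says nothing about which sounds end up in the list.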

Maybe I am just going nuts and should read up a bit more to specify a better corpus, but the reason I am doing this now is to have a large pool of words we can choose from, with as many combinations of sounds as possible.
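
One way to squeeze as many combinations of sounds as possible into a small list is greedy set cover: keep picking the word that adds the most sound units not yet covered. This is a swapped-in technique, not something decided for the thesis, and the choice of unit (each phoneme with its stress digit stripped, plus each adjacent phoneme pair) is an assumption; CoverageSelector is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Greedy set cover over "sound units". Assumed unit definition: phonemes with
// stress digits removed (AH0, AH1 -> AH) plus adjacent pairs such as "HH-AH".
class CoverageSelector {

    static Set<String> units(List<String> phonemes) {
        Set<String> result = new HashSet<>();
        String prev = null;
        for (String p : phonemes) {
            String base = p.replaceAll("\\d", ""); // drop the stress marker
            result.add(base);
            if (prev != null) {
                result.add(prev + "-" + base); // pair of neighbouring sounds
            }
            prev = base;
        }
        return result;
    }

    // Picks up to maxWords words, each time taking the one with the most new units.
    static List<String> select(Map<String, List<String>> dict, int maxWords) {
        Map<String, Set<String>> candidates = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : dict.entrySet()) {
            candidates.put(e.getKey(), units(e.getValue()));
        }
        Set<String> covered = new HashSet<>();
        List<String> chosen = new ArrayList<>();
        while (chosen.size() < maxWords && !candidates.isEmpty()) {
            String best = null;
            int bestGain = 0;
            for (Map.Entry<String, Set<String>> e : candidates.entrySet()) {
                Set<String> fresh = new HashSet<>(e.getValue());
                fresh.removeAll(covered);
                if (fresh.size() > bestGain) { // ties keep the earlier word
                    best = e.getKey();
                    bestGain = fresh.size();
                }
            }
            if (best == null) {
                break; // every remaining word only repeats sounds already covered
            }
            covered.addAll(candidates.remove(best));
            chosen.add(best);
        }
        return chosen;
    }
}
```

Greedy selection does not guarantee the smallest covering list, but it is a standard approximation, and the maxWords parameter makes it easy to compare a 2000-word draft with a 1000-word one.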

Recording 2000 sounds might be a "little" bit unrealistic, but we should talk about that.
I also have a dictionary of about 25 (maybe more) words that are numbers (that one we should definitely do!).

Question: Will we need a corpus of phonemes only? I guess we will. Do you know where we could find one for free?

AUDIO FORMAT
This is another matter I was supposed to check.
I looked into the RAW format, but I could not find any good links or tutorials on handling it.
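
RAW audio is usually just PCM samples with no header, so the sample rate, bit depth, channel count, signedness and byte order all have to be known in advance. Assuming 16-bit signed little-endian mono (an assumption, not something we have decided), decoding and re-encoding takes only a few lines; RawPcm is a hypothetical helper:

```java
// Hypothetical helper for headerless PCM, ASSUMING 16-bit signed little-endian mono.
class RawPcm {

    // Two bytes per sample: low byte first, then the (signed) high byte.
    static short[] decode16LE(byte[] raw) {
        if (raw.length % 2 != 0) {
            throw new IllegalArgumentException("odd byte count for 16-bit samples");
        }
        short[] samples = new short[raw.length / 2];
        for (int i = 0; i < samples.length; i++) {
            int lo = raw[2 * i] & 0xFF;  // treat the low byte as unsigned
            int hi = raw[2 * i + 1];     // sign-extended, carries the sample's sign
            samples[i] = (short) ((hi << 8) | lo);
        }
        return samples;
    }

    // Inverse of decode16LE: writes each sample back as low byte, high byte.
    static byte[] encode16LE(short[] samples) {
        byte[] raw = new byte[samples.length * 2];
        for (int i = 0; i < samples.length; i++) {
            raw[2 * i] = (byte) samples[i];
            raw[2 * i + 1] = (byte) (samples[i] >> 8);
        }
        return raw;
    }
}
```

With the Java Sound API, the same layout should correspond to new AudioFormat(sampleRate, 16, 1, true, false), where the last two arguments are signed and bigEndian.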
So what I am doing now is building a prototype with JMF (Java Media Framework) to experiment with, trying to get the pitch and control the volume.
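
Volume control can be sketched in software without relying on any particular JMF control: multiply each 16-bit sample by a gain factor and clip to the short range, so loud samples saturate instead of wrapping around. The Volume class is hypothetical and assumes the samples have already been decoded to short values:

```java
// Hypothetical software gain for 16-bit samples (assumed already decoded to short[]).
class Volume {

    // Converts a gain in decibels to an amplitude factor: factor = 10^(dB / 20).
    static double dbToFactor(double db) {
        return Math.pow(10.0, db / 20.0);
    }

    // Scales every sample and clips to [Short.MIN_VALUE, Short.MAX_VALUE].
    static short[] apply(short[] samples, double factor) {
        short[] out = new short[samples.length];
        for (int i = 0; i < samples.length; i++) {
            long scaled = Math.round(samples[i] * factor);
            if (scaled > Short.MAX_VALUE) scaled = Short.MAX_VALUE;
            if (scaled < Short.MIN_VALUE) scaled = Short.MIN_VALUE;
            out[i] = (short) scaled;
        }
        return out;
    }
}
```

A gain of +6 dB gives a factor of 10^(6/20), about 1.995, so it roughly doubles the amplitude; negative dB values turn the volume down.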

I found a cool website that might help me with this. It is called jsResources.org (http://www.jsresources.org/faq_audio.html) and from there I am trying to create something we could use.
Initially I want to control the volume, and after that I will do the pitch, which it seems requires implementing a fast Fourier transform (FFT).
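
A minimal sketch of the FFT part, assuming a power-of-two frame length: an iterative radix-2 Cooley-Tukey transform, followed by a crude estimate that takes the frequency of the strongest bin. FftPitch is a hypothetical name, and the strongest bin is not always the pitch (a harmonic can be louder than the fundamental), so this is a starting point rather than a real pitch detector:

```java
// Hypothetical helper: radix-2 Cooley-Tukey FFT plus a "strongest bin" frequency estimate.
// Frame length must be a power of two.
class FftPitch {

    // In-place forward FFT of the complex signal (re, im).
    static void fft(double[] re, double[] im) {
        int n = re.length;
        if (n == 0 || (n & (n - 1)) != 0 || im.length != n) {
            throw new IllegalArgumentException("length must be a power of two and re/im must match");
        }
        // Reorder the input into bit-reversed index order.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Combine butterflies of size 2, 4, 8, ..., n.
        for (int len = 2; len <= n; len <<= 1) {
            double angle = -2 * Math.PI / len;
            int half = len / 2;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < half; k++) {
                    double wr = Math.cos(angle * k);
                    double wi = Math.sin(angle * k);
                    int a = start + k;
                    int b = a + half;
                    // (xr, xi) = twiddle factor times the second input of the butterfly
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr;
                    im[b] = im[a] - xi;
                    re[a] += xr;
                    im[a] += xi;
                }
            }
        }
    }

    // Frequency (Hz) of the largest-magnitude bin strictly between DC and Nyquist.
    static double dominantFrequency(double[] samples, double sampleRate) {
        int n = samples.length;
        if (n < 4) {
            throw new IllegalArgumentException("need at least 4 samples");
        }
        double[] re = samples.clone();
        double[] im = new double[n];
        fft(re, im);
        int best = 1;
        double bestPower = -1;
        for (int k = 1; k < n / 2; k++) { // skip the DC bin k = 0
            double power = re[k] * re[k] + im[k] * im[k];
            if (power > bestPower) {
                bestPower = power;
                best = k;
            }
        }
        return best * sampleRate / n;
    }
}
```

With sampleRate = 8000 and a 1024-sample frame, the bins are 8000 / 1024 = 7.8125 Hz apart, so the estimate cannot be finer than that. Autocorrelation or the harmonic product spectrum are common alternatives if the strongest-bin estimate turns out to be unreliable for speech.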

THIS BLOG
I started this blog for two reasons. One is to force myself to work: I will write my comments, questions and findings here. The other is so that you know what I am doing, where I am getting stuck and what decisions I am making.

Sound Recognition - Thesis

Wednesday, February 27, 2008

Contributors: Key, Emil