Friday, February 29, 2008

Mumbai hotel day...

Soooo I finally managed to cram some writing out of my mind and onto... well, not paper, but at least its digital representation. So far it's not very good, but hey, that can wait as long as I get something down, right? Basically I've started on the introduction and background things. I keep finding things I need to cite from someone, so it's slow, but I'm filling up my .bib file :) I'm not 100% sure how to write a thesis, but I hope that once we get through the background stuff we won't need to do much more citing, since we are mainly developing something new.

On the issue of the phoneme corpus: as long as we have a marked-up speech corpus (it doesn't really matter what it contains as long as it's a good mix of sounds) we should be able to use that, right? If we have the phoneme markup for such a corpus we can record our own, just by pronouncing those phoneme sequences. 2000 words might sound like a lot, but it's really just about 4 pages of text.

Anyways, now I'm going to head to the gym for a while so that when I get back to Sweden I'll be buff like the all-conquering Schwarzenegger (in his youth)!!! Muwahaha! :) Oh, I'll post some pics here too.

Pictures from India

A young monkey living in the Sanjay Gandhi National Park (not 100% sure on the name... of the park, not the monkey... I named the monkey Ronald Reagan... maybe he can grow up and one day create some kind of "Monkeygate" scandal? One can always dream! Come to think of it, it could be a girl monkey. Well, to me Ronald is gender neutral, so there you go :) )

<-- Picture is taken on Elephanta Island. I am standing in front of Shiva, God of Destruction. Most (all?) of the images of Shiva were destroyed/mutilated by the Portuguese when they came to India. That, if anything, is the definition of irony!

There is a lion in this picture. I also like to think that the image showcases my photographic craftsmanship... hehe

Gateway of India and the Taj seen from a boat. The Gateway of India is apparently a symbol of Indian independence. It was built for some English king, and the last English soldiers marched out of India through it!

Wednesday, February 27, 2008

Starting 0.1

After some "vacations" (not really, just poor time management) I am back and doing some stuff with the thesis.

I started to look at some dictionaries at CMU (
I took the one called cmudict.0.1, which is the "smallest one"; it only has around 100,000 words. I am cutting it down and I hope my first draft will not go above 2000 words, but I think I should cut it down to 1000 in the end.

Probably I am just going nuts and should read up a bit more to specify a better corpus, but the reason I am doing this now is to have a big group of words we can choose from and also to have as many mixtures of sounds as possible.
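
One way the cutting-down could go, as a rough sketch only: greedily pick the word that adds the most phonemes we have not covered yet, and stop at a word limit. This assumes each dictionary line looks like "WORD  PH1 PH2 ..." (I have not checked that cmudict.0.1 uses exactly this layout), and the class and method names (PhonemeCoverage, parseLine, pickWords) are made up for illustration. It only looks at single phonemes, not at mixtures of neighbouring sounds.

```java
import java.util.*;

public class PhonemeCoverage {
    // Parse one line of the ASSUMED form "WORD  PH1 PH2 ...".
    static Map.Entry<String, Set<String>> parseLine(String line) {
        String[] parts = line.trim().split("\\s+");
        Set<String> phonemes = new LinkedHashSet<>();
        for (int i = 1; i < parts.length; i++) {
            // Drop stress digits such as the "1" in "AH1", if the dictionary has them.
            phonemes.add(parts[i].replaceAll("\\d", ""));
        }
        return new AbstractMap.SimpleEntry<>(parts[0], phonemes);
    }

    // Greedy selection: repeatedly take the word that adds the most unseen phonemes.
    static List<String> pickWords(Map<String, Set<String>> dict, int maxWords) {
        Set<String> covered = new HashSet<>();
        List<String> chosen = new ArrayList<>();
        while (chosen.size() < maxWords) {
            String best = null;
            int bestGain = 0;
            for (Map.Entry<String, Set<String>> e : dict.entrySet()) {
                if (chosen.contains(e.getKey())) continue;
                Set<String> fresh = new HashSet<>(e.getValue());
                fresh.removeAll(covered);
                if (fresh.size() > bestGain) {
                    bestGain = fresh.size();
                    best = e.getKey();
                }
            }
            if (best == null) break; // no remaining word adds a new phoneme
            chosen.add(best);
            covered.addAll(dict.get(best));
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Toy entries written by hand for the example, not copied from the dictionary file.
        String[] lines = { "ONE  W AH1 N", "TWO  T UW1", "TEN  T EH1 N", "NUN  N AH1 N" };
        Map<String, Set<String>> dict = new LinkedHashMap<>();
        for (String line : lines) {
            Map.Entry<String, Set<String>> e = parseLine(line);
            dict.put(e.getKey(), e.getValue());
        }
        System.out.println(pickWords(dict, 3));
    }
}
```

Scanning the whole dictionary on every round is slow for 100,000 words, but it should be fine for a first draft list; covering pairs of neighbouring phonemes instead of single ones would be a small change to parseLine.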

Recording 2000 sounds might be a "little" bit unrealistic, but we should talk about that.
I have a dictionary of about 25 (maybe more) words which are numbers. (That one we should do!!)

Question: Will we need a corpus of just the phonemes? I guess we will. Do you have an idea of where we could find one for free?

This is another matter I was supposed to check.
I was looking into RAW format and I could find no good links or tutorials for handling it.
So what I am doing now is creating a prototype with JMF (Java Media Framework) to play with, so I can try to get the pitch and control the volume.
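
For reference, a minimal sketch of what handling RAW audio and changing the volume might look like in plain Java, without JMF. It assumes "RAW" means headerless PCM with 16-bit signed, little-endian, mono samples; the real files may well differ. The names RawPcm, decode and applyGain are made up for the example.

```java
public class RawPcm {
    // Decode headerless PCM bytes (ASSUMED 16-bit signed, little-endian) into samples.
    static short[] decode(byte[] raw) {
        short[] samples = new short[raw.length / 2];
        for (int i = 0; i < samples.length; i++) {
            int lo = raw[2 * i] & 0xFF;   // low byte, treated as unsigned
            int hi = raw[2 * i + 1];      // high byte, keeps the sign
            samples[i] = (short) ((hi << 8) | lo);
        }
        return samples;
    }

    // Change the volume by a gain factor, clipping to the 16-bit range.
    static short[] applyGain(short[] samples, double gain) {
        short[] out = new short[samples.length];
        for (int i = 0; i < samples.length; i++) {
            long scaled = Math.round(samples[i] * gain);
            scaled = Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, scaled));
            out[i] = (short) scaled;
        }
        return out;
    }

    public static void main(String[] args) {
        // Two hand-made samples: 10000 and -10000 in little-endian byte order.
        byte[] raw = { 0x10, 0x27, (byte) 0xF0, (byte) 0xD8 };
        short[] louder = applyGain(decode(raw), 2.0);
        System.out.println(louder[0] + " " + louder[1]);
    }
}
```

The clipping matters: without it, a gain that pushes a sample past the 16-bit range would wrap around and sound like a loud click instead of just distorting.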

I found a cool website that might help me with this. It is called ( and from there I am trying to create something we could use.
Initially I want to control the volume, and after that I will do the pitch, for which it seems I need to implement a "fast Fourier transform".
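
To make the FFT part a bit more concrete, here is a small sketch of the standard iterative radix-2 FFT, used to find the strongest frequency in a block of samples. Treating the strongest frequency as "the pitch" is a crude simplification (real speech pitch tracking is more involved), the block length must be a power of two, and the names PitchSketch, fft and dominantFrequency are made up for the example.

```java
public class PitchSketch {
    // In-place iterative radix-2 FFT; the array length must be a power of two.
    static void fft(double[] re, double[] im) {
        int n = re.length;
        // Reorder the input by bit-reversed index.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Combine sub-transforms of length 2, 4, 8, ... up to n.
        for (int len = 2; len <= n; len <<= 1) {
            double ang = -2 * Math.PI / len;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k), wi = Math.sin(ang * k);
                    int a = start + k, b = a + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr; im[b] = im[a] - xi;
                    re[a] += xr; im[a] += xi;
                }
            }
        }
    }

    // Frequency (in Hz) of the strongest bin, ignoring the DC bin.
    static double dominantFrequency(double[] samples, double sampleRate) {
        double[] re = samples.clone();
        double[] im = new double[re.length];
        fft(re, im);
        int best = 1;
        double bestMag = 0;
        for (int k = 1; k < re.length / 2; k++) {
            double mag = re[k] * re[k] + im[k] * im[k];
            if (mag > bestMag) { bestMag = mag; best = k; }
        }
        return best * sampleRate / re.length;
    }

    public static void main(String[] args) {
        // Synthetic test tone: 250 Hz sine sampled at 8000 Hz (values chosen for the example).
        int n = 1024;
        double rate = 8000;
        double[] tone = new double[n];
        for (int i = 0; i < n; i++) tone[i] = Math.sin(2 * Math.PI * 250 * i / rate);
        System.out.println(dominantFrequency(tone, rate));
    }
}
```

The frequency resolution is the sample rate divided by the block length (about 7.8 Hz here), so longer blocks give a finer estimate at the cost of reacting more slowly to changes.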

This Blog
I started this blog for 2 reasons. One is to force myself to work, that is, I will write comments, questions and findings here. The other reason is for you to know what I am doing, where I am getting stuck and what decisions I am taking.