Dream & Passion: April 2012

Saturday, April 28, 2012

Testing wami-recorder

Installation and configuration:

1) Download the Flex SDK from http://www.adobe.com/products/flex.html (http://www.adobe.com/devnet/flex/flex-sdk-download.edu.html)

After the download finishes, extract it to a directory and add the following two paths in either your ~/.profile or ~/.bashrc file:

[Flex_sdk_path]/bin to $PATH

[Flex_sdk_path]/lib to $LD_LIBRARY_PATH
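As a rough sketch, the two path additions above could look like this in ~/.bashrc. The FLEX_SDK_HOME variable name and the $HOME/flex_sdk location are assumptions for illustration only; substitute the directory where the SDK was actually extracted.

```shell
# Hypothetical install location; replace with wherever the SDK was extracted.
FLEX_SDK_HOME="$HOME/flex_sdk"

# Put the SDK tools (mxmlc and friends) on the command search path.
export PATH="$FLEX_SDK_HOME/bin:$PATH"

# Add the SDK libraries; keep any existing value instead of overwriting it.
export LD_LIBRARY_PATH="$FLEX_SDK_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

After opening a new shell (or sourcing the file), running `command -v mxmlc` should print the compiler's path if the directory is right.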

2) Check out the wami-recorder code using: hg clone https://code.google.com/p/wami-recorder/

Then navigate to the [wami-recorder] folder, which has two subfolders: example and src. Compile the client with the following command:

mxmlc -compiler.source-path=src -static-link-runtime-shared-libraries=true -output example/client/Wami.swf src/edu/mit/csail/wami/client/Wami.mxml

The command will generate a Wami.swf file under the example/client folder. Next we can start testing the wami recorder.

3) Testing

A) Uploaded both the client and the server PHP example to my own server and tested with basic.html; both recording and playback work fine.

B) Changed the recording target to a file on my server instead of the default one, which is on the WAMI group's server.

C) Instead of specifying an absolute path for WAMI to save the recording to, used a PHP file to save the recording.

D) Changed the hard-coded file name to a variable that can be generated automatically.

E) Checked the recording format, which is PCM, signed 16-bit integer, 22050 Hz sample rate; only the sample rate differs from what we want, which is 16000 Hz. For now, it can be converted on the server using the command-line tool sox. I have already found the wami-recorder interface for setting the recording parameters, but the code does not take effect yet.
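The server-side conversion mentioned in E) might look roughly like the sketch below. It assumes the recording is saved as a WAV file with a header, so sox can read the input rate by itself; the function name resample_to_16k and the file names are made up for illustration.

```shell
# Hypothetical helper: resample a WAV recording to 16 kHz with sox.
# Usage: resample_to_16k INPUT.wav OUTPUT.wav
resample_to_16k() {
    # Options placed before the output file describe the output:
    # -r 16000 sets its sample rate, -b 16 keeps 16-bit samples.
    sox "$1" -r 16000 -b 16 "$2"
}

# Example call (file names are placeholders):
# resample_to_16k recording_22050.wav recording_16000.wav
```

If the recordings turn out to be headerless raw PCM instead, sox would also need the input format spelled out before the input file name, so this sketch would not work unchanged.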

4) todos:

A) Get the recording parameter setup working during wami-recorder initialization

B) Try a better UI for the recorder, which currently uses the three basic buttons

Wednesday, April 25, 2012

GSoC 2012 Applications Accepted

When I saw the poster for Google Summer of Code in my department, it was already April 6th. Thanks to the time difference, I still had one day to apply before the deadline. Searching the list of projects with "speech recognition" as the keyword showed CMU Sphinx as the only result. It was great that there was something related to my research interests, which include acoustic modeling and speaker adaptation. While checking the CMU Sphinx project page, I was so excited to see the language learning project there. I had published a paper on that topic. That's what I will do! I contacted the mentor, James, for that project. He is really nice and gave me quite a lot of suggestions for my application. I also have to thank Ronanki, who may not know that his well-written project proposal helped me a lot with my application.

Finally, Ronanki's Pronunciation Evaluation using CMU Sphinx3 and my Accurate and Efficient Pronunciation Evaluation using CMUSphinx for Spoken Language Learning proposals were both accepted this Monday! Thanks so much to all the mentors, reviewers and also to Google for providing us this great opportunity to work on open source projects.

Pronunciation learning is one of the most important parts of second language acquisition. The aim of this project is to utilize automatic speech recognition technology to facilitate learning spoken language and reading skills. Ronanki and I will work on the same pronunciation evaluation project with different focuses. Ronanki will focus on building the web-based pronunciation evaluation system with CMU Sphinx3. I will mainly focus on developing edit-distance based mispronunciation detection grammars, speech data collection, and maximizing the potential learner population by implementing a mobile application to work with our pronunciation evaluation system. We also plan to design and implement a game front end to make the learning process much more fun. My project involves four specific sub-tasks: automatic edit distance scoring grammar generation, exemplar pronunciation data collection, an Android app client implementation, and development of a game-based learning system.

As a first-time open source contributor, I have lots of things to learn. I believe we will have a great summer this year. Any comments or suggestions are appreciated. Thanks again to everyone who made this happen!

All the posts for GSoC 2012 will also appear on our team blog: http://pronunciationeval.blogspot.com/.

Tuesday, April 24, 2012

GSoC 2012

Finally, my proposal for GSoC 2012 got accepted!

Accurate and Efficient Pronunciation Evaluation using CMUSphinx for Spoken Language Learning

Thanks so much to my mentor James for his great suggestions on my hurried application!

Let's start doing something great!