Sunday, May 20, 2012

Linking error while compiling CUDA SDK in Ubuntu 12.04

Following the installation guide provided by the CUDA website, I installed all the dependency libraries through apt-get:

sudo apt-get install freeglut3-dev build-essential libx11-dev
libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev


When compiling the SDK, I still got the error:

../../lib/librendercheckgl_x86_64.a(rendercheck_gl.cpp.o): In function `CheckBackBuffer::checkStatus(char const*, int, bool)':
rendercheck_gl.cpp:(.text+0xfbb): undefined reference to `gluErrorString'
collect2: ld returned 1 exit status


I tried uninstalling and reinstalling several times, and finally found this post, which solves the problem: http://forums.developer.nvidia.com/devforum/discussion/3486/linkingmake-error-while-compiling-sdk-on-ubuntu-11-10/p1

The problem is in the ~/NVIDIA_GPU_Computing_SDK/C/common/common.mk

Lines like:
LIB += ... ${OPENGLLIB} ... $(RENDERCHECKGLLIB) ...
should have the two variables in reverse order, i.e. $(RENDERCHECKGLLIB) before ${OPENGLLIB}.
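
This is because ld resolves symbols from left to right: librendercheckgl_x86_64.a references gluErrorString, so it must appear on the link line before the GLU library that provides that symbol. Since the post elides the surrounding variables with "...", the following is only an illustrative sketch of the corrected ordering:

LIB += ... $(RENDERCHECKGLLIB) ... ${OPENGLLIB} ...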



Tuesday, May 15, 2012

cmusphinx3 phoneme alignment

Task: Generate phoneme level alignment acoustic scores.

Here is just one way to solve the problem; it requires two passes of alignment. If there are better ways to do it, let me know!

1) Do word-level alignment using sphinx3_align as in the previous post.

sphinx3_align \
-logbase 1.0001 \
-feat 1s_12c_12d_3p_12dd \
-mdef model/model_architecture/wsj_all_cont_3no_8000.mdef \
-senmgau .cont. \
-hmm model/model_parameters/wsj_all_cont_3no_8000_32 \
-beam 1e-150 \
-dict ../../lib/dicts/wsj_all.dic \
-fdict ../../lib/dicts/wsj_all.filler \
-ctl ../../lib/flists/wav_fids.scp \
-cepdir ../../data/feat \
-cepext .mfc \
-insent ../../lib/wlabs/441c0200.lsn \
-outsent alignments/output.txt \
-wdsegdir wdseg,CTL \
-phlabdir phnlab,CTL \
-agc none \
-cmn current 

2) Convert the phoneme labels into a phoneme transcription file (see the sketch below).
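
The exact conversion depends on the format sphinx3_align writes to the phnlab directory, so the following shell sketch is only illustrative: it assumes each label file lists one phone per line in its last column (with any context decoration such as "(B,T)" to be stripped, and ".lab" as the extension), and it produces one transcription line per utterance in the same "PHONES (fileid)" style as the word transcriptions:

while read fid; do
  # take the last field of each non-empty line as the phone, drop any "(...)" suffix
  phones=$(awk 'NF > 0 {p = $NF; sub(/\(.*$/, "", p); printf "%s ", p}' "phnlab/${fid}.lab")
  echo "${phones}(${fid})"
done < ../../lib/flists/wav_fids.scp > trans_phn.txt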

3) Do phoneme alignment with sphinx3_align

sphinx3_align \
-logbase 1.0001 \
-feat 1s_12c_12d_3p_12dd \
-mdef model/model_architecture/wsj_all_cont_3no_8000.mdef \
-senmgau .cont. \
-hmm model/model_parameters/wsj_all_cont_3no_8000_32 \
-beam 1e-150 \
-insert_sil 0 \
-dict ../../lib/dicts/phone.dic \
-ctl ../../lib/flists/wav_fids.scp \
-cepdir ../../data/feat \
-cepext .mfc \
-insent trans_phn.txt \
-outsent alignments/output_phn.txt \
-wdsegdir phnseg,CTL \
-agc none \
-cmn current 
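
Note that "phone.dic" here is simply a dictionary in which every phone maps to itself, so each phone is treated as a one-phone word during alignment. Assuming the "wsj_all.phones" list from the previous post (adjust the paths to your layout), it can be generated with:

awk '{print $1, $1}' wsj_all.phones > phone.dic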

testing cmusphinx3 alignment

Task: do alignment with the cmusphinx3 tools.

1) Data preparation:

Using the prompts of one of the WSJ si_et_05 test speakers, 42 wav files were recorded as the testing data. The audio is saved in standard PCM encoding, i.e. the basic MS wav format. Put only the file names (without path and suffix) into a list file "wav_fids.scp", which is called a control file in the cmusphinx community.

Convert the prompts to the format of:
FIRST COMMODITY APPEALED THE EXPULSION AND FINE TO THE C. F. T. C. (441C0201)
The last item, enclosed in "(" and ")", is the file name of the corresponding recording. This file will serve as the transcription file for alignment.

Extract the cepstral features with the sphinx_fe command (located in sphinxbase/src/sphinx_fe):
sphinx_fe -verbose yes -c wav_fids.scp -mswav yes -di "../wav" -ei "wav" -do "../feat" -eo "mfc" 
With this command, most of the feature extraction parameters take their default values; adjust them according to your specific requirements. Afterwards, there will be a ".mfc" file under the folder "../feat" corresponding to each ".wav" file in the folder "../wav".

To view the content of the ".mfc" feature file, use the command sphinx_cepview (which is also located in sphinxbase/src):
sphinx_cepview -header 1 -describe 1 -d 13 -f ../feat/441c0216.mfc 

2) Prepare the dictionary

As most of the example scripts that come with cmusphinx use cmudict.0.6d, we will also use this version here instead of the newest cmudict.0.7a.

First, download the dictionary from https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmudict/ and remove the comments at the beginning and the stress symbols to produce the dictionary "wsj_all.dic" for alignment.
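
A rough sketch of this cleanup (verify against your copy of cmudict.0.6d): delete the ";;;" comment lines and strip the 0/1/2 stress digits that follow vowel letters, while leaving alternate-pronunciation markers such as "(2)" intact:

sed -e '/^;;;/d' -e 's/\([A-Z]\)[012]/\1/g' cmudict.0.6d > wsj_all.dic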

Meanwhile, generate the phone list file "wsj_all.phones", which contains the 39 phones from the dictionary file plus an extra "SIL".
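
One possible way to build it, assuming the word is the first whitespace-separated field of each dictionary entry and the phones follow:

awk '{for (i = 2; i <= NF; i++) print $i}' wsj_all.dic | sort -u > wsj_all.phones
echo SIL >> wsj_all.phones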

Also create the filler dictionary "wsj_all.filler" with the contents of:
<s>   SIL
</s>  SIL
<sil> SIL

3) Prepare the model

In this experiment, we use the existing WSJ acoustic model trained by Keith Vertanen (http://www.keithv.com/software/sphinx/us/sphinx_wsj_all_cont_3no_8000_32.zip). Simply download and extract the archive.

For alignment, no language model is required.

4) Do the alignment

The alignment is done with sphinx3_align (from sphinx3/src/programs) with the following configuration:
sphinx3_align \
-logbase 1.0001 \
-feat 1s_12c_12d_3p_12dd \
-mdef model/model_architecture/wsj_all_cont_3no_8000.mdef \
-senmgau .cont. \
-mean model/model_parameters/wsj_all_cont_3no_8000_32/means \
-var model/model_parameters/wsj_all_cont_3no_8000_32/variances \
-mixw model/model_parameters/wsj_all_cont_3no_8000_32/mixture_weights \
-tmat model/model_parameters/wsj_all_cont_3no_8000_32/transition_matrices \
-beam 1e-80 \
-dict wsj_all.dic \
-fdict wsj_all.filler \
-ctl wav_fids.scp \
-cepdir ../feat \
-cepext .mfc \
-insent transcription.txt \
-outsent alignments/output.txt \
-wdsegdir segmentations,CTL \
-agc none \
-cmn current 

Make sure the two folders "alignments" and "segmentations" exist under the current path.
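
They can be created beforehand with:

mkdir -p alignments segmentations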


5) The results

In the "output.txt" file under the "alignments" folder, each line is of the form of:
<s> <sil> FIRST <sil> COMMODITY APPEALED <sil> THE(2) <sil> EXPULSION AND <sil> FINE TO(2) THE C. F. T. C. </s>  (441c0201)
representing the alignment and the pronunciation variant chosen from the dictionary for each word.

Under the "segmentations" folder, there are ".wdseg" files for each ".mfc" file. For example, the content of "441c0201.wdseg" is:
SFrm  EFrm    SegAScr Word
   0    16   -1547495 <s>
  17    50    -187467 <sil>
  51    81   -1144739 FIRST
  82    96    -589279 <sil>
  97   161   -3079573 COMMODITY
 162   217   -2497321 APPEALED
 218   231    -878705 <sil>
 232   254    -993106 THE(2)
 255   257    -268085 <sil>
 258   314   -6117691 EXPULSION
 315   371   -2952645 AND
 372   376    -269014 <sil>
 377   409   -1439215 FINE
 410   424    -618393 TO(2)
 425   449   -2016365 THE
 450   481   -1288960 C.
 482   504   -1931603 F.
 505   532   -1083254 T.
 533   576   -1412650 C.
 577   676    -488706 </s>
 Total score:   -30804266
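
Here SFrm and EFrm are the start and end frame indices of each segment and SegAScr is its acoustic score. Assuming the sphinx_fe default analysis rate of 100 frames per second, the frame indices convert directly to times in seconds; for example:

awk 'NR > 1 && NF == 4 {printf "%.2f %.2f %s\n", $1 / 100, ($2 + 1) / 100, $4}' segmentations/441c0201.wdseg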




Friday, May 11, 2012

setup rtmplite demo I

1) Make sure the server has at least Python 2.6; otherwise install a newer version.

2) Download the rtmplite package from http://code.google.com/p/rtmplite/; extract and navigate to the folder;

3) Start the server with default options and debug trace by issuing the command: python rtmp.py -d

4) Open the link http://li-bo-z.comp.nus.edu.sg/rtmplite/testClient/bin-debug/testClient.html in a browser and connect to rtmp://li-bo-z.comp.nus.edu.sg/myapp. For the netstream, just use the default "user1", then click "Publish" to send the data to the server. If "Enable recording" is selected, an flv file will be created to record the content.

5) Open the same link as in 4) in another browser window and connect to the same application domain rtmp://li-bo-z.comp.nus.edu.sg/myapp. Use the same stream name and click "Play"; we can then see the stream published from the previous window.


Saturday, April 28, 2012

Testing wami-recorder

Installation and configuration:


1) Download the Adobe Flex SDK. After the download finishes, extract it to a directory and add the following two paths to either the ~/.profile or ~/.bashrc file:
[Flex_sdk_path]/bin to $PATH
[Flex_sdk_path]/lib to $LD_LIBRARY_PATH
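
For example, append lines like the following to the chosen file, with [Flex_sdk_path] replaced by the actual location:

export PATH=$PATH:[Flex_sdk_path]/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:[Flex_sdk_path]/lib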

2) Check out the wami-recorder code using: hg clone https://code.google.com/p/wami-recorder/

Then navigate to the [wami-recorder] folder, which has two subfolders: example and src. Compile the client with the following command:

mxmlc -compiler.source-path=src -static-link-runtime-shared-libraries=true -output example/client/Wami.swf src/edu/mit/csail/wami/client/Wami.mxml

The command will generate a Wami.swf file under the example/client folder. Next we can start testing the wami recorder.

3) Testing

A) Uploaded both the client and the server PHP example to my own server and tested with basic.html; both recording and playback work fine.

B) Changed the recording destination to a file on my own server instead of the default one, which is on the wami group's server.

C) Instead of specifying an absolute path for wami to save the recording to, used a PHP file to save the recording.

D) Changed the hard-coded file name to a variable that can be generated automatically.

E) Checked the recording format: PCM, signed 16-bit integer, 22050 Hz sample rate. Only the sample rate differs from what we want (16000 Hz); for now it can be converted using the command line tool sox on the server, as sketched below. I have already found the wami recorder interface for setting the recording parameters, but the code does not take effect currently.
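
A hypothetical sox invocation for the conversion, with utt.wav standing in for an actual recording and assuming a reasonably recent sox:

sox utt.wav utt_16k.wav rate 16000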

4) todos:

A) solve the recording parameter setup during wami recorder initialization 

B) try a better UI for the recorder, which currently uses the three basic buttons

Wednesday, April 25, 2012

GSoC 2012 Applications Accepted


When I saw the poster for Google Summer of Code in my department, it was already April 6th. Thanks to the time difference, I still had one day to apply before the deadline. Searching the list of projects with "speech recognition" as the keyword showed CMU Sphinx as the only result. It was great that there was something related to my research interests, which include acoustic modeling and speaker adaptation. While checking the CMU Sphinx project page, I was so excited to see the language learning project there. I had published a paper on that topic. That's what I will do! I contacted the mentor, James, for that project. He is really nice and gave me quite a lot of suggestions for my application. Also I have to thank Ronanki, who may not know that his well written project proposal helped me a lot with my application.

Finally, Ronanki's Pronunciation Evaluation using CMU Sphinx3 proposal and my Accurate and Efficient Pronunciation Evaluation using CMUSphinx for Spoken Language Learning proposal were both accepted this Monday! Thanks so much to all the mentors and reviewers, and also to Google for providing us this great opportunity to work on open source projects.

Pronunciation learning is one of the most important parts of second language acquisition. The aim of this project is to utilize automatic speech recognition technology to facilitate learning spoken language and reading skills. Ronanki and I will work on the same pronunciation evaluation project with different focuses. Ronanki will focus on building the web-based pronunciation evaluation system with CMU Sphinx3. I will mainly focus on developing edit-distance based mispronunciation detection grammars, speech data collection, and maximizing the potential learner population by implementing a mobile application that works with our pronunciation evaluation system. Additionally, we plan to design and implement a game front end to make the learning process much more fun. My project involves four specific sub-tasks: automatic edit distance scoring grammar generation, exemplar pronunciation data collection, an Android app client implementation, and development of a game-based learning system.

As a first-time open source contributor, there is a lot for me to learn. I believe we will have a great summer this year. Any comments or suggestions are appreciated. Thanks again to everyone who made this happen!

All the posts for GSoC 2012 will also appear in our team blog: http://pronunciationeval.blogspot.com/

Tuesday, April 24, 2012

GSoC 2012

Finally, my proposal for GSoC 2012 got accepted!

Accurate and Efficient Pronunciation Evaluation using CMUSphinx for Spoken Language Learning

Thanks so much to my mentor James for his great suggestions on my hurried application!

Let's start doing something great!