Wednesday, May 30, 2012

First use of Bazaar and LaunchPad.net

1. Check out the Launchpad branch:

bzr branch lp:~troy-lee2008/pronunciationeval/branch-troy

2. Update to launchpad:

bzr push lp:~troy-lee2008/pronunciationeval/branch-troy

This should be executed every time you want your local changes to be reflected on the server.

3. Useful operations

bzr merge

bzr add [file/directory]

bzr mkdir [directory]

bzr mv [file/directory...]

bzr rm [file/directory]

bzr commit -m "message"

Most of the time, the workflow goes in the following steps:
a) bzr merge: sync with the most recent revision on the server
b) bzr add, mv, rm etc.: make the necessary changes
c) bzr commit: commit the changes to your local repository
d) bzr push: submit your changes to the server repository




Monday, May 28, 2012

[GSoC 2012: Pronunciation Evaluation #Troy] Week 1

The first week of GSoC 2012 has already been a busy one. Here is what I managed to accomplish:
  1. The server-side rtmplite is now configured to save recordings into the folder [path_to_webroot]/data on the server. For the audioRecorder app, all data goes under the [path_to_webroot]/data/audioRecorder folder, with a separate folder for each user (e.g. [path_to_webroot]/data/audioRecorder/user1). Each recorded utterance is named in the format [sentence name]_[quality level].flv
  2. The conversion from FLV to WAV is now done purely on the server side, inside rtmplite, using Python's subprocess.Popen() to call FFMPEG. After rtmplite closes the FLV file, the conversion is carried out immediately, and the converted WAV file has exactly the same path and name except for the suffix, which is WAV instead of FLV. Thanks very much to Guillem for helping me test "sox" for the conversion. However, I failed to use sox directly to convert FLV to WAV in the terminal; it reported "unknown format flv". If needed, I will try to figure out whether this is a problem with my build of sox. James then pointed out that we can do the conversion inside rtmplite, which really helps: why send an extra HTTP request to invoke a PHP process for the conversion?
  3. To verify the recording parameter, i.e. the quality level for Speex encoding, I recorded the same utterance ("NO ONE AT THE STATE DEPARTMENT WANTS TO LET SPIES IN") with the quality varying from 0 to 10. As expected, the higher the quality, the larger the FLV file, and to my ear, the better the sound; however, it is hard to notice any difference above level 7. I also generated alignment scores to see whether the quality affects the alignment, but from the results the acoustic scores seem incomparable across different recordings. For now, we will set the recording quality to 8. [graph: alignment scores versus Speex quality level]
  4. In the audioRecorder, the UI and other events now proceed only after the NetConnection and NetStream open and close events have successfully finished. Also, a 0.5 s delay is inserted at the start and end of the record-button click handler to avoid clipping.
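
The file layout and conversion step above can be sketched roughly as follows. The webroot path and the helper names here are illustrative assumptions, not the actual rtmplite code:

```python
# Rough sketch of the per-user recording layout and the FLV -> WAV step
# described above. Paths and function names are illustrative; the real
# logic lives inside rtmplite.
import os
import subprocess

def recording_path(webroot, user, sentence, quality):
    # e.g. [webroot]/data/audioRecorder/user1/sent01_8.flv
    return os.path.join(webroot, "data", "audioRecorder", user,
                        "%s_%d.flv" % (sentence, quality))

def wav_path(flv_path):
    # Same path and name as the FLV; only the suffix changes.
    base, _ext = os.path.splitext(flv_path)
    return base + ".wav"

def convert_to_wav(flv_path):
    # Called right after rtmplite closes the FLV file; FFMPEG does the work.
    return subprocess.call(["ffmpeg", "-y", "-i", flv_path, wav_path(flv_path)])

if __name__ == "__main__":
    flv = recording_path("/var/www", "user1", "sent01", 8)
    print(flv)
    print(wav_path(flv))
```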

For the 2nd week:
  1. Solve the problem encountered when converting FLV to WAV using FFMPEG with Python's Popen(). If the main Python script (call it test.py for now) is run in the terminal with "python test.py", there are no problems and everything works great. However, if I put it in the background and log off the server with "python test.py &", every time Popen() is invoked the whole process hangs with a "Stopped + test.py &" message. I have tried different approaches found on the web for two days without success. I will try to figure out a workaround; otherwise, I may turn to Guillem's suggestion and figure out whether sox works.
  2. Finish the upload interface. There will be two kinds of interfaces: one for students and one for exemplar pronunciations. For the students, we want to display one to five phrases below space for a graphic or animation, assuming the smallest screen possible but with HTML that also looks good in a large window. For the exemplar interface, we just need to display one phrase, but we should also have per-upload form fields (name, age, sex, native English speaker?, where the speaker lived at ages 6-8 (which determines accent), self-reported accent, etc.) which should persist across multiple uploads by the same user (with cookies?).
  3. Test rtmplite with multiple users and with multiple recording sessions from the same user.
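
For item 1, one likely cause of the "Stopped" message is that a backgrounded job is suspended with SIGTTIN/SIGTTOU as soon as a child process touches the controlling terminal, so detaching the child's standard streams may help. This is a hedged sketch, not a confirmed fix for this server:

```python
# Possible workaround for the "Stopped  python test.py &" symptom: a
# backgrounded process is suspended by the shell if it tries to read from
# (SIGTTIN) or write to (SIGTTOU) the terminal, so detach the child's
# standard streams from the terminal entirely.
import os
import subprocess

def run_detached(cmd):
    # Run cmd with stdin/stdout/stderr pointed at /dev/null instead of the
    # terminal, then wait for it and return its exit code.
    with open(os.devnull, "rb") as null_in, open(os.devnull, "wb") as null_out:
        proc = subprocess.Popen(cmd, stdin=null_in,
                                stdout=null_out, stderr=subprocess.STDOUT)
        return proc.wait()

def convert_flv(flv_path, wav_path):
    # The FFMPEG invocation from the conversion step, run without a terminal.
    return run_detached(["ffmpeg", "-y", "-i", flv_path, wav_path])
```

If the hang persists, running the script under nohup or as a proper daemon would be the more thorough fix.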

Saturday, May 26, 2012

First use of SVN

Due to the requirements of GSoC 2012, we need to check our code into the cmusphinx Subversion repository. Here is what I tried.

Step 1: Create a new branch:


Then a log message will be opened in vi; just add some comments to it, save, and quit. Next press 'c' and Enter; you will be prompted for authentication, and if the suggested username is not your SVN username, just press Enter and it will ask for both username and password.

After that, seeing the message "Committed revision 11369" probably indicates a successful operation.

However, I did receive an email saying that the operation "is being held until the list moderator can review it for approval." Browsing the online SVN branches, the folder is already there.

Step 2: Check out the branch:


Some other useful commands:

svn add [filename/folder]

svn delete [filename/folder]

svn commit -m "message"



Thursday, May 24, 2012

[GSoC 2012] Before Week 1

GSoC 2012 officially started this Monday (21 May). Although weekly reports are expected to start next Monday, it is worth giving a brief overview of the preparations we made during the bonding period.

The project started with a group chat with our mentor James and the other student, Ronanki. From that chat and the follow-up email communications, the project has become clearer and clearer to me. For my part, the major focuses will be:

1) a web portal for automatic pronunciation evaluation audio collection;
2) an Android-based mobile automatic pronunciation evaluation app.

The core of these two applications is edit-distance-grammar-based automatic pronunciation evaluation using cmusphinx3, which will serve as the foundation for both.

Following are the preparations I have done during the bonding period:
  1. Tried out the basic wami-recorder demo on my school's server;
  2. Switched to rtmplite for audio recording. rtmplite is a Python implementation of a Flash RTMP server with the minimum support needed for real-time streaming and recording using AMF0. On the server side, the daemon RTMP server process listens on TCP port 1935 by default for connections and streaming. On the client side, the user needs a NetConnection to set up a session with the server and a NetStream for audio and video streaming and recording. The demo application has been set up at: http://talknicer.net/~li-bo/testClient/bin-debug/testClient.html
  3. Based on my understanding of the demo application, which does real-time streaming and recording of both audio and video, I started writing my own audio recorder, which is the key component for both the web-based audio data collection and the evaluation app. A basic version of the recorder is hosted at: http://talknicer.net/~li-bo/audioRecorder/audioRecorder.html . The current implementation includes:
    1. Distinguishing recordings from different users by user id;
    2. Loading pre-defined text sentences for recording, which may be useful for the data collection;
    3. Real-time audio recording;
    4. Playback of the recordings from the server;
    5. Basic event-control logic, such as preventing users from recording and playing at the same time.
  4. Besides, I have also learnt from http://cmusphinx.sourceforge.net/wiki/sphinx4:sphinxthreealigner how to do alignment using cmusphinx3. To generate phoneme alignment scores, two alignment steps are needed. The details of how to carry out the alignment can be found in my more tech-oriented posts (http://troylee2008.blogspot.com/2012/05/testing-cmusphinx3-alignment.html and http://troylee2008.blogspot.com/2012/05/cmusphinx3-phoneme-alignment.html) on my personal blog.
Currently, the following things are ongoing:
  1. Set up the server-side process to manage user recordings properly, i.e. distinguishing between users and between different utterances.
  2. Figure out how to automatically convert the recorded server-side FLV files to WAV files after the user stops recording.
  3. Verify the recording parameters against the recording quality, also taking network bandwidth into consideration.
  4. Incorporate delays between network events in the recorder. The current version does not wait for network events (such as connection setup, data packet transmission, etc.) to finish before processing the next user event, which often causes the recordings to be clipped.

Sunday, May 20, 2012

Configure Ubuntu 12.04 to boot without a monitor

Normally, the Ubuntu 12.04 desktop system is meant for a PC connected to a monitor, a keyboard, and a mouse. As I just want to log in to the computer remotely, I removed those devices and left the machine with only power and network. Everything worked fine until I needed to reboot the machine remotely: after that, I could never connect to it again until I reconnected a monitor and rebooted.

To solve the problem, I finally found this post: http://ubuntuforums.org/showthread.php?t=1452600&page=3 . The solution is:

Step 1. Back up the original xorg.conf to xorg.conf.bk just in case, then create a new xorg.conf in /etc/X11 with the following content.

Section "Device"
Identifier "VNC Device"
Driver "vesa"
EndSection

Section "Screen"
Identifier "VNC Screen"
Device "VNC Device"
Monitor "VNC Monitor"
SubSection "Display"
Modes "1024x768"
EndSubSection
EndSection

Section "Monitor"
Identifier "VNC Monitor"
HorizSync 30-70
VertRefresh 50-75
EndSection

Step 2. Disable KMS for your video card

Find your video card manufacturer in the list below and use the command under it to create the appropriate kms.conf file containing the "options...modeset=0" line. If you have access to the GUI, you could just as easily create/modify the file and put the "options...modeset=0" line in as appropriate.

The following are entered into a terminal as single commands (note that writing to /etc/modprobe.d requires root):

# ATI Radeon:
echo options radeon modeset=0 > /etc/modprobe.d/radeon-kms.conf

# Intel:
echo options i915 modeset=0 > /etc/modprobe.d/i915-kms.conf

# Nvidia (this should revert you to using -nv or -vesa):
echo options nouveau modeset=0 > /etc/modprobe.d/nouveau-kms.conf

As for my case, mine is Intel, so I added "options i915 modeset=0" to /etc/modprobe.d/dkms.conf .

Step 3. Reboot


cudaGetDeviceCount returned 38

The machine was initially configured to use the NVIDIA card for video output, with the IGD (integrated graphics device) on the board disabled. To make full use of the GPU's computation power, I had to reconfigure the BIOS to use the IGD for video output and leave the NVIDIA card for computation.

However, after installing the NVIDIA driver, CUDA 4.2, and the SDK on Ubuntu 12.04, the test program deviceQuery could not find the CUDA device:

$ ./deviceQuery
[deviceQuery] starting...
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
[deviceQuery] test results...
FAILED

Press ENTER to exit...

Checking the device:

$ lspci | grep -i NVIDIA
01:00.0 VGA compatible controller: NVIDIA Corporation Tesla C2075 (rev a1)
 
With the help of this post: http://d.hatena.ne.jp/flick-flick/20110818 and Google Translate, I finally realized that the problem is actually stated in the installation doc:

4. If you do not use a GUI environment, ensure that the device files /dev/nvidia* exist and have the correct file permissions. (This would be done automatically when initializing a GUI environment.) This can be done by creating a startup script like the following to load the driver kernel module and create the entries as a superuser at boot time:
#!/bin/bash

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
    # Count the number of NVIDIA controllers found.
    NVDEVS=`lspci | grep -i NVIDIA`
    N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
    NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`
    N=`expr $N3D + $NVGA - 1`
    for i in `seq 0 $N`; do
        mknod -m 666 /dev/nvidia$i c 195 $i
    done
    mknod -m 666 /dev/nvidiactl c 195 255
else
    exit 1
fi

Thus, to solve the problem, add the above script to /etc/rc.local so it runs at startup.



Linking error while compiling CUDA SDK in Ubuntu 12.04

Following the installation guide provided on the CUDA website, all the dependency libraries were installed through apt-get:

sudo apt-get install freeglut3-dev build-essential libx11-dev
libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev


When compiling the SDK, the build still fails with the error:

../../lib/librendercheckgl_x86_64.a(rendercheck_gl.cpp.o): In function `CheckBackBuffer::checkStatus(char const*, int, bool)':
rendercheck_gl.cpp:(.text+0xfbb): undefined reference to `gluErrorString'
collect2: ld returned 1 exit status


After several rounds of uninstalling and reinstalling, I finally found this post: http://forums.developer.nvidia.com/devforum/discussion/3486/linkingmake-error-while-compiling-sdk-on-ubuntu-11-10/p1 which solves the problem:

The problem is in ~/NVIDIA_GPU_Computing_SDK/C/common/common.mk

Lines like:
LIB += ... ${OPENGLLIB} .... $(RENDERCHECKGLLIB) ...
should have the two variables in reverse order, i.e. $(RENDERCHECKGLLIB) before ${OPENGLLIB}. The linker resolves symbols left to right, so the library that references gluErrorString (librendercheckgl) must appear before the GL/GLU libraries that define it.


