Dream & Passion

Tuesday, July 17, 2012

GSoC 2012: Pronunciation Evaluation Week 8 Status

Last week, thanks to Ronanki's help, I finally managed to integrate his exemplar outlier detection and pronunciation scoring scripts into our website, which makes it a more complete system. The details of the two main parts follow:

1) Exemplar outlier detection

In the exemplar data collection part of the website, after the user records each phrase, the system automatically decides whether the recording qualifies as an exemplar. This process involves the following steps:

a) Use ffmpeg to convert the FLV recording generated by rtmplite to a WAV file;

b) Extract MFCC features using sphinx_fe;

c) Run alignment with sphinx_align to get the phonetic transcription of the current phrase;

d) Decode the current recording with an edit distance grammar using sphinx_decode;

e) Compute the phone error rate of the recognition result against the alignment;

f) Based on the phone error rate, the system decides whether the recording is an exemplar. Currently we require the phone error rate to be below 50% for an exemplar; this threshold could be tuned further for different cases.

g) The system then either moves on to the next phrase or asks the user to retry the current phrase, according to that decision.
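Steps e) and f) can be sketched in a few lines of Python. This is only an illustration, not Ronanki's actual script: it assumes the phone error rate is the Levenshtein distance between the aligned and decoded phone sequences divided by the length of the aligned sequence, and the function names are made up for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # phone missing from hyp
                            curr[j - 1] + 1,      # extra phone in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(aligned, decoded):
    """Errors per reference phone (assumed definition of PER)."""
    if not aligned:
        raise ValueError("reference phone sequence is empty")
    return edit_distance(aligned, decoded) / len(aligned)


PER_THRESHOLD = 0.5  # the 50% limit mentioned in step f)


def is_exemplar(aligned, decoded, threshold=PER_THRESHOLD):
    """Accept the recording only if its PER is below the threshold."""
    return phone_error_rate(aligned, decoded) < threshold
```

For example, with the aligned phones "HH AH L OW" and the decoded phones "HH AH L", one phone is missing out of four, so the rate is 25% and the recording would be accepted.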

In the other part of the website, the student section, two new features were added.

2) Exemplar recordings for student to study

When the student selects a phrase to practice, up to 5 exemplar recordings of that phrase are selected from the database and listed for the student to mimic and learn from. Currently the selection is random. Once we have collected more exemplar recordings, it could be improved by selecting the top-scored exemplars.
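The two selection strategies can be sketched as follows. The site itself does this on the server with PHP and MySQL; the Python below is just an illustration, and the function names and the (recording, score) pair layout are assumptions.

```python
import random

MAX_EXEMPLARS = 5  # the limit mentioned above


def pick_exemplars(recordings, limit=MAX_EXEMPLARS, rng=random):
    """Current behaviour: a random subset of at most `limit` recordings."""
    recordings = list(recordings)
    return rng.sample(recordings, min(limit, len(recordings)))


def pick_top_exemplars(scored, limit=MAX_EXEMPLARS):
    """Possible improvement: keep the highest-scored recordings.

    `scored` is a list of (recording, score) pairs, higher is better.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [recording for recording, _ in ranked[:limit]]
```

Taking the minimum of the limit and the list length matters because random.sample raises an error when asked for more items than are available, which would happen for phrases with fewer than 5 exemplars.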

3) Student pronunciation scoring

Another important feature of the student page is scoring the student's own recording. This is done in the following steps:

a) Convert the FLV file from rtmplite to WAV using ffmpeg;

b) Extract MFCC features using sphinx_fe;

c) Generate the phonetic alignment using sphinx_align;

d) Compare the phone-dependent statistics of the alignment with the average statistics from the exemplars to evaluate the student's performance;

e) Present both the acoustic and duration scores and move on to the next phrase.
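To give an idea of what step d) could look like for durations, here is a rough sketch. The actual scoring formula lives in Ronanki's scripts and is not reproduced here; this example simply assumes that each phone's duration is compared with the exemplar mean and standard deviation for that phone, and that the deviations are averaged. All names are hypothetical.

```python
from statistics import mean


def duration_score(student, exemplar_stats):
    """Average deviation of phone durations from the exemplar statistics.

    student:        list of (phone, duration_in_seconds) from the alignment
    exemplar_stats: dict mapping phone -> (mean_duration, std_duration)
    Returns the mean absolute z-score (lower means closer to the exemplars),
    or None if no phone could be compared.
    """
    deviations = []
    for phone, duration in student:
        if phone not in exemplar_stats:
            continue  # no exemplar data for this phone yet
        mu, sigma = exemplar_stats[phone]
        if sigma <= 0:
            continue  # not enough exemplars to estimate a spread
        deviations.append(abs(duration - mu) / sigma)
    return mean(deviations) if deviations else None
```

An acoustic score could follow the same pattern, with per-phone acoustic likelihoods in place of durations.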

Tuesday, July 10, 2012

GSoC 2012: Pronunciation Evaluation Week 7 Status

Hi All,

Last week I was still working on the data collection website.

Many thanks to Robert (butler1970@gmail.com) for trying out the website and listing the issues he encountered on this page: https://www.evernote.com/pub/butler1970/cmusphinx#b=11634bf8-7be9-479f-a20e-6fa1e54b322b&n=398dc728-b3f0-4ceb-8ccf-89295b98a6d7

Issue #1: The Student Page is under construction

The first stage of the website was to collect exemplar recordings, so the student page had not been implemented at that time.

Issue #2: The inconvenient birthdate control

The birthdate control is now replaced with the standard HTML5 <input type="datetime"> control. Because the datetime input control is a new element in HTML5, currently only Chrome, Safari and Opera support the popup date selection. On other browsers, which have no support yet, the control is simply displayed as an input box. The user can just type in the date and the server-side script will check whether the format is correct.

Issue #3: The incorrect error message "Invalid date format" on the additional information update page

After several hours of digging into the source code, I found that the bug lies in the order in which the mysql-related functions are invoked. The processing steps of the additional information update page are as follows:

a) the client side posts the user's input to the server;

b) the server side first preprocesses the user information with the mysql_escape_string function to keep the later mysql queries safe;

c) the format of each field is checked, including whether the user entered a valid date;

d) the mysql database is updated with the new information.

As the mysql server is only needed in step d), I had put the database connection code after step c), without knowing that the mysql_escape_string function also requires a mysql database connection. In the previous implementation, mysql_escape_string therefore returned an empty string, which led to the "Invalid date format" error.
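A rough Python sketch of a safer ordering is shown below. It is not the site's PHP code: it uses sqlite3 as a stand-in for MySQL, validates the raw input before anything touches the database, and passes values as query parameters instead of escaping them by hand. The birthdate column name and the YYYY-MM-DD format are assumptions made for the example.

```python
import sqlite3
from datetime import datetime


def parse_birthdate(text):
    """Return a date if `text` is a valid YYYY-MM-DD date, else None."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None


def update_birthdate(conn, userid, raw_birthdate):
    """Validate first, then update through an already open connection."""
    birthdate = parse_birthdate(raw_birthdate.strip())
    if birthdate is None:
        return "Invalid date format"
    # Placeholders let the driver handle quoting, so no escape call is needed.
    conn.execute(
        "UPDATE users SET birthdate = ? WHERE userid = ?",
        (birthdate.isoformat(), userid),
    )
    conn.commit()
    return "OK"
```

Because the connection is passed in as an argument, it necessarily exists before any database-related call runs, which rules out the ordering mistake described above.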

Secondly, the exemplar recording page was updated with the following features:

1) It automatically moves to the next utterance after the user records and plays back the current recording;

2) Extra navigation controls were added for selecting the phrase to record;

3) When the user opens the exemplar recording page, the first un-recorded utterance is the first one shown to the user;

4) The enabling and disabling of the player's recording and playback buttons is now connected to the database information, i.e. if the user has recorded the phrase before, both the recording and playback buttons are enabled; otherwise only recording is allowed.
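Features 3) and 4) boil down to two small lookups against the set of phrases the user has already recorded. A minimal sketch, with hypothetical function names and with phrase ids standing in for the database rows:

```python
def first_unrecorded(phrase_ids, recorded_ids):
    """Index of the first phrase the user has not recorded yet."""
    recorded = set(recorded_ids)
    for index, phrase_id in enumerate(phrase_ids):
        if phrase_id not in recorded:
            return index
    return 0  # assumption: start over from the first phrase when all are done


def button_state(phrase_id, recorded_ids):
    """Recording is always allowed; playback needs an existing recording."""
    has_recording = phrase_id in set(recorded_ids)
    return {"record": True, "playback": has_recording}
```

What to show once every phrase has been recorded is not described in the post; returning to the first phrase is just one possible choice.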

The third major part done last week is the student page, which had previously been left empty.

On the student page, users can now practice their pronunciation by recording the phrases in the database and listening to the exemplar recordings in the system. The features are:

1) Full recording and playback functionality, the same as for exemplar recording;

2) When navigating to each phrase, up to 5 exemplar recordings are randomly retrieved from the database and listed on the page to help the students;

3) Additionally, to have some exemplar recordings in the system, I had to manually transcribe several sentences and put the recordings into the system. Once many people are contributing exemplar recordings, I won't need to do manual transcription any more.

For this week, there are two major tasks: integration with Ronanki's evaluation scripts and the mid-term report.

Regards,

Troy

Tuesday, June 26, 2012

Troy: GSoC 2012 Pronunciation Evaluation Week 5

Sorry for the late update. Here is what was done in week 5, mainly problem solving.

1) Solving the problem that the Flash-based recorder did not let users enable microphone access.

At the very beginning (before the Flash player 11.2 and 11.3 updates), the audio recorder I created using Flex worked fine. Users could simply right-click the recorder and select "Settings" to allow microphone access. However, with the new updates, that option is disabled without any explanation.

To solve it, many people suggest adding the website to the online global privacy list. However, after many attempts this still did not work in my case.

Then, looking at http://englishcentral.com/, which also uses Flash-based recording, suddenly gave me a clue. On their website, after clicking the recording button (a microphone image), a popup window shows up with the Flash microphone privacy settings dialogue. Yes! Instead of hunting for whatever disabled the "Settings" option of the Flash object, why not check the accessibility of the microphone in code and prompt the settings dialogue when necessary? Here is the solution:

First, check whether a microphone is available; if not, show the microphone list dialogue of the Flash object to ask the user to plug in a microphone:

var mic:Microphone = Microphone.getMicrophone();
if (!mic) {
    Alert.show("No microphone available");
    debug("No microphone available");
    Security.showSettings("microphone");
}

Otherwise, check whether the microphone is accessible; if it is muted, prompt the privacy dialogue to ask the user to allow microphone access:

if (mic.muted) {
    debug("Microphone muted!");
    Security.showSettings("privacy");
}

With these checks in the initialization stage of the Flash recorder, users can enable microphone access right from the start. One interesting thing is that after doing this, the "Settings" option of the Flash object is clickable again.

Looking back now, the code that solves the problem seems so obvious, but before you know the answer it is really hard to guess.

2) Cross-browser Flash recorder compatibility

As the Flash recorder problem was finally solved using the above method, I happily updated the trunk and our server, hoping to see the site working nicely. But the browser showed that the Flash recorder could not load, and the only information I got was "Error 2046".

Although rather depressed, I still had to solve the problem. I googled a bunch of pages and tried several suggestions. One of them made some progress, changing "Error 2046" to "Error 2032": first clear the browser cache, then set the Flash player not to save a local cache, and then re-enable its local cache (a way of clearing the Flash player's local cache).

For "Error 2032" there are mainly two groups of explanations. One says that something is wrong with the URLs in your ActionScript HTTP requests, which was not my problem, as those URLs were definitely correct and in the same folder as the player. The other is the RSL problem of the Flash compiler. To solve the RSL linkage problem, go to the "Flex Build Path" properties page, open the "Library path" tab and change the framework linkage to "merged into code".

3) Adding password change page

4) Refining the user extra information update page to show the existing user information if available, instead of always showing the default values.

By now, the website for exemplar recordings has finally reached a usable stage.

This week, I will try to accomplish the following:

1) Prompts adding page for administrators;

2) Design recording prompts to start our exemplary recording data collection;

3) Bug fixing and system testing;

4) Study the Amazon Mechanical Turk and start thinking how to incorporate our recording onto that platform.

Monday, June 18, 2012

Troy: GSoC 2012 Pronunciation Evaluation Week 4

Finally, the data collection website now provides the basic capabilities! Anyone who is interested can check out our website at http://talknicer.net/~li-bo/datacollection/login.php and have a try. If you encounter any problems, do let us know. Here is what I did during the last week:

1) Discussed with my mentor James to finalize the schema design and created the whole database in MySQL. The whole database design can be found at http://talknicer.net/w/Database_schema . During the development of the website, slight modifications were made to refine the database design. Take the age field of the users table: only when I tried to insert user values into the table did I realize that the age value depends on the registration date, so storing age may not be a good idea. Storing the birthdate is much better. Similar changes were made elsewhere. What I learnt from this is that a good design comes from practice, not purely from imagination.

2) Implemented the two types of user registration page: one for students and one for exemplar speakers. As we don't want the two types of users to be mutually exclusive, and to avoid redundant work, the registration involves two steps: a basic registration and an extra information update. For students only the basic one is compulsory, but exemplar speakers have to complete both forms.

3) Added extra supporting functionality for user management, including password reset and mode selection for users with multiple types.

4) Incorporated the audio recorder into the website for recording and uploading to the server.
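On the age versus birthdate point in item 1): a stored age goes stale, while an age derived from the birthdate is always current. A small sketch of the derivation, with a hypothetical function name:

```python
from datetime import date


def age_on(birthdate, today):
    """Full years elapsed between `birthdate` and `today`."""
    years = today.year - birthdate.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years
```

For instance, age_on(date(1990, 6, 19), date(2012, 6, 18)) gives 21, and one day later it gives 22, without any change to the stored data.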

Things to do this week:

1) Prompts adding page;

2) Testing the system;

3) Design the pronunciation learning game for student users.

Tuesday, June 12, 2012

Troy: GSoC 2012 Pronunciation Evaluation Week 3

Due to some personal stuff, I haven't made much progress during the 3rd week of GSoC. I will try my best to catch up in this week.

Things done:

1. Tailored the previous audio recorder to provide only the recording and playback functionality, leaving interfaces for JavaScript to communicate with the web site pages.

2. Discussed the database design with my mentor.

Things to do this week:

1. Fix schema for prompts to handle word lists with pronunciation and parts of speech (along with a separate text string for display which might not have as clear word boundaries because of punctuation--such as this--etc.)

2. Get separate registration interface for exemplar uploaders

3. Get an interface to add prompts

4. Do the interface to upload recordings for prompts

5. Think about the game play and do its schema once the basic features are decided

Monday, June 4, 2012

Troy: GSoC 2012 Pronunciation Evaluation Week 2

Following are things done in the second week of GSoC 2012:

1. Set up the server to automatically check whether the rtmplite process is still running and, if it has stopped, restart it.

To accomplish this, first create a .process file under your home folder and put the current rtmplite process id in it as its only content. You can use 'top' or 'ps' to find out the current process id of your application.

Then I created the following script to do the status checking:

if [ -e "$pidfile" ]
then
	# check whether the process is running
	rtmppid=`/usr/bin/head -n 1 ${pidfile} | /usr/bin/awk '{print $1}'`;

	# restart the process if not running
	if [ ! -d /proc/${rtmppid} ]
	then
		/usr/bin/python ${exefile} -r ${dataroot} &
		rtmppid=$!
		echo "${rtmppid}" > ${pidfile}
		echo `/bin/date` "### rtmp process restarted with pid: ${rtmppid}"
	fi
fi

In this script, we first check whether the .process file (i.e. $pidfile) exists. If we don't want the server to check this process for a while (for example when applying patches to the program), we can simply delete the file and the process will not be checked. After the maintenance, recreate the file with the new process id and the checking resumes automatically.

The check itself is quite simple: get the process id from the file and see whether the process exists by looking into the /proc system folder, where each running process has its own folder. Google '/proc linux' for more information about this mysterious folder, which contains quite a lot of information about your system.
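The same check can also be written in Python. The sketch below mirrors the shell script rather than replacing it; the function names are made up, and the proc_root parameter exists only so that the logic can be tried against a fake directory instead of the real /proc.

```python
import os
import subprocess


def read_pid(pidfile):
    """Read the process id stored on the first line of the pid file."""
    with open(pidfile) as handle:
        return int(handle.readline().split()[0])


def is_running(pid, proc_root="/proc"):
    """On Linux every running process has a directory /proc/<pid>."""
    return os.path.isdir(os.path.join(proc_root, str(pid)))


def check_and_restart(pidfile, command, proc_root="/proc"):
    """Restart `command` if the recorded process is gone.

    Returns the new pid after a restart, or None if nothing was done.
    """
    if not os.path.exists(pidfile):
        return None  # pid file removed: monitoring is switched off
    if is_running(read_pid(pidfile), proc_root):
        return None  # process still alive
    process = subprocess.Popen(command)
    with open(pidfile, "w") as handle:
        handle.write(f"{process.pid}\n")
    return process.pid
```

It would be called periodically, for example with command set to the list ["/usr/bin/python", exefile, "-r", dataroot], matching the restart line of the shell script.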

2. Implement the login and registration pages using HTML5.

First, user information is stored in a MySQL database, so a user table was designed and created in the server's mysql database:


Field	Type	Comments
userid	INTEGER	Compulsory, automatically increased, primary key
email	VARCHAR(200)	Compulsory, users are identified by emails
password	VARCHAR(50)	Compulsory, encrypted using SHA1, at least 8 alphanumeric characters
name	VARCHAR(100)	Not compulsory, default 'NULL'
age	INTEGER	Not compulsory, default 'NULL', accepted values [0,150]
sex	CHAR(1)	Not compulsory, default 'NULL', accepted values {'M', 'F'}
native	CHAR(1)	Not compulsory, default 'NULL', accepted values {'Y', 'N'}. Indicating the user is a native English speaker or not.
place	VARCHAR(1000)	Not compulsory, default 'NULL'. Indicating the place where the user lived between the ages of 6 and 8.
accent	CHAR(1)	Not compulsory, default 'NULL', accepted values {'Y', 'N'}. Indicating the user has a self-reported accent or not.

The statement used to create the table:

CREATE TABLE users (
    userid INTEGER NOT NULL AUTO_INCREMENT,
    email VARCHAR(200) NOT NULL,
    password VARCHAR(50) NOT NULL,
    name VARCHAR(100),
    age INTEGER,
    sex SET('M', 'F'),
    native SET('Y', 'N') DEFAULT 'N',
    place VARCHAR(1000),
    accent SET('Y', 'N'),
    CONSTRAINT PRIMARY KEY (userid),
    CONSTRAINT chk_age CHECK (age>=0 AND age<=150)
);
Secondly, the login and simple registration pages are implemented in HTML5, which I had to learn as I went. The following are screenshots of the pages:

Also, if you are interested, you can go to this page to help us test the system: http://talknicer.net/~li-bo/datacollection/login.php . On the server side we use PHP to retrieve the information from the page, run the query against the mysql database and finally send the data back to the page.
The recording interface has also been ported to use HTML instead of pure Flex as before. The page currently shows up OK, but there is no event interaction between HTML and Flash yet.

3. Database design for the whole project.
A number of tables are designed to store the various information for this project. Detailed table information can be found on our wiki page: http://talknicer.net/w/Database_schema . Here I will give a brief discussion. First, the user table created in the previous step will be augmented to keep two kinds of user information: one for normal student users and one for exemplar recorders. The reason for putting them into one table instead of two is that student users who do an excellent job in pronunciation should also be allowed to contribute exemplar recordings. Likewise, exemplar recorders who register through the website have to show that they are proficient enough to contribute a qualified exemplar recording.

Besides the user table, there are several other tables for necessary information: a languages table listing the languages defined by ISO, in case we extend the project to other languages; a region table to give an idea of the user's accent; and a prompts table for the text resources that will be used for pronunciation evaluation.

Then there are the tables that log the users' recordings and the tables for the sets of tests designed in the system.

Things to do in the coming week:

1. Discuss more regarding the game part to finish the last part of schema design.

2. Figure out how to integrate the Flash audio recorder nicely with the HTML5 interface.

3. Implement the student recording interface.

4. Further tasks could be found on this list: http://talknicer.net/w/To_do_list

Wednesday, May 30, 2012

First use of Bazaar and LaunchPad.net

1. Check out the Launchpad branch:

bzr branch lp:~troy-lee2008/pronunciationeval/branch-troy

2. Push updates to Launchpad:

bzr push lp:~troy-lee2008/pronunciationeval/branch-troy

This should be executed every time you want your local changes to be reflected on the server.

3. Useful operations

bzr merge

bzr add [file/directory]

bzr mkdir [directory]

bzr mv [file/directory...]

bzr rm [file/directory]

bzr commit -m "message"

Most of the time, the operations go in the following order:

a) bzr merge: to sync to the most recent revision with the server

b) bzr add, mv, rm etc. to make necessary changes

c) bzr commit to commit changes to your local repository

d) bzr push to submit your changes to the server repository

For the initial branch creation, I mainly referred to http://doc.bazaar.canonical.com/latest/en/mini-tutorial/index.html and https://help.launchpad.net/Code/UploadingABranch

Other helpful resources:

http://doc.bazaar.canonical.com/latest/en/tutorials/using_bazaar_with_launchpad.html
