Dream & Passion: July 2012

Tuesday, July 17, 2012

GSoC 2012: Pronunciation Evaluation Week 8 Status

Last week, thanks to Ronanki's help, I finally managed to integrate his exemplar outlier detection and pronunciation scoring scripts into our website, which makes it a more complete system. The details of the two main parts are as follows:

1) Exemplar outlier detection

In the exemplar data collection part of the website, after the user records each phrase, the system automatically decides whether the recording is an exemplar. The process involves the following steps:

a) Use ffmpeg to convert the FLV recording generated by rtmplite to a WAV file;

b) Extract MFCC features using sphinx_fe;

c) Run an alignment using sphinx_align to get the phonetic transcription of the current phrase;

d) Decode the current recording with an edit distance grammar using sphinx_decode;

e) Compute the phone error rate of the recognition results against the alignment;

f) Based on the phone error rate, decide whether the current recording is an exemplar. Currently we require the phone error rate to be less than 50% for an exemplar; this threshold could be further tuned for different cases;

g) Either navigate to the next phrase for recording or ask the user to retry the current phrase, according to the above decision.
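To make steps e) and f) concrete, here is a minimal Python sketch of a phone error rate check. It is only an illustration, not the actual script used on the website: the function names and the sample phone sequences are made up, and it assumes the phone error rate is the edit distance between the decoded and aligned phone sequences divided by the length of the aligned sequence.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # phone missing from hyp
                            curr[j - 1] + 1,       # extra phone in hyp
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(aligned, decoded):
    """Assumed definition: edit distance divided by the reference length."""
    if not aligned:
        raise ValueError("the aligned phone sequence must not be empty")
    return edit_distance(aligned, decoded) / len(aligned)


def is_exemplar(aligned, decoded, threshold=0.5):
    """Accept the recording when the phone error rate is below the threshold."""
    return phone_error_rate(aligned, decoded) < threshold


# Hypothetical phone sequences, for illustration only.
aligned = "HH AH L OW".split()
print(is_exemplar(aligned, "HH EH L OW".split()))  # one substitution out of four
print(is_exemplar(aligned, "HH EH L".split()))     # two errors out of four
```

With the 50% threshold, the first recording would be accepted and the second one rejected, because its error rate is exactly 0.5 and the check is a strict "less than".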

In the other part of the website, i.e. the student section, two new functionalities have been added.

2) Exemplar recordings for student to study

When the student selects a phrase for practice, at most 5 exemplar recordings for this phrase are selected from the database and listed for the student to mimic and learn from. Currently the selection is random. Once we have collected more exemplar recordings, it could be improved by selecting the top-scored exemplars.
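The selection logic can be sketched in a few lines of Python. This is an illustration only: the function names and the recording file names are hypothetical, and the top-scored variant assumes that a higher score means a better exemplar.

```python
import random

MAX_EXEMPLARS = 5  # the limit mentioned above


def pick_random_exemplars(recordings, limit=MAX_EXEMPLARS):
    """Current behaviour: pick at most `limit` recordings at random."""
    return random.sample(recordings, min(limit, len(recordings)))


def pick_top_exemplars(scored_recordings, limit=MAX_EXEMPLARS):
    """Possible improvement: keep the best-scored recordings.

    `scored_recordings` is a list of (recording, score) pairs; a higher
    score is assumed to be better.
    """
    ranked = sorted(scored_recordings, key=lambda pair: pair[1], reverse=True)
    return [recording for recording, _ in ranked[:limit]]


# Hypothetical file names, for illustration only.
recordings = ["rec%d.wav" % i for i in range(8)]
print(pick_random_exemplars(recordings))
print(pick_top_exemplars([("a.wav", 0.4), ("b.wav", 0.9), ("c.wav", 0.7)], limit=2))
```

Taking the minimum of the limit and the list length keeps the random selection safe when a phrase has fewer than 5 exemplars.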

3) Student pronunciation scoring

Another important feature of the student page is scoring the recording the student has just made. This is done with the following steps:

a) Convert the FLV files from rtmplite to WAV using ffmpeg;

b) Extract MFCC features using sphinx_fe;

c) Generate the phonetic alignment using sphinx_align;

d) Compare the aligned phone-dependent statistics with the average statistics from the exemplars to evaluate the student's performance;

e) Present both the acoustic and duration scores and navigate to the next phrase.
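As a rough illustration of step d), the following Python sketch compares a student's per-phone values with exemplar averages. It is not Ronanki's scoring script: the deviation measure (mean absolute z-score), the function names, and the numbers are all assumptions chosen to show the general idea. The same function could be applied once to phone durations and once to acoustic scores.

```python
from statistics import mean


def phone_deviations(student, exemplar_stats):
    """Absolute z-score of each student phone against the exemplar statistics.

    `student` is a list of (phone, value) pairs taken from the alignment.
    `exemplar_stats` maps a phone to its (mean, standard deviation) over
    the exemplar recordings.
    """
    deviations = []
    for phone, value in student:
        mu, sigma = exemplar_stats[phone]
        # Guard against a zero standard deviation (e.g. a single exemplar).
        deviations.append(abs(value - mu) / sigma if sigma > 0 else 0.0)
    return deviations


def deviation_score(student, exemplar_stats):
    """Mean deviation; 0 means the student matches the exemplar average."""
    return mean(phone_deviations(student, exemplar_stats))


# Hypothetical durations in frames, for illustration only.
exemplar_durations = {"HH": (10, 2), "AH": (14, 3)}
student_durations = [("HH", 10), ("AH", 20)]
print(deviation_score(student_durations, exemplar_durations))
```

In this made-up example the "HH" phone matches the exemplar average, while "AH" is two standard deviations too long, so the overall deviation is 1.0.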

Tuesday, July 10, 2012

GSoC 2012: Pronunciation Evaluation Week 7 Status

Hi All,

Last week, I was still working on the data collection website.

Many thanks to Robert (butler1970@gmail.com) for trying out the website and listing the issues he encountered on this page: https://www.evernote.com/pub/butler1970/cmusphinx#b=11634bf8-7be9-479f-a20e-6fa1e54b322b&n=398dc728-b3f0-4ceb-8ccf-89295b98a6d7

Issue #1: The Student Page is under construction

The first stage of the website was to collect exemplar recordings, so the student page was not implemented at that time.

Issue #2: The inconvenient birthdate control

The birthdate control is now replaced with the standard HTML5 <input type="datetime"> control. Because the datetime input control is a new element in HTML5, currently only Chrome, Safari and Opera support the popup date selection. On other browsers, which have no support yet, the control is simply displayed as an input box. The user can just type in the date, and the background script checks whether the format is correct.
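The server-side format check can be sketched as follows. This is a Python illustration only, not the website's actual background script; the function name and the expected YYYY-MM-DD format are assumptions.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed input format, e.g. 1990-02-28


def parse_birthdate(text):
    """Return a date object if `text` is a valid date, otherwise None."""
    try:
        return datetime.strptime(text.strip(), DATE_FORMAT).date()
    except ValueError:
        return None


print(parse_birthdate("1990-02-28"))  # a valid calendar date
print(parse_birthdate("1990-02-30"))  # rejected: February has no 30th day
print(parse_birthdate("not a date"))  # rejected: wrong format
```

Parsing the text instead of only matching a pattern also rejects dates that look well-formed but do not exist.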

Issue #3: The incorrect error message "Invalid date format" on the additional information update page

After digging into the source code for several hours, I found that the bug lies in the order in which the mysql-related functions are invoked. The processing steps on the additional information update page are as follows:

a) the client side posts the user's input to the server;

b) the server side first uses the mysql_escape_string function to preprocess the user information, to ensure the security of later mysql queries;

c) the server checks the format of each field, including whether the user entered a valid date;

d) the server updates the mysql database with the new information.

As the mysql server is only needed in step d), I had put the database connection code after step c), without knowing that the mysql_escape_string function also requires a mysql database connection. In the previous implementation, mysql_escape_string therefore returned an empty string, which led to the "Invalid date format" error.
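The effect of the call order in steps b) to d) can be reproduced with a small Python toy model. Everything in it is hypothetical: `ToyDatabase` merely imitates the behaviour described above (escaping without a connection yields an empty string) and is not a real MySQL driver, and the date format is an assumption.

```python
from datetime import datetime


class ToyDatabase:
    """Hypothetical stand-in for the PHP mysql_* functions."""

    def __init__(self):
        self.connected = False

    def connect(self):
        self.connected = True

    def escape(self, text):
        # Imitates the behaviour described above: without a connection
        # the escaped result is an empty string.
        if not self.connected:
            return ""
        return text.replace("\\", "\\\\").replace("'", "\\'")


def is_valid_date(text):
    try:
        datetime.strptime(text, "%Y-%m-%d")  # assumed date format
        return True
    except ValueError:
        return False


def handle_update(db, birthdate, connect_first):
    if connect_first:
        db.connect()                 # fixed order: connect before escaping
    escaped = db.escape(birthdate)   # step b)
    if not is_valid_date(escaped):   # step c)
        return "Invalid date format"
    if not connect_first:
        db.connect()                 # buggy order: connect just before step d)
    return "updated"                 # step d) would run the UPDATE query here


print(handle_update(ToyDatabase(), "1990-02-28", connect_first=False))
print(handle_update(ToyDatabase(), "1990-02-28", connect_first=True))
```

With the connection opened too late, a perfectly valid date is turned into an empty string before validation, so the check in step c) fails even though the user's input was correct.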

Secondly, the exemplar recording page has been updated with the following features:

1) Automatically moving to the next utterance after the user records and plays back the current recording;

2) Extra navigation controls for selecting the phrase to record;

3) When the user opens the exemplar recording page, the first un-recorded utterance is the first one shown to the user;

4) The enabled and disabled states of the player's recording and playback buttons are now tied to the database information, i.e. if the user has recorded the phrase before, both the recording and playback buttons are enabled; otherwise only recording is allowed.
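Features 3) and 4) boil down to two small lookups against the set of phrases the user has already recorded. The Python sketch below is only an illustration with hypothetical names and phrase ids. Falling back to the first phrase when everything has been recorded is an assumption, since that case is not described above.

```python
def first_unrecorded_index(phrase_ids, recorded_ids):
    """Index of the first phrase the user has not recorded yet."""
    for index, phrase_id in enumerate(phrase_ids):
        if phrase_id not in recorded_ids:
            return index
    return 0  # assumption: start from the beginning when all are recorded


def button_states(phrase_id, recorded_ids):
    """Recording is always allowed; playback only if a recording exists."""
    return {"record": True, "playback": phrase_id in recorded_ids}


# Hypothetical phrase ids, for illustration only.
phrases = [101, 102, 103, 104]
recorded = {101, 102}
print(first_unrecorded_index(phrases, recorded))
print(button_states(102, recorded))
print(button_states(103, recorded))
```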

The third major part done last week is the student page, which was previously left empty.

On the student page, users can now also practice their pronunciation by recording the phrases in the database and listening to the exemplar recordings in the system. The features are:

1) Full recording and playback functionality, as on the exemplar recording page;

2) When navigating to each phrase, at most 5 randomly chosen exemplar recordings are retrieved from the database and listed on the page to help the students;

3) Additionally, to put some exemplar recordings in the system, I had to manually transcribe several sentences and add the recordings to the system. Once many people are contributing exemplar recordings, I won't need to do manual transcription any more.

For this week, there are two major tasks to be done: integration with Ronanki's evaluation scripts and the mid-term report.

Regards,

Troy