Tuesday, July 17, 2012

GSoC 2012: Pronunciation Evaluation Week 8 Status

Last week, thanks to Ronanki's help, I finally managed to integrate his exemplar outlier detection and pronunciation scoring scripts into our website, which makes it a more complete system. Here are the details of the two main parts:

1) Exemplar outlier detection

In the exemplar data collection part of the website, after the user records each phrase, the system automatically decides whether the recording is an exemplar. This process involves the following steps:
a) Use ffmpeg to convert the FLV recording generated by rtmplite to a WAV file;
b) Extract MFCC features using sphinx_fe;
c) Align the recording against the current phrase using sphinx_align to obtain its phonetic transcription;
d) Decode the current recording with an edit distance grammar using sphinx_decode;
e) Compute the phone error rate of the recognition result relative to the alignment;
f) Based on the phone error rate, decide whether the current recording is an exemplar. Currently we require a phone error rate below 50% for an exemplar; this threshold could be tuned further for different cases;
g) Depending on that decision, the system either moves on to the next phrase or asks the user to retry the current phrase.
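Steps e) and f) can be sketched as follows. This is a minimal Python illustration, not Ronanki's actual scripts: it assumes the outputs of sphinx_align and sphinx_decode have already been parsed into lists of phone labels, and the names `phone_error_rate`, `is_exemplar` and `PER_THRESHOLD` are hypothetical. The phone error rate is taken as the Levenshtein edit distance between the two phone sequences divided by the number of phones in the alignment.

```python
from typing import Sequence

# Hypothetical constant mirroring the "below 50%" rule described above.
PER_THRESHOLD = 0.5


def phone_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Edit distance between phone sequences, normalised by reference length."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference phone sequence must not be empty")
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j].
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference phone
                curr[j - 1] + 1,     # insertion of an extra decoded phone
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[m] / n


def is_exemplar(aligned: Sequence[str], decoded: Sequence[str],
                threshold: float = PER_THRESHOLD) -> bool:
    """Accept the recording only if its phone error rate is strictly below the threshold."""
    return phone_error_rate(aligned, decoded) < threshold
```

For example, a decoded sequence with one substituted phone out of four has a phone error rate of 0.25 and would be accepted, while one missing two of four phones has a rate of exactly 0.5 and would be rejected under the strict "below 50%" rule.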


In the other part of the website, i.e. the student section, two new features were added.

2) Exemplar recordings for student to study

When the student selects a phrase for practice, up to 5 exemplar recordings of that phrase are selected from the database and listed for the student to mimic and learn from. Currently the selection is random. Once we have collected more exemplar recordings, it could be improved by selecting the top-scored exemplars.
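The selection rule can be sketched in a few lines of Python. This is only an illustration under assumptions: on the website the recordings come from a database query, whereas here they are a plain list of file names, and `pick_exemplars`, `MAX_EXEMPLARS` and the optional `scores` mapping are hypothetical. The random branch reflects the current behaviour; the score-based branch shows the possible future improvement.

```python
import random
from typing import Mapping, Optional, Sequence

# Hypothetical cap matching "up to 5 exemplar recordings".
MAX_EXEMPLARS = 5


def pick_exemplars(recordings: Sequence[str],
                   scores: Optional[Mapping[str, float]] = None,
                   k: int = MAX_EXEMPLARS,
                   rng: Optional[random.Random] = None) -> list:
    """Return at most k exemplar recordings for one phrase."""
    if scores:
        # Possible improvement: highest-scored exemplars first
        # (recordings without a score sort last).
        ranked = sorted(recordings,
                        key=lambda r: scores.get(r, float("-inf")),
                        reverse=True)
        return ranked[:k]
    rng = rng or random.Random()
    # Current behaviour: a random sample without replacement.
    return rng.sample(list(recordings), min(k, len(recordings)))
```

Passing a seeded `random.Random` makes the random choice reproducible, which is convenient for testing.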

3) Student pronunciation scoring

Another important feature of the student page is scoring the student's own recording. This is done in the following steps:
a) Convert the FLV files from rtmplite to WAV using ffmpeg;
b) Extract MFCC features using sphinx_fe;
c) Generate the phonetic alignment using sphinx_align;
d) Compare the phone-dependent statistics from the alignment with the average statistics from the exemplars to evaluate the student's performance;
e) Present both the acoustic and duration scores, then move on to the next phrase.
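This post does not specify which statistics the scoring scripts extract in step d), so the following is only a hedged sketch of one plausible comparison, not the project's actual formula. It assumes that the alignment yields, for each phone, an acoustic score (for example an average per-frame log-likelihood) and a duration in frames, and that the exemplars provide a per-phone mean and standard deviation of each. Each phone is compared to the exemplars by its absolute z-score, and the average deviation over the phrase is mapped to a score in (0, 1]. `PhoneStats`, `z_score`, `score_phrase` and the 1 / (1 + deviation) mapping are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PhoneStats:
    """Hypothetical per-phone exemplar statistics."""
    mean: float
    std: float


def z_score(value: float, ref: PhoneStats) -> float:
    # A zero standard deviation is treated as "no deviation" to avoid dividing by zero.
    return (value - ref.mean) / ref.std if ref.std > 0 else 0.0


def score_phrase(aligned: List[Tuple[str, float, float]],
                 acoustic_ref: Dict[str, PhoneStats],
                 duration_ref: Dict[str, PhoneStats]) -> Tuple[float, float]:
    """aligned holds (phone, acoustic score, duration in frames) for each aligned phone.

    Returns (acoustic score, duration score), each in (0, 1]; 1.0 means the
    student matches the exemplar averages exactly.
    """
    if not aligned:
        raise ValueError("alignment is empty")
    acoustic = [abs(z_score(a, acoustic_ref[p])) for p, a, _ in aligned]
    duration = [abs(z_score(d, duration_ref[p])) for p, _, d in aligned]
    # Average deviation, measured in exemplar standard deviations.
    acoustic_dev = sum(acoustic) / len(acoustic)
    duration_dev = sum(duration) / len(duration)
    return 1.0 / (1.0 + acoustic_dev), 1.0 / (1.0 + duration_dev)
```

With this mapping, a student whose phones all sit exactly one exemplar standard deviation away from the averages would receive 0.5 for both the acoustic and duration scores.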


