Saturday, April 28, 2012
Testing wami-recorder
Wednesday, April 25, 2012
GSoC 2012 Applications Accepted
Both Ronanki's proposal, Pronunciation Evaluation using CMU Sphinx3, and mine, Accurate and Efficient Pronunciation Evaluation using CMUSphinx for Spoken Language Learning, were accepted this Monday! Thanks so much to all the mentors and reviewers, and to Google for giving us this great opportunity to work on open source projects.
Pronunciation learning is one of the most important parts of second language acquisition. The aim of this project is to use automatic speech recognition technology to help learners with spoken language and reading skills. Ronanki and I will work on the same pronunciation evaluation project with different focuses. Ronanki will focus on building the web-based pronunciation evaluation system with CMU Sphinx3. I will mainly focus on developing edit-distance based mispronunciation detection grammars, collecting speech data, and maximizing the potential learner population by implementing a mobile application that works with our pronunciation evaluation system. We also plan to design and implement a game front end to make the learning process much more fun. My project involves four specific sub-tasks: automatic edit distance scoring grammar generation, exemplar pronunciation data collection, an Android app client implementation, and development of a game-based learning system.
As a first-time open source contributor, I have a lot to learn. I believe we will have a great summer this year. Any comments or suggestions are appreciated. Thanks again to everyone who made this happen!
Tuesday, April 24, 2012
GSoC 2012
Accurate and Efficient Pronunciation Evaluation using CMUSphinx for Spoken Language Learning
Thanks so much to my mentor James for his great suggestions on my hurried application!
Let's start doing something great!
Wednesday, February 1, 2012
WSJ Setup
Wednesday, November 30, 2011
[HTK] Chinese Encoding
When building speech recognition models with the HTK toolkit, a task grammar that contains Chinese characters cannot be compiled into the corresponding low-level network, so parts of the HTK source have to be modified. Below is a record of my changes to the HParse and HVite parts of the HTK source. I hope it helps anyone who needs it, and it also serves as my own backup.

Add the following function:

    static int IsSpace(char c)
    {
        if ((c == 0x09) || (c == 0x0D) || (c == ' '))
            return 1;
        return 0;
    }

Modify the following functions:

    static void PGetSym(void)
    {
        ...
        /* modified: IsSpace(ch) replaces isspace((int) ch) */
        while (IsSpace(ch) || (ch=='/' && inlyne[curpos]=='*')) {
            if (IsSpace(ch))        /* skip space */
                PGetCh();
            else {                  /* skip comment */
                PGetCh(); PGetCh();
                while (!(ch=='*' && inlyne[curpos]=='/'))
                    PGetCh();
                PGetCh(); PGetCh();
            }
        }
        ...                         /* the code below is unchanged */
    }

    static void PGetIdent(void)
    {
        int i=0;
        Ident id;

        do {
            if (ch==ESCAPE) PGetCh();
            if (i<MAXIDENT) id[i++]=ch;
            PGetCh();
        /* modified: !IsSpace(ch) replaces !isspace((int) ch) */
        } while (!IsSpace(ch) && ch!='{' && ch!='}' && ch!='[' && ch!=']' &&
                 ch!='<' && ch!='>' && ch!='(' && ch!=')' && ch!='=' &&
                 ch!=';' && ch!='|' && ch!='/' && ch!='%');
        id[i]='\0';
        ident = GetLabId(id,TRUE);
    }

    ReturnStatus WriteOneLattice(Lattice *lat,FILE *file,LatFormat format)
    {
        ...
        else if (ln->word!=NULL) {
            fprintf(file,"W=%-19s ",ln->word->wordName->name);
            /* commented out:
               ReWriteString(ln->word->wordName->name,NULL,ESCAPE_CHAR)); */
        ...
    }

With these changes, the generated low-level network shows the Chinese characters themselves rather than their escaped byte codes. Here is a simple example I tested.

Contents of taskgram:

    $word = 好 | 浩 | 尼 | 你;
    ( START_SIL ([sil] )(<$word>)( [sil]) END_SIL )

Network generated by the unmodified HParse:

    VERSION=1.0
    N=11 L=22
    I=0 W=END_SIL
    I=1 W=sil
    I=2 W=\304\343
    I=3 W=!NULL
    I=4 W=\304\341
    I=5 W=\272\306
    I=6 W=\272\303
    I=7 W=sil
    I=8 W=START_SIL
    I=9 W=!NULL
    I=10 W=!NULL
    J=0 S=1 E=0
    J=1 S=3 E=0
    J=2 S=3 E=1
    J=3 S=3 E=2
    J=4 S=7 E=2
    J=5 S=8 E=2
    J=6 S=2 E=3
    J=7 S=4 E=3
    J=8 S=5 E=3
    J=9 S=6 E=3
    J=10 S=3 E=4
    J=11 S=7 E=4
    J=12 S=8 E=4
    J=13 S=3 E=5
    J=14 S=7 E=5
    J=15 S=8 E=5
    J=16 S=3 E=6
    J=17 S=7 E=6
    J=18 S=8 E=6
    J=19 S=8 E=7
    J=20 S=10 E=8
    J=21 S=0 E=9

Network after the modification:

    VERSION=1.0
    N=11 L=22
    I=0 W=END_SIL
    I=1 W=sil
    I=2 W=你
    I=3 W=!NULL
    I=4 W=尼
    I=5 W=浩
    I=6 W=好
    I=7 W=sil
    I=8 W=START_SIL
    I=9 W=!NULL
    I=10 W=!NULL
    J=0 S=1 E=0
    J=1 S=3 E=0
    J=2 S=3 E=1
    J=3 S=3 E=2
    J=4 S=7 E=2
    J=5 S=8 E=2
    J=6 S=2 E=3
    J=7 S=4 E=3
    J=8 S=5 E=3
    J=9 S=6 E=3
    J=10 S=3 E=4
    J=11 S=7 E=4
    J=12 S=8 E=4
    J=13 S=3 E=5
    J=14 S=7 E=5
    J=15 S=8 E=5
    J=16 S=3 E=6
    J=17 S=7 E=6
    J=18 S=8 E=6
    J=19 S=8 E=7
    J=20 S=10 E=8
    J=21 S=0 E=9

As for HVite, it took me most of an afternoon to find the place to change: the WriteString function in HShell.c.

    n=*p;
    fputc(n,f);
    // fputc(ESCAPE_CHAR,f);
    // fputc(((n/64)%8)+'0',f);fputc(((n/8)%8)+'0',f);fputc((n%8)+'0',f);

I commented out the escaping lines and write each character directly to the file, so the Chinese text is visible in the result file.
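To make the effect of the escaping concrete, here is a minimal standalone sketch, not HTK code. It uses the same octal arithmetic as the commented-out fputc calls and shows how the two GBK bytes of 你 (0xC4 0xE3) become \304\343 in the unmodified output. The helper name write_name, its buffer interface, and the rule "escape every byte >= 0x80" are assumptions made for illustration; the real WriteString decides what to escape in its own way.

```c
#include <stddef.h>
#include <string.h>

#define ESCAPE_CHAR '\\'

/* Hypothetical helper (not part of HTK): copy src into dst, optionally
 * escaping bytes >= 0x80 as a backslash followed by three octal digits.
 * Returns the number of characters written, excluding the terminator. */
static size_t write_name(const char *src, char *dst, size_t cap, int escape_high)
{
    size_t o = 0;
    const unsigned char *p;

    if (cap == 0)
        return 0;
    for (p = (const unsigned char *)src; *p != '\0'; p++) {
        unsigned n = *p;
        if (escape_high && n >= 0x80) {
            if (o + 4 >= cap)       /* need room for 4 chars plus '\0' */
                break;
            dst[o++] = ESCAPE_CHAR;
            dst[o++] = (char)(((n / 64) % 8) + '0');
            dst[o++] = (char)(((n / 8) % 8) + '0');
            dst[o++] = (char)((n % 8) + '0');
        } else {
            if (o + 1 >= cap)       /* need room for 1 char plus '\0' */
                break;
            dst[o++] = (char)n;     /* pass the byte through unchanged */
        }
    }
    dst[o] = '\0';
    return o;
}
```

Calling write_name("\xC4\xE3", out, sizeof out, 1) fills out with the eight characters \304\343, matching the W= field in the unmodified network, while passing 0 as the last argument leaves the two raw bytes intact, which is what the patched code does.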
Thursday, November 24, 2011
[HTK] Increase HTK feature dimension limit
HTK format files consist of a contiguous sequence of samples preceded by a header. Each sample is a vector of either 2-byte integers or 4-byte floats. 2-byte integers are used for compressed forms as described below and for vector quantised data as described later in section 5.11. HTK format data files can also be used to store speech waveforms as described in section 5.8.
The HTK file format header is 12 bytes long and contains the following data:
nSamples - number of samples in file (4-byte integer)
sampPeriod - sample period in 100ns units (4-byte integer)
sampSize - number of bytes per sample (2-byte integer)
parmKind - a code indicating the sample kind (2-byte integer)
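The four fields above can be decoded from the first 12 bytes of a file roughly as follows. This is a sketch under stated assumptions: the struct HtkHeader and the function names are made up for illustration, the fields are treated as signed, and the bytes are assumed to be stored big-endian, which is HTK's usual on-disk order but can differ depending on configuration.

```c
#include <stdint.h>

/* Hypothetical struct mirroring the four header fields (not an HTK type). */
typedef struct {
    int32_t nSamples;    /* number of samples in file        */
    int32_t sampPeriod;  /* sample period in 100ns units     */
    int16_t sampSize;    /* number of bytes per sample       */
    int16_t parmKind;    /* code indicating the sample kind  */
} HtkHeader;

/* Assemble a big-endian 32-bit value from four bytes. */
static int32_t read_be32(const unsigned char *p)
{
    uint32_t v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return (int32_t)v;
}

/* Assemble a big-endian 16-bit value from two bytes. */
static int16_t read_be16(const unsigned char *p)
{
    uint16_t v = (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
    return (int16_t)v;
}

/* Decode the 12-byte header: 4 + 4 + 2 + 2 bytes, in the order listed. */
static void parse_htk_header(const unsigned char buf[12], HtkHeader *hdr)
{
    hdr->nSamples   = read_be32(buf);
    hdr->sampPeriod = read_be32(buf + 4);
    hdr->sampSize   = read_be16(buf + 8);
    hdr->parmKind   = read_be16(buf + 10);
}
```

Reading byte by byte keeps the sketch independent of the host machine's endianness and of struct padding, which is why it does not simply fread into the struct.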
    if (hdr.sampSize <= 0 || hdr.sampSize > 5000 || hdr.nSamples <= 0 ||
        hdr.sampPeriod <= 0 || hdr.sampPeriod > 1000000)
        return FALSE;
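Assuming the 5000-byte bound on sampSize is the limit the title refers to, the arithmetic is simple: with 4-byte float samples, 5000 bytes allows at most 1250 components per vector. The sketch below restates the check with the bound as a parameter so the effect of raising it is easy to see. The struct, the function names, and the parameter are illustrative assumptions, not HTK code.

```c
#include <stdint.h>

/* Hypothetical struct mirroring the header fields (not an HTK type). */
typedef struct {
    int32_t nSamples;
    int32_t sampPeriod;
    int16_t sampSize;
    int16_t parmKind;
} HtkHeader;

/* Largest number of 4-byte float components that fits in max_samp_size bytes. */
static int max_float_dims(int max_samp_size)
{
    return max_samp_size / 4;
}

/* Same test as the snippet above, with the sampSize bound made configurable.
 * Returns 1 if the header looks plausible, 0 otherwise.
 * Note: a signed 2-byte sampSize can never exceed 32767, so raising the
 * bound beyond that has no further effect. */
static int header_is_plausible(const HtkHeader *hdr, int max_samp_size)
{
    if (hdr->sampSize <= 0 || hdr->sampSize > max_samp_size ||
        hdr->nSamples <= 0 ||
        hdr->sampPeriod <= 0 || hdr->sampPeriod > 1000000)
        return 0;
    return 1;
}
```

For example, a header with sampSize 8000 (2000 float components) is rejected when the bound is 5000 and accepted when it is raised to 16000.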