Wednesday, November 30, 2011

[HTK] Chinese Encoding

Modifying the HTK source (HParse and HVite) to support Chinese
2010-03-24 12:05

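When building speech recognition models with the HTK toolkit, a task grammar that contains Chinese cannot be turned into the corresponding low-level network, so parts of the HTK source have to be modified. Below is a record of my changes to the HParse and HVite parts of the HTK source. I hope it helps anyone who needs it, and it also serves as a backup for myself!
Add the following function: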
static int IsSpace(char c)
{
if ((c == 0x09) ||( c == 0x0D) || (c == ' ' ))
return 1;
return 0;
}
Modify the following functions:
static void PGetSym(void)
{
..../////////////
+++while ( IsSpace(ch) || (ch=='/' && inlyne[curpos]=='*') ) //isspace((int) ch)
{
+++     if (IsSpace(ch))  /* skip space; was isspace((int) ch) */
PGetCh();
else {            /* skip comment */
PGetCh(); PGetCh();
while (!(ch=='*' && inlyne[curpos]=='/')) PGetCh();
PGetCh(); PGetCh();        
}
}
..../////////////the code below is unchanged
}

static void PGetIdent(void)
{
int i=0;
Ident id;

do {
if (ch==ESCAPE) PGetCh();
if (i<MAXIDENT) id[i++]=ch;
PGetCh();
+++   } while (!IsSpace(ch)&& ch!='{' && ch!='}' && ch!='[' && ch!=']' &&//!isspace( (int)ch) 
ch!='<' && ch!='>' && ch!='(' && ch!=')' && ch!='=' && 
ch!=';' && ch!='|' && ch!='/' && ch!='%');
id[i]='\0';
ident = GetLabId(id,TRUE);
}

ReturnStatus WriteOneLattice(Lattice *lat,FILE *file,LatFormat format)
{
...///////////////////////////////
else if (ln->word!=NULL) {
fprintf(file,"W=%-19s ",ln->word->wordName->name);//
//   ReWriteString(ln->word->wordName->name,   (commented out)
//                NULL,ESCAPE_CHAR));
...////////////////////////////////
}
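With these changes, the generated low-level network shows the Chinese characters themselves rather than their byte codes. Here is a simple example I tested.
This is the content of taskgram: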
$word = 好
| 浩
| 尼
| 你;
(  START_SIL ([sil] )(<$word>)( [sil]) END_SIL )
The network generated by the unmodified HParse:
VERSION=1.0
N=11   L=22   
I=0    W=END_SIL             
I=1    W=sil                 
I=2    W=\304\343            
I=3    W=!NULL               
I=4    W=\304\341            
I=5    W=\272\306            
I=6    W=\272\303            
I=7    W=sil                 
I=8    W=START_SIL           
I=9    W=!NULL               
I=10   W=!NULL               
J=0     S=1    E=0    
J=1     S=3    E=0    
J=2     S=3    E=1    
J=3     S=3    E=2    
J=4     S=7    E=2    
J=5     S=8    E=2    
J=6     S=2    E=3    
J=7     S=4    E=3    
J=8     S=5    E=3    
J=9     S=6    E=3    
J=10    S=3    E=4    
J=11    S=7    E=4    
J=12    S=8    E=4    
J=13    S=3    E=5    
J=14    S=7    E=5    
J=15    S=8    E=5    
J=16    S=3    E=6    
J=17    S=7    E=6    
J=18    S=8    E=6    
J=19    S=8    E=7    
J=20    S=10   E=8    
J=21    S=0    E=9    
The network after the modification:
VERSION=1.0
N=11   L=22   
I=0    W=END_SIL             
I=1    W=sil                 
I=2    W=你                  
I=3    W=!NULL               
I=4    W=尼                  
I=5    W=浩                  
I=6    W=好                  
I=7    W=sil                 
I=8    W=START_SIL           
I=9    W=!NULL               
I=10   W=!NULL               
J=0     S=1    E=0    
J=1     S=3    E=0    
J=2     S=3    E=1    
J=3     S=3    E=2    
J=4     S=7    E=2    
J=5     S=8    E=2    
J=6     S=2    E=3    
J=7     S=4    E=3    
J=8     S=5    E=3    
J=9     S=6    E=3    
J=10    S=3    E=4    
J=11    S=7    E=4    
J=12    S=8    E=4    
J=13    S=3    E=5    
J=14    S=7    E=5    
J=15    S=8    E=5    
J=16    S=3    E=6    
J=17    S=7    E=6    
J=18    S=8    E=6    
J=19    S=8    E=7    
J=20    S=10   E=8    
J=21    S=0    E=9   
As for HVite, it took me nearly a whole afternoon to find the place to change: modify the WriteString function in HShell.c:
n=*p;
fputc(n,f);
//   fputc(ESCAPE_CHAR,f);
//  fputc(((n/64)%8)+'0',f);fputc(((n/8)%8)+'0',f);fputc((n%8)+'0',f);
I commented out the corresponding lines and wrote the characters directly to the file, so the Chinese text is now visible in the result files.

Posted via email from Troy's posterous

[HTK] Chinese encoding

HTK can directly read "gbk"-encoded MLF, dictionary, and similar files. In fact, it can read files in any encoding: it simply reads the input byte by byte (as char). When it prints strings, each non-printable byte (which includes every byte of a GBK-encoded Chinese character) is written out in the form "\abc", where abc is the octal representation of the byte value (= a*64 + b*8 + c).
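As a small illustration of the "\abc" form, here is a sketch in Python 3 of the escaping direction. The helper name htk_escape is made up for this note, and it makes a simplifying assumption: only bytes of 0x80 and above are escaped, while all ASCII bytes pass through unchanged (HTK's own handling of quotes, backslashes and control characters is not reproduced).

```python
def htk_escape(text, encoding="gbk"):
    # Hypothetical helper: mimics the \abc octal form for non-ASCII bytes only.
    out = []
    for byte in text.encode(encoding):
        if byte < 0x80:
            # Assumption: ASCII bytes are written unchanged.
            out.append(chr(byte))
        else:
            # byte = a*64 + b*8 + c, printed as a backslash plus three octal digits
            a, b, c = (byte // 64) % 8, (byte // 8) % 8, byte % 8
            out.append("\\%d%d%d" % (a, b, c))
    return "".join(out)

print(htk_escape("你"))   # \304\343
print(htk_escape("sil"))  # sil
```

The first result matches the W=\304\343 entry that the unmodified HParse produced for this character in the earlier post.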

Thus, to convert HTK-generated files back to readable characters, we need the following steps:
1) convert the HTK octal representation of the byte values into a byte array;
2) decode the byte array with the corresponding encoding (e.g. for Chinese, we could use "gbk").

The following is the code I used to convert an HTK-generated MLF into a readable "gbk"-encoded MLF file:

# Python 3
fin = open('vom_utt_wlab.mlf', encoding='ascii')  # HTK's escaped output is plain ASCII
fout = open('vom_utt_wlab.gbk.mlf', mode='w', encoding='gbk')
while True:
    sr = fin.readline()
    if sr == '': break
    sr = sr.strip()
    if sr.endswith('.lab"'):
        print(sr, file=fout)
        while True:
            sr = fin.readline()
            if sr == '' or sr.strip() == '.': break  # end of this label block (or of the file)
            sr = sr.strip()
            if sr.startswith('\\'):
                lst = sr.strip('\\').split('\\')  # get the list of octal representations of each byte
                bins = bytearray()
                for itm in lst:
                    # each octal number has exactly 3 digits, i.e. is of the form \nnn
                    bins.append(int(itm[:3], 8))
                print(bins.decode('gbk'), file=fout)
            else:
                print(sr, file=fout)
        print('.', file=fout)
    else:
        print(sr, file=fout)
fin.close()
fout.close()
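
The script above only handles label lines that consist entirely of escapes. If the labels carry other fields on the same line (for example start and end times), a regular expression can decode the escapes wherever they appear. This is only a sketch: decode_htk_line is a made-up helper name, and it assumes that the rest of the line is plain ASCII.

```python
import re

# One HTK escape: a backslash followed by exactly three octal digits.
_OCTAL = re.compile(r"\\([0-7]{3})")

def decode_htk_line(line, encoding="gbk"):
    # Turn each \nnn escape into the character with that code point, then use
    # latin-1 to map every character back to a single byte (assumes the input
    # line is otherwise ASCII), and finally decode the bytes as GBK.
    raw = _OCTAL.sub(lambda m: chr(int(m.group(1), 8)), line).encode("latin-1")
    return raw.decode(encoding)

print(decode_htk_line(r"0 3500000 \304\343\272\303"))
```

Lines without any escape come back unchanged, so the function can be applied to every line of the MLF.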


Thursday, November 24, 2011

[HTK] Increase HTK feature dimension limit

An HTK feature file starts with a header that specifies the basic information about the parameters. The HTK Book describes it as follows:

HTK format files consist of a contiguous sequence of samples preceded by a header. Each sample is a vector of either 2-byte integers or 4-byte floats. 2-byte integers are used for compressed forms as described below and for vector quantised data as described later in section 5.11. HTK format data files can also be used to store speech waveforms as described in section 5.8 

The HTK file format header is 12 bytes long and contains the following data

nSamples - number of samples in file (4-byte integer)

sampPeriod - sample period in 100ns units (4-byte integer)

sampSize - number of bytes per sample (2-byte integer)

parmKind - a code indicating the sample kind (2-byte integer)
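
To make the layout concrete, here is a minimal Python 3 sketch that unpacks these four fields with the struct module. It assumes big-endian byte order (HTK's default as far as I know; files written in natural order on a little-endian machine would need "<" instead), and the helper name and example values are made up.

```python
import struct

# Assumed layout: big-endian 4-byte int, 4-byte int, 2-byte int, 2-byte int.
HTK_HEADER = struct.Struct(">iihh")

def parse_htk_header(data):
    """Unpack the first 12 bytes of an HTK parameter file into a dict."""
    n_samples, samp_period, samp_size, parm_kind = HTK_HEADER.unpack(
        data[:HTK_HEADER.size])
    return {
        "nSamples": n_samples,
        "sampPeriod": samp_period,  # in 100 ns units
        "sampSize": samp_size,      # bytes per sample
        "parmKind": parm_kind,
    }

# Made-up header: 100 samples, 10 ms period, 39 floats per sample,
# and an arbitrary parmKind code of 9.
example = HTK_HEADER.pack(100, 100000, 39 * 4, 9)
hdr = parse_htk_header(example)
print(hdr)
print("dimension:", hdr["sampSize"] // 4)
```

For a real file, the same function can be called on the first 12 bytes read in binary mode.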

From the above specification, sampSize is a (signed) short integer, so its maximum value is 32767. For uncompressed data (4-byte floats), the maximum dimension of each sample is therefore 32767/4, i.e. 8191. In practice, however, even features of just 1000+ dimensions cause the HTK tools to generate the following error:

OpenParmChannel: cannot read HTK Header in File 

The reason is that the function ReadHTKHeader in the file HWave.c checks the sampSize value:

if (hdr.sampSize <= 0 || hdr.sampSize > 5000 || hdr.nSamples <= 0 ||
       hdr.sampPeriod <= 0 || hdr.sampPeriod > 1000000)
      return FALSE;


That is to say, in HTK the dimension of the feature vector is limited by this check rather than by the data type specified in the header format. In the standard version of HTK, features of at most 1250 dimensions (5000/4) can be used. To raise the limit, we need to change the number 5000, but remember that sampSize is a short integer, so changing it to any value larger than 32767 would be useless.

The check is at about line 1427 of the file HTKLib/HWave.c.
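
The arithmetic behind these limits can be sketched in a few lines of Python 3. The function name is made up; it assumes uncompressed data stored as 4-byte floats and a signed 16-bit sampSize field.

```python
BYTES_PER_FLOAT = 4
SHORT_MAX = 2**15 - 1  # largest value of a signed 16-bit sampSize

def max_feature_dim(samp_size_limit):
    # The usable limit can never exceed what the 2-byte header field can hold.
    return min(samp_size_limit, SHORT_MAX) // BYTES_PER_FLOAT

print(max_feature_dim(5000))    # 1250, the stock check in HWave.c
print(max_feature_dim(32767))   # 8191, the most the header format allows
print(max_feature_dim(100000))  # 8191, raising the check further gains nothing
```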


Tuesday, November 8, 2011

Solving iMessage or FaceTime waiting for activation problem

My problem goes further than simply "waiting for activation". After changing the SIM card of my iPhone, the iMessages I send still go out under the old number.


However, the number under the iMessage setting cannot be edited. To change it, you need to turn off both iMessage and FaceTime and then reactivate them. The trouble comes on reactivation: the "waiting for activation" message stays for hours without any hint of what is going on.


This is the "waiting" problem that many people have encountered. To solve it, simply put, you need to save your own contact information in Contacts and, in the general settings, set My Info to that contact.


From: https://discussions.apple.com/thread/3390466?start=0&tstart=0


Hi all....I have an iPhone4 with IOS5 and was getting the waiting for activation message for imessages. I was also finding that people I was iMessaging were getting messages from my email address (aka apple ID) rather than my mobile number which was confusing them.  I found following adnanfarooqui's entry above and the iphoneism site very useful and it fixed my iPhone even though it was more directed at Facetime. 

All I did was simply go into my own details in CONTACTS and ensure my own mobile # was set up as 'iphone' in contacts.  Then I went into SETTINGS>MAIL,CONTACTS,CALENDAR>CONTACTS>MY INFO and selected myself.  What this does, I suspect, is tells the iphone who I am as the owner along with the mobile #. Then I went back into SETTINGS>iMESSAGES and found that the 'waiting for activation' message had changed to the normal message with the 'learn more' link included. Then I went down to RECEIVE AT, got into my Apple ID and signed out of that (which I suspect was what my imessages recipients were seeing), which then allowed me to select my own mobile # as the receive at number.  This therefore also fixed the issues I was having with recipients of my imessages getting my email address as the 'from' details as opposed to my mobile # and not matching their own CONTACTS.  Hope that was helpful......


Developing a Static Library and Incorporating It in Your Application in Xcode4


From: http://developer.apple.com/library/ios/#documentation/Xcode/Conceptual/ios_development_workflow/910-A-Developing_a_Static_Library_and_Incorporating_It_in_Your_Application/archiving_an_application_that_uses_a_static_library.html

When you need to develop a static library to use in an application and you have to have separate projects for each product, you can use a workspace to contain both the static library project and the application project. If you do, ensure that you configure the projects in the workspace as described here:

  1. In the target that builds the static library, ensure that:

    • The exported headers are in the Project group in the Copy Headers build phase.

    • The Skip Install build setting is set to Yes.

  2. In the target that builds the application, ensure that:

    • The User Header Search Paths build setting is set to the recursive absolute path of a directory under which the static library’s header files are stored.

      Important: If you move your static library project directory to a different location in your file system, you must update the value of the User Header Search Paths build setting to reflect the new location of the static library’s header files.

    • The Always Search User Paths build setting is set to Yes.

    • The Skip Install build setting is set to No.

  3. In the scheme that builds the application, ensure the scheme also builds the static library for archiving.


Content specifications: This content is written for Xcode 4.0.2 and iOS SDK 4.3.


Monday, November 7, 2011

Tutorial: Code Sharing Via Static Libraries And Cross-Project References


Guest author Clint Harris is an independent software consultant with experience ranging from enterprise web app work to custom iPhone app development. He currently lives in Brooklyn, New York.

Finding an elegant way to reuse and share code (i.e., libraries) across separate iPhone applications can be a bit tricky, especially considering Apple’s restrictions on dynamic library linking and custom Frameworks.

Most people agree that the best way to re-use code is to use static libraries. This tutorial builds on that solution, showing how your Xcode project can reference a second Xcode project — one which is used to build a static library.

This approach allows you to automatically build that static library with the rest of your app, using your current build configuration (e.g., debug, release, etc.) and avoid pre-building several versions of the library separately (where each version was built for a specific environment/configuration).

Wanted: An Elegant Way To Share Code Across Projects

If you want to reuse/share code across different iPhone applications, you only have two options that I’m aware of:

  1. Copy all of the source code from the “shared” library into your own project
  2. Keep the shared library code in a separate Xcode project and use it to build static libraries (e.g., libSomeLibrary.a, also referred to as “archive files”) that can be referenced by your project and used via static linking.

The first option, copying the files, should be avoided when possible since it’s inherently redundant and contrary to the goal of keeping “common code” modular and atomic.

It’s a much better idea to put the code in a static library since, as mentioned in the introduction, dynamic linking to custom libraries/frameworks isn’t allowed by Apple when it comes to iPhone apps.

For instructions on creating a static library from your code see this tutorial on the Stormy Productions blog.

We’ve established that the second option is preferable, but there’s a catch: you’ll need to build and distribute multiple versions of the static library–one for each runtime environment and build configuration. For example, you would need to build both “release” and “debug” versions of the library for the Simulator, as well as other pairs for the iPhone or iPod device itself.

So, how can we avoid manually pre-building and managing separate .a files?

Solution: Static Libraries Built On-Demand Via Xcode Cross-Project References

The trick to avoid pre-building static libraries for each environment is to use an Xcode “cross-project reference” so that those libraries are built dynamically (i.e., when you build your own app) using your app’s current build configuration.

This approach allows you to both reuse shared source code and avoid the headache of managing multiple versions of the library. Here’s how it works at a high level:

  1. The shared code lives in its own Xcode project that, when built, results in one or more static libraries.
  2. You create an Xcode environment variable with a path to the directory that contains the static library’s *.xcodeproj file.
  3. All iPhone apps that need the static library will use the aforementioned environment variable to reference the library’s Xcode project, including any static library in that project and the related header files.
  4. Each time you build your project for a specific configuration/runtime environment, the shared project library will also be built for that config/environment–if it hasn’t already–and linked with your executable.

In addition to solving the main problem (reusing code and avoiding management of multiple library versions), there are a couple of nice benefits to this strategy:

First, if you make changes to the shared library source code, those changes will immediately be included the next time you build your own project (via the cross-project reference).

Second, you can modify the Xcode environment variable to point to different versions of any project. For example, you might have separate directories for “somelibrary-1.0” and “somelibrary-2.0”; as you’ll see in the detailed solution instructions, it’s easy to modify the environment variable and switch your project to a different version of “somelibrary.”

Implementing Cross-Project References

The instructions for setting up cross-project references to shared static libraries can be split into two parts:

  • Part 1: Global Xcode Settings
  • Part 2: Project-Specific Settings

I’ll be using an example in the instructions to help illustrate things. A suitable example would be an application that needs to use a shared static library from a separate project. In this case, I’ll use a sample iPhone app called “


Friday, November 4, 2011

HDecode in lattice rescoring mode

When training an NN, the training process is usually controlled by the frame accuracy (or frame error rate). However, this is not directly related to speech recognition performance, i.e. PER or WER.

One way is to run a decoding pass each time the network weights are updated. For phoneme recognition this is fine, as decoding doesn't take much time. For word recognition, however, decoding is quite time-consuming. To speed it up, one possible way is to use lattices instead of a full decoding.

If HDecode is invoked with "-w" and no language model, it runs in lattice rescoring mode (of course, you also need to set the input lattice parameters). But where do the lattices come from? My setup is as follows:

1) Use HDecode with the HMM system to generate lattices using a bigram LM (a higher-order LM can also be used);
2) Use HLRescore to prune the lattices into word networks (with -m f/b, saving the new lattices with -w);
3) Use HDecode to rescore the lattices with the new acoustic model or the new posterior features (in the NN case).

