When using HVite for either recognition or alignment, we can set the trace level to 1, in which case HVite prints information of the following format to the terminal:
File: /home/li-bo/research/databases/WSJ/mfcc_0_d_a_z/c2l/c2la0102.mfcc
c2l == [482 frames] -72.9054 [Ac=-35140.4 LM=0.0] (Act=72.4)
The first line shows which file is being processed.
The second line first gives the recognized symbol sequence, terminated by "=="; after that, the model likelihood information is given:
[482 frames] : total number of frames in the utterance;
-72.9054 : the overall average log likelihood per frame for the utterance, which equals (acoustic log likelihood + language model log likelihood) / total number of frames;
[Ac=-35140.4 : the total acoustic log likelihood for the whole utterance;
LM=0.0] : the total language model log likelihood for the whole utterance;
(Act=72.4) : the average number of active models.
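As a quick sanity check, the per-frame score can be recomputed from the printed totals: (-35140.4 + 0.0) / 482 ≈ -72.9054. The short Python sketch below does this parsing and arithmetic; the regular expression is only an illustration based on the line format shown above and is not part of HTK.

import re

trace_line = "c2l  ==  [482 frames] -72.9054 [Ac=-35140.4 LM=0.0] (Act=72.4)"

# Pull the frame count and the likelihood totals out of the trace line.
m = re.search(r"\[(\d+) frames\]\s+(-?[\d.]+)\s+\[Ac=(-?[\d.]+)\s+LM=(-?[\d.]+)\]", trace_line)
frames, per_frame = int(m.group(1)), float(m.group(2))
ac, lm = float(m.group(3)), float(m.group(4))

# Overall average log likelihood per frame = (acoustic + language model) / frames.
print((ac + lm) / frames)   # approx. -72.9054
print(per_frame)            # the value printed by HVite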
Similarly, the MLF file produced by recognition or alignment also contains scores:
"/home/li-bo/research/databases/WSJ/mfcc_0_d_a_z/c02/c02a0102.rec"
0 33900000 c02 -79.211006
.
Between the filename and the terminating ".", the recognized or aligned results are written line by line.
Each line in this example has 4 fields; the number of fields varies with the HVite options, e.g. with "-f" the model's state information is also kept.
In this example,
the first field is the start time of that segment;
the second field is the ending time of that segment;
the third field is the recognized symbol;
the last field is the total log likelihood of that segment.
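For scripting, such a label line is easy to parse. The Python sketch below assumes the four-field format shown above (start time, end time, symbol, log likelihood) and converts the HTK times, which are stored in units of 100 ns, to seconds; the function name is only illustrative.

def parse_label_line(line):
    # Assumes a four-field line: start time, end time, symbol, log likelihood.
    start, end, symbol, score = line.split()
    return {
        "start_sec": int(start) * 1e-7,   # HTK label times are in 100 ns units
        "end_sec": int(end) * 1e-7,
        "symbol": symbol,
        "log_likelihood": float(score),
    }

print(parse_label_line("0 33900000 c02 -79.211006"))
# -> start 0.0 s, end 3.39 s, symbol c02, log likelihood -79.211006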
The attached file gives another example.