Tuesday, August 9, 2011

HTK lattice format

The lattices generated by HVite have the following general form
VERSION=1.0
UTTERANCE=testf1.mfc
lmname=wdnet
lmscale=20.00 wdpenalty=-30.00
vocab=dict
N=31    L=56
I=0     t=0.00
I=1     t=0.36
I=2     t=0.75
I=3     t=0.81
... etc
I=30    t=2.48
J=0     S=0     E=1     W=SILENCE    v=0  a=-3239.01  l=0.00
J=1     S=1     E=2     W=FOUR       v=0  a=-3820.77  l=0.00
... etc
J=55    S=29    E=30    W=SILENCE    v=0  a=-246.99   l=-1.20

The first 5 lines comprise a header which records the names of the files used to generate the lattice, along with the settings of the language model scale and penalty factors. The next line gives the number of nodes (N) and links (L) in the lattice.

Each node in the lattice represents a point in time measured in seconds, and each arc represents a word spanning the segment of the input starting at the time of its start node and ending at the time of its end node. For each such span, v gives the number of the pronunciation used, a gives the acoustic score and l gives the language model score.

The language model scores in output lattices do not include the scale factors and penalties. They are removed so that the lattice can be used as a constrained network for subsequent recognizer testing.
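The node and arc records are easy to pull apart in code. A minimal sketch of a reader for the layout shown above (illustrative only, not a full SLF parser; real lattices may use extra fields):

```python
# Minimal sketch of reading the node (I=) and arc (J=) records of an
# HTK SLF lattice.  Assumes one record per line as in the example above.
def parse_slf(lines):
    nodes, arcs = {}, []
    for line in lines:
        fields = dict(p.split("=", 1) for p in line.split() if "=" in p)
        if "I" in fields:                      # node: time in seconds
            nodes[int(fields["I"])] = float(fields["t"])
        elif "J" in fields:                    # arc: word span plus scores
            arcs.append({"start": int(fields["S"]), "end": int(fields["E"]),
                         "word": fields["W"],
                         "acoustic": float(fields.get("a", 0.0)),
                         "lm": float(fields.get("l", 0.0))})
    return nodes, arcs

nodes, arcs = parse_slf([
    "I=0  t=0.00", "I=1  t=0.36",
    "J=0  S=0  E=1  W=SILENCE  v=0  a=-3239.01  l=0.00",
])
# each arc spans nodes[arc["start"]] .. nodes[arc["end"]] seconds
```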

When using HVite normally, the word level network file is specified using the -w option. When the -w option is included but no file name is given, HVite constructs the name of a lattice file from the name of the test file and inputs that. Hence, a new recognition network is created for each input file and recognition is very fast. This is an efficient way, for example, of experimentally determining optimum values for the language model scale and penalty factors.
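Because the raw acoustic and language model scores are kept separate in the lattice, the total score of a word sequence can be recombined for any candidate scale and penalty without re-running the recognizer. A minimal sketch with made-up arc scores (the function name and the candidate grid are illustrative, not part of HTK):

```python
# Recombine lattice arc scores for a candidate language model scale and
# word insertion penalty.  Each arc is (acoustic log score a, unscaled
# LM log score l); values below are illustrative, not from a real lattice.
def path_score(arcs, lmscale, wdpenalty):
    return sum(a + lmscale * l + wdpenalty for (a, l) in arcs)

arcs = [(-3239.01, 0.00), (-3820.77, 0.00), (-246.99, -1.20)]
# pick the (lmscale, wdpenalty) pair giving the best total path score
best = max(((s, p) for s in (10.0, 15.0, 20.0) for p in (0.0, -30.0)),
           key=lambda sp: path_score(arcs, *sp))
print(best)  # → (10.0, 0.0)
```

In a real tuning loop the score would be compared against reference transcriptions for each candidate pair, but the recombination step is just this weighted sum.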

 

Posted via email from Troy's posterous

Thursday, August 4, 2011

A hierarchical context dependent neural network architecture for improved phone recognition


This paper explores context-dependent (CD) phone recognition on TIMIT using a hybrid NN/HMM system. Three architectures are compared:

1) Two nets in tandem: one estimates context-independent (CI) posteriors, the other models the contexts from those CI posteriors;

2) A single NN trained directly for CD state posteriors, which has too many outputs and is not robust;

3) A first net that produces bottleneck features, with another net trained on top of them.

The best result on the TIMIT core test set is a 21.24% error rate, less than 1% absolute better than DBN-based monophone recognition, so the gain from context is not significant. Moreover, the current best DBN-based result is around 19%.


Dirichlet mixture models of neural net posteriors for hmm based speech recognition


In this paper, the authors propose using Dirichlet mixture models instead of Gaussian mixture models for the hybrid NN/HMM system. In the conventional NN/HMM system, the NN's posteriors are Gaussianized before being fed into the HMM framework. However, posterior probabilities lie on the probability simplex, so their distribution is naturally modeled by Dirichlet distributions; a Dirichlet mixture model is therefore preferable to a Gaussian mixture model.
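As a sketch of the modeling idea only (not the paper's actual training procedure), the log-likelihood of a posterior vector on the simplex under a Dirichlet mixture can be written directly from the standard density; the weights and parameters below are made up for illustration:

```python
import math

# Log-density of a point x on the simplex under a single Dirichlet
# component with parameter vector alpha.
def dirichlet_logpdf(x, alpha):
    return (math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x)))

# Log-likelihood under a mixture with component weights w_k.
def mixture_loglik(x, weights, alphas):
    return math.log(sum(w * math.exp(dirichlet_logpdf(x, a))
                        for w, a in zip(weights, alphas)))

x = [0.7, 0.2, 0.1]                  # NN posteriors for one frame
print(mixture_loglik(x, [0.5, 0.5], [[5.0, 2.0, 1.0], [1.0, 1.0, 1.0]]))
```

Fitting the mixture parameters (e.g. by EM) is where the real work lies; this only shows why no Gaussianization step is needed when the model lives on the simplex.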

However, the final system performance, although better than that of the GMM-based system, is still far from the state-of-the-art.


Automatic speech recognition using hidden conditional neural fields


The concept is not that new; it is similar to Conditional Neural Networks and Neural Conditional Random Fields. The results on TIMIT are also far from those of DBN-based systems.



Wednesday, July 13, 2011

[HTK] Compile HTK on 64bit machine

When compiling HTK on 64-bit machines, by default it will still try to build the tools as 32-bit binaries. Running "make all" then fails with:

/usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or directory

To solve this problem, install the multilib packages for gcc and g++. On Ubuntu, install the following packages:
g++-multilib
gcc-multilib

and redo "make all".


Ubuntu 9.04 nVidia Driver Screen Resolution Problem

Many people running Ubuntu 9.04 are having trouble with the proprietary nVidia driver (nvidia-graphics-driver-180 in my case) including getting it to go to high resolutions that fit the native resolution of widescreen monitors. I had the same problem with an nVidia GeForce 6150 LE and Dell UltraSharp 2407WFPHC monitor.

I was able to get all the resolutions, including 1920×1200, for my monitor as well as have the nVidia driver recognize the monitor as a 2407WFPHC, by doing the following:

(If you can’t see your screen at all after enabling the nVidia driver, first read the companion post, Ubuntu 9.04 Screen Resolution/Monitor Out of Range (nVidia Driver 180).)

  1. Open a terminal window
  2. Go to the X11 directory (cd /etc/X11)
  3. Make a backup of the current xorg.conf (e.g., sudo cp xorg.conf xorg.conf.backup)
  4. Run nvidia-xconfig with root permission (sudo nvidia-xconfig). If you get a parsing error, delete xorg.conf so nvidia-xconfig can create a fresh one.
  5. Open xorg.conf with your favorite editor (e.g. sudo vim xorg.conf)
  6. You’ll see a lot of extra settings now
  7. Look for Section “Monitor”. Mine defaulted to the following settings:
        Identifier "Monitor0"
        VendorName "Unknown"
        ModelName "Unknown"
        HorizSync 28.0 - 33.0
        VertRefresh 43.0 - 72.0
        Option "DPMS"
  8. Change the HorizSync and VertRefresh values to the correct ones for your particular monitor. For my 2407WFPHC, I put the following:
        HorizSync 30.0 - 83.0
        VertRefresh 56.0 - 76.0
  9. Save the xorg.conf file
  10. Log out and restart the X server (at the login screen, select Menu, then Restart X server)
  11. Log in and run the NVIDIA X Server Settings tool. You should now have a whole bunch of resolutions from which to choose. I selected 1920×1200.

The reason that this works is that the nVidia driver needs to know the frequency ranges for your monitor in order to know what resolutions are safe to use. Setting the HorizSync and VertRefresh in xorg.conf provides this necessary information.

From: 

http://turbulentsky.com/ubuntu-904-nvidia-driver-screen-resolution-problem.html

