Thursday, January 27, 2011

[Conference] PASCAL CHiME

PASCAL CHiME Speech Separation and Recognition Challenge

Deadline: April 14, 2011
Workshop: September 1, 2011, Florence, Italy

http://www.dcs.shef.ac.uk/spandh/chime/challenge.html

Wednesday, January 26, 2011

The FFT implemented in HTK is the decimation-in-time (DIT) FFT algorithm, which is explained in detail in the attached file. The computation is shown in the following flow graph (8-point DFT):

A minor difference is that the HTK implementation uses Wn = exp(j * 2 * pi / n), whereas the more common convention is Wn = exp(-j * 2 * pi / n).
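
To make the sign convention concrete, here is a minimal sketch of a recursive radix-2 decimation-in-time FFT in Python. It is not the HTK code; the function names and the `sign` parameter are made up for this illustration. With `sign=+1` it uses the twiddle factor described above for HTK, and with `sign=-1` the common one. For a real input the two results are complex conjugates of each other.

```python
import cmath

def fft_dit(x, sign=-1):
    """Recursive radix-2 decimation-in-time FFT (illustrative sketch).

    sign=-1 uses W_n = exp(-j*2*pi/n), the common convention;
    sign=+1 uses W_n = exp(+j*2*pi/n), the convention noted above for HTK.
    len(x) must be a power of two.
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 0 or n % 2:
        raise ValueError("length must be a power of two")
    even = fft_dit(x[0::2], sign)   # DFT of the even-indexed samples
    odd = fft_dit(x[1::2], sign)    # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # twiddle factor W_n^k
        out[k] = even[k] + w * odd[k]                # butterfly, upper half
        out[k + n // 2] = even[k] - w * odd[k]       # butterfly, lower half
    return out

def dft(x, sign=-1):
    """Direct O(n^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                for m in range(n))
            for k in range(n)]

signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]  # arbitrary 8-point real input
common = fft_dit(signal, sign=-1)
htk_style = fft_dit(signal, sign=+1)
```

So for real-valued speech frames the choice of sign only flips the sign of the imaginary parts, and the magnitude spectrum is unchanged.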

The FFT algorithm above only deals with complex numbers. To compute the DFT of a real-valued signal sequence we need a real version of the FFT, which can be built on top of the complex FFT. The second document, from Texas Instruments, gives a fast implementation of the FFT for real-valued sequences (pages 12-19). The algorithm is illustrated briefly in the following figure:

Decimation in time FFT algorithm.pdf (735 KB)

DSP_FOR_FFT_COMPUTATION.PDF (389 KB)
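
One common way to get a real-input FFT from a complex FFT is the packing trick sketched below. It may differ in detail from the Texas Instruments document. The N real samples are packed into N/2 complex numbers (even samples in the real part, odd samples in the imaginary part), one N/2-point complex FFT is run, and the spectra of the even and odd samples are then separated using conjugate symmetry. The function names are illustrative only, and the common convention W_n = exp(-j*2*pi/n) is assumed.

```python
import cmath

def fft(z):
    """Recursive radix-2 complex FFT with W_n = exp(-j*2*pi/n).

    len(z) must be a power of two.
    """
    n = len(z)
    if n == 1:
        return [complex(z[0])]
    even, odd = fft(z[0::2]), fft(z[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)] +
            [even[k] - tw[k] for k in range(n // 2)])

def real_fft(x):
    """DFT bins 0..N/2 of a real sequence using one N/2-point complex FFT.

    len(x) must be a power of two and at least 2.
    """
    n = len(x)
    m = n // 2
    # Pack even samples into the real part, odd samples into the imaginary part.
    z = [complex(x[2 * i], x[2 * i + 1]) for i in range(m)]
    zf = fft(z)
    out = []
    for k in range(m + 1):
        a = zf[k % m]
        b = zf[(m - k) % m].conjugate()
        even = (a + b) / 2    # DFT of the even-indexed samples
        odd = (a - b) / 2j    # DFT of the odd-indexed samples
        out.append(even + cmath.exp(-2j * cmath.pi * k / n) * odd)
    return out

spectrum = real_fft([0.5, -1.0, 2.0, 3.0, 0.0, 1.5, -2.5, 4.0])
```

The remaining bins N/2+1 .. N-1 are not computed, because for a real input they are the complex conjugates of bins N/2-1 .. 1.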

Posted via email from Troy's posterous

Friday, January 21, 2011

[Learning] Unsupervised Learning

ul.pdf (362 KB)

Thursday, January 20, 2011

[Speech] Front end analysis of speech recognition: A review

front end analysis of speech recognition_a review.pdf (2930 KB)

[Machine Learning] GTM: The Generative Topographic Mapping

GTM_The generative topographic mapping.pdf (463 KB)

[Misc] Command Line Keyboard Shortcuts for Mac OS X

From: http://osxdaily.com/2006/12/19/command-line-keyboard-shortcuts-for-mac-os-x/

The command line in Mac OS X can be a very powerful and fun tool, so it's good to know how to maneuver around if you find yourself in it. By default, the Mac OS X Terminal uses the Bash shell, which is what these keyboard shortcuts are intended for. So if you're ready to get your feet wet, open up the Terminal and try these shortcuts out; they're sure to make your command line life easier. The list isn't too long, so you should be able to try them all within a minute or two. Have fun:

Ctrl + A	Go to the beginning of the line you are currently typing on
Ctrl + E	Go to the end of the line you are currently typing on
Ctrl + L	Clears the Screen, similar to the clear command
Ctrl + U	Clears the line before the cursor position. If you are at the end of the line, clears the entire line.
Ctrl + H	Same as backspace
Ctrl + R	Lets you search through previously used commands
Ctrl + C	Kill whatever you are running
Ctrl + D	Exit the current shell
Ctrl + Z	Puts whatever you are running into a suspended background process. fg restores it.
Ctrl + W	Delete the word before the cursor
Ctrl + K	Clear the line after the cursor
Ctrl + T	Swap the last two characters before the cursor
Esc + T	Swap the last two words before the cursor

Wednesday, January 19, 2011

[Tool] HMM Toolkit STK from Speech@FIT

http://www.fit.vutbr.cz/research/groups/speech/sw/stk.html

Simple example scripts for MMI, MPE training.

[Speech] Hierarchical structures of neural networks for phoneme recognition

Four Neural Network based phoneme recognition systems are investigated:

a) the TRAPs system (Fig. 1a) - separate networks for processing of speech in frequency bands;

b) the split temporal context (STC) system (Fig. 1b) - separate networks for processing of blocks of spectral vectors;

c) a combination of both (Fig. 1c) - split in both frequency and time;

d) a tandem of two networks: the front-end network is trained in the classical way, and the back-end network is trained on the combination of the front-end's posteriors and the original features.

The assumptions for those systems are:

a) Independent processing of speech in critical bands;

b) Independent processing of parts of phonemes;

c) both a) and b).

Phoneme strings are a basic representation for automatic language recognition, and language recognition results have been shown to be highly correlated with phoneme recognition results. Phoneme posteriors are a useful representation for acoustic keyword search: they contain enough information to distinguish among all words, and they are small enough to store compared with, for example, the posteriors from context-dependent Gaussian mixture models.

Two ways to provide additional information for NN training:

i) windowing: a context window of multiple frames, with a Hamming window to emphasize the central frame;

ii) output representation: some improvements have been observed when a net was trained for multiple tasks at the same time.

A special phoneme set mapping is adopted in this paper: closures are merged with the following burst instead of with silence (bcl b -> b, not bcl b -> pau b). This mapping is believed to be more appropriate for features that use a longer temporal context.
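
The closure merging can be sketched as a small label-rewriting function. This is only an illustration, not the paper's script. The closure table uses the usual TIMIT stop closures, and the handling of a closure that is not followed by its own burst (left unchanged here) is an assumption, since these notes do not say what the paper does in that case.

```python
# TIMIT-style stop closures and the burst each one belongs to.
CLOSURE_TO_BURST = {"bcl": "b", "dcl": "d", "gcl": "g",
                    "pcl": "p", "tcl": "t", "kcl": "k"}

def merge_closures(labels):
    """Merge each closure with its following burst (e.g. bcl b -> b)."""
    merged = []
    i = 0
    while i < len(labels):
        label = labels[i]
        burst = CLOSURE_TO_BURST.get(label)
        if burst is not None and i + 1 < len(labels) and labels[i + 1] == burst:
            merged.append(burst)   # closure + burst collapse into one phone
            i += 2
        else:
            merged.append(label)   # assumption: everything else passes through
            i += 1
    return merged

example = merge_closures(["pau", "bcl", "b", "iy", "tcl", "t", "pau"])
```

This works on label sequences only. With time-aligned transcriptions, the merged phone would also need to take over the closure's start time.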

The number of neurons in the hidden layer of each neural network was increased until the phoneme error rate (PER) saturated. The resulting number of hidden-layer neurons was approximately 500.

Table 1 shows the superiority of the long-context Mel-bank energy features, as well as a large improvement from the three-state model. (A block of 31 vectors of Mel-bank energies (MBE) spans 310 ms. The temporal trajectory in each band was weighted by a Hamming window and reduced by DCT to 11 coefficients.)
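
The per-band feature reduction described in the parentheses can be sketched as follows. This is a rough illustration under assumptions that are not in these notes: an unnormalized DCT-II is used, the zeroth coefficient is kept, and the input trajectory is synthetic. The function names are made up for the example.

```python
import math

def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dct_reduce(trajectory, num_coeffs):
    """Weight a temporal trajectory with a Hamming window, then keep the
    first num_coeffs coefficients of an unnormalized DCT-II."""
    n = len(trajectory)
    windowed = [w * v for w, v in zip(hamming(n), trajectory)]
    return [sum(windowed[t] * math.cos(math.pi * k * (t + 0.5) / n)
                for t in range(n))
            for k in range(num_coeffs)]

# 31 frames of one band's energy (synthetic values, for illustration only).
trajectory = [math.sin(0.2 * t) for t in range(31)]
features = dct_reduce(trajectory, 11)   # 31 values reduced to 11 per band
```

The Hamming weighting emphasizes the central frame of the 31-frame block, and the DCT truncation keeps only the slowly varying part of the trajectory.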

The best PER reported in this paper is obtained with the 5-block STC system with a bigram LM, as shown in the following table:

ICASSP2006_Schwarz_PhnRec.pdf (75 KB)

Monday, January 17, 2011

[Machine Learning] Machine Learning Summer School 2008 - Kioloa

http://videolectures.net/mlss08au_kioloa/

Monte Carlo Simulation for Statistical Inference, Model Selection and Decision Making

Download slides: mlss08au_freitas_asm.pdf (14.4 MB)

via videolectures.net

[Tool] TexPoint - A latex plugin for Microsoft Office

http://texpoint.necula.org/index.html

[Speech] Mixture Density Network (Technical Report)

Mixture Density Network.pdf (428 KB)

[Speech] A trajectory mixture density network for acoustic articulatory inversion mapping

A trajectory mixture density network for acoustic articulatory inversion mapping.pdf (113 KB)

Mixture Density Network

PRML Chapter 5.

http://www.cedar.buffalo.edu/~srihari/CSE574/Chap5/Chap5.6-MixDensityNetworks.pdf

Thursday, January 13, 2011

Extracting and Composing Robust Features with Denoising Autoencoders

Friday, January 7, 2011

[DBN] Learning multiple layers of Features from tiny images

learning-features-2009-TR.pdf (4107 KB)

This paper gives more detailed derivations of the RBM equations, especially for the Gaussian-Bernoulli RBM.

Wednesday, January 5, 2011

Matlab v7.3 mat file and python

From: http://mloss.org/community/blog/2009/nov/19/matlabtm-73-file-format-is-actually-hdf5-and-can-b/

It looks like MATLAB version 7.3 and later can write out objects in the so-called MATLAB 7.3 file format. While at first glance it looks like another proprietary format, it appears to be in fact the Hierarchical Data Format version 5, or HDF5 for short.

So you can do all sorts of neat things:

Let's create a matrix in MATLAB first and save it:

    >> x=[[1,2,3];[4,5,6];[7,8,9]]

    x =

         1     2     3
         4     5     6
         7     8     9

    >> save -v7.3 x.mat x

Let's investigate that file from the shell:

    $ h5ls x.mat
    x                        Dataset {3, 3}
    $ h5dump x.mat
    HDF5 "x.mat" {
    GROUP "/" {
       DATASET "x" {
          DATATYPE  H5T_IEEE_F64LE
          DATASPACE  SIMPLE { ( 3, 3 ) / ( 3, 3 ) }
          DATA {
          (0,0): 1, 4, 7,
          (1,0): 2, 5, 8,
          (2,0): 3, 6, 9
          }
          ATTRIBUTE "MATLAB_class" {
             DATATYPE  H5T_STRING {
                STRSIZE 6;
                STRPAD H5T_STR_NULLTERM;
                CSET H5T_CSET_ASCII;
                CTYPE H5T_C_S1;
             }
             DATASPACE  SCALAR
             DATA {
             (0): "double"
             }
          }
       }
    }
    }

And load it from python:

    >>> import h5py
    >>> import numpy
    >>> f = h5py.File('x.mat')
    >>> x=f["x"]
    >>> x
    <HDF5 dataset "x": shape (3, 3), type "<f8">
    >>> numpy.array(x)
    array([[ 1.,  4.,  7.],
           [ 2.,  5.,  8.],
           [ 3.,  6.,  9.]])

So it actually seems to be a good idea to use MATLAB's 7.3 format for interoperability.
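
One thing to watch for: the array read in Python is the transpose of the matrix created in MATLAB. MATLAB stores matrices column by column, while HDF5 readers interpret the same buffer row by row. The pure-Python sketch below reproduces the effect without h5py; the helper names are made up for this illustration.

```python
def reshape_row_major(flat, rows, cols):
    """Interpret a flat buffer as a rows x cols matrix in row-major (C) order."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]

# MATLAB stores x = [1 2 3; 4 5 6; 7 8 9] column by column:
flat = [1, 4, 7, 2, 5, 8, 3, 6, 9]

as_read = reshape_row_major(flat, 3, 3)   # what a row-major reader sees
original = transpose(as_read)             # transpose to recover MATLAB's layout
```

With NumPy the same correction would be `numpy.array(x).T`. For a non-square matrix the shape reported on the Python side is reversed as well.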
