Wednesday, August 24, 2011

Papers on learning sparse Gaussian Precision Matrix

The basic idea of learning a sparse full covariance for the Gaussian using Graphical models was proposed in the paper "Covaraince Selection".

In the paper "Sparse Gaussian Graphical Models for Speech Recognition", the authors adopted the sparse full covariance learning to speech recognition.

In the third paper, "Projected Subgradient Methods for Learning Sparse Gaussians", a new approach for estimating the covaraince is proposed.

covariance selection.pdf Download this file

Sparse Gaussian Graphical Models for speech recognition_is2007.pdf Download this file

projected subgradient methods for learning sparse gaussians.pdf Download this file

Posted via email from Troy's posterous

LangBrain Website

"If the human mind were simple enough for us to understand,
we would be too simple-minded to understand it."


The ability of humans to speak and to understand speech requires an enormous amount of brain resources. These resources have to manage information about many thousands of words and many syntactic constructions and their interconnections, not just to one another but to meanings and to the structures that allow us to recognize the sounds of speech and to move the muscles of our mouths to produce speech. This complex combination of brain structures can be called the brain's linguistic system. It allows a person not only to talk and to understand speech but also to read and write. It also gives us the power to think as well as the power to acquire new knowledge and abilities and to learn how to speak in the first place. The Langbrain website is about this system.

Posted via email from Troy's posterous

Tuesday, August 23, 2011

Graphical Gaussian Models for Genome Data



For software to efficiently identify GGM networks from data visit theGeneNet page.

A simple method for inferring the network of (linear) dependencies among a set of variables is to compute all pairwise correlations and subsequently to draw the corresponding graph (for some specified threshold). While popular and often used on many types of genomic data (e.g. gene expression, metabolite concentrations etc.)the naive correlation approach does not allow to infer the dependency network. Instead, graphical Gaussians models (GGMs) should be used. These allow to correctly identify direct influences, have close connections with causal graphical models, are straightforward to interpret, and yet are essentially as easy to compute as naive correlation models. This page lists pointers to learning GGMs from data, including procedures suitable for "small n, large p" data sets (category iii).



Graphical Gaussian Models (GGMs), also known as "covariance selection" or " concentration graph" models, have recently become a popular tool to study gene association networks. The key idea behind GGMs is to use partial correlations as a measure of independence of any two genes. This makes it straightforward to distinguish direct from indirect interactions. Note that partial correlations are related to the inverse of the correlation matrix. Also note that in GGMs missing edges indicate conditional independence.

A related but completely different concept are the so-called gene relevance networks which are based on the "covariance graph" model. In the latter interactions are defined through standard correlation coefficients so that missing edges denote marginal independence only.

There is a simple reason why GGMs should be preferred over relevance networks for identification of gene networks: the correlation coefficient is weak criterion for measuring dependence, as marginally, i.e. directly and indirectly, more or less all genes will be correlated. This implies that zero correlation is in fact a strong indicator for independence, i.e. the case of no edge in a network - but this is of course not what one usually wants to find out by building a relevance network... On the other hand, partial correlation coefficients do provide a strong measure of dependence and, correspondingly, offer only a weak criterion of independence (as most partial correlations coefficients usually vanish).

The best starting place to learn about GGMs is the classic paper that introduced this concept in the early 1970s. (A.P. Dempster. 1972. Covariance Selection. Biometrics 28:157-175). Further details can be found in the GGM books by J. Whittaker (1990) and by D. Edwards (1995).


Application of GGMs to Genomic Data:

Application of GGMs to genomic data is quite challenging, as the number of genes (p) is usually much larger than the number of available samples (n), and classical GGM theory is not valid in a small sample setting. With this page I'd like to provide a commented list of some recent work dealing with GGM gene expression analysis (there are only very few so far). In my understanding, all of these paper fit in one of three categories:

  1. analysis with classic GGM theory,
  2. using limited order partial correlations, and
  3. application of regularized GGMs.

For small n, large p data it seems that methods from section iii. are most suited (see below for references and software).


I. Classic GGM Analysis:

The following papers simply apply classical GGM theory (i.e. with not further modification) to analyze gene expression data. It turns out that such an analysis is necessarily restricted to very small numbers of genes or gene clusters as to satisfy n > p.

  1. P. J. Waddell and H. Kishino. 2000. Correspondence analysis of genes and tissue types and finding genetics links from microarray data. Genome Informatics 11:83-95
  2. P. J. Waddell and H. Kishino. 2000. Cluster inferences methods and graphical models evaluated on NCI60 microarray gene expression data. Genome Informatics 11:129--140
  3. H. Toh and K. Horimoto. 2002. Inference of a genetic network by a combined approach of cluster analysis and graphical Gaussian modeling. Bioinformatics 18:287--297
  4. H. Toh and K. Horimoto. 2002. System for automatically inferring a genetic network from expression profiles. J. Biol. Physics 28:449--464
  5. X. Wu, Y. Ye and K. R. Subramanian. 2003. Interactive analysis of gene interactions using graphical Gaussian model. ACM SIGKDD Workshop on Data Mining in Bioinformatics 3:63-69


II. Limited Order Partial Correlations:

One way to circumvent the problem of computing full partial correlation coefficients when the sample size is small compared to the number of genes is to use partial correlation coefficients of limited order. This results in something inbetween a full GGM model (with correlation conditioned on all p-2 remaining genes) and a relevance network model (with unconditioned correlation). This is the strategy employed in the following papers:

  1. A. de la Fuente, N. Bing, I. Hoeschele, and P. Mendes. 2004. Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics 20:3565-3574.
  2. A. Wille, P. Zimmermann et al. 2004. Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biology 5:R92
  3. P. M. Magwene and J. Kim. 2004. Estimating genomic coexpression networks using first-order conditional independence. Genome Biology 5:R100
  4. A. Wille and P. Bühlmann. 2006. Low-order conditional independence graphs for inferring genetic networks. Statist. Appl. Genet. Mol. Biol. 4: 32.
  5. R. Castelo and A. Roverato. A robust procedure for Gaussian graphical model search from microarray data with p larger than n. Preprint.


III. Regularized GGMs:

Another possibility (and in my opinion the statistically most sound way) to marry GGMs with small sample modeling is to introduce regularization and moderation. This essentially boils down to finding suitable estimates for the covariance matrix and its inverse when n < p. This can either be done in a full Bayesian manner, or in an empirical Bayes way via variance reduction, shrinkage estimates etc. Once regularized estimates of partial correlation are available then heuristic searches can subsequently to be employed to find an optimal graphical model (or set of models).

Outside a genomic context using regularized GGMs was first proposed by F. Wong, C.K. Carter, and R. Kohn. (2003. Efficient estimation of covariance selection models. Biometrika 90:809-830). For gene expression data this strategy is pursued in the following papers:

  1. A. Dobra, C. Hans, B. Jones, J.R. Nevins, and M. West. 2004. Sparse graphical models for exploring gene expression data. J. Multiv. Analysis 90:196-212.
    See the web page of M. West for various other related articles.
  2. In these papers a regularized estimate of the correlation matrix is obtained, either by Stein-type shrinkage (3) or by bootstrap variance reduction (2). This estimate is subsequently employed for computing partial correlation. Network selection is based on false discovery rate multiple testing. This method is implemented in GeneNet.
  3. J. Schäfer and K. Strimmer. 2005. Learning large-scale graphical Gaussian models from genomic data. In: J. F. Mendes. (Ed.). Proceedings of "Science of Complex Networks: from Biology to the Internet and WWW" (CNET 2004), Aveiro, PT, August 2004. (Publisher: The American Institute of Physics).
  4. N. Mainshausen and P. Bühlmann 2006. High-dimensional graphs and variable selection with the lasso. Annals of Statistics 34 (3)
    This approach uses lasso regression to induce sparsity on a node level among the partial correlations.
  5. These authors regularize the concentration matrix rather than the covariance matrix.

Posted via email from Troy's posterous

Tuesday, August 9, 2011

HTK lattice format

The lattices generated by HVite have the following general form
lmscale=20.00 wdpenalty=-30.00
... etc
... etc
v=0 a=-3239.01 l=0.00
v=0 a=-3820.77 l=0.00
v=0 a=-246.99 l=-1.20

The first 5 lines comprise a header which records names of the files used to generated the lattice along with the settings of the language model scale and penalty factors. 

Each node in the lattice represents a point in time measured in seconds and each arc represents a word spanning the segment of the input starting at the time of  its start node and ending at the time of its end node. For each such span, v gives the number of pronunciation used, a gives the acoustic score and l gives the language model score.

The language model scores in output lattices do not include the scale factors and penalties. There are removed so that the lattice can be used as a constrained network for subsequent recognizer testing. 

When using HVite normally, the word level network file is specified using the -w option. When the -w option is included but no file name is included, HVite constructs the name of a lattice file from the name of the test file and inputs that. Hence, a new recognition network is created for each input file and recognition is very fast. This is an efficient way, for example, of experimentally determining optimum values for the language model scale and penalty factors.


Posted via email from Troy's posterous

Thursday, August 4, 2011

A hierarchical context dependent neural network architecture for improved phone recognition

a hierarchical context dependent neural network architecture for improved phone recognition.pdf Download this file

This paper explored the CD phone recognition on TIMIT using hybrid NN/HMM system. 

1) Using two nets in tandem, one for CI posteriors and the other modeling the contexts from the CI posteriors;

2) Directly train a NN for CD state posteriors, too many outputs and not robust;

3) The first net is to give bottleneck features and then use another net on top of it.

The best results on TIMIT is 21.24% on core test set, there is less than 1% difference from the DBN based monophone recognition, thus rendering the context gain not significant. And also using DBN, the current best results is around 19%. 

Posted via email from Troy's posterous

Dirichlet mixture models of neural net posteriors for hmm based speech recognition

dirichlet mixture models of neural net posteriors for hmm based speech recognition.pdf Download this file

In this paper, the authors propose to using Dirichlet Mixture models instead of Gaussian Mixture models for the hybrid NN/HMM system. In the conventional NN/HMM system, the NN's posteriors are Gaussianized to feed into the HMM framework. However, as the posterior probabilities are lying on probability simplex and their distribution could be modeled by Dirichlet distributions. Thus Dirichlet Mixture models would be more preferable to Gaussian Mixture model. 

However, the final system performance, although better than the GMM based system, are still far for the state-of-the-art performance. 

Posted via email from Troy's posterous

Automatic speech recognition using hidden conditional neural fields

automatic speech recongition using hidden conditional neural fields.pdf Download this file

The concept is not that new, is similar to Conditional Neural Network or Neural Conditional Random Fields. And the results on TIMIT is far from DBN based systems.

Posted via email from Troy's posterous

Wednesday, August 3, 2011