Showing posts with label Neural Network. Show all posts

Wednesday, October 28, 2009

Flood - An open source Neural Networks C++ Library

Today I found another open-source neural network library: Flood.

http://www.cimne.com/flood/default.asp

I am still testing it; maybe it will be helpful.

To build it on Linux, add one more line to the file Flood2/Flood/Utilities/Vector.h:
#include <stdlib.h>

Otherwise the compiler reports that the function "exit" is not declared.


Tuesday, October 27, 2009

Quicknet - Bug for Learning Partial Weights

In the previous post, I made some modifications to Quicknet to enable learning arbitrary weight sections.

However, there is still a bug, which I have just found.

In the original Quicknet design, the only subset of the weights that can be learned on its own is the last part. To see what "last part" means, suppose that for a 4-layer MLP the parameter "mlp_lrmultiplier" is set to "x,y,z": x controls the weights between layer 1 and layer 2, y controls the weights between layer 2 and layer 3, and z controls the weights between layer 3 and layer 4.

If we set y to 0, then no matter what value z is set to, the actual z used in Quicknet is 1. That means that after the first non-updated weight section, the following weight sections are all non-updated as well.

Also, in the original design, if a weight section is not updated, the error is not back-propagated through that layer.

All in all, the last modification should not have removed the "break" statements.

The new version can be found here.

Friday, August 7, 2009

Quicknet AF - II

From the files:
QN_fltvec.h - qn_sigmoid_vf_vf(...)
QN_fltvec.cc - qn_fe_sigmoid_vf_vf(...)
QN_fltvec.cc - qn_fe_sigmoid_f_f(...)

we can see that the sigmoid function used in the hidden layer is written differently from the sigmoid function used in the output layer.
In the hidden layer, the sigmoid function is:
f(x)=1/(1+exp(-x))

In the output layer the sigmoid function is:
f(x)=tanh(s.a*b+c)
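For comparison, here is a small Python sketch of the two forms. The logistic function is the one given above. The post does not state the constants in the tanh expression, so `gain`, `scale`, and `offset` below are hypothetical parameters and are not taken from the Quicknet source. One thing worth knowing when comparing the two: for the particular choice 0.5 * tanh(0.5 * x) + 0.5 the tanh form is mathematically identical to the logistic function, while other constants give a different curve.

```python
import math

def logistic(x):
    """Hidden-layer sigmoid from the post: f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def affine_tanh(x, gain, scale, offset):
    """Tanh-based sigmoid: scale * tanh(gain * x) + offset.

    gain, scale, and offset are hypothetical parameters for illustration;
    the constants Quicknet actually uses are not stated in the post.
    """
    return scale * math.tanh(gain * x) + offset

# Print both forms side by side for a few inputs.
for x in (-2.0, 0.0, 2.0):
    same = affine_tanh(x, 0.5, 0.5, 0.5)    # coincides with the logistic
    plain = affine_tanh(x, 1.0, 1.0, 0.0)   # plain tanh, range (-1, 1)
    print(x, logistic(x), same, plain)
```

So whether the two layers really compute different functions depends on the constants in the output-layer code, which should be checked in QN_fltvec.h and QN_fltvec.cc.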

Tuesday, July 28, 2009

Deep belief networks

From: http://www.scholarpedia.org/article/Deep_belief_networks

Deep belief networks

From Scholarpedia

Geoffrey E. Hinton (2009), Scholarpedia, 4(5):5947. revision #63393

Curator: Dr. Geoffrey E. Hinton, University of Toronto, CANADA

Deep belief nets are probabilistic generative models that are composed of multiple layers of stochastic, latent variables. The latent variables typically have binary values and are often called hidden units or feature detectors. The top two layers have undirected, symmetric connections between them and form an associative memory. The lower layers receive top-down, directed connections from the layer above. The states of the units in the lowest layer represent a data vector.

The two most significant properties of deep belief nets are:

  • There is an efficient, layer-by-layer procedure for learning the top-down, generative weights that determine how the variables in one layer depend on the variables in the layer above.
  • After learning, the values of the latent variables in every layer can be inferred by a single, bottom-up pass that starts with an observed data vector in the bottom layer and uses the generative weights in the reverse direction.
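The single bottom-up pass described in the second property can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights and biases are made-up toy values, the `up_pass` function name is hypothetical, and the sketch passes probabilities upward instead of sampled binary states, which is a simplification.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def up_pass(v, layers):
    """Infer hidden probabilities layer by layer from a data vector v.

    layers is a list of (W, b) pairs, bottom layer first. W[i][j] is the
    generative weight between lower unit i and upper unit j; using it in
    the reverse (bottom-up) direction means summing over i for each j.
    b[j] is the bias of upper unit j.
    """
    activities = [list(v)]
    for W, b in layers:
        lower = activities[-1]
        upper = [
            logistic(b[j] + sum(lower[i] * W[i][j] for i in range(len(lower))))
            for j in range(len(b))
        ]
        activities.append(upper)
    return activities

# Toy network (assumed values): 3 visible units -> 2 hidden -> 1 hidden.
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.0]),
    ([[2.0], [-2.0]], [0.0]),
]
acts = up_pass([1, 0, 1], layers)
print(acts)
```

Each entry of `acts` holds the activities of one layer, starting with the data vector itself.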

Deep belief nets are learned one layer at a time by treating the values of the latent variables in one layer, when they are being inferred from data, as the data for training the next layer. This efficient, greedy learning can be followed by, or combined with, other learning procedures that fine-tune all of the weights to improve the generative or discriminative performance of the whole network.
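The control flow of this greedy procedure can be sketched as follows. The `greedy_layerwise` function and its `train_module` and `infer` arguments are hypothetical names introduced for illustration, and the toy module below is only a placeholder that records layer sizes; a real learning module would be trained on the data it receives.

```python
def greedy_layerwise(data, hidden_sizes, train_module, infer):
    """Train one module per layer; each layer's inferred values feed the next."""
    modules = []
    layer_input = data
    for n_hidden in hidden_sizes:
        module = train_module(layer_input, n_hidden)
        modules.append(module)
        # The inferred latent values become the training data for the next layer.
        layer_input = [infer(module, x) for x in layer_input]
    return modules

# Placeholder module (assumption): it only records its shape and "infers"
# a constant vector of the right size, so that the data flow can be checked.
def toy_train(data, n_hidden):
    return {"n_visible": len(data[0]), "n_hidden": n_hidden}

def toy_infer(module, x):
    return [0.5] * module["n_hidden"]

stack = greedy_layerwise([[1, 0, 1, 1], [0, 1, 0, 0]], [3, 2], toy_train, toy_infer)
print(stack)
```

The point of the sketch is that the second module never sees the raw data: its visible layer has the size of the first module's hidden layer.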

Discriminative fine-tuning can be performed by adding a final layer of variables that represent the desired outputs and backpropagating error derivatives. When networks with many hidden layers are applied to highly-structured input data, such as images, backpropagation works much better if the feature detectors in the hidden layers are initialized by learning a deep belief net that models the structure in the input data (Hinton & Salakhutdinov, 2006).

Deep Belief Nets as Compositions of Simple Learning Modules

A deep belief net can be viewed as a composition of simple learning modules, each of which is a restricted type of Boltzmann machine that contains a layer of visible units that represent the data and a layer of hidden units that learn to represent features that capture higher-order correlations in the data. The two layers are connected by a matrix of symmetrically weighted connections, W, and there are no connections within a layer. Given a vector of activities v for the visible units, the hidden units are all conditionally independent so it is easy to sample a vector, h, from the factorial posterior distribution over hidden vectors, p(h|v,W). It is also easy to sample from p(v|h,W). By starting with an observed data vector on the visible units and alternating several times between sampling from p(h|v,W) and p(v|h,W), it is easy to get a learning signal. This signal is simply the difference between the pairwise correlations of the visible and hidden units at the beginning and end of the sampling (see Boltzmann machine for details).
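The alternating sampling and the resulting learning signal can be sketched in plain Python. This is a minimal illustration, not the article's code: biases are omitted, the weight values are made up, and the function names are hypothetical.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(v, W, rng):
    """Sample binary h from the factorial posterior p(h|v,W) (biases omitted)."""
    probs = [logistic(sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(W[0]))]
    return [1 if rng.random() < p else 0 for p in probs]

def sample_visible(h, W, rng):
    """Sample binary v from p(v|h,W) (biases omitted)."""
    probs = [logistic(sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(W))]
    return [1 if rng.random() < p else 0 for p in probs]

def learning_signal(v0, W, rng, steps=1):
    """Pairwise correlations at the start minus those at the end of sampling."""
    h0 = sample_hidden(v0, W, rng)
    v, h = v0, h0
    for _ in range(steps):
        v = sample_visible(h, W, rng)
        h = sample_hidden(v, W, rng)
    return [[v0[i] * h0[j] - v[i] * h[j] for j in range(len(h0))]
            for i in range(len(v0))]

rng = random.Random(0)
W = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.2]]  # 3 visible x 2 hidden, toy values
signal = learning_signal([1, 0, 1], W, rng)
print(signal)
```

In practice the signal would be averaged over many data vectors and added to W after scaling by a small learning rate.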

The Theoretical Justification of the Learning Procedure

The key idea behind deep belief nets is that the weights, W, learned by a restricted Boltzmann machine define both p(v|h,W) and the prior distribution over hidden vectors, p(h|W), so the probability of generating a visible vector, v, can be written as:

p(v) = \sum_h p(h|W)p(v|h,W)

After learning W, we keep p(v|h,W) but we replace p(h|W) by a better model of the aggregated posterior distribution over hidden vectors – i.e. the non-factorial distribution produced by averaging the factorial posterior distributions produced by the individual data vectors. The better model is learned by treating the hidden activity vectors produced from the training data as the training data for the next learning module. Hinton, Osindero and Teh (2006) show that this replacement, if performed in the right way, improves a variational lower bound on the probability of the training data under the composite model.
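The factorization p(v) = \sum_h p(h|W)p(v|h,W) used in this section can be checked numerically on a restricted Boltzmann machine small enough to enumerate. The sketch below assumes binary units, no biases, and made-up 2x2 weights; all names are hypothetical.

```python
import itertools
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

W = [[0.5, -1.0], [1.5, 0.25]]  # toy weights (assumed), biases omitted
n_vis, n_hid = len(W), len(W[0])
states_v = list(itertools.product([0, 1], repeat=n_vis))
states_h = list(itertools.product([0, 1], repeat=n_hid))

def unnormalized(v, h):
    """exp(-E(v,h)) with energy E(v,h) = -sum_ij v_i W_ij h_j."""
    return math.exp(sum(v[i] * W[i][j] * h[j]
                        for i in range(n_vis) for j in range(n_hid)))

Z = sum(unnormalized(v, h) for v in states_v for h in states_h)

def prior_h(h):
    """p(h|W): the joint distribution summed over all visible vectors."""
    return sum(unnormalized(v, h) for v in states_v) / Z

def cond_v_given_h(v, h):
    """p(v|h,W): a product of independent logistic units."""
    p = 1.0
    for i in range(n_vis):
        on = logistic(sum(W[i][j] * h[j] for j in range(n_hid)))
        p *= on if v[i] == 1 else 1.0 - on
    return p

def marginal_v(v):
    """p(v) computed as sum_h p(h|W) p(v|h,W)."""
    return sum(prior_h(h) * cond_v_given_h(v, h) for h in states_h)

def marginal_v_direct(v):
    """p(v) computed directly from the joint distribution."""
    return sum(unnormalized(v, h) for h in states_h) / Z

for v in states_v:
    print(v, marginal_v(v), marginal_v_direct(v))
```

Both columns agree, which is what allows p(h|W) to be swapped for a better prior while p(v|h,W) is kept.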

Deep Belief Nets with Other Types of Variable

Deep belief nets typically use a logistic function of the weighted input received from above or below to determine the probability that a binary latent variable has a value of 1 during top-down generation or bottom-up inference, but other types of variable can be used (Welling et al., 2005) and the variational bound still applies, provided the variables are all in the exponential family (i.e. the log probability is linear in the parameters).

Using Autoencoders as the Learning Module

A closely related approach, which is also called a deep belief net, uses the same type of greedy, layer-by-layer learning with a different kind of learning module: an autoencoder that simply tries to reproduce each data vector from the feature activations that it causes (Bengio et al., 2007; LeCun et al., 2007). However, the variational bound no longer applies and an autoencoder module is less good at ignoring random noise in its training data (Larochelle et al., 2007).

Applications of Deep Belief Nets

Deep belief nets have been used for generating and recognizing images (Hinton, Osindero & Teh, 2006; Ranzato et al., 2007; Bengio et al., 2007), video sequences (Sutskever and Hinton, 2007), and motion-capture data (Taylor et al., 2007). If the number of units in the highest layer is small, deep belief nets perform non-linear dimensionality reduction and they can learn short binary codes that allow very fast retrieval of documents or images (Hinton & Salakhutdinov, 2006; Salakhutdinov and Hinton, 2007).

References

  • Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. (2007). Greedy Layer-Wise Training of Deep Networks. Advances in Neural Information Processing Systems 19, MIT Press, Cambridge, MA.
  • Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.
  • Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313:504-507.
  • Larochelle, H., Erhan, D., Courville, A., Bergstra, J., and Bengio, Y. (2007). An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation. International Conference on Machine Learning.
  • LeCun, Y. and Bengio, Y. (2007). Scaling Learning Algorithms Towards AI. In Bottou et al. (Eds.) Large-Scale Kernel Machines, MIT Press.
  • Ranzato, M., Huang, F. J., Boureau, Y., and LeCun, Y. (2007). Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition. Proc. of Computer Vision and Pattern Recognition Conference (CVPR 2007), Minneapolis, Minnesota.
  • Salakhutdinov, R. R. and Hinton, G. E. (2007). Semantic Hashing. In Proceedings of the SIGIR Workshop on Information Retrieval and Applications of Graphical Models, Amsterdam.
  • Sutskever, I. and Hinton, G. E. (2007). Learning multilevel distributed representations for high-dimensional sequences. AI and Statistics, 2007, Puerto Rico.
  • Taylor, G. W., Hinton, G. E., and Roweis, S. (2007). Modeling human motion using binary latent variables. Advances in Neural Information Processing Systems 19, MIT Press, Cambridge, MA.
  • Welling, M., Rosen-Zvi, M., and Hinton, G. E. (2005). Exponential family harmoniums with an application to information retrieval. Advances in Neural Information Processing Systems 17, pages 1481-1488. MIT Press, Cambridge, MA.

Geoffrey E. Hinton (2009) Deep belief networks. Scholarpedia, 4(5):5947.
Created: 18 December 2007, reviewed: 5 May 2009, accepted: 31 May 2009
Invited by: Dr. Ke CHEN, School of Computer Science, The University of Manchester, U.K.
Action editor: Dr. Ke CHEN, School of Computer Science, The University of Manchester, U.K.
Reviewer A: Dr. Yoshua Bengio, Professor, department of computer science and operations research, Université de Montréal, Canada
Reviewer B: Dr. Max Welling, School of Information and Computer Science, University of California, Irvine, CA


Thursday, July 16, 2009

What feature normalization should be used?

From: http://tech.groups.yahoo.com/group/icsi-speech-tools/message/144

It's generally recommended to always use some kind of feature mean and
variance normalization with Quicknet.

The minimum amount of normalization is normalization using a single
set of mean and variance estimates calculated over the entire training
set. In this case, normalization simply amounts to translating the
origin of the feature space (the mean normalization) and re-scaling
the axes of the feature space (the variance normalization), since all
data is normalized using the same mean and variance estimates. This is
recommended as a minimum amount of normalization since very big or
very small numbers can cause sigmoids to saturate, and since the
optimal learning rate during MLP training may depend on the scale of
the data (and thus normalization makes it less likely you'll need to
re-tune the learning rate).
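As a rough illustration of this minimum normalization, the Python sketch below estimates one mean and one standard deviation per feature dimension over the whole training set and applies them to every frame. The function names, the toy data, and the small `floor` that guards against division by zero are assumptions for the example; this is not Quicknet's own tooling or file format.

```python
import math

def global_stats(frames):
    """Per-dimension mean and standard deviation over all training frames."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
            for d in range(dims)]
    return means, stds

def normalize(frames, means, stds, floor=1e-8):
    """Shift the origin (mean) and re-scale the axes (variance)."""
    return [[(f[d] - means[d]) / max(stds[d], floor) for d in range(len(f))]
            for f in frames]

# Toy training set: two feature dimensions on very different scales.
train = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
means, stds = global_stats(train)
normed = normalize(train, means, stds)
print(means, stds)
print(normed)
```

The same `means` and `stds` would then be reused unchanged for any test data, since the estimates come from the training set only.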

It's very common to go further and do normalization at the utterance
or speaker level. This can be useful for reducing the amount of
variability in the features due to speaker differences and channel
differences.
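Utterance-level normalization can be sketched the same way, except that the statistics are estimated separately for each utterance. In the toy example below (assumed data and hypothetical names), the second utterance is the first one with a tenfold gain, as a stand-in for a channel difference, and both end up identical after normalization.

```python
import math

def normalize_utterance(frames, floor=1e-8):
    """Mean and variance normalization using this utterance's own statistics."""
    n = len(frames)
    dims = len(frames[0])
    out = [list(f) for f in frames]
    for d in range(dims):
        mean = sum(f[d] for f in frames) / n
        std = math.sqrt(sum((f[d] - mean) ** 2 for f in frames) / n)
        for t in range(n):
            out[t][d] = (frames[t][d] - mean) / max(std, floor)
    return out

# Two toy utterances with one feature dimension each.
utterances = {
    "utt_a": [[1.0], [3.0]],
    "utt_b": [[10.0], [30.0]],
}
normalized = {name: normalize_utterance(frames)
              for name, frames in utterances.items()}
print(normalized)
```

Speaker-level normalization follows the same pattern, with the frames of all utterances from one speaker pooled before the statistics are computed.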
