## Wednesday, July 29, 2009

### Shell Script Parameters

From: http://www.injunea.demon.co.uk/pages/page204.htm

### Parameters:

A parameter is a name, a digit, or any of the characters `*`, `@`, `#`, `?`, `-`, `$`, and `!`. So what exactly is the difference between a name and a parameter? Not much, actually; it's all in the usage. If a word follows a command, as in `ls -l word`, then `word` is one of the parameters (or arguments) passed to the `ls` command. But if the `ls` command were inside a sh script, then in all likelihood `word` would also be a variable name. So a parameter can be a name when passing information into some other command or script. Viewed from inside a script, however, the command line arguments appear as a sequence of positional parameters, named by digits in the order in which they arrived (see Script example_1.1). So a parameter can also be a digit. The other characters listed are special parameters that are assigned values at script start-up and may be used, if required, from within a script.

Well, after reading through the above, I am still not sure this is any clearer. Let's see if an example can help to clarify things a little.

#### Script example_1.1 - The shell default parameters

```sh
#!/bin/sh -vx
#######################################################
# example_1.1 (c) R.H.Reepe 1996 March 28 Version 1.0
#######################################################
echo "Script name is [$0]"
echo "First Parameter is		[$1]"
echo "Second Parameter is [$2]"
echo "This Process ID is		[$$]"
echo "This Parameter Count is [$#]"
echo "All Parameters		[$@]"
echo "The FLAGS are [$-]"
```

If you execute the script shown above with some arguments as shown below, you will get the output on your screen that follows.
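A minimal sketch of such a run, assuming a POSIX `sh` and the widely available (but non-POSIX) `mktemp`: it recreates the echo lines of example_1.1 in a temporary file and runs it with three made-up arguments. The file location and the argument values are assumptions for this demo. The file is run with `sh -vx` rather than executed directly, so it also works where the temporary directory is mounted `noexec`; the `-v`/`-x` trace goes to stderr and is discarded.

```sh
#!/bin/sh
# Recreate example_1.1 in a temporary file (location chosen by mktemp;
# the original article does not say where the script lives).
script=$(mktemp) || exit 1
cat > "$script" <<'EOF'
#!/bin/sh -vx
echo "Script name is [$0]"
echo "First Parameter is [$1]"
echo "Second Parameter is [$2]"
echo "This Process ID is [$$]"
echo "This Parameter Count is [$#]"
echo "All Parameters [$@]"
echo "The FLAGS are [$-]"
EOF

# Run it with three hypothetical arguments; the verbose/trace output of
# -v and -x goes to stderr, so only the echo lines are captured.
output=$(sh -vx "$script" alpha beta gamma 2>/dev/null)
rm -f "$script"
printf '%s\n' "$output"
```

In the captured output, `$1` and `$2` show `alpha` and `beta`, the count line shows `3`, and `$@` expands to all three arguments. The process ID changes on every run, and the `$-` flag string may differ between shells, so neither is fixed here.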

## Tuesday, July 28, 2009

### Logical Markov Models

Resources:

http://www.cis.hut.fi/praiko/papers/step02/node1.html

### Deep belief networks

From: http://www.scholarpedia.org/article/Deep_belief_networks


Geoffrey E. Hinton (2009), Scholarpedia, 4(5):5947, revision #63393.

Deep belief nets are probabilistic generative models that are composed of multiple layers of stochastic, latent variables. The latent variables typically have binary values and are often called hidden units or feature detectors. The top two layers have undirected, symmetric connections between them and form an associative memory. The lower layers receive top-down, directed connections from the layer above. The states of the units in the lowest layer represent a data vector.
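Written out as a formula (a standard way of stating this structure; the layer superscripts are notation introduced here, with $h^0 \equiv v$), a deep belief net with hidden layers $h^1, \ldots, h^l$ defines the joint distribution

$$p(v, h^1, \ldots, h^l) = \Big(\prod_{k=0}^{l-2} p(h^k|h^{k+1})\Big)\, p(h^{l-1}, h^l)$$

The product collects the directed, top-down connections of the lower layers, and the last factor is the undirected associative memory formed by the top two layers.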

The two most significant properties of deep belief nets are:

• There is an efficient, layer-by-layer procedure for learning the top-down, generative weights that determine how the variables in one layer depend on the variables in the layer above.
• After learning, the values of the latent variables in every layer can be inferred by a single, bottom-up pass that starts with an observed data vector in the bottom layer and uses the generative weights in the reverse direction.
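For logistic binary units, the bottom-up pass can be sketched as follows, with bias terms omitted. Here $W^k$ denotes the generative weights between layer $k$ and layer $k+1$ (notation introduced here, with $h^0 = v$), and $\sigma(x) = 1/(1+e^{-x})$. Generation uses $W^k$ top-down; inference uses the same matrix in the reverse direction:

$$p(h^k_i = 1|h^{k+1}) = \sigma\Big(\sum_j W^k_{ij} h^{k+1}_j\Big), \qquad q(h^{k+1}_j = 1|h^k) = \sigma\Big(\sum_i W^k_{ij} h^k_i\Big)$$

The inferred probabilities for one layer (or samples drawn from them) are the input for computing the layer above.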

Deep belief nets are learned one layer at a time by treating the values of the latent variables in one layer, when they are being inferred from data, as the data for training the next layer. This efficient, greedy learning can be followed by, or combined with, other learning procedures that fine-tune all of the weights to improve the generative or discriminative performance of the whole network.

Discriminative fine-tuning can be performed by adding a final layer of variables that represent the desired outputs and backpropagating error derivatives. When networks with many hidden layers are applied to highly-structured input data, such as images, backpropagation works much better if the feature detectors in the hidden layers are initialized by learning a deep belief net that models the structure in the input data (Hinton & Salakhutdinov, 2006).


## Deep Belief Nets as Compositions of Simple Learning Modules

A deep belief net can be viewed as a composition of simple learning modules, each of which is a restricted type of Boltzmann machine that contains a layer of visible units that represent the data and a layer of hidden units that learn to represent features that capture higher-order correlations in the data. The two layers are connected by a matrix of symmetrically weighted connections, $W$, and there are no connections within a layer. Given a vector of activities $v$ for the visible units, the hidden units are all conditionally independent, so it is easy to sample a vector, $h$, from the factorial posterior distribution over hidden vectors, $p(h|v,W)$. It is also easy to sample from $p(v|h,W)$. By starting with an observed data vector on the visible units and alternating several times between sampling from $p(h|v,W)$ and $p(v|h,W)$, it is easy to get a learning signal. This signal is simply the difference between the pairwise correlations of the visible and hidden units at the beginning and end of the sampling (see Boltzmann machine for details).
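For binary units, and ignoring bias terms as the paragraph does, the two conditionals and the learning signal can be written out as below, with $\sigma(x) = 1/(1+e^{-x})$. The update has the form of a contrastive-divergence rule; the learning rate $\epsilon$ and the subscripts "start" and "end" (correlations measured at the beginning and end of the alternating sampling) are notation introduced here:

$$p(h|v,W) = \prod_j p(h_j|v,W), \qquad p(h_j = 1|v,W) = \sigma\Big(\sum_i v_i W_{ij}\Big), \qquad p(v_i = 1|h,W) = \sigma\Big(\sum_j W_{ij} h_j\Big)$$

$$\Delta W_{ij} = \epsilon\left(\langle v_i h_j\rangle_{\text{start}} - \langle v_i h_j\rangle_{\text{end}}\right)$$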

## The Theoretical Justification of the Learning Procedure

The key idea behind deep belief nets is that the weights, $W$, learned by a restricted Boltzmann machine define both $p(v|h,W)$ and the prior distribution over hidden vectors, $p(h|W)$, so the probability of generating a visible vector, $v$, can be written as:

$p(v) = \sum_h p(h|W)p(v|h,W)$

After learning $W$, we keep $p(v|h,W)$ but we replace $p(h|W)$ by a better model of the aggregated posterior distribution over hidden vectors – i.e. the non-factorial distribution produced by averaging the factorial posterior distributions produced by the individual data vectors. The better model is learned by treating the hidden activity vectors produced from the training data as the training data for the next learning module. Hinton, Osindero and Teh (2006) show that this replacement, if performed in the right way, improves a variational lower bound on the probability of the training data under the composite model.
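One way to make this precise uses the standard variational argument. Write $q(h|v) = p(h|v,W)$ for the posterior of the first module, and $v^{(1)}, \ldots, v^{(N)}$ for the training vectors (notation introduced here). The aggregated posterior is

$$\frac{1}{N}\sum_{n=1}^{N} p(h|v^{(n)}, W)$$

and for every data vector

$$\log p(v) \ge \sum_h q(h|v)\big[\log p(h) + \log p(v|h,W)\big] - \sum_h q(h|v)\log q(h|v)$$

With $q(h|v)$ and $p(v|h,W)$ held fixed, only the $\log p(h)$ term changes when the prior is replaced. Any new prior that assigns higher expected log probability to the hidden vectors produced from the training data therefore raises the bound.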

## Deep Belief Nets with Other Types of Variable

Deep belief nets typically use a logistic function of the weighted input received from above or below to determine the probability that a binary latent variable has a value of 1 during top-down generation or bottom-up inference, but other types of variable can be used (Welling et al., 2005) and the variational bound still applies, provided the variables are all in the exponential family (i.e., the log probability is linear in the parameters).

## Using Autoencoders as the Learning Module

A closely related approach, which is also called a deep belief net, uses the same type of greedy, layer-by-layer learning with a different kind of learning module: an autoencoder that simply tries to reproduce each data vector from the feature activations that it causes (Bengio et al., 2007; LeCun and Bengio, 2007). However, the variational bound no longer applies, and an autoencoder module is worse at ignoring random noise in its training data (Larochelle et al., 2007).
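A minimal sketch of one such autoencoder module, assuming logistic units and a squared reconstruction error. The encoder and decoder parameters $W$, $b$, $W'$, $c$ are notation introduced here, and cross-entropy is another common choice of error:

$$h = \sigma(Wv + b), \qquad \hat{v} = \sigma(W'h + c), \qquad E(v) = \lVert v - \hat{v} \rVert^2$$

Training minimizes $E$ summed over the training vectors. As with the Boltzmann machine modules, the codes $h$ computed for the training data then serve as the training data for the next module.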

## Applications of Deep Belief Nets

Deep belief nets have been used for generating and recognizing images (Hinton, Osindero & Teh, 2006; Ranzato et al., 2007; Bengio et al., 2007), video sequences (Sutskever and Hinton, 2007), and motion-capture data (Taylor et al., 2007). If the number of units in the highest layer is small, deep belief nets perform non-linear dimensionality reduction, and they can learn short binary codes that allow very fast retrieval of documents or images (Hinton & Salakhutdinov, 2006; Salakhutdinov and Hinton, 2007).
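Retrieval with short binary codes can be sketched in plain shell and `awk`. This assumes each document or image has already been mapped to a fixed-length bit string, for example by thresholding the top-layer activities. That step, the `id code` file format, and the function names are all hypothetical. Codes within a small Hamming distance of the query code count as matches. This is a simple linear scan, not the memory-address lookup that makes semantic hashing fast.

```sh
#!/bin/sh
# hamming A B: number of positions at which two binary code strings differ.
# Assumes both codes have the same length.
hamming() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        d = 0
        for (i = 1; i <= length(a); i++)
            if (substr(a, i, 1) != substr(b, i, 1)) d++
        print d
    }'
}

# nearest_codes QUERY RADIUS FILE: print the id of every line in FILE
# (hypothetical format: "id code" per line) whose code is within RADIUS
# bits of QUERY. Lines whose code length differs from QUERY are skipped.
nearest_codes() {
    awk -v q="$1" -v r="$2" 'length($2) == length(q) {
        d = 0
        for (i = 1; i <= length(q); i++)
            if (substr(q, i, 1) != substr($2, i, 1)) d++
        if (d <= r) print $1
    }' "$3"
}
```

For example, with a file containing the lines `doc1 0110`, `doc2 0111` and `doc3 1001`, running `nearest_codes 0110 1 FILE` prints `doc1` and `doc2`, because `doc3` differs from the query in all four bits.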

## References

• Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. (2007). Greedy Layer-Wise Training of Deep Networks. Advances in Neural Information Processing Systems 19, MIT Press, Cambridge, MA.
• Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.
• Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313:504-507.
• Larochelle, H., Erhan, D., Courville, A., Bergstra, J., and Bengio, Y. (2007). An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation. International Conference on Machine Learning.
• LeCun, Y. and Bengio, Y. (2007). Scaling Learning Algorithms Towards AI. In Bottou et al. (Eds.), Large-Scale Kernel Machines, MIT Press.
• Ranzato, M., Huang, F. J., Boureau, Y., and LeCun, Y. (2007). Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition. Proc. of Computer Vision and Pattern Recognition Conference (CVPR 2007), Minneapolis, Minnesota.
• Salakhutdinov, R. R. and Hinton, G. E. (2007). Semantic Hashing. In Proceedings of the SIGIR Workshop on Information Retrieval and Applications of Graphical Models, Amsterdam.
• Sutskever, I. and Hinton, G. E. (2007). Learning multilevel distributed representations for high-dimensional sequences. AI and Statistics, Puerto Rico.
• Taylor, G. W., Hinton, G. E., and Roweis, S. (2007). Modeling human motion using binary latent variables. Advances in Neural Information Processing Systems 19, MIT Press, Cambridge, MA.
• Welling, M., Rosen-Zvi, M., and Hinton, G. E. (2005). Exponential family harmoniums with an application to information retrieval. Advances in Neural Information Processing Systems 17, pages 1481-1488. MIT Press, Cambridge, MA.

Created: 18 December 2007, reviewed: 5 May 2009, accepted: 31 May 2009.

• Invited by: Dr. Ke CHEN, School of Computer Science, The University of Manchester, U.K.
• Action editor: Dr. Ke CHEN, School of Computer Science, The University of Manchester, U.K.
• Reviewer A: Dr. Yoshua Bengio, Professor, Department of Computer Science and Operations Research, Université de Montréal, Canada
• Reviewer B: Dr. Max Welling, School of Information and Computer Science, University of California, Irvine, CA

## Tuesday, July 21, 2009

### Make an ISO Image

Make an ISO Image :: Scott Granneman

To make an ISO from your CD/DVD, place the media in your drive but do not mount it. If it automounts, unmount it.

```sh
dd if=/dev/dvd of=dvd.iso     # for dvd
dd if=/dev/cdrom of=cd.iso    # for cdrom
dd if=/dev/scd0 of=cd.iso     # if cdrom is scsi
```

To make an ISO from files on your hard drive, create a directory which holds the files you want. Then use the mkisofs command.

```sh
mkisofs -o /tmp/cd.iso /tmp/directory/
```

This results in a file called cd.iso in folder /tmp which contains all the files and directories in /tmp/directory/.
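A quick sanity check for an image made either way is to look for the ISO 9660 identifier. Volume descriptors start at sector 16 (byte 32768 with 2048-byte sectors), and each one begins with a type byte followed by the string `CD001`. The function name below is hypothetical, and the check looks only at that one identifier; a DVD image that uses only UDF may fail it even though the image is fine.

```sh
#!/bin/sh
# is_iso9660 FILE: exit status 0 if FILE carries the ISO 9660 identifier.
# The first volume descriptor starts at byte 32768 (sector 16 * 2048);
# its type byte comes first, so "CD001" occupies bytes 32769-32773.
is_iso9660() {
    # Read the 5 identifier bytes; tr drops NUL bytes so an all-zero region
    # compares as an empty string instead of upsetting the shell.
    id=$(dd if="$1" bs=1 skip=32769 count=5 2>/dev/null | tr -d '\000')
    [ "$id" = "CD001" ]
}
```

For example, `is_iso9660 /tmp/cd.iso && echo "looks like an ISO 9660 image"` checks the image created by the `mkisofs` command above.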

For more info, see the man pages for mkisofs, losetup, and dd, or see the CD-Writing-HOWTO at http://www.tldp.org.

If you want to create ISO images from a CD and you're using Windows, Cygwin has a dd command that will work. Since dd is not specific to CDs, it will also create disk images of floppies, hard drives, zip drives, etc.

For the Windows users, here are some other suggestions:

WinISO ~ http://www.winiso.com

VaporCD ~ http://vaporcd.sourceforge.net ~ "You can create ISOs from CD and mount them as 'virtual' CD drives. Works flawlessly with games and other CD based software. Unfortunately, it appears to be unmaintained now. Good thing it works so well." (P.B., 13 February 2002)

## Thursday, July 16, 2009

### What feature normalization should be used?

From: http://tech.groups.yahoo.com/group/icsi-speech-tools/message/144

It's generally recommended to always use some kind of feature mean and
variance normalization with Quicknet.

The minimum amount of normalization is normalization using a single
set of mean and variance estimates calculated over the entire training
set. In this case, normalization simply amounts to translating the
origin of the feature space (the mean normalization) and re-scaling
the axes of the feature space (the variance normalization), since all
data is normalized using the same mean and variance estimates. This is
recommended as a minimum amount of normalization since very big or
very small numbers can cause sigmoids to saturate, and since the
optimal learning rate during MLP training may depend on the scale of
the data (and thus normalization makes it less likely you'll need to
re-tune the learning rate).

It's very common to go further and do normalization at the utterance
or speaker level. This can be useful for reducing the amount of
variability in the features due to speaker differences and channel
differences.
