Thursday, December 17, 2009

bad data or over pruning



HERest -C src/ConfigHVite -I lists/train.phonemlf -t 250.0 150.0 1000.0 -S train.mfcc.list -H hmm0/macros -H hmm0/hmmdefs -M hmm1 lists/monophones1
ERROR [-7324] StepBack: File ... bad data or over pruning
Possible causes include corrupt MFCC files and non-matching or non-existent label files. In this case, I had to re-calculate the mean and variance for the prototype HMM using only half the data, and the problem went away. If every file is reported as bad data, you may have derived the features incorrectly: go back to HCopy and check the parameters in its config file, for example against a typical setup like the sketch below.
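A typical HCopy front-end configuration looks something like this sketch; the values are common HTK defaults and the file names are placeholders, not the settings from this experiment, so treat them only as a reference to diff against your own config:

# Write an illustrative HCopy config, re-derive one file, and inspect it.
cat > config.hcopy <<'EOF'
# Illustrative front-end settings; adjust to match how the HMMs were trained.
SOURCEFORMAT = WAV
TARGETKIND = MFCC_0_D_A
# 10 ms frame shift and 25 ms window, in HTK's 100 ns units:
TARGETRATE = 100000.0
WINDOWSIZE = 250000.0
NUMCHANS = 26
NUMCEPS = 12
USEHAMMING = T
PREEMCOEF = 0.97
EOF
HCopy -C config.hcopy test.wav test.mfcc
HList -h test.mfcc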

From: http://www.ling.ohio-state.edu/~bromberg/htk_problems.html

Tuesday, December 8, 2009

WSJCAM0

Some information about the WSJCAM0 corpus:
1. In total, there are 140 speakers, with 110 utterances per speaker;

2. 92 training speakers, 20 development test speakers and two sets of 14 evaluation test speakers. Each speaker provides approximately 90 utterances and an additional 18 adaptation utterances.

3. The same set of 18 adaptation sentences was recorded by each speaker, consisting of one recording of background noise, 2 phonetically balanced sentences and the first 15 adaptation sentences from the initial WSJ experiment.

4. Each training speaker read out some 90 training sentences, selected randomly in paragraph units. This is the empirically determined maximum number of sentences that could be squeezed into one hour of speaker time.

5. Each of 48 test speakers read 80 sentences. The final development test group consists of 20 speakers.

The CD-ROM publication consists of six discs, with contents organized as follows:

  • discs 1 and 2 - training data from head-mounted microphone
  • disc 3 - development test data from head-mounted microphone, plus first set of evaluation test data
  • discs 4 and 5 - training data from desk-mounted microphone
  • disc 6 - development test data from desk-mounted microphone, plus second set of evaluation test data
There are 90 utterances from each of 92 speakers that are designated as training material for speech recognition algorithms. An additional 48 speakers each read 40 sentences containing only words from a fixed 5,000 word vocabulary, and another 40 sentences using a 64,000 word vocabulary, to be used as testing material. Each of the total of 140 speakers also recorded a common set of 18 adaptation sentences. Recordings were made from two microphones: a far-field desk microphone and a head-mounted close-talking microphone.

http://ccl.pku.edu.cn/doubtfire/CorpusLinguistics/LDC_Corpus/available_corpus_from_ldc.html#wsjcam0




Monday, December 7, 2009

Install CDT for Eclipse on Ubuntu

No eclipse-cdt package is found in the package list, but we can install CDT from within Eclipse; you can confirm with the quick check below.
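A quick way to check is to search the package cache; if it prints nothing, install through Eclipse as described next:

apt-cache search eclipse-cdt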

1. Open a terminal and start Eclipse as the root user: sudo eclipse
2. From Help -> Install New Software... -> Add
3. Fill in:
Name: galileo
URL: http://download.eclipse.org/tools/cdt/releases/galileo
and press OK.
4. Type "c" or "cdt" in the filter text box, then select the CDT Main Feature package in the list.
5. Click Install.



Tuesday, December 1, 2009

wget: Download entire websites easy

wget is a nice tool for downloading resources from the internet. The basic usage is wget url:

wget http://linuxreviews.org/

Therefore, wget (manual page) + less (manual page) is all you need to surf the internet. The power of wget is that you may download sites recursively, meaning you also get all pages (and images and other data) linked from the front page:

wget -r http://linuxreviews.org/

But many sites do not want you to download their entire site. To prevent this, they check how browsers identify themselves. Many sites refuse the connection or send a blank page if they detect you are not using a web browser. You might get a message like:

Sorry, but the download manager you are using to view this site is not supported. We do not support use of such download managers as flashget, go!zilla, or getright

Wget has a very handy -U option for sites like this. Use -U My-browser to tell the site you are using a commonly accepted browser:

wget -r -p -U Mozilla http://www.stupidsite.com/restricedplace.html

The most important command line options are --limit-rate= and --wait=. You should add --wait=20 to pause 20 seconds between retrievals; this makes sure you are not manually added to a blacklist. --limit-rate defaults to bytes; add K to set KB/s. Example:

wget --wait=20 --limit-rate=20K -r -p -U Mozilla http://www.stupidsite.com/restricedplace.html

A website owner will probably get upset if you attempt to download his entire site using a simple wget http://foo.bar command. However, the website owner will hardly notice you if you limit the download transfer rate and pause between fetching files.

Use --no-parent

--no-parent is a very handy option that guarantees wget will not download anything outside the directory you want to acquire, i.e. it will never ascend into the parent folder. Use this to make sure wget does not fetch more than it needs to if you just want the files in one folder, as in the sketch below.
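A minimal sketch combining the options above (the URL is only a placeholder):

# Mirror docs/ and everything below it; --no-parent stops wget from
# climbing into the parent directory and mirroring the whole site.
wget -r --no-parent --wait=20 --limit-rate=20K -U Mozilla http://www.example.com/docs/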


From: http://linuxreviews.org/quicktips/wget/



Friday, November 27, 2009

Stanford Lectures

Machine Learning:
http://see.stanford.edu/see/lecturelist.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1

Natural Language Processing:
http://see.stanford.edu/see/courseinfo.aspx?coll=63480b48-8819-4efd-8412-263f1a472f5a


Others:
http://see.stanford.edu/see/courses.aspx


Thursday, November 26, 2009

Deep Belief Networks

An introductory YouTube video: http://www.youtube.com/watch?v=AyzOUbkUf3M

On TWiki:
http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/DeepBeliefNetworks

An implementation:
PLearn:
http://plearn.berlios.de/


A tutorial on using PLearn for DBN:
http://plearn.berlios.de/users_guide/node3.html#SECTION00340000000000000000

A 3-hour video lecture:
http://videolectures.net/jul09_hinton_deeplearn/

Wednesday, November 11, 2009

HTK Chinese character Encoding

When using HTK to build a Mandarin ASR system, the Chinese words in the MLF files are encoded in the octal numeral system (each byte of a multi-byte character appears as a \NNN escape).
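As a quick sketch, assuming the labels are GB2312-encoded (an assumption; your corpus may use another encoding), the octal escapes can be turned back into readable characters with printf and iconv:

# "\261\261\276\251" are the GB2312 bytes of the word 北京 (illustrative):
printf '\261\261\276\251' | iconv -f GB2312 -t UTF-8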



Wednesday, October 28, 2009

Flood - An open source Neural Networks C++ Library

Today, I found another Open source Neural Network library: Flood.

http://www.cimne.com/flood/default.asp

Testing ...
Maybe it would be helpful.

To install it on Linux, add one more line of code to the file Flood2/Flood/Utilities/Vector.h:

#include <stdlib.h>

otherwise the compiler will report errors that the function "exit" is not defined.


Tuesday, October 27, 2009

Quicknet - Bug for Learning Partial Weights

In the previous post, I made some modifications to enable Quicknet to learn arbitrary weight sections.

However, there is still a bug, which I have just found.

In the original Quicknet design, the only partial weights that can be learned are the last sections of the network. By "last sections", I mean the following: suppose a 4-layer MLP where the parameter "mlp_lrmultiplier" is set to "x,y,z"; then x controls the weights between layer 1 and layer 2, y controls the weights between layer 2 and layer 3, and z controls the weights between layer 3 and layer 4.

If we set y to 0, then no matter what value z is set to, the actual z used in Quicknet is 1. That means that after the first non-updated weight section, the following weight sections are all left non-updated.

Also, in the original design, if a weight section is not updated, the error is not back-propagated through that layer.

All in all, in the last modification we should not have removed the "break" statements.

The new version can be found here.

Monday, October 12, 2009

Quicknet - Enable Learning Partial Weights

In the original Quicknet_v3_20, the parameter mlp_lrmultiplier can scale the learning rate per section. If a value is set to 0.0, that section's weights will not be updated. However, it does not provide full freedom to keep an arbitrary section fixed during training: any value after the first non-zero value is treated as non-zero, so only a leading block of sections can be frozen.

To remove this restriction, the following changes need to be made:
In QN_MLP_BunchFlVar.cc, in the QN_MLP_BunchFlVar::train_bunch() function, at about line 427:

if (cur_layer!=nlayers-1 && backprop_weights[cur_weinum+1])

To do: remove "&& backprop_weights[cur_weinum+1]".

In QN_MLP_BaseFl.cc, in the QN_MLP_BaseFl::set_learnrate() function, at about line 334:

else
     break;

To do: remove these two lines.

The workable source code, which at least works for my purpose, can be found here: https://www.comp.nus.edu.sg/~li-bo/src/quicknet_linAF.tar

Friday, October 9, 2009

Quicknet_v3_20 Bug-I

There is a small bug in the soft-target trainer.

After the pre-run cross validation, qnmultitrn stopped and gave a "segmentation fault" error.

The bug is simply that in the file "QN_trn.cc", in the following constructor (which begins at line 598):

////////////////////////////////////////////////////////////////
// "Soft training" object - trains using continuous targets.

QN_SoftSentTrainer::QN_SoftSentTrainer(int a_debug, const char* a_dbgname,
                       int a_verbose,
                       QN_MLP* a_mlp,
                       QN_InFtrStream* a_train_ftr_str,
                       QN_InFtrStream* a_train_targ_str,
                       QN_InFtrStream* a_cv_ftr_str,
                       QN_InFtrStream* a_cv_targ_str,
                       QN_RateSchedule* a_lr_sched,
                       float a_targ_low, float a_targ_high,
                       const char* a_wlog_template,
                       QN_WeightFileType a_wfile_format,
                       const char* a_ckpt_template,
                       QN_WeightFileType a_ckpt_format,
                       unsigned long a_ckpt_secs,
                       size_t a_bunch_size,
                       float* a_lrscale)
    : clog(a_debug, "QN_SoftSentTrainer", a_dbgname),
      ...

Two variables, debug and dbgname, are not initialized. As debug is an integer, the uninitialized variable does not cause a big problem. However, dbgname is a pointer, which caused the "Segmentation fault".

To fix this bug, just add the following two initializer lines before the existing clog initializer:
 
    : debug(a_debug),
      dbgname(a_dbgname),

      clog(a_debug, "QN_SoftSentTrainer", a_dbgname),
      ...


Tuesday, August 18, 2009

Doing what the brain does - how computers learn to listen

Max Planck scientists develop model to improve computer language recognition


 


We see, hear and feel, and make sense of countless diverse, quickly changing stimuli in our environment seemingly without effort. However, doing what our brains do with ease is often an impossible task for computers. Researchers at the Leipzig Max Planck Institute for Human Cognitive and Brain Sciences and the Wellcome Trust Centre for Neuroimaging in London have now developed a mathematical model which could significantly improve the automatic recognition and processing of spoken language. In the future, algorithms of this kind, which imitate brain mechanisms, could help machines to perceive the world around them. (PLoS Computational Biology, August 12th, 2009)

Many people will have personal experience of how difficult it is for computers to deal with spoken language. For example, people who 'communicate' with automated telephone systems now commonly used by many organisations need a great deal of patience. If you speak just a little too quickly or slowly, if your pronunciation isn’t clear, or if there is background noise, the system often fails to work properly. The reason for this is that until now the computer programs that have been used rely on processes that are particularly sensitive to perturbations. When computers process language, they primarily attempt to recognise characteristic features in the frequencies of the voice in order to recognise words.

'It is likely that the brain uses a different process', says Stefan Kiebel from the Leipzig Max Planck Institute for Human Cognitive and Brain Sciences. The researcher presumes that the analysis of temporal sequences plays an important role in this. 'Many perceptual stimuli in our environment could be described as temporal sequences.' Music and spoken language, for example, are comprised of sequences of different length which are hierarchically ordered. According to the scientist’s hypothesis, the brain classifies the various signals from the smallest, fast-changing components (e.g., single sound units like 'e' or 'u') up to big, slow-changing elements (e.g., the topic). The significance of the information at various temporal levels is probably much greater than previously thought for the processing of perceptual stimuli. 'The brain permanently searches for temporal structure in the environment in order to deduce what will happen next', the scientist explains. In this way, the brain can, for example, often predict the next sound units based on the slow-changing information. Thus, if the topic of conversation is the hot summer, 'su…' will more likely be the beginning of the word 'sun' than the word 'supper'.

To test this hypothesis, the researchers constructed a mathematical model which was designed to imitate, in a highly simplified manner, the neuronal processes which occur during the comprehension of speech. Neuronal processes were described by algorithms which processed speech at several temporal levels. The model succeeded in processing speech; it recognised individual speech sounds and syllables. In contrast to other artificial speech recognition devices, it was able to process sped-up speech sequences. Furthermore it had the brain’s ability to 'predict' the next speech sound. If a prediction turned out to be wrong because the researchers made an unfamiliar syllable out of the familiar sounds, the model was able to detect the error.

The 'language' with which the model was tested was simplified - it consisted of the four vowels a, e, i and o, which were combined to make 'syllables' consisting of four sounds. 'In the first instance we wanted to check whether our general assumption was right', Kiebel explains. With more time and effort, consonants, which are more difficult to differentiate from each other, could be included, and further hierarchical levels for words and sentences could be incorporated alongside individual sounds and syllables. Thus, the model could, in principle, be applied to natural language.

'The crucial point, from a neuroscientific perspective, is that the reactions of the model were similar to what would be observed in the human brain', Stefan Kiebel says. This indicates that the researchers’ model could represent the processes in the brain. At the same time, the model provides new approaches for practical applications in the field of artificial speech recognition.

Original work:

Stefan J. Kiebel, Katharina von Kriegstein, Jean Daunizeau, Karl J. Friston
Recognizing sequences of sequences
PLoS Computational Biology, August 12th, 2009.






Wednesday, August 12, 2009

Convert CHM files to HTML/PDF

From: http://linuxondesktop.blogspot.com/2008/07/convert-chm-files-to-htmlpdf.html

A few years back, reading a book involved going to the neighborhood book shop, purchasing the book, and then finding a cozy place to sit and read. However, with the advent of the Internet, with laptops getting smaller, lighter and cooler, and with the easy availability of e-books online, the scenario has changed. These days you can go to an online book shop, purchase an e-book at any time of day, and start reading it immediately, all while sitting in your bed. However, most of these e-books are in CHM format (Microsoft Compiled HTML Help), which is a native documentation format of the Windows operating system. CHM basically combines HTML and its associated images into a single (.chm) file.

By default, Ubuntu and many other Linux distributions do not include support for opening (.chm) files out of the box, owing to CHM being a proprietary file format of the Windows operating system. There are viewers available on Linux which allow you to open these files, as I highlighted in my previous article (Read Here). Still, if you want to convert (.chm) files to (.html) or (.pdf), maybe for sending them to a friend who does not have a CHM viewer installed, you can do so easily.

First, open a terminal (Applications -> Accessories -> Terminal) and issue the following command to install chmlib:
sudo apt-get install libchm-bin
chmlib allows extracting HTML files and images from (.chm) files. If you then want to convert the extracted HTML files into PDF, PS, etc., you also need htmldoc, which you can install by issuing the following command in the terminal window:
sudo apt-get install htmldoc

Converting CHM files to HTML and eventually PDF


Now suppose you have a file named "Primer.chm" from which you want to extract the HTML files and images into a "Primer" directory; you can do so by issuing the following command in the terminal window:
extract_chmLib Primer.chm Primer
This should quickly extract all the HTML files and associated images from the .chm file and put them into the Primer directory.

Once you have extracted the HTML, you are ready to convert and combine the files into a single (.pdf) file. Open a terminal window (Applications -> Accessories -> Terminal) and issue the following command to launch htmldoc:
htmldoc
Once htmldoc finishes loading its interface, click on the Continuous radio button, then press "Add Files..." and add all the files you would like to combine into a single PDF document.

After choosing all the HTML files you would like to combine, click on the "Output" tab and choose PDF as the output file type, along with the name and location of the generated PDF file. If you want, you can change the compression level, whether you want the output in grayscale, etc.
Finally, press the "Generate" button to start the process of combining the (.html) files and their images into a single (.pdf) file.
The entire process of combining (.html) files into a (.pdf) file should not take more than a few minutes; in fact, on my Core 2 Duo based laptop, converting a book of about 1000 pages from HTML to PDF took 4 minutes.
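If you prefer to skip the GUI, htmldoc can also be driven from the command line. A sketch, assuming the extracted pages live in the Primer directory (check the htmldoc man page for the exact flags in your version):

# Combine the extracted pages into one PDF in continuous mode:
htmldoc --continuous -f Primer.pdf Primer/*.html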

Python Module

A user-defined module in Python can be put in the same folder as the main script file. However, on our server, absolute paths are needed when submitting jobs, otherwise the file won't be found, since jobs on the server are distributed to other machines for execution. In this case, where should the user-defined module be put?

One solution is creating a shell script that calls the main script using its absolute path, which is also the module's path; see the sketch below.
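A minimal sketch of such a wrapper (all paths are hypothetical):

#!/bin/sh
# Call the main script by its absolute path; Python puts the script's own
# directory at the front of sys.path, so modules sitting next to main.py
# are found no matter which machine the job lands on.
python /home/user/project/main.py "$@"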

Another solution is copying the user-defined module into Python's installation folder, which is under the system lib folder; that is to say, putting the user-defined module in the same place as the system modules.

Saturday, August 8, 2009

Statistics

For Today's Graduate, Just One Word: Statistics
New York Times (08/06/09) Lohr, Steve; Fuller, Andrea

The statistics field's popularity is growing among graduates as they realize that it involves more than number crunching and deals with pressing real-world challenges, and Google chief economist Hal Varian predicts that "the sexy job in the next 10 years will be statisticians." The explosion of digital data has played a key role in the elevation of statisticians' stature, as computing and the Web are creating new data domains to investigate in myriad disciplines. Traditionally, social sciences tracked people's behavior by interviewing or surveying them. “But the Web provides this amazing resource for observing how millions of people interact,” says Jon Kleinberg, a computer scientist and social networking researcher at Cornell, who won the 2008 ACM-Infosys Foundation award. In research just published, Kleinberg and two colleagues tracked 1.6 million news sites and blogs during the 2008 presidential campaign, using algorithms that scanned for phrases associated with news topics like “lipstick on a pig.” The Cornell researchers found that, generally, the traditional media leads and the blogs follow, typically by 2.5 hours, though a handful of blogs were quickest to mention quotes that later gained wide attention. IDC forecasts that the digital data surge will increase by a factor of five by 2012. Meeting this challenge is the job of the newest iteration of statisticians, who use powerful computers and complex mathematical models to mine meaningful patterns and insights out of massive data sets. "The key is to let computers do what they are good at, which is trawling these massive data sets for something that is mathematically odd," says IBM researcher Daniel Gruhl. "And that makes it easier for humans to do what they are good at--explain those anomalies." The American Statistical Association estimates that the number of people attending the statistics profession's annual conference has risen from about 5,400 in recent years to some 6,400 this week.


Friday, August 7, 2009

Quicknet AF - II

From the files:
QN_fltvec.h - qn_sigmoid_vf_vf(...)
QN_fltvec.cc - qn_fe_sigmoid_vf_vf(...)
QN_fltvec.cc - qn_fe_sigmoid_f_f(...)

we can see that the sigmoid function used in the hidden layer is different from the sigmoid function used in the output layer.
In the hidden layer, the sigmoid function is:
f(x)=1/(1+exp(-x))

In the output layer the sigmoid function is:
f(x)=tanh(s.a*b+c)
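For reference, the two non-linearities are related by a fixed rescaling (a standard identity, not something specific to the Quicknet source):

tanh(x) = 2*sigmoid(2x) - 1, where sigmoid(x) = 1/(1+exp(-x))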

Sunday, August 2, 2009

Quicknet Activation Function in Hidden Layer

In the file "QN_MLP_BunchFlVar.cc", the function QN_MLP_BunchFlVar::forward_bunch(size_t n_frames, const float* in, float* out) contains the following code:

// Check if we are doing things differently for the final layer.
if (cur_layer != n_layers - 1)
{
    // This is the intermediate layer non-linearity.
    qn_sigmoid_vf_vf(cur_layer_size, cur_layer_x, cur_layer_y);
}
else
{
    // This is the output layer non-linearity.
    switch(out_layer_type)
    {
    case QN_OUTPUT_SIGMOID:
    case QN_OUTPUT_SIGMOID_XENTROPY:
        qn_sigmoid_vf_vf(cur_layer_size, cur_layer_x, out);
        break;
    case QN_OUTPUT_SOFTMAX:
    {
        size_t i;
        float* layer_x_p = cur_layer_x;
        float* layer_y_p = out;

        for (i = 0; i < n_frames; i++)
        {
            qn_softmax_vf_vf(cur_layer_units, layer_x_p, layer_y_p);
            layer_x_p += cur_layer_units;
            layer_y_p += cur_layer_units;
        }
        break;
    }
    case QN_OUTPUT_LINEAR:
        qn_copy_vf_vf(cur_layer_size, cur_layer_x, out);
        break;
    case QN_OUTPUT_TANH:
        qn_tanh_vf_vf(cur_layer_size, cur_layer_x, out);
        break;
    default:
        assert(0);
    }
}

In the Quicknet tools, the activation function of the hidden layers is thus always the sigmoid; only the output-layer activation function can be set by the user (via out_layer_type).

Wednesday, July 29, 2009

Shell Script Parameters


from:http://www.injunea.demon.co.uk/pages/page204.htm

Parameters:

A parameter is a name, a digit, or any of the characters *, @, #, ?, -, $, and !. So, what's the difference between a name and a parameter exactly? Not much actually; it's all in the usage. If a word follows a command, as in: ls -l word, then word is one of the parameters (or arguments) passed to the ls command. But if the ls command was inside a sh script, then in all likelihood the word would also be a variable name. So a parameter can be a name when passing information into some other command or script. Viewed from inside a script, however, the command line arguments appear as a line of positional parameters named by digits in the ordered sequence of arrival (See - Script Example_1.1). So a parameter can also be a digit. The other characters listed are special characters which are assigned values at script start-up and may be used if required from within a script.

Well, after reading through the above, I am still not sure if this is any clearer. Let's see if an example can help to clarify things a little.

Script example_1.1 - The shell default parameters

#!/bin/sh -vx
#######################################################
# example_1.1 (c) R.H.Reepe 1996 March 28 Version 1.0 #
#######################################################
echo "Script name is [$0]"
echo "First Parameter is [$1]"
echo "Second Parameter is [$2]"
echo "This Process ID is [$$]"
echo "This Parameter Count is [$#]"
echo "All Parameters [$@]"
echo "The FLAGS are [$-]"

If you execute the script shown above with some arguments, you will get output on your screen like the run that follows.
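For instance, a hypothetical run (the -vx trace output is omitted here, and the process ID will differ on every run):

$ chmod +x example_1.1
$ ./example_1.1 one two
Script name is [./example_1.1]
First Parameter is [one]
Second Parameter is [two]
This Process ID is [12345]
This Parameter Count is [2]
All Parameters [one two]
The FLAGS are [vx]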



Tuesday, July 28, 2009

EER-ROC-DET




Logical Markov Models

Resources:

http://www.cis.hut.fi/praiko/papers/step02/node1.html




Deep belief networks

From: http://www.scholarpedia.org/article/Deep_belief_networks

Deep belief networks

From Scholarpedia

Geoffrey E. Hinton (2009), Scholarpedia, 4(5):5947, revision #63393.

Curator: Dr. Geoffrey E. Hinton, University of Toronto, CANADA

Deep belief nets are probabilistic generative models that are composed of multiple layers of stochastic, latent variables. The latent variables typically have binary values and are often called hidden units or feature detectors. The top two layers have undirected, symmetric connections between them and form an associative memory. The lower layers receive top-down, directed connections from the layer above. The states of the units in the lowest layer represent a data vector.

The two most significant properties of deep belief nets are:

  • There is an efficient, layer-by-layer procedure for learning the top-down, generative weights that determine how the variables in one layer depend on the variables in the layer above.
  • After learning, the values of the latent variables in every layer can be inferred by a single, bottom-up pass that starts with an observed data vector in the bottom layer and uses the generative weights in the reverse direction.

Deep belief nets are learned one layer at a time by treating the values of the latent variables in one layer, when they are being inferred from data, as the data for training the next layer. This efficient, greedy learning can be followed by, or combined with, other learning procedures that fine-tune all of the weights to improve the generative or discriminative performance of the whole network.

Discriminative fine-tuning can be performed by adding a final layer of variables that represent the desired outputs and backpropagating error derivatives. When networks with many hidden layers are applied to highly-structured input data, such as images, backpropagation works much better if the feature detectors in the hidden layers are initialized by learning a deep belief net that models the structure in the input data (Hinton & Salakhutdinov, 2006).


Deep Belief Nets as Compositions of Simple Learning Modules

A deep belief net can be viewed as a composition of simple learning modules each of which is a restricted type of Boltzmann machine that contains a layer of visible units that represent the data and a layer of hidden units that learn to represent features that capture higher-order correlations in the data. The two layers are connected by a matrix of symmetrically weighted connections, W, and there are no connections within a layer. Given a vector of activities v for the visible units, the hidden units are all conditionally independent so it is easy to sample a vector, h, from the factorial posterior distribution over hidden vectors, p(h|v,W). It is also easy to sample from p(v|h,W). By starting with an observed data vector on the visible units and alternating several times between sampling from p(h|v,W) and p(v| h,W), it is easy to get a learning signal. This signal is simply the difference between the pairwise correlations of the visible and hidden units at the beginning and end of the sampling (see Boltzmann machine for details).

The Theoretical Justification of the Learning Procedure

The key idea behind deep belief nets is that the weights, W, learned by a restricted Boltzmann machine define both p(v|h,W) and the prior distribution over hidden vectors, p(h|W), so the probability of generating a visible vector, v, can be written as:

p(v) = \sum_h p(h|W)p(v|h,W)

After learning W, we keep p(v|h,W) but we replace p(h|W) by a better model of the aggregated posterior distribution over hidden vectors – i.e. the non-factorial distribution produced by averaging the factorial posterior distributions produced by the individual data vectors. The better model is learned by treating the hidden activity vectors produced from the training data as the training data for the next learning module. Hinton, Osindero and Teh (2006) show that this replacement, if performed in the right way, improves a variational lower bound on the probability of the training data under the composite model.

Deep Belief Nets with Other Types of Variable

Deep belief nets typically use a logistic function of the weighted input received from above or below to determine the probability that a binary latent variable has a value of 1 during top-down generation or bottom-up inference, but other types of variable can be used (Welling et. al. 2005) and the variational bound still applies, provided the variables are all in the exponential family (i.e. the log probability is linear in the parameters).

Using Autoencoders as the Learning Module

A closely related approach, which is also called a deep belief net, uses the same type of greedy, layer-by-layer learning with a different kind of learning module -- an autoencoder that simply tries to reproduce each data vector from the feature activations that it causes (Bengio et al., 2007; LeCun et al., 2007). However, the variational bound no longer applies, and an autoencoder module is less good at ignoring random noise in its training data (Larochelle et al., 2007).

Applications of Deep Belief Nets

Deep belief nets have been used for generating and recognizing images (Hinton, Osindero & Teh 2006; Ranzato et al. 2007; Bengio et al. 2007), video sequences (Sutskever and Hinton, 2007), and motion-capture data (Taylor et al. 2007). If the number of units in the highest layer is small, deep belief nets perform non-linear dimensionality reduction and they can learn short binary codes that allow very fast retrieval of documents or images (Hinton & Salakhutdinov, 2006; Salakhutdinov and Hinton, 2007).

References

  • Bengio, Y., Lamblin, P., Popovici, P., Larochelle, H. (2007) Greedy Layer-Wise Training of Deep Networks, Advances in Neural Information Processing Systems 19, MIT Press, Cambridge, MA.
  • Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.
  • Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313:504-507.
  • Larochelle, H., Erhan, D., Courville, A., Bergstra, J., Bengio, Y. (2007) An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation. International Conference on Machine Learning.
  • LeCun, Y. and Bengio, Y. (2007) Scaling Learning Algorithms Towards AI. In Bottou et al. (Eds.) Large-Scale Kernel Machines, MIT Press.
  • M. Ranzato, F.J. Huang, Y. Boureau, Y. LeCun (2007) Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition. Proc. of Computer Vision and Pattern Recognition Conference (CVPR 2007), Minneapolis, Minnesota, 2007
  • Salakhutdinov, R. R. and Hinton,G. E. (2007) Semantic Hashing. In Proceedings of the SIGIR Workshop on Information Retrieval and Applications of Graphical Models, Amsterdam.
  • Sutskever, I. and Hinton, G. E. (2007) Learning multilevel distributed representations for high-dimensional sequences. AI and Statistics, 2007, Puerto Rico.
  • Taylor, G. W., Hinton, G. E. and Roweis, S. (2007) Modeling human motion using binary latent variables. Advances in Neural Information Processing Systems 19, MIT Press, Cambridge, MA
  • Welling, M., Rosen-Zvi, M., and Hinton, G. E. (2005). Exponential family harmoniums with an application to information retrieval. Advances in Neural Information Processing Systems 17, pages 1481-1488. MIT Press, Cambridge, MA.

Geoffrey E. Hinton (2009) Deep belief networks. Scholarpedia, 4(5):5947, (go to the first approved version)


Tuesday, July 21, 2009

Make an ISO Image

Make an ISO Image :: Scott Granneman

To make an ISO from your CD/DVD, place the media in your drive but do not mount it. If it automounts, unmount it.

dd if=/dev/dvd of=dvd.iso # for dvd
dd if=/dev/cdrom of=cd.iso # for cdrom
dd if=/dev/scd0 of=cd.iso # if cdrom is scsi

To make an ISO from files on your hard drive, create a directory which holds the files you want. Then use the mkisofs command.

mkisofs -o /tmp/cd.iso /tmp/directory/

This results in a file called cd.iso in folder /tmp which contains all the files and directories in /tmp/directory/.

For more info, see the man pages for mkisofs, losetup, and dd, or see the CD-Writing-HOWTO at http://www.tldp.org.
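To sanity-check an image, you can loop-mount it and browse its contents; a quick sketch (the mount point is arbitrary):

sudo mkdir -p /mnt/iso
sudo mount -o loop cd.iso /mnt/iso   # attaches the image via a loop device
ls /mnt/iso
sudo umount /mnt/iso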

If you want to create ISO images from a CD and you're using Windows, Cygwin has a dd command that will work. Since dd is not specific to CDs, it will also create disk images of floppies, hard drives, zip drives, etc.

For the Windows users, here are some other suggestions:

WinISO ~ http://www.winiso.com

VaporCD ~ http://vaporcd.sourceforge.net ~ "You can create ISOs from CD and mount them as 'virtual' CD drives. Works flawlessly with games and other CD based software. Unfortunately, it appears to be unmaintained now. Good thing it works so well." (P.B., 13 February 2002)


Thursday, July 16, 2009

What feature normalization should be used?

From: http://tech.groups.yahoo.com/group/icsi-speech-tools/message/144

It's generally recommended to always use some kind of feature mean and variance normalization with Quicknet.

The minimum amount of normalization is normalization using a single set of mean and variance estimates calculated over the entire training set. In this case, normalization simply amounts to translating the origin of the feature space (the mean normalization) and re-scaling the axes of the feature space (the variance normalization), since all data is normalized using the same mean and variance estimates. This is recommended as a minimum amount of normalization since very big or very small numbers can cause sigmoids to saturate, and since the optimal learning rate during MLP training may depend on the scale of the data (and thus normalization makes it less likely you'll need to re-tune the learning rate).

It's very common to go further and do normalization at the utterance
or speaker level. This can be useful for reducing the amount of
variability in the features due to speaker differences and channel
differences.
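As a toy illustration of the arithmetic (plain-text features, one value per line in a hypothetical train.txt; real Quicknet features live in pfiles, so this only shows the math):

# Estimate the mean and standard deviation of column 1 over the training set:
awk '{ s+=$1; q+=$1*$1; n++ } END { m=s/n; print m, sqrt(q/n-m*m) }' train.txt
# Apply the estimates (assumed here to be mean=0.5, sd=2.0) to every frame:
awk '{ print ($1 - 0.5) / 2.0 }' train.txt > train.norm.txt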

I2R to NUS Central Library
