Tuesday, August 18, 2009

Doing what the brain does - how computers learn to listen

Max Planck scientists develop model to improve computer language recognition


 


We see, hear and feel, and make sense of countless diverse, quickly changing stimuli in our environment seemingly without effort. However, doing what our brains do with ease is often an impossible task for computers. Researchers at the Leipzig Max Planck Institute for Human Cognitive and Brain Sciences and the Wellcome Trust Centre for Neuroimaging in London have now developed a mathematical model which could significantly improve the automatic recognition and processing of spoken language. In the future, algorithms of this kind, which imitate brain mechanisms, could help machines perceive the world around them. (PLoS Computational Biology, August 12, 2009)

Many people will have personal experience of how difficult it is for computers to deal with spoken language. For example, people who 'communicate' with the automated telephone systems now commonly used by many organisations need a great deal of patience: if you speak a little too quickly or slowly, if your pronunciation isn't clear, or if there is background noise, the system often fails to work properly. The reason is that the computer programs used until now rely on processes that are particularly sensitive to perturbation. When processing language, they primarily attempt to identify characteristic features in the frequencies of the voice in order to recognise words.

'It is likely that the brain uses a different process', says Stefan Kiebel from the Leipzig Max Planck Institute for Human Cognitive and Brain Sciences. The researcher presumes that the analysis of temporal sequences plays an important role here. 'Many perceptual stimuli in our environment could be described as temporal sequences.' Music and spoken language, for example, are composed of sequences of different lengths which are hierarchically ordered. According to the scientist's hypothesis, the brain classifies the various signals from the smallest, fast-changing components (e.g., single sound units like 'e' or 'u') up to large, slow-changing elements (e.g., the topic). The significance of the information at the various temporal levels for the processing of perceptual stimuli is probably much greater than previously thought. 'The brain permanently searches for temporal structure in the environment in order to deduce what will happen next', the scientist explains. In this way, the brain can often predict the next sound units from the slow-changing information. Thus, if the topic of conversation is the hot summer, 'su…' is more likely to be the beginning of the word 'sun' than of the word 'supper'.

To test this hypothesis, the researchers constructed a mathematical model designed to imitate, in a highly simplified manner, the neuronal processes which occur during the comprehension of speech. The neuronal processes were described by algorithms which processed speech at several temporal levels. The model succeeded in processing speech: it recognised individual speech sounds and syllables. In contrast to other artificial speech recognition systems, it was also able to process sped-up speech sequences. Furthermore, it had the brain's ability to 'predict' the next speech sound. If a prediction turned out to be wrong because the researchers had made an unfamiliar syllable out of the familiar sounds, the model was able to detect the error.

The 'language' with which the model was tested was simplified - it consisted of the four vowels a, e, i and o, which were combined to make 'syllables' consisting of four sounds. 'In the first instance we wanted to check whether our general assumption was right', Kiebel explains. With more time and effort, consonants, which are more difficult to differentiate from each other, could be included, and further hierarchical levels for words and sentences could be incorporated alongside individual sounds and syllables. Thus, the model could, in principle, be applied to natural language.

'The crucial point, from a neuroscientific perspective, is that the reactions of the model were similar to what would be observed in the human brain', Stefan Kiebel says. This indicates that the researchers’ model could represent the processes in the brain. At the same time, the model provides new approaches for practical applications in the field of artificial speech recognition.

Original work:

Stefan J. Kiebel, Katharina von Kriegstein, Jean Daunizeau, Karl J. Friston
Recognizing sequences of sequences
PLoS Computational Biology, August 12, 2009.




Max Planck Society
for the Advancement of Science
Press and Public Relations Department

Hofgartenstrasse 8
D-80539 Munich
Germany

PO Box 10 10 62
D-80084 Munich

Phone: +49-89-2108-1276
Fax: +49-89-2108-1207

E-mail: presse@gv.mpg.de
Internet: www.mpg.de/english/

Head of scientific communications:
Dr. Christina Beck (-1275)

Press Officer / Head of corporate communications:
Dr. Felicitas von Aretin (-1227)

Executive Editor:
Barbara Abrell (-1416)


ISSN 0170-4656

 



Contact:

Dr Christina Schröder
Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig
Tel.: +49 (0)341 9940-132
E-mail: cschroeder@cbs.mpg.de


Dr Stefan Kiebel
Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig
Tel.: +49 (0)341 9940-2435
E-mail: kiebel@cbs.mpg.de


Wednesday, August 12, 2009

Convert CHM files to HTML/PDF

From: http://linuxondesktop.blogspot.com/2008/07/convert-chm-files-to-htmlpdf.html

A few years back, reading a book meant going to the neighbourhood book shop, purchasing the book and then finding a cozy place to sit and read. However, with the advent of the Internet, with laptops getting smaller, lighter and cooler, and with the easy availability of ebooks online, the scenario has changed. These days you can go to an online book shop, purchase an ebook at any time of day, and start reading immediately, all while sitting in bed. However, most of these ebooks are in CHM format (Microsoft Compiled HTML Help), a native documentation format of the Windows operating system. CHM basically combines HTML and its associated images into a single (.chm) file.

By default, Ubuntu and many other Linux distributions do not include out-of-the-box support for opening (.chm) files, owing to CHM being a proprietary file format of the Windows operating system. There are viewers available on Linux which allow you to open these files, as I highlighted in a previous article. Still, if you want to convert (.chm) files to (.html) or (.pdf), perhaps to send them to a friend who does not have a CHM viewer installed, you can do so easily.

First, open a terminal (Applications -> Accessories -> Terminal) and issue the following command to install chmlib:
sudo apt-get install libchm-bin
chmlib allows extracting the HTML files and images from (.chm) files. If you then want to convert the extracted HTML files into PDF, PS, etc., you also need htmldoc, which you can install by issuing the following command in the terminal window:
sudo apt-get install htmldoc

Converting CHM files to HTML and eventually PDF


Now suppose you have a file named "Primer.chm" from which you want to extract the HTML files and images into a "Primer" directory. You can do so by issuing the following command in the terminal window:
extract_chmLib Primer.chm Primer
This should quickly extract all the HTML files and associated images from the CHM file and put them into the Primer directory.

Once you have extracted the HTML files, you are ready to convert them and combine them into a single (.pdf) file. Open the terminal window again (Applications -> Accessories -> Terminal) and issue the following command to launch htmldoc:
htmldoc
Once htmldoc finishes loading its interface, click on the Continuous radio button, press "Add Files..." and add all the files you would like to combine into a single PDF document.

After choosing all the HTML files you would like to combine, click on the "Output" tab and choose PDF as the output file type, along with the name and location of the PDF file to be generated. If you want, you can also change the compression level, choose grayscale output, etc. Finally, press the "Generate" button to start the process of combining the (.html) files and their images into a single (.pdf) file.
The entire process of combining the (.html) files into a (.pdf) file should not take more than a few minutes; in fact, on my Core 2 Duo based laptop, converting an about 1000-page book from HTML to PDF took 4 minutes.
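If you prefer to skip the GUI, htmldoc can also be driven entirely from the command line. A minimal sketch, assuming the extracted pages live in the Primer directory and that the shell's alphabetical globbing happens to match the reading order (otherwise list the files explicitly):
htmldoc --continuous -f Primer.pdf Primer/*.html
Here --continuous corresponds to the Continuous radio button in the GUI, and -f sets the name of the generated PDF file.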

Python Module

A user-defined module in Python can be put in the same folder as the main script file. However, when submitting jobs on our server, absolute paths are needed, because the jobs are distributed to other machines for execution; otherwise the file won't be found. In that case, where should the user-defined module be put?

One solution is to create a shell script that calls the main script by its absolute path, which is also the module's path; a sketch follows.
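A minimal wrapper sketch (the paths and file names are hypothetical; substitute your own):
#!/bin/sh
# run_job.sh - hypothetical wrapper submitted to the server instead of main.py.
# Python puts the directory of the script being run at the front of the module
# search path, so a user-defined module stored in /home/user/project is found
# no matter which machine or working directory the job ends up on.
python /home/user/project/main.py "$@"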

Another solution is to copy the user-defined module into Python's installation folder, under the system lib directory; that is to say, to put the user-defined module in the same place as the system modules. An example follows.
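For example, with a hypothetical module name and Python version (check where your distribution keeps its site-packages directory):
sudo cp mymodule.py /usr/lib/python2.6/site-packages/
Note that every machine the jobs are dispatched to needs the module in place, so this only helps if the copy is made on all execution hosts or the lib directory is shared between them.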

Saturday, August 8, 2009

Statistics

For Today's Graduate, Just One Word: Statistics
New York Times (08/06/09) Lohr, Steve; Fuller, Andrea

The statistics field's popularity is growing among graduates as they realize that it involves more than number crunching and deals with pressing real-world challenges, and Google chief economist Hal Varian predicts that "the sexy job in the next 10 years will be statisticians." The explosion of digital data has played a key role in the elevation of statisticians' stature, as computing and the Web are creating new data domains to investigate in myriad disciplines. Traditionally, social sciences tracked people's behavior by interviewing or surveying them. “But the Web provides this amazing resource for observing how millions of people interact,” says Jon Kleinberg, a computer scientist and social networking researcher at Cornell, who won the 2008 ACM-Infosys Foundation award. In research just published, Kleinberg and two colleagues tracked 1.6 million news sites and blogs during the 2008 presidential campaign, using algorithms that scanned for phrases associated with news topics like “lipstick on a pig.” The Cornell researchers found that, generally, the traditional media leads and the blogs follow, typically by 2.5 hours, though a handful of blogs were quickest to mention quotes that later gained wide attention. IDC forecasts that the digital data surge will increase by a factor of five by 2012. Meeting this challenge is the job of the newest iteration of statisticians, who use powerful computers and complex mathematical models to mine meaningful patterns and insights out of massive data sets. "The key is to let computers do what they are good at, which is trawling these massive data sets for something that is mathematically odd," says IBM researcher Daniel Gruhl. "And that makes it easier for humans to do what they are good at--explain those anomalies." The American Statistical Association estimates that the number of people attending the statistics profession's annual conference has risen from about 5,400 in recent years to some 6,400 this week.


Friday, August 7, 2009

Quicknet AF - II

From the files:
QN_fltvec.h - qn_sigmoid_vf_vf(...)
QN_fltvec.cc - qn_fe_sigmoid_vf_vf(...)
QN_fltvec.cc - qn_fe_sigmoid_f_f(...)

we can see that the sigmoid function used in the hidden layer is implemented differently from the sigmoid function used in the output layer.
In the hidden layer, the sigmoid function is the usual logistic form:
f(x) = 1/(1 + exp(-x))

In the output layer, the sigmoid is computed via a scaled and shifted tanh, with constants a, b and c set in the code:
f(x) = a*tanh(b*x) + c
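With a = b = c = 0.5 the two forms coincide: this is a standard way of computing the logistic sigmoid from tanh, and the following check is plain algebra rather than anything taken from the QuickNet source:

0.5*tanh(0.5*x) + 0.5
  = 0.5*(1 - exp(-x))/(1 + exp(-x)) + 0.5
  = ((1 - exp(-x)) + (1 + exp(-x))) / (2*(1 + exp(-x)))
  = 1/(1 + exp(-x))

So the tanh-based output-layer form can reproduce the hidden-layer sigmoid exactly.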

Sunday, August 2, 2009

Quicknet Activation Function in Hidden Layer

In the file "QN_MLP_BunchFlVar.cc", the function QN_MLP_BunchFlVar::forward_bunch(size_t n_frames, const float* in, float* out) contains the following code:

// Check if we are doing things differently for the final layer.
if (cur_layer != n_layers - 1)
{
    // This is the intermediate layer non-linearity.
    qn_sigmoid_vf_vf(cur_layer_size, cur_layer_x, cur_layer_y);
}
else
{
    // This is the output layer non-linearity.
    switch (out_layer_type)
    {
    case QN_OUTPUT_SIGMOID:
    case QN_OUTPUT_SIGMOID_XENTROPY:
        qn_sigmoid_vf_vf(cur_layer_size, cur_layer_x, out);
        break;
    case QN_OUTPUT_SOFTMAX:
    {
        size_t i;
        float* layer_x_p = cur_layer_x;
        float* layer_y_p = out;

        for (i = 0; i < n_frames; i++)
        {
            qn_softmax_vf_vf(cur_layer_units, layer_x_p, layer_y_p);
            layer_x_p += cur_layer_units;
            layer_y_p += cur_layer_units;
        }
        break;
    }
    case QN_OUTPUT_LINEAR:
        qn_copy_vf_vf(cur_layer_size, cur_layer_x, out);
        break;
    case QN_OUTPUT_TANH:
        qn_tanh_vf_vf(cur_layer_size, cur_layer_x, out);
        break;
    default:
        assert(0);
    }
}

As the code shows, in the QuickNet tools the hidden-layer activation functions of the MLP are all fixed to sigmoid by default; only the output-layer activation function (out_layer_type) can be set by the user.
