Showing posts with label Web. Show all posts

Tuesday, August 18, 2009

Doing what the brain does - how computers learn to listen

Max Planck scientists develop model to improve computer language recognition


 


We see, hear and feel, and make sense of countless diverse, quickly changing stimuli in our environment, seemingly without effort. However, doing what our brains do with ease is often an impossible task for computers. Researchers at the Leipzig Max Planck Institute for Human Cognitive and Brain Sciences and the Wellcome Trust Centre for Neuroimaging in London have now developed a mathematical model which could significantly improve the automatic recognition and processing of spoken language. In the future, algorithms of this kind, which imitate brain mechanisms, could help machines to perceive the world around them. (PLoS Computational Biology, August 12th, 2009)

Many people will have personal experience of how difficult it is for computers to deal with spoken language. For example, people who 'communicate' with automated telephone systems now commonly used by many organisations need a great deal of patience. If you speak just a little too quickly or slowly, if your pronunciation isn’t clear, or if there is background noise, the system often fails to work properly. The reason for this is that until now the computer programs that have been used rely on processes that are particularly sensitive to perturbations. When computers process language, they primarily attempt to recognise characteristic features in the frequencies of the voice in order to recognise words.

'It is likely that the brain uses a different process', says Stefan Kiebel from the Leipzig Max Planck Institute for Human Cognitive and Brain Sciences. The researcher presumes that the analysis of temporal sequences plays an important role in this. 'Many perceptual stimuli in our environment could be described as temporal sequences.' Music and spoken language, for example, are composed of sequences of different lengths which are hierarchically ordered. According to the scientist’s hypothesis, the brain classifies the various signals from the smallest, fast-changing components (e.g., single sound units like 'e' or 'u') up to big, slow-changing elements (e.g., the topic). The significance of the information at various temporal levels is probably much greater than previously thought for the processing of perceptual stimuli. 'The brain permanently searches for temporal structure in the environment in order to deduce what will happen next', the scientist explains. In this way, the brain can, for example, often predict the next sound units based on the slow-changing information. Thus, if the topic of conversation is the hot summer, 'su…' will more likely be the beginning of the word 'sun' than the word 'supper'.

To test this hypothesis, the researchers constructed a mathematical model which was designed to imitate, in a highly simplified manner, the neuronal processes which occur during the comprehension of speech. Neuronal processes were described by algorithms which processed speech at several temporal levels. The model succeeded in processing speech; it recognised individual speech sounds and syllables. In contrast to other artificial speech recognition devices, it was able to process sped-up speech sequences. Furthermore it had the brain’s ability to 'predict' the next speech sound. If a prediction turned out to be wrong because the researchers made an unfamiliar syllable out of the familiar sounds, the model was able to detect the error.

The 'language' with which the model was tested was simplified - it consisted of the four vowels a, e, i and o, which were combined to make 'syllables' consisting of four sounds. 'In the first instance we wanted to check whether our general assumption was right', Kiebel explains. With more time and effort, consonants, which are more difficult to differentiate from each other, could be included, and further hierarchical levels for words and sentences could be incorporated alongside individual sounds and syllables. Thus, the model could, in principle, be applied to natural language.

'The crucial point, from a neuroscientific perspective, is that the reactions of the model were similar to what would be observed in the human brain', Stefan Kiebel says. This indicates that the researchers’ model could represent the processes in the brain. At the same time, the model provides new approaches for practical applications in the field of artificial speech recognition.

Original work:

Stefan J. Kiebel, Katharina von Kriegstein, Jean Daunizeau, Karl J. Friston
Recognizing sequences of sequences
PLoS Computational Biology, August 12th, 2009.




Max Planck Society
for the Advancement of Science
Press and Public Relations Department

Hofgartenstrasse 8
D-80539 Munich
Germany

PO Box 10 10 62
D-80084 Munich

Phone: +49-89-2108-1276
Fax: +49-89-2108-1207

E-mail: presse@gv.mpg.de
Internet: www.mpg.de/english/

Head of scientific communications:
Dr. Christina Beck (-1275)

Press Officer / Head of corporate communications:
Dr. Felicitas von Aretin (-1227)

Executive Editor:
Barbara Abrell (-1416)


ISSN 0170-4656

 



Contact:

Dr Christina Schröder
Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig
Tel.: +49 (0)341 9940-132
E-mail: cschroeder@cbs.mpg.de


Dr Stefan Kiebel
Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig
Tel.: +49 (0)341 9940-2435
E-mail: kiebel@cbs.mpg.de


Saturday, August 8, 2009

Statistics

For Today's Graduate, Just One Word: Statistics
New York Times (08/06/09) Lohr, Steve; Fuller, Andrea

The statistics field's popularity is growing among graduates as they realize that it involves more than number crunching and deals with pressing real-world challenges, and Google chief economist Hal Varian predicts that "the sexy job in the next 10 years will be statisticians." The explosion of digital data has played a key role in the elevation of statisticians' stature, as computing and the Web are creating new data domains to investigate in myriad disciplines. Traditionally, social sciences tracked people's behavior by interviewing or surveying them. “But the Web provides this amazing resource for observing how millions of people interact,” says Jon Kleinberg, a computer scientist and social networking researcher at Cornell, who won the 2008 ACM-Infosys Foundation award. In research just published, Kleinberg and two colleagues tracked 1.6 million news sites and blogs during the 2008 presidential campaign, using algorithms that scanned for phrases associated with news topics like “lipstick on a pig.” The Cornell researchers found that, generally, the traditional media leads and the blogs follow, typically by 2.5 hours, though a handful of blogs were quickest to mention quotes that later gained wide attention. IDC forecasts that the digital data surge will increase by a factor of five by 2012. Meeting this challenge is the job of the newest iteration of statisticians, who use powerful computers and complex mathematical models to mine meaningful patterns and insights out of massive data sets. "The key is to let computers do what they are good at, which is trawling these massive data sets for something that is mathematically odd," says IBM researcher Daniel Gruhl. "And that makes it easier for humans to do what they are good at--explain those anomalies." The American Statistical Association estimates that the number of people attending the statistics profession's annual conference has risen from about 5,400 in recent years to some 6,400 this week.


Tuesday, June 2, 2009

The Next Frontier: Decoding the Internet's Raw Data




Washington Post (06/01/09) P. A10; Hart, Kim




The massive amounts of data available on the Internet
potentially have infinite uses. For example, advertisers want to mine
photos and status updates on social networks to better sell products,
while scientists are tracking weather patterns using decades of climate
records. Now, U.S. White House officials want to make government data
available to the public so citizens can monitor government actions. The
problem is determining how to organize and display such a massive
amount of data without having to sift through volumes of spreadsheets.
Participants at the recent symposium at the University of Maryland's
Human-Computer Interaction Lab focused on solving this problem. "We're
trying to understand data and make sense of it visually, but there's no
way of evaluating how effective these visuals really are for people,"
says PricewaterhouseCoopers research manager Mave Houston. Analysts
from the U.S. Department of Defense, SAIC, and Lockheed Martin
expressed their frustrations with available information visualization
tools, which are too complex for novice users, frequently do not work
well with user-generated content, and have difficulty handling large
amounts of data. The Human-Computer Interaction Lab is working on ways
of linking information, creating user-friendly technology devices, and
improving how people interact with the Web. "Our belief is that
technology is not just useful as toys or for business," says lab
founder Ben Shneiderman. "We're talking about using these technologies
for national priorities."


Friday, May 22, 2009

How to automatically redirect a browser to another web page from one of your own

A server-based redirect is the preferred method of redirecting to other web pages; additional information can be found at http://www.w3.org/QA/Tips/reback.

As the P-A Department's main web server uses the Apache HTTP server program, here is how to do it on that system (for other systems' servers, see the references in the www.w3.org web page noted above).

Create a file in the directory in question called ".htaccess" and put into it the line

Redirect /path-of-file-to-be-redirected URL-of-page-to-go-to

For example, if you are a professor teaching the PHY386 course (fictitious, for the sake of the example only) during Spring Semester 2007, but you want to keep your web pages in a subdirectory of your own user area instead of in the courses area provided, you can go to the appropriate courses area on the server, /web/documents/courses/2007spring/PHY386, and put

Redirect /courses/2007spring/PHY386/index.html http://www.pa.msu.edu/people/username/subdir/index.htm

(all on one line, in case the above example is wrapped by your browser) into a file called .htaccess which has world-read permissions (that's the default).

The "path" argument is relative to the "web root", so in the above example, "/web/documents" is left off. The "page to go to" URL is a full URL, even if the web page is on the same server. More than one Redirect command can be put into the .htaccess file, and you can redirect all files in a directory to their equivalents in a "to go to" directory by leaving the filenames off.

A case where more than one Redirect command may be necessary is when a web page may be accessed via more than one URL. In the above "PHY 386" example, in fact, the instructor will have to add a second line, the same as the first, except for lower-case "phy386" instead of "PHY386" in the "path" argument, because the web page may be accessed with the "phy386" URL, too. During Spring Semester 2007, the page could also be accessed with URLs with "current" in place of "2007spring" and with "2007spring" left out entirely, bringing the number of Redirect commands up to six for that one page. Fortunately, a URL which leaves off the "index.html" filename defaults to assuming it, or else three more Redirect commands would be needed to handle those cases. (The folks at w3.org still consider this as preferable to a single "refresh" meta command in the file itself, which would be read and acted upon regardless of how the file was accessed, as described below.)
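The six Redirect lines described above can be listed mechanically. The following JavaScript is only a sketch: the function name redirectLines is hypothetical, and the exact paths for the "current" variant and the variant with the semester left out are assumptions read off the description, not taken from the server's configuration.

```javascript
// Sketch: build one Redirect line per URL variant of the PHY386 page.
// The directory variants below are assumptions based on the text above.
function redirectLines(target) {
  var semesters = ["2007spring/", "current/", ""]; // "" = semester left out
  var courses = ["PHY386", "phy386"];              // both capitalizations
  var lines = [];
  semesters.forEach(function (semester) {
    courses.forEach(function (course) {
      lines.push("Redirect /courses/" + semester + course +
                 "/index.html " + target);
    });
  });
  return lines;
}

// Print the lines so they can be pasted into the .htaccess file.
console.log(
  redirectLines("http://www.pa.msu.edu/people/username/subdir/index.htm")
    .join("\n")
);
```

Each printed line has the same form as the single Redirect example earlier in this post, so the output can be checked against it before use.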

If there is already a .htaccess file in the subdirectory in question, see the Apache HTTP server documentation to see where in it the Redirect command should be placed. If you are the person running the Apache web server program on a system, you can also put instances of the Redirect command into the server configuration file instead of, or in addition to, .htaccess files in specific subdirectories (again, see the Apache HTTP server documentation for the details).


"refresh" meta command

Note that this method is deprecated by the official HTML standards organization in favor of the server-based redirect method described above.

You can set up a web page to inform any browser which happens to load it that there is another web page it should go to instead, after an optional delay.

This is accomplished using a "refresh" meta command in the header section

     <head>
.
.
</head>


of your HTML file, along with the title and any "keywords" or other meta commands.

Syntax

The syntax for the "refresh" meta command is

<meta http-equiv="refresh" content="N; URL=other-web-address">



where N is the approximate number of seconds that you want the current web page to be displayed before the browser automatically goes to the other web address. If N = 0, then the browser should go immediately to the other web address.



Netiquette tip

In case someone's browser doesn't handle these automatic redirects (most browsers do handle them, but some allow them to be turned off, as a way of discouraging "web spam", which often uses this type of "refresh" redirect), you may want to provide a second route to the intended destination by way of a standard link (see the example, below).

Example


<html>
<head>
<title>A web page that points a browser to a different page after 2 seconds</title>
<meta http-equiv="refresh" content="2; URL=http://www.pa.msu.edu/services/computing/">
<meta name="keywords" content="automatic redirection">
</head>
<body>
If your browser doesn't automatically go there within a few seconds,
you may want to go to
<a href="http://www.pa.msu.edu/services/computing/">the destination</a>
manually.
</body>
</html>








Notes on scripting languages

There are also ways of doing this with JavaScript, VBscript, and other internal web page scripting languages, but explaining them in detail is beyond the scope of this web page. A few examples may illustrate the method, however, and encourage users to obtain actual JavaScript documentation (a book, or online) to guide them in developing their own variants suited to their own needs.

The following JavaScript example, which would go ahead of the first <html> tag on the web page, or between the <HEAD> and </HEAD> tags, opens the new site in the same browser window (effectively instead of the rest of the contents of the page that the script is in):



 <script language="javascript" type="text/javascript">
<!--
window.location="http://www.pa.msu.edu/services/";
// -->
</script>


This JavaScript example opens the new site in the same browser window after displaying the current page in the window for 2 seconds (2000 ms):



 <script language="javascript" type="text/javascript">
<!--
window.setTimeout('window.location="http://www.pa.msu.edu/services/"; ',2000);
// -->
</script>

(Note that this does exactly what the HTML META tag above does, but as the META tag method does not depend on the browser's having JavaScript available and active, in most cases the META tag method would be preferable).

The next JavaScript example opens the new site in a new* browser window:



 <script language="javascript" type="text/javascript">
<!--
Newsite= window.open("http://www.pa.msu.edu/services/","newsite");
// -->
</script>

* sometimes, the "new" window is one of those already opened in the session; this seems to be somewhat random, and I don't know if it's a browser bug or a "JavaScript thing" with the window.open command. Just note that browser behavior may not always be consistent if you use this script (or the next one, which also uses window.open). -- GJP.

This JavaScript example opens the new site in a new browser window after a 4.5 second (4500 ms) delay:



 <script language="javascript" type="text/javascript">
<!--
window.setTimeout('window.open("http://www.pa.msu.edu/services/","newsite")',4500);
// -->
</script>



WARNING: With these capabilities for automatic redirection to other web pages, it is possible to set up a redirection loop -- try to avoid making it a no-wait-time infinite loop! (An infinite loop with a reasonable delay, on the other hand, might have its uses as a sort of slide show, among other possibilities).
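As a variation on the delayed same-window example above, the sketch below passes a function to setTimeout instead of a string and uses location.replace, which swaps the current history entry so that the Back button does not land on the redirecting page again. It also refuses the simplest kind of loop, a page that redirects to itself. The helper name redirectAfter and its window-like win parameter are hypothetical, chosen so the logic can be tried outside a browser.

```javascript
// Hypothetical helper: redirect after a delay without adding a history entry.
// `win` is a window-like object; in a browser, pass the real `window`.
function redirectAfter(win, url, delayMs) {
  // Guard against the simplest redirect loop: a page pointing at itself.
  if (win.location.href === url) {
    return false;
  }
  win.setTimeout(function () {
    // replace() overwrites the current history entry instead of adding one.
    win.location.replace(url);
  }, delayMs);
  return true;
}

// In a browser:
// redirectAfter(window, "http://www.pa.msu.edu/services/", 2000);
```

This guard only catches a direct self-redirect; a loop that passes through two or more pages still has to be avoided by hand.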

Thursday, March 5, 2009

Conversion between Utf-8 and GB2312

Some code for converting between UTF-8 and GB2312 when handling text in different encodings.

// ChineseCodeLib.h: interface for the CChineseCodeLib class.
//
//////////////////////////////////////////////////////////////////////
#include<string>
using namespace std;

/*
Function: convert Chinese text between GB2312 and UTF-8
Author: litz
Email: mycro@163.com
Reference: Mr. Wu Kangbin's article "Conversion between UTF-8 and GB2312"
http://www.vckbase.com/document/viewdoc/?id=1397
*/


#if !defined(__CCHINESECODELIB_H_)
#define __CCHINESECODELIB_H_


class CChineseCodeLib
{
public:
static void UTF_8ToGB2312(string& pOut,char *pText, int pLen);
static void GB2312ToUTF_8(string& pOut,char *pText, int pLen);
// Convert Unicode to UTF-8
static void UnicodeToUTF_8(char* pOut,wchar_t* pText);
// Convert GB2312 to Unicode
static void Gb2312ToUnicode(wchar_t* pOut,char *gbBuffer);
// Convert Unicode to GB2312
static void UnicodeToGB2312(char* pOut,unsigned short uData);
// Convert UTF-8 to Unicode
static void UTF_8ToUnicode(wchar_t* pOut,char* pText);

CChineseCodeLib();
virtual ~CChineseCodeLib();
};

#endif // !defined(__CCHINESECODELIB_H_)




///////////////////////////////////////////////////



// ChineseCodeLib.cpp: implementation of the CChineseCodeLib class.
//
//////////////////////////////////////////////////////////////////////

#include <stdio.h>
#include <string.h> // memset
#include <tchar.h>
#include <windows.h>
#include "ChineseCodeLib.h"

//////////////////////////////////////////////////////////////////////
// Construction/Destruction
//////////////////////////////////////////////////////////////////////

CChineseCodeLib::CChineseCodeLib()
{

}

CChineseCodeLib::~CChineseCodeLib()
{

}


void CChineseCodeLib::UTF_8ToUnicode(wchar_t* pOut,char *pText)
{
char* uchar = (char *)pOut;

uchar[1] = ((pText[0] & 0x0F) << 4) + ((pText[1] >> 2) & 0x0F);
uchar[0] = ((pText[1] & 0x03) << 6) + (pText[2] & 0x3F);

return;
}

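void CChineseCodeLib::UnicodeToGB2312(char* pOut,unsigned short uData)
{
// CP_ACP is the system's ANSI code page, so the result is GB2312 only
// when that code page is 936 (Simplified Chinese).
WideCharToMultiByte(CP_ACP,0,(LPCWSTR)&uData,1,pOut,sizeof(wchar_t),NULL,NULL);
return;
}

void CChineseCodeLib::Gb2312ToUnicode(wchar_t* pOut,char *gbBuffer)
{
// Converts one 2-byte character; the same CP_ACP assumption applies.
::MultiByteToWideChar(CP_ACP,MB_PRECOMPOSED,gbBuffer,2,pOut,1);
return;
}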

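void CChineseCodeLib::UnicodeToUTF_8(char* pOut,wchar_t* pText)
{
// Note the byte order of wchar_t: low byte first, high byte second
char* pchar = (char *)pText;

// 1110xxxx: top 4 bits of the high byte
pOut[0] = (0xE0 | ((pchar[1] & 0xF0) >> 4));
// 10xxxxxx: low 4 bits of the high byte, then top 2 bits of the low byte
pOut[1] = (0x80 | ((pchar[1] & 0x0F) << 2)) + ((pchar[0] & 0xC0) >> 6);
// 10xxxxxx: low 6 bits of the low byte
pOut[2] = (0x80 | (pchar[0] & 0x3F));

return;
}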

void CChineseCodeLib::GB2312ToUTF_8(string& pOut,char *pText, int pLen)
{
char buf[4];
// A 2-byte GB2312 character becomes 3 UTF-8 bytes, i.e. at most 1.5
// output bytes per input byte; 2 per byte plus 2 is a safe upper bound
// that also leaves room for the terminator.
int rstLen = pLen * 2 + 2;
char* rst = new char[rstLen];

memset(buf,0,4);
memset(rst,0,rstLen);

int i = 0;
int j = 0;
while(i < pLen)
{
// ASCII bytes can be copied directly
if( *(pText + i) >= 0)
{
rst[j++] = pText[i++];
}
else
{
wchar_t pbuffer;
Gb2312ToUnicode(&pbuffer,pText+i);

UnicodeToUTF_8(buf,&pbuffer);

rst[j] = buf[0];
rst[j+1] = buf[1];
rst[j+2] = buf[2];

j += 3;
i += 2;
}
}
rst[j] = '\0';

// Return the result
pOut = rst;
delete []rst;

return;
}

void CChineseCodeLib::UTF_8ToGB2312(string &pOut, char *pText, int pLen)
{
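// Every non-ASCII byte is assumed to start a 3-byte UTF-8 sequence, which
// shrinks to 2 GB2312 bytes. The extra 2 bytes cover the terminator and
// a truncated trailing sequence.
char * newBuf = new char[pLen + 2];
char Ctemp[4];
memset(Ctemp,0,4);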

int i =0;
int j = 0;

while(i < pLen)
{
if(pText[i] > 0)
{
newBuf[j++] = pText[i++];
}
else
{
wchar_t Wtemp;
UTF_8ToUnicode(&Wtemp,pText + i);

UnicodeToGB2312(Ctemp,Wtemp);

newBuf[j] = Ctemp[0];
newBuf[j + 1] = Ctemp[1];

i += 3;
j += 2;
}
}
newBuf[j] = '\0';

pOut = newBuf;
delete []newBuf;

return;
}


///////////////////////////////////////////////////



// A test main program.
// Input: utf.txt, in the same folder as the source files.
// Output: result.txt

// Decode1.cpp : Defines the entry point for the console application.


#include "ChineseCodeLib.h"

#include <cstring> // strlen
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

#define LEN 100000

char str[LEN];
string res;


int main(int argc, char* argv[])
{
ifstream fin("utf.txt");
ofstream fout("result.txt");

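// Testing the stream state directly avoids processing a stale buffer
// after the last line has been read.
while(fin.getline(str,LEN))
{
// Lines shorter than 10 bytes are skipped, as in the original test.
if(strlen(str)<10)
continue;
CChineseCodeLib::UTF_8ToGB2312(res,str,strlen(str));
//cout<<res<<endl;
// getline drops the line break, so write it back.
fout<<res<<'\n';
}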

fin.close();
fout.close();

return 0;
}




From: http://blog.csdn.net/Mycro/archive/2005/12/06/544637.aspx
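The bit shuffling in UnicodeToUTF_8 and UTF_8ToUnicode above follows the three-byte UTF-8 layout 1110xxxx 10xxxxxx 10xxxxxx. Here is a minimal sketch of the same packing in JavaScript, convenient for checking values in a browser console. The function names are hypothetical, and the sketch covers only code points U+0800 to U+FFFF, which is also all the C++ code handles for non-ASCII text.

```javascript
// Hypothetical helpers for the three-byte UTF-8 form only
// (code points U+0800..U+FFFF, which includes most Chinese characters).
function encodeUtf8ThreeByte(codePoint) {
  if (codePoint < 0x0800 || codePoint > 0xFFFF) {
    throw new RangeError("not a three-byte code point");
  }
  return [
    0xE0 | (codePoint >> 12),         // 1110xxxx: top 4 bits
    0x80 | ((codePoint >> 6) & 0x3F), // 10xxxxxx: middle 6 bits
    0x80 | (codePoint & 0x3F)         // 10xxxxxx: low 6 bits
  ];
}

function decodeUtf8ThreeByte(bytes) {
  // Strip the marker bits and reassemble the 16-bit value.
  return ((bytes[0] & 0x0F) << 12) |
         ((bytes[1] & 0x3F) << 6) |
         (bytes[2] & 0x3F);
}

// U+4E2D is the character 中.
var encoded = encodeUtf8ThreeByte(0x4E2D);
console.log(encoded.map(function (b) { return b.toString(16); }).join(" "));
// prints "e4 b8 ad"
```

Comparing these bytes with what UnicodeToUTF_8 writes into pOut for the same character is a quick way to check the C++ routine.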

Saturday, February 7, 2009

Useful websites

For blogging, I have finally come back to Blogger. Simple and easy.

www.blogger.com

 

To share pictures, Flickr is the one I like.

http://www.flickr.com/

 

For sharing documents, Scribd is the one I found recently.

http://www.scribd.com/

 

For CG news, I prefer CGW.

http://www.cgw.com/ME2/Default.asp

 

For music, I like this site: HIMUZIK.

http://himuzik.net/blog/

Sunday, February 1, 2009

Using Webzip to retrieve web pages

Environment settings for retrieving the web pages:

1. Copy the whole page list into the circled area:

clip_image002

2. Only select the htm file type:

clip_image004

3. For followed links, use the settings shown in the following figure and keep the other options at their defaults:

clip_image006

4. Add filters to the project:

clip_image008

In red circle 1, type the filter string http://paper.people.com.cn/rmrb/html/ and click the “Add” button. The added filter will then be listed in red circle 2.

5. Keep all other options at their defaults. Then click the “Run Now!” button and wait for your pages.
