Wednesday, November 30, 2011

[HTK] Chinese Encoding

Modifying the HTK source code (the HParse and HVite parts) to support Chinese
2010-03-24 12:05

static int IsSpace(char c)
{
   if ((c == 0x09) || (c == 0x0D) || (c == ' '))
      return 1;
   return 0;
}
static void PGetSym(void)
+++while ( !IsSpace(ch) || (ch=='/' && inlyne[curpos]=='*') ) //isspace((int) ch)
+++     if (!IsSpace(ch) || isspace((int) ch))  /* skip space */
else {            /* skip comment */
PGetCh(); PGetCh();
while (!(ch=='*' && inlyne[curpos]=='/')) PGetCh();
PGetCh(); PGetCh();        

static void PGetIdent(void)
int i=0;
Ident id;

do {
if (ch==ESCAPE) PGetCh();
if (i<MAXIDENT) id[i++]=ch;
+++   } while (!IsSpace(ch)&& ch!='{' && ch!='}' && ch!='[' && ch!=']' &&//!isspace( (int)ch) 
ch!='<' && ch!='>' && ch!='(' && ch!=')' && ch!='=' && 
ch!=';' && ch!='|' && ch!='/' && ch!='%');
ident = GetLabId(id,TRUE);

ReturnStatus WriteOneLattice(Lattice *lat,FILE *file,LatFormat format)
else if (ln->word!=NULL) {
fprintf(file,"W=%-19s ",ln->word->wordName->name);//
//   ReWriteString(ln->word->wordName->name,   /* commented out */
//                NULL,ESCAPE_CHAR));
$word = 好
| 浩
| 尼
| 你;
(  START_SIL ([sil] )(<$word>)( [sil]) END_SIL )
Lattice written before the change (the Chinese bytes appear as octal escapes):
N=11   L=22   
I=0    W=END_SIL             
I=1    W=sil                 
I=2    W=\304\343            
I=3    W=!NULL               
I=4    W=\304\341            
I=5    W=\272\306            
I=6    W=\272\303            
I=7    W=sil                 
I=8    W=START_SIL           
I=9    W=!NULL               
I=10   W=!NULL               
J=0     S=1    E=0    
J=1     S=3    E=0    
J=2     S=3    E=1    
J=3     S=3    E=2    
J=4     S=7    E=2    
J=5     S=8    E=2    
J=6     S=2    E=3    
J=7     S=4    E=3    
J=8     S=5    E=3    
J=9     S=6    E=3    
J=10    S=3    E=4    
J=11    S=7    E=4    
J=12    S=8    E=4    
J=13    S=3    E=5    
J=14    S=7    E=5    
J=15    S=8    E=5    
J=16    S=3    E=6    
J=17    S=7    E=6    
J=18    S=8    E=6    
J=19    S=8    E=7    
J=20    S=10   E=8    
J=21    S=0    E=9    
Lattice written after the change:
N=11   L=22   
I=0    W=END_SIL             
I=1    W=sil                 
I=2    W=你                  
I=3    W=!NULL               
I=4    W=尼                  
I=5    W=浩                  
I=6    W=好                  
I=7    W=sil                 
I=8    W=START_SIL           
I=9    W=!NULL               
I=10   W=!NULL               
J=0     S=1    E=0    
J=1     S=3    E=0    
J=2     S=3    E=1    
J=3     S=3    E=2    
J=4     S=7    E=2    
J=5     S=8    E=2    
J=6     S=2    E=3    
J=7     S=4    E=3    
J=8     S=5    E=3    
J=9     S=6    E=3    
J=10    S=3    E=4    
J=11    S=7    E=4    
J=12    S=8    E=4    
J=13    S=3    E=5    
J=14    S=7    E=5    
J=15    S=8    E=5    
J=16    S=3    E=6    
J=17    S=7    E=6    
J=18    S=8    E=6    
J=19    S=8    E=7    
J=20    S=10   E=8    
J=21    S=0    E=9   
As for the HVite part, it took me nearly a whole afternoon to find the place to change: modify the WriteString function in HShell.c
//   fputc(ESCAPE_CHAR,f);
//  fputc(((n/64)%8)+'0',f);fputc(((n/8)%8)+'0',f);fputc((n%8)+'0',f);

Posted via email from Troy's posterous

[HTK] Chinese encoding

HTK can directly read "gbk" encoded MLF files, dictionaries and so on. In fact, it can read files in any kind of encoding. HTK simply reads in every byte (char type), and when printing them out, each byte is written in the form "\abc", where abc is the octal representation of the byte value (= a*64 + b*8 + c). 
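To make the "\abc" form concrete, here is a minimal sketch of the round trip in Python. The function names htk_escape and htk_unescape are made up for this illustration, and the rule of escaping only bytes of 0x80 and above is an assumption on my part, not something taken from the HTK source. The digit arithmetic follows the a*64 + b*8 + c rule described above.

```python
import re


def htk_escape(text, encoding="gbk"):
    """Render text the way the escaped output looks: non-ASCII bytes as \\nnn octal.

    Assumption: only bytes >= 0x80 are escaped; HTK's exact rule may differ.
    """
    out = []
    for byte in text.encode(encoding):
        if byte < 0x80:
            out.append(chr(byte))
        else:
            # Three octal digits a, b, c with byte = a*64 + b*8 + c.
            out.append("\\%d%d%d" % ((byte // 64) % 8, (byte // 8) % 8, byte % 8))
    return "".join(out)


def htk_unescape(escaped, encoding="gbk"):
    """Turn \\nnn octal escapes back into bytes, then decode them."""
    raw = re.sub(
        rb"\\([0-7]{3})",
        lambda m: bytes([int(m.group(1), 8)]),
        escaped.encode("ascii"),
    )
    return raw.decode(encoding)
```

For example, htk_escape("你") gives \304\343, which is the form such a word takes in HTK's escaped output, and htk_unescape turns it back into the character.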

Thus, to convert the HTK generated files back to readable characters, we need the following steps:
1) convert the HTK octal representation of byte values to byte array
2) decode the byte array with corresponding encoding, (e.g. for Chinese, we could use "gbk")

The following code converts the HTK generated MLF to a readable "gbk" encoded MLF file:

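# Python 3 version
fin = open('vom_utt_wlab.mlf')
fout = open('vom_utt_wlab.gbk.mlf', encoding='gbk', mode='w')
while True:
    line = fin.readline()
    if line == '': break  # end of file
    sr = line.strip()
    if sr.endswith('.lab"'):
        print(sr, file=fout)  # label file name, copied as-is
        while True:
            line = fin.readline()
            sr = line.strip()
            if line == '' or sr == '.': break  # end of this label block
            if sr.startswith('\\'):
                lst = (sr.strip('\\')).split('\\') # get the list of octal representation of each byte
                bins = bytearray()
                for itm in lst:
                    val = 0
                    for ii in range(3): # each octal number will have exactly 3 numbers, i.e. of the form \nnn
                        val = val * 8 + int(itm[ii])
                    bins.append(val)
                print(bins.decode('gbk'), file=fout)
            else:
                print(sr, file=fout)  # plain ASCII label such as sil
        print('.', file=fout)
    else:
        print(sr, file=fout)  # header line such as #!MLF!#
fin.close()
fout.close()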


Thursday, November 24, 2011

[HTK] Increase HTK feature dimension limit

In an HTK feature file, there is a header that specifies the basic information about the parameters. 

HTK format files consist of a contiguous sequence of samples preceded by a header. Each sample is a vector of either 2-byte integers or 4-byte floats. 2-byte integers are used for compressed forms as described below and for vector quantised data as described later in section 5.11. HTK format data files can also be used to store speech waveforms as described in section 5.8 

The HTK file format header is 12 bytes long and contains the following data

nSamples                - number of samples in file (4-byte integer)

sampPeriod - sample period in 100ns units (4-byte integer)

sampSize - number of bytes per sample (2-byte integer)

parmKind - a code indicating the sample kind (2-byte integer)
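As a rough illustration of this 12-byte layout, the header can be packed and unpacked with Python's struct module. This is only a sketch: the function names are mine, and the big-endian byte order (">") is an assumption about how the file was written, so adjust it if your files use the machine's native order.

```python
import struct

# Assumption: big-endian byte order. Layout taken from the field list above:
# 4-byte int, 4-byte int, 2-byte int, 2-byte int = 12 bytes in total.
HEADER_FORMAT = ">iihh"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def parse_htk_header(data):
    """Unpack the first 12 bytes of an HTK format file into a dict."""
    n_samples, samp_period, samp_size, parm_kind = struct.unpack(
        HEADER_FORMAT, data[:HEADER_SIZE]
    )
    return {
        "nSamples": n_samples,
        "sampPeriod": samp_period,  # in 100ns units
        "sampSize": samp_size,      # bytes per sample
        "parmKind": parm_kind,
    }


def build_htk_header(n_samples, samp_period, samp_size, parm_kind):
    """Pack the four header fields into 12 bytes."""
    return struct.pack(HEADER_FORMAT, n_samples, samp_period, samp_size, parm_kind)
```

For uncompressed data, sampSize // 4 gives the feature dimension, since each value is a 4-byte float.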

From the above specification, sampSize is a 2-byte (short) integer, so its maximum value is 32767. For uncompressed data (4-byte floats), the maximum dimension of each sample is therefore 32767/4, i.e. 8191. However, in practice even a feature with a little over 1000 dimensions usually causes the HTK tools to produce the following error:

OpenParmChannel: cannot read HTK Header in File 

The reason is that in the function ReadHTKHeader of the file HWave.c, there is check for the sampSize value:

if (hdr.sampSize <= 0 || hdr.sampSize > 5000 || hdr.nSamples <= 0 ||
    hdr.sampPeriod <= 0 || hdr.sampPeriod > 1000000)
   return FALSE;

That is to say, in HTK the dimension of the feature vector is limited by this check rather than by the data type specified in the header format. In the standard version of HTK, features of at most 5000/4 = 1250 dimensions can be used. To raise the limit, change the number 5000, but remember that sampSize is a short integer, so any value larger than 32767 would be useless.

The check is at about line 1427 of the file HTKLib/HWave.c.
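The arithmetic behind these limits can be checked with a small Python sketch that mirrors the C condition quoted above. The function names and the max_samp_size parameter are made up for this illustration; only the constants 5000 and 1000000 come from the HTK check, and 32767 is the largest value a signed 2-byte integer can hold.

```python
SHORT_MAX = 32767  # largest value of a signed 2-byte integer


def header_passes_check(n_samples, samp_period, samp_size, max_samp_size=5000):
    """Mirror of the sanity check in ReadHTKHeader (True means accepted)."""
    return not (
        samp_size <= 0
        or samp_size > max_samp_size
        or n_samples <= 0
        or samp_period <= 0
        or samp_period > 1000000
    )


def max_float_dimension(max_samp_size=5000):
    """Largest uncompressed (4-byte float) dimension that fits the limit."""
    return min(max_samp_size, SHORT_MAX) // 4
```

With the default limit this gives 1250 dimensions. Raising the limit to the largest short value gives 8191, and going beyond that changes nothing.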


Tuesday, November 8, 2011

Solving iMessage or FaceTime waiting for activation problem

My problem went beyond simply "waiting for activation". After changing the SIM card in my iPhone, all outgoing iMessages were still sent under the old number.

However, the number shown in the iMessage settings cannot be edited. To change it, you need to turn off both iMessage and FaceTime and then reactivate them. The trouble starts on reactivation: the "waiting for activation" message stays for hours without any hint of what is going on.

This is the "waiting" problem that many people have encountered. To solve it, simply save your own contact information in Contacts, and then in the settings set My Info to the correct contact.


Hi all....I have an iPhone4 with IOS5 and was getting the waiting for activation message for imessages. I was also finding that people I was iMessaging were getting messages from my email address (aka apple ID) rather than my mobile number which was confusing them.  I found following adnanfarooqui's entry above and the iphoneism site very useful and it fixed my iPhone even though it was more directed at Facetime. 

All I did was simply go into my own details in CONTACTS and ensure my own mobile # was set up as 'iphone' in contacts.  Then I went into SETTINGS>MAIL,CONTACTS,CALENDAR>CONTACTS>MY INFO and selected myself.  What this does, I suspect, is tells the iphone who I am as the owner along with the mobile #. Then I went back into SETTINGS>iMESSAGES and found that the 'waiting for activation' message had changed to the normal message with the 'learn more' link included. Then I went down to RECEIVE AT, got into my Apple ID and signed out of that (which I suspect was what my imessages recipients were seeing), which then allowed me to select my own mobile # as the receive at number.  This therefore also fixed the issues I was having with recipients of my imessages getting my email address as the 'from' details as opposed to my mobile # and not matching their own CONTACTS.  Hope that was helpful......


Developing a Static Library and Incorporating It in Your Application in Xcode4

Developing a Static Library and Incorporating It in Your Application


When you need to develop a static library to use in an application and you have to have separate projects for each product, you can use a workspace to contain both the static library project and the application project. If you do, ensure that you configure the projects in the workspace as described here:

  1. In the target that builds the static library, ensure that:

    • The exported headers are in the Project group in the Copy Headers build phase.

    • The Skip Install build setting is set to Yes.

  2. In the target that builds the application, ensure that:

    • The User Header Search Paths build setting is set to the recursive absolute path of a directory under which the static library’s header files are stored.

      Important: If you move your static library project directory to a different location in your file system, you must update the value of the User Header Search Paths build setting to reflect the new location of the static library’s header files.

    • The Always Search User Paths build setting is set to Yes.

    • The Skip Install build setting is set to No.

  3. In the scheme that builds the application, ensure the scheme also builds the static library for archiving.


Content specifications: This content is written for Xcode 4.0.2 and iOS SDK 4.3.


Monday, November 7, 2011

Tutorial: Code Sharing Via Static Libraries And Cross-Project References


Guest author Clint Harris (Profile) is an independent software consultant with experience ranging from enterprise web app work to custom iPhone app development. He currently lives in Brooklyn, New York.

Finding an elegant way to reuse and share code (i.e., libraries) across separate iPhone applications can be a bit tricky, especially considering Apple’s restrictions on dynamic library linking and custom frameworks.

Most people agree that the best way to re-use code is to use static libraries. This tutorial builds on that solution, showing how your Xcode project can reference a second Xcode project — one which is used to build a static library.

This approach allows you to automatically build that static library with the rest of your app, using your current build configuration (e.g., debug, release, etc.) and avoid pre-building several versions of the library separately (where each version was built for a specific environment/configuration).

Wanted: An Elegant Way To Share Code Across Projects

If you want to reuse/share code across different iPhone applications, you only have two options that I’m aware of:

  1. Copy all of the source code from the “shared” library into your own project
  2. Keep the shared library code in a separate Xcode project and use it to build static libraries (e.g., libSomeLibrary.a, also referred to as “archive files”) that can be referenced by your project and used via static linking.

The first option, copying the files, should be avoided when possible since it’s inherently redundant and contrary to the goal of keeping “common code” modular and atomic.

It’s a much better idea to put the code in a static library since, as mentioned in the introduction, dynamic linking to custom libraries/frameworks isn’t allowed by Apple when it comes to iPhone apps.

For instructions on creating a static library from your code see this tutorial on the Stormy Productions blog.

We’ve established that the second option is preferable, but there’s a catch: you’ll need to build and distribute multiple versions of the static library–one for each runtime environment and build configuration. For example, you would need to build both “release” and “debug” versions of the library for the Simulator, as well as other pairs for the iPhone or iPod device itself.

So, how can we avoid manually pre-building and managing separate .a files?

Solution: Static Libraries Built On-Demand Via Xcode Cross-Project References

The trick to avoid pre-building static libraries for each environment is to use an Xcode “cross-project reference” so that those libraries are built dynamically (i.e., when you build your own app) using your app’s current build configuration.

This approach allows you to both reuse shared source code and avoid the headache of managing multiple versions of the library. Here’s how it works at a high level:

  1. The shared code lives in its own Xcode project that, when built, results in one or more static libraries.
  2. You create an Xcode environment variable with a path to the directory that contains the static library’s *.xcodeproj file.
  3. All iPhone apps that need the static library will use the aforementioned environment variable to reference the library’s Xcode project, including any static library in that project and the related header files.
  4. Each time you build your project for a specific configuration/runtime environment, the shared project library will also be built for that config/environment–if it hasn’t already–and linked with your executable.

In addition to solving the main problem (reusing code and avoiding management of multiple library versions), there are a couple of nice benefits to this strategy:

First, if you make changes to the shared library source code, those changes will immediately be included the next time you build your own project (via the cross-project reference).

Second, you can modify the Xcode environment variable to point to different versions of any project. For example, you might have separate directories for “somelibrary-1.0” and “somelibrary-2.0”; as you’ll see in the detailed solution instructions, it’s easy to modify the environment variable and switch your project to a different version of “somelibrary.”

Implementing Cross-Project References

The instructions for setting up cross-project references to shared static libraries can be split into two parts:

  • Part 1: Global Xcode Settings
  • Part 2: Project-Specific Settings

I’ll be using an example in the instructions to help illustrate things. A suitable example would be an application that needs to use a shared static library from a separate project. In this case, I’ll use a sample iPhone app called “


Friday, November 4, 2011

HDecode in lattice rescoring mode

When training a neural network (NN), the training process is usually controlled by the frame accuracy (or frame error rate). However, that is not directly related to speech recognition performance, i.e. PER or WER. 

One way to track recognition performance is to run decoding each time the network weights are updated. For phoneme recognition this is fine, as the decoding doesn't take much time. For word recognition, decoding is quite time-consuming. One possible way to speed it up is to rescore lattices instead of running a full decoding. 

If HDecode is invoked with "-w" but without a language model, it runs in lattice rescoring mode (of course, you also need to set the input lattice parameters). But where do the lattices come from? My setup is as follows:

1) Use HDecode and the HMM system to generate lattices with a bigram LM (a higher-order LM can also be used);
2) Use HLRescore to prune the lattices to word networks (with -m f/b, saving the new lattices with -w);
3) Use HDecode to rescore the lattices with the new acoustic model or the new posterior features (in the NN case).


Wednesday, October 12, 2011

[HTK] CreateHMM: multiple use of logical HMM name sp

When expanding the monophone system to a triphone system, I encountered this problem. 

One suggestion I found was to check the mktri.hed file, and there I found that sp was being expanded into triphones. 

Thus the problem is in the process of generating the mktri.hed file, which is done using:

perl maketrihed [mono_list] triphn_seen.lst

The correct way of doing so is:

perl maketrihed [mono_no_sp_no_sil_list] triphn_seen.lst
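If you only have the full monophone list, the reduced list can be produced with a few lines of Python. This is just a sketch: the function name is made up, and it assumes the list holds one model name per line.

```python
def drop_boundary_models(lines, excluded=("sp", "sil")):
    """Return the model names from `lines`, skipping blanks and the excluded ones."""
    names = (line.strip() for line in lines)
    return [name for name in names if name and name not in excluded]
```

Writing the returned names one per line to a new file gives the list to pass to maketrihed in place of [mono_no_sp_no_sil_list].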


Thursday, September 29, 2011

PCA whitening

I have encountered the phrase "PCA whitening" many times and always assumed it was just plain PCA. However, it is not!

The Whitening is actually PCA + scaling!!!
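To spell out what "PCA + scaling" means, the transform can be written as below. This is the usual textbook formulation, not something copied from the attached notes. The notation is chosen here for illustration: m zero-mean samples x^{(i)}, covariance Σ with eigenvectors U and eigenvalues λ_j, and a small constant ε that is commonly added to avoid dividing by values close to zero.

```latex
\begin{aligned}
\Sigma &= \frac{1}{m}\sum_{i=1}^{m} x^{(i)} x^{(i)\top}
  && \text{(covariance of the zero-mean data)} \\
\Sigma &= U \Lambda U^{\top}, \quad
  \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)
  && \text{(eigendecomposition)} \\
x_{\mathrm{rot}} &= U^{\top} x
  && \text{(plain PCA: rotate onto the principal axes)} \\
x_{\mathrm{white},j} &= \frac{x_{\mathrm{rot},j}}{\sqrt{\lambda_j + \epsilon}}
  && \text{(whitening: rescale each component)}
\end{aligned}
```

With ε = 0 the covariance of the whitened data is Λ^{-1/2} U^T Σ U Λ^{-1/2} = I, i.e. every component has unit variance and the components are uncorrelated.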


There is also a good explanation of the implementation in the attached file:

Implementing PCA_Whitening - Ufldl.pdf


PCA again

PCA is a simple but rather popular technique for feature processing and is widely used as an initial processing step in many applications. However, every time I need to do PCA, I have to look up some references :<. 

So, once again, here is some information for my own reference. The figure is from the attached slides I came across today; it is quite clear.


3 - Feature extraction.pdf


Saturday, September 3, 2011

Unity3D Resources

The original post is from:
This is just for my own reference!

As promised, this blog entry is going to give an overview of existing Unity resources. I have tried to cover a wide variety of video and text-based tutorials, extensions for Unity and links to most example projects and documentation on the web. I have tried to categorize the resources and also present them in an order that should get you started in no time. Furthermore, I have spent a substantial amount of hours either absorbing these tutorials/docs completely or at least scanning through them (I haven't bought any paid tutorials except Will Goldstone's book). Thus, I believe all of these links should lead you to some form of valuable information about Unity. Thanks to all the authors for doing such an incredible job. Without further ado, my take on summarizing resources for Unity.

Unity website:

Which license do you need?

Download the free version of Unity to get started:

If you have no previous experience with Unity, start with these six video tutorials which give a quick overview of the Unity interface and some important features 

Continue with a more in-depth text-based walk through of very basic Unity functionality and work flow

To get you started with scripting, have a look at the following PDF document. It was written for an older version of Unity, but still covers relevant aspects of scripting with JavaScript. (2 hours to complete, no previous JavaScript knowledge required) 

Unity features three scripting languages: JavaScript, C#, and Boo. Depending on your scripting language of choice, you might want to choose different tutorials to get started with Unity. Each tutorial link will also mention which scripting language is used. Most Unity tutorials available on the web use JavaScript. Notice that JavaScript is commonly used for web applications. Unfortunately, books which cover JavaScript are normally of little use for Unity's implementation of JavaScript. But worry not, a vast amount of tutorials will follow...

If you want to familiarize yourself with Unity's functionality more, browse through Unity's manual. You can skip the “Basics” section as we already went through this (see above). 

For a printable version of the 600+ pages manual, see

If you would rather jump into the action, skip the manual altogether and have a look at the three books that have been published on game development with Unity (one recently added on 06/10/11). 
1) Author Will Goldstone guides you through a complete project and introduces most of Unity's functionality (using JavaScript). The book is available as print and digital version and all needed assets and resources can be downloaded with the book. It's been a helpful investment from my point of view. 
This forum thread discusses the book and its content.

2) Ryan Henson Creighton more recently published "Unity 3D Game Development by Example Beginner's Guide". Find out more about the book here:
and the forum thread -

3) Craig Stevenson and Simon Quiq (publisher Deep Pixel) released "Unity 3 Blueprints: A practical guide to Indie games development". Their website provides all the art assets and code to create four classic games from scratch (Match the Pairs, Top-Down Shooter, Tower Defense, Marble Madness).
Amazon link

To get up-to-date on Unity 3 functionality, check out the following official documentation:

Unity 3 - What's new?

- Beast lightmapping Basics -
- Beast Lightmapping In-Depth -
- Tree Creator -
- Umbra Occlusion Culling -

You are now faced with the choice of tackling the example projects provided by Unity Technologies or jumping straight into user-generated tutorials. As the example projects are quite complex, I would suggest working through some video tutorials first. Nonetheless, here are the links for Unity's official example projects. They can be taken completely apart, reverse-engineered and reused for your own projects.

3D Platform Game
2D Platform Game
iPhone tutorials and more example projects

More Unity Example projects 

Brand New Car Tutorial by Unity Technologies

Now, let's head on to TUTORIALS. This list starts with mostly basic and general tutorials at the top and lists more specific tutorials at the end.

Will Goldstone, author of the Unity Game Development book, worked on a series of helpful video tutorials (using JavaScript).
More recently by Will Goldstone:

TornadoTwins Video Tutorials 
They show step by step how to create a simple game using Javascript. 

Walker Boys Studio - Unity Tutorials from the Guildhall at SMU (an extensive list of tutorials with more on the way)

CannedMushroom Video Tutorials (Unity and other software)
This is a series of projects intended for 2-hour self-instructed lessons using JavaScript 

Unity Jumpstart 
Proof of Concept Game to learn creating your own game from the ground up (JavaScript) 
Series of Unity tutorials in C# (among many other tutorials)
registration required (free)
website -
forum thread -!

BurgZergArcade - Unity Tutorials
Hack&Slash Tutorial using C# and plenty of other information and useful tutorials
website -

UnityScript Basics (Scripting Basics for Noobs)
If you're new to scripting, read up on this good introduction to scripting in Unity. Lots of analogies are provided which makes it really easy to understand. The details are explained for C#, but the introduction is great for any scripting language.
website -

Text-based tutorials with screenshots covering a wide range of topics incl. scripting (using JavaScript), basic introduction to unity, character controllers, and user interfaces 

InfiniteAmmo Tutorial 
General Introduction to Unity (3 parts so far)
Some scripting covered using Javascript (e.g. movement controls) 

Introduction to Game Development with Unity including Workflow, Scripting, GUI, Version Control, etc. 

Advanced Media Lab at North Carolina State University 

GearTech Games 
4 Videos on how to work through a project (and more videos)

workflow between Modo and Unity

Making Sense of Unity 
These video tutorials provide more in-depth coverage of Unity scripting using JavaScript (not meant to be introductory material)

lecture style explanations of concepts
more entertaining and not like most other screen-capture tutorials :)

IO Development Diary
This series of video tutorials (some paid, some free) follows the development of a Space Shooter and covers a Modo-Unity workflow and C# scripting.
forum thread -
website -

Virtual Autonomy 
Guide for working with Google SketchUp and Unity
(also shaders and multi-user environments)
text-based with screenshots

Robotduck - Blog 
The blog of this Unity user provides tips on Unity scripting and functionality and showcases some of his projects 

Ethical Games 
Unity Tutorials for Flash Developers 

Car Tutorial 
Physics Setup for a car, AI for driving around racetrack 

Terrain Tutorial 
seven videos on how to create terrains in Unity from heightmaps to finish 

In-Depth Terrain Tutorial 
text-based with screenshots 
forum thread 

Paul Bourke -Unity for stereoscopic display
text-based with screenshots 

Paul Bourke – Unity and Idome
text-based with screenshots 

RENCI – Unity for Dome projections 

Official Unity Tech. Tutorial for the Animation View (NEW) 3-part-series of video tutorials


VTC - Unity tutorial using JavaScript (subscription based content on
Unity-Tutorials (mostly paid and some free tutorials) (subscription based tutorials, Unity-related content among other software like Maya, 3DS Max, Photoshop)  


Once you have gone through some or all of these tutorials, you should be creating your own content in no time. Next, I'll provide an overview of general resources for Unity. Whenever you need to find some information about Unity, scripting, projects, collaborations or anything Unity-related, check these links out:

Searching for resources - The All-In-One Unity Reference Search 
credit goes to Robotduck for providing this link to the public; tremendous time saver

Unity Scripting Reference 

Unity Component Reference 
each available Component described in detail 

Unity Wiki (UnifyCommunity) 

Unity Answers 
Invaluable when you have specific questions about Unity or Scripting 
This link should also get you started on learning Unity 

Unity Forum 

Unity Feedback 
feature requests go here 

Unity IRC 
Point your favorite IRC client to and join #unity3d to chat in real time. 

Overview of Unity Resources 

Overview of Unity blogs 


Lastly, I want to list extensions and tools which can make your life as a Unity developer easier.

Unity Extensions

Terrain Toolkit
External Lightmapping Tool
Locomotion System
Explosion Framework
Head Look Controller
overview of some extensions – Unity youtube and vimeo channels 

Visual C# Express - free IDE for your C# development

3DAttack - Tools and Home of Unity Creative Magazine
First Person Shooter Developer Kit
forum thread -
website -

One of Unity's developers provides projects to extend Unity's functionality (e.g. pathfinding and AI) 
forum thread Path -!

sturestone's A* Pathfinding (currently version 2.9)
forum thread -*-Pathfinding-2.9-Is-Released-(Unity-3-Compatible)

Augmented Reality / Webcam Input 
forum threads: (Webcam Toolkit) (ARToolkit Extension) (UnityAR)

SeeingMachines FaceAPI / VisionBlaster – Head Tracking in Unity (purchase required) 

Mostly Tigerproof – Using Google Analytics and Unity to track game stats
This is a blog entry about Google Analytics and Unity 

Antares Project - Extensive Set of Tools to extend the Unity Editor - Open Source
Also available: Antares.dll (free for non-commercial work)
forum thread -

Antares Deformator - Deform your meshes (Beta Version)
forum thread -

UniWii – WiiMote implementation 
forum thread 

Unity Terrain Tools - EasyRoads (purchase required)
forum thread

Six Times Nothing - Road/Path Tool and River Tool

Dastardly Banana - FPS Weapon Tool, Radar example

Starscene Software - Tools, Games and Utilities for Unity (purchase required for utilities)
e.g. Vectrosity - Line Drawing Tool
Fractscape - Terrain Tool
Stitchscape - Stitch multiple terrains together

GUIX - visual Menu/GUI builder (purchase required)

EZ Game Saver - saving tool (purchase required)
Note that I will cover saving to text file in a later blog

Fire Tool - Realistically spreading fire (on hold for the moment)

Decal Framework - Easily place decals in your scene
forum thread -

Visual Logic Editor by NeoDrop (Antares VIZIO, Work in Progress)
forum thread -

Nimbus Volumetric Clouds
forum thread -

RapidUnity Vehicle Editor Resource Pack
forum thread -

Ocean 3D - Ocean Simulation

ShaderFusion - Node-Based Shader Editor (Requires Unity3)
forum thread -

Strumpy Shader Editor - Node-Based Shader Editor
forum thread -

Overview of extensions on UnifyWiki 

LightUp (purchase required)
Extension of Google Sketchup (Lighting Solution) which works nicely for exporting lightmaps to Unity 

Community Project - GTAIV Vehicle Replica (Pledge of > USD 50 required)
forum thread -

Stereoscopic Solutions
3D Anaglyph System (purchase required)

Plugin for Kinect's Primesense Camera
forum thread -

Unity Web Suite - tutorials and examples in C# to create online content
forum thread -
website -

Tools for Visual Programming:
Antares Universe - Vizio (forum thread)
cost (as of 05/22/11) Euro 142.50 
Visual programming tool similar to the approaches of Quest3D and Virtools.

Playmaker by Hutong Games
cost (as of 05/22/11) Euro 95.00
Visual State Machine Editor (website and forum thread)

uScript by Detox Studios (Beta Version)
website and forum thread
Visual Scripting Tool based on UDK's Kismet


Thursday, September 1, 2011

Resolving nvcc & gcc conflict for Theano

There is a compatibility issue affecting some Ubuntu 9.10 users, and probably anyone using CUDA 2.3 with gcc-4.4. Symptom: errors about “__sync_fetch_and_add” being undefined.

Solution 1: make gcc-4.3 the default gcc.

Solution 2: make another gcc (e.g. gcc-4.3) the default just for nvcc. Do this by making a directory (e.g. $HOME/.theano/nvcc-bindir) and installing two symlinks in it: one called gcc pointing to gcc-4.3 (or lower) and one called g++ pointing to g++-4.3 (or lower). Then add compiler_bindir = /path/to/nvcc-bindir to the [nvcc] section of your .theanorc.

The .theanorc file should be in the $HOME folder. If there is no such file, create a new one.
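The symlink approach can be scripted. The sketch below is only an illustration: the function name is made up, and the compiler paths passed to it (for example /usr/bin/gcc-4.3) are assumptions that depend on where the older compilers are installed on your system. It creates the directory with the two symlinks and returns the text to put into .theanorc.

```python
import os


def make_nvcc_bindir(bindir, gcc_path, gxx_path):
    """Create `bindir` with gcc and g++ symlinks; return the .theanorc section."""
    os.makedirs(bindir, exist_ok=True)
    for name, target in (("gcc", gcc_path), ("g++", gxx_path)):
        link = os.path.join(bindir, name)
        if os.path.lexists(link):
            os.remove(link)  # replace a stale link from an earlier run
        os.symlink(target, link)
    return "[nvcc]\ncompiler_bindir = %s\n" % bindir
```

A call such as make_nvcc_bindir(os.path.expanduser("~/.theano/nvcc-bindir"), "/usr/bin/gcc-4.3", "/usr/bin/g++-4.3") returns the two lines to add to $HOME/.theanorc. The function does not edit .theanorc itself, so an existing configuration is never overwritten.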


Wednesday, August 24, 2011

Papers on learning sparse Gaussian Precision Matrix

The basic idea of learning a sparse full covariance for the Gaussian using graphical models was proposed in the paper "Covariance Selection".

In the paper "Sparse Gaussian Graphical Models for Speech Recognition", the authors applied sparse full covariance learning to speech recognition.

In the third paper, "Projected Subgradient Methods for Learning Sparse Gaussians", a new approach for estimating the covariance is proposed.

covariance selection.pdf

Sparse Gaussian Graphical Models for speech recognition_is2007.pdf

projected subgradient methods for learning sparse gaussians.pdf


LangBrain Website

"If the human mind were simple enough for us to understand,
we would be too simple-minded to understand it."


The ability of humans to speak and to understand speech requires an enormous amount of brain resources. These resources have to manage information about many thousands of words and many syntactic constructions and their interconnections, not just to one another but to meanings and to the structures that allow us to recognize the sounds of speech and to move the muscles of our mouths to produce speech. This complex combination of brain structures can be called the brain's linguistic system. It allows a person not only to talk and to understand speech but also to read and write. It also gives us the power to think as well as the power to acquire new knowledge and abilities and to learn how to speak in the first place. The Langbrain website is about this system.


Tuesday, August 23, 2011

Graphical Gaussian Models for Genome Data



For software to efficiently identify GGM networks from data, visit the GeneNet page.

A simple method for inferring the network of (linear) dependencies among a set of variables is to compute all pairwise correlations and subsequently to draw the corresponding graph (for some specified threshold). While popular and often used on many types of genomic data (e.g. gene expression, metabolite concentrations etc.), the naive correlation approach does not allow one to infer the dependency network. Instead, graphical Gaussian models (GGMs) should be used. These allow one to correctly identify direct influences, have close connections with causal graphical models, are straightforward to interpret, and yet are essentially as easy to compute as naive correlation models. This page lists pointers to learning GGMs from data, including procedures suitable for "small n, large p" data sets (category iii).



Graphical Gaussian Models (GGMs), also known as "covariance selection" or "concentration graph" models, have recently become a popular tool to study gene association networks. The key idea behind GGMs is to use partial correlations as a measure of conditional independence of any two genes. This makes it straightforward to distinguish direct from indirect interactions. Note that partial correlations are obtained from the inverse of the correlation matrix: its off-diagonal entries are standardized by the diagonal and their sign is reversed. Also note that in GGMs missing edges indicate conditional independence.
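As a minimal sketch of the relation between partial correlations and the inverse correlation matrix, the snippet below uses NumPy. The function name `partial_correlations` and the three-gene correlation matrix are illustrative assumptions, not taken from GeneNet or from any of the papers listed on this page.

```python
import numpy as np

def partial_correlations(corr):
    """Partial correlation matrix from a nonsingular correlation matrix.

    Each off-diagonal entry is -omega_ij / sqrt(omega_ii * omega_jj),
    where omega is the inverse of the correlation matrix.
    """
    omega = np.linalg.inv(corr)
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Hypothetical chain A - B - C: A and C are correlated (0.64 = 0.8 * 0.8),
# but only through B.
corr = np.array([[1.00, 0.80, 0.64],
                 [0.80, 1.00, 0.80],
                 [0.64, 0.80, 1.00]])

pcor = partial_correlations(corr)
print(np.round(pcor, 3))
```

In this toy matrix the A-C partial correlation vanishes (up to rounding), so the GGM has no A-C edge even though the plain correlation between A and C is clearly non-zero.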

A related but completely different concept is the so-called gene relevance network, which is based on the "covariance graph" model. In the latter, interactions are defined through standard correlation coefficients, so that missing edges denote marginal independence only.

There is a simple reason why GGMs should be preferred over relevance networks for identification of gene networks: the correlation coefficient is a weak criterion for measuring dependence, as marginally, i.e. directly and indirectly, more or less all genes will be correlated. This implies that zero correlation is in fact a strong indicator for independence, i.e. the case of no edge in a network. But this is of course not what one usually wants to find out by building a relevance network. On the other hand, partial correlation coefficients do provide a strong measure of dependence and, correspondingly, offer only a weak criterion of independence (as most partial correlation coefficients usually vanish).
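The contrast between the two network types can be illustrated with a small simulation. This is only a sketch under assumed settings: four hypothetical genes arranged in a chain, a coupling strength of 0.9, 5000 samples, and an arbitrary edge threshold of 0.2. None of these values come from the original page.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p, rho = 5000, 4, 0.9   # assumed sample size, gene count, coupling

# Simulate a chain g0 -> g1 -> g2 -> g3: each gene depends directly
# only on its predecessor.
X = np.empty((n, p))
X[:, 0] = rng.standard_normal(n)
for k in range(1, p):
    noise = rng.standard_normal(n)
    X[:, k] = rho * X[:, k - 1] + np.sqrt(1 - rho**2) * noise

# Marginal correlations (relevance network) and partial correlations (GGM).
R = np.corrcoef(X, rowvar=False)
omega = np.linalg.inv(R)
d = np.sqrt(np.diag(omega))
pcor = -omega / np.outer(d, d)

threshold = 0.2            # hypothetical cut-off for drawing an edge
pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
relevance_edges = [(i, j) for i, j in pairs if abs(R[i, j]) > threshold]
ggm_edges = [(i, j) for i, j in pairs if abs(pcor[i, j]) > threshold]

print("relevance network:", relevance_edges)
print("GGM:", ggm_edges)
```

With these settings the relevance network connects every pair of genes, while the thresholded partial correlations keep only the three direct links of the chain.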

The best starting place to learn about GGMs is the classic paper that introduced this concept in the early 1970s (A.P. Dempster. 1972. Covariance Selection. Biometrics 28:157-175). Further details can be found in the GGM books by J. Whittaker (1990) and by D. Edwards (1995).


Application of GGMs to Genomic Data:

Application of GGMs to genomic data is quite challenging, as the number of genes (p) is usually much larger than the number of available samples (n), and classical GGM theory is not valid in a small sample setting. With this page I'd like to provide a commented list of some recent work dealing with GGM gene expression analysis (there are only very few so far). In my understanding, all of these papers fit in one of three categories:

  1. analysis with classic GGM theory,
  2. using limited order partial correlations, and
  3. application of regularized GGMs.

For small n, large p data it seems that methods from category III are best suited (see below for references and software).


I. Classic GGM Analysis:

The following papers simply apply classical GGM theory (i.e. with no further modification) to analyze gene expression data. It turns out that such an analysis is necessarily restricted to very small numbers of genes or gene clusters, so as to satisfy n > p.

  1. P. J. Waddell and H. Kishino. 2000. Correspondence analysis of genes and tissue types and finding genetics links from microarray data. Genome Informatics 11:83-95
  2. P. J. Waddell and H. Kishino. 2000. Cluster inferences methods and graphical models evaluated on NCI60 microarray gene expression data. Genome Informatics 11:129-140
  3. H. Toh and K. Horimoto. 2002. Inference of a genetic network by a combined approach of cluster analysis and graphical Gaussian modeling. Bioinformatics 18:287-297
  4. H. Toh and K. Horimoto. 2002. System for automatically inferring a genetic network from expression profiles. J. Biol. Physics 28:449-464
  5. X. Wu, Y. Ye and K. R. Subramanian. 2003. Interactive analysis of gene interactions using graphical Gaussian model. ACM SIGKDD Workshop on Data Mining in Bioinformatics 3:63-69
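The n > p restriction of classical GGM analysis can be made concrete with a short NumPy check. The sketch below uses random data with assumed dimensions (10 samples, 50 genes). It shows that the sample covariance matrix is rank deficient when n < p, so it cannot be inverted to obtain partial correlations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 50                      # assumed: far fewer samples than genes
X = rng.standard_normal((n, p))    # rows are samples, columns are genes

S = np.cov(X, rowvar=False)        # p x p sample covariance matrix
rank = np.linalg.matrix_rank(S)

# After centering, the rank can be at most n - 1, which is below p here,
# so S is singular and has no ordinary inverse.
print("shape:", S.shape, "rank:", rank, "max possible rank:", n - 1)
```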


II. Limited Order Partial Correlations:

One way to circumvent the problem of computing full partial correlation coefficients when the sample size is small compared to the number of genes is to use partial correlation coefficients of limited order. This results in something in between a full GGM (with correlation conditioned on all p-2 remaining genes) and a relevance network model (with unconditioned correlation). This is the strategy employed in the following papers:

  1. A. de la Fuente, N. Bing, I. Hoeschele, and P. Mendes. 2004. Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics 20:3565-3574.
  2. A. Wille, P. Zimmermann et al. 2004. Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biology 5:R92
  3. P. M. Magwene and J. Kim. 2004. Estimating genomic coexpression networks using first-order conditional independence. Genome Biology 5:R100
  4. A. Wille and P. Bühlmann. 2006. Low-order conditional independence graphs for inferring genetic networks. Statist. Appl. Genet. Mol. Biol. 4: 32.
  5. R. Castelo and A. Roverato. A robust procedure for Gaussian graphical model search from microarray data with p larger than n. Preprint.
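To illustrate what a limited-order partial correlation looks like, here is a sketch of the standard first-order formula, which conditions on a single third gene k. The helper `weakest_first_order_pcor` is a hypothetical way of screening a gene pair against every possible k. It is not claimed to be the exact procedure of any of the papers listed above, and the correlation values are made up.

```python
import math

def first_order_pcor(r_ij, r_ik, r_jk):
    """Partial correlation of genes i and j given a single gene k."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))

def weakest_first_order_pcor(corr, i, j):
    """Smallest-magnitude first-order partial correlation of (i, j)
    over all single conditioning genes k (hypothetical screening rule)."""
    others = [k for k in range(len(corr)) if k not in (i, j)]
    values = [first_order_pcor(corr[i][j], corr[i][k], corr[j][k])
              for k in others]
    return min(values, key=abs)

# Made-up correlations for a chain 0 - 1 - 2.
corr = [[1.00, 0.80, 0.64],
        [0.80, 1.00, 0.80],
        [0.64, 0.80, 1.00]]

direct = weakest_first_order_pcor(corr, 0, 1)    # stays clearly non-zero
indirect = weakest_first_order_pcor(corr, 0, 2)  # explained away by gene 1
print(round(direct, 3), round(indirect, 3))
```

Only pairwise correlations are needed here, so no p x p matrix has to be inverted, which is why this kind of calculation remains feasible when n is much smaller than p.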


III. Regularized GGMs:

Another possibility (and in my opinion the statistically most sound way) to marry GGMs with small sample modeling is to introduce regularization and moderation. This essentially boils down to finding suitable estimates for the covariance matrix and its inverse when n < p. This can either be done in a full Bayesian manner, or in an empirical Bayes way via variance reduction, shrinkage estimates etc. Once regularized estimates of partial correlation are available, heuristic searches can subsequently be employed to find an optimal graphical model (or set of models).
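The idea of regularizing the correlation matrix before inverting it can be sketched as follows. This is a simplified illustration, not the estimator of any particular paper or of GeneNet: the sample correlation matrix is shrunk linearly towards the identity with a hand-picked intensity `lam`, whereas the published shrinkage methods estimate that intensity from the data. All function names and dimensions are assumptions.

```python
import numpy as np

def shrink_correlation(R, lam):
    """Linear shrinkage of a correlation matrix towards the identity.
    lam is a hand-picked intensity in (0, 1], an assumption of this sketch."""
    return (1.0 - lam) * R + lam * np.eye(R.shape[0])

def partial_correlations(R):
    """Partial correlations from an invertible correlation matrix."""
    omega = np.linalg.inv(R)
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

rng = np.random.default_rng(1)
n, p = 10, 30                          # small n, large p (assumed sizes)
X = rng.standard_normal((n, p))

R = np.corrcoef(X, rowvar=False)       # singular, since n < p
R_shrunk = shrink_correlation(R, lam=0.2)
pcor = partial_correlations(R_shrunk)  # well defined after shrinkage

print("rank of R:", np.linalg.matrix_rank(R), "of", p)
print("smallest eigenvalue after shrinkage:",
      round(float(np.linalg.eigvalsh(R_shrunk).min()), 3))
```

Because the shrunken matrix has all eigenvalues bounded away from zero, it is positive definite and can be inverted even though n < p. Deciding which of the resulting partial correlations correspond to edges is a separate model selection step.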

Outside a genomic context, the use of regularized GGMs was first proposed by F. Wong, C.K. Carter, and R. Kohn (2003. Efficient estimation of covariance selection models. Biometrika 90:809-830). For gene expression data this strategy is pursued in the following papers:

  1. A. Dobra, C. Hans, B. Jones, J.R. Nevins, and M. West. 2004. Sparse graphical models for exploring gene expression data. J. Multiv. Analysis 90:196-212.
    See the web page of M. West for various other related articles.
  2. In these papers a regularized estimate of the correlation matrix is obtained, either by Stein-type shrinkage (3) or by bootstrap variance reduction (2). This estimate is subsequently employed for computing partial correlation. Network selection is based on false discovery rate multiple testing. This method is implemented in GeneNet.
  3. J. Schäfer and K. Strimmer. 2005. Learning large-scale graphical Gaussian models from genomic data. In: J. F. Mendes. (Ed.). Proceedings of "Science of Complex Networks: from Biology to the Internet and WWW" (CNET 2004), Aveiro, PT, August 2004. (Publisher: The American Institute of Physics).
  4. N. Meinshausen and P. Bühlmann. 2006. High-dimensional graphs and variable selection with the lasso. Annals of Statistics 34 (3)
    This approach uses lasso regression to induce sparsity on a node level among the partial correlations.
  5. These authors regularize the concentration matrix rather than the covariance matrix.
