Showing posts with label QuickNet.

Tuesday, October 27, 2009

Quicknet - Bug for Learning Partial Weights

In the previous post, I made some modifications to enable Quicknet to learn arbitrary weight sections.

However, there is still a bug, which I have just found.

In the original Quicknet design, partial learning only works for the last part of the network. To be concrete: for a 4-layer MLP, the parameter "mlp_lrmultiplier" is set to "x,y,z", where x controls the weights between layer 1 and layer 2, y controls the weights between layer 2 and layer 3, and z controls the weights between layer 3 and layer 4.

If we set y to 0, then no matter what value z is given, the value of z actually used in Quicknet is 1. That means that, starting from the first un-updated weight section, the following weight sections are all un-updated.
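To make that behaviour concrete, here is a minimal sketch (not the actual Quicknet source; the variable names are made up) of a break-on-zero loop of the kind used in set_learnrate(). It assumes, as described above, that sections the loop never reaches keep a default multiplier of 1, i.e. the plain learning rate.

#include <cstddef>
#include <cstdio>
#include <vector>

// Conceptual sketch only, not Quicknet code: apply per-section learning-rate
// multipliers, stopping at the first zero the way the original loop does.
// Sections the loop never reaches keep the default multiplier of 1.
int main()
{
    const float learn_rate = 0.008f;
    const std::vector<float> lrscale = {0.5f, 0.0f, 0.25f};  // "x,y,z" with y = 0

    std::vector<float> section_rate(lrscale.size(), learn_rate);  // default: multiplier 1
    for (size_t i = 0; i < lrscale.size(); ++i) {
        if (lrscale[i] != 0.0f) {
            section_rate[i] = learn_rate * lrscale[i];
        } else {
            section_rate[i] = 0.0f;  // this section is frozen...
            break;                   // ...and the loop stops, so z keeps the full rate
        }
    }

    // Prints: section 0 scaled by x, section 1 frozen, section 2 at the full
    // rate even though z = 0.25 was requested.
    for (size_t i = 0; i < section_rate.size(); ++i)
        std::printf("section %zu: rate %.5f\n", i, section_rate[i]);
    return 0;
}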

Also, in the original design, if a weight section is not updated, the error will not be back-propagated through that layer.

All in all, the "break" statements should not have been removed in the last modification.

The new version can be found here.

Monday, October 12, 2009

Quicknet - Enable Learning Partial Weights

In the original Quicknet_v3_20, the parameter mlp_lrmultiplier scales the learning rate per weight section; if a value is set to 0.0, that section's weights are not updated. However, it does not give full freedom to keep an arbitrary section fixed during training, because the values after the first non-zero value are effectively treated as non-zero.

To remove this restriction, the following steps are needed (a sketch of the resulting code is given after the steps):
In QN_MLP_BunchFlVar.cc:
QN_MLP_BunchFlVar::train_bunch() function:
about line 427:
if (cur_layer!=nlayers-1 && backprop_weights[cur_weinum+1])

ToDo: remove "&& backprop_weights[cur_weinum+1]"

In QN_MLP_BaseFl.cc:
QN_MLP_BaseFl::set_learnrate() function:
about line 334:
else
     break;

ToDo: remove these two lines.
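For reference, here is a sketch of what the set_learnrate() loop looks like once the "else break;" is gone. It is paraphrased, not copied from the Quicknet source, and the names are illustrative; together with removing "&& backprop_weights[cur_weinum+1]" in train_bunch(), it lets any individual section be frozen with a 0.0 multiplier.

#include <cstddef>
#include <cstdio>
#include <vector>

// Conceptual sketch (not Quicknet source) of the per-section learning rates
// after removing the "else break;": every multiplier is honoured, so any
// single weight section can be frozen by giving it a 0.0 multiplier.
static std::vector<float> section_rates(float learn_rate,
                                        const std::vector<float>& lrscale)
{
    std::vector<float> rate(lrscale.size(), 0.0f);
    for (size_t i = 0; i < lrscale.size(); ++i)
        rate[i] = learn_rate * lrscale[i];  // 0.0 multiplier -> frozen section
    return rate;
}

int main()
{
    // "x,y,z" = "1,0,1": freeze only the middle weight section.
    const std::vector<float> rates = section_rates(0.008f, {1.0f, 0.0f, 1.0f});
    for (size_t i = 0; i < rates.size(); ++i)
        std::printf("section %zu: rate %.5f\n", i, rates[i]);
    return 0;
}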

The working source code (it works at least for my purpose) can be found here: https://www.comp.nus.edu.sg/~li-bo/src/quicknet_linAF.tar

Friday, October 9, 2009

Quicknet_v3_20 Bug-I

There is a small bug in the softtarget trainer.

After the pre-run cross validation, qnmultitrn stopped with a "Segmentation fault" error.

This bug is simply that in the file "QN_trn.cc", in the following constructor (beginning at line 598):

////////////////////////////////////////////////////////////////
// "Soft training" object - trains using continuous targets.

QN_SoftSentTrainer::QN_SoftSentTrainer(int a_debug, const char* a_dbgname,
                       int a_verbose,
                       QN_MLP* a_mlp,
                       QN_InFtrStream* a_train_ftr_str,
                       QN_InFtrStream* a_train_targ_str,
                       QN_InFtrStream* a_cv_ftr_str,
                       QN_InFtrStream* a_cv_targ_str,
                       QN_RateSchedule* a_lr_sched,
                       float a_targ_low, float a_targ_high,
                       const char* a_wlog_template,
                       QN_WeightFileType a_wfile_format,
                       const char* a_ckpt_template,
                       QN_WeightFileType a_ckpt_format,
                       unsigned long a_ckpt_secs,
                       size_t a_bunch_size,
                       float* a_lrscale)
    : clog(a_debug, "QN_SoftSentTrainer", a_dbgname),
      ...

the two member variables debug and dbgname are not initialized. Since debug is an integer, leaving it uninitialized did not cause a big problem. However, dbgname is a pointer, and using it uninitialized caused the "Segmentation fault".

To fix this bug, just add the following two lines to the constructor's initializer list:
 
    : debug(a_debug),
      dbgname(a_dbgname),

      clog(a_debug, "QN_SoftSentTrainer", a_dbgname),
      ...
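For illustration, here is a minimal standalone example (not Quicknet code; the class and member names are made up) of the failure mode: a pointer member that the constructor forgets to initialize is later dereferenced, which is undefined behaviour and typically crashes.

#include <cstddef>
#include <cstdio>
#include <cstring>

// Minimal illustration of the bug pattern: the constructor receives the
// debug name but never stores it, so the member pointer is left
// uninitialized and dereferencing it later is undefined behaviour.
class Trainer
{
public:
    explicit Trainer(const char* a_dbgname)
    // BUG: dbgname(a_dbgname) is missing from the initializer list.
    {
        (void) a_dbgname;               // accepted but never stored
    }
    size_t dbgname_len() const
    {
        return std::strlen(dbgname);    // reads an indeterminate pointer
    }
private:
    const char* dbgname;
};

int main()
{
    Trainer t("soft_trn");
    std::printf("%zu\n", t.dbgname_len());  // likely "Segmentation fault"
    return 0;
}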


Friday, August 7, 2009

Quicknet AF - II

From the files:
QN_fltvec.h - qn_sigmoid_vf_vf(...)
QN_fltvec.cc - qn_fe_sigmoid_vf_vf(...)
QN_fltvec.cc - qn_fe_sigmoid_f_f(...)

we can see that the sigmoid function used in the hidden layer is different from the sigmoid function used in the output layer.
In the hidden layer, the sigmoid function is:
f(x)=1/(1+exp(-x))

In the output layer, the sigmoid function is of the tanh form:
f(x) = a*tanh(b*x) + c
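The tanh form is presumably just the usual identity 1/(1+exp(-x)) = 0.5*tanh(0.5*x) + 0.5 written with constants a, b and c; the quick check below (not Quicknet code) confirms that the two expressions agree numerically.

#include <cmath>
#include <cstdio>

// Verify that the logistic sigmoid and its tanh form give the same values.
int main()
{
    for (double x = -8.0; x <= 8.0; x += 1.0) {
        double logistic  = 1.0 / (1.0 + std::exp(-x));
        double tanh_form = 0.5 * std::tanh(0.5 * x) + 0.5;
        std::printf("x = %5.1f  logistic = %.6f  tanh form = %.6f  diff = %.1e\n",
                    x, logistic, tanh_form, logistic - tanh_form);
    }
    return 0;
}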

Sunday, August 2, 2009

Quicknet Activation Function in Hidden Layer

In the file "QN_MLP_BunchFlVar.cc", the function QN_MLP_BunchFlVar::forward_bunch(size_t n_frames, const float* in, float* out) contains the following piece of code:

// Check if we are doing things differently for the final layer.
if (cur_layer!=n_layers - 1)
{
    // This is the intermediate layer non-linearity.
   qn_sigmoid_vf_vf(cur_layer_size, cur_layer_x,
             cur_layer_y);
}
else
{
    // This is the output layer non-linearity.
    switch(out_layer_type)
    {
    case QN_OUTPUT_SIGMOID:
    case QN_OUTPUT_SIGMOID_XENTROPY:
    qn_sigmoid_vf_vf(cur_layer_size, cur_layer_x, out);
    break;
    case QN_OUTPUT_SOFTMAX:
    {
    size_t i;
    float* layer_x_p = cur_layer_x;
    float* layer_y_p = out;

    for (i=0; i<n_frames; i++)
    {
        qn_softmax_vf_vf(cur_layer_units, layer_x_p, layer_y_p);
        layer_x_p += cur_layer_units;
        layer_y_p += cur_layer_units;
    }
    break;
    }
    case QN_OUTPUT_LINEAR:
    qn_copy_vf_vf(cur_layer_size, cur_layer_x, out);
    break;
    case QN_OUTPUT_TANH:
    qn_tanh_vf_vf(cur_layer_size, cur_layer_x, out);
    break;
    default:
    assert(0);
    }
}

In the Quicknet tools, the activation function of the hidden layers is set to sigmoid by default, as shown above.

Only the output-layer activation function can be selected by the user.
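Unlike the element-wise sigmoid, tanh and linear cases, softmax normalizes across the units of a frame, which is why the QN_OUTPUT_SOFTMAX branch above loops over frames. Below is a minimal per-frame softmax in the spirit of qn_softmax_vf_vf; it is not the actual Quicknet implementation (in particular, the max-subtraction is a standard numerical-stability trick that Quicknet may or may not use).

#include <cmath>
#include <cstddef>
#include <cstdio>

// Per-frame softmax sketch: exponentiate after subtracting the frame maximum
// for numerical stability, then normalize so the outputs sum to 1.
static void softmax_frame(size_t n_units, const float* in, float* out)
{
    float max_val = in[0];
    for (size_t i = 1; i < n_units; ++i)
        if (in[i] > max_val)
            max_val = in[i];

    float sum = 0.0f;
    for (size_t i = 0; i < n_units; ++i) {
        out[i] = std::exp(in[i] - max_val);
        sum += out[i];
    }
    for (size_t i = 0; i < n_units; ++i)
        out[i] /= sum;
}

int main()
{
    const float x[4] = {1.0f, 2.0f, 0.5f, -1.0f};
    float y[4];
    softmax_frame(4, x, y);
    for (size_t i = 0; i < 4; ++i)
        std::printf("%.4f ", y[i]);
    std::printf("\n");
    return 0;
}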

Thursday, July 16, 2009

What feature normalization should be used?

From: http://tech.groups.yahoo.com/group/icsi-speech-tools/message/144

It's generally recommended to always use some kind of feature mean and
variance normalization with Quicknet.

The minimum amount of normalization is normalization using a single
set of mean and variance estimates calculated over the entire training
set. In this case, normalization simply amounts to translating the
origin of the feature space (the mean normalization) and re-scaling
the axes of the feature space (the variance normalization), since all
data is normalized using the same mean and variance estimates. This is
recommended as a minimum amount of normalization since very big or
very small numbers can cause sigmoids to saturate, and since the
optimal learning rate during MLP training may depend on the scale of
the data (and thus normalization makes it less likely you'll need to
re-tune the learning rate).

It's very common to go further and do normalization at the utterance
or speaker level. This can be useful for reducing the amount of
variability in the features due to speaker differences and channel
differences.
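The "minimum amount of normalization" described in the message is just a global shift and scale per feature dimension. Here is a small sketch (not part of Quicknet) of estimating one mean and one variance per dimension over the training data and applying them to every frame:

#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Global mean/variance normalization: estimate per-dimension statistics over
// the whole training set, then normalize every frame with the same values.
int main()
{
    const size_t dim = 3;
    // Toy "training set": 4 frames of 3-dimensional features.
    const std::vector<std::vector<float>> frames = {
        {1.0f, 200.0f, -0.5f},
        {2.0f, 220.0f, -0.7f},
        {0.5f, 180.0f, -0.4f},
        {1.5f, 210.0f, -0.6f},
    };

    std::vector<double> mean(dim, 0.0), var(dim, 0.0);
    for (const auto& f : frames)
        for (size_t d = 0; d < dim; ++d)
            mean[d] += f[d];
    for (size_t d = 0; d < dim; ++d)
        mean[d] /= frames.size();
    for (const auto& f : frames)
        for (size_t d = 0; d < dim; ++d)
            var[d] += (f[d] - mean[d]) * (f[d] - mean[d]);
    for (size_t d = 0; d < dim; ++d)
        var[d] /= frames.size();

    // Normalize: shift by the mean, scale by the standard deviation.
    for (const auto& f : frames) {
        for (size_t d = 0; d < dim; ++d)
            std::printf("%8.4f ", (f[d] - mean[d]) / std::sqrt(var[d]));
        std::printf("\n");
    }
    return 0;
}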
