Saturday, May 11, 2013
Moodle - Simple workflow
Wednesday, May 8, 2013
Moodle - Initial try for bug fixing of MDL-36020
Tuesday, May 7, 2013
Moodle - First attempt
First install tasksel...
$ sudo apt-get install tasksel
... and then the LAMP stack:
$ sudo tasksel install lamp-server
Create a MySQL database:
mysql> CREATE DATABASE moodle;
Create a MySQL user
To create a new user with all privileges on the moodle database, type at the MySQL prompt:
mysql> GRANT ALL PRIVILEGES ON moodle.* TO 'yourusername'@'localhost' IDENTIFIED BY 'yourpassword' WITH GRANT OPTION;
Next, fill in the corresponding information in the Moodle installer.
After updating the admin account information for the Moodle site, the final admin workspace will be presented to you.
4. Taking Tim's suggestions from http://dev.moodle.org/mod/forum/discuss.php?d=1803, the following configuration is done in the admin workspace:
After installing a copy of Moodle for development, the first thing you should do is:
- Go to Site administration -> Development -> Debugging
- Set Debug messages to DEVELOPER, and Turn on Display debug messages. (Consider turning on some of the other options too.)
- In the administration block, search for "Cache" then
- Turn off Cache all language strings.
- Set Text cache lifetime to No
- Turn on Theme designer mode
- Turn off Cache Javascript
Immediately after the installation, set your name and contact e-mail. The name and e-mail become part of your commits and cannot be changed once your commits are accepted into the Moodle code. Therefore we ask contributors to use their real names with proper capitalization, e.g. "John Smith" and not "john smith" or even "john5677".
git config --global user.name "Your Name"
git config --global user.email yourmail@domain.tld
Unless you are the repository maintainer, it is wise to configure Git to ignore changes in file permissions:
git config --global core.filemode false
Then register the upstream remote:
cd moodle
git remote add upstream git://git.moodle.org/moodle.git
Then use the following commands to keep the standard Moodle branches in your GitHub repository synced with the upstream repository. You may wish to store them in a script so that you can run it every week after the upstream repository is updated.
#!/bin/sh
git fetch upstream
for BRANCH in MOODLE_19_STABLE MOODLE_20_STABLE MOODLE_21_STABLE MOODLE_22_STABLE MOODLE_23_STABLE MOODLE_24_STABLE master; do
    git push origin refs/remotes/upstream/$BRANCH:$BRANCH
done
Wednesday, April 10, 2013
[Paper] Increased Robustness of Noisy Speech Features Using Neural Networks
The paper can be found at
Increased Robustness of Noisy Speech Features Using Neural Networks
Thursday, January 24, 2013
Noisy speech
For research purposes, we usually collect clean speech data and pure noise data, and then generate noisy speech by combining them. Sometimes we apply filters to simulate channel distortion effects. Speech at different SNRs is created and used for evaluation; in the Aurora 2 dataset, for example, 6 SNRs from 20 dB down to -5 dB are used for evaluation.
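The mixing step described above can be sketched as follows. This is a minimal illustration, not the Aurora 2 tooling; the function and variable names are mine:

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Mix clean speech with noise at a target SNR (in dB).

    The noise is tiled/truncated to the speech length, then scaled so that
    10*log10(P_clean / P_noise_scaled) equals snr_db.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Tile or truncate the noise to match the speech length.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve SNR(dB) = 10*log10(p_clean / (scale^2 * p_noise)) for scale.
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise
```

Running this with the same clean utterance at each target SNR yields the kind of graded noisy set used for the spectrogram comparison below.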
To understand the difficulty of recognizing noisy speech, spectrograms at different SNRs are plotted. Even in these relatively high-resolution spectrograms, the patterns are already quite confusing at 0 dB SNR, and at -5 dB it is hard to separate the speech patterns from the noise.
In speech recognition, however, the FBank features used are of much lower resolution than the spectrograms, and because of their value ranges the patterns are relatively hard to discern. That is also why we usually apply CMVN to preprocess the features before feeding them to neural networks.
The CMVN-normalized FBank features are shown below. The dynamic parameters actually help a lot in locating the patterns.
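Per-utterance CMVN itself is a simple computation; a minimal sketch, assuming the features arrive as a frames-by-dimensions array:

```python
import numpy as np

def cmvn(feats):
    """Per-utterance cepstral mean and variance normalization.

    feats: (num_frames, num_dims) array of e.g. FBank features.
    Returns features with zero mean and unit variance per dimension.
    """
    mean = feats.mean(axis=0)
    std = feats.std(axis=0)
    # Guard flat (zero-variance) dimensions against division by zero.
    return (feats - mean) / np.maximum(std, 1e-10)
```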
Although in my experience the dynamic coefficients are really helpful, I have always wondered whether that is simply because of the higher dimensionality. If we used higher-dimensional static features, we would have much more detailed information; would that outperform the dynamic features? However, when I try to extract the same number of FBanks, several dimensions always give zero values. This may indicate that the current feature extraction methods are limited in some respects.
First, the per-utterance normalized static part (40D) of the above features is displayed below:
The following are illustrations of 80 static FBank features.
Although more FBanks are visually more appealing (at least to me), it is hard to say how much ASR can benefit from them. Maybe features learnt automatically from the raw waveform signals would be helpful.
Tuesday, August 21, 2012
GSoC 2012: Pronunciation Evaluation #Troy – Project Conclusions
This article briefly summarizes the design and implementation of the Pronunciation Evaluation web portal for the GSoC 2012 Pronunciation Evaluation project.
The pronunciation evaluation system mainly consists of the following components:
1) Database management module: Store, retrieve and update all the necessary information, including user information and data such as phrases, words, correct pronunciations and assessment scores.
2) User management module: New user registration, information update, change/reset password and so on.
3) Audio recording and playback module: Recording the user's pronunciation for further processing.
4) Exemplar verification module: Judge whether a given recording is an exemplar or not.
5) Pronunciation assessment module: Provide numerical evaluation at the phoneme level (which could be aggregated to form higher level evaluation scores) in both acoustic and duration aspects.
6) Phrase library module: Allow users to create new phrases into the database for evaluation.
7) Human evaluation module: Support human experts to evaluate the users' pronunciations which could be compared with the automatically generated evaluations.
The website can be tested at http://talknicer.net/~li-bo/datacollection/login.php. Do let me know (troy.lee2008@gmail.com) if you encounter any problems, as the site needs quite a lot of testing before it works robustly. The complete setup of the website can be found at http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/branches/speecheval/troy/. More detailed functionality and implementation notes can be found in a more manual-like report:
Although this is the end of GSoC, it is just the start of our project: leveraging open source tools and speech technologies to improve people's lives around the world. We are currently preparing to use Amazon Mechanical Turk to collect more exemplar data through our web portal, to build a rich database for improved pronunciation evaluation performance, and to make learning much more fun through gamification.
Tuesday, July 17, 2012
GSoC 2012: Pronunciation Evaluation Week 8 Status
Tuesday, July 10, 2012
GSoC 2012: Pronunciation Evaluation Week 7 Status
Tuesday, June 26, 2012
Troy: GSoC 2012 Pronunciation Evaluation Week 5
Monday, June 18, 2012
Troy: GSoC 2012 Pronunciation Evaluation Week 4
Tuesday, June 12, 2012
Troy: GSoC 2012 Pronunciation Evaluation Week 3
Monday, June 4, 2012
Troy: GSoC 2012 Pronunciation Evaluation Week 2
if [ -e "$pidfile" ]
then
    # check whether the process is running
    rtmppid=`/usr/bin/head -n 1 ${pidfile} | /usr/bin/awk '{print $1}'`
    # restart the process if not running
    if [ ! -d /proc/${rtmppid} ]
    then
        /usr/bin/python ${exefile} -r ${dataroot} &
        rtmppid=$!
        echo "${rtmppid}" > ${pidfile}
        echo `/bin/date` "### rtmp process restarted with pid: ${rtmppid}"
    fi
fi
In this script, we first check whether the pid file (i.e. $pidfile) exists. If we do not want the server to check for this process for a while (for example, while applying patches to the program), we can simply delete this file and the process will not be checked. After the maintenance, recreate the file with the new process id and the checking will automatically resume.
The check itself is quite simple: get the process id from the file and see whether the process exists by looking in the /proc filesystem, where each running process has a folder. Google '/proc linux' for more information about this folder, which contains quite a lot of information about your system.
2. Implement the login and registration pages using HTML5.
First, for user information storage we use a MySQL database, so a user table is designed and created in the server's MySQL database:
| Field | Type | Comments |
| userid | INTEGER | Compulsory, automatically increased, primary key |
| email | VARCHAR(200) | Compulsory, users are identified by emails |
| password | VARCHAR(50) | Compulsory, hashed using SHA1, at least 8 alphanumeric characters |
| name | VARCHAR(100) | Not compulsory, default 'NULL' |
| age | INTEGER | Not compulsory, default 'NULL', accepted values [0,150] |
| sex | CHAR(1) | Not compulsory, default 'NULL', accepted values {'M', 'F'} |
| native | CHAR(1) | Not compulsory, default 'NULL', accepted values {'Y', 'N'}. Indicates whether the user is a native English speaker. |
| place | VARCHAR(1000) | Not compulsory, default 'NULL'. The place where the user lived between the ages of 6 and 8. |
| accent | CHAR(1) | Not compulsory, default 'NULL', accepted values {'Y', 'N'}. Indicates whether the user has a self-reported accent. |
The creation of the user table:
CREATE TABLE users (
    userid INTEGER NOT NULL AUTO_INCREMENT,
    email VARCHAR(200) NOT NULL,
    password VARCHAR(50) NOT NULL,
    name VARCHAR(100),
    age INTEGER,
    sex SET('M', 'F'),
    native SET('Y', 'N') DEFAULT 'N',
    place VARCHAR(1000),
    accent SET('Y', 'N'),
    CONSTRAINT PRIMARY KEY (userid),
    CONSTRAINT chk_age CHECK (age>=0 AND age<=150)
);
Secondly, the login and simple registration pages are implemented in HTML5, which I had to learn as I went. Below are screenshots of the pages:
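The password column stores a SHA1 digest, which is 40 hex characters and therefore fits VARCHAR(50). A minimal sketch of computing such a digest (the helper name is mine, not from the project code):

```python
import hashlib

def hash_password(password):
    """Return the SHA1 hex digest of a password (40 hex characters).

    Note: for a modern deployment, a salted slow hash would be preferable
    to plain SHA1; this mirrors the schema described above.
    """
    return hashlib.sha1(password.encode("utf-8")).hexdigest()
```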
Also, if you are interested, you can go to this page to help us test the system: http://talknicer.net/~li-bo/datacollection/login.php . On the server side we use PHP to retrieve information from the page, query the MySQL database, and finally send the data back to the page.
The recording interface has also been ported to HTML instead of the pure Flex version I did before. The page currently shows up fine, but there is no event interaction between HTML and Flash yet.
3. Database design for the whole project.
A bunch of tables are designed to store the various information for this project. Detailed table information can be found on our wiki page: http://talknicer.net/w/Database_schema . Here I will give a brief discussion. First, the user table created in the previous step will be augmented to keep two kinds of user information: one for normal student users and one for exemplar recordings. The reason to put them into one table instead of two is that student users, when they can do an excellent job in pronunciation, should also be allowed to contribute exemplar recordings. Likewise, exemplar recorders who register through the website have to show they are proficient enough to contribute a qualified exemplar recording.
Besides the user table, there are several other tables for necessary information: a languages table for the list of languages defined by ISO, in case we extend the project to other languages; a region table to get an idea of the user's accent; and a prompts table for the list of text resources used for pronunciation evaluation.
Then there are tables to log the recordings the users make and tables for the sets of tests designed in the system.
Things to do in the coming week:
1. Discuss more regarding the game part to finish the last part of schema design.
2. Figure out how to integrate the Flash audio recorder nicely with the HTML5 interface.
3. Implement the student recording interface.
4. Further tasks could be found on this list: http://talknicer.net/w/To_do_list
Wednesday, May 30, 2012
First use of Bazaar and LaunchPad.net
Monday, May 28, 2012
[GSoC 2012: Pronunciation Evaluation #Troy] Week 1
- The server side rtmplite is now configured to save the recordings into the folder [path_to_webroot]/data on the server. And for the audioRecorder app, all the data will be under the [path_to_webroot]/data/audioRecorder folder and for each user there will be a separate folder (e.g. [path_to_webroot]/data/audioRecorder/user1). For each recording utterance, the file name is now in the format of [sentence name]_[quality level].flv
- So far the conversion from FLV to WAV is done purely on the server side inside rtmplite, using Python's subprocess.Popen() to call FFmpeg. After rtmplite closes the FLV file, the conversion is carried out immediately, and the converted WAV file has exactly the same path and name except for the suffix (WAV instead of FLV). Thanks very much to Guillem for helping me test sox for the conversion. However, I failed to use sox directly to convert FLV to WAV in the terminal; it reported "unknown format flv". If needed, I will try to figure out whether the problem is with my sox build. James then pointed out that we can do it inside rtmplite, which really helps: why send an extra HTTP request to invoke a PHP process for the conversion?
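The conversion step can be sketched roughly as follows. This is an illustration only, not the actual rtmplite code; the helper names and FFmpeg options are assumptions:

```python
import os
import subprocess

def ffmpeg_cmd(flv_path):
    """Build an FFmpeg command converting an FLV recording to a WAV file
    with the same path and base name (options are illustrative)."""
    wav_path = os.path.splitext(flv_path)[0] + ".wav"
    cmd = ["ffmpeg", "-y", "-i", flv_path, wav_path]
    return cmd, wav_path

def flv_to_wav(flv_path):
    """Run the conversion via subprocess.Popen and wait for it to finish."""
    cmd, wav_path = ffmpeg_cmd(flv_path)
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    proc.wait()
    return wav_path
```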
- To verify the recording parameters, i.e. the quality level for Speex encoding, I recorded the same utterance ("NO ONE AT THE STATE DEPARTMENT WANTS TO LET SPIES IN") with the quality varying from 0 to 10. As expected, the higher the quality, the larger the FLV file, and to my ears the better the sound; however, it is hard to notice any difference above level 7. I also generated alignment scores to see whether the quality affects the alignment, but from the results shown in the following graph, the acoustic scores seem incomparable across different recordings. For now we will set the recording quality to 8.
- For the audioRecorder, the UI and other events only proceed once the NetConnection event and the NetStream open and close events have finished successfully. Also, a 0.5 s delay is inserted at the start and end of the recording button's click event to avoid clipping.
- Solve the problem encountered in converting FLV to WAV using FFmpeg with Python's Popen(). If the main Python script (call it test.py for now) is run in the terminal with "python test.py", there are no problems and everything works great. However, if I put it in the background and log off the server with "python test.py &", every time Popen() is invoked the whole process hangs with a "Stopped + test.py &" message. I have tried different approaches found on the web without success for two days. I will try to figure out a way to work around it; otherwise I may turn to Guillem's suggestion and figure out whether sox works.
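For reference, one common cause of a backgrounded script stopping at Popen() is the child inheriting the controlling terminal and trying to read from it, which makes the shell stop the job with SIGTTIN. Detaching the child's standard streams is a possible workaround; this is only a sketch of that idea, not the fix used in the project:

```python
import subprocess

def run_detached(cmd):
    """Run cmd with stdin/stdout/stderr detached from the terminal.

    A child that inherits the terminal and reads from it while its parent
    runs in the background is stopped with SIGTTIN, matching the "Stopped"
    symptom; detaching the streams avoids that.
    """
    proc = subprocess.Popen(cmd,
                            stdin=subprocess.DEVNULL,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return proc.wait()
```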
- Finish the upload interface. There will be two kinds of interfaces: one for students and one for exemplar pronunciations. For the students', we want to display one to five phrases below space for a graphic or animation, assuming the smallest screen possible, but with HTML that also looks good in a large window. For the exemplar interface, we just need to display one phrase, but we should also have per-upload form fields (name, age, sex, native English speaker?, where they lived at ages 6-8 (determines accent), self-reported accent, etc.) which should persist across multiple uploads by the same user (with cookies?).
- Testing the rtmplite for multiple users and the same user's multiple recording sessions.
Saturday, May 26, 2012
First use of SVN
Thursday, May 24, 2012
[GSoC 2012] Before Week 1
- Trying out the basic wami-recorder demo on my school's server;
- Changed to rtmplite for audio recording. rtmplite is a Python implementation of the Flash RTMP server with the minimum support needed for real-time streaming and recording using AMF0. On the server side, the daemon RTMP server process listens by default on TCP port 1935 for connections and streaming. On the client side, the user uses NetConnection to set up a session with the server and NetStream for audio and video streaming, as well as recording. The demo application has been set up at: http://talknicer.net/~li-bo/testClient/bin-debug/testClient.html
- Based on my understanding of the demo application, which does real-time streaming and recording of both audio and video, I started to write my own audio recorder, which is the key component for both the web-based audio data collection and the evaluation app. The basic version of the recorder is hosted at: http://talknicer.net/~li-bo/audioRecorder/audioRecorder.html . The current implementation includes:
- Distinguishing recordings from different users by user id;
- Loading pre-defined text sentences for recording, which may be useful for data collection;
- Real-time audio recording;
- Playback of the recordings from the server;
- Basic event control logic, such as preventing users from recording and playing at the same time.
- Besides, I have also learnt from http://cmusphinx.sourceforge.net/wiki/sphinx4:sphinxthreealigner how to do alignment using cmusphinx3. To generate phoneme alignment scores, two alignment steps are needed. Details of how to carry out the alignment can be found in my more tech-oriented posts (http://troylee2008.blogspot.com/2012/05/testing-cmusphinx3-alignment.html and http://troylee2008.blogspot.com/2012/05/cmusphinx3-phoneme-alignment.html) on my personal blog.
- Set up the server-side process to manage the user recordings properly, i.e. distinguishing between users and different utterances.
- Figure out how to automatically convert the recorded server-side FLV files to WAV files after the user stops recording.
- Verify the recording parameters against the recording quality, also taking network bandwidth into consideration.
- Incorporate delays between network events in the recorder. The current version does not wait for network events (such as connection setup, data packet transmission, etc.) to finish before processing the next user event, which usually causes the recordings to be clipped.
Sunday, May 20, 2012
Configure Ubuntu 12.04 to boot without a monitor
To solve the problem, I finally found this post: http://ubuntuforums.org/showthread.php?t=1452600&page=3 . The solution is:
Step 1. Back up the original xorg.conf to xorg.conf.bk just in case. Create a new xorg.conf in /etc/X11 with the following.
Section "Device"
Identifier "VNC Device"
Driver "vesa"
EndSection
Section "Screen"
Identifier "VNC Screen"
Device "VNC Device"
Monitor "VNC Monitor"
SubSection "Display"
Modes "1024x768"
EndSubSection
EndSection
Section "Monitor"
Identifier "VNC Monitor"
HorizSync 30-70
VertRefresh 50-75
EndSection
Step 2. Disable KMS for your video card
The list below tells you which video card manufacturer you have; use the command-line entry below it to create the appropriate kms.conf file containing the "options ... modeset=0" line. If you have access to the GUI, you could just as easily create/modify the file and put the "options ... modeset=0" line in as appropriate.
The following are entered into a terminal window as commands.
# ATI Radeon:
echo options radeon modeset=0 > /etc/modprobe.d/radeon-kms.conf
# Intel:
echo options i915 modeset=0 > /etc/modprobe.d/i915-kms.conf
# Nvidia (this should revert you to using -nv or -vesa):
echo options nouveau modeset=0 > /etc/modprobe.d/nouveau-kms.conf
As for my case, mine is Intel, so I added "options i915 modeset=0" to /etc/modprobe.d/dkms.conf .
Step 3. Reboot