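Principal Component Analysis (PCA) is a simple and useful technique for identifying patterns in high-dimensional data.
Matlab provides the function princomp (part of the Statistics Toolbox) to perform PCA; newer releases offer pca in its place, with the same first three outputs.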
[coeff, score, latent]=princomp(X)
X is the data arranged by row: each row is an observation (an instance) and each column corresponds to a random variable. If X has dimension [n,p], there are n observations, each of dimension p;
coeff: the returned eigenvector matrix. Each column is an eigenvector, and the columns are ordered by their corresponding eigenvalues, from largest to smallest. coeff has dimension [p,p];
score: the representation of X in the principal component space, i.e. the mean-centred data projected onto the eigenvectors. score has dimension [n,p]. Mapping the scores back with all the eigenvectors (score * coeff', plus the mean) recovers X exactly; if the eigenvectors with small eigenvalues are ignored, the reconstruction differs from X by a small amount;
latent: the eigenvalues corresponding to the eigenvectors in coeff, i.e. the variances of the principal components.
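With the principal components (i.e. eigenvectors), the scores are obtained by projecting the mean-centred data onto them. Here mean(X) is the row vector of column means, subtracted from every row of X (older Matlab releases need repmat or bsxfun for this subtraction):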
score = ( X - mean(X) ) * coeff
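As a quick check of the formula above, here is a minimal sketch that compares the scores returned by princomp with the projection computed by hand, then maps the scores back to the original space. The data set, its size, and the choice k = 2 are arbitrary assumptions made for illustration. bsxfun is used so the subtraction also works on releases without implicit expansion. On releases where princomp is no longer available, pca(X) should be a drop-in substitute for this call.

```matlab
% Toy data: n = 100 observations of p = 3 correlated variables.
% (Sizes and mixing matrix are arbitrary, chosen only for illustration.)
n = 100; p = 3;
X = rand(n, p) * [2 0 0; 1 1 0; 0.5 0.2 0.1];

[coeff, score, latent] = princomp(X);

% Centre the data: subtract the column means from every row.
mu = mean(X, 1);
Xc = bsxfun(@minus, X, mu);

% The scores are the centred data projected onto the eigenvectors.
err_score = max(max(abs(score - Xc * coeff)));

% Using all p eigenvectors, the original data is recovered up to rounding.
X_full = bsxfun(@plus, score * coeff', mu);
err_full = max(max(abs(X - X_full)));

% Using only the first k eigenvectors gives an approximation of X.
k = 2;
X_k = bsxfun(@plus, score(:, 1:k) * coeff(:, 1:k)', mu);
err_k = max(max(abs(X - X_k)));

fprintf('score check: %g, full: %g, first %d components: %g\n', ...
        err_score, err_full, k, err_k);
```

The first two errors should be at the level of floating-point rounding, while the third depends on how much variance the dropped components carry.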
The eigenvectors and eigenvalues are actually the eigenvectors and eigenvalues of the covariance matrix of the data.
Thus to compute them,
1) first subtract the mean of the data;
2) compute the covariance matrix of the data;
3) compute the eigenvectors and eigenvalues of the covariance matrix, and sort them by eigenvalue from large to small;
then it's done.
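The steps above can be sketched with base Matlab functions only. This is an illustrative version, not the code princomp runs internally; the random test data and the variable names are assumptions. Since eig does not promise any particular ordering, the eigenvalues are sorted explicitly. Eigenvectors are only determined up to sign, so individual columns of coeff may be flipped relative to what princomp returns.

```matlab
% Toy data: n = 200 observations of p = 4 correlated variables (arbitrary).
n = 200; p = 4;
X = randn(n, p) * randn(p, p);

% 1) Subtract the mean of each column from the data.
mu = mean(X, 1);
Xc = bsxfun(@minus, X, mu);

% 2) Covariance matrix of the data (cov normalises by n-1 and
%    removes the mean itself, so passing X or Xc gives the same result).
C = cov(Xc);

% 3) Eigenvectors and eigenvalues of the covariance matrix,
%    sorted by eigenvalue from large to small.
[V, D] = eig(C);
[latent, order] = sort(diag(D), 'descend');
coeff = V(:, order);

% Scores: the centred data projected onto the eigenvectors.
score = Xc * coeff;

% The variance of each score column should match the eigenvalues.
disp([latent, var(score)']);
```

The two displayed columns should agree up to rounding, which confirms that latent holds the variances of the principal components.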