Principal components analysis, PCA



A7med Baraka
03-20-2009, 02:04 AM
In theory, PCA is a transformation from one p-dimensional coordinate system to another p-dimensional one. The transformation is chosen so that truncating an input vector in the new coordinate system (keeping only its first few coordinates) causes only a minimal squared error, i.e. a minimal loss of information.

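To achieve this, the unit vectors of the new coordinate system (i.e. the principal components) are chosen according to the variance of the input vectors: the first component points in the direction of maximum variance, and each following component maximizes the remaining variance while staying orthogonal to the previous ones. In two dimensions the second component is therefore the direction of minimum variance. See Fig. 1.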


http://www.dtek.chalmers.se/%7Ed95danb/ocr/pcaex.gif

Fig. 1. Given a set of p-dimensional input vectors (here shown as two-dimensional "clouds of points") in an arbitrary coordinate system, the principal components (marked 1 and 2 in the figure) are chosen to maximize (1) and minimize (2) the variance of the input vectors.
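The claim that truncation in the new coordinate system gives the minimal squared error can be checked numerically. The following is a minimal NumPy illustration, not the author's code; the function name `pca_truncation_error` and the 2-D test cloud are hypothetical. It keeps only the first k principal components, maps the points back, and compares the mean squared error with the variance along the discarded components. It also compares against truncating onto the plain x-axis.

```python
import numpy as np

def pca_truncation_error(X, k):
    """Keep the top-k principal components of X (rows = input vectors).

    Returns (mean squared reconstruction error, sum of discarded eigenvalues).
    """
    Xc = X - X.mean(axis=0)                 # center so that E[x] = 0
    R = Xc.T @ Xc / len(Xc)                 # sample covariance, 1/N normalization
    eigvals, eigvecs = np.linalg.eigh(R)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # re-sort: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :k]                      # the k highest-variance directions
    X_hat = (Xc @ W) @ W.T                  # truncate in new system, map back
    mse = np.mean(np.sum((Xc - X_hat) ** 2, axis=1))
    return mse, eigvals[k:].sum()

rng = np.random.default_rng(0)
# Hypothetical elongated 2-D "cloud of points"
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])

mse, discarded = pca_truncation_error(X, k=1)

# For comparison: truncate onto the original x-axis instead (drop the y coordinate)
y = X[:, 1] - X[:, 1].mean()
xaxis_mse = np.mean(y ** 2)

print(mse, discarded, xaxis_mse)  # mse equals discarded; xaxis_mse is larger
```

Because the residual after truncation lies entirely in the discarded directions, its mean squared length equals the sum of their eigenvalues, and no other choice of k axes does better.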



It turns out that, if $E[x]=0$ where $x$ denotes the input vectors, the unit vectors of the new coordinate system are the eigenvectors of the covariance matrix $R$, defined as $R = E[xx^T]$. $R$ can be approximated by:


http://www.dtek.chalmers.se/%7Ed95danb/ocr/pca_sum2.gif
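In case the linked image no longer loads: assuming it shows the usual sample estimate from N centered input vectors $x_1, \dots, x_N$ (N is introduced here and is not named in the original), the approximation would read as follows. Some texts divide by N-1 instead of N.

```latex
R = E[xx^T] \approx \frac{1}{N} \sum_{n=1}^{N} x_n x_n^T
```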
The condition $E[x]=0$ can always be approximately achieved by centering the points around their mean value.
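Putting the last two paragraphs together, a minimal NumPy sketch might center the data, form the sample estimate of R with 1/N normalization, and take its eigenvectors sorted by decreasing eigenvalue. The function name `principal_components` and the example data are hypothetical; `np.linalg.eigh` is used because R is symmetric.

```python
import numpy as np

def principal_components(X):
    """Return (eigenvalues, eigenvectors) of the sample covariance of X,
    sorted by decreasing eigenvalue. Rows of X are input vectors."""
    Xc = X - X.mean(axis=0)                 # center around the mean so E[x] ~ 0
    R = Xc.T @ Xc / Xc.shape[0]             # R ~ (1/N) * sum_n x_n x_n^T
    eigvals, eigvecs = np.linalg.eigh(R)    # R is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1]       # largest variance first
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(1)
# Hypothetical data: a 3-D cloud that is not centered at the origin
X = rng.normal(size=(1000, 3)) * np.array([5.0, 2.0, 0.5]) + np.array([10.0, -4.0, 7.0])
vals, vecs = principal_components(X)
print(vals)          # variances along the new axes, in decreasing order
print(vecs[:, 0])    # first principal component, close to +/- the x-axis
```

Each eigenvalue is the variance of the data along the corresponding eigenvector, which is why sorting by eigenvalue orders the components from most to least variance.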

Problem with PCA


PCA is often useful, but it works poorly if there are too many isotropically distributed clusters, or if there are meaningless variables (outliers) with a high noise level.
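Both failure modes can be illustrated with synthetic data. In this hypothetical NumPy sketch, a meaningless variable made of pure noise with a large spread captures the first principal component, even though the two other variables carry the actual structure. The second part uses a single isotropic cloud as the simplest isotropic case: its eigenvalues are nearly equal, so the directions PCA returns are essentially arbitrary.

```python
import numpy as np

def sorted_pca(X):
    """Eigenvalues/eigenvectors of the 1/N sample covariance, largest first."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(Xc.T @ Xc / len(Xc))
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

rng = np.random.default_rng(2)
n = 1000

# Two informative, strongly correlated variables (hypothetical signal)...
t = rng.normal(size=n)
signal = np.column_stack([t, t + 0.1 * rng.normal(size=n)])
# ...plus one meaningless variable that is pure noise with a large spread
noise = 10.0 * rng.normal(size=(n, 1))
vals, vecs = sorted_pca(np.hstack([signal, noise]))
pc1 = vecs[:, 0]
print(np.round(pc1, 3))  # dominated by the third (noise) coordinate

# An isotropic 2-D cloud: no direction has clearly more variance than another
iso_vals, _ = sorted_pca(rng.normal(size=(n, 2)))
print(iso_vals[0] / iso_vals[1])  # close to 1
```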
MATLAB and PCA



In MATLAB, PCA can be done with the command 'princomp' (Statistics Toolbox; newer releases recommend 'pca' instead).

Ex: (assume that X is a matrix with the input vectors as rows)
% Center the examples to make E[x]=0
% (size(X,1) = number of input vectors; princomp also centers X itself)
X = X - ones(size(X,1),1)*mean(X,1);

% PCA
% PC     = the principal components (as columns)
% SCORE  = the transformed matrix X
% LATENT = the eigenvalues of the covariance matrix of X
[PC, SCORE, LATENT] = princomp(X);
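For readers without MATLAB, here is a rough NumPy analogue of the call above. It is a sketch, not a port of princomp's source: it uses the singular value decomposition of the centered data, which yields the same principal components as the eigenvectors of the covariance matrix. `princomp_like` is a hypothetical name, LATENT is divided by n-1 to match MATLAB's `cov`, and the column signs may differ from MATLAB's output.

```python
import numpy as np

def princomp_like(X):
    """Rough NumPy analogue of [PC, SCORE, LATENT] = princomp(X).

    Rows of X are input vectors; X does not need to be centered beforehand.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                 # center, as princomp does
    # Xc = U * diag(s) * Vt; the rows of Vt are the principal directions
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    PC = Vt.T                               # principal components as columns
    SCORE = Xc @ PC                         # X in the new coordinate system
    LATENT = s ** 2 / (n - 1)               # covariance eigenvalues, descending
    return PC, SCORE, LATENT

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # hypothetical data
PC, SCORE, LATENT = princomp_like(X)
print(LATENT)
```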


Ref: [3], [4]