Multivariate Analysis
This booklet tells you how to use the R statistical software to carry out some simple multivariate analyses, with a focus on principal components analysis (PCA) and linear discriminant analysis (LDA).
PCA How To
http://psych.colorado.edu/wiki/lib/exe/fetch.php?media=labs:learnr:emily_-_principal_components_analysis_in_r:pca_how_to.pdf
To do PCA, all you have to do is follow these steps:
1. Get X in the proper form. This will probably mean subtracting off the means of
each row. If the variances are significantly different in your data, you may also
wish to scale each row by dividing by its standard deviation to give the rows a
uniform variance of 1 (the subtleties of how this affects your analysis are beyond
the scope of this paper, but in general, if you have significantly different scales in
your data it’s probably a good idea).
2. Calculate A = XX^T.
3. Find the eigenvectors of A and stack them as rows to make the matrix P.
4. Your new data is PX; the new variables (a.k.a. principal components) are the rows of P.
5. The variance of each principal component can be read off the diagonal of the covariance matrix of the transformed data PX.
# Obtain data in a matrix
Xoriginal=t(as.matrix(recorded.data))
# Center the data so that the mean of each row is 0
rm=rowMeans(Xoriginal)
X=Xoriginal-matrix(rep(rm, dim(Xoriginal)[2]), nrow=dim(Xoriginal)[1])
# Calculate P
A=X %*% t(X)
E=eigen(A,TRUE)
P=t(E$vectors)
# Find the new data and standard deviations of the principal components
newdata = P %*% X
sdev = sqrt(diag((1/(dim(X)[2]-1)* P %*% A %*% t(P))))
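As a quick sanity check (a sketch added here, not part of the original how-to; it assumes recorded.data from the tutorial is loaded), the hand-rolled result can be compared with R's built-in prcomp(), which expects observations in rows and variables in columns and so takes recorded.data directly:
# Cross-check the manual PCA against prcomp()
pr.check = prcomp(recorded.data)
rbind(manual = sdev, prcomp = pr.check$sdev)  # the two rows of standard deviations should agree
# The columns of pr.check$rotation equal the rows of P, up to sign.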
Going back to the derivation of PCA, we have N = PX, where N is our new data. Since we
know that P^-1 = P^T, it is easy to see that X = P^T N. Thus, if we know P and N, we can easily
recover X. This is useful, because if we choose to throw away some of the smaller
components—which hopefully are just noise anyway—N is a smaller dataset than X. But
we can still reconstruct data that is almost the same as X.
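As a minimal sketch (using the X, P, and newdata objects computed above), keeping only the first k rows of the transformed data gives a rank-k approximation of X:
# Reconstruct X from a reduced number of principal components
k = 2
Xapprox = t(P[1:k, , drop=FALSE]) %*% newdata[1:k, , drop=FALSE]  # approximation of X from k components
max(abs(t(P) %*% newdata - X))  # ~0: keeping every component recovers X exactly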
Run prcomp() again, but this time include the option tol=0.1. What is returned will be
any principal components whose standard deviation is greater than 10% of the standard
deviation of the first principal component. In this case, the first two components are
returned.
# Run PCA, inspect the components, and plot their variances (scree plot)
pr=prcomp(recorded.data)
pr
plot(pr)
# Standard deviation of each component relative to the first
barplot(pr$sdev/pr$sdev[1])
# Keep only components whose sdev exceeds 10% of the first component's sdev
pr2=prcomp(recorded.data, tol=.1)
plot.ts(pr2$x)
# quartz() opens a new plot window on macOS; use windows() or x11() on other systems
quartz(); plot.ts(intensities)
quartz(); plot.ts(recorded.data)
quartz(); plot.ts(cbind(-1*pr2$x[,1],pr2$x[,2]))
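To see which components the tol=.1 cutoff keeps, and how much of the variance each component explains, something like the following can be used (a sketch, assuming the pr object from above):
# Components whose standard deviation exceeds 10% of the first component's
which(pr$sdev > 0.1 * pr$sdev[1])
# Proportion of variance explained by each component
summary(pr)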
Because prcomp() works with variables in columns instead of rows as in the derivation above, the required transformation is X = NP^T, or in R syntax, X=pr$x %*% t(pr$rotation). Run the code below.
od=pr$x %*% t(pr$rotation)
od2=pr2$x %*% t(pr2$rotation)
quartz(); plot.ts(recorded.data)
quartz(); plot.ts(od)
quartz(); plot.ts(od2)
You can see that od, the reconstruction of X when no principal components were discarded, matches recorded.data (up to the column means that prcomp() subtracts by default), while od2 is the reconstruction of X from only the first two principal components.
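The comparison can also be made numerically (a sketch, assuming the objects above; remember that prcomp() centers the data by default, so the reconstructions recover the centered data):
# Compare the reconstructions with the centered data
centered = scale(recorded.data, center=TRUE, scale=FALSE)
max(abs(od - centered))    # ~0: the full reconstruction is exact
mean((od2 - centered)^2)   # error left by keeping only two components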