http://grass.osgeo.org/wiki/Principal_Components_Analysis
The SVD is a decomposition of any p x q matrix M into a product M = USVt, where U and V have orthonormal columns (UtU = VtV = I) and S is a diagonal matrix with non-negative real entries. Here, U is a p x q matrix, and S and V are q x q matrices (the "thin" SVD, assuming p >= q). The columns of U and V are known as the left and right singular vectors, respectively, and the entries along the diagonal of S are known as singular values. Note that when M is column-centered (each column has mean zero), the right singular vectors are eigenvectors of MtM (which is proportional to the covariance matrix), the left singular vectors are eigenvectors of MMt, and the square of a singular value is proportional to the variance along the corresponding principal direction. Therefore, a projection of the data matrix M onto the d-dimensional subspace with the largest variance may be obtained as MV = US, retaining only the d largest singular values and corresponding singular vectors.
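A minimal R sketch of this relationship; the matrix M and the choice d = 2 are made-up for illustration:
# Minimal sketch (made-up M, d = 2): the projection M V equals U S.
set.seed(1)
M <- scale(matrix(rnorm(100 * 5), nrow = 100), center = TRUE, scale = FALSE)
s <- svd(M)                              # M = U S Vt: s$u, s$d (singular values), s$v
d <- 2
proj <- M %*% s$v[, 1:d]                 # project onto the first d right singular vectors
US   <- s$u[, 1:d] %*% diag(s$d[1:d])    # U S restricted to the first d components
max(abs(proj - US))                      # ~ 0: M V = U S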
http://public.lanl.gov/mewall/kluwer2002.html
http://genome-www.stanford.edu/SVD/
pca.narod.ru/pcaclustclass.pdf
General properties of principal components (illustrated below)
– linear combinations of the original variables
– uncorrelated with each other
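A quick R check of these two properties on made-up data (prcomp is used just for convenience):
# Each PC is a linear combination of the original variables (coefficients in
# $rotation), and the PC scores are uncorrelated with each other.
set.seed(2)
X <- matrix(rnorm(50 * 4), ncol = 4)
p <- prcomp(X)
p$rotation            # columns = weights of the linear combinations
round(cor(p$x), 10)   # identity matrix: scores are uncorrelated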
Summary
• Dimension reduction important to visualize data
– Principal Component Analysis
– Clustering
• Hierarchical
• Partitioning (K-means)
(choice of distance measure is important; see the R sketch after this summary)
• Classification
– Reduction of dimension often necessary (t-test, PCA)
– Several classification methods available
– Validation
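A compact R sketch of several of these points, using the iris data purely as a stand-in example:
# Compact sketch on the iris data (species labels only used to inspect clusters).
data(iris)
X <- scale(iris[, 1:4])                        # put variables on comparable scales
pc <- prcomp(X)                                # dimension reduction for visualization
plot(pc$x[, 1:2], col = iris$Species)
hc <- hclust(dist(X), method = "average")      # hierarchical clustering, Euclidean distance
table(cutree(hc, k = 3), iris$Species)
km <- kmeans(X, centers = 3, nstart = 20)      # partitioning (K-means)
table(km$cluster, iris$Species)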
Linear Algebra
http://pillowlab.cps.utexas.edu/teaching/CompNeuro10/schedule.html
Data matrix A: rows = data points, columns = variables (attributes, parameters).
1. Center the data by subtracting the mean of each column.
2. Compute the SVD of the centered matrix Â (or only the first k singular values and vectors):
   Â = U S Vt.
3. The principal components are the columns of V; the coordinates of the data in the basis defined by the principal components (the scores) are U S.
% Data matrix A, columns: variables, rows: data points.
% MATLAB function for computing the first k principal components of A.
function [pc, score] = pca(A, k)
[rows, cols] = size(A);
Ameans = repmat(mean(A, 1), rows, 1); % every row holds the column means
A = A - Ameans;                       % center the data
[U, S, V] = svds(A, k);               % k is the number of PCs desired
pc = V;                               % principal components (loadings)
score = U*S;                          % now A is approximately score*pc' + Ameans
end
The variance in the direction of the kth principal component is given by the square of the corresponding singular value, s_k^2.
Singular values can be used to estimate how many principal components
to keep.
Rule of thumb: keep enough to explain 85% of the variation:
http://www.uta.edu/faculty/rcli/Teaching/math5392/NotesByHyvonen/lecture5.pdf
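A small R sketch of this rule of thumb; the centered matrix A is made-up and the singular values come from svd():
# Keep enough components to explain 85% of the variation.
set.seed(3)
A <- scale(matrix(rnorm(200 * 10), ncol = 10), center = TRUE, scale = FALSE)
d <- svd(A)$d                          # singular values, largest first
explained <- cumsum(d^2) / sum(d^2)    # cumulative proportion of variance explained
k <- which(explained >= 0.85)[1]       # smallest k reaching 85%
k
explained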
http://www.ncbi.nlm.nih.gov.proxy.lib.sfu.ca/pubmed/10963673
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=8768BD97E00D5306E8437C70EB103959?doi=10.1.1.115.3503&rep=rep1&type=pdf
A Tutorial on Principal Component Analysis. Jonathon Shlens. Systems Neurobiology Laboratory, Salk Institute for Biological Studies.
PCA via eigendecomposition = only works on square matrices (e.g. the covariance matrix)
SVD = a more general route to PCA; applies directly to any rectangular data matrix
PCA can fail if the data is very “non-Gaussian” – it assumes that the interesting directions lie along straight lines and are orthogonal to each other.
PCA is non-parametric; the most important components are taken to be the ones with the largest variance.
prcomp(dat) – calls svd() on the centered data; gives you sdev (square roots of the eigenvalues) and rotation (columns are the eigenvectors, a.k.a. loadings)
http://genetics.agrsci.dk/statistics/courses/Rcourse-DJF2006/day3/PCA-computing.pdf
biplot(prcomp(USArrests, scale = TRUE))
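For reference, a short sketch of what prcomp() returns, continuing the USArrests example above:
p <- prcomp(USArrests, scale = TRUE)
p$sdev        # standard deviations = square roots of the eigenvalues
p$rotation    # columns = eigenvectors (loadings)
head(p$x)     # scores: the data expressed in the principal-component basis
summary(p)    # proportion of variance explained by each component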
library(limma)
mm <- model.matrix(~PC1, pData(esetr))
fit <- lmFit(esetr, mm) #Fit linear model for each gene given a series of arrays
fit <- eBayes(fit) #Given a series of related parameter estimates and standard errors, compute moderated t-statistics, moderated F-statistic, and log-odds of differential expression by empirical Bayes shrinkage of the standard errors towards a common value.
topTable(fit) # Extract a table of the top-ranked genes from a linear model fit.
PCA for correcting batch effects
In ideal circumstances, with very consistent data, we expect all data points to form a single, cohesive grouping in a PCA plot of the samples. We also expect that any observed clustering will not be related to the primary phenotype. If there is any clustering of cases and controls, this is usually indicative of batch effects or other systematic differences in the generation of the data, and it may cause problems in association testing.
http://chemtree.com/SNP_Variation/tutorials/cnv-quality-control/pca.html
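A sketch of this kind of check in R; the expression matrix expr (genes x samples) and the case/control labels pheno below are hypothetical stand-ins:
# PCA check for batch effects / systematic differences between samples.
set.seed(5)
expr  <- matrix(rnorm(1000 * 20), nrow = 1000)
pheno <- factor(rep(c("case", "control"), each = 10))
pc <- prcomp(t(expr), center = TRUE)           # samples as rows
plot(pc$x[, 1], pc$x[, 2], col = pheno, pch = 19,
     xlab = "PC1", ylab = "PC2", main = "Samples in PC space")
# Cases and controls separating along PC1/PC2 would point to batch effects or
# other systematic differences rather than a single cohesive grouping.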
http://www.puffinwarellc.com/index.php/news-and-articles/articles/30-singular-value-decomposition-tutorial.html?start=2
http://www.miislita.com/information-retrieval-tutorial/reduced-svd.gif
http://spinner.cofc.edu/~langvillea/DISSECTION-LAB/Emmie%27sLSI-SVDModule/p4module.html
http://www.cbs.dtu.dk/chipcourse/Exercises/Ex_Stat/NormStatEx.html
T(V) = V transpose
X = U S T(V) = s1 u1 T(v1) + s2 u2 T(v2) + ... + sr ur T(vr),
where U = (u1, u2, ..., ur), V = (v1, v2, ..., vr), and S = diag{s1, s2, ..., sr} with s1 ≥ s2 ≥ ... ≥ sr > 0. The singular columns {ui} form an orthonormal basis for the column space spanned by the columns {cj}, and the singular rows {vj} form an orthonormal basis for the row space spanned by the rows {ri}. The vectors {ui} and {vi} are called singular columns and singular rows, respectively (Gabriel and Odoroff 1984); the scalars {si} are called singular values; and the matrices {si ui T(vi)} (i = 1, ..., r) are referred to as SVD components.
Image Compression
http://www.johnmyleswhite.com/notebook/2009/12/17/image-compression-with-the-svd-in-r/
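A short R sketch of the rank-k idea behind SVD image compression; a random matrix X stands in for the image and k = 20 is an arbitrary choice:
# Rank-k approximation: keep only the k largest SVD components s_i u_i T(v_i).
set.seed(4)
X <- matrix(rnorm(256 * 256), nrow = 256)
s <- svd(X)
k  <- 20
Xk <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])
dim(Xk)                        # same dimensions, but reconstructed from k components
sum(s$d[1:k]^2) / sum(s$d^2)   # fraction of the total variation retained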
http://n0b3l1a.blogspot.ca/2010/09/pca-principal-component-analysis.html