Monday, January 24, 2011

Linear models: lm(y ~ x)

t.test(a, b) - test whether the means of two groups of numeric values differ (e.g. whether a numeric variable is related to a two-level categorical variable)

t.test(data ~ sex)

or

t.test(data[sex == 'male'], data[sex == 'female'])

If the p-value is high and the confidence interval for the difference in means contains zero, there is no evidence that data differs between males and females.
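A small sketch showing that the two call forms agree. The simulated values, group sizes, and seed are made up for illustration; only t.test itself comes from the notes above.

```r
set.seed(1)

# Hypothetical data: 20 males and 20 females with slightly different means
sex  <- factor(rep(c("male", "female"), each = 20))
data <- rnorm(40, mean = ifelse(sex == "male", 10, 11))

# Formula form: groups are taken from the levels of sex (female, male)
by_formula <- t.test(data ~ sex)

# Two-vector form, with the groups passed in the same order
by_vectors <- t.test(data[sex == "female"], data[sex == "male"])

by_formula$p.value
by_vectors$p.value
by_formula$conf.int
```

Both calls run the default Welch test, so the p-values and confidence intervals match.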


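Use cor.test(age, data) to get a p-value and confidence interval for the correlation between two numeric variables. Both arguments must be numeric, so a categorical variable like sex would have to be coded as numbers first. For a rank-based statistic, pass method = 'spearman' or method = 'kendall' (the confidence interval is only reported for the default Pearson method).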
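A minimal sketch of cor.test on two numeric vectors, assuming simulated age and data values (the numbers and seed are invented for illustration):

```r
set.seed(2)

# Hypothetical numeric variables with a positive relationship
age  <- runif(30, min = 20, max = 60)
data <- 0.1 * age + rnorm(30)

# Default: Pearson correlation, with a confidence interval
pearson <- cor.test(age, data)
pearson$p.value
pearson$conf.int

# Rank-based alternative: Spearman's rho (no confidence interval returned)
spearman <- cor.test(age, data, method = "spearman")
spearman$estimate
spearman$p.value
```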

Even better is to use the 'lm(y ~ x)' function

o <- lm(data ~ sex)
summary(o)

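This gives the same p-value as t.test(data ~ sex, var.equal = TRUE). The default t.test uses the Welch correction for unequal variances, so its p-value can differ slightly.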
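To check how the lm p-value relates to the t-test, here is a sketch on simulated data (values and seed are made up). It compares the p-value for the sex coefficient with the equal-variance t-test and with the default Welch t-test.

```r
set.seed(1)

# Hypothetical data: numeric response and a two-level factor
sex  <- factor(rep(c("male", "female"), each = 20))
data <- rnorm(40, mean = ifelse(sex == "male", 10, 11))

o <- lm(data ~ sex)

# "female" is the reference level, so the slope coefficient is named "sexmale"
p_lm     <- summary(o)$coefficients["sexmale", "Pr(>|t|)"]
p_pooled <- t.test(data ~ sex, var.equal = TRUE)$p.value
p_welch  <- t.test(data ~ sex)$p.value

all.equal(p_lm, p_pooled)  # the pooled-variance t-test matches lm
c(lm = p_lm, pooled = p_pooled, welch = p_welch)
```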

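Plus, you can pass a fitted linear object to abline() to draw a best-fit line. The line only makes sense for a numeric predictor, so fit the model on age first:

o <- lm(data ~ age)
plot(age, data, xlab='Age')
abline(o)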


Use tryCatch to handle possible errors

lmFun <- function(x) {
  # x is one row of all.data; return NA instead of stopping if the fit fails
  tryCatch(summary(lm(x ~ sex)), error = function(e) NA)
}

lms <- apply(all.data, 1, lmFun)
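A runnable sketch of the same pattern. The matrix all.data, the factor sex, and the helper pFun are all hypothetical here. One row is set to all NA so that lm() fails on it, and the helper returns just the p-value so the result is a plain numeric vector.

```r
set.seed(3)

# Hypothetical setup: 3 variables (rows) measured on 20 samples (columns)
sex      <- factor(rep(c("male", "female"), each = 10))
all.data <- matrix(rnorm(60), nrow = 3)
all.data[2, ] <- NA  # a row with no usable values makes lm() throw an error

# Fit one row against sex; return NA if lm() or summary() fails
pFun <- function(x) {
  tryCatch(
    summary(lm(x ~ sex))$coefficients["sexmale", "Pr(>|t|)"],
    error = function(e) NA_real_
  )
}

pvals <- apply(all.data, 1, pFun)
pvals
```

Without tryCatch, the failing row would stop apply() and none of the results would be kept.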

attributes() is like class(data), only with more info

attributes(data)
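For instance, on a small factor (a made-up example, not from the data above), class() reports only the class while attributes() also lists the levels:

```r
# Hypothetical factor for illustration
f <- factor(c("male", "female", "male"))

class(f)       # just the class name
attributes(f)  # a list holding the levels as well as the class
```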
