Friday, August 26, 2011

Emulating Galaxy's Join operation on intervals with R's merge()

http://stackoverflow.com/questions/1299871/how-to-join-data-frames-in-r-inner-outer-left-right

> df1 <- data.frame(CustomerId=c(1:6),Product=c(rep("Toaster",3),rep("Radio",3)))
> df2 <- data.frame(CustomerId=c(2,4,6),State=c(rep("Alabama",2),rep("Ohio",1)))
> df3 <- data.frame(CustomerId=c(3,4,1),Food=c(rep("Pancake",2),rep("Cereal",1)))

> merge(merge(df1, df2, all.x=T), df3, all.x=T)
  CustomerId Product   State    Food
1          1 Toaster    <NA>  Cereal
2          2 Toaster Alabama    <NA>
3          3 Toaster    <NA> Pancake
4          4   Radio Alabama Pancake
5          5   Radio    <NA>    <NA>
6          6   Radio    Ohio    <NA>
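
The call above is a left outer join applied twice. As a rough sketch of how merge()'s all, all.x and all.y arguments map onto the usual join types, here are the four variants on the same df1 and df2 (redefined so the block runs on its own; the variable names inner, left, right and outer are just illustrative):

```r
df1 <- data.frame(CustomerId = 1:6,
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

# merge() joins on the columns the two data frames share (here CustomerId).
inner <- merge(df1, df2)                # only CustomerIds present in both
left  <- merge(df1, df2, all.x = TRUE)  # every row of df1, NA where df2 has no match
right <- merge(df1, df2, all.y = TRUE)  # every row of df2, NA where df1 has no match
outer <- merge(df1, df2, all = TRUE)    # every row of either data frame

# Row counts show the difference between the join types.
sapply(list(inner = inner, left = left, right = right, outer = outer), nrow)
```

With this particular data every CustomerId in df2 also appears in df1, so the right join returns the same rows as the inner join and the full outer join returns the same rows as the left join.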

Hmm... the problem is that merge() only joins on keys that match exactly, and the intervals are not exactly the same (intersectBed handles this by matching on overlap).
But wait! You CAN run intersectBed first and then read each BED file into R!
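
For small inputs, an overlap join can also be approximated inside R without leaving merge(). This is only a sketch under assumptions: the two interval sets, their coordinates and the column names (chrom, start, end, name) are made up for illustration, and intervals are treated BED-style (0-based start, exclusive end). The idea is to merge on chromosome alone and then filter the resulting pairs down to those that overlap.

```r
# Two toy interval sets (hypothetical data, BED-style coordinates).
a <- data.frame(chrom = c("chr1", "chr1", "chr2"),
                start = c(100, 500, 100),
                end   = c(200, 600, 300),
                name  = c("a1", "a2", "a3"))
b <- data.frame(chrom = c("chr1", "chr1", "chr2"),
                start = c(150, 700, 250),
                end   = c(250, 800, 400),
                name  = c("b1", "b2", "b3"))

# Step 1: pair every interval in a with every interval in b on the same chromosome.
# Non-key columns that share a name get the suffixes .a and .b.
pairs <- merge(a, b, by = "chrom", suffixes = c(".a", ".b"))

# Step 2: keep only the pairs whose intervals overlap by at least one base.
hits <- pairs[pairs$start.a < pairs$end.b & pairs$start.b < pairs$end.a, ]
hits[, c("chrom", "name.a", "name.b")]
```

Step 1 builds every same-chromosome pair before filtering, so memory use grows with the product of the two interval counts per chromosome. For genome-scale files, running intersectBed and loading its tab-separated output with read.table() is likely the more practical route.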


You can also compute per-group summaries with the aggregate() function:


> df1
  CustomerId Product
1          1 Toaster
2          2 Toaster
3          3 Toaster
4          4   Radio
5          5   Radio
6          6   Radio
> aggregate(df1$CustomerId, by=list(df1$Product), FUN=mean)
  Group.1 x
1   Radio 5
2 Toaster 2
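
The same summary can also be written with aggregate()'s formula interface, which keeps the original column names in the result instead of Group.1 and x. A minimal sketch, with df1 redefined so the block runs on its own (by_product is just an illustrative name):

```r
df1 <- data.frame(CustomerId = 1:6,
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))

# Formula interface: summarise CustomerId within each level of Product.
by_product <- aggregate(CustomerId ~ Product, data = df1, FUN = mean)
by_product

# The result is an ordinary data frame, so it can be fed straight back into merge().
merge(df1, by_product, by = "Product", suffixes = c("", ".mean"))
```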


http://www.statmethods.net/management/aggregate.html
