Sunday, October 31, 2010

Alignment

Free-shift (semi-global) alignments do not penalize gaps at the beginning and end of the sequences, while global alignments try to align all positions end to end.
Use free-shift alignments when some of the sequences are terminally truncated. Local alignments (such as BLAST) are useful for
finding short stretches of homology, such as sequence overlaps or short internal stretches that
are shared.
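A minimal sketch of the global vs free-shift difference, assuming a simple linear gap penalty. The function name and the scoring values (match=1, mismatch=-1, gap=-2) are made up for illustration and are not taken from any of the linked slides.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, free_ends=False):
    """Best alignment score of a and b by dynamic programming.

    free_ends=False: global alignment, every gap is penalized.
    free_ends=True:  free-shift (semi-global), leading and trailing gaps are free.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not free_ends:
        # Global: a leading gap of length k costs k * gap.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    if free_ends:
        # Free-shift: the alignment may stop anywhere on the last row or column.
        return max(max(score[-1]), max(row[-1] for row in score))
    return score[-1][-1]


full = "ACGTACGTAC"
truncated = "GTACGT"  # internal piece of `full`, missing both ends
global_score = alignment_score(full, truncated)
semi_score = alignment_score(full, truncated, free_ends=True)
print(global_score, semi_score)
```

With these toy scores the truncated sequence is punished for its four missing terminal positions in the global case, but not in the free-shift case.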


http://www.uoguelph.ca/plant/depttools/dnaanalysis.htm
www.cs.ecu.edu/hochberg/spring2006/LocalAlign.pdf
birg.cs.wright.edu/text/Ch2.ppt 

Thursday, October 28, 2010

ncRNA non-coding RNA review papers

Galasso M, Elena Sana M, Volinia S. Non-coding RNAs: a key to future
personalized molecular therapy? Genome Med. 2010 Feb 18;2(2):12. PubMed PMID: 20236487; PubMed Central PMCID: PMC2847703.
http://www.ncbi.nlm.nih.gov/pubmed/20236487

Harrison BR, Yazgan O, Krebs JE. Life without RNAi: noncoding RNAs and their functions in Saccharomyces cerevisiae. Biochem Cell Biol. 2009 Oct;87(5):767-79. Review. PubMed PMID: 19898526.
http://www.ncbi.nlm.nih.gov/pubmed/19898526

Fabbri M, Calin GA. Beyond genomics: interpreting the 93% of the human genome that does not encode proteins. Curr Opin Drug Discov Devel. 2010 May;13(3):350-8. Review. PubMed PMID: 20443168.
http://www.ncbi.nlm.nih.gov/pubmed/20443168

Majer A, Booth SA. Computational methodologies for studying non-coding RNAs relevant to central nervous system function and dysfunction. Brain Res. 2010 Jun 18;1338:131-45. Epub 2010 Apr 8. Review. PubMed PMID: 20381467.
http://www.ncbi.nlm.nih.gov/pubmed/20381467

Zheng L, Qu L. Computational RNomics: structure identification and functional
prediction of non-coding RNAs in silico. Sci China Life Sci. 2010
May;53(5):548-62. Epub 2010 May 23. PubMed PMID: 20596938.
http://www.ncbi.nlm.nih.gov/pubmed/20596938

ncRNA, probe, GWAS

A method for automatically extracting infectious disease-related primers and probes from the literature
http://www.biomedcentral.com/1471-2105/11/410

Classification of ncRNAs using position and size information in deep
sequencing data
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2935403/?tool=pubmed

Forward-time simulation of realistic samples for genome-wide association studies
http://www.biomedcentral.com/1471-2105/11/442
Genetic drift or allelic drift is the change in the frequency of a gene variant (allele) in a population due to random sampling. The alleles in the offspring are a sample of those in the parents, and chance has a role in determining whether a given individual survives and reproduces. Contrast with natural selection, where allele frequencies change because of differences in fitness rather than chance.
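A rough sketch of drift as pure random sampling, using a Wright-Fisher style model. This is an assumption chosen for illustration, not the simulator from the paper linked above, and the function name and parameters are hypothetical.

```python
import random


def wright_fisher(p0, pop_size, generations, seed=0):
    """Allele frequency over time when each generation resamples the last.

    pop_size diploid individuals carry 2 * pop_size allele copies; every copy
    in the next generation is drawn at random from the current frequency.
    No selection or mutation is modelled.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    p = p0
    freqs = [p]
    for _ in range(generations):
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        freqs.append(p)
    return freqs


trajectory = wright_fisher(p0=0.5, pop_size=20, generations=50)
print(trajectory[:5])
```

Small populations wander faster, and once the frequency hits 0 or 1 the allele is lost or fixed and stays that way.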

In population genetics, linkage disequilibrium is the non-random association of alleles at two or more loci, not necessarily on the same chromosome. It is not the same as linkage, which describes the association of two or more loci on a chromosome with limited recombination between them. Linkage disequilibrium describes a situation in which some combinations of alleles or genetic markers occur more or less frequently in a population than would be expected from a random formation of haplotypes from alleles based on their frequencies. Non-random associations between polymorphisms at different loci are measured by the degree of linkage disequilibrium (LD). Numerically, it is the difference between the observed haplotype frequency and the frequency expected if the alleles were associated at random (the product of the allele frequencies).
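As a worked example of that difference for two biallelic loci, D = p(AB) - p(A) * p(B), with r^2 as a common normalized form. The haplotype labels and counts below are invented for illustration.

```python
def linkage_disequilibrium(haplotype_counts):
    """Return (D, r_squared) from counts of the haplotypes AB, Ab, aB, ab."""
    total = sum(haplotype_counts.values())
    p_hap_ab = haplotype_counts["AB"] / total
    p_a = (haplotype_counts["AB"] + haplotype_counts["Ab"]) / total
    p_b = (haplotype_counts["AB"] + haplotype_counts["aB"]) / total
    # Observed haplotype frequency minus the frequency expected at random.
    d = p_hap_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared


associated = {"AB": 40, "Ab": 10, "aB": 10, "ab": 40}
independent = {"AB": 25, "Ab": 25, "aB": 25, "ab": 25}
d_assoc, r2_assoc = linkage_disequilibrium(associated)
d_indep, r2_indep = linkage_disequilibrium(independent)
print(d_assoc, r2_assoc, d_indep, r2_indep)
```

In the first table AB and ab are over-represented, so D is positive; in the second every haplotype occurs at its expected frequency and D is zero.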

Detection and characterization of novel sequence insertions using paired-end next-generation sequencing.
http://bioinformatics.oxfordjournals.org/content/26/10/1277.full

Wednesday, October 27, 2010

imagemagick convert - split single multi-pdf to many pdfs

http://ardvaark.net/useful-pdf-imagemagick-recipes

Split single multi-pdf to many pdfs

$ convert -quality 100 -density 300x300 in.pdf multi%d.pdf

# combine, may increase in size by a lot
$ convert -density 150 pdf1.pdf pdf2.pdf out.pdf

or better
$ pdftk pdf1.pdf pdf2.pdf cat output temp.pdf

Tuesday, October 26, 2010

R aggregation

> x
  j word journ
1 1    p     b
2 2    g     b
3 3    p     d
4 4    p     b
5 5    p     d
> with(x, tapply(word, journ, length))
b d
3 2

Monday, October 25, 2010

Perl Monks for your Perl programming needs

http://www.perlmonks.org

Critical thinking

http://en.wikipedia.org/wiki/Critical_thinking

Critical thinking clarifies goals, examines assumptions, discerns hidden values, evaluates evidence, accomplishes actions, and assesses conclusions.
"Critical" as used in the expression "critical thinking" connotes the importance or centrality of the thinking to an issue, question or problem of concern. "Critical" in this context does not mean "disapproval" or "negative." There are many positive and useful uses of critical thinking, for example formulating a workable solution to a complex personal problem, deliberating as a group about what course of action to take, or analyzing the assumptions and the quality of the methods used in scientifically arriving at a reasonable level of confidence about a given hypothesis. Using strong critical thinking we might evaluate an argument, for example, as worthy of acceptance because it is valid and based on true premises. Upon reflection, a speaker may be evaluated as a credible source of knowledge on a given topic.
Critical thinking can occur whenever one judges, decides, or solves a problem; in general, whenever one must figure out what to believe or what to do, and do so in a reasonable and reflective way. Reading, writing, speaking, and listening can all be done critically or uncritically. Critical thinking is crucial to becoming a close reader and a substantive writer. Expressed most generally, critical thinking is "a way of taking up the problems of life."

Friday, October 22, 2010

Transcription factor, PROFESS, SPA

High Resolution Models of Transcription Factor-DNA Affinities Improve In Vitro and In Vivo Binding Predictions: http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000916
 
Semi-supervised recursively partitioned mixture models for identifying cancer subtypes
http://bioinformatics.oxfordjournals.org/content/early/2010/08/15/bioinformatics.btq470.full.pdf+html

PROFESS: a PROtein Function, Evolution, Structure and Sequence database: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2911846/
The advantage of the ‘answering queries using views’ approach to the database integration problem is that it reduces the integration problem to two steps: (i) building wrappers of the source databases, thereby providing simple ‘views’, and (ii) applying standard database queries on the views. Thus, implementing wrappers enables a robust query system that incorporates a variety of similarity functions capable of generating data relationships not conceived during the creation of the database. This will allow the user to move beyond simple text-based queries. Therefore, the PROFESS (PROtein Function, Evolution, Structure and Sequence) database uses wrappers to assist in the structural, functional and evolutionary analysis of the abundant number of novel proteins continually identified from whole-genome sequencing.
eggNOG http://eggnog.embl.de/
evolutionary genealogy of genes: Non-supervised Orthologous Groups

multiple structural alignment program, MAMMOTH-mult
http://ub.cbm.uam.es/mammoth/mult/

PROFESS
http://bionmr-c1.unl.edu/

Edit distance, e.g. kitten -> sitting needs 3 single-character edits; useful for autocomplete
http://en.wikipedia.org/wiki/Levenshtein_distance
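A small sketch of the Levenshtein distance using the usual row-by-row dynamic programming table, followed by a toy autocomplete-style lookup. The candidate word list is invented for illustration.

```python
def levenshtein(source, target):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # previous[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")
candidates = ["kitten", "sitting", "mitten"]
suggestion = min(candidates, key=lambda word: levenshtein("kiten", word))
print(distance, suggestion)
```

For suggestion-style use, picking the candidate with the smallest distance to what was typed is the simplest approach; real autocomplete usually also weighs prefixes and word frequency.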


SPA: Short peptide analyzer of intrinsic disorder status of short peptides
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2900848/?tool=pubmed

Biological Insights of Transcription Factor through Analyzing ChIP-Seq Data
Kaida Ning, 2009

LaTeX, Sweave, R

\usepackage{hyperref}
\usepackage{tabularx}
\usepackage{listings}
\usepackage{graphicx}
\usepackage{url}
\usepackage{cite}


$ R CMD Sweave foo.Rnw ; texi2pdf foo.tex

http://www.stat.umn.edu/~charlie/Sweave/
http://www.stat.umn.edu/~charlie/Sweave/foo.pdf
http://www.stat.umn.edu/~charlie/Sweave/foo.Rnw

\pagebreak[3]
\verb@Sweave@

\begin{verbatim}
latex foo
\end{verbatim}


Figure~\ref{fig:one} (p.~\pageref{fig:one})


<<label=fig1plot,include=FALSE>>=
plot(x, y)
abline(out1)
@


\begin{figure}
\begin{center}
<<label=fig1,fig=TRUE,echo=FALSE>>=
<<fig1plot>>
@
\end{center}
\caption{Scatter Plot with Regression Line}
\label{fig:one}
\end{figure}



<<>>=
n <- 50
x <- seq(1, n)
a.true <- 3
b.true <- 1.5
y.true <- a.true + b.true * x
s.true <- 17.3
y <- y.true + s.true * rnorm(n)
out1 <- lm(y ~ x)
summary(out1)
@





The commands in package lattice have different behavior than the standard plot commands in
the base package: lattice commands return an object of class "trellis", the actual plotting is
performed by the print method for the class. Encapsulating calls to lattice functions in print()
statements should do the trick, e.g.:
<<fig=TRUE>>=
library(lattice)
print(bwplot(1:10))
@



BibTeX and bibliography styles

http://amath.colorado.edu/documentation/LaTeX/reference/faq/bibstyles.html

The two LaTeX editors that I found most useful are:
 1. Texmaker - lightweight, spell-check, have to press F1 and F3 to generate a PDF, need to click on the log window a lot
 2. Kile - nicer, but I couldn't get syntax coloring to work for Sweave, spell-check, one-click gets you a PDF


Trick
- if you rename your '.Rnw' to '.tex' as a work around, it plays nicely with the editors
- then once all formatting is done, copy '.tex' to '.Rnw' and run the command
- or just create a tex symlink to Rnw!   ln -s mydoc.Rnw mydoc.tex

R CMD Sweave mydoc.Rnw && texi2pdf mydoc.tex && evince mydoc.pdf

In R, call
> Stangle(file='foo.Rnw')
to extract the R code, WARNING: this will overwrite 'foo.R'!!!!!


quantitative trait loci (QTL)

A common approach to understanding the genetic basis of complex traits is through identification of associated quantitative trait loci (QTL). Fine mapping QTLs requires several generations of backcrosses and analysis of large populations, which is a time-consuming and costly effort.

http://www.biomedcentral.com/1471-2105/11/525/abstract

http://www.biomedcentral.com/1471-2105/11/526/abstract

Wednesday, October 20, 2010

Splicing, chemical compound classification, structural variations

Deciphering the Splicing Code
Barash et al. Nature 465, 53-59 (6 May 2010)
http://www.nature.com/nature/journal/v465/n7294/full/nature09000.html
- A unique aspect of our approach is that it searches for a regulatory
code that maximizes a quantifiable measure of code quality, so as to
jointly account for many features and produce a predictive splicing
code.
- To achieve this, we introduce an information-theoretic
measure of ‘code quality’
- Our method seeks a code that is able to predict the splicing patterns of
all exons as accurately as possible, based solely on the tissue type and
proximal RNA features.
- We use a measure of ‘code quality’ that is based on information
theory (see Methods). It can be viewed as the amount of information
about genome-wide tissue-dependent splicing accounted for by
the code. A code quality of zero indicates that the predictions are no
better than guessing, whereas a higher code quality indicates
improved prediction capability.

SVDetect: a tool to identify genomic structural variations from paired-end and mate-pair sequencing data.
Zeitouni B, Boeva V, Janoueix-Lerosey I, Loeillet S, Legoix-né P, Nicolas A, Delattre O, Barillot E.
http://bioinformatics.oxfordjournals.org/content/26/15/1895.long

Semantic Similarity for Automatic Classification of Chemical Compounds
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000937
- Their semantic
similarity, as measured with a simGIC method in the whole
ontology, is 0.324, and their structural similarity, as measured with
the FP3 format, is 0.667.
- The best approaches existing today are based on the structure-
activity relationship premise (SAR), which states that biological
activity of a molecule is strongly related to its structural or
physicochemical properties.
- We dubbed the novel
approach Chym, for Chemical Hybrid Metric. We extract
semantic information from ChEBI, the Chemical Entities


Detection of splice junctions from paired-end RNA-seq data by SpliceMap
http://nar.oxfordjournals.org/content/38/14/4570.abstract
TopHat
http://tophat.cbcb.umd.edu/
~151,317 exon junctions in human brain tissue, including 23,020 novel junctions not reported in RefSeq, Ensembl, or KnownGene
- expression level (in RPKM)
- Novel junction discovery is the major function of SpliceMap, which therefore cannot be replaced by annotation-independent ERANGE
- SpliceMap generates the seeding by using short-read alignment tools such
as ELAND and SeqMap, while BLAT makes use of a hash table.
- For the Illumina protocol used to produce our data, the distance between two
paired-end reads is about 200 nt in the mRNA

Sensitivity (identify true +) and Specificity (identify true -)

http://en.wikipedia.org/wiki/Sensitivity_and_specificity

Sensitivity and specificity are statistical measures of the performance of a binary classification test. Sensitivity (also called recall rate in some fields) measures the proportion of actual positives which are correctly identified as such (e.g. the percentage of sick people who are correctly identified as having the condition). Specificity measures the proportion of negatives which are correctly identified (e.g. the percentage of healthy people who are correctly identified as not having the condition). These two measures are closely related to the concepts of type I and type II errors. A theoretical, optimal prediction can achieve 100% sensitivity (i.e. predict all people from the sick group as sick) and 100% specificity (i.e. not predict anyone from the healthy group as sick).
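A quick sketch of both measures computed from paired true/predicted labels. The ten "patients" below are invented to make the arithmetic easy to check.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for two equal-length lists of booleans."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)
    true_neg = sum(1 for a, p in pairs if not a and not p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    sensitivity = true_pos / (true_pos + false_neg)  # sick correctly called sick
    specificity = true_neg / (true_neg + false_pos)  # healthy correctly called healthy
    return sensitivity, specificity


# 4 sick people followed by 6 healthy people
actual = [True] * 4 + [False] * 6
predicted = [True, True, True, False, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(actual, predicted)
print(sens, spec)
```

One sick person is missed (a false negative, type II error) and one healthy person is flagged (a false positive, type I error), giving 3/4 and 5/6.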

Paired-end (PE, 500bp) vs. Mate Pairs (longer PE, for structural variations, 2-10kbp)

http://seqanswers.com/forums/showthread.php?t=503&page=2
http://investor.illumina.com/phoenix.zhtml?c=121127&p=irol-newsArticle_print&ID=1248574&highlight=
Illumina refers to "paired end" as the original library preparation method they use, where you sequence each end of the same molecule. Because of the way the cluster generation technology works, it is limited to an inter-pair distance of ~300bp (200-600bp).

Illumina refers to "mate pairs" as sequences derived from their newer library prep method which is designed to provide paired sequences separated by a greater distance (between about 2 and 10kb). This method still actually only sequences the ends of ~400bp molecules, but this template is derived from both ends of a 2-10kb fragment that has had the middle section cut out and the 'internal' ends ligated in the middle. Basically, you take your 2-10kb random fragments, biotinylate the end, circularise them, shear the circles to ~400bp, capture biotinylated molecules, and then sequence those (they go into what is essentially a standard 'paired end' sample prep procedure).

http://www.nature.com/ng/journal/v37/n7/full/ng1562.html

http://www.nature.com/nature/journal/v431/n7011/full/nature03001.html

Used for studying structural variations

Tuesday, October 19, 2010

Python NetworkX

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

http://networkx.lanl.gov/

Data modelling in Rails

http://www.mongodb.org/display/DOCS/MongoDB+Data+Modeling+and+Rails

To find all the stories voted on by a given user:
Story.all(:conditions => {:voters => @user.id}) 
 
http://biodegradablegeek.com/2008/07/how-to-use-fixtures-to-populate-your-database-in-rails/ 
$ rake db:fixtures:load

jruby -S gem list

Guide to Rails command line
http://guides.rubyonrails.org/command_line.html

$ RAILS_ENV=development jruby -S gem list

*** LOCAL GEMS ***

actionmailer (2.3.5)
actionpack (2.3.5)
activerecord (2.3.5)
activerecord-jdbc-adapter (0.9.3)
activerecord-jdbcmysql-adapter (0.9.3)
activeresource (2.3.5)
activesupport (2.3.5)
block_helpers (0.3.2)
builder (2.1.2)
columnize (0.3.1)
cucumber (0.9.2)
cucumber-rails (0.3.2)
diff-lcs (1.1.2)
factory_girl (1.2.4)
gem_plugin (0.2.3)
gherkin (2.2.9)
haml (2.2.22)
jdbc-mysql (5.0.4)
jruby-jars (1.5.3)
jruby-openssl (0.6)
jruby-rack (1.0.3)
json (1.4.6)
rack (1.0.1)
rails (2.3.5)
rake (0.8.7)
rdoc (2.5.11)
rdoc-data (2.5.3)
rspec (1.3.0)
rspec-core (2.0.1)
rspec-expectations (2.0.1)
rspec-mocks (2.0.1)
rspec-rails (1.3.2)
ruby-debug (0.10.3)
ruby-debug-base (0.10.3.2)
ruby-openid (2.1.7)
rubyzip (0.9.4)
sources (0.0.1)
term-ansicolor (1.0.5)
warbler (1.2.1, 0.9.13)

$ RAILS_ENV=development jruby -S ./script/generate rspec_model users

http://lukeredpath.co.uk/blog/developing-a-rails-model-using-bdd-and-rspec-part-1.html

http://rspec.info/
http://rspec.info/documentation/

$ jruby -S spec/models/users_spec.rb
spec/models/users_spec.rb:1:in `require': no such file to load -- spec_helper (LoadError)
    from spec/models/users_spec.rb:1
$ cd spec/
$ jruby -S models/users_spec.rb

http://github.com/rspec/rspec-dev

Sunday, October 17, 2010

ssh ssh-keygen proxy tunnel

Setup: first connect to host1, then tunnel from host1 to host2

$ cat ~/.ssh/config
# 'i' is a short alias for host2
Host host2 i
    HostName host2
    User mylogin
    # port 28 is what I noted down; sshd's default port is 22
    ProxyCommand ssh -q mylogin@host1 nc host2 28

Connect by:
$ ssh i -l mylogin

Copy by:
$ scp myfile.txt mylogin@host2:outfile.txt

Setup RSA keys by
$ ssh-keygen -t rsa

copy ~/.ssh/id_rsa.pub to host2's ~/.ssh/authorized_keys


Using pipes and tar ('f -' makes tar read/write the archive on stdin/stdout explicitly)
tar cf - somefiles*.txt | ssh user1@host1 tar xvpf -
tar cf - somefiles*.txt | ssh user1@host1 ssh user2@host2 tar xvpf -

'pv' command - progress bar view
tar cf - somefiles*.* | pv -s 75m | ssh user1@host1 tar xpf -

Thursday, October 14, 2010

PAM vs BLOSUM

http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Scoring2.html

The relationship between BLOSUM and PAM substitution matrices. BLOSUM matrices with higher numbers and PAM matrices with low numbers are both designed for comparisons of closely related sequences. BLOSUM matrices with low numbers and PAM matrices with high numbers are designed for comparisons of distantly related proteins. If distant relatives of the query sequence are specifically being sought, the matrix can be tailored to that type of search.

The PAM family


PAM matrices are based on global alignments of closely related proteins.

The PAM1 is the matrix calculated from comparisons of sequences with no more than 1% divergence.

BLOSUM matrices are based on local alignments.


BLOSUM 62 is a matrix calculated from comparisons of sequences with no more than 62% identity (more similar sequences are clustered before substitutions are counted).

All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins.

BLOSUM 62 is the default matrix in BLAST 2.0.

PeakPicker

http://genome.cshlp.org/content/15/11/1584.full

PeakPicker is developed for quantitative allele ratio analysis and can be used to determine differential allelic expression in cells heterozygous for a marker SNP expressed in mRNA by measuring and calculating the peak height ratios of the marker SNP.

WebLogo

http://weblogo.berkeley.edu/examples.html
WebLogo is a web-based application designed to make the generation of sequence logos as easy and painless as possible.
Sequence logos are a graphical representation of an amino acid or nucleic acid multiple sequence alignment developed by Tom Schneider and Mike Stephens. Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of the stack indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of each amino or nucleic acid at that position. In general, a sequence logo provides a richer and more precise description of, for example, a binding site, than would a consensus sequence.
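A rough sketch of how the stack and symbol heights can be computed for a DNA alignment: total height is log2(4) minus the Shannon entropy of the column, and each symbol gets its share by relative frequency. This leaves out the small-sample correction that real logo tools apply, and the four-sequence alignment is invented.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Return (stack height in bits, per-symbol heights) for one alignment column."""
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    total_bits = math.log2(alphabet_size) - entropy
    # Each symbol's height is its relative frequency times the stack height.
    heights = {symbol: (c / n) * total_bits for symbol, c in counts.items()}
    return total_bits, heights


alignment = ["ACGT", "ACGA", "ACTC", "AGTG"]
columns = ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]
stack_bits = [column_information(col)[0] for col in columns]
print(stack_bits)
```

The fully conserved first column reaches the 2-bit maximum, the evenly split third column gets 1 bit, and the last column, where all four bases appear once, gets no height at all.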

Tuesday, October 12, 2010

JRuby, Ant, Tomcat, Rails

http://gregmoreno.ca/deploy-a-rails-3-sqlite3-application-in-tomcat-using-jruby/

http://blog.emptyway.com/2008/04/08/120-seconds-guide-to-jruby-on-rails/
 
(jruby 1.5.3, Rails version 2.3.5)
jruby -S rails myapp -d mysql
cd myapp 
jruby -S rake db:create:all
jruby script/generate scaffold post title:string body:text published:boolean
jruby script/generate model keywords keyword:string source:string
jruby script/generate migration add_word_to_keywords
jruby -S rake db:migrate
jruby script/server 
 
http://wiki.rubyonrails.org/rails/pages/availablegenerators 
http://www.tutorialspoint.com/ruby-on-rails/rails-and-rake.htm 

Reset MySQL password

$ sudo service mysql stop
$ sudo mysqld_safe --skip-grant-tables &
$ mysql -u root
mysql> update mysql.user set password=password('newpassword') where user='root';
mysql> flush privileges;

http://www.tech-faq.com/how-do-i-reset-a-mysql-password.html

To assign passwords to the root accounts using mysqladmin, execute the following commands:
shell> mysqladmin -u root password "newpwd"
shell> mysqladmin -u root -h host_name password "newpwd"

Monday, October 11, 2010

Word Clouds in R

http://www.r-bloggers.com/abstract-word-clouds-using-r/

http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetch_help.html

http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/esearch_help.html

http://math.illinoisstate.edu/dhkim/rstuff/rtutor.html

> library(lattice)
> x
  a b
4 d 4
3 c 3
2 b 2
1 a 1
> x[order(x$b,decreasing=TRUE),]
> xyplot(b ~ a, data = x, groups=a, ylab='', xlab='', scales=list(x=list(tck=0, at=0),y=list(tck=0, at=0)), panel = function(x,y,subscripts,groups) ltext(x = c(mean(y),sample(1:max(y-1))), y = c(mean(y),sample(1:max(y-1))), label=groups[subscripts], cex=1*y^1.5, fontfamily = c("AvantGarde", "Bookman", "Courier", "Helvetica", "Helvetica-Narrow", "NewCenturySchoolbook", "Palatino", "Times"), col=c('red','blue')))

Loss of heterozygosity (LOH)

Loss of heterozygosity (LOH) in a cell represents the loss of normal function of one allele of a gene in which the other allele was already inactivated. This term is mostly used in the context of oncogenesis; after an inactivating mutation in one allele of a tumor suppressor gene occurs in the parent's germline cell, it is passed on to the zygote resulting in an offspring that is heterozygous for that allele. In oncology, loss of heterozygosity occurs when the remaining functional allele in a somatic cell of the offspring becomes inactivated by mutation. This could cause a normal tumor suppressor to no longer be produced which could result in tumorigenesis.

Canon MP560 Scanner in Ubuntu

http://ubuntuforums.org/showthread.php?t=1264928&page=3

Install Canon MP560 ScanGear
http://software.canon-europe.com/products/0010756.asp
$ cd scangearmp-mp560series-1.40-1-i386-deb
$ vi ./install.sh   # add sudo dpkg --force-architecture -i
$ sudo ./install.sh
$ scangearmp
scangearmp: error while loading shared libraries: libgimp-2.0.so.0: cannot open shared object file: No such file or directory

Install Gimp lib32 libraries
$ wget http://mirrors.kernel.org/ubuntu/pool/main/g/gimp/libgimp2.0_2.6.7-1ubuntu1_i386.deb
$ mkdir libgimp2.0_2.6.7-1ubuntu1_i386
$ dpkg -x libgimp2.0_2.6.7-1ubuntu1_i386.deb libgimp2.0_2.6.7-1ubuntu1_i386
$ cd libgimp2.0_2.6.7-1ubuntu1_i386/
$ sudo cp usr/lib/libgimp* /usr/lib32/

And if all else fails, you can still scan images to a USB stick!

Friday, October 8, 2010

Howl's Moving Castle

Nice movie!

Director:

Hayao Miyazaki

When an unconfident young woman is cursed with an old body by a spiteful witch, her only chance of breaking the spell lies with a self-indulgent yet insecure young wizard and his companions in his legged, walking home. 

Materialized views

http://www.dba-oracle.com/art_9i_mv.htm

Michael Sjoerdsma sfu technical writing

http://www.sfu.ca/immr/pmp/people.htm

Michael is currently a faculty member in the School of Engineering Science
at Simon Fraser University teaching courses related to technical writing,
group dynamics, graphical communication, and ethics and law.

Tuesday, October 5, 2010

Ruby create a class object with name determined at runtime, calling dynamic methods

http://ruby-doc.org/docs/ProgrammingRuby/html/ospace.html

Calling a method whose name is not known until runtime.

When we only get the method's name at runtime, call it with the 'send' method
ruby-1.8.7-p302 > 'John Coltran'.send('length')
 => 12


or using 'method' to call it later

or 'eval'

r = eval "'John Coltran'.length"
 => 12

-------------------------------------

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/190828

$ irb
ruby-1.8.7-p302 > b=Object::const_get('String').new()
 => ""
ruby-1.8.7-p302 > b
 => ""
ruby-1.8.7-p302 > b.class
 => String

# Print all User tuples
>  b = Object::const_get('User').new()
>  puts b.class
User
>  puts b.class.all


$ rails console < my_script_helper.rb


--------class----------------
# The Greeter class
class Greeter
  def initialize(name)
    @name = name.capitalize
  end
  def salute
    puts "Hello #{@name}!"
  end
end
# Create a new object
g = Greeter.new("world")
# Output "Hello World!"
g.salute
vs
--------module-------------
http://ruby-doc.org/core/classes/Module.html
 A Module is a collection of methods and constants.

module Mod
  alias_method :orig_exit, :exit
  def exit(code=0)
    puts "Exiting with code #{code}"
    orig_exit(code)
  end
end
include Mod
exit(99)

Monday, October 4, 2010

R blogs and gallery

http://addictedtor.free.fr/graphiques/

http://www.statmethods.net/input/missingdata.html
# create new dataset without missing data
newdata <- na.omit(mydata)  

 

Reshaping data

# example of melt function
library(reshape)
mdata <- melt(mydata, id=c("id","time")) 
# Creating a Graph with a linear model regression line
attach(mtcars)
plot(wt, mpg)
abline(lm(mpg~wt))
title("Regression of MPG on Weight")

# Filled Density Plot
d <- density(mtcars$mpg)
plot(d, main="Kernel Density of Miles Per Gallon")
polygon(d, col="red", border="blue") 

Comparing Groups via Kernel Density

The sm.density.compare( ) function in the sm package allows you to superimpose the kernel density plots of two or more groups. The format is sm.density.compare(x, factor) where x is a numeric vector and factor is the grouping variable.
# Compare MPG distributions for cars with
# 4,6, or 8 cylinders
library(sm)
attach(mtcars)

# create value labels
cyl.f <- factor(cyl, levels= c(4,6,8),
  labels = c("4 cylinder", "6 cylinder", "8 cylinder"))

# plot densities
sm.density.compare(mpg, cyl, xlab="Miles Per Gallon")
title(main="MPG Distribution by Car Cylinders")

# add legend via mouse click (one fill colour per level, starting at colour 2)
colfill<-c(2:(1+length(levels(cyl.f))))
legend(locator(1), levels(cyl.f), fill=colfill)



http://yihui.name/en/page/2/
http://yihui.name/en/2009/06/creating-tag-cloud-using-r-and-flash-javascript-swfobject/#more-224 - Tag Cloud

Side by side plot

# png(width = 500, height = 300)
x = rnorm(1000)
par(mfrow = c(1, 2), mar = c(4, 4, 0.1, 0.1))
plot(density(x), main = "")
plot(density(x), main = "")
rug(jitter(x))
# dev.off()

# transparent colors (alpha = 0.1)
plot(x, col = rgb(0, 0, 0, 0.1))


http://processtrends.com/RClimate.htm 
## Use regexp to replace all the occurrences of runs of 3 to 5 asterisks with NA
lines2 <- gsub("\\*{3,5}", " NA", lines, perl=TRUE)
## Select monthly data in first 13 columns
df <- df[,1:13]
## Remove rows where Year=NA from the dataframe
df <- df [!is.na(df$Year),] 

Sunday, October 3, 2010

Using lattice in R

http://www.his.sunderland.ac.uk/~cs0her/Statistics/UsingLatticeGraphicsInR.htm


## Multiple variables in formula for grouped displays

xyplot(Sepal.Length + Sepal.Width ~ Petal.Length + Petal.Width | Species, 
       data = iris, scales = "free", layout = c(2, 2),
       auto.key = list(x = .6, y = .7, corner = c(0, 0))) 
 
 
## user defined panel functions

states <- data.frame(state.x77,
                     state.name = dimnames(state.x77)[[1]], 
                     state.region = state.region) 
xyplot(Murder ~ Population | state.region, data = states, 
       groups = state.name, 
       panel = function(x, y, subscripts, groups)  
       ltext(x = x, y = y, label = groups[subscripts], cex=1,
             fontfamily = "HersheySans"))  
 
http://learnr.wordpress.com/2009/08/18/ggplot2-version-of-figures-in-lattice-multivariate-data-visualization-with-r-part-13/
http://data.princeton.edu/R/gettingStarted.html
 
http://www.r-bloggers.com/5-minute-analysis-in-r-case-shiller-indices/ 
 
http://lmdvr.r-forge.r-project.org/figures/figures.html 

Lucid update Ubuntu splash in grub

http://anonir.wordpress.com/2010/08/08/ubuntu-lucid-disable-boot-splash/

Open /etc/default/grub for editing and remove “quiet splash” options from the GRUB_CMDLINE_LINUX_DEFAULT property.
For example, if your grub has this line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
Change it to this:
GRUB_CMDLINE_LINUX_DEFAULT=""
Then run this command to update grub2:
sudo update-grub

R removing NA

http://www.opensubscriber.com/message/r-help@stat.math.ethz.ch/7268077.html

> df<-data.frame(name=c('a','b'), age=c('1','2'))
> if (any(apply(df,1,function(x) any(is.na(x)))) == TRUE)  { r <- df[-which(apply(df,1,function(x) any(is.na(x)))),] }  else { r <- df }
> r
  name age
1    a   1
2    b   2

> df<-data.frame(name=c('a','b'), age=c('1',NA))
> df
  name  age
1    a    1
2    b <NA>
> if (any(apply(df,1,function(x) any(is.na(x)))) == TRUE)  { r <- df[-which(apply(df,1,function(x) any(is.na(x)))),] }  else { r <- df }
> r
  name age
1    a   1


## Remove rows where Year=NA from the dataframe
df <- df [!is.na(df$Year),] 

Saturday, October 2, 2010

tm package on R

R 2.11 is needed by tm (tm_0.5-4.1.tar.gz) but only R 2.10 is in the lucid ubuntu repo

so

http://ubuntuforums.org/showthread.php?t=639710

deb http://cran.r-project.org/bin/linux/ubuntu lucid/

gpg --keyserver subkeys.pgp.net --recv-key E2A11821 
   gpg -a --export E2A11821 | sudo apt-key add - 
 
 
My R libraries installed to
$HOME/R/x86_64-pc-linux-gnu-library/2.11
So you need to add this path in the 'R console' Run configuration
 
then I got a 'checking for xml2-config... no' error when doing
   install.packages('XML')
 
so do 
 
$ sudo apt-get install libxml2-dev

Ubuntu 10.04.1 LTS Lucid Lynx

#enable canonical archive in /etc/apt/sources.list
sudo apt-get install sun-java6-jdk

sudo apt-get install r-base

sudo apt-get update && sudo apt-get install cairo-dock cairo-dock-plug-ins

Ubuntu Intrepid AMD64

https://launchpad.net/ubuntu/intrepid/amd64

/var/lib/dpkg$ sudo cp status-old2 status

Friday, October 1, 2010

R inspection commands

class()
inspect()
summary()
head()

/usr/bin/ld: cannot find -lgfortran

$ ld -lgfortran
ld: cannot find -lgfortran

$ sudo ln -s /usr/lib/libgfortran.so.3.0.0 /usr/lib/libgfortran.so

$ ld -lgfortran
ld: warning: cannot find entry symbol _start; not setting start address

Idempotence

Idempotent operations can be applied multiple times without changing the result beyond the first application.
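A tiny illustration, with a made-up tag-cleaning function as the idempotent operation and list append as a counter-example.

```python
def normalize(tags):
    """Strip, lowercase, de-duplicate, and sort a list of tags."""
    return sorted({tag.strip().lower() for tag in tags})


once = normalize([" R ", "perl", "Perl", "ruby"])
twice = normalize(once)  # applying it again changes nothing

# Counter-example: appending is not idempotent, each call changes the result.
log = []
log.append("x")
log.append("x")
print(once, twice, log)
```

Idempotent steps are safe to re-run, which is why they are handy in setup scripts and retries.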

School, Life, Lessons

"The difference between school and life? In school, you're taught a lesson and then given a test. In life, you're given a test that teaches you a lesson" - Tom Bodett