Just a collection of some random cool stuff. PS: almost 99% of the contents here are not mine and I don't take credit for them; I reference and copy parts of the interesting sections.
Monday, January 31, 2011
SAM - Sequence Alignment/Map - generic alignment format
SUMMARY: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released.
http://www.ncbi.nlm.nih.gov/pubmed/19505943
Sunday, January 30, 2011
Clique - sets of elements where each pair of elements is connected.
In computer science, the clique problem refers to any of the problems related to finding particular complete subgraphs ("cliques") in a graph, i.e., sets of elements where each pair of elements is connected.
For example, the maximum clique problem arises in the following real-world setting. Consider a social network, where the graph’s vertices represent people, and the graph’s edges represent mutual acquaintance. To find a largest subset of people who all know each other, one can systematically inspect all subsets, a process that is too time-consuming to be practical for social networks comprising more than a few dozen people.
en.wikipedia.org/wiki/Clique_(graph_theory)
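The "inspect all subsets" approach described above can be sketched in a few lines. This is only an illustrative brute-force search, not a practical algorithm; the `friends` graph and the function names are made up for the example.

```python
from itertools import combinations

def is_clique(graph, nodes):
    """True if every pair of nodes is connected in the undirected graph."""
    return all(v in graph[u] for u, v in combinations(nodes, 2))

def maximum_clique(graph):
    """Brute force: try subsets from largest to smallest, return the first clique found."""
    people = sorted(graph)
    for size in range(len(people), 0, -1):
        for subset in combinations(people, size):
            if is_clique(graph, subset):
                return set(subset)
    return set()

# Hypothetical acquaintance graph (adjacency sets, edges are mutual).
friends = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann", "eve"},
    "eve": {"dan"},
}
largest = maximum_clique(friends)
print(largest)
```

The number of subsets doubles with every extra person, which is why this stops being practical beyond a few dozen vertices.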
Carsonella - Smallest bacterial genome (160k bp)
Candidatus Carsonella ruddii PV, complete genome
http://www.ncbi.nlm.nih.gov/nuccore/NC_008512.1
159,662 bp circular DNA
NC_008512.1 GI:116334902
http://www.npr.org/templates/story/story.php?storyId=6256036
Thursday, January 27, 2011
PSSM - position specific scoring matrix
Works by sliding a window: L is the number of columns in the matrix (the length of the motif), and you slide the matrix along the sequence, scoring each window of length L, to get a readout for the full sequence.
Rows are A, C, G, T and elements are log-likelihood ratios: log(emission probability / random-model probability).
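A minimal sketch of PSSM scoring, assuming a uniform background model and a pseudocount of 1; the aligned `sites`, the scanned sequence and the function names are invented for illustration.

```python
import math

# Assumed random model: all four bases equally likely.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def build_pssm(sites, pseudocount=1.0):
    """Build one {base: log2(emission / background)} dict per column from aligned sites."""
    length = len(sites[0])
    pssm = []
    for col in range(length):
        counts = {b: pseudocount for b in "ACGT"}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2((counts[b] / total) / BACKGROUND[b]) for b in "ACGT"})
    return pssm

def scan(pssm, sequence):
    """Slide the L-column matrix along the sequence; return (start, score) per window."""
    L = len(pssm)
    return [
        (start, sum(pssm[i][sequence[start + i]] for i in range(L)))
        for start in range(len(sequence) - L + 1)
    ]

sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]  # hypothetical aligned motif instances
pssm = build_pssm(sites)
hits = scan(pssm, "GGCGTATAATGCGC")
best_start, best_score = max(hits, key=lambda h: h[1])
print(best_start, round(best_score, 2))
```

Each window score is the sum of one log-likelihood ratio per column, so a positive score means the window looks more like the motif than like the random model.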
Wednesday, January 26, 2011
Monday, January 24, 2011
GRIN1, a gene whose alternative splicing is well understood
Llansola M, Sanchez-Perez A, Cauli O, Felipo V (2005) Modulation of NMDA
receptors in the cerebellum. 1. Properties of the NMDA receptor that modulate
its function. Cerebellum 4: 154–161.
Linear models: lm(y ~ x)
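t.test(a, b) - test whether the mean of a numeric variable differs between two groups (i.e. whether it is related to a two-level categorical variable such as sex)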
t.test(data ~ sex)
or
t.test(data[sex == 'male'], data[sex == 'female'])
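If the p-value is high and the confidence interval contains zero, there is no evidence that data differs between males and females.
Use cor.test(age, data) to get a p-value and confidence interval for the correlation between two numeric variables; you can also request a rank-based statistic with method='spearman' or method='kendall'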
Even better is to use the 'lm(y ~ x)' function
o <- lm(data ~ sex)
summary(o)
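This gives the same p-value as t.test(data ~ sex, var.equal=TRUE); the default t.test applies the Welch correction, so its p-value can differ slightly.
Plus, for a numeric predictor such as age, you can plot the linear model object, which gives you a best-fit line
o <- lm(data ~ age)
plot(age, data, xlab='Age')
abline(o)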
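The agreement between the equal-variance t-test and a regression on a two-level factor can be checked numerically. The sketch below uses only made-up measurements and codes sex as 1 (male) and 0 (female); it compares the t statistics, which share the same n - 2 degrees of freedom and therefore the same p-value.

```python
import math
from statistics import mean

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances (like t.test(..., var.equal=TRUE))."""
    na, nb = len(a), len(b)
    ssa = sum((v - mean(a)) ** 2 for v in a)
    ssb = sum((v - mean(b)) ** 2 for v in b)
    pooled_var = (ssa + ssb) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def slope_t(x, y):
    """t statistic of the slope in the least-squares fit y = b0 + b1 * x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return b1 / se

male = [5.1, 4.8, 6.0, 5.5]             # made-up measurements
female = [4.2, 4.9, 4.4, 4.0, 4.6]
x = [1] * len(male) + [0] * len(female)  # sex coded as 1 = male, 0 = female
y = male + female
print(round(pooled_t(male, female), 4), round(slope_t(x, y), 4))
```

With 0/1 coding the fitted slope is simply the difference between the two group means, which is why the two statistics coincide.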
Use tryCatch to handle possible errors (here each row x of all.data is regressed on sex)
lmFun <- function(x) {
  tryCatch(summary(lm(x ~ sex)), error=function(e) NA)
}
lms <- apply(all.data, 1, lmFun)
# like class(data), only with more info
attributes(data)
Dirichlet Mixture for estimating expected amino acid probability at each position
http://compbio.soe.ucsc.edu/dirichlets/dirichlet-papers.html
Sjolander, K., Karplus, K., Brown, M., Hughey, R., Krogh, A., Mian, I.S., and Haussler, D. Dirichlet Mixtures: A Method for Improving Detection of Weak but Significant Protein Sequence Homology. CABIOS, 12(4): 327-345, Aug 1996.
We present a method for condensing the information in a protein database into a mixture of Dirichlet densities. These mixtures are designed to be combined with observed amino acid frequencies, to form estimates of expected amino acid probabilities at each position in a profile, hidden Markov model, or other statistical model.
1.3 What is a Dirichlet density?
A Dirichlet density (Berger, 1985; Santner and Duffy, 1989) is a probability density over the set of all probability vectors $\vec{p}$ (i.e., $p_i \ge 0$ and $\sum_i p_i = 1$). Proteins have a 20-letter alphabet, with $p_i = \mathrm{Prob}(\text{amino acid } i)$.
Each vector $\vec{p}$ represents a possible probability distribution over the 20 amino acids.
1.4 What is a Dirichlet Mixture?
A mixture of Dirichlet densities is a collection of individual Dirichlet densities that function jointly to
assign probabilities to distributions. For any distribution of amino acids, the mixture as a whole assigns a
probability to the distribution by using a weighted combination of the probabilities given the distribution
by each of the components in the mixture. These weights are called mixture coefficients.
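A rough sketch of how a mixture can be combined with observed counts: each component's posterior weight is proportional to its mixture coefficient times the probability of the counts under that component, and the estimate is the weighted average of the per-component posterior means. The toy 3-letter alphabet, the two components and the function names are assumptions for illustration, not values from the paper.

```python
import math

def log_marginal(counts, alpha):
    """log Prob(counts | alpha), up to a term that is identical for every component."""
    total = math.lgamma(sum(alpha)) - math.lgamma(sum(counts) + sum(alpha))
    for n_i, a_i in zip(counts, alpha):
        total += math.lgamma(n_i + a_i) - math.lgamma(a_i)
    return total

def posterior_mean(counts, components):
    """components: list of (mixture coefficient q_j, alpha vector). Returns estimated p_i."""
    logs = [math.log(q) + log_marginal(counts, alpha) for q, alpha in components]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # numerically stable normalisation
    z = sum(weights)
    n = sum(counts)
    estimate = [0.0] * len(counts)
    for w, (_, alpha) in zip(weights, components):
        a = sum(alpha)
        for i, (n_i, a_i) in enumerate(zip(counts, alpha)):
            estimate[i] += (w / z) * (n_i + a_i) / (n + a)
    return estimate

# Toy 3-letter alphabet and two made-up components (real mixtures use 20 amino acids).
components = [(0.6, [5.0, 1.0, 1.0]), (0.4, [1.0, 1.0, 5.0])]
print(posterior_mean([0, 0, 0], components))  # no data: weighted prior means
print(posterior_mean([0, 1, 9], components))  # counts favour the second component
```

With no observations the estimate falls back on the mixture's prior; as counts accumulate, the observed frequencies dominate.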
Saturday, January 22, 2011
eeepc wifi turning off by itself
/etc/init.d/eeepc-restore
I disabled this line
$EEEPC_PATH/eeepc-wifi-toggle.sh poweroff || true
http://forum.eeeuser.com/viewtopic.php?id=74280
Fonts for linux, Verdana, Arial, Times New Roman, Myriad Pro
$ apt-get install msttcorefonts
Myriad Pro ('Apple font')
Install Adobe Acrobat Reader
http://get.adobe.com/uk/reader/completion/?installer=Reader_9.4_English_UK_for_Linux_(.bin)
mkdir -p ~/.fonts
cp /opt/Adobe/Reader9/Resource/Font/* ~/.fonts
fc-cache -f
http://ubuntuforums.org/showthread.php?t=645363
Friday, January 21, 2011
R Plot with Text labels
> h<-data.frame(sample(5),runif(5))
> plot(h, type='n')
> text(h[,1], h[,2])
Plot xticks
> plot(h, xaxt='n')
> axis(1, at=c(1:5), label=letters[1:5])
Rank instead of variance
Use rank instead of variance (because variance is dependent on the sample size)?
http://en.wikipedia.org/wiki/Ranking
R plot, many plots on one window, or many windows
To plot on the same window frame
par(new=TRUE)
To open a new window and keep the current one
dev.new()
Thursday, January 20, 2011
R group by
# use aggregate(x, by, FUN, ...) to calculate the mean density grouped by concentration
res<-aggregate(DNase$density, list(DNase$conc) , mean)
names(res)<-c("Conc", "MeanDensity")
plot(res$Conc, res$MeanDensity, pch=20, xlab="Concentration", ylab="Mean density")
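For comparison, the same group-by-and-average idea can be sketched without R. The values below are made-up stand-ins for the DNase columns, and `group_mean` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

def group_mean(values, groups):
    """Mean of `values` within each level of `groups`, sorted by group key."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    return {group: mean(vals) for group, vals in sorted(buckets.items())}

# Made-up (conc, density) pairs standing in for the DNase columns.
conc = [0.05, 0.05, 0.20, 0.20, 0.39, 0.39]
density = [0.017, 0.018, 0.121, 0.124, 0.206, 0.215]
res = group_mean(density, conc)
for c, d in res.items():
    print(c, round(d, 4))
```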
Tuesday, January 18, 2011
R Convert Correlation Matrix to Gene Pairs
http://tolstoy.newcastle.edu.au/R/help/04/10/5127.html
r <- cor(x)
y <- which(lower.tri(r), TRUE)  # returns indices of lower triangle of matrix
z <- data.frame(row = rownames(r)[y[, 1]], col = colnames(r)[y[, 2]], cor = r[y])
subset(z, abs(cor) > 0.5)
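The same idea (visit each gene pair once, skip the diagonal, keep pairs whose absolute correlation exceeds a cutoff) can be sketched in plain Python. The expression values and gene names are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def correlated_pairs(columns, threshold=0.5):
    """columns: {gene: values}. Return (gene1, gene2, r) with |r| > threshold, each pair once."""
    pairs = []
    for g1, g2 in combinations(columns, 2):  # each unordered pair once, no diagonal
        r = pearson(columns[g1], columns[g2])
        if abs(r) > threshold:
            pairs.append((g1, g2, r))
    return pairs

expr = {  # hypothetical expression values
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, -1.0, -1.0, 1.0],
}
pairs = correlated_pairs(expr)
for g1, g2, r in pairs:
    print(g1, g2, round(r, 3))
```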
R plotting tricks
# error bars, $Run is the group
library(sciplot)
bargraph.CI(DNase.625$Run, DNase.625$density, err.width=0.05)
# density plots
plot(density(DNase.625$density))
# multiple plots
par(mfrow=c(2,1))
bargraph.CI(DNase.625$Run, DNase.625$density, err.width=0.05)
hist(DNase.625$density)
# empty plot (axes only): type='n' means no points are drawn
plot(DNase.625$density, type='n')
# saving images to file use png() or pdf()
png('avg_density.png', width=800, height=800)
plot(dens.avg.frame, type='b', lty=3)
dev.off()
Monday, January 17, 2011
Grad school advice
http://www.cs.unc.edu/~azuma/hitch4.html
But without an underlying confidence that you do have what it takes to complete a dissertation, it's too easy to drop out when the going gets tough instead of sticking it through. I found it useful to keep in touch with the "real world," to remind myself that the graduate student population is not representative of humanity in general and to keep my perspective. You got into graduate school because you have already shown to your professors that you have potential and skills that are not typical among most college students, let alone most people -- don't forget that.
That initial failure caused me to answer the basic question, providing the mental fortitude to keep going despite the hurdles and problems I would later face.
Academia is a business
Graduate school is a different ballgame
Initiative
Tenacity "thick skin"
Organizational skills
Communications skills
Choosing an adviser and a committee
Balance and Perspective
You will learn the state of the art in your chosen speciality and conduct cutting-edge research on a subject that you find interesting and enjoyable. If you don't find this compensation sufficient, then you shouldn't be in graduate school in the first place.
"Don't let school get in the way of your education."
- Mark Twain
"The IQ test was invented to predict academic performance, nothing else. If we wanted something that would predict life success, we'd have to invent another test completely."
- Robert Zajonc
You will take classes in the beginning but in your later years you probably won't have any classes. People judge a recently graduated Ph.D. by his or her research, not by his or her class grades. And, without any offense to my professors, most of what you learn in a Ph.D. program comes outside of classes: from doing research on your own, attending conferences, and talking to your fellow students. Success in graduate school does not come from completing a set number of course units but rather from successfully completing a research program.
But in a Ph.D. program, you must select and complete a unique long-term research program. For most of us, this means you have to learn how to do research and all that entails: working closely with your professors, staff and fellow students, communicating results, finding your way around obstacles, dealing with politics, etc.
https://www.stephencovey.com/7habits/7habits.php
Valuing differences is what really drives synergy. Do you truly value the mental, emotional, and psychological differences among people? Or do you wish everyone would just agree with you so you could all get along? Many people mistake uniformity for unity; sameness for oneness. One word--boring! Differences should be seen as strengths, not weaknesses. They add zest to life.
it is often better to ask forgiveness than permission, provided you are not becoming a "loose cannon."
Tenacity means sticking with things even when you get depressed or when things aren't going well.
First, the Ph.D. is the beginning, not the culmination, of your career. Don't worry about making it your magnum opus. Get out sooner, rather than later.
Second, if you bother to talk to and learn from the people who have already gone through this process, you might graduate two years earlier.
http://www.cs.indiana.edu/how.2b/how.2b.research.html#thesis
Bioinformatics software tools
MOE possesses a powerful and flexible facility for multiple sequence and multiple structure alignment of protein chains. A unique feature of MOE's protein alignment tool, MOE-Align, is that it allows mixed structural and non-structured data.
BLASTZ is a pairwise sequence alignment program developed for whole-genome human-mouse alignments.
http://www.charite.de/bioinf/strap/links.html
Sunday, January 16, 2011
Profile HMM
http://pages.cs.brandeis.edu/~cs178/
http://thor.info.uaic.ro/~ciortuz/SLIDES/profileHMM.pdf
www.dartmouth.edu/~madory/ProfileHMM-SimpleCase.ppt
http://lib.bioinfo.pl/courses/view/713
http://www.cs.tau.ac.il/~rshamir/algmb/00/scribe00/html/lec06/node9.html
http://www.ncbi.nlm.nih.gov.proxy.lib.sfu.ca/pubmed/9918945
http://webdocs.cs.ualberta.ca/~colinc/cmput606/
Friday, January 14, 2011
Bioinformatics : a practical guide to the analysis of genes and proteins / Andreas D. Baxevanis and B.F. Francis Ouellette.
Title Bioinformatics : a practical guide to the analysis of genes and proteins / Andreas D. Baxevanis and B.F. Francis Ouellette.
Published Hoboken, NJ : Wiley Interscience, 2005
Edition 3rd ed.
Eclipse shortcuts
Ctrl+Shift+R - Search for a resource, Java class / filename
Ctrl+F11 - Run
Highlight class and F3 - Opens the class definition
Thursday, January 13, 2011
Screen for Linux Session Management
Detach:
“Ctrl-A” “d”.
[root@gigan root]# screen -ls
There are screens on:
31619.ttyp2.gigan (Detached)
4731.ttyp2.gigan (Detached)
Attach:
[root@gigan root]# screen -r 31619.ttyp2.gigan
http://www.rackaid.com/resources/linux-screen-tutorial-and-how-to/
Also tmux
http://www.wikivs.com/wiki/Screen_vs_tmux
$ tmux list-sessions
$ tmux attach
Tmux allows you to have shell sessions that don't disappear on you just because you disconnected from the machine.
Wednesday, January 12, 2011
Intron finding / prediction program?
Gene prediction programs like GENSCAN, GeneWise, TWINSCAN, DoubleScan, etc. use information from exons that encode proteins. What about identifying intron structure?
Introns
http://www.rae.org/introns.html
Powell
Powell’s method, the short name for the conjugate direction method proposed by Powell, is a direct search method that has been applied to many engineering optimization problems because it is known as one of the most efficient methods for unconstrained nonlinear optimization. This technique is a modified or deflected gradient approach.
http://www.iahr.org/e-library/beijing_proceedings/Theme_A/COMPARISON%20OF%20OPTIMIZATION%20ALGORITHMS.html
Steven Strogatz on the Elements of Math
Steven Strogatz, an award-winning professor, takes readers from the basics to the baffling in a 15-part series on mathematics. Beginning with a column on why numbers are helpful, he goes on to investigate topics including negative numbers, calculus and group theory, finishing with the mysteries of infinity.
http://topics.nytimes.com/top/opinion/series/steven_strogatz_on_the_elements_of_math/index.html
Canon ImageClass
http://www.usa.canon.com/cusa/support/consumer/printers_multifunction/imageclass_series/imageclass_d1180#DriversAndSoftware
http://ubuntuforums.org/showthread.php?t=1140724
sudo alien --to-deb --scripts cndrvcups-common-1.80-1.x86_64.rpm
sudo alien --to-deb --scripts cndrvcups-ufr2-uk-1.80-1.x86_64.rpm
Files generated:
cndrvcups-common_1.80-2_amd64.deb
cndrvcups-ufr2-uk_1.80-2_amd64.deb
Think on your feet
http://www.mindtools.com/pages/article/ThinkingonYourFeet.htm
This doesn't mean you have to know everything about everything, but if you are reasonably confident in your knowledge of the subject, that confidence will help you to remain calm and collected even if you are put unexpectedly in the hot seat.
1. Relax
2. Listen
3. Have the Question Repeated
4. Use Stall Tactics
5. Use Silence to your Advantage
6. Stick to One Point and One Supporting Piece of Information
7. Prepare Some "What Ifs" - brainstorming the most difficult questions that people might ask
8. Practice Clear Delivery
9. Summarize and Stop
http://articles.cnn.com/2008-08-12/living/rs.how.to.think.on.feet_1_blueberry-pie-improv-unscripted-world/2?_s=PM:LIVING
1. "yes...and"
2. Go with your gut
3. Make everyone else in your group look good.
1. "yes...and"
"It was a beautiful weekend." If all you say is "Yes, it was great," that ends the conversation right there. But if you say, "Yes, it was great. And I really made the most of it. I went to a concert in the park and brought my yellow Lab. He snatched a sandwich right out of the hands of some poor woman having a picnic. But we had fun."
You can follow up with "Do you like dogs?" or "Have you ever been to a concert in the park?"
The "yes...and" technique gives you the chance to acknowledge what's been said and then move the conversation to a new place, where you just might discover something -- or someone -- delightful.
2. Go with your gut
"There's no time to rationalize, no time to weigh the pros and cons of your response," says Mike Ross, a 34-year-old lawyer and a student at the UCB.
Or as Yogi Berra put it: "You can't think and hit at the same time."
Try to break the habit of second-guessing yourself before you speak. While you're busy thinking up the "right" response, that awkward silence is settling in.
The key is to trust your instinct.
3. Make everyone else in your group look good.
How it works: Here's what you learn in improv: You're nothing without somebody else. There's nothing to improvise without someone to improvise with. The more you trust others to be your props, the more you invite them to shine, the stronger you get.
How to make it work for you: In any situation, practice acknowledging the others in your group (the "yes") and always make an effort to promote their ideas (the "and"). It quite simply makes for better conversation.
Tuesday, January 11, 2011
bashrc customization
~/.bash_profile
~/.bashrc
PS1="\[\e[0;31m\][\A - $LOAD]\[\e[0m\]\n[\u@\h \#] \w > "
[10:17 - ]
[user@krusty 21] /home/user/temp >
http://www.linuxconfig.org/Bash_prompt_basics
Monday, January 10, 2011
Bioinformatics: a practical guide to the analysis of genes and proteins
By Andreas D. Baxevanis, B. F. Francis Ouellette
http://books.google.com/books?id=i0W9NBmxewQC&printsec=frontcover&dq=bioinformatics+ouellette&source=bl&ots=_9MCohI1zE&sig=zGMdYPvMdFt12v2l30SWMecW6aQ&hl=en&ei=3JUrTcyZIZPQsAOdt4WUBw&sa=X&oi=book_result&ct=result&resnum=1&ved=0CBIQ6AEwAA#v=onepage&q&f=false
MGED - Microarray Gene Expression Data Society (now FGED)
http://www.mged.org/
Our goal is to assure that investment in functional genomics data generates the maximum public benefit. Our work on defining minimum information specifications for reporting data in functional genomics papers has already enabled large data sets to be used and reused to their greater potential in biological and medical research.
MIAME describes the Minimum Information About a Microarray Experiment that is needed to enable the interpretation of the results of the experiment unambiguously and potentially to reproduce the experiment. [Brazma et al, Nature Genetics]
TM4 - Microarray Software Suite
http://www.tm4.org/
MAGE-ML (XML), MAGE-OM (object model)
BioConductor
http://www.bioconductor.org/
Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data. Bioconductor uses the R statistical programming language, and is open source and open development.
functional genomics = gene expressions = microarrays
ArrayExpress
http://www.ebi.ac.uk/arrayexpress/
NCBI's Gene Expression Omnibus
http://www.ncbi.nlm.nih.gov/geo/
1. select microarray design
2. MADAM - image processing software, estimates of expression, background noise
3. measure expression, usually as a log2 ratio, because up- and down-regulation become symmetric: log2(2) = 1 and log2(1/2) = -log2(2) = -1
4. normalizing expression measurements, adjusted to the reference gene
5. assume arrayed elements contain random assortment of genes to avoid bias
6. assume the total amount of RNA is fixed, so when the expression of one gene goes up, the expression of others must go down
$N_{total} = \sum_i R_i / \sum_i G_i$, summed over all elements $i$ on the microarray
$T'_i = \frac{1}{N_{total}} \cdot \frac{R_i}{G_i}$
or use lowess (locally weighted linear regression) to estimate systematic bias in the data
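A small sketch of the total-intensity normalization above, returning log2 ratios. The intensities are invented, and `total_intensity_normalize` is a hypothetical helper name; lowess correction is not shown.

```python
import math

def total_intensity_normalize(red, green):
    """Scale each ratio R/G by 1/N_total, where N_total = sum(R) / sum(G); return log2 ratios."""
    n_total = sum(red) / sum(green)
    return [math.log2((r / g) / n_total) for r, g in zip(red, green)]

red = [1200.0, 300.0, 800.0, 150.0]   # hypothetical background-corrected intensities
green = [500.0, 160.0, 410.0, 300.0]
log_ratios = total_intensity_normalize(red, green)
print([round(v, 3) for v in log_ratios])

# With balanced channels (N_total = 1), two-fold up and two-fold down are symmetric.
print(total_intensity_normalize([2.0, 1.0], [1.0, 2.0]))
```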
semimetric distance - doesn't obey the triangle inequality (d_ik <= d_ij + d_jk), e.g. a distance based on the Pearson correlation coefficient: r = -1 (opposite, perfect anticorrelation), r = +1 (identical, perfect correlation), r = 0 (orthogonal, uncorrelated)
squared Pearson: 0 <= r^2 <= 1, with distance d = 1 - r^2 (high correlation or anticorrelation gives r^2 = 1, so the distance is d = 0)
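To see why this distance is only a semimetric, the sketch below builds three hypothetical 3-point profiles whose mean-centred directions are 0, 30 and 60 degrees apart (so their Pearson correlation is the cosine of the angle difference) and checks the triangle inequality. The `profile` construction is an assumption made for the demonstration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def distance(a, b):
    """Squared-Pearson distance d = 1 - r**2."""
    return 1.0 - pearson(a, b) ** 2

def profile(angle_deg):
    """Hypothetical zero-mean profile; two profiles correlate as cos(angle difference)."""
    t = math.radians(angle_deg)
    u = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
    v = (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))
    return [math.cos(t) * ui + math.sin(t) * vi for ui, vi in zip(u, v)]

x, y, z = profile(0), profile(30), profile(60)
d_xz = distance(x, z)                      # 1 - cos(60 deg)^2 = 0.75
d_xy_yz = distance(x, y) + distance(y, z)  # 0.25 + 0.25 = 0.50
print(d_xz > d_xy_yz)                      # the direct distance exceeds the detour
```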
Algorithms
K-means (Tavazoie et al., 1999) and self-organizing maps (SOMs; Tamayo et al., 1999; Törönen et al., 1999)
1. Michael B. Eisen et al., “Cluster analysis and display of genome-wide expression patterns,” Proceedings of the National Academy of Sciences of the United States of America 95, no. 25 (December 8, 1998): 14863-14868.
2. Sandrine Dudoit, Robert C Gentleman, and John Quackenbush, “Open source software for the analysis of microarray data,” BioTechniques Suppl (March 2003): 45-51.
3. M K Kerr and G A Churchill, “Statistical design and the analysis of gene expression microarray data,” Genetical Research 77, no. 2 (April 2001): 123-128.
4. Junbai Wang et al., “Clustering of the SOM easily reveals distinct gene expression patterns: results of a reanalysis of lymphoma study,” BMC Bioinformatics 3 (November 24, 2002): 36.
5. Ivana V Yang et al., “Within the fold: assessing differential expression measures and reproducibility in microarray assays,” Genome Biology 3, no. 11 (October 24, 2002): research0062.
6. P Pavlidis and W S Noble, “Analysis of strain and regional variation in gene expression in mouse brain,” Genome Biology 2, no. 10 (2001): RESEARCH0042.
Our goal is to assure that investment in functional genomics data generates the maximum public benefit. Our work on defining minimum information specifications for reporting data in functional genomics papers has already enabled large data sets to be used and reused to their greater potential in biological and medical research.
MIAME describes the Minimum Information About a Microarray Experiment that is needed to enable the interpretation of the results of the experiment unambiguously and potentially to reproduce the experiment. [Brazma et al, Nature Genetics]
TM4 - Microarray Software Suite
http://www.tm4.org/
MAGE-ML (XML), MAGE-OM (object model)
BioConductor
http://www.bioconductor.org/
Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data. Bioconductor uses the R statistical programming language, and is open source and open development.
functional genomics = gene expressions = microarrays
ArrayExpress
http://www.ebi.ac.uk/arrayexpress/
NCBI's Gene Expression Omnibus
http://www.ncbi.nlm.nih.gov/geo/
Myrna - Cloud-scale differential gene expression for RNA-seq
Myrna
Cloud-scale differential gene expression for RNA-seq
http://bowtie-bio.sourceforge.net/myrna/manual.shtml#what-is-myrna
Sunday, January 9, 2011
Zotero - managing bibliography
Zotero is a powerful, easy-to-use research tool that helps you gather, organize, and analyze sources and then share the results of your research.
https://addons.mozilla.org/en-US/firefox/addon/3504/developers
Friday, January 7, 2011
Gene Expression Analysis, Proteomics, and Network Discovery
First published online December 11, 2009; 10.1104/pp.109.150433
Plant Physiology 152:402-410 (2010)
© 2010 American Society of Plant Biologists
http://www.plantphysiol.org/cgi/content/full/152/2/402
Thursday, January 6, 2011
Neurology, Neuroinformatics
AutDB
autistic spectrum disorder (ASD)
Brain Architecture Management system (BAMS) - connectivity
Allen Brain Atlas (ABA) - expression
Allen Institute for Brain Research www.brain-map.org/
CARMEN code analysis, repository & modelling for e-neuroscience http://www.carmen.org.uk/presentations
INCF Neuroinformatics www.incf.org/
Neuroscience Information Framework www.neuinfo.org
Kaufman et al. [22] and use a
greedy backward elimination algorithm to maximize score while removing features.
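A generic sketch of greedy backward elimination, not the specific procedure of Kaufman et al. The scoring function, the feature names and the tie-breaking rule are all hypothetical, chosen only to show features being dropped while the score does not get worse.

```python
def backward_elimination(features, score):
    """Greedily drop one feature at a time while the score does not decrease."""
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        # Build every subset obtained by removing exactly one remaining feature
        candidates = [[f for f in selected if f != drop] for drop in selected]
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score < best:
            break  # every single removal hurts the score, so stop
        selected, best = top, top_score
    return selected, best

# Hypothetical score: reward informative features, penalize subset size
INFORMATIVE = {"a", "c"}

def toy_score(subset):
    return len(INFORMATIVE & set(subset)) - 0.1 * len(subset)

kept, kept_score = backward_elimination(["a", "b", "c", "d"], toy_score)
print(kept, kept_score)
```

The search is greedy: it never re-adds a feature once removed, so it can miss the best subset when features interact.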
Transposons - Jumping genes, L1 elements (LINE-1)
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Transposons.html
Can be problematic for gene finding and sequence alignment, use RepeatMasker
http://ftp.genome.washington.edu/RM/RepeatMasker.html
bioinformatics lectures @ BioInfoBank
http://lib.bioinfo.pl/courses/index/uid/113213
Clustering, multiple sequence alignment (MSA), phylogenetic trees, protein sequencing, gene prediction
GENSCAN - Gene finding software for vertebrates, HMM
http://genes.mit.edu/GENSCANinfo.html
http://genes.mit.edu/GENSCAN.html
http://www.cs.ubc.ca/~irmtraud/cs_545/web_material/genscan_paper.pdf
http://lib.bioinfo.pl/courses/view/685
This server provides access to the program Genscan for predicting the locations and exon-intron structures of genes in genomic sequences from a variety of organisms.
HMM - given a nucleotide sequence, find which nucleotides belong to an exon. This is similar to the biased casino problem, where the die being rolled may or may not be biased (the hidden state). The model has transition and emission probabilities obtained from the training data.
The Viterbi algorithm uses dynamic programming to avoid the cost of evaluating sub-optimal paths: at each position it keeps, for each state, only the maximum-probability path that ends in that state at the previous position.
The algorithm makes a number of assumptions:
- First, both the observed events and hidden events must be in a sequence. The sequence is often temporal, i.e. in time order of occurrence.
- Second, these two sequences need to be aligned: an instance of an observed event needs to correspond to exactly one instance of a hidden event.
- Third, computing the most likely hidden sequence (which leads to a particular state) up to a certain point t must depend only on the observed event at point t, and the most likely sequence which leads to that state at point t − 1.
http://en.wikipedia.org/wiki/Trellis_diagram#Trellis_diagram
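A minimal Viterbi sketch for the biased casino problem mentioned above, working in log space to avoid underflow. The state names ("F" for fair, "L" for loaded) and all probabilities are illustrative assumptions, not values from GENSCAN or from any training data.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a sequence of observations."""
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Only the best score of each previous state is needed (third assumption)
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Assumed model: the loaded die shows a six half of the time
STATES = ["F", "L"]
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {face: 1 / 6 for face in range(1, 7)},
    "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}

sixes_path = viterbi([6] * 10, STATES, START, TRANS, EMIT)
ones_path = viterbi([1] * 10, STATES, START, TRANS, EMIT)
print("".join(sixes_path), "".join(ones_path))
```

For gene finding the same recursion applies with states such as exon and intron and with nucleotides as the observed symbols, although GENSCAN's actual model is more elaborate than this two-state example.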
Wednesday, January 5, 2011
C-phosphodiester-G
CpG sites are regions of DNA where a cytosine nucleotide occurs next to a guanine nucleotide in the linear sequence of bases along its length. "CpG" is shorthand for "—C—phosphate—G—", that is, cytosine and guanine separated by a phosphate, which links the two nucleosides together in DNA. The "CpG" notation is used to distinguish this linear sequence from the CG base-pairing of cytosine and guanine.
There are regions of the DNA that have a higher concentration of CpG sites, known as CpG islands. Many genes in mammalian genomes have CpG islands associated with the start of the gene.[4] Because of this, the presence of a CpG island is used to help in the prediction and annotation of genes. These increased concentrations of CpGs might be associated with the decreased methylation of cytosines often observed in CpG islands — this could result in a reduced vulnerability to transition mutations and, as a consequence, a higher equilibrium density of CpGs surviving.
Methylation of CpG sites within the promoters of genes can lead to their silencing, a feature found in a number of human cancers (for example the silencing of tumor suppressor genes).
http://en.wikipedia.org/wiki/CpG_site
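A rough sketch of how CpG density in a stretch of DNA might be measured. The function name and the example sequence are made up, and the observed/expected ratio (CpG count x length / (C count x G count)) is a commonly used heuristic that the notes above do not themselves specify.

```python
def cpg_stats(seq):
    """Return (CpG count, GC content, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    # "CG" cannot overlap itself, so str.count gives the exact number of CpG sites
    cpg = seq.count("CG")
    c_count, g_count = seq.count("C"), seq.count("G")
    gc_content = (c_count + g_count) / n if n else 0.0
    # Expected CpG count if C and G were placed independently along the sequence
    expected = c_count * g_count / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return cpg, gc_content, ratio

# Hypothetical short sequence
cpg, gc_content, ratio = cpg_stats("ACGTCGCG")
print(cpg, gc_content, ratio)
```

Running this over sliding windows and flagging windows with high GC content and a high ratio is one simple way to look for candidate CpG islands.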
Tuesday, January 4, 2011
Proteomics: Challenges, Techniques and Possibilities to Overcome Biological Sample Complexity
http://www.sage-hindawi.com/journals/hgp/2009/239204.html
Happy New Year
明けましておめでとうございます。
本年もよろしくお願いいたします。
Akemashite omedetōgozaimasu.
Hon'nen mo yoroshiku onegai itashimasu.
Happy New Year.
Thank you again this year.
this year (今年) ことし
Sunday, January 2, 2011
windows domain.invalid error
http://forums.techguy.org/networking/760821-solved-domain-invalid-error.html
Reset TCP/IP stack to installation defaults, type: netsh int ip reset reset.log
Reset WINSOCK entries to installation defaults, type: netsh winsock reset catalog
Reboot the machine.
Try these simple tests after the reboot.
Hold the Windows key and press R, then type CMD (COMMAND for W98/WME) to open a command prompt:
In the command prompt window that opens, type the following commands one at a time, followed by the Enter key:
IPCONFIG /ALL
PING
PING
PING 206.190.60.37
PING yahoo.com
Saturday, January 1, 2011
Polyclonal vs Monoclonal antibodies
http://en.wikipedia.org/wiki/Antibody
Specific antibodies are produced by injecting an antigen into a mammal, such as a mouse, rat or rabbit for small quantities of antibody, or goat, sheep, or horse for large quantities of antibody. Blood isolated from these animals contains polyclonal antibodies—multiple antibodies that bind to the same antigen—in the serum, which can now be called antiserum. Antigens are also injected into chickens for generation of polyclonal antibodies in egg yolk.[51] To obtain antibody that is specific for a single epitope of an antigen, antibody-secreting lymphocytes are isolated from the animal and immortalized by fusing them with a cancer cell line. The fused cells are called hybridomas, and will continually grow and secrete antibody in culture. Single hybridoma cells are isolated by dilution cloning to generate cell clones that all produce the same antibody; these antibodies are called monoclonal antibodies.[52] Polyclonal and monoclonal antibodies are often purified using Protein A/G or antigen-affinity chromatography.[53]
http://www.abcam.com/index.html?pageconfig=resource&rid=11269&pid=11287
Polyclonal antibodies - multiple antibodies that bind to the same antigen
Facts:
* Recognise multiple epitopes on any one antigen. Serum obtained will contain a heterogeneous complex mixture of antibodies of different affinity
* Polyclonals are made up mainly of IgG subclass
* Peptide immunogens are often used to generate polyclonal antibodies that target unique epitopes, especially for protein families of high homology
Antibody production:
* Inexpensive to produce
* Technology and skills required for production are low
* Production time scale is short
* Polyclonal antibodies are not useful for probing specific domains of antigen because polyclonal antiserum will usually recognize many domains
Monoclonal antibodies
Facts:
* Detect only one epitope on the antigen.
* They will consist of only one antibody subtype. Where a secondary antibody is required for detection, an antibody against the correct subclass should be chosen.
Antibody production
* High technology required.
* Training is required for the technology used.
* Time scale is long for hybridomas.