## Monday, October 31, 2011

### Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures

http://www.nature.com/nprot/journal/v6/n11/full/nprot.2011.393.html?WT.ec_id=NPROT-201111

T-Coffee (Tree-based consistency objective function for alignment evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biological sequences. The main strength of T-Coffee is its ability to combine third party aligners and to integrate structural (or homology) information when building MSAs. The series of protocols presented here show how the package can be used to multiply align proteins, RNA and DNA sequences. The protein section shows how users can select the most suitable T-Coffee mode for their data set. Detailed protocols include T-Coffee, the default mode, M-Coffee, a meta version able to combine several third party aligners into one, PSI (position-specific iterated)-Coffee, the homology extended mode suitable for remote homologs and Expresso, the structure-based multiple aligner. We then also show how the T-RMSD (tree based on root mean square deviation) option can be used to produce a functionally informative structure-based clustering. RNA alignment procedures are described for using R-Coffee, a mode able to use predicted RNA secondary structures when aligning RNA sequences. DNA alignments are illustrated with Pro-Coffee, a multiple aligner specific of promoter regions. We also present some of the many reformatting utilities bundled with T-Coffee. The package is an open-source freeware available from http://www.tcoffee.org/.

## Sunday, October 30, 2011

### calibre - eBook reader

PDF to EPUB converter
PDF to TXT converter

### Unsupervised Feature Learning and Deep Learning

http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=ufldl

Course Description

Machine learning has seen numerous successes, but applying learning algorithms today often means spending a long time hand-engineering the input feature representation. This is true for many problems in vision, audio, NLP, robotics, and other areas. In this course, you'll learn about methods for unsupervised feature learning and deep learning, which automatically learn a good representation of the input from unlabeled data. You'll also pick up the "hands-on," practical skills and tricks-of-the-trade needed to get these algorithms to work well.

Basic knowledge of machine learning (supervised learning) is assumed, though we'll quickly review logistic regression and gradient descent.

I. INTRODUCTION

II. LOGISTIC REGRESSION

Representation
Gradient descent in practice
Exponentially weighted average
Shuffling data
Exercise 1: Implementation

III. NEURAL NETWORKS

Representation
Architecture
Examples and intuitions #1
Examples and intuitions #2
Parameter learning
Random initialization
Vectorized implementation
Activation function derivative

IV. UNSUPERVISED FEATURE LEARNING and SELF-TAUGHT LEARNING

V. APPLICATION TO CLASSIFICATION

VI. DEEP LEARNING WITH AUTOENCODERS

VII. SPARSE REPRESENTATIONS

VIII. WHITENING

IX. INDEPENDENT COMPONENTS ANALYSIS (ICA)

X. SLOW FEATURE ANALYSIS (SFA)

XI. RESTRICTED BOLTZMANN MACHINES (RBM)

XII. DEEP BELIEF NETWORKS (DBN)

### Confidence

http://www.nlp-secrets.com/nlp-confidence.php

Adopt an open posture. No crossed legs or folded arms.
Make your neck tall and shoulders relaxed, as if you were trying to see over a wall that was very slightly taller than your eye level. Like a meerkat who is looking for a predator. You know what I mean.
Speak clearly and with volume; remember that what you're saying is worth hearing.
Don't take yourself too seriously. Humour is the most universal language and can help prevent conflict with alpha-male and attention-envy types.
Don't be judgemental to others - but let yourself be open to judgement from others. This relaxes people around you, and helps bring down the barriers between you.

http://www.theprovince.com/Speak+loudly+speak+clearly/3217061/story.html

Coming across as confident is often a result of two things -- body language and tone of voice. No doubt you already know about sitting up straight and making eye contact to show confidence, therefore I am going to focus on how your tone of voice can get you that next job.

Your tone of voice has a big effect on how people are going to both perceive you and respond to you. In fact, your tone of voice is more important than the words you choose; it says to people, "This is how I am really feeling."

- Practise speaking in a slightly lower octave; deeper voices have more credibility than higher-pitched voices.
- Pause before saying a meaningful word or idea you are sharing to emphasize its importance.
- Pronounce every word; don't mumble.
- Record your voice and listen to it.

### e-Books should be free

http://www.booksshouldbefree.com/

Free Audio Books from the public domain
Download a free audiobook in mp3, iPod, or iTunes format

## Friday, October 28, 2011

### ToppGene Suite - gene list enrichment analysis and candidate gene prioritization

http://toppgene.cchmc.org/

* ToppFun: Transcriptome, ontology, phenotype, proteome, and pharmacome annotations based gene list functional enrichment analysis

Detect functional enrichment of your gene list based on Transcriptome, Proteome, Regulome (TFBS and miRNA), Ontologies (GO, Pathway), Phenotype (human disease and mouse phenotype), Pharmacome (Drug-Gene associations), literature co-citation, and other features.

* ToppGene: Candidate gene prioritization

Prioritize or rank candidate genes based on functional similarity to training gene list.

* ToppNet: Relative importance of candidate genes in networks

Prioritize or rank candidate genes based on topological features in protein-protein interaction network.

* ToppGenet: Prioritization of neighboring genes in protein-protein interaction network

Identify and prioritize the neighboring genes of the seeds in protein-protein interaction network based on functional similarity to the "seed" list (ToppGene) or topological features in protein-protein interaction network (ToppNet).

### Ten Simple Rules for Teaching Bioinformatics at the High School Level

Checklist
1. Am I energized to be enthusiastic about this class?
2. Is the classroom arranged properly for the day's activities?
3. Is my name, course title, and number on the chalkboard?
4. Do I have an ice-breaker planned?
5. Do I have a way to start learning names?
6. Do I have a way to gather information on student backgrounds, interests, expectations for the course, questions, concerns?
7. Is the syllabus complete and clear?
8. Have I outlined how students will be evaluated?
9. Do I have announcements of needed information ready?
10. Do I have a way of gathering student feedback?
11. When the class is over, will students want to come back? Will you want to come back?

### Ten Simple Rules for Getting involved in your scientific community

Activities such as organizing conferences and workshops, answering questions and discussing scientific ideas online, contributing to a scientific blog, or participating in open source software projects are typically thought of as outside classic research activity. Having scientists involved in those activities, however, is very important for the community to be dynamic and to promote fruitful discussions and collaborations.

* Encourage your colleagues to play an active role in the scientific community.
* Maintain a balance with the activities directly related to your research projects.
* Remember that you are not alone.
* If you know why you are doing it and if you enjoy it, you will take the time to do it, and you will do it well.

## Wednesday, October 26, 2011

### Quanta Plus -- XML Editor for Ubuntu

Select all descendants of node parent:
/parent//*

http://www.tizag.com/xmlTutorial/xpathdescendant.php

$ sudo apt-get install python-4suite-xml
$ 4xpath --string book.xml /catalog/book/author

Result (XPath string):
======================
Gambardella, Matthew

http://www.whitebeam.org/library/guide/TechNotes/xpathtestbed.rhtm

Simple Python code

http://stackoverflow.com/questions/8692/how-to-use-xpath-in-python

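import libxml2  # Python bindings for libxml2 (third-party package)
doc = libxml2.parseFile('foo.xml')
# print the value of every Url attribute in the document
for url in doc.xpathEval('//@Url'):
    print(url.content)
doc.freeDoc()  # libxml2 documents must be freed explicitly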
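As an alternative sketch that needs no third-party package, Python's standard-library `xml.etree.ElementTree` supports a limited XPath subset. The inline catalog document and the variable names below are made up for illustration; it is not the book.xml file queried with 4xpath.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this example.
XML = """<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
  </book>
</catalog>"""

root = ET.fromstring(XML)  # root is the <catalog> element

# Counterpart of /catalog/book/author, written relative to the root element.
authors = [a.text for a in root.findall("./book/author")]

# Counterpart of /catalog//* : every descendant element of the root.
descendant_tags = [e.tag for e in root.findall(".//*")]

# Attribute predicate: every <book> that carries an id attribute.
book_ids = [b.get("id") for b in root.findall(".//book[@id]")]

print(authors)
print(descendant_tags)
print(book_ids)
```

ElementTree paths are always evaluated relative to an element, which is why the queries start with `.` instead of `/`.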

## Tuesday, October 25, 2011

### Peter Norvig - The Unreasonable Effectiveness of Data

Collect data, use probability (Bayes' rule) to write some simple code / model, and let the data do all the work.

Good vs. bad data: over time, you might pick up your own data?

word sense disambiguation
spelling correction
translation
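A minimal sketch of the "simple model plus data" idea, applied to spelling correction. It assumes a toy corpus and a deliberately crude error model (every candidate one edit away is treated as equally likely), so the choice reduces to picking the candidate with the highest word count. The corpus, the function names and the error model are assumptions made for illustration, not taken from the talk.

```python
from collections import Counter
import string

# Toy corpus (hypothetical); a real corrector would count words in a large text collection.
CORPUS = "the spelling of a word matters and the data does the work".split()
COUNTS = Counter(CORPUS)  # word -> frequency, a stand-in for the prior P(word)


def edits1(word):
    """Return every string that is one delete, transpose, replace or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Pick the most frequent known word within one edit; keep the input if nothing is known."""
    if word in COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in COUNTS]
    if not candidates:
        return word
    return max(candidates, key=COUNTS.__getitem__)


print(correct("speling"))
print(correct("teh"))
```

With more data the counts become a better estimate of how likely each word is, and the code itself does not need to change.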

### Models and useful

Essentially, all models are wrong, but some are useful.
--George Box

### Brown Bag Lunch

This discussion was sparked by a question: "ideas for a short (e.g. 45min) brown bag lunch type session, aiming to share information about a particular piece of work ongoing within a large (newly formed) team, in a way that encourages discussion and thought about potential internal synergies, during the lunch break."

* Called ‘Brown Bag’ because people often bring their food in one, the term refers to informal discussions around a topic (e.g. ongoing research, first ideas for a project) at lunch time, with lunch brought (or sometimes provided). In an organization, some lunchtime meetings are catered, while in others you're expected to bring your own lunch. For organizers, a nice way to set the expectation that no lunch will be served is to call it a 'brown bag'. That way, participants will bring their own. The equivalent in South Asia might be a 'tiffin box' lunch!

http://wiki.km4dev.org/wiki/index.php/Brown_Bag_Lunches

### Differentially Expressed Genes in Major Depression Reside on the Periphery of Resilient Gene Coexpression Networks

However, we found that the small-world connectivity characteristics of coexpression networks are resilient to the effects of depression (and of other neuropsychiatric diseases), and that the related pathology is not mediated by network disintegration via attack on hub nodes.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3166821/?tool=pubmed

## Monday, October 24, 2011

### Free Android Apps

Triposo

Complete, up-to-date travel guides for over 8000 destinations. Triposo is the most comprehensive guide available.
Triposo apps don’t need an internet connection; even the detailed maps are all stored on your device.

WhatsApp - WhatsApp Messenger is a cross-platform mobile messaging app which allows you to exchange messages without having to pay for SMS

Call Meter NG - Keep track of your mobile voice / data / text bills
http://www.appbrain.com/app/de.ub0r.de.android.callMeterNG

Offline Browser - save webpage

123clip
save webpage

Deionized

Nauseous

Angry Birds

http://lgp500.wordpress.com/2011/05/24/top-8-best-android-must-have-apps/
http://shrewdgeek.com/2011/10/17/10-must-have-applications-for-lg-optimus-one-p500/

Custom ROM - CyanogenMod 7.1.0 (Android 2.3.7)

For Optimus One running official 2.3.3, root using SuperOneClick.

Why install custom ROM?
http://www.androidpolice.com/2010/05/01/custom-roms-for-android-explained-and-why-you-want-them/

Super Manager - File manager, Remove stock applications

Benefits of rooting
http://sachinchavan.in/2010/12/how-to-root-android-lg-optimus-one-p500-india/

Titanium Backup

ClockworkMod ROM Manager
http://www.appbrain.com/app/com.koushikdutta.rommanager

GingerBreak (works on LG OPTIMUS ONE V 2.2.2)
http://www.manast.com/2011/04/30/how-to-root-lg-optimus-one-p500-using-gingerbreak/

Z4Root (use Permanent Root) (or use GingerBreak-v1.20.apk)
http://rainulf.ca/androidtag.html (for Telus, P500H, Android firmware v2.2 NOT 2.2.1)
- Enable USB debugging from Menu->Settings->Applications->Development->USB Debugging.
- Make sure the versions / firmware eg. V10B are correct (To find out, go to Settings, then About Phone)
http://lgoptimusonep500.blogspot.com/2010/12/rooting-lg-optimus-one-p500.html

Telus V10S stock:

How to Root LG Optimus One
http://androidos.in/2010/12/how-to-root-lg-optimus-one-remove-unwanted-apps/

Opera Mini
https://market.android.com/details?id=com.opera.mini.android&hl=en

Bluetooth File Transfer
https://market.android.com/details?id=it.medieval.blueftp

MoboPlayer
https://market.android.com/details?id=com.clov4r.android.nil

Barcode Scanner

Dolphin Browser™ HD (Play Flash)
https://market.android.com/details?id=mobi.mgeek.TunnyBrowser&feature=related_apps#?t=W251bGwsMSwxLDEwOSwibW9iaS5tZ2Vlay5UdW5ueUJyb3dzZXIiXQ..

Skyfire - (Play Flash) Skyfire Browser makes your mobile web experience richer, smarter and more fun!
Skyfire is the world’s smartest & most social mobile browser!
https://market.android.com/details?id=com.skyfire.browser&hl=en

Android System Info - Explore all features of your android device!
https://market.android.com/details?id=com.electricsheep.asi&feature=search_result#?t=W251bGwsMSwxLDEsImNvbS5lbGVjdHJpY3NoZWVwLmFzaSJd

App 2 SD Free (move app to SD) - Are you running out of application storage?
https://market.android.com/details?id=com.a0soft.gphone.app2sd&feature=search_result#?t=W251bGwsMSwxLDEsImNvbS5hMHNvZnQuZ3Bob25lLmFwcDJzZCJd

Easy Uninstaller - Simplest & fastest uninstall tool for android.
https://market.android.com/details?id=mobi.infolife.uninstaller&feature=search_result#?t=W251bGwsMSwxLDEsIm1vYmkuaW5mb2xpZmUudW5pbnN0YWxsZXIiXQ..

Spare Parts Plus! - Allows you to enable and change some hidden settings of your Android device.
https://market.android.com/details?id=com.androidapps.spare_parts&feature=related_apps

RockPlayer - RockPlayer is high performance, almost all formats media player with a lot of functions
https://market.android.com/details?id=com.redirectin.rockplayer.android.unified.lite&hl=en

### Ectopic expression

Ectopic expression is the expression of a gene in an abnormal place in an organism. This can be caused by a disease, or it can be artificially produced as a way to help determine what the function of that gene is.

http://en.wikipedia.org/wiki/Ectopic_expression

Similar gene expression profiles do not imply similar tissue functions

Although similarities in gene expression among tissues are commonly inferred to reflect functional constraints, this has never been formally tested. Furthermore, it is unclear which evolutionary processes are responsible for the observed similarities. When examining genome-wide expression data in mouse, we found that patterns of expression similarity between tissues extend to genes that are unlikely to function in the tissues. Thus, ectopic expression can seem coordinated across tissues. This indicates that knowledge of gene expression patterns per se is insufficient to infer gene function. Ectopic expression is possibly explained as expression leakage, caused by spreading of chromatin modifications or the transcription apparatus into neighboring genes.
http://www.sciencedirect.com/science/article/pii/S0168952506000254

## Sunday, October 23, 2011

$ find . -name "*.mp3" -print0 | xargs -0 mp3gain -r

xargs -0 (paired with find -print0) correctly handles files with spaces

## Saturday, October 22, 2011

### Call landline

http://www.androidauthority.com/top-best-free-calls-android-phones-voip-19829/

Fring
Skype

### Create flash cards online

### Gene set enrichment analysis made simple (GSEA)

GENE SET ENRICHMENT ANALYSIS MADE SIMPLE

Among the many applications of microarray technology, one of the most popular is the identification of genes that are differentially expressed in two conditions. A common statistical approach is to quantify the interest of each gene with a p-value, adjust these p-values for multiple comparisons, choose an appropriate cut-off, and create a list of candidate genes. This approach has been criticized for ignoring biological knowledge regarding how genes work together. Recently a series of methods that do incorporate biological knowledge have been proposed. However, many of these methods seem overly complicated. Furthermore, the most popular method, Gene Set Enrichment Analysis (GSEA), is based on a statistical test known for its lack of sensitivity. In this paper we compare the performance of a simple alternative to GSEA. We find that this simple solution clearly outperforms GSEA. We demonstrate this with eight different microarray datasets.

There are currently two major types of procedure for incorporating biological knowledge into differential expression analysis. We will refer to these as the over-representation and the aggregate score approaches.

Over-representation analysis can be summarized as follows: First, form a list of candidate genes using the marginal approach. Then, for each gene set, we create a two-by-two table comparing the number of candidate genes that are members of the category to those that are not members. The significance of over-representation can be assessed, for example, using the hypergeometric distribution or its binomial approximation.
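The over-representation test described above can be sketched with the hypergeometric distribution. This is a minimal illustration, not the paper's code: the function names and the gene counts in the example are hypothetical, and the p-value is taken as the upper tail P(X >= k).

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k gene-set members among n candidate genes,
    drawn without replacement from N genes of which K belong to the set."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def overrepresentation_pvalue(k, N, K, n):
    """Upper tail P(X >= k): chance of seeing at least k set members by luck alone."""
    upper = min(K, n)
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))


# Hypothetical numbers: 10000 genes on the array, 100 in the gene set,
# 200 candidate genes, 8 of which fall in the gene set (2 expected by chance).
p_value = overrepresentation_pvalue(k=8, N=10000, K=100, n=200)
print(p_value)
```

The four arguments correspond to the margins and one cell of the two-by-two table; the remaining cells follow from them.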
A limitation of the over-representation approach is that it ignores all the genes that did not make the list of candidate genes. The aggregate score approach (e.g. GSEA) does not have this limitation. The basic idea is to assign scores to each gene set based on all the gene-specific scores for that gene set.

In this paper we compare GSEA to the one sample z-test and χ²-test

http://www.bepress.com/jhubiostat/paper185/

that 7 or so genes is sufficient to uniquely determine a gene set -- Jesse

### Hypergeometric (draws w/o replacement) and Binomial / Bernoulli (draws with replacement) distributions

In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when n = 1, the binomial distribution is a Bernoulli distribution.

The Binomial distribution is an n times repeated Bernoulli trial. The binomial distribution is the basis for the popular binomial test of statistical significance.

The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution is a good approximation, and widely used.

http://en.wikipedia.org/wiki/Binomial_distribution
http://en.wikipedia.org/wiki/Hypergeometric_distribution
http://stattrek.com/online-calculator/hypergeometric.aspx

Draw 5 cards from the deck, what are the chances that 4 are red?

> tot <- 52; m <- 26; n <- tot-m; k <- 5; q <- 4; dhyper(q,m,n,k)  # P(exactly 4 red)
[1] 0.1495598
> tot <- 52; m <- 26; n <- tot-m; k <- 5; q <- 4; phyper(q,m,n,k)  # P(at most 4 red)
[1] 0.9746899

## Friday, October 21, 2011

### Steve Jobs

http://www.wired.com/gadgetlab/2011/10/steve-jobs-through-the-years-2/?pid=2412&viewall=true

“Remembering that I'll be dead soon is the most important tool I've ever encountered to help me make the big choices in life. Almost everything — all external expectations, all pride, all fear of embarrassment or failure — these things just fall away in the face of death, leaving only what is truly important. Remembering that you are going to die is the best way I know to avoid the trap of thinking you have something to lose. You are already naked. There is no reason not to follow your heart.” — Steve Jobs, at a Stanford University commencement ceremony in 2005.

“People don’t know what they want until you show it to them.”

“I was in the parking lot [after the lecture], with the key in the car,” Jobs said. “I thought to myself, 'If this is my last night on earth, would I rather spend it at a business meeting or with this woman?' I ran across the parking lot, asked her if she'd have dinner with me. She said yes, we walked into town and we've been together ever since.”

## Thursday, October 20, 2011

### SWAN (Semantic Web Applications in Neuromedicine)

Welcome to the SWAN project!

SWAN (Semantic Web Applications in Neuromedicine) is a Web-based collaborative program that aims to organize and annotate scientific knowledge about Alzheimer disease (AD) and other neurodegenerative disorders. Its goal is to facilitate the formation, development and testing of hypotheses about the disease.
The ultimate goal of this project is to create tools and resources to manage the evolving universe of data and information about AD in such a way that researchers can easily comprehend their larger context ("what hypothesis does this support or contradict?"), compare and contrast hypotheses ("where do these two hypotheses agree and disagree?"), identify unanswered questions and synthesize concepts and data into ever more comprehensive and useful hypotheses and treatment targets for this disease.

### DISCO - DISCOvery NIF Interoperation Capabilities

What is DISCO?

DISCO is an information integration approach designed to facilitate interoperation among Internet resources. It consists of a set of tools and services that allows resource providers who maintain information to share it with automated systems such as NIF. NIF is then able to “harvest” the information and keep those sets of information up-to-date.

How is this accomplished?

By using a series of files and/or scripts which are then placed in the root directory of the resource developer’s resource. (NIF can also host the files on its servers and crawl for changes there.) Once the files of the resource providers are in place, and DISCO is notified, the DISCO server can then recognize and "consume" the information shared, providing machine understandable information to NIF Integrator Servers (also known as Aggregators) about your resource.

http://www.springerlink.com/content/c613m0l225p072g5/fulltext.html

### DOMEO

DOMEO - Document Metadata Organizer

Ciccarese P, Ocana M, Clark T. DOMEO: a web-based tool for semantic annotation of online documents. Paper at Bio-Ontologies 2011, Vienna, Austria. Accepted

Highlight text on a web page (e.g. a PubMed article) and hit Annotate. It loads ontology data when annotating as well, and also lets you share annotations.
http://code.google.com/p/domeo/
http://www.slideshare.net/paolociccarese/swan-annotation-framework-text-mining

### PLoS Computational Biology Guidelines for Reviewers

Research articles modeling aspects of biological systems should demonstrate both scientific novelty and profound new biological insights. Research articles describing improved or routine methods, models, software, and databases will not be considered by PLoS Computational Biology, and may be more appropriate for PLoS ONE.

To be considered for publication in PLoS Computational Biology, any given manuscript must satisfy the following criteria:

* Originality
* High importance to researchers in computational biology
* Significant biological insight and general interest to life scientists
* Rigorous methodology
* Substantial evidence for its conclusions

Manuscripts also must be well written to ensure clear and effective presentation of the work and key findings.

The best possible review of a Research Article would answer the following questions:

* What are the main claims of the paper and how significant are they? Is this paper important in its discipline?
* Have the authors provided adequate proof for their claims?
* Are these claims novel? If not, please specify papers that weaken the claims of originality of this one.
* Would additional work improve the paper? How much better would the paper be if this work were performed and how difficult would it be to do this work?
* Are the claims properly placed in the context of the previous literature? Have the authors treated the literature fairly?
* Do the data and analyses support the claims? If not, what other evidence is required?
* Are original data deposited in appropriate repositories and accession/version numbers provided for genes, proteins, mutants, diseases, etc.?
* Does the study conform to any relevant guidelines such as CONSORT, MIAME, QUORUM, STROBE, and the Fort Lauderdale agreement?
* Are details of the methodology sufficient to allow the experiments to be reproduced?
* Is any software created by the authors freely available?
* PLoS Computational Biology encourages authors to publish detailed protocols and algorithms as supporting information online. Do any particular methods used in the manuscript warrant such treatment?
* Is the manuscript well organized and written clearly enough to be accessible to non-specialists? Would you recommend the author seek the services of a professional science writer?*
* Have any parts of the paper been published elsewhere? Are there any copyright issues associated with this that conflict with the PLoS license?*
* Does the paper use standardized scientific nomenclature and abbreviations? If not, are these explained at the first usage?

http://www.ploscompbiol.org/static/reviewerGuidelines.action

Oxford Journals
http://www.oxfordjournals.org/our_journals/nar/for_authors/msprep_database.html

## Wednesday, October 19, 2011

### p-value, q-value (FDR)

http://www.nonlinear.com/support/progenesis/samespots/faq/pq-values.aspx

For example, if there are 200 spots on a gel and we apply an ANOVA or t-test to each, then at a p-value threshold of 0.05 we would expect to get 200*0.05 = 10 false positives by chance alone. This is known as the multiple testing problem.

Another way to look at the difference is that a p-value of 0.05 implies that 5% of all tests will result in false positives. An FDR adjusted p-value (or q-value) of 0.05 implies that 5% of significant tests will result in false positives. The latter is clearly a far smaller quantity.

A p-value of 0.01 implies a 1% chance of false positives.

To interpret the q-values, you need to look at the ordered list of q-values. There are 839 spots in this experiment. If we take spot 52 as an example, we see that it has a p-value of 0.01 and a q-value of 0.0141. Recall that a p-value of 0.01 implies a 1% chance of false positives, and so with 839 spots, we expect 8 or 9 false positives, on average, i.e.
839*0.01 = 8.39. In this experiment, there are 52 spots with a p-value of 0.01 or less, and so 8 or 9 of these will be false positives. On the other hand, the q-value is a little greater at 0.0141, which means we should expect 1.41% of all the spots with q-value less than this to be false positives. This is a much better situation. We know that 52 spots have a q-value less than 0.0141 and so we should expect 52*0.0141 = 0.7332 false positives, i.e. less than one false positive. Just to reiterate, false positives according to p-values take all 839 values into account when determining how many false positives we should expect to see, while q-values take into account only those tests with q-values less than the threshold we choose.

### Oliver Sacks

http://www.nytimes.com/2010/11/14/books/review/APaul-t.html?pagewanted=all

The Mind's Eye

“The Island of the Colorblind,” “An Anthropologist on Mars,” “The Man Who Mistook His Wife for a Hat”

## Monday, October 17, 2011

### Middle school science

### Outdoor relay games

## Sunday, October 16, 2011

### Pull up

## Saturday, October 15, 2011

### Common LaTeX mistakes

Using underscores, e.g. hello_world should be hello\_world

cutoff > 0.8 should be $cutoff > 0.8$

Working with tables:

\usepackage{longtable}

\begin{center}
\begin{longtable}{|c|p{3cm}|p{6cm}|c|}
\caption{\textbf{my table title}} \label{tab:label} \\
%table information
\hline
1 & 2 & 3 & 4 \\
\hline
a & b & c & d \\
\hline
\end{longtable}
\begin{flushleft}
my table caption
\end{flushleft}
\end{center}

### 7 ways to improve your conversation skills

http://www.lifeoptimizer.org/2011/04/01/improve-conversation-skills/

1. Talk slowly
2. Hold more eye contact
3. Notice the details
4. Give unique compliments
5. Express your emotions
6. Offer interesting insights
7. Use the best words

## Friday, October 14, 2011

### Hemoglobin

Hemoglobin is also found outside red blood cells and their progenitor lines.
Other cells that contain hemoglobin include the A9 dopaminergic neurons in the substantia nigra, macrophages, alveolar cells, and mesangial cells in the kidney. In these tissues, hemoglobin has a non-oxygen-carrying function as an antioxidant and a regulator of iron metabolism.

Hemoglobin variants are a part of the normal embryonic and fetal development.

http://en.wikipedia.org/wiki/Hemoglobin

### N50 - a statistical measure of average length of a set of sequences.

http://assemblathon.org/assemblathon-2-basic-assembly-metrics

N50 scaffold/contig length is calculated by summing lengths of scaffolds/contigs from the longest to the shortest and determining at what point you reach 50% of the total assembly size. The length of the scaffold/contig at that point is the N50 length.

http://en.wikipedia.org/wiki/Contig

A sequence contig is a contiguous, overlapping sequence read resulting from the reassembly of the small DNA fragments generated by bottom-up sequencing strategies.

http://en.wikipedia.org/wiki/N50_statistic

The N50 size is computed by sorting all contigs from largest to smallest and by determining the minimum set of contigs whose sizes total 50% of the entire genome. For example, for a genome of 600Mb, if the assembled sequences add up to 500Mb, the N50 would be calculated by sorting the contigs from largest to smallest and finding the length of the contig where the cumulative size is 250Mb.

http://seqanswers.com/forums/showthread.php?t=2332
https://www.broad.harvard.edu/crd/wiki/index.php/N50

Given a set of sequences of varying lengths, the N50 length is defined as the length N for which 50% of all bases in the sequences are in a sequence of length L < N.
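The two ways of stating N50 quoted above can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up contig lengths: the first function follows the cumulative-sum rule (longest contig first, stop at 50% of the assembly size), the second takes the median of a list in which every length n is repeated n times.

```python
from statistics import median


def n50_cumulative(lengths):
    """Length of the contig at which the running total, longest first,
    reaches at least half of the summed contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be a non-empty list of positive integers")


def n50_weighted_median(lengths):
    """Median of the expanded list in which each length n appears n times."""
    expanded = [n for n in lengths for _ in range(n)]
    return median(expanded)


contigs = [80, 70, 50, 40, 30, 20]  # hypothetical contig lengths, total 290
print(n50_cumulative(contigs))       # running total 80, then 150 >= 145
print(n50_weighted_median(contigs))

tiny = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50_cumulative(tiny), n50_weighted_median(tiny))
```

On realistic assemblies the two rules agree or differ negligibly. On the tiny list the cumulative rule returns 8, while the median rule averages the two middle entries of the 32-element expanded list (4 and 8) and returns 6, so it is worth stating which convention a reported N50 uses.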
The N50 (L50) is a length-weighted median of the contig lengths in the assembly: repeat each contig length n exactly n times and take the median of the expanded list.

N50 of {2, 2, 2, 3, 3, 4, 8, 8} is 6 (the expanded list has 32 entries; the 16th is 4 and the 17th is 8, so the median is 6)

### Take Shelter (2011)

http://www.imdb.com/title/tt1675192/

Plagued by a series of apocalyptic visions, a young husband and father questions whether to shelter his family from a coming storm, or from himself.

## Wednesday, October 12, 2011

### Download GEO files using R and GEOquery

source("http://bioconductor.org/biocLite.R") # download BioC installation routines
biocLite() # install the core packages
biocLite("GEOquery") # install the GEO libraries
library(GEOquery)
getGEOSuppFiles('GSE20987')
untar("GSE20987/GSE20987_RAW.tar", exdir="data")
cel_files <- list.files("data/", pattern = "gz")
cel_files_qualified <- paste("data", cel_files, sep="/")
sapply(cel_files_qualified, gunzip)

### Mouse embryology

http://php.med.unsw.edu.au/embryology/index.php?title=Mouse_Timeline_Detailed

Day 21 (E21.0) Hearing and Balance - Fully mature morphological and physiological innervation of vestibular system (P28)

PCW - postconceptional weeks
GL - greatest length

http://books.google.com/books?id=79eQKVkMxmEC&pg=PA87&lpg=PA87&dq=developmental+stage+pcw&source=bl&ots=fxuQUY4eoR&sig=C2t1CVFZjXOtjZyIRSbJOnpn7Y0&hl=en&ei=NyKXToDrNOjKiQLxi-2bDQ&sa=X&oi=book_result&ct=result&resnum=2&ved=0CCYQ6AEwAQ#v=onepage&q=developmental%20stage%20pcw&f=false

### List of images in Gray's Anatomy: IX. Neurology

http://en.wikipedia.org/wiki/List_of_images_in_Gray%27s_Anatomy:_IX._Neurology#the_optic_nerve_.28Gray.27s_s197.29

### Diploid short read assembly

Assemblathon 1, http://www.ncbi.nlm.nih.gov/pubmed/21926179

Earl et al, Assemblathon 1: A competitive assessment of de novo short read assembly methods. Genome Res. 2011 Sep 16. PubMed PMID: 21926179

ABySS, http://www.ncbi.nlm.nih.gov/pubmed/19251739

Simpson et al, ABySS: a parallel assembler for short read sequence data. Genome Res. 2009. Jun;19(6):1117-23.
PubMed PMID: 19251739; PubMed Central PMCID: PMC2694472

### Test if two samples are different

visually using boxplot()

assuming normal distribution, use

    t.test(A, B)                  # Welch two-sample t-test (unpaired); the default does not assume equal variances; small p-value = different means
    var.test(A, B)                # F test for equality of variances; a large p-value = no evidence that the variances differ
    t.test(A, B, var.equal=TRUE)  # classic two-sample t-test with pooled variance
    wilcox.test(A, B)             # does not assume normality, only a common continuous distribution under the null hypothesis; small p-value = shifted location

http://cran.r-project.org/manuals.html

### Tests of agreement with normality, comparing distributions

Kolmogorov-Smirnov test (KS test) for normality, "Do x and y come from the same distribution?" (see ks.test() in R)

    x <- rnorm(50)
    y <- runif(30)
    ks.test(x, y)

You get a small p-value, so you reject the hypothesis that the distributions are the same; the two distributions are different (small p-value = different).

or Shapiro-Wilk normality test (see shapiro.test() in R)

or visually using QQ plot (Q-Q plot, see qqnorm() and qqplot()); x: empirical data, y: theoretical data; ideally, data should lie close to the diagonal

cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf

### Contingency table in R using table()

An Introduction to R
http://cran.r-project.org/doc/manuals/R-intro.html

Frequency, contingency table in R using table()
http://www.statmethods.net/stats/frequencies.html

table

You can generate frequency tables using the table( ) function, tables of proportions using the prop.table( ) function, and marginal frequencies using margin.table( ).
    > a <- rep(c(NA, 1/0:3), 10)
    > a
     [1]        NA       Inf 1.0000000 0.5000000 0.3333333        NA       Inf
     [8] 1.0000000 0.5000000 0.3333333        NA       Inf 1.0000000 0.5000000
    [15] 0.3333333        NA       Inf 1.0000000 0.5000000 0.3333333        NA
    [22]       Inf 1.0000000 0.5000000 0.3333333        NA       Inf 1.0000000
    [29] 0.5000000 0.3333333        NA       Inf 1.0000000 0.5000000 0.3333333
    [36]        NA       Inf 1.0000000 0.5000000 0.3333333        NA       Inf
    [43] 1.0000000 0.5000000 0.3333333        NA       Inf 1.0000000 0.5000000
    [50] 0.3333333
    > table(a)   # NA values are dropped by default
    a
    0.333333333333333               0.5                 1               Inf
                   10                10                10                10

-----------------

    # 2-Way Frequency Table
    # attach(mydata)   # only needed when A and B are columns of a data frame called mydata
    A <- letters[1:3]
    B <- sample(A)            # random permutation of A
    mytable <- table(A,B)     # A will be rows, B will be columns
    > mytable                 # print table
    #    B
    # A   a b c
    #   a 0 1 0
    #   b 1 0 0
    #   c 0 0 1

    margin.table(mytable, 1)  # A frequencies (summed over B)
    margin.table(mytable, 2)  # B frequencies (summed over A)

    prop.table(mytable)       # cell percentages
    prop.table(mytable, 1)    # row percentages
    prop.table(mytable, 2)    # column percentages

### QR-decomposition and least squares

http://www.itl.nist.gov/div898/handbook/pmd/section1/pmd141.htm

Least Squares process

Linear least squares regression is by far the most widely used modeling method. It is what most people mean when they say they have used "regression", "linear regression" or "least squares" to fit a model to their data. Linear least squares regression also gets its name from the way the estimates of the unknown parameters are computed. "Linear" refers to the parameters, so a quadratic curve still counts as a linear model:

$$f(x;\vec{\beta}) = \beta_0 + \beta_1x + \beta_{11}x^2$$

Least Square Problem

Given an inconsistent system of equations, Ax = b, we want to find a vector, x, from R^m so that the error ||b - Ax|| is the smallest possible error. The vector x is called the least squares solution.

---------
see lm() and lsfit() in R for least squares fitting procedure
---------

http://tutorial.math.lamar.edu/Classes/LinAlg/QRDecomposition.aspx

QR-Decomposition (see qr() in R)

There is a nice application of the QR-Decomposition to the Least Squares Process.

Theorem 3 Suppose that A has linearly independent columns.
Then the normal system associated with Ax = b can be written as,

    Rx = t(Q)b

Theorem 1 Suppose that A is an n x m matrix with linearly independent columns then A can be factored as,

    A = QR

where Q is an n x m matrix with orthonormal columns and R is an invertible m x m upper triangular matrix.

## Tuesday, October 11, 2011

### Illumina BeadChips

http://www.dkfz.de/gpcf/beadchips.html

mouseWG-6 v2 BeadChip®

The BeadChip Mouse Sentrix-6 V2 offers comprehensive analysis of genome-wide expression on a single array.

* probes are defined gene-specific 50mer oligonucleotides
* >700,000 oligonucleotides per bead/spot
* on average 30x redundancy for each transcript
* Provides comprehensive coverage of the transcribed mouse genome on a single array
* Analyzes the expression level of 45,281 mouse transcripts, variants, and EST clusters
* comprised of more than 1,600,000 beads per chip
* Up-to-date gene list and annotation of Mouse Sentrix-6 V2 BeadChip

For more information please download the Mouse Sentrix-6 V2 Whole Genome BeadChip datasheet and the corresponding technical bulletin from Illumina.

One physical array is made up of 6 identical, but independent chips.

http://www.ohsu.edu/xd/research/research-cores/gmsr/project-design/array-technology/illumina-bead-arrays.cfm

The Illumina BeadChip is a proprietary method of performing multiplex gene expression and genotyping analysis. The essential element of BeadChip technology is the attachment of oligonucleotides to silica beads. The beads are then randomly deposited into wells on a substrate (for example, a glass slide). The resultant array is decoded to determine which oligonucleotide-bead combination is in which well. The decoded arrays may be used for a number of applications, including gene expression analysis and genotyping. Scroll to the bottom of the page for a primer on array decoding.

Expression array overview

Gene expression analysis is performed using a 79-base oligonucleotide that has two segments.
The 5′ 50-base segment of the oligonucleotide is designed to hybridize to sequences available in the public data repositories. It is this segment that will bind to the labeled target derived from the poly(A) component of the total RNA. The 3′ 29-base segment of the oligonucleotide is the address. The address is a unique sequence created by Illumina specifically to allow unambiguous identification of the oligonucleotide after it has been deposited on the array.

Arrays may have as many as 44,000 unique oligonucleotides. Each oligonucleotide is synthesized in a large batch using standard technologies. The oligonucleotides are then attached to the surface of a 3-micron silica bead. Each bead has only one type of oligonucleotide attached to it, but it has hundreds of thousands of copies of this oligonucleotide.

Standard lithographic techniques are used to create a honeycomb pattern of wells on the surface of glass slides. Each well can hold one bead. The beads for a given array are mixed in equal amounts and deposited on the slide surface. The beads occupy the wells in a random distribution. Each bead is represented by, on average, about 20 instances within the array. The identity of each bead is determined by decoding using the address sequence. A unique array layout file is then associated with each array and used to decode the data during scanning of the array.

### International Genetically Engineered Machine competition (iGEM)

http://2011.igem.org/Special:Search?search=team%3A&go=Go

http://2011.igem.org/Team:British_Columbia/Team

The International Genetically Engineered Machine competition (iGEM) is the premier undergraduate Synthetic Biology competition. Student teams are given a kit of biological parts at the beginning of the summer from the Registry of Standard Biological Parts. Working at their own schools over the summer, they use these parts and new parts of their own design to build biological systems and operate them in living cells.
This project design and competition format is an exceptionally motivating and effective teaching method.

### Erroneous analyses of interactions in neuroscience: a problem of significance

http://www.nature.com/neuro/journal/v14/n9/full/nn.2886.html

Whatever the reasons for the error, its ubiquity and potential effect suggest that researchers and reviewers should be more aware that the difference between significant and not significant is not itself necessarily significant.

A fictive example would be “Hippocampal firing synchrony correlated with memory performance in the placebo condition (r = 0.43, P = 0.01), but not in the drug condition (r = 0.19, P = 0.21)”. When making a comparison between two correlations, researchers should directly contrast the two correlations using an appropriate statistical method.

### Friend

A friend is one of the nicest things you can have, and one of the best things you can be.
~Douglas Pagels

### Trainees in bioinformatics and computational biology should seek depth of knowledge over breadth.

http://www.nature.com/naturejobs/2011/111006/full/nj7367-143a.html?WT.ec_id=NATUREjobs-20111007

Virginia Gewin
doi:10.1038/nj7367-143a

“We don't know where we will be in ten years because the technologies and ideas are moving so fast,” he says. As Cleaver notes: “Perhaps the best career strategy is to stay flexible and curious.”

### 38 tips on writing an academic CV

http://blogs.nature.com/naturejobs/2011/09/27/38-tips-on-writing-an-academic-cv?WT.ec_id=NATUREjobs-20111007

38 tips on writing an academic CV
Posted by Rachel Bowden on Sep 27, 2011

"[Academia] seems to be the only field where you can make it as long as you want it to be,"

The most important information should be on the first half of the first page, says Baker, and the very first thing should be your name, not the words 'curriculum vitae'.
Content: the basics

The three main sections that should form the bulk of your academic CV are:

* Research
* Teaching
* Administration

### business and vision

You need to have a good vision to succeed in business.
--Sangdo / Merchant k-drama

## Monday, October 10, 2011

### Best places to work

### trepidation

trep·i·da·tion (trĕp′ĭ-dā′shən) n.
1. A state of alarm or dread; apprehension. See Synonyms at fear.
2. An involuntary trembling or quivering.

trepidation [ˌtrɛpɪˈdeɪʃən] n
1. a state of fear or anxiety
2. a condition of quaking or palpitation, esp one caused by anxiety

### Non-coding RNAs: could they be the answer?

http://www.ncbi.nlm.nih.gov/pubmed/21183459

Brief Funct Genomics. 2010 Dec 22.
Non-coding RNAs: could they be the answer?
Costa FF.

Abstract

Despite a considerable amount of effort by different groups to evaluate the genetic traits associated with complex diseases by genome-wide association studies (GWAS), just a few regions, mainly linked to protein-coding genes, were identified. Recently, studies from different groups have implicated new classes of long non-coding RNAs (ncRNAs) to important molecular mechanisms. Additionally, high-throughput transcriptome analyses of different cell types have shown that an unexpected amount of genomic DNA is transcribed. I am writing to propose that the majority of the regions that do not clearly correspond to a 'gene' controlling certain traits might be ncRNAs or other regulatory transcripts that are still unknown. These regions will need to be carefully examined in the future.

PMID: 21183459 [PubMed - as supplied by publisher]

## Sunday, October 9, 2011

### Early worm

"I think we consider too much the good luck of the early bird and not enough the bad luck of the early worm."
--Theodore Roosevelt

## Friday, October 7, 2011

### Chaos and stillness

"In the midst of movement and chaos keep stillness inside of you."
--Deepak Chopra

### Mapping gene IDs, microarray probeset probe IDs

### You can't fool God!

"God knows what you've been doing, everything you've been doing. You may fool me, but you can't fool God!"
--Great Gatsby

## Thursday, October 6, 2011

### Computational and statistical approaches to analyzing variants identified by exome sequencing

Computational and statistical approaches to analyzing variants identified by exome sequencing
Nathan O Stitziel, Adam Kiezun and Shamil Sunyaev

http://genomebiology.com/2011/12/9/227

New sequencing technology has enabled the identification of thousands of single nucleotide polymorphisms in the exome, and many computational and statistical approaches to identify disease-association signals have emerged.

Here we review the computational and statistical approaches that have emerged for managing these data in this rapidly exploding field. First, we briefly review the process for identifying variants in next-generation sequencing (NGS) studies and then discuss strategies for identifying the causal variant in Mendelian disorders among the total number of variants identified. We also discuss strategies for identifying the causal gene(s) in complex diseases among all genes in the genome, before outlining some challenges facing current exome sequencing studies.

### Waltz through hippocampal neuropil

http://www.youtube.com/watch?v=FZT6c0V8fW4

Reconstruction of a block of hippocampus from a rat approximately 5 micrometers on a side from serial section transmission electron microscopy in the lab of Kristen Harris at the University of Texas at Austin in collaboration with Terry Sejnowski at the Salk Institute and Mary Kennedy at Caltech. Josef Spacek, Daniel Keller, Varun Chaturvedi, Chandrajit Bajaj, Justin Kinney and Tom Bartol made major contributions to the reconstruction and the video.
For more reconstructions: http://www.cell.com/neuron/abstract/S0896-6273(10)00624-0

Links to laboratories:
http://cnl.salk.edu/
http://synapses.clm.utexas.edu/lab/lab.stm
http://www.its.caltech.edu/~mbklab/mary.html
http://www.cs.utexas.edu/~bajaj/
http://synapses.clm.utexas.edu/lab/spacek/josef.html

### rss

TIOBE Programming Community Index for September 2011

http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html

| Position Sep 2011 | Position Sep 2010 | Programming Language | Ratings Sep 2011 | Delta Sep 2010 | Status |
|---|---|---|---|---|---|
| 1 | 1 | Java | 18.761% | +0.85% | A |
| 2 | 2 | C | 18.002% | +0.86% | A |
| 3 | 3 | C++ | 8.849% | -0.96% | A |
| 4 | 6 | C# | 6.819% | +1.80% | A |
| 5 | 4 | PHP | 6.596% | -1.77% | A |
| 6 | 8 | Objective-C | 6.158% | +2.79% | A |
| 7 | 5 | (Visual) Basic | 4.420% | -1.38% | A |
| 8 | 7 | Python | 4.000% | -0.58% | A |
| 9 | 9 | Perl | 2.472% | +0.03% | A |
| 10 | 11 | JavaScript | 1.469% | -0.20% | A |
| 11 | 10 | Ruby | 1.434% | -0.47% | A |

### Brain disorders cost Europe 800 billion euros a year: study

http://www.pharmatimes.com/Article/11-10-05/Brain_disorders_cost_Europe_800_billion_euros_a_year_study.aspx

The cost of brain disorders in Europe soared to 798 billion euros last year, double the figure for 2005 and equating to 1,550 euros per capita, says a new report.

The bill will continue to rise as people live longer, and this represents "the number one economic challenge for European health care now and in the future," says the study, which was commissioned by the European Brain Council (EBC).

But without urgent action, the situation can only worsen, given the continuing rise in life expectancy in Europe, the authors warn. Because of the ageing population, degenerative disorders such as dementia, Parkinson's and stroke are particularly destined to become more common, but anxiety and mood disorders are also very prevalent in older populations, they add.

### Teacher

"A teacher affects eternity; he can never tell where his influence stops."
--Henry B. Adams

## Monday, October 3, 2011

### download wikipedia

### Department of Numbers

Department of Numbers

The Department of Numbers contextualizes public data so that individuals can form independent opinions on everyday social and economic matters.

http://www.deptofnumbers.com/

## Sunday, October 2, 2011

### Best Korean Dramas of 2010

### k3b and brasero problems, can read DVD but not CD

    # install cdrtools as a replacement of cdrkit
    dpkg -i ./libscg1*.deb
    dpkg -i ./cdda2wav*.deb
    dpkg -i ./cdrecord*.deb
    dpkg -i ./mkisofs*.deb
    dpkg -i ./cdrtools*.deb

    $ wodim --version
    Cdrecord-ProDVD-ProBD-Clone 3.01a03 (x86_64-unknown-linux-gnu) Copyright (C) 1995-2010 Joerg Schilling

http://forum.ubuntuusers.de/topic/kleines-projekt-mit-paketverwaltung-die-schil/8/#post-2809761

http://www.cdrinfo.com/Sections/Reviews/Print.aspx?ArticleId=20651

Another reason the CD/DVD drive might not read a CD properly is that the disc is not compatible with the drive. My drive can read Memorex CD-Rs (650MB) and Fujifilm (700MB), DVDs (including Maxell DVD), but not Maxell CD-Rs (700MB). Maybe it has something to do with the speed the CD was written at (16, 24, 40, 48)?

    $ sudo chmod +s /usr/bin/wodim
    $ sudo k3b

sbubba
November 7th, 2010, 04:41 AM
Hello
I had this problem with k3b 1.91.0 on ubuntu 10.04.
I've had to purge and reinstall some packages:
hal libhal1 libhal-storage1 k3b wodim
(apt-get wants to remove many packages in addition to these, you have to reinstall all of them)

k3b always says cdrecord has no permission to open the device, and brasero knows there's a blank disc in but says it has 0 bytes free

http://ubuntuforums.org/archive/index.php/t-1291337.html

LG GSA-H55N Firmware
http://www.lg.com/us/support/product/support-product-profile.jsp?customerModelCode=GSA-H55N&matchedModelCode=NOT_MATCHED&searchEngineModelCode=GSA-H55N&initialTab=documents&targetPage=support-product-profile#

## Saturday, October 1, 2011

### Cell-Specific Mechanism-Based Gene Therapy Approach to Treat Retinitis Pigmentosa

http://www.sciencedaily.com/releases/2011/09/110930153050.htm?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+sciencedaily+%28ScienceDaily%3A+Latest+Science+News%29

Furthermore, they used a tissue-specific promoter to achieve cell-specific expression of the transduced genes, which is unusual for shRNA delivery.

### ALLEN INSTITUTE FOR BRAIN SCIENCE 2011 Annual Symposium: Open Questions in Neuroscience

http://www.alleninstitute.org/events/symposium/index.html

Sacha B. Nelson, Brandeis University
http://www.bio.brandeis.edu/faculty/nelson.html
Defining the mammalian neurome

Nathaniel Heintz, Investigator, Howard Hughes Medical Institute