Thursday, April 28, 2011

Synapse2Disease

http://www.synapse2disease.ca/en_projet.html

The Synapse to Disease (S2D) project was initiated in 2006, following a successful funding application to Genome Canada and Genome Quebec. The original aim of the S2D project was to identify synaptic genes that cause or predispose an individual to neurodevelopmental diseases such as autism, mental retardation, schizophrenia, and Tourette Syndrome. Contrary to single-gene diseases such as Huntington's disease or Duchenne Muscular Dystrophy, the diseases selected for the S2D project result from the interaction of multiple genetic factors (genes) and environmental factors. They are therefore classified as "complex", and represent a major challenge for genetic analysis.

biomarkers in gene expression

Backward selection or forward selection can be used to pick the most discriminating biomarkers.

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5593146&tag=1
http://www.stat.ubc.ca/~rollin/teach/643w04/lec/node42.html

Forward selection has drawbacks, including the fact that each addition of a new variable may render one or more of the already included variables non-significant. An alternate approach which avoids this is backward selection. Under this approach, one starts with fitting a model with all the variables of interest (following the initial screen). Then the least significant variable is dropped, so long as it is not significant at our chosen critical level. We continue by successively re-fitting reduced models and applying the same rule until all remaining variables are statistically significant.
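The backward-selection loop described above can be sketched in base R. This is a minimal illustration on simulated data, not code from the lecture notes: the data frame `d`, the helper name `backward_select`, and the 0.05 threshold are all assumptions made for the example.

```r
# Backward selection by p-value: a sketch on simulated data (all names are hypothetical).
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 2 * d$x1 - 1.5 * d$x2 + rnorm(n)   # only x1 and x2 carry signal

backward_select <- function(data, response, alpha = 0.05) {
  vars <- setdiff(names(data), response)
  while (length(vars) > 0) {
    # Re-fit the model with the variables that are still in play
    fit <- lm(reformulate(vars, response = response), data = data)
    p <- setNames(coef(summary(fit))[vars, "Pr(>|t|)"], vars)
    worst <- which.max(p)            # least significant variable
    if (p[worst] <= alpha) break     # everything left is significant: stop
    vars <- vars[-worst]             # otherwise drop it and re-fit
  }
  vars
}

selected <- backward_select(d, "y")
print(selected)
```

In practice one might prefer `step()` with AIC over raw p-values; the loop above simply mirrors the rule as stated in the text.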

Wednesday, April 27, 2011

Unbalanced data sets in machine learning

http://www.openstarts.units.it/dspace/bitstream/10077/4002/1/Menardi%20Torelli%20DEAMS%20WPS2.pdf

It has been widely reported that the class imbalance heavily compromises the process of learning, because the model tends to focus on the prevalent class and to ignore the rare events (Japkowicz and Stephen, 2002).

However, unless the classes are perfectly separable (Hand and Vinciotti, 2003) or the complexity of the problem is low (Japkowicz and Stephen, 2002), neglecting the unbalance leads to heavy consequences, both in model estimation and when the evaluation of the accuracy of the estimated model has to be measured.

What typically happens in such a situation is that standard classifiers tend to be overwhelmed by the prevalent class and ignore the rare examples.

Fixes:
1. Algorithm-level approaches modify the classifier in order to compensate for the imbalance. They are generally applied to classifiers whose training is based on the optimization of some function related to the overall accuracy.
2. Data-level approaches alter the class distribution in order to get a more balanced sample (oversampling and undersampling). Altering the class distribution of the training data aids learning with highly skewed datasets because it effectively imposes non-uniform misclassification costs.
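The data-level fix (oversampling and undersampling) can be sketched in a few lines of base R. The data frame `train`, its 50/950 class split, and the variable names are invented for illustration; this is plain random resampling, not the specific method proposed in the paper.

```r
# Random under- and over-sampling to rebalance a two-class training set (sketch).
set.seed(42)
train <- data.frame(x = rnorm(1000),
                    class = factor(c(rep("rare", 50), rep("common", 950))))

rare   <- which(train$class == "rare")
common <- which(train$class == "common")

# Undersampling: keep all rare cases, draw an equal number of common ones.
under <- train[c(rare, sample(common, length(rare))), ]

# Oversampling: keep all common cases, resample rare ones with replacement.
over <- train[c(common, sample(rare, length(common), replace = TRUE)), ]

table(under$class)
table(over$class)
```

Undersampling discards data and oversampling duplicates rows, so either resampling step should be applied to the training split only, never to the data used for evaluation.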

Socrates

Be as you wish to seem.

Monday, April 25, 2011

skin cells directly into neural stem cells

http://www.sciencedaily.com/releases/2011/04/110425153600.htm?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+sciencedaily+%28ScienceDaily%3A+Latest+Science+News%29

Dr. Ding focuses on reprogramming skin cells into neural stem cells using the existing iPS technology -- but with a twist. Dr. Ding never lets the cells enter the pluripotent state of iPS cells, in which they could develop into any type of cell. Instead he uses yet another cocktail of factors to transform the skin cells directly into neural stem cells. Avoiding the pluripotent state is important because it avoids the potential danger that "rogue" iPS cells could develop into a tumor if used to replace or repair damaged organs or tissue.

Amazon seller lists book at $23,698,655.93 -- plus shipping

http://www.cnn.com/2011/TECH/web/04/25/amazon.price.algorithm/index.html?hpt=C2
The incident highlights a little-known fact about e-commerce sites such as Amazon: Often, people don't create and update prices; computer algorithms do.

Human vs Rat brain

http://learn.genetics.utah.edu/content/addiction/genetics/neurobiol.html

Friday, April 22, 2011

Bioinformatics visualization

http://www.sfu.ca/~shaw/
http://biov.iat.sfu.ca/IMASProjectWeb

BrainFrame-VDA09.pdf (282K)
BrainFrame: A Knowledge Visualization System for the Neurosciences,
Steven J. Barnes, Chris D. Shaw
Proceedings of VDA 2009 Conference on Visualization and Data Analysis 2009 , San Jose, California, January 19-22, 2009, pp. 72430F.1-10 (10 pages).

Thursday, April 21, 2011

Chinese handwriting input

http://www.purpleculture.net/chinese-handwriting-input/

What is a PhD really worth?

http://www.nature.com/naturejobs/2011/110421/full/nj7343-381a.html?WT.ec_id=NATUREjobs-20110421

http://graduate-school.phds.org/
http://sites.nationalacademies.org/pga/resdoc/index.htm

Frugal science graduate students emerge with a very important life lesson: money does not buy happiness (although it certainly can make misery a lot more comfortable), and living frugally is better than amassing tens of thousands of dollars of debt.

But I believe the most important lesson is that no programme of higher education can guarantee its graduates gainful and lucrative employment. At best, a graduate programme in any discipline can provide its students with key skills, knowledge and abilities. How the graduates apply that learning is up to them.

Focused seminars in areas such as communication, business basics and public policy would go a long way towards strengthening the capabilities of PhD students and improving their career prospects.

Wednesday, April 20, 2011

heatmap using a custom distance for hclust clustering

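# Correlation distance between the rows of x (1 - Pearson correlation).
cordist <- function(x) {
  as.dist(1 - cor(t(x), use = "pairwise.complete.obs"))
}

# Getting nicely aligned clustering trees.
# Assumes my.dists is given, e.g. my.dists <- cordist(x).
# method = "average" is what the abbreviation "a" resolves to.
clus <- hclust(my.dists, method = "average")

# Plot heatmap with the custom row dendrogram
# heatmap(x, Rowv = as.dendrogram(hclust(cordist(x))))
heatmap(x, Rowv = as.dendrogram(clus))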

Einstein - Imagination is more important than knowledge.

As Einstein so eloquently put it, "Imagination is more important than knowledge." And that creativity, that imagination, that ability to find that new approach to the problem everyone else is working on is what's going to make you a great scientist.

One of the main things to learn in graduate studies is how to think critically, synthesize, and create new processes, rather than simply follow technology-dependent technical work that may become obsolete in the future.

http://scienceblogs.com/startswithabang/2010/08/advice_for_young_aspiring_scie.php

What happens when the spotlight shines on the young scientists?

http://www.nature.com/nature/journal/v467/n7317_supp/full/467S22a.html

Box 1: Advice to laureates

* Reply to emails from students within twelve hours
* Don't dictate a student's life
* Give creative freedom
* Foster relationship among students in the lab, not just with them
* Let students develop their 'voice' when writing papers
* Communicate your science to the public by using the media


Box 2: Advice to students

* Choose a supervisor who does not travel too much
* Don't try to please your supervisor all the time, be prepared to challenge them
* Put questions to your supervisor, but think of some possible suggestions beforehand
* Assume your supervisor is wrong and develop your own way to approach the problem
* Idealism regarding science in politics is good, but be aware that it will be a steep challenge
* Don't give up too easily

Tuesday, April 19, 2011

Proteomics and forestry - Jörg Bohlmann

Jörg Bohlmann
Secondary metabolism; Plant defense against insects and insect-associated pathogens; Forestry and grapevine genomics
http://www.msl.ubc.ca/faculty/bohlmann

academia.edu - Facebook for academics

http://www.academia.edu/

Peter Sorger

http://sorger.med.harvard.edu/

Systems biology modeling

Systems Biology Conferences

http://www.jenage.de/information-centre/systems-biology/meetings-calendar.html

DebugMode - Wink = flash tutorial and presentation creation software

http://www.debugmode.com/wink/

Wink is a Tutorial and Presentation creation software, primarily aimed at creating tutorials on how to use software (like a tutor for MS-Word/Excel etc). Using Wink you can capture screenshots, add explanations boxes, buttons, titles etc and generate a highly effective tutorial for your users.

Dalian, China - DNA Day

http://www.dnaday.com/

Dalian-China’s Best Tourist City. With beautiful scenery, a nice climate and fast development, this romantic city ranks among China’s best tourist destinations in many people’s minds. It is also a trading and financial center in northeastern Asia and has gained the name ‘Hong Kong of Northern China.’

Glial

http://med.stanford.edu/ism/2009/september/glia-0921.html



Alzheimer’s disease is characterized by massive synapse loss.

As one ascends the scale of evolutionary complexity, an increasing proportion of the brain’s cells are glial. In the simple nematode worm, they’re sparse; in a fruit fly, they’re up to 25 percent; in a mouse, about 65 percent. In a human brain, behind every great neuron stand nine great glial cells.

http://med.stanford.edu/ism/images/featureStories/glia-illo-092109.jpg

There are three main types of glial cells. Oligodendrocytes (1) send projections that wrap axons (2) – long, signal-carrying portions of neurons (3) – in sheathes of a fatty substance called myelin (4), speeding signal conduction. Microglia (5) are, essentially, the brain’s immune cells, but they also monitor neighboring brain cells for damage and gobble up debris, and they probably have other functions, too. Astrocytes (6) carry on a host of activities. Their long extensions can monitor levels of neuronal activity either along axons or at synapses (7) – junctions that relay signals from one neuron to the next – and, when those activity levels are high, signal to local blood vessels (8) to dilate, increasing blood supply to hard-working neurons. Astrocytes also produce and secrete substances that have a major influence on the formation and elimination of synapses.

ALS vs MS

http://www.cnsonline.org/www/archive/ms/ms-04.html

"sclerosis," which literally means hardening (as a result of increased connective tissue or glia).

Multiple sclerosis is a disease of myelin, not primarily of nerve cells. This myelin surrounds the axons, or the long process of the nerve cell.

The principal characteristic in the pathology of amyotrophic lateral sclerosis (ALS) is loss of motor nerve cells in the anterior horns of the spinal cord and in the motor nuclei of the brain stem.

  Thus, it is not primary demyelination, as it is in multiple sclerosis, that is the primary destructive effect in ALS.

Monday, April 18, 2011

Mantel test

The Mantel test, named after Nathan Mantel, is a statistical test of the correlation between two matrices.

The test is commonly used in ecology, where the data are usually estimates of the "distance" between objects such as species of organisms. For example, one matrix might contain estimates of the genetic distances (i.e., the amount of difference between two different genomes) between all possible pairs of species in the study, obtained by the methods of molecular systematics; while the other might contain estimates of the geographical distance between the ranges of each species and every other species.

http://en.wikipedia.org/wiki/Mantel_test
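A permutation version of the Mantel test is short enough to write out in base R. The sketch below is an illustration under stated assumptions: `mantel_perm` is a hypothetical helper (packaged implementations exist, for example in vegan and ade4), and the "geographic" and "genetic" distances are simulated stand-ins.

```r
# Permutation Mantel test between two distance matrices (base R sketch).
mantel_perm <- function(d1, d2, nperm = 999) {
  m1 <- as.matrix(d1)
  m2 <- as.matrix(d2)
  lower <- lower.tri(m1)
  obs <- cor(m1[lower], m2[lower])   # correlation of the paired distances
  n <- nrow(m1)
  perm <- replicate(nperm, {
    idx <- sample(n)                 # relabel the objects of one matrix
    cor(m1[idx, idx][lower], m2[lower])
  })
  # One-sided p-value: how often a permuted correlation reaches the observed one
  p <- (sum(perm >= obs) + 1) / (nperm + 1)
  list(statistic = obs, p.value = p)
}

set.seed(7)
pts <- matrix(rnorm(40), ncol = 2)        # 20 hypothetical objects
geo <- dist(pts)                          # stand-in for geographic distance
gen <- dist(pts + rnorm(40, sd = 0.1))    # stand-in for a correlated genetic distance
res <- mantel_perm(geo, gen)
print(res)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent, which is why an ordinary correlation test on the paired distances would be invalid.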

Conditional Random Field - CRF (Machine Learning)

http://en.wikipedia.org/wiki/Conditional_random_field

A conditional random field (CRF) is a type of discriminative undirected probabilistic graphical model. It is most often used for labeling or parsing of sequential data, such as natural language text or biological sequences, and in computer vision. Specifically, CRFs find applications in shallow parsing, named entity recognition and gene finding, among other tasks, being an alternative to the related hidden Markov models.

http://www.inference.phy.cam.ac.uk/hmw26/crf/

The primary advantage of CRFs over hidden Markov models is their conditional nature, resulting in the relaxation of the independence assumptions required by HMMs in order to ensure tractable inference.

CRFs outperform both MEMMs and HMMs on a number of real-world sequence labeling tasks.


Nando de Freitas
http://www.cs.ubc.ca/~nando/
http://www-stat.stanford.edu/~tibs/ElemStatLearn/download.html


David JC MacKay
http://www.inference.phy.cam.ac.uk/mackay/itila/ 

QR - quick response code

A QR code (short for Quick Response) is a specific matrix barcode (or two-dimensional code), readable by dedicated QR barcode readers and camera phones. The code consists of black modules arranged in a square pattern on a white background. The information encoded can be text, URL or other data.

http://en.wikipedia.org/wiki/QR_code

http://www.qrinkle.com/

http://goo.gl/

http://r20.rs6.net/tn.jsp?llr=lgkxgtbab&et=1106584988522&s=6570&e=001x4WpKVoIrGnfLd9yneahKEE3mjYjdFZIwTcIHFgRPYs_6bCdRAFiR8PUCmm9QmxGRLWBe3jjEC_VfcvnieKl3NxtV9EMNTwG_870hXx2cybexx0G0I6AVH_VGw4f4ZGz5GzXUlQg_dohlELdczK8KAJ6Hk6-jYppREdiikufjSq6TuCe_Cf48Fy7KalBZuRSozJRgFCPJ5lc-sKO-Cs60_8ARJUtaOnMd5opAhZmewAAvhIHYkZ78SvEZNIRRlfR_PHt-7nQpLY=

Sunday, April 17, 2011

Friedrich Nietzsche - Stronger

"What doesn't kill us makes us stronger." -- Friedrich Nietzsche

Friday, April 15, 2011

VanBUG Career Panel Talk

  • if you like what you're doing now, great, otherwise move on
  • interviews are made to measure your stress level
  • PhD can sometimes close more doors than it opens (might need to relocate ...)
  • Some companies allow you to publish while working!
  • Network network network and did I say network?
  • Conference blogging as a career, blog in general, (caveat: might have setbacks, eg. bad comments, so try to be objective, exclude personal life)
  • get some web presence, work on some open-source projects (it shows that you can work with others) 
  • There's no 'Bioinformatics' job title in the Gov't search, pick 'Scientist' or computer scientist or something close ...
  • Follow your passion, side-projects might blossom
  • Know where you want to go and work towards it, eg. PhD requires you to drive your own research
  • ~98% of startups fail, need to find the right people to work with, those people with complementary skills (entrepreneurial)
  • Great researchers have ideas, are able to see them through to the end, and are excellent collaborators
http://www.vanbug.org/2011/careers-in-bioinformatics/

    Thursday, April 14, 2011

    S. Johnson - Bitter Love

    Love is the wisdom of the fool and the folly of the wise.

    Metric (TFM) file not found

    When I try to use \usepackage{times} to set my document in the Times font, I get an error message: "! Font OT1/ptm/m/n/10=ptmr7t at 10.0pt not loadable: Metric (TFM) file not found". Why doesn't it work?

    Fix: install these packages

    texlive-fonts-recommended
    latex-xft-fonts
    ttf-symbol-replacement

    Monday, April 11, 2011

    LaTeX special characters

    http://web.science.mq.edu.au/~rdale/resources/writingnotes/latexstyle.html#dashes

    A range such as 200 - 500 bp is written with an en dash and no spaces: "200--500 bp"

    For a tilde, as in ~0.5, use $\sim$0.5

    http://theoval.cmp.uea.ac.uk/~nlct/latex/novices/symbols.html#sec:chars
    http://tex.stackexchange.com/questions/9363/how-does-one-insert-a-backslash-or-a-tilde-into-latex

    http://latex.knobs-dials.com/

    Can't get root shell when fsck check fails in ubuntu

    Try editing your regular boot entry in GRUB (press e) and appending init=/bin/sh to the end of the line beginning with linux. After you boot it (Ctrl-x), you will get a root shell immediately.

    http://superuser.com/questions/215590/how-to-boot-grub2-into-the-simplest-linux-shell

    $ sudo gedit /boot/grub/grub.cfg

    menuentry 'Ubuntu recovery with Ctrl+X then fsck /dev/sda4 (home)' --class ubuntu --class gnu-linux --class gnu --class os {
    recordfail
    insmod ext2
    set root='(hd0,3)'
    search --no-floppy --fs-uuid --set dc61149c-085c-4b71-a92f-127a6b831719
    echo 'Loading Linux 2.6.32-25-generic ...'
    linux /boot/vmlinuz-2.6.32-25-generic root=UUID=dc61149c-085c-4b71-a92f-127a6b831719 ro single init=/bin/sh
    echo 'Loading initial ramdisk ...'
    initrd /boot/initrd.img-2.6.32-25-generic
    }

    https://help.ubuntu.com/community/Grub2

    When all else fails, try booting to the Windows partition, with a boot cd / usb

    underconnectivity theory of autism

    Inter-regional brain communication and its disturbance in autism



    http://www.frontiersin.org/systems_neuroscience/10.3389/fnsys.2011.00010/full

    The underconnectivity theory of autism postulates that individuals with autism have a reduced communication bandwidth between frontal and posterior cortical areas, which constrains the psychological processes that rely on the integrated functioning of frontal and posterior brain networks.

    Thus, brain volume measurements have revealed that the rate of brain growth in autism slows after age 4, leading to a decreased volume of white matter in adolescents with autism relative to neurotypical adolescents. Given that white matter is the medium which is used for inter-regional brain communication, it seems incontrovertible that brain connectivity is disrupted in autism.

    Altered functional connectivity has also been found in other disorders, including schizophrenia (Meyer-Lindenberg et al., 2001), attention deficit hyperactivity disorder (Tian et al., 2006), multiple sclerosis (Au Duong et al., 2005), and dyslexia (Pugh et al., 2000). These findings suggest that disordered brain connectivity may underlie a variety of cognitive impairments.

    While autism is primarily associated with frontal–posterior underconnectivity, preliminary evidence suggests that these other disorders are linked with impairments in other types of connections (Pugh et al., 2000; Meyer-Lindenberg et al., 2001; Au Duong et al., 2005; Tian et al., 2006).

    Recent findings of atypical patterns in both functional and anatomical connectivity in autism have established that autism is not a localized neurological disorder, but one that affects many parts of the brain in many types of thinking tasks. fMRI studies repeatedly find evidence of decreased coordination between frontal and posterior brain regions in autism, as measured by functional connectivity.

    Sunday, April 10, 2011

    Epigenetics: Impact of DNA methylation

    http://ows.molgen.mpg.de/2009/lectures/steinhoff.pdf


    Bioinformation. 2010 Jan 23;4(7):331-7.
    Computational Epigenetics: the new scientific paradigm.
    Lim SJ, Tan TW, Tong JC.
    http://www.ncbi.nlm.nih.gov/pubmed/20978607


    Computational epigenetics
    http://bioinformatics.oxfordjournals.org/content/24/1/1.abstract

    http://www.youtube.com/watch?v=eYrQ0EhVCYA


    Nat Biotechnol. 2010 Oct;28(10):1045-8.
    The NIH Roadmap Epigenomics Mapping Consortium.
    Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A, Kellis M, Marra MA, Beaudet AL, Ecker JR, Farnham PJ, Hirst M, Lander ES, Mikkelsen TS, Thomson JA.
    Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. Bernstein.Bradley@mgh.harvard.edu

    http://www.ncbi.nlm.nih.gov/pubmed/20944595
    http://commonfund.nih.gov/epigenomics/

    Goo gl - URL shortener

    http://goo.gl/

    Fight Club

    On a long enough timeline, the survival rate for everyone drops to zero.

    Saturday, April 9, 2011

    worry is wasteful - JEWEL 'Hands'

    And not to worry 'cause worry is wasteful
    And useless in times like these

    In the end only kindness matters

    http://www.azlyrics.com/lyrics/jewel/hands.html

    DREAM - Towards a Rigorous Assessment of Systems Biology Models: The DREAM3 Challenges

    http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0009202

    Competition inspired by Critical Assessment of techniques for protein Structure Prediction (CASP)

    Friday, April 8, 2011

    Why French Fries Are Such Good Comfort Food

    http://healthland.time.com/2011/04/07/why-french-fries-are-good-comfort-food/?hpt=C2

    Oxytocin is crucial to the processes that allow love and social contact to reduce stress.

    eps global medical development

    http://www.epsglobal.ca/plus/list.php?tid=66

    Mark Victor Hansen - Ideas

    Ideas attract money, time, talents, skills, energy and other complementary ideas that will bring them into reality.

    Thursday, April 7, 2011

    Stouffer's method - weighted z-scores

    http://en.wikipedia.org/wiki/Fisher%27s_method#Relation_to_Stouffer.27s_Z-score_method

    A closely related approach to Fisher's method is based on Z-scores rather than p-values.

    One advantage of the Z-score approach is that it is straightforward to introduce weights.

    Stouffer, S.A.; Suchman, E.A.; DeVinney, L.C.; Star, S.A.; Williams, R.M. Jr. (1949). The American Soldier, Vol. 1: Adjustment during Army Life. Princeton University Press, Princeton.
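As a sketch of how the weights enter: each p-value is converted to a Z-score, and the combined statistic is Z = sum(w_i * Z_i) / sqrt(sum(w_i^2)). The helper name `stouffer`, the example p-values, and the choice of square-root-of-sample-size weights are assumptions made for illustration; one-sided p-values are assumed.

```r
# Weighted Stouffer combination of one-sided p-values (sketch; inputs are made up).
stouffer <- function(p, w = rep(1, length(p))) {
  z <- qnorm(p, lower.tail = FALSE)          # p-value -> Z-score
  z_comb <- sum(w * z) / sqrt(sum(w^2))      # weighted combined Z
  c(z = z_comb, p = pnorm(z_comb, lower.tail = FALSE))
}

p_values <- c(0.04, 0.10, 0.30)
weights  <- sqrt(c(50, 100, 25))   # e.g. square root of each study's sample size

stouffer(p_values)                 # unweighted
stouffer(p_values, weights)        # weighted
```

With equal weights this reduces to the unweighted form sum(Z_i) / sqrt(k) for k tests.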

    Crafting Your Funding Application

    www.grad.ubc.ca/awards
    www.scholarshipscanada.com
    www.studentawards.com

    like writing a cover letter

    summary

    - convince them that your research is important
    - summary page is the most critical section of your application
    - good structure with headings (background, hypothesis, objectives, significance, method, problem)

    - catchy first paragraph
    - relevant
    - don't use short form in title
    - don't make reviewer feel dumb
    - write for the general audience
    - define acronyms
    - cite references
    - be specific, put numbers, examples, how you would approach the problem
    - get help from supervisor about writing it
    - be meticulous, ask a friend to proof-read, read it backwards

    CIHR
    www.cihr-crsh.gc.ca/33043.html#annex1


    Grantsmanship:
    www.hfsp.org/how/ArtOfGrants.htm

    TIMELINE
    aug. 1 - colleague review
    aug. 12 - supervisor first review
    aug. 22 - editor review
    sept. 1 - supervisor review
    sept. 19 - signatures
    sept. 23 - dept. deadline
    oct. 7 - deadline

    supporting evidence for reference letters
    - critical thinking, independence, perseverance, originality, judgement, research ability

    Tuesday, April 5, 2011

    R ROC, AUC

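    # ROCR and e1071 are CRAN packages; GEOquery comes from Bioconductor.
    install.packages(c("ROCR", "e1071"))
    # getBioC() is obsolete; current Bioconductor releases install via BiocManager.
    install.packages("BiocManager")
    BiocManager::install("GEOquery")

    http://rocr.bioinf.mpi-sb.mpg.de/

    library(ROCR)
    data(ROCR.simple)
    pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
    perf <- performance(pred, "tpr", "fpr")   # ROC curve: TPR against FPR
    plot(perf, colorize = TRUE)
    # Toy examples with character labels
    plot(performance(prediction(c(0, 1, -1), labels = c("a", "b", "a")), "tpr", "fpr"))
    # The AUC value is stored in the y.values slot
    performance(prediction(c(0, 1, 1), labels = c("a", "b", "a")), "auc")@y.values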
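For intuition about what the AUC measures, it can also be computed from ranks in base R (the Mann-Whitney form): the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. `auc_rank` is a hypothetical helper written for this note, not part of ROCR; it is applied to the same toy scores and labels as the last line above.

```r
# AUC from ranks (Mann-Whitney form), base R only.
auc_rank <- function(scores, labels, positive) {
  pos <- labels == positive
  r <- rank(scores)                 # ties get average ranks
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc <- auc_rank(c(0, 1, 1), c("a", "b", "a"), positive = "b")
auc
```

Here the single positive ("b", score 1) beats one negative and ties with the other, giving (1 + 0.5) / 2 = 0.75.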

    Short-read aligners / mappers

    These tools map reads to a reference genome, or to another somatic genome (tumor vs normal sample), which is useful for copy number variation (CNV) analysis.

    - MAQ
    - BowTie, BWA, Soap2 (use the Burrows–Wheeler Transform [38], a technique previously used for compression). The BWT string is built by sorting all of the circular shifts of a string, and concatenating the last characters of each circular shift; searching it relies on the 'last-first property'.
    - mrFAST
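The BWT construction described in the list above (sort all circular shifts, take the last characters) can be sketched directly in base R. This naive version is for illustration only; real aligners build the transform via suffix arrays. The "$" end marker and the function name `bwt` are assumptions of the sketch.

```r
# Burrows-Wheeler transform by sorting all circular shifts (naive sketch).
bwt <- function(s) {
  s <- paste0(s, "$")                 # "$" marks the end of the string
  n <- nchar(s)
  rotations <- vapply(seq_len(n), function(i) {
    paste0(substr(s, i, n), substr(s, 1, i - 1))
  }, character(1))
  # method = "radix" sorts in C-locale byte order, so "$" sorts before letters
  sorted <- sort(rotations, method = "radix")
  paste(substr(sorted, n, n), collapse = "")   # last character of each shift
}

bwt("banana")   # "annb$aa"
```

Runs of identical characters in the output (the "nn" and "aa" here) are what made the transform attractive for compression.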

    Paired reads are useful for detecting large structural variations (insertions, deletions, translocations). Problem: insertions that are larger than the expected distance between pairs (try hanging reads, where only one of the reads is mapped).



    ABI Solid's colorspace is useful for SNP analysis
    This approach also demonstrates a key advantage of the color-space encoding. When one compares a regular, letter-space read to a known DNA sequence, it is difficult to determine if a discrepancy is due to a true difference between the two genomes, or to a sequencing error. In color-space, we can usually separate the two explanations: if the difference is due to an SNP between the genomes, this will lead to two adjacent color-space changes, as both of the colors that interrogated the nucleotide will change. On the other hand, a sequencing error will only affect one color, and therefore can be differentiated from a SNP.
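The two-adjacent-colors argument is easy to check with a toy encoder. The sketch below assumes the usual two-base scheme in which A, C, G, T are coded 0 to 3 and each color is the XOR of neighboring codes; the sequences and the function name `to_colors` are made up for illustration.

```r
# Two-base (color-space) encoding: each color is the XOR of adjacent 2-bit base codes.
to_colors <- function(dna) {
  code <- unname(c(A = 0L, C = 1L, G = 2L, T = 3L)[strsplit(dna, "")[[1]]])
  bitwXor(head(code, -1), tail(code, -1))
}

ref <- "ACGTACGT"
snp <- "ACGAACGT"     # hypothetical SNP at base 4 (T -> A)

ref_col <- to_colors(ref)
snp_col <- to_colors(snp)
which(ref_col != snp_col)     # colors 3 and 4: a SNP changes two adjacent colors

err_col <- ref_col
err_col[4] <- (err_col[4] + 1L) %% 4L   # simulate a single miscalled color
which(ref_col != err_col)     # color 4 only: an error changes just one
```
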

    Color-space Smith-Waterman: aligns the reference against all 4 possible translations of the read (like translating a transcript to protein)

    problem with repeats, non-uniform coverage


      Flow cytometry

      http://www.bioconductor.org/packages/2.8/bioc/html/iFlow.html
      http://www.ficcs.org/software/

      RForge - R and Java

      http://www.rforge.net/rJava/

      High throughput sequencing

      A.V. Dalca, M. Brudno, Briefings in Bioinformatics (2010) 11(1):3-14.

      While GenomeMapper [57] is the first tool to allow for the simultaneous mapping of HTS reads to multiple genomes, identifying variants (both SNPs and SVs) based on many low-coverage individuals is another important research area, and one which may prove key to enabling the $1000 genome and the full promise of personal genomics.


      An alternate method for copy-number variation (CNV) discovery relies on the ‘depth-of-coverage’ (DOC) signal. If a certain genomic region is present multiple times in the donor genome, more reads will likely be generated from it, and consequently the corresponding region in the reference will have higher coverage (Figure 4D).  -- this assumes uniform coverage though!
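A toy simulation shows the depth-of-coverage idea. Everything here is assumed for illustration: the reference length, the window size, the duplicated region at 40,001-50,000, and the 1.5x threshold. Read starts are drawn uniformly, which is exactly the uniform-coverage assumption flagged above.

```r
# Depth-of-coverage sketch: windows with about twice the median read count suggest a gain.
set.seed(3)
ref_len <- 100000
window  <- 5000

# Simulated read start positions: uniform background plus extra reads
# from a hypothetical duplicated region at 40,001-50,000.
starts <- c(sample(ref_len, 20000, replace = TRUE),
            sample(40001:50000, 2000, replace = TRUE))

bins   <- cut(starts, breaks = seq(0, ref_len, by = window))
depth  <- as.vector(table(bins))      # reads per window
ratio  <- depth / median(depth)       # relative copy number per window
gained <- which(ratio > 1.5)
gained                                # indices of windows with elevated coverage
```

Real data would need correction for GC content and mappability before a ratio like this is interpretable.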


      While SNPs and small indels can be located by analyzing the mappings of unpaired reads, the identification of structural variants (SVs), where the genome is drastically altered, is more difficult with short reads. For example, a large deletion in the donor’s genome (i.e. a segment of the reference not present in the donor) may create split-reads that cover the location of the deletion (the breakpoint), and map to the reference with their two halves on opposite sides of the deleted segment.


      Accordingly, the discovery of SVs in a genome is typically based on pair-end sequencing approaches [19]. The two reads are mapped to the reference genome, with the distance between them referred to as ‘mapped distance’. This mapped distance and the relative orientations of the mapping are then compared to the expected insert size: if the distance is similar and the orientations are unchanged, the mate pair is termed ‘concordant’, and is thought to be unlikely to overlap an SV. If, on the other hand, one of these is different or changed (the mate pair is called ‘discordant’), it likely overlaps a variant, such as an insertion (the mapped distance will be smaller than expected insert size), deletion (it will be larger) or inversion (the orientation of one of the two mappings will be opposite from the expected).
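The concordant/discordant rule in the paragraph above amounts to a small decision function. In this sketch the expected insert size (3000) and tolerance (300) are arbitrary assumed values, and `classify_pair` is a hypothetical helper; real tools derive the tolerance from the observed insert-size distribution.

```r
# Classify a mate pair from its mapped distance and orientation (sketch).
# insert_size and tol are assumed values chosen for illustration.
classify_pair <- function(mapped_dist, orientation_ok,
                          insert_size = 3000, tol = 300) {
  if (!orientation_ok) return("discordant: possible inversion")
  if (mapped_dist > insert_size + tol) return("discordant: possible deletion")
  if (mapped_dist < insert_size - tol) return("discordant: possible insertion")
  "concordant"
}

classify_pair(3100, TRUE)    # within tolerance
classify_pair(5200, TRUE)    # mapped farther apart than expected
classify_pair(1500, TRUE)    # mapped closer together than expected
classify_pair(3000, FALSE)   # orientation flipped
```
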


      Methods for SV detection with mate pairs can identify many, but not all SVs. For example, insertions (in the donor) larger than the insert size cannot be discovered by these methods, as no mate pair will completely span the insertion event.

      Voltaire - "Judge a man by his questions rather than his answers."

      "Judge a man by his questions rather than his answers."


      Voltaire

      Monday, April 4, 2011

      What makes clinical research ethical?

      http://www.ncbi.nlm.nih.gov/pubmed/10819955


      JAMA. 2000 May 24-31;283(20):2701-11.

      What makes clinical research ethical?

      Warren G. Magnuson Clinical Center, Bldg 10, Room 1C118, National Institutes of Health, Bethesda, MD 20892-1156, USA.

      Mechanisms of B-cell lymphoma pathogenesis

      1. Ralf Kuppers, “Mechanisms of B-cell lymphoma pathogenesis,” Nat Rev Cancer 5, no. 4 (April 2005): 251-262.

      http://www.nature.com/nrc/journal/v5/n4/abs/nrc1589.html

      DAVID - Gene name batch viewer

      http://david.abcc.ncifcrf.gov/home.jsp

      The Database for Annotation, Visualization and Integrated Discovery (DAVID ) v6.7 is an update to the sixth version of our original web-accessible programs. DAVID now provides a comprehensive set of functional annotation tools for investigators to understand biological meaning behind large list of genes. For any given gene list, DAVID tools are able to:

      Identify enriched biological themes, particularly GO terms
      Discover enriched functional-related gene groups
      Cluster redundant annotation terms
      Visualize genes on BioCarta & KEGG pathway maps
      Display related many-genes-to-many-terms on 2-D view.
      Search for other functionally related genes not in the list
      List interacting proteins
      Explore gene names in batch
      Link gene-disease associations
      Highlight protein functional domains and motifs
      Redirect to related literatures
      Convert gene identifiers from one type to another.
      And more

      Saturday, April 2, 2011

      Identification of cis-regulatory variants that may be causal for disorders

      http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1618400/



      In Silico Detection of Sequence Variations
      Modifying Transcriptional Regulation
      https://bora.uib.no/bitstream/1956/2703/1/In_Silico_PLOMed.pdf
      * RAVEN (regulatory analysis of variation in enhancers).
      http://www.ploscompbiol.org/article/info:doi%2F10.1371%2Fjournal.pcbi.0040005
      http://www.cisreg.ca
      * In this paper we present a Web-based tool for the identification of genetic variation in
      potential transcription factor binding sites.


      https://depace.med.harvard.edu/work.html


      ORegAnno
      JASPAR
      dbSNP
      Phylofoot - Tools for phylogenetic footprinting 
      sTRAP - http://strap.molgen.mpg.de/
      TRANSFAC - transcription factor motifs



      A traditional bioinformatics approach to predict TFBSs is through the application of binding site profile models known as position-specific weight matrices (PWMs) [24]. Such matrix models assign a score to each candidate binding sequence.
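A small sketch of how a PWM assigns a score to each candidate site. The count matrix is invented for illustration (a real profile would come from a database such as JASPAR or TRANSFAC), and the 0.25 pseudocount, uniform background, and helper names are assumptions.

```r
# Scoring candidate binding sites with a position weight matrix (sketch).
# The counts below are made up; rows are bases, columns are motif positions.
counts <- matrix(c(8, 1, 0, 9,    # A
                   1, 0, 9, 0,    # C
                   0, 8, 1, 0,    # G
                   1, 1, 0, 1),   # T
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("A", "C", "G", "T"), NULL))

pc   <- counts + 0.25                 # pseudocount avoids log(0)
freq <- t(t(pc) / colSums(pc))        # per-position base frequencies
pwm  <- log2(freq / 0.25)             # log-odds against a uniform background

score_site <- function(site, pwm) {
  bases <- strsplit(site, "")[[1]]
  sum(pwm[cbind(match(bases, rownames(pwm)), seq_along(bases))])
}

scan_sequence <- function(dna, pwm) {
  w <- ncol(pwm)
  starts <- seq_len(nchar(dna) - w + 1)
  vapply(starts, function(i) score_site(substr(dna, i, i + w - 1), pwm),
         numeric(1))
}

scores <- scan_sequence("TTAGCATT", pwm)
which.max(scores)    # start of the best-scoring window
```

A regulatory SNP can then be assessed by scoring the site with each allele and comparing the two scores.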


      Thomas Manke, Matthias Heinig, and Martin Vingron, “Quantifying the effect of sequence variation on regulatory interactions,” Human Mutation 31, no. 4 (April 2010): 477-483.
      sTRAP - http://strap.molgen.mpg.de/



      is-rSNP: a novel technique for in silico regulatory SNP detection. Macintyre G, Bailey J, Haviv I, Kowalczyk A. Bioinformatics 2010 26(18): i524. doi:10.1093/bioinformatics/btq378


      Predicting functional regulatory polymorphisms. Torkamani A, Schork NJ. Bioinformatics 2008 24(16): 1787. doi:10.1093/bioinformatics/btn311


      Impact of DNA-binding position variants on yeast gene expression. Swamy KBS, Cho C, Chiang S, Tsai ZT, Tsai H. Nucleic Acids Research 2009 37(21): 6991. doi:10.1093/nar/gkp743


      Identification of candidate regulatory SNPs by combination of transcription-factor-binding site prediction, SNP genotyping and haploChIP. Ameur A, Rada-Iglesias A, Komorowski J, Wadelius C. Nucleic Acids Research 2009 37(12): e85. doi:10.1093/nar/gkp381

      Bioinformatics databases

      http://wiki.reactome.org/index.php/Reactome_Resource_Guide

      Google Summer of Code 2011 Projects

      http://www.google-melange.com/gsoc/org/google/gsoc2011/orange
      http://www.google-melange.com/gsoc/org/google/gsoc2011/genomeinformatics
      http://www.google-melange.com/gsoc/org/google/gsoc2011/obf
      http://www.open-bio.org/wiki/Google_Summer_of_Code
      http://gmod.org/wiki/GSoC

      Friday, April 1, 2011

      CS Guide for Graduate Students

      http://www.cs.ubc.ca/~murphyk/Teaching/guideForStudents.html

      Bioinformatics Training Materials

      http://www.biotnet.org/training-materials

      RECOMB 2011

      Design of Protein-Protein Interactions with a Novel Ensemble-Based
      Scoring Algorithm.
      Kyle E. Roberts, Patrick R. Cushing, Prisca Boisguerin, Dean R. Madden
      and Bruce R. Donald.
      * K*, Protein design, Bruce Donald, Flexible rotamer backbone
      * http://ftp.cs.duke.edu/~kroberts/latexProjects/Recomb2011/Final/recomb_calwriteup1.pdf
      http://www.cs.duke.edu/donaldlab/osprey.php
      * NRPS - Nonribosomal peptides are synthesized in many bacteria and fungi by large multifunctional proteins called nonribosomal peptide synthetases (NRPS). A unique feature of the NRPS system is the ability to synthesize peptides containing proteinogenic as well as non-proteinogenic amino acids. http://linux1.nii.res.in/~zeeshan/nrps.html
      * DEE is a provable algorithm, does not produce gaps
      * Game theory (minimax) - positive and negative designs

      Experiment Specific Expression Patterns.
      Tobias Petri, Robert Küffner and Ralf Zimmer.
      * Look for genes that deviate from the model ('unexpected genes')
      * http://compbio.cs.sfu.ca/recomb2011/recomb2011_submission_249.pdf
      http://www.springerlink.com/content/h725542467jv537j/