Sunday, July 31, 2011

Data mining for Genomics and Proteomics

Analysis of Gene and Protein Expression Data

http://books.google.com/books/about/Data_Mining_for_Genomics_and_Proteomics.html?id=8ngex9vgFpEC

Darius M. Dziuda

QH 441.2 D98 2010

Dimension Reduction with Gene Expression Data Using Targeted Variable Importance Measurement

Dimension Reduction with Gene Expression Data Using Targeted Variable Importance Measurement
Wang H, van der Laan MJ
BMC Bioinformatics 2011, 12:312 (29 July 2011)

Wednesday, July 27, 2011

Talking to Machines

http://www.radiolab.org/2011/may/31/

We begin with a love story--from a man who unwittingly fell in love with a chatbot on an online dating site. Then, we encounter a robot therapist whose inventor became so unnerved by its success that he pulled the plug. And we talk to the man who coded Cleverbot, a software program that learns from every new line of conversation it receives...and that's chatting with more than 3 million humans each month. Then, five intrepid kids help us test a hypothesis about a toy designed to push our buttons, and play on our human empathy. And we meet a robot built to be so sentient that its creators hope it will one day have a consciousness, and a life, all its own.

Skip first line using awk's FNR>1

$ head chr4-dp5.sorted.coverage | awk 'FNR>1 { split($1, a, ":"); print a[1]"\t"a[2]"\t"$2"\t"$3"\t"$4 }'
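
For comparison, here is a rough Python sketch of the same transformation. It assumes the coverage file has whitespace-separated columns and a first column of the form chrom:position, which is what the split on ":" implies. The function name and the sample rows are made up for illustration.

```python
def reformat_coverage(lines):
    """Drop the header line, split column 1 on ':' and return tab-separated rows."""
    rows = []
    for line_number, line in enumerate(lines, start=1):
        if line_number == 1:
            continue  # same role as awk's FNR>1 test
        fields = line.split()
        if len(fields) < 4:
            continue  # unlike awk, skip short or blank lines instead of printing empty fields
        chrom, _, position = fields[0].partition(":")
        rows.append("\t".join([chrom, position, fields[1], fields[2], fields[3]]))
    return rows


# Made-up sample input, only to show the shape of the data.
sample = [
    "Locus\tTotal_Depth\tAverage_Depth\tDepth_for_sample1",
    "chr4:10001\t12\t12.00\t12",
    "chr4:10002\t15\t15.00\t15",
]
for row in reformat_coverage(sample):
    print(row)
```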

Monday, July 25, 2011

Zipf's law unzipped

http://iopscience.iop.org/1367-2630/13/4/043004/

Information theory is used to find the most likely distribution of group sizes given the number of objects, groups and the number of objects in the largest group. The result is the dashed curve in the figure. The same striking agreement is found for all data sets investigated.

Saturday, July 23, 2011

Online language course

Vancouver Learning Network
http://vlns.ca/courses.php

Friday, July 22, 2011

On Frontline, a Personal Look at Parkinson's

As one of the researchers says in this story, there's an old saying which is that genetics loads the gun, but environment pulls the trigger. And that may be part of what plays out with Parkinson's.

As Michael J. Fox likes to say, we each get our own customized version of the disease, but unfortunately none of them come with operating instructions.

http://www.pbs.org/newshour/bb/health/jan-june09/parkinsons_02-03.html

Centre for Applied Neurogenetics

http://www.can.ubc.ca/research-projects/project-highlights/

Thursday, July 21, 2011

Genetic Code of E. Coli Is Hijacked by Biologists

Genetic Code of E. Coli Is Hijacked by Biologists
By NICHOLAS WADE
Published: July 14, 2011

http://www.nytimes.com/2011/07/15/health/15genome.html?_r=1

Bioinformatics journals, impact factor

http://www.bioinformatics.org/wiki/Journals

Communication: The best words in the best order

These days, technical writers provide information about products and services not just as instruction manuals, but through websites, e-learning materials, online help modules and FAQ pages, wikis, podcasts and blogs. And they focus on a range of projects — from composing step-by-step protocols for setting up an electron microscope and using the imaging software, to writing scientific manuscripts and regulatory documents. These writers work not in isolation, but as part of teams of researchers, engineers, physicians or computer scientists.

Laura Bonetta
doi:10.1038/nj7355-255a

http://www.nature.com/naturejobs/2011/110714/full/nj7355-255a.html?WT.ec_id=NATUREjobs-20110721

Getting a pay rise in academia

http://blogs.nature.com/naturejobs/2011/07/20/getting-a-pay-rise-in-academia

"You will be more successful if you hand in more applications. That's perfectly all right." She also cautions against having a single narrow research focus. "We advise people to have at least two specialisations that they follow in order to increase their chances of getting funded."

propose that you are appointed at the top of that grade's scale.
double-check your contract
secure your own funding: benefit your career in general
justify why you should get more money. "Frame the request in terms of the value you bring to your employer,"
"The people that I've seen successfully get a promotion in academia have had a very good plan of what they want to do and have been able to market themselves to their PI. It takes a lot of planning and communication skills."
publication record is still one of the main ways your value is judged

Unraveling gene regulatory networks from time-resolved gene expression data -- a measures comparison study

http://www.biomedcentral.com/1471-2105/12/292/abstract

Abstract (provisional)

Background

Inferring regulatory interactions between genes from transcriptomics time-resolved data, yielding reverse engineered gene regulatory networks, is of paramount importance to systems biology and bioinformatics studies. Accurate methods to address this problem can ultimately provide a deeper insight into the complexity, behavior, and functions of the underlying biological systems. However, the large number of interacting genes coupled with short and often noisy time-resolved read-outs of the system renders the reverse engineering a challenging task. Therefore, the development and assessment of methods which are computationally efficient, robust against noise, applicable to short time series data, and preferably capable of reconstructing the directionality of the regulatory interactions remains a pressing research problem with valuable applications.
Results

Here we perform the largest systematic analysis of a set of similarity measures and scoring schemes within the scope of the relevance network approach which are commonly used for gene regulatory network reconstruction from time series data. In addition, we define and analyze several novel measures and schemes which are particularly suitable for short transcriptomics time series. We also compare the considered 21 measures and 6 scoring schemes according to their ability to correctly reconstruct such networks from short time series data by calculating summary statistics based on the corresponding specificity and sensitivity. Our results demonstrate that rank and symbol based measures have the highest performance in inferring regulatory interactions. In addition, the proposed scoring scheme by asymmetric weighting has shown to be valuable in reducing the number of false positive interactions. On the other hand, Granger causality as well as information-theoretic measures, frequently used in inference of regulatory networks, show low performance on the short time series analyzed in this study.
Conclusions

Our study is intended to serve as a guide for choosing a particular combination of similarity measures and scoring schemes suitable for reconstruction of gene regulatory networks from short time series data. We show that further improvement of algorithms for reverse engineering can be obtained if one considers measures that are rooted in the study of symbolic dynamics or ranks, in contrast to the application of common similarity measures which do not consider the temporal character of the employed data. Moreover, we establish that the asymmetric weighting scoring scheme together with symbol based measures (for low noise level) and rank based measures (for high noise level) are the most suitable choices.
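
To make the relevance network idea concrete, here is a minimal sketch that scores gene pairs with a rank-based measure (Spearman correlation) and keeps pairs above a cutoff. It is not one of the paper's 21 measures or 6 scoring schemes as such, and it ignores directionality. The toy time series, the 0.8 threshold and all function names are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 for a constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def relevance_network(series, threshold=0.8):
    """Return (gene_a, gene_b, score) for every pair with |score| >= threshold."""
    edges = []
    for gene_a, gene_b in combinations(sorted(series), 2):
        score = spearman(series[gene_a], series[gene_b])
        if abs(score) >= threshold:
            edges.append((gene_a, gene_b, score))
    return edges


# Invented short time series (five time points per gene).
toy = {
    "geneA": [1.0, 2.0, 4.0, 8.0, 16.0],
    "geneB": [0.1, 0.4, 0.5, 0.9, 1.5],  # rises together with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],  # no monotone relation to the others
}
edges = relevance_network(toy)
print(edges)
```

Only the geneA/geneB pair survives the cutoff here, because both series increase at every time point while geneC does not follow either of them.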

Martin Luther King Jr.

A man who won't die for something is not fit to live.

--Martin Luther King Jr.

Wednesday, July 20, 2011

Cancer

The International Cancer Genome Consortium (ICGC) Data Portal (http://dcc.icgc.org) provides access to genomic, transcriptomic, epigenomic, and clinical data generated by the major cancer sequencing projects including the ICGC, The Cancer Genome Atlas (TCGA), Tumor Sequencing Project (TSP), and Johns Hopkins University.

Vienna ISMB ECCB 2011 Accepted Posters

https://www.iscb.org/cms_addon/conferences/ismbeccb2011/posterlist.php?cat=A

A large scale analysis in the human proteome detects correlation among disease associated mutations and perturbation of protein stability

Rita Casadio, University of Bologna
Valentina Indio (University of Bologna, Laboratory of Biocomputing, Computational Biology Network & “Giorgio Prodi” Center (CIRC)); Pier Luigi Martelli (University of Bologna, Laboratory of Biocomputing, Computational Biology Network); Marco Vassura (University of Bologna, Laboratory of Biocomputing, Computational Biology Network & Department of Computer Science); Piero Fariselli (University of Bologna, Laboratory of Biocomputing, Computational Biology Network & Department of Computer Science)

Short Abstract: Technological advancements constantly increase the number of mutations that need annotation in translated regions of the human genome. Single residue mutations in proteins are known to affect protein stability and function. As a consequence they can be disease associated. Available computational methods starting from protein sequence/structure predict whether residue mutations are conducive to disease or alternatively to instability of the protein folded structure. However the relationship among stability changes in proteins and their involvement in human diseases still needs to be established. Here we try to rationalize in a nutshell the complexity of the question by generalizing over information already stored in public databases. For this we derive for each Single Amino Acid Polymorphism (SAP) type the probability of being disease-related (Pd) and compute from thermodynamic data three indexes indicating the probability that it is conducive to decreasing (P-), increasing (P+) and perturbing the protein structure stability (Pp). Statistically validated analysis of the different P/Pd correlations indicates that Pd best correlates with Pp. Pp/Pd correlation values are as high as 0.49, and increase up to 0.67 when data variability is taken into consideration. This is indicative of a medium/good correlation among Pd and Pp and corroborates the assumption that protein stability changes can be associated to disease at the proteome level. All the probabilities are listed in a feature table useful to label SAPs as frequently or less frequently associated with disease/protein perturbation in the current databases.

The functional importance and detection of regulatory sequence variants

Virginie Bernard, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute
Wyeth Wasserman (Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics - University of British Columbia); David Arenillas (Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics - University of British Columbia)

Short Abstract: The convergence of high-throughput technologies for sequencing individual exomes and genomes and rapid advances in genome annotation are driving a neo-revolution in human genetics. This wave of family-based genetics analysis is revealing causal mutations responsible for striking phenotypes. By mapping the reads to the human genome reference and by searching for variations relative to the reference, a list of small nucleotide variations and structural variations is obtained. Analysis is required to reveal those variations most likely to contribute to a disease phenotype within a family. Existing software scores the severity of changes that arise in protein encoding exons. However, most mutations within a family are situated in the 98% of the genome that controls the developmental and physiological profile of gene activity - the sequences that control when and where a gene will be active.

Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. With full genome sequencing becoming accessible to medical researchers, the need to identify potential causal mutations in regulatory DNA is becoming imperative. We are implementing a software system to enable genetics researchers to characterize regulatory DNA changes within individual genome sequences. We are combining reference databases of known regulatory elements, experimental archives of protein-DNA interactions and computational predictions within an integrated analysis package. With our software, researchers will have greater capacity to identify variations potentially causal for disease.

The poster introduces the challenges and approaches of regulatory sequence variation analysis.

A guide to web tools to prioritize candidate genes

Yves Moreau, Katholieke Universiteit Leuven
Leon-Charles Tranchevent (Katholieke Universiteit Leuven); Francisco Bonachela Capdevila (Katholieke Universiteit Leuven, Department of Computer Science); Daniela Nitsch (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD); Bart De Moor (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD); Patrick De Causmaecker (Katholieke Universiteit Leuven, Department of Computer Science); Yves Moreau (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD)

Short Abstract: Finding the most promising genes among large lists of candidate genes has been defined as the gene prioritization problem. It is a recurrent problem in genetics in which genetic conditions are reported to be associated with chromosomal regions. In the last decade, several different computational approaches have been developed to tackle this challenging task. In this study, we review 19 computational solutions for human gene prioritization that are freely accessible as web tools and illustrate their differences. We summarize the various biological problems to which they have been successfully applied. Ultimately, we describe several research directions that could increase the quality and applicability of the tools. In addition we developed a website (http://www.esat.kuleuven.be/gpp) containing detailed information about these and other tools, which is regularly updated. This review and the associated website together constitute a guide to help users select a gene prioritization strategy that best suits their needs.

Network-based gene prioritization from expression data by diffusing through protein interaction networks

Daniela Nitsch, KU Leuven
Léon-Charles Tranchevent (KU Leuven, ESAT-SCD); Joana Gonçalves (INESC-ID, Knowledge Discovery and Bioinformatics (KDBIO) group); Yves Moreau (KU Leuven, ESAT-SCD)

Short Abstract: Discovering novel disease genes is challenging for diseases for which no prior knowledge is available. Genetic studies frequently result in large lists of candidate genes of which only a few can be followed up for further investigation. In the past couple of years, several gene prioritization methods have been proposed. Most of them use a guilt-by-association concept, and are therefore not applicable when little is known about the phenotype or no disease genes are available.

We have proposed a method that overcomes this limitation by replacing prior knowledge about the biological process by experimental data on differential gene expression between affected and healthy individuals. At the core of the method are a protein interaction network and disease-specific expression data. Our approach propagates the expression data over the network using an extended Random Walk approach based on kernel methods, because including indirect associations compensates for network sparsity and small-world effects. It relies on the assumption that strong candidate genes tend to be surrounded by many differentially expressed neighboring genes in a protein interaction network.
We have benchmarked our approach, and results showed that it clearly outperforms other gene prioritization approaches with an average ranking position of 8 out of 100 genes, and an AUC value of 92.3%.

Recently, we have developed the web server PINTA implementing our gene prioritization approach to make it available for clinicians and other researchers.
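
The propagation idea can be sketched with a plain random walk with restart on a toy network. This is a generic stand-in, not the kernel-based extended random walk that the poster and PINTA use. The network, the expression values, the restart probability and the function name are all invented for illustration.

```python
def random_walk_with_restart(adjacency, start_scores, restart=0.5, iterations=100):
    """Iterate p <- restart * p0 + (1 - restart) * W p on an undirected network.

    adjacency maps each node to a list of its neighbours (every neighbour must
    also be a key); start_scores holds non-negative values such as absolute
    differential expression.
    """
    nodes = sorted(adjacency)
    total = sum(start_scores.get(node, 0.0) for node in nodes)
    if total <= 0:
        raise ValueError("start_scores must contain at least one positive value")
    p0 = {node: start_scores.get(node, 0.0) / total for node in nodes}
    p = dict(p0)
    for _ in range(iterations):
        nxt = {node: restart * p0[node] for node in nodes}
        for node in nodes:
            neighbours = adjacency[node]
            if not neighbours:
                # Isolated node: the walking mass has nowhere to go, so it stays put.
                nxt[node] += (1 - restart) * p[node]
                continue
            share = (1 - restart) * p[node] / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        p = nxt
    return p


# Toy network: candidate X sits among three strongly changed genes,
# candidate Y has a single, barely changed neighbour.
adjacency = {
    "X": ["D1", "D2", "D3"],
    "D1": ["X"], "D2": ["X"], "D3": ["X"],
    "Y": ["N1"],
    "N1": ["Y"],
}
abs_diff_expr = {"X": 0.1, "D1": 2.0, "D2": 2.0, "D3": 2.0, "Y": 0.1, "N1": 0.1}

scores = random_walk_with_restart(adjacency, abs_diff_expr)
for gene in ("X", "Y"):
    print(gene, round(scores[gene], 3))
```

X and Y start with the same weak expression change, but X ends up with a much higher score because its neighbours pass their signal on to it, which is the assumption stated in the abstract.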

Association Rule Mining with Prior Knowledge for Alzheimer's Disease

Peter Li, Mayo Clinic
Gyorgy Simon (Mayo Clinic, Health Sciences Research)

Short Abstract: As we migrate to modeling diseases as a multi-factorial problem, the ability to analyze any given genomic data set is limited by the combinatorial explosion of false discoveries. The statistical solution is to require increased significance (e.g. Bonferroni correction), but this increases false negatives. Another approach is to use prior knowledge, such as pathways and networks. Most methods fail to account for population heterogeneity. In this work, we present a novel approach integrating prior knowledge, population heterogeneity, with a two-stage association rule mining technique, whose behavior is different from traditional testing.

We evaluated this method using GWAS from the Joint Aging, Addiction and Mental Health (JAAMH) Alzheimer's Disease (AD) data set of 1237 cases and 1254 controls. A combined interaction network was built from Reactome, BioGRID, IntAct, MINT, DIP and HPRD. In the first stage, we generate haplotype blocks and then apply predictive association rule mining for each block. In the second stage, we discover combinations of predictive haplotypes, whose corresponding genes are on average ≤k hops away from the nearest known AlzGene gene on the network.

We found that at k=1, we discovered 50% fewer patterns than we would have without the use of prior knowledge, yet we recovered 93% of the significant patterns and 89% of the unique genes. The lower total number of patterns allows for a less stringent Bonferroni correction, leading to a 10% increase in the number of significant patterns. The predictive capability of the discovered genes is higher than that of individual SNPs or haplotypes.
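
The Bonferroni argument is simple arithmetic: the per-test threshold is alpha divided by the number of tests, so testing fewer patterns raises the threshold. A small sketch follows; the p-values are invented and have nothing to do with the poster's data.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def count_significant(p_values, alpha=0.05):
    """Number of p-values that pass the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return sum(1 for p in p_values if p <= threshold)


# Invented p-values: ten candidate patterns without prior knowledge ...
all_patterns = [0.001, 0.004, 0.008, 0.009, 0.03, 0.2, 0.4, 0.6, 0.8, 0.9]
# ... and the five that a (hypothetical) prior-knowledge filter keeps.
filtered_patterns = [0.001, 0.004, 0.008, 0.009, 0.03]

print(count_significant(all_patterns))       # threshold 0.05 / 10
print(count_significant(filtered_patterns))  # threshold 0.05 / 5
```

With ten tests only two patterns pass; after filtering down to five tests the threshold doubles and four pass, even though the p-values themselves did not change.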

PINTA: a web server for network-based gene prioritization from expression data

http://nar.oxfordjournals.org/content/39/suppl_2/W334.full

PINTA (available at http://www.esat.kuleuven.be/pinta/; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes based on the differential expression of their neighborhood in a genome-wide protein–protein interaction network. Our strategy is meant for biological and medical researchers aiming at identifying novel disease genes using disease specific expression data. PINTA supports both candidate gene prioritization (starting from a user defined set of candidate genes) as well as genome-wide gene prioritization and is available for five species (human, mouse, rat, worm and yeast). As input data, PINTA only requires disease specific expression data, whereas various platforms (e.g. Affymetrix) are supported. As a result, PINTA computes a gene ranking and presents the results as a table that can easily be browsed and downloaded by the user.

Our method relies on the assumption that strong candidate genes tend to be surrounded by many differentially expressed genes in a genome-wide protein–protein interaction network. This allows the detection of a strong signal for a candidate even if its own differential expression value is too small to be detected by a standard analysis, as long as its interacting partners are highly differentially expressed.

Sophocles

One word frees us of all the weight and pain of life: that word is love.

Sophocles (c. 497/6 BC – winter 406/5 BC) is one of three ancient Greek tragedians whose plays have survived.
The most famous tragedies of Sophocles feature Oedipus and Antigone: they are generally known as the Theban plays.
http://en.wikipedia.org/wiki/Sophocles

Oedipus the King
http://www.imdb.com/title/tt0087833/

Tuesday, July 19, 2011

NCBI Databases and Resources

http://www.ncbi.nlm.nih.gov/guide/genes-expression/

Love and Sports

"The first thing is to love your sport. Never do it to please someone else. It has to be yours."

Peggy Fleming

Monday, July 18, 2011

Friday, July 15, 2011

Google map to GPX

http://www.elsewhere.org/journal/gmaptogpx/

The Google Maps API is great, but it doesn’t have an easy way to export data in GPX format. This bookmarklet is my attempt at a hack to get information out of Google Maps and into GPX, suitable for loading on a GPS.
This bookmarklet can create a GPX file based on driving directions, an address search or a local search. The GPX file will contain a route, a single waypoint, or up to ten waypoints, respectively. The code for extracting waypoints from local search originally came from this page.
If you’re looking for a utility to display GPX files in Google Maps, I recommend GPS Visualizer.

http://home.comcast.net/~ghayman3/garmin.gps/page3.htm
GPS

Garmin Communicator Plugin API: Device Support Matrix

http://developer.garmin.com/web-device/garmin-communicator-plugin/device-support-matrix/

GPS

Nature Milestones timeline

http://www.nature.com/milestones/geneexpression/milestones/index.html

Human Connectome

http://humanconnectome.org/about/publications.html

www.alleninstitute.org/
www.gensat.org/
http://adni.loni.ucla.edu

https://sites.google.com/a/brain-connectivity-toolbox.net/bct/datasets

UBC Map of Knowledge

http://www.knowledgenetwork.ubc.ca/CKNet/Interactive.html

When the full version is done, this is where you can:

Find yourself on UBC’s map or be found by others
Search for people, topics or geographic area
Observe the evolution of the UBC net
Identify the sub-community within UBC

Is Your Brain Asleep on the Job?

http://www.psychologytoday.com/blog/prime-your-gray-cells/201107/is-your-brain-asleep-the-job

If you don't find your job challenging, you've likely zoned out due to boredom. Boredom arises when you have maxed out, learned as much as you can learn or mastered all the skills required to perform your job, particularly if your job doesn't require complex thinking and analysis.

When it comes to your brain, the 'use it or lose it' principle applies.

Monitoring goals keeps your brain focused on the task, even while it sleeps; and if you link pleasure to achieving your goals, your brain will be even further motivated to perform. Your brain is wired to understand the good consequences that come from distinct action; setting your workday up so that it can perform to its peak capacity is a recipe for success.

Modular bioinformatics

http://en.wikipedia.org/wiki/OSGi

The Open Services Gateway initiative framework is a module system and service platform for the Java programming language that implements a complete and dynamic component model, something that as of 2011 does not exist in standalone Java/VM environments. Applications or components (coming in the form of bundles for deployment) can be remotely installed, started, stopped, updated and uninstalled without requiring a reboot; management of Java packages/classes is specified in great detail. Application life cycle management (start, stop, install, etc.) is done via APIs that allow for remote downloading of management policies. The service registry allows bundles to detect the addition of new services, or the removal of services, and adapt accordingly.

Cytoscape, Protege, PathVisio, ImageJ, Jalview or Chipster

http://www.kirkk.com/modularity/2011/01/chapter-14-introducing-osgi/

Positional integratomic approach in identification of genomic candidate regions for Parkinson disease

1. A Maver and B Peterlin, “Positional integratomic approach in identification of genomic candidate regions for Parkinson disease,” Bioinformatics (May 19, 2011), http://bioinformatics.oxfordjournals.org/content/early/2011/05/19/bioinformatics.btr313.abstract.
www.ncbi.nlm.nih.gov/pubmed/21596793

ABSTRACT

Motivation: Recent abundance of data from studies employing high-throughput technologies to reveal alterations in human disease on genomic, transcriptomic, proteomic, and other levels, offer the possibility to integrate this information into a comprehensive picture of molecular events occurring in human disease. Diversity of data originating from these studies presents a methodological obstacle in the integration process, also due to difficulties in choosing the optimal unified denominator that would allow inclusion of variables from various types of studies. We present a novel approach for integration of such multi-origin data based on positions of genetic alterations occurring in human diseases. Parkinson disease (PD) was chosen as a model for evaluation of our methodology.

Methods: Datasets from various types of studies in PD (linkage, genome-wide association, transcriptomic and proteomic studies) were obtained from online repositories or were extracted from available research papers. Subsequently, human genome assembly was subdivided into 10kb regions, and significant signals from aforementioned studies were arranged into their corresponding regions according to their genomic position. For each region rank product values were calculated and significance values were estimated by permuting the original dataset.

Results: Altogether, 179 regions (representing 33 contiguous genomic regions) had significant accumulation of signals when p-value cut-off was set at 0.0001. Identified regions with significant accumulation of signals contained 29 plausible candidate genes for PD. In conclusion, we present a novel approach for identification of candidate regions and genes for various human disorders, based on the positional integration of data across various types of omic studies.
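
The binning and rank product steps from the Methods paragraph can be sketched as follows. This is one guess at the mechanics, not the authors' implementation: taking the strongest score per region, ranking regions within each study, using the geometric mean of ranks and shuffling scores within a study are all assumptions, and the coordinates and scores are invented.

```python
import random
from math import prod

BIN_SIZE = 10_000  # 10kb regions, as in the abstract


def region_of(chrom, position, bin_size=BIN_SIZE):
    """Map a genomic position to its (chromosome, bin index) region."""
    return (chrom, position // bin_size)


def bin_signals(signals, bin_size=BIN_SIZE):
    """Keep the strongest score per region for one study."""
    best = {}
    for chrom, position, score in signals:
        key = region_of(chrom, position, bin_size)
        best[key] = max(score, best.get(key, 0.0))
    return best


def rank_products(studies, regions):
    """Geometric mean of per-study ranks (rank 1 = strongest signal).

    Regions without a signal score 0.0; ties keep the order of `regions`.
    """
    per_study_ranks = []
    for binned in studies:
        ordered = sorted(regions, key=lambda r: -binned.get(r, 0.0))
        per_study_ranks.append({r: i + 1 for i, r in enumerate(ordered)})
    k = len(studies)
    return {r: prod(ranks[r] for ranks in per_study_ranks) ** (1 / k) for r in regions}


def permutation_p_values(studies, regions, n_permutations=1000, seed=0):
    """Share of permutations whose rank product is at least as small as observed."""
    rng = random.Random(seed)
    observed = rank_products(studies, regions)
    counts = {r: 0 for r in regions}
    for _ in range(n_permutations):
        shuffled = []
        for binned in studies:
            scores = [binned.get(r, 0.0) for r in regions]
            rng.shuffle(scores)  # break the link between score and position
            shuffled.append(dict(zip(regions, scores)))
        permuted = rank_products(shuffled, regions)
        for r in regions:
            if permuted[r] <= observed[r]:
                counts[r] += 1
    return {r: (counts[r] + 1) / (n_permutations + 1) for r in regions}


# Invented signals: (chromosome, position, score) per study.
linkage = bin_signals([("chr1", 12_345, 3.2), ("chr2", 55_000, 1.1)])
gwas = bin_signals([("chr1", 17_900, 8.0), ("chr3", 42_000, 5.0)])
studies = [linkage, gwas]
regions = sorted(set(linkage) | set(gwas))

products = rank_products(studies, regions)
p_values = permutation_p_values(studies, regions, n_permutations=200)
for region in regions:
    print(region, round(products[region], 3), round(p_values[region], 3))
```

The two chr1 signals fall into the same 10kb region and rank first in both studies, so that region gets the smallest possible rank product of 1.0.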

INCF - International Neuroinformatics Coordinating Facility

http://incf.org/resources/research-tools

Vancouver events

http://vancouver.ca/parks/events/events.htm

http://www.findfamilyfun.com/eventthismonth.htm

SCHIZOPHRENIA; SCZD

http://omim.org/entry/181500

Schizophrenia is a psychosis, a disorder of thought and sense of self. Although it affects emotions, it is distinguished from mood disorders in which such disturbances are primary. Similarly, there may be mild impairment of cognitive function, and it is distinguished from the dementias in which disturbed cognitive function is considered primary. There is no characteristic pathology, such as neurofibrillary tangles in Alzheimer disease (104300). Schizophrenia is a common disorder with a lifetime prevalence of approximately 1%. It is highly heritable but the genetics are complex. This may not be a single entity.

http://www.sciencedirect.com/science/article/pii/S0140673609609958

Monogenic genetic disorders

Monogenic genetic disorders occur as a direct consequence of a single gene being defective. Such disorders are inherited (passed on from one generation to another) in a simple pattern according to Mendel's Laws. As such, these disorders are often referred to as Mendelian disorders. However, many disorders are not inherited in this pattern. These include disorders due to more than one gene (polygenic disorders), those caused by mutations in non-nuclear mitochondrial genes (such as Leber's atrophy) and nucleotide repeat disorders (such as myotonic dystrophy).

http://www.geneticalliance.org.uk/education2.htm

Thursday, July 14, 2011

Productivity and Impact of the Top 100 Cited Parkinson’s Disease Investigators since 1985

http://iospress.metapress.com/content/v52282222pw67251/fulltext.pdf

Writing scientific articles that others want to read by Linda Cooper

http://www.prionetcanada.ca/files/Linda_Cooper4202.pdf

PrioNet Canada

http://www.prionetcanada.ca/

PrioNet Canada is a Network of Centres of Excellence for research into prions and prion diseases. Prion diseases are transmissible and fatal neurodegenerative diseases of both humans and animals.

Science Careers: Bioinformatics Scientist

http://www.sciencebuddies.org/science-fair-projects/science-engineering-careers/Genom_bioinformaticsscientist_c001.shtml

Median salary $66,510

http://www.sciencebuddies.org/science-fair-projects/science-engineering-careers/interview_caroline-thorn.shtml
http://bioteach.ubc.ca/Bioinformatics/interviews/

Related Occupations

Biologist
Computer scientist
Computational biologist
Database administrator
Statistician
Mathematician

Wallpapers, desktop background

http://www.ewallpapers.eu/Nature/Landscape/Rain-is-coming.html
http://www.1stwebdesigner.com/freebies/66-high-resolution-nature-desktop-backgrounds/

Finding your TP (while minimizing your FP)

Finding your True Positive (while minimizing your FP): A combination of sensitivity (open mindedness) and specificity (not taking yourself for granted)

Wednesday, July 13, 2011

Widespread transcription at neuronal activity-regulated enhancers

http://www.ncbi.nlm.nih.gov/pubmed/20393465

Enhancers near the c-fos gene with increased CBP/RNAPII/NPAS4 binding and eRNA production upon membrane depolarization

The strong inducibility of CBP binding at thousands of neuronal enhancers and their presence near activity-regulated genes (e.g., c-fos, rgs, and nr4a2) (Fig. 1 and Supplementary table 2) suggests that these enhancers may contribute to the induction of activity-regulated gene expression.

We find in neurons that CREB, SRF, and NPAS4 bind to neuronal enhancers as well as promoters (Supplementary Table 3).

This tight co-localization of individual TFs with CBP at a subset of enhancers (Supplementary Table 4) suggests that TFs may work together to regulate enhancer function, possibly by recruiting CBP.

We provide genome-wide evidence that thousands of neuronal activity-regulated enhancers that are defined by activity-independent H3K4me1 marks and activity-dependent CBP binding also recruit RNAPII and produce eRNAs

http://genome.ucsc.edu/cgi-bin/hgEncodeVocab?term=CTCF,H3K4me1,H3K4me2,H3K4me3,H3K27ac,H3K9ac,H3K36me3,H4K20me1,H3K27me3,Input

H3K4me1	Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts.
H3K4me2	Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters.
H3K4me3	Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated.
H3K27ac	Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases.

2011 USC Workshop on Action, Language and Neuroinformatics

http://nsl.usc.edu/mediawiki/index.php/Program_%26_Abstracts
http://nsl.usc.edu/mediawiki/index.php/2011_Workshop

Tuesday, July 12, 2011

The Genetics of Parkinson's Disease

http://www.medscape.com/viewarticle/528722_3

Dominantly Inherited Parkinson Disease: Probable Gain-of-Function Mechanism

SNCA / Park 1 / Park 4
LRRK2 / Park 8

Recessively Inherited Parkinson Disease: Probable Loss-of-Function Mechanism

Parkin / Park 2 -> early-onset
PINK1
DJ1

http://adam.about.net/reports/Parkinson-s-disease.htm

Monogenic disease

http://www.who.int/genomics/public/geneticdiseases/en/index2.html

Monogenic diseases result from modifications in a single gene occurring in all cells of the body. Though relatively rare, they affect millions of people worldwide.

Thalassaemia
Sickle cell anemia
Haemophilia
Cystic Fibrosis
Tay sachs disease
Fragile X syndrome
Huntington's disease

Exome sequencing -- cheaper and faster, greater sequencing depth alternative to whole genome sequencing

http://en.wikipedia.org/wiki/Exome_sequencing

Hybrid capture

Array-based capture

This technique involves hybridizing shotgun libraries of genomic DNA to target-specific sequences on a microarray. Roche NimbleGen was first to take this technology and adapt it for next-generation sequencing. They developed the Sequence Capture Human Exome 2.1M Array to capture ~180,000 coding exons. This method is both time-saving and cost-effective compared to PCR-based methods. The Agilent Capture Array and the comparative genomic hybridization array are also methods that can be used for hybrid capture of target sequences. Limitations of this technique include the need for expensive hardware as well as a relatively large amount of DNA.

In-solution capture

To capture genomic regions of interest using in-solution capture, a pool of custom oligonucleotides (probes) is synthesized and hybridized in solution to a fragmented genomic DNA sample. The probes (labeled with beads) selectively hybridize to the genomic regions of interest, after which the beads (now including the DNA fragments of interest) can be pulled down and washed to clear excess material. The beads are then removed and the genomic fragments can be sequenced, allowing for selective DNA sequencing of genomic regions (e.g. exons) of interest.

Paired end (overlapping and < 500bp) vs Mate Pair (1.5K to 20K)

http://biostar.stackexchange.com/questions/789/about-paired-end-sequencing

http://seqanswers.com/forums/showthread.php?t=10

Monday, July 11, 2011

Learning

"I am always doing that which I cannot do, in order that I may learn how to do it."
--Pablo Picasso

bam2fastq - extract raw sequence from BAM alignment files

http://www.hudsonalpha.org/gsl/software/bam2fastq.php

samtools idxstats
Retrieve and print stats in the index file. The output is TAB delimited with each line consisting of reference sequence name, sequence length, # mapped reads and # unmapped reads.

http://samtools.sourceforge.net/samtools.shtml
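
A minimal sketch of summarizing idxstats output with awk, in the same spirit as the awk one-liners elsewhere in these notes. The sample lines are made-up values for illustration, not real data, and summarize_idxstats is a hypothetical helper name. With a real, indexed BAM file (sample.bam is a placeholder) the input would come from samtools idxstats instead of printf.

```shell
# Summarize `samtools idxstats`-style output: name, length, mapped, unmapped.
# summarize_idxstats is a hypothetical helper; it reads TAB-delimited lines
# on stdin and prints the total mapped and unmapped read counts.
summarize_idxstats() {
  awk -F'\t' '
    { mapped += $3; unmapped += $4 }
    END { printf "mapped\t%d\nunmapped\t%d\n", mapped, unmapped }
  '
}

# Made-up example input. With a real BAM file one would run instead:
#   samtools idxstats sample.bam | summarize_idxstats
printf 'chr1\t1000\t120\t3\nchr2\t800\t80\t1\n*\t0\t0\t15\n' | summarize_idxstats
```

The final "*" line is where idxstats reports reads that have no reference coordinates, so it is included in the unmapped total.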

Dx = Diagnosis

Dx = Diagnosis
Rx = Prescription (from the Latin "recipe" = "take") http://ask.yahoo.com/20051114.html

Neuroprotective effect of a new DJ-1-binding compound against neurodegeneration in Parkinson's disease and stroke model rats

http://7thspace.com/headlines/388742/neuroprotective_effect_of_a_new_dj_1_binding_compound_against_neurodegeneration_in_parkinsons_disease_and_stroke_model_rats.html

Foxp2 - Forkhead box protein P2

http://en.wikipedia.org/wiki/FOXP2

The FOXP2 protein contains a forkhead-box DNA-binding domain, making it a member of the FOX group of transcription factors, involved in regulation of gene expression. In addition to this characteristic forkhead-box domain, the protein contains a polyglutamine tract, a zinc finger and a leucine zipper.

In humans, mutations of FOXP2 cause a severe speech and language disorder. One particular target that is directly downregulated by FOXP2 in human neurons is the CNTNAP2 gene, a member of the neurexin family; variants in this target gene have been associated with common forms of language impairment.

FOXP2 directly regulates a large number of downstream target genes

http://www.medicalnewstoday.com/articles/230758.php

How Brain Death works

http://science.howstuffworks.com/environmental/life/human-biology/brain-death1.htm

Caused by loss of the brain's supply of sugar (glucose) and oxygen
The brain can survive up to about 6 min. after the heart stops, so start CPR right away!

TOP500 List - June 2011 (1-100)

http://www.top500.org/list/2011/06/100

Japan's K Computer

RIKEN Advanced Institute for Computational Science in Kobe

Apache™ Hadoop™ -- software for reliable, scalable, distributed computing

http://hadoop.apache.org/

What Is Apache Hadoop?

The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
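
The "simple programming model" is MapReduce: a map step emits key/value pairs, the framework groups them by key, and a reduce step aggregates each group. As a rough single-machine illustration only (ordinary Unix pipes, not Hadoop itself; map_words and reduce_counts are made-up names), a word count could be sketched as:

```shell
# Single-machine imitation of the map -> sort (shuffle) -> reduce flow.
# This does not use Hadoop; map_words and reduce_counts are hypothetical names.

# Map: emit one "word<TAB>1" pair per word read from stdin.
map_words() {
  tr -s '[:space:]' '\n' | awk 'NF { print $1 "\t" 1 }'
}

# Reduce: input is sorted by key, so equal keys are adjacent; sum their values.
reduce_counts() {
  awk -F'\t' '
    NR > 1 && $1 != key { print key "\t" sum; sum = 0 }
    { key = $1; sum += $2 }
    END { if (NR > 0) print key "\t" sum }
  '
}

# Made-up input text; sort stands in for the framework's grouping step.
printf 'gene protein gene\nprotein gene\n' | map_words | LC_ALL=C sort | reduce_counts
```

On a real cluster the map and reduce steps would run on many machines at once, with the framework handling the sorting, data movement and retries after failures.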

Sunday, July 10, 2011

Gene set analysis for longitudinal gene expression data

Ke Zhang, Haiyan Wang, Arne C Bathke, Solomon W Harrar, Hans-Peter Piepho and Youping Deng

BMC Bioinformatics 2011, 12:273, doi:10.1186/1471-2105-12-273
Published: 3 July 2011

http://www.biomedcentral.com/1471-2105/12/273/abstract

Nowadays, an increasing number of microarray studies are conducted to explore the dynamic changes of gene expression in a variety of species and biological scenarios. In these longitudinal studies, gene expression is repeatedly measured over time such that a Gene set analysis (GSA) needs to take into account the within-gene correlations in addition to possible between-gene correlations.

Baby, Old and Hair

"Babies haven't any hair; Old men's heads are just as bare; Between the cradle and the grave, Lie a haircut and a shave"

--Samuel Hoffenstein

Courage

"Ofttimes the test of courage becomes rather to live than to die."

--Vittorio Alfieri

SOAP - short oligonucleotide analysis package - BGI

http://www.eurekalert.org/pub_releases/2011-07/ai-baf070611.php

Beijing Genomics Institute (BGI) announces first release of updated bioinformatics software

Kind words

"Kind words can be short and easy to speak, but their echoes are truly endless."

--Mother Teresa

How do you eat an elephant? One piece at a time

When life gives you lemons, make lemonade

Saturday, July 9, 2011

Bioinformatics training and challenges

Bioinformatics training: a review of challenges, actions and support requirements

http://bib.oxfordjournals.org/content/11/6/544.abstract

Bioinformatics challenges for genome-wide association studies

http://bioinformatics.oxfordjournals.org/content/26/4/445.abstract

http://www.sp-consultant.com/SurveyResults

Grub timeout not working

Edit /boot/grub/grub.cfg. Find this section and change "-1" to "5" (shown here after the change):

if [ "${recordfail}" = 1 ]; then
  set timeout=5
else
  set timeout=10
fi

http://ubuntuforums.org/showthread.php?t=1598854&page=4

sudo grub-editenv create
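
A minimal sketch of making the same timeout change with sed, run here against a scratch file rather than the real /boot/grub/grub.cfg. It assumes the original line reads "set timeout=-1", as the note above implies, and fix_grub_timeout is a made-up helper name. Keep in mind that grub.cfg is normally regenerated by update-grub, so a hand edit may be overwritten later.

```shell
# fix_grub_timeout is a hypothetical helper: in the file given as $1 it
# rewrites "set timeout=-1" to "set timeout=5" and keeps a .bak backup.
fix_grub_timeout() {
  sed -i.bak 's/^\([[:space:]]*set timeout\)=-1$/\1=5/' "$1"
}

# Try it on a scratch file that mimics the section before the change.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
if [ "${recordfail}" = 1 ]; then
  set timeout=-1
else
  set timeout=10
fi
EOF
fix_grub_timeout "$tmp"
cat "$tmp"
```

Applying it to the real file would need root (for example via sudo) and is best done only after checking the result on a copy.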

Friday, July 8, 2011

Interdisciplinary Research

https://commonfund.nih.gov/interdisciplinary/

The goal of the Common Fund’s Interdisciplinary Research (IR) program is to change academic research culture such that interdisciplinary approaches and team science spanning various biomedical and behavioral specialties are encouraged and rewarded. The program includes the following components:

Interdisciplinary Research Consortia
Interdisciplinary Training Programs
Innovation in Interdisciplinary Technology and Methods
Multiple Principal Investigator (Multi-PI) Policy

MOUSE GENETICS LEADS TO NEW CLUES FOR HUMAN PSYCHIATRIC DISORDERS

Several psychiatric disorders, including attention deficit hyperactivity disorder (ADHD), drug and alcohol addiction, and schizophrenia, are characterized by poor impulse control and difficulty inhibiting certain behaviors. These traits are referred to as behavioral inflexibility, which is thought to be partially under genetic control. However, the genes responsible have been difficult to identify in humans. In a paper available online March 10, 2011 in the journal Biological Psychiatry, researchers in the Common Fund’s Interdisciplinary Research program’s Consortium for Neuropsychiatric Phenomics report that they have identified several genes associated with behavioral inflexibility in mice, and that these findings might be applicable to humans as well.

To identify genes that underlie behavioral inflexibility, Dr. David Jentsch and colleagues from the University of California Los Angeles and the University of Tennessee first tested 51 genetically different strains of mice for the ability to reverse their behavior in a learned task. To successfully complete this task, mice had to learn to poke their nose into an opening either on the left or right side of the cage in order to receive a food reward. Once the mice mastered this skill, they had to unlearn which side to poke their nose into, and re-learn to poke their nose on the opposite side. The number of tries it takes a mouse to reverse its behavior indicates how much behavioral flexibility and impulse control the mouse has.

The researchers reasoned that by looking at both the genes and behaviors of the mice, they could find genetic differences that were associated with the behavioral differences. Indeed, the researchers zeroed in on a region of the mouse chromosome 10 that contains several genes that influence behavioral flexibility. One gene, Syn3, regulates chemical communication in the brain and has been inconclusively linked to schizophrenia in humans. Another gene, Nt5dc3, is a gene of unknown function that has been associated with ADHD. The current research suggests that both of these genes should be investigated further to discover what role they may play in human psychiatric disorders, and also demonstrates a new way to use mouse behavior and genetics to find genes that may contribute to complex behaviors in humans.

Ubuntu does not unmount cleanly during shutdown

$ sudo apt-get install --reinstall libc6
$ sudo apt-get install --reinstall libc6-dev
$ sudo apt-get install --reinstall libc6-i386
$ sudo apt-get install --reinstall sysvinit-utils
$ sudo apt-get install --reinstall upstart

https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/672177

http://askubuntu.com/questions/39384/use-upstart-to-umount-nfs-at-shutdown-restart

https://wiki.ubuntu.com/MaverickMeerkat/ReleaseNotes

Upstart 0.6.7
http://www.ubuntuupdates.org/packages/show/268020?page=2

Mentoring: On the right path -- Alison McCook

http://www.nature.com/naturejobs/2011/110630/full/nj7353-667a.html?WT.ec_id=NATUREjobs-20110707

Maybe academia isn't where you want to be

Box 1: Advice for mentors: How to help

Principal investigators hoping to help their postdocs to find suitable positions should consider these tips.

Set aside five minutes a few times a year to check in on your postdocs' career plans and progress.
Know your limits. If your postdocs are interested in careers that you know nothing about, such as patent law, introduce them to someone who can help: someone in the institution's faculty-affairs or postdoctoral office, for example.
Give them experience with peer review: tell journals that you want your postdocs to review papers with you.
Don't rely on job ads alone. If a postdoc is exceptional, try calling colleagues and collaborators to ask about positions soon to open up that haven't yet been advertised.
Sell their strengths when employers call. If your postdoc does well in presentations, don't just say that. Better to say he or she is a good communicator — a talent that may help to get grants.
Don't help too much; let postdocs manage some of the process alone. Lucy Shapiro, a developmental biologist at Stanford University in California, says that she won't help a postdoc to write the talk that maps out his or her research plan. “To me, that's a very important do-it-on-your-own measure, so people know what they're dealing with when they decide to offer someone a job,” she says. A.M.

Box 2: Advice for postdocs: Going it alone

If your principal investigator is unwilling or unable to help with your job search, try these steps to make progress on your own.

Craft a plan of your goals and timelines. Even if you don't show it to anyone, it is a good way to analyse your strengths and weaknesses, and to give yourself direction.
Plan ahead. Try to pick an adviser whose postdocs are typically successful, says Jodi Lubetsky, a manager of science policy at the Association of American Medical Colleges in Washington DC.
If you've already joined a lab and your adviser is “missing in action”, schedule a time to talk, she says. If that isn't working and you still want to get your principal investigator involved, find a neutral person to whom you feel safe talking.
Publish, but don't obsess. Some employers want several papers per year; some are fine with fewer in top-tier journals. Don't worry too much about quantity, says Ron Vale, a cell biologist at the University of California, San Francisco. One strong paper is often good enough, he says.
Get outside funding or fellowships to show employers that you can compete successfully for money. Awards can come from sources such as local governments, foundations or professional societies.
Multiply your mentors. Even if your principal investigator is helpful, it is a good idea to establish relationships with 2–4 experienced scientists, who will then be able to answer personalized questions during a phone call from employers, and contribute more than just generic recommendation letters. A.M.

Writing tips

http://www.nature.com/naturejobs/2011/110707/full/nj7354-129a.html?WT.ec_id=NATUREjobs-20110707

So stop waiting to feel ready. Get started with some short and regular writing snacks. What you write won't be perfect at first, but you will be on your way to becoming a prolific academic writer.

Kindness

"The smallest act of kindness is worth more than the grandest intention."

--Oscar Wilde

Get disk UUID

http://ubuntuforums.org/showthread.php?t=665567&page=2

$ ls -alh /dev/disk/by-uuid/
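
The entries in /dev/disk/by-uuid are symlinks named after each UUID and pointing at the device node. A small sketch that prints them as UUID/device pairs (list_uuids is a made-up helper name, and readlink -f is assumed to be available, as it is on Ubuntu):

```shell
# list_uuids is a hypothetical helper: for each symlink in the directory
# given as $1 (default /dev/disk/by-uuid), print "UUID<TAB>resolved target".
list_uuids() {
  dir=${1:-/dev/disk/by-uuid}
  for link in "$dir"/*; do
    # Skip anything that is not a symlink (including an unmatched glob).
    [ -L "$link" ] || continue
    printf '%s\t%s\n' "$(basename "$link")" "$(readlink -f "$link")"
  done
}

list_uuids
```

The tab-separated output is easier to paste into /etc/fstab entries or to filter with awk than the long ls listing.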