Analysis of Gene and Protein Expression Data
http://books.google.com/books/about/Data_Mining_for_Genomics_and_Proteomics.html?id=8ngex9vgFpEC
Darius M. Dziuda
QH 441.2 D98 2010
Just a collection of some random cool stuff. PS. Almost 99% of the contents here are not mine and I don't take credit for them; I reference and copy parts of the interesting sections.
Sunday, July 31, 2011
Dimension Reduction with Gene Expression Data Using Targeted Variable Importance Measurement
Wang H, van der Laan MJ
BMC Bioinformatics 2011, 12:312 (29 July 2011)
Wednesday, July 27, 2011
Talking to Machines
http://www.radiolab.org/2011/may/31/
We begin with a love story--from a man who unwittingly fell in love with a chatbot on an online dating site. Then, we encounter a robot therapist whose inventor became so unnerved by its success that he pulled the plug. And we talk to the man who coded Cleverbot, a software program that learns from every new line of conversation it receives...and that's chatting with more than 3 million humans each month. Then, five intrepid kids help us test a hypothesis about a toy designed to push our buttons, and play on our human empathy. And we meet a robot built to be so sentient that its creators hope it will one day have a consciousness, and a life, all its own.
Skip first line using awk's FNR>1
$ head chr4-dp5.sorted.coverage | awk '{ if(FNR>1) {split($1, a, ":"); print a[1]"\t"a[2]"\t"$2"\t"$3"\t"$4} }'
Monday, July 25, 2011
Zipf's law unzipped
http://iopscience.iop.org/1367-2630/13/4/043004/
Information theory is used to find the most likely distribution of group sizes given the number of objects, groups and the number of objects in the largest group. The result is the dashed curve in the figure. The same striking agreement is found for all data sets investigated.
Saturday, July 23, 2011
Friday, July 22, 2011
On Frontline, a Personal Look at Parkinson's
As one of the researchers says in this story, there's an old saying that genetics loads the gun, but environment pulls the trigger. And that may be part of what plays out with Parkinson's.
As Michael J. Fox likes to say, we each get our own customized version of the disease, but unfortunately none of them come with operating instructions.
http://www.pbs.org/newshour/bb/health/jan-june09/parkinsons_02-03.html
Thursday, July 21, 2011
Genetic Code of E. Coli Is Hijacked by Biologists
By NICHOLAS WADE
Published: July 14, 2011
http://www.nytimes.com/2011/07/15/health/15genome.html?_r=1
Communication: The best words in the best order
These days, technical writers provide information about products and services not just as instruction manuals, but through websites, e-learning materials, online help modules and FAQ pages, wikis, podcasts and blogs. And they focus on a range of projects — from composing step-by-step protocols for setting up an electron microscope and using the imaging software, to writing scientific manuscripts and regulatory documents. These writers work not in isolation, but as part of teams of researchers, engineers, physicians or computer scientists.
Laura Bonetta
doi:10.1038/nj7355-255a
http://www.nature.com/naturejobs/2011/110714/full/nj7355-255a.html?WT.ec_id=NATUREjobs-20110721
Getting a pay rise in academia
http://blogs.nature.com/naturejobs/2011/07/20/getting-a-pay-rise-in-academia
"You will be more successful if you hand in more applications. That's perfectly all right." She also cautions against having a single narrow research focus. "We advise people to have at least two specialisations that they follow in order to increase their chances of getting funded."
propose that you are appointed at the top of that grade's scale.
double-check your contract
secure your own funding: benefit your career in general
justify why you should get more money. "Frame the request in terms of the value you bring to your employer,"
"The people that I've seen successfully get a promotion in academia have had a very good plan of what they want to do and have been able to market themselves to their PI. It takes a lot of planning and communication skills."
publication record is still one of the main ways your value is judged
Unraveling gene regulatory networks from time-resolved gene expression data -- a measures comparison study
http://www.biomedcentral.com/1471-2105/12/292/abstract
Abstract (provisional)
Background
Inferring regulatory interactions between genes from transcriptomics time-resolved data, yielding reverse engineered gene regulatory networks, is of paramount importance to systems biology and bioinformatics studies. Accurate methods to address this problem can ultimately provide a deeper insight into the complexity, behavior, and functions of the underlying biological systems. However, the large number of interacting genes coupled with short and often noisy time-resolved read-outs of the system renders the reverse engineering a challenging task. Therefore, the development and assessment of methods which are computationally efficient, robust against noise, applicable to short time series data, and preferably capable of reconstructing the directionality of the regulatory interactions remains a pressing research problem with valuable applications.
Results
Here we perform the largest systematic analysis of a set of similarity measures and scoring schemes within the scope of the relevance network approach which are commonly used for gene regulatory network reconstruction from time series data. In addition, we define and analyze several novel measures and schemes which are particularly suitable for short transcriptomics time series. We also compare the considered 21 measures and 6 scoring schemes according to their ability to correctly reconstruct such networks from short time series data by calculating summary statistics based on the corresponding specificity and sensitivity. Our results demonstrate that rank and symbol based measures have the highest performance in inferring regulatory interactions. In addition, the proposed scoring scheme by asymmetric weighting has shown to be valuable in reducing the number of false positive interactions. On the other hand, Granger causality as well as information-theoretic measures, frequently used in inference of regulatory networks, show low performance on the short time series analyzed in this study.
Conclusions
Our study is intended to serve as a guide for choosing a particular combination of similarity measures and scoring schemes suitable for reconstruction of gene regulatory networks from short time series data. We show that further improvement of algorithms for reverse engineering can be obtained if one considers measures that are rooted in the study of symbolic dynamics or ranks, in contrast to the application of common similarity measures which do not consider the temporal character of the employed data. Moreover, we establish that the asymmetric weighting scoring scheme together with symbol based measures (for low noise level) and rank based measures (for high noise level) are the most suitable choices.
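To make the "rank based measures" idea concrete, here is a minimal sketch of a relevance network built from Spearman rank correlation. This is only an illustration, not one of the 21 measures or 6 scoring schemes from the paper: the gene names, the toy time series and the 0.9 threshold are all made up, and the resulting edges are undirected (the asymmetric weighting scheme is not implemented here).

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def relevance_network(series, threshold):
    """Keep gene pairs whose absolute rank correlation reaches the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(series), 2):
        rho = spearman(series[g1], series[g2])
        if abs(rho) >= threshold:
            edges.append((g1, g2, rho))
    return edges


# Hypothetical short time series (5 time points per gene).
series = {
    "geneA": [1.0, 2.0, 4.0, 8.0, 16.0],
    "geneB": [0.1, 0.5, 0.6, 2.0, 9.0],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = relevance_network(series, threshold=0.9)
for g1, g2, rho in edges:
    print(g1, g2, round(rho, 3))
```

geneA and geneB rise monotonically together, so their rank correlation is 1 even though the raw values are on very different scales; that insensitivity to scale is one reason rank measures are attractive for short, noisy series.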
Martin Luther King Jr.
A man who won't die for something is not fit to live.
--Martin Luther King Jr.
Wednesday, July 20, 2011
Cancer
The International Cancer Genome Consortium (ICGC) Data Portal (http://dcc.icgc.org) provides access to genomic, transcriptomic, epigenomic, and clinical data generated by the major cancer sequencing projects including the ICGC, The Cancer Genome Atlas (TCGA), Tumor Sequencing Project (TSP), and Johns Hopkins University.
Vienna ISMB ECCB 2011 Accepted Posters
https://www.iscb.org/cms_addon/conferences/ismbeccb2011/posterlist.php?cat=A
A large scale analysis in the human proteome detects correlation among disease associated mutations and perturbation of protein stability
Rita Casadio University of Bologna
Valentina Indio (University of Bologna, Laboratory of Biocomputing, Computational Biology Network & “Giorgio Prodi” Center (CIRC)); Pier Luigi Martelli (University of Bologna, Laboratory of Biocomputing, Computational Biology Network); Marco Vassura (University of Bologna, Laboratory of Biocomputing, Computational Biology Network & Department of Computer Science); Piero Fariselli (University of Bologna, Laboratory of Biocomputing, Computational Biology Network & Department of Computer Science);
Short Abstract: Technological advancements constantly increase the number of mutations that need annotation in translated regions of the human genome. Single residue mutations in proteins are known to affect protein stability and function. As a consequence they can be disease associated. Available computational methods starting from protein sequence/structure predict whether residue mutations are conducive to disease or alternatively to instability of the protein folded structure. However, the relationship among stability changes in proteins and their involvement in human diseases still needs to be established. Here we try to rationalize in a nutshell the complexity of the question by generalizing over information already stored in public databases. For this we derive for each Single Amino Acid Polymorphism (SAP) type the probability of being disease-related (Pd) and compute from thermodynamic data three indexes indicating the probability that it is conducive to decreasing (P-), increasing (P+) and perturbing the protein structure stability (Pp). Statistically validated analysis of the different P/Pd correlations indicates that Pd best correlates with Pp. Pp/Pd correlation values are as high as 0.49, and increase up to 0.67 when data variability is taken into consideration.
This is indicative of a medium/good correlation among Pd and Pp and corroborates the assumption that protein stability changes can be associated with disease at the proteome level.
All the probabilities are listed in a feature table that can be used to label SAPs as frequently or less frequently associated with disease or protein perturbation in current databases.
The functional importance and detection of regulatory sequence variants
Virginie Bernard Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute
Wyeth Wasserman (Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics - University of British Columbia); David Arenillas (Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics - University of British Columbia);
Short Abstract: The convergence of high-throughput technologies for sequencing individual exomes and genomes and rapid advances in genome annotation are driving a neo-revolution in human genetics. This wave of family-based genetics analysis is revealing causal mutations responsible for striking phenotypes. By mapping the reads to the human genome reference and by searching for variations relative to the reference, a list of small nucleotide variations and structural variations is obtained. Analysis is required to reveal those variations most likely to contribute to a disease phenotype within a family. Existing software scores the severity of changes that arise in protein encoding exons. However, most mutations within a family are situated in the 98% of the genome that controls the developmental and physiological profile of gene activity - the sequences that control when and where a gene will be active.
Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. With full genome sequencing becoming accessible to medical researchers, the need to identify potential causal mutations in regulatory DNA is becoming imperative. We are implementing a software system to enable genetics researchers to characterize regulatory DNA changes within individual genome sequences. We are combining reference databases of known regulatory elements, experimental archives of protein-DNA interactions and computational predictions within an integrated analysis package. With our software, researchers will have greater capacity to identify variations potentially causal for disease.
The poster introduces the challenges and approaches of regulatory sequence variation analysis.
A guide to web tools to prioritize candidate genes
Yves Moreau Katholieke Universiteit Leuven
Leon-Charles Tranchevent (Katholieke Universiteit Leuven); Francisco Bonachela Capdevila (Katholieke Universiteit Leuven, Department of Computer Science); Daniela Nitsch (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD); Bart De Moor (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD); Patrick De Causmaecker (Katholieke Universiteit Leuven, Department of Computer Science); Yves Moreau (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD);
Short Abstract: Finding the most promising genes among large lists of candidate genes has been defined as the gene prioritization problem. It is a recurrent problem in genetics in which genetic conditions are reported to be associated with chromosomal regions. In the last decade, several different computational approaches have been developed to tackle this challenging task. In this study, we review 19 computational solutions for human gene prioritization that are freely accessible as web tools and illustrate their differences. We summarize the various biological problems to which they have been successfully applied. Ultimately, we describe several research directions that could increase the quality and applicability of the tools. In addition we developed a website (http://www.esat.kuleuven.be/gpp) containing detailed information about these and other tools, which is regularly updated. This review and the associated website together constitute a guide to help users select a gene prioritization strategy that best suits their needs.
Network-based gene prioritization from expression data by diffusing through protein interaction networks
Daniela Nitsch KU Leuven
Léon-Charles Tranchevent (KU Leuven, ESAT-SCD); Joana Gonçalves (INESC-ID, Knowledge Discovery and Bioinformatics (KDBIO) group); Yves Moreau (KU Leuven, ESAT-SCD);
Short Abstract: Discovering novel disease genes is challenging for diseases for which no prior knowledge is available. Genetic studies frequently result in large lists of candidate genes of which only a few can be followed up for further investigation. In the past couple of years, several gene prioritization methods have been proposed. Most of them use a guilt-by-association concept, and are therefore not applicable when little is known about the phenotype or no disease genes are available.
We have proposed a method that overcomes this limitation by replacing prior knowledge about the biological process by experimental data on differential gene expression between affected and healthy individuals. At the core of the method are a protein interaction network and disease-specific expression data. Our approach propagates the expression data over the network using an extended Random Walk approach based on kernel methods, as the inclusion of indirect associations compensates for network sparsity and small-world effect issues. It relies on the assumption that strong candidate genes tend to be surrounded by many differentially expressed neighboring genes in a protein interaction network.
We have benchmarked our approach, and results showed that it clearly outperforms other gene prioritization approaches with an average ranking position of 8 out of 100 genes, and an AUC value of 92.3%.
Recently, we have developed the web server PINTA implementing our gene prioritization approach to make it available for clinicians and other researchers.
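The propagation idea in this abstract can be sketched with a plain random walk with restart. To be clear, this is a simpler stand-in and not the authors' extended, kernel-based random walk or what the PINTA server actually runs; the network, the gene names, the expression values and the restart probability of 0.3 are all hypothetical.

```python
def random_walk_with_restart(adjacency, seed_scores, restart=0.3, iterations=100):
    """Diffuse seed scores over an undirected network.

    adjacency: dict mapping each node to a list of its neighbors (symmetric).
    seed_scores: dict of non-negative starting scores, e.g. |log fold change|.
    Assumes every node has at least one neighbor; isolated nodes would leak
    probability mass in this simplified version.
    """
    nodes = sorted(adjacency)
    total = sum(seed_scores.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("seed scores must sum to a positive value")
    # Normalised starting distribution the walker restarts from.
    p0 = {n: seed_scores.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(iterations):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbors = adjacency[n]
            if not neighbors:
                continue
            # Spread the non-restarting mass evenly over the neighbors.
            share = (1 - restart) * p[n] / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        p = nxt
    return p


# Hypothetical network: candidate X touches three strongly changed genes,
# candidate Y touches one gene that barely changes.
adjacency = {
    "X": ["A", "B", "C"],
    "A": ["X"],
    "B": ["X"],
    "C": ["X"],
    "Y": ["D"],
    "D": ["Y"],
}
# Hypothetical |log fold change| values; both candidates are flat themselves.
seed_scores = {"A": 2.0, "B": 1.8, "C": 2.2, "D": 0.1, "X": 0.0, "Y": 0.0}

scores = random_walk_with_restart(adjacency, seed_scores)
ranking = sorted(["X", "Y"], key=lambda g: scores[g], reverse=True)
print(ranking)
```

Neither candidate is differentially expressed on its own, yet X ends up far ahead of Y because the signal from its neighbors flows into it, which is the assumption the abstract relies on.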
Association Rule Mining with Prior Knowledge for Alzheimer's Disease
Peter Li Mayo Clinic
Gyorgy Simon (Mayo Clinic, Health Sciences Research);
Short Abstract: As we migrate to modeling diseases as a multi-factorial problem, the ability to analyze any given genomic data set is limited by the combinatorial explosion of false discoveries. The statistical solution is to require increased significance (e.g. Bonferroni correction), but this increases false negatives. Another approach is to use prior knowledge, such as pathways and networks. Most methods fail to account for population heterogeneity. In this work, we present a novel approach integrating prior knowledge and population heterogeneity with a two-stage association rule mining technique, whose behavior is different from traditional testing.
We evaluated this method using GWAS from the Joint Aging, Addiction and Mental Health (JAAMH) Alzheimer's Disease (AD) data set of 1237 cases and 1254 controls. A combined interaction network was built from Reactome, BioGRID, IntAct, MINT, DIP and HPRD. In the first stage, we generate haplotype blocks and then apply predictive association rule mining for each block. In the second stage, we discover combinations of predictive haplotypes, whose corresponding genes are on average ≤k hops away from the nearest known AlzGene gene on the network.
We found that at k=1, we discovered 50% fewer patterns than we would have without the use of prior knowledge, yet we recovered 93% of the significant patterns and 89% of the unique genes. The lower total number of patterns allows for less stringent Bonferroni correction, leading to a 10% increase in the number of significant patterns. The predictive capability of the discovered genes is higher than that of individual SNPs or haplotypes.
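A quick arithmetic sketch of why fewer patterns means a less stringent Bonferroni correction. The pattern counts and the p-value below are invented for illustration and are not from the poster; only the "50% fewer" ratio comes from the abstract.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


alpha = 0.05
all_patterns = 10_000                 # hypothetical count without prior knowledge
pruned_patterns = all_patterns // 2   # 50% fewer patterns, as in the abstract

threshold_all = bonferroni_threshold(alpha, all_patterns)
threshold_pruned = bonferroni_threshold(alpha, pruned_patterns)

# A hypothetical pattern whose p-value falls between the two thresholds.
p_value = 8e-6
significant_without_prior = p_value <= threshold_all
significant_with_prior = p_value <= threshold_pruned

print(threshold_all, threshold_pruned)
print(significant_without_prior, significant_with_prior)
```

Halving the number of tests doubles the per-test threshold, so a borderline pattern like this one only passes once the search space has been pruned with prior knowledge.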
A large scale analysis in the human proteome detects correlation among disease associated mutations and perturbation of protein stability
Rita Casadio University of Bologna Valentina Indio (University of Bologna, Laboratory of Biocomputing, Computational Biology Network & “Giorgio Prodi” Center (CIRC)); Pier Luigi Martelli (University of Bologna, Laboratory of Biocomputing, Computational Biology Network); Marco Vassura (University of Bologna, Laboratory of Biocomputing, Computational Biology Network & Department of Computer Science ); Piero Fariselli (University of Bologna, Laboratory of Biocomputing, Computational Biology Network & Department of Computer Science ); Short Abstract: Technological advancements constantly increase the number of mutations that need annotation in translated regions of the human genome. Single residue mutations in proteins are known to affect protein stability and function. As a consequence they can be disease associated. Available computational methods starting from protein sequence/structure predict whether residue mutations are conducive to disease or alternatively to instability of the protein folded structure. However the relationship among stability changes in proteins and their involvement in human diseases still needs to be established. Here we try to rationalize in a nutshell the complexity of the question by generalizing over information already stored in public databases. For this we derive for each Single Aminoacid Polymorphysm (SAP) type the probability of being disease-related (Pd) and compute from thermodynamic data three indexes indicating the probability that it is conducive to decreasing (P-), increasing (P+) and perturbing the protein structure stability (Pp). Statistically validated analysis of the different P/Pd correlations indicates that Pd best correlates with Pp. Pp/Pd correlation values are as high as 0.49, and increase up 0.67 when data variability is taken into consideration. 
This is indicative of a medium/good correlation among Pd and Pp and corroborates the assumption that protein stability changes can be associated to disease at the proteome level.
All the probabilities are listed in a feature table useful to label SAPs as disease/protein perturbation frequently or less frequently associated in the current data bases.
The functional importance and detection of regulatory sequence variants
Virginie Bernard Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute Wyeth Wasserman (Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics - University of British Columbia); David Arenillas (Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics - University of British Columbia); Short Abstract: The convergence of high-throughput technologies for sequencing individual exomes and genomes and rapid advances in genome annotation are driving a neo-revolution in human genetics. This wave of family-based genetics analysis is revealing causal mutations responsible for striking phenotypes. By mapping the reads to the human genome reference and by searching for variations relative to the reference, a list of small nucleotide variations and structural variations is obtained. Analysis is required to reveal those variations most likely to contribute to a disease phenotype within a family. Existing software score the severity of changes that arise in protein encoding exons. However, most mutations within a family are situated in the 98% of the genome that controls the developmental and physiological profile of gene activity - the sequences that control when and where a gene will be active.
Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. With full genome sequencing becoming accessible to medical researchers, the need to identify potential causal mutations in regulatory DNA is becoming imperative. We are implementing a software system to enable genetics researchers to characterize regulatory DNA changes within individual genome sequences. We are combining reference databases of known regulatory elements, experimental archives of protein-DNA interactions and computational predictions within an integrated analysis package. With our software, researchers will have greater capacity to identify variations potentially causal for disease.
The poster introduces the challenges and approaches of regulatory sequence variation analysis.
A guide to web tools to prioritize candidate genes
Yves Moreau Katholieke Universiteit Leuven
Leon-Charles Tranchevent (Katholieke Universiteit Leuven) Francisco Bonachela Capdevila (Katholieke Universiteit Leuven, Department of Computer Science); Daniela Nitsch (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD); Bart De Moor (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD); Patrick De Causmaecker (Katholieke Universiteit Leuven, Department of Computer Science); Yves Moreau (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD); Short Abstract: Finding the most promising genes among large lists of candidate genes has been defined as the gene prioritization problem. It is a recurrent problem in genetics in which genetic conditions are reported to be associated with chromosomal regions. In the last decade, several different computational approaches have been developed to tackle this challenging task. In this study, we review 19 computational solutions for human gene prioritization that are freely accessible as web tools and illustrate their differences. We summarize the various biological problems to which they have been successfully applied. Ultimately, we describe several research directions that could increase the quality and applicability of the tools. In addition we developed a website (http://www.esat.kuleuven.be/gpp) containing detailed information about these and other tools, which is regularly updated. This review and the associated website constitute together a guide to help users select a gene prioritization strategy that suits best their needs.
Network-based gene prioritization from expression data by diffusing through protein interaction networks
Daniela Nitsch KU Leuven Léon-Charles Tranchevent (KU Leuven, ESAT-SCD); Joana Gonçalves (INESC-ID, Knowledge Discovery and Bioinformatics (KDBIO) group); Yves Moreau (KU Leuven, ESAT-SCD); Short Abstract: Discovering novel disease genes is challenging for diseases for which no prior knowledge is available. Performing genetic studies frequently result in large lists of candidate genes of which only few can be followed up for further investigation. In the past couple of years, several gene prioritization methods have been proposed. Most of them use a guilt- by - association concept, and are therefore not applicable when little is known about the phenotype or no disease genes are available.
We have proposed a method that overcomes this limitation by replacing prior knowledge about the biological process by experimental data on differential gene expression between affected and healthy individuals. At the core of the method are a protein interaction network and disease-specific expression data. Our approach propagates the expression data over the network using an extended Random Walk approach based on kernel methods, as the inclusion of indirect associations compensating for network sparsity and small world effect issues. It relies on the assumption that strong candidate genes tend to be surrounded by many differentially expressed neighboring genes in a protein interaction network.
We have benchmarked our approach, and results showed that it clearly outperforms other gene prioritization approaches with an average ranking position of 8 out of 100 genes, and an AUC value of 92.3%.
Recently, we have developed the web server PINTA implementing our gene prioritization approach to make it available for clinicians and other researchers.
Association Rule Mining with Prior Knowledge for Alzheimer's Disease
Peter Li (Mayo Clinic), Gyorgy Simon (Mayo Clinic, Health Sciences Research)
Short Abstract: As we migrate to modeling diseases as a multi-factorial problem, the ability to analyze any given genomic data set is limited by the combinatorial explosion of false discoveries. The statistical solution is to require increased significance (e.g. Bonferroni correction), but this increases false negatives. Another approach is to use prior knowledge, such as pathways and networks. Most methods fail to account for population heterogeneity. In this work, we present a novel approach integrating prior knowledge and population heterogeneity with a two-stage association rule mining technique, whose behavior is different from traditional testing.
We evaluated this method using GWAS data from the Joint Aging, Addiction and Mental Health (JAAMH) Alzheimer's Disease (AD) data set of 1237 cases and 1254 controls. A combined interaction network was built from Reactome, BioGRID, IntAct, MINT, DIP and HPRD. In the first stage, we generate haplotype blocks and then apply predictive association rule mining for each block. In the second stage, we discover combinations of predictive haplotypes, whose corresponding genes are on average ≤k hops away from the nearest known AlzGene gene on the network.
We found that at k=1, we discovered 50% fewer patterns than we would have without the use of prior knowledge, yet we recovered 93% of the significant patterns and 89% of the unique genes. The lower total number of patterns allows for a less stringent Bonferroni correction, leading to a 10% increase in the number of significant patterns. The predictive capability of the discovered genes is higher than that of individual SNPs or haplotypes.
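The link between fewer candidate patterns and a less stringent correction is plain arithmetic. The sketch below only illustrates that point: the abstract gives the 50% reduction but no absolute counts, so the number of patterns and the alpha level here are made-up assumptions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

alpha = 0.05                          # assumed family-wise error rate
n_without_prior = 10_000              # hypothetical number of patterns tested
n_with_prior = n_without_prior // 2   # 50% fewer patterns, as reported at k=1

threshold_without = bonferroni_threshold(alpha, n_without_prior)
threshold_with = bonferroni_threshold(alpha, n_with_prior)

# Halving the number of tests doubles the per-test threshold, so borderline
# patterns that failed the stricter cut-off can now pass.
print(threshold_without, threshold_with)
```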
PINTA: a web server for network-based gene prioritization from expression data
http://nar.oxfordjournals.org/content/39/suppl_2/W334.full
PINTA (available at http://www.esat.kuleuven.be/pinta/; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes based on the differential expression of their neighborhood in a genome-wide protein–protein interaction network. Our strategy is meant for biological and medical researchers aiming at identifying novel disease genes using disease specific expression data. PINTA supports both candidate gene prioritization (starting from a user defined set of candidate genes) as well as genome-wide gene prioritization and is available for five species (human, mouse, rat, worm and yeast). As input data, PINTA only requires disease specific expression data, whereas various platforms (e.g. Affymetrix) are supported. As a result, PINTA computes a gene ranking and presents the results as a table that can easily be browsed and downloaded by the user.
Our method relies on the assumption that strong candidate genes tend to be surrounded by many differentially expressed genes in a genome-wide protein–protein interaction network. This allows the detection of a strong signal for a candidate even if its own differential expression value is too small to be detected by a standard analysis, as long as its interacting partners are highly differentially expressed.
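The neighborhood assumption can be illustrated with a toy calculation. This is a minimal sketch and not PINTA's actual algorithm: the network, the scores and the `restart` parameter are all made up, and a plain random walk with restart stands in for the kernel-based propagation the authors describe.

```python
# Toy illustration only: hypothetical network and scores, not PINTA's method.

def propagate(adjacency, scores, restart=0.5, iterations=50):
    """Random walk with restart: each gene's score mixes its own
    differential expression with the smoothed scores of its neighbours."""
    current = dict(scores)
    for _ in range(iterations):
        updated = {}
        for gene, neighbours in adjacency.items():
            # Each neighbour passes on its score divided by its own degree.
            incoming = sum(current[n] / len(adjacency[n]) for n in neighbours)
            updated[gene] = restart * scores[gene] + (1 - restart) * incoming
        current = updated
    return current

# Hypothetical undirected interaction network (adjacency lists).
network = {
    "A": ["B", "C", "D"],  # weak own signal, strongly expressed neighbours
    "B": ["A"],
    "C": ["A"],
    "D": ["A"],
    "E": ["F"],            # weak own signal, weakly expressed neighbour
    "F": ["E"],
}
# Hypothetical absolute differential-expression scores.
diff_expr = {"A": 0.1, "B": 2.0, "C": 1.8, "D": 2.2, "E": 0.1, "F": 0.1}

smoothed = propagate(network, diff_expr)
ranking = sorted(["A", "E"], key=lambda g: smoothed[g], reverse=True)
print(ranking)
```

Genes A and E have the same weak score of their own, but A ends up ranked higher because its interaction partners are strongly differentially expressed.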
Sophocles
One word frees us of all the weight and pain of life: that word is love.
Sophocles (c. 497/6 BC – winter 406/5 BC)[1] is one of three ancient Greek tragedians whose plays have survived.
The most famous tragedies of Sophocles feature Oedipus and Antigone: they are generally known as the Theban plays.
http://en.wikipedia.org/wiki/Sophocles
Oedipus the King
http://www.imdb.com/title/tt0087833/
Tuesday, July 19, 2011
Love and Sports
"The first thing is to love your sport. Never do it to please someone else. It has to be yours."
Peggy Fleming
Monday, July 18, 2011
Friday, July 15, 2011
Google map to GPX
http://www.elsewhere.org/journal/gmaptogpx/
The Google Maps API is great, but it doesn’t have an easy way to export data in GPX format. This bookmarklet is my attempt at a hack to get information out of Google Maps and into GPX, suitable for loading on a GPS.
This bookmarklet can create a GPX file based on driving directions, an address search or a local search. The GPX file will contain a route, a single waypoint, or up to ten waypoints, respectively. The code for extracting waypoints from local search originally came from this page.
If you’re looking for a utility to display GPX files in Google Maps, I recommend GPS Visualizer.
http://home.comcast.net/~ghayman3/garmin.gps/page3.htm
GPS
Human Connectome
http://humanconnectome.org/about/publications.html
www.alleninstitute.org/
www.gensat.org/
http://adni.loni.ucla.edu
UBC Map of Knowledge
http://www.knowledgenetwork.ubc.ca/CKNet/Interactive.html
When the full version is done, this is where you can
- Find yourself on UBC’s map or be found by others
- Search for people, topics or geographic area
- Observe the evolution of the UBC net
- Identify the sub-community within UBC
Is Your Brain Asleep on the Job?
http://www.psychologytoday.com/blog/prime-your-gray-cells/201107/is-your-brain-asleep-the-job
If you don't find your job challenging, you've likely zoned out due to boredom. Boredom arises when you have maxed out, learned as much as you can learn or mastered all the skills required to perform your job, particularly if your job doesn't require complex thinking and analysis.
When it comes to your brain, the 'use it or lose it' principle applies.
Monitoring goals keeps your brain focused on the task, even while it sleeps; and if you link pleasure to achieving your goals, your brain will be even further motivated to perform. Your brain is wired to understand the good consequences that come from distinct action; setting your workday up so that it can perform to its peak capacity is a recipe for success.
Modular bioinformatics
http://en.wikipedia.org/wiki/OSGi
The Open Services Gateway initiative framework is a module system and service platform for the Java programming language that implements a complete and dynamic component model, something that as of 2011 does not exist in standalone Java/VM environments. Applications or components (coming in the form of bundles for deployment) can be remotely installed, started, stopped, updated and uninstalled without requiring a reboot; management of Java packages/classes is specified in great detail. Application life cycle management (start, stop, install, etc.) is done via APIs that allow for remote downloading of management policies. The service registry allows bundles to detect the addition of new services, or the removal of services, and adapt accordingly.
Cytoscape, Protege, PathVisio, ImageJ, Jalview or Chipster
http://www.kirkk.com/modularity/2011/01/chapter-14-introducing-osgi/
Positional integratomic approach in identification of genomic candidate regions for Parkinson disease
1. A Maver and B Peterlin, “Positional integratomic approach in identification of genomic candidate regions for Parkinson disease,” Bioinformatics (May 19, 2011), http://bioinformatics.oxfordjournals.org/content/early/2011/05/19/bioinformatics.btr313.abstract.
www.ncbi.nlm.nih.gov/pubmed/21596793
ABSTRACT
Motivation: Recent abundance of data from studies employing high-throughput technologies to reveal alterations in human disease on genomic, transcriptomic, proteomic, and other levels, offer the possibility to integrate this information into a comprehensive picture of molecular events occurring in human disease. Diversity of data originating from these studies presents a methodological obstacle in the integration process, also due to difficulties in choosing the optimal unified denominator that would allow inclusion of variables from various types of studies. We present a novel approach for integration of such multi-origin data based on positions of genetic alterations occurring in human diseases. Parkinson disease (PD) was chosen as a model for evaluation of our methodology.
Methods: Datasets from various types of studies in PD (linkage, genome-wide association, transcriptomic and proteomic studies) were obtained from online repositories or were extracted from available research papers. Subsequently, human genome assembly was subdivided into 10kb regions, and significant signals from aforementioned studies were arranged into their corresponding regions according to their genomic position. For each region rank product values were calculated and significance values were estimated by permuting the original dataset.
Results: Altogether, 179 regions (representing 33 contiguous genomic regions) had significant accumulation of signals when p-value cut-off was set at 0.0001. Identified regions with significant accumulation of signals contained 29 plausible candidate genes for PD. In conclusion, we present a novel approach for identification of candidate regions and genes for various human disorders, based on the positional integration of data across various types of omic studies.
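The Methods paragraph can be sketched roughly as follows. This is an illustration under stated assumptions and not the authors' code: the rank table is invented, the rank product is taken as the geometric mean of a region's ranks across studies, and the permutation scheme (shuffling each study's ranks independently over the regions) is one plausible reading of "permuting the original dataset".

```python
import random
from math import prod

REGION_SIZE = 10_000  # 10 kb bins, as in the abstract

def region_of(chrom, position):
    """Map a genomic coordinate to the 10 kb region that contains it."""
    return (chrom, position // REGION_SIZE)

def rank_product(ranks):
    """Geometric mean of a region's ranks across studies (lower = stronger)."""
    return prod(ranks) ** (1.0 / len(ranks))

def permutation_p_value(rank_table, region, n_perm=2000, seed=0):
    """Estimate how often a rank product at least as small as the observed
    one arises when each study's ranks are shuffled over the regions."""
    rng = random.Random(seed)
    observed = rank_product([ranks[region] for ranks in rank_table.values()])
    hits = 0
    for _ in range(n_perm):
        # After shuffling one study, the target region holds a rank drawn
        # uniformly from that study's ranks.
        permuted = [rng.choice(list(ranks.values()))
                    for ranks in rank_table.values()]
        if rank_product(permuted) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical signal position, binned into its region.
target = region_of("chr4", 90_645_250)
others = [("chr1", 10), ("chr2", 20), ("chr3", 30), ("chr5", 50)]

# Hypothetical ranks per study type; the target region ranks first in all.
rank_table = {
    study: {target: 1, **{reg: i + 2 for i, reg in enumerate(others)}}
    for study in ("linkage", "gwas", "transcriptomic")
}

p_value = permutation_p_value(rank_table, target)
print(target, p_value)
```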
SCHIZOPHRENIA; SCZD
http://omim.org/entry/181500
Schizophrenia is a psychosis, a disorder of thought and sense of self. Although it affects emotions, it is distinguished from mood disorders in which such disturbances are primary. Similarly, there may be mild impairment of cognitive function, and it is distinguished from the dementias in which disturbed cognitive function is considered primary. There is no characteristic pathology, such as neurofibrillary tangles in Alzheimer disease (104300). Schizophrenia is a common disorder with a lifetime prevalence of approximately 1%. It is highly heritable but the genetics are complex. This may not be a single entity.
http://www.sciencedirect.com/science/article/pii/S0140673609609958
Monogenic genetic disorders
Monogenic genetic disorders occur as a direct consequence of a single gene being defective. Such disorders are inherited (passed on from one generation to another) in a simple pattern according to Mendel's Laws. As such, these disorders are often referred to as Mendelian disorders. However, many disorders are not inherited in this pattern. These include disorders due to more than one gene (polygenic disorders), those caused by mutations in non-nuclear mitochondrial genes (such as Leber's atrophy) and nucleotide repeat disorders (such as myotonic dystrophy).
http://www.geneticalliance.org.uk/education2.htm
Thursday, July 14, 2011
PrioNet Canada
http://www.prionetcanada.ca/
PrioNet Canada is a Network of Centres of Excellence for research into prions and prion diseases. Prion diseases are transmissible and fatal neurodegenerative diseases of both humans and animals.
Science Careers: Bioinformatics Scientist
http://www.sciencebuddies.org/science-fair-projects/science-engineering-careers/Genom_bioinformaticsscientist_c001.shtml
Median salary $66,510
http://www.sciencebuddies.org/science-fair-projects/science-engineering-careers/interview_caroline-thorn.shtml
http://bioteach.ubc.ca/Bioinformatics/interviews/
Related Occupations
- Biologist
- Computer scientist
- Computational biologist
- Database administrator
- Statistician
- Mathematician
Finding your TP (while minimizing your FP)
Finding your True Positive (while minimizing your FP): A combination of sensitivity (open mindedness) and specificity (not taking yourself for granted)
Wednesday, July 13, 2011
Widespread transcription at neuronal activity-regulated enhancers
http://www.ncbi.nlm.nih.gov/pubmed/20393465
Enhancers near the c-fos gene with increased CBP/RNAPII/NPAS4 binding and eRNA production upon membrane depolarization
The strong inducibility of CBP binding at thousands of neuronal enhancers and their presence near activity-regulated genes (e.g., c-fos, rgs, and nr4a2) (Fig. 1 and Supplementary table 2) suggests that these enhancers may contribute to the induction of activity-regulated gene expression.
We find in neurons that CREB, SRF, and NPAS4 bind to neuronal enhancers as well as promoters (Supplementary Table 3).
This tight co-localization of individual TFs with CBP at a subset of enhancers (Supplementary Table 4) suggests that TFs may work together to regulate enhancer function, possibly by recruiting CBP.
We provide genome-wide evidence that thousands of neuronal activity-regulated enhancers that are defined by activity-independent H3K4me1 marks and activity-dependent CBP binding also recruit RNAPII and produce eRNAs
http://genome.ucsc.edu/cgi-bin/hgEncodeVocab?term=CTCF,H3K4me1,H3K4me2,H3K4me3,H3K27ac,H3K9ac,H3K36me3,H4K20me1,H3K27me3,Input
H3K4me1 | Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. |
H3K4me2 | Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. |
H3K4me3 | Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. |
H3K27ac | Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. |
Tuesday, July 12, 2011
The Genetics of Parkinson's Disease
http://www.medscape.com/viewarticle/528722_3
Dominantly Inherited Parkinson Disease: Probable Gain-of-Function Mechanism
- SNCA / Park 1 / Park 4
- LRRK2 / Park 8
Recessively Inherited Parkinson Disease: Probable Loss-of-Function Mechanism
- Parkin / Park 2 -> early-onset
- PINK1
- DJ1
Monogenic disease
http://www.who.int/genomics/public/geneticdiseases/en/index2.html
Monogenic diseases result from modifications in a single gene occurring in all cells of the body. Though relatively rare, they affect millions of people worldwide.
Thalassaemia
Sickle cell anemia
Haemophilia
Cystic Fibrosis
Tay-Sachs disease
Fragile X syndrome
Huntington's disease
Exome sequencing -- cheaper and faster, greater sequencing depth alternative to whole genome sequencing
http://en.wikipedia.org/wiki/Exome_sequencing
Hybrid capture
This technique involves hybridizing shotgun libraries of genomic DNA to target-specific sequences on a microarray.[4] Roche NimbleGen was first to take this technology and adapt it for next-generation sequencing. They developed the Sequence Capture Human Exome 2.1M Array to capture ~180,000 coding exons.[3] This method is both time-saving and cost-effective compared to PCR based methods. The Agilent Capture Array and the comparative genomic hybridization array are also other methods that can be used for hybrid capture of target sequences. Limitations in this technique include the need for expensive hardware as well as a relatively large amount of DNA.[4]
In-solution capture
To capture genomic regions of interest using in-solution capture, a pool of custom oligonucleotides (probes) is synthesized and hybridized in solution to a fragmented genomic DNA sample. The probes (labeled with beads) selectively hybridize to the genomic regions of interest after which the beads (now including the DNA fragments of interest) can be pulled down and washed to clear excess material. The beads are then removed and the genomic fragments can be sequenced allowing for selective DNA sequencing of genomic regions (e.g. exons) of interest.
Monday, July 11, 2011
Learning
"I am always doing that which I cannot do, in order that I may learn how to do it."
Pablo Picasso
bam2fastq - extract raw sequence from BAM alignment files
http://www.hudsonalpha.org/gsl/software/bam2fastq.php
samtools idxstats
Retrieve and print stats in the index file. The output is TAB delimited with each line consisting of reference sequence name, sequence length, # mapped reads and # unmapped reads.
http://samtools.sourceforge.net/samtools.shtml
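Since the output is plain TAB-delimited text, it is easy to post-process. A minimal sketch, assuming the four columns described above; the sample text and read counts are made up, and the final `*` line stands for unmapped reads without a reference sequence.

```python
def parse_idxstats(text):
    """Parse `samtools idxstats` output into (name, length, mapped, unmapped)."""
    rows = []
    for line in text.strip().splitlines():
        name, length, mapped, unmapped = line.split("\t")
        rows.append((name, int(length), int(mapped), int(unmapped)))
    return rows

# Hypothetical output, as if captured from: samtools idxstats sample.bam
sample = (
    "chr1\t249250621\t1200\t3\n"
    "chr2\t243199373\t950\t1\n"
    "*\t0\t0\t42\n"
)

rows = parse_idxstats(sample)
total_mapped = sum(mapped for _, _, mapped, _ in rows)
total_unmapped = sum(unmapped for _, _, _, unmapped in rows)
print(total_mapped, total_unmapped)
```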
Dx = Diagnosis
Rx = Prescription (from Latin verb "recipe" = to take) http://ask.yahoo.com/20051114.html
Foxp2 - Forkhead box protein P2
http://en.wikipedia.org/wiki/FOXP2
The FOXP2 protein contains a forkhead-box DNA-binding domain, making it a member of the FOX group of transcription factors, involved in regulation of gene expression. In addition to this characteristic forkhead-box domain, the protein contains a polyglutamine tract, a zinc finger and a leucine zipper.
In humans, mutations of FOXP2 cause a severe speech and language disorder. One particular target that is directly downregulated by FOXP2 in human neurons is the CNTNAP2 gene, a member of the neurexin family; variants in this target gene have been associated with common forms of language impairment.
FOXP2 directly regulates a large number of downstream target genes
http://www.medicalnewstoday.com/articles/230758.php
How Brain Death works
http://science.howstuffworks.com/environmental/life/human-biology/brain-death1.htm
Due to loss of sugar, oxygen
The brain can survive for up to 6 min. after the heart stops, so apply CPR right away!
TOP500 List - June 2011 (1-100)
http://www.top500.org/list/2011/06/100
Japan's K Computer
RIKEN Advanced Institute for Computational Science in Kobe
Apache™ Hadoop™ -- software for reliable, scalable, distributed computing
http://hadoop.apache.org/
What Is Apache Hadoop?
The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
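The "simple programming model" is MapReduce. The sketch below imitates its three steps (map, shuffle, reduce) in memory on a single machine; it uses none of Hadoop's actual API, and the function names and input reads are made up for illustration.

```python
from collections import defaultdict

def map_phase(record):
    """Mapper: emit a (base, 1) pair for every base in one read."""
    return [(base, 1) for base in record]

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: sum the counts collected for one key."""
    return key, sum(values)

reads = ["ACGT", "AACG", "TTGA"]  # hypothetical input records
mapped = [pair for read in reads for pair in map_phase(read)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

On a real cluster the mappers and reducers run on different machines and the framework reruns tasks that fail; here everything is a single loop.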
Sunday, July 10, 2011
Gene set analysis for longitudinal gene expression data
Ke Zhang, Haiyan Wang, Arne C Bathke, Solomon W Harrar, Hans-Peter Piepho and Youping Deng
BMC Bioinformatics 2011, 12:273, doi:10.1186/1471-2105-12-273
Published: 3 July 2011
http://www.biomedcentral.com/1471-2105/12/273/abstract
Nowadays, an increasing number of microarray studies are conducted to explore the dynamic changes of gene expression in a variety of species and biological scenarios. In these longitudinal studies, gene expression is repeatedly measured over time such that a Gene set analysis (GSA) needs to take into account the within-gene correlations in addition to possible between-gene correlations.
Baby, Old and Hair
"Babies haven't any hair; Old men's heads are just as bare; Between the cradle and the grave, Lie a haircut and a shave"
--Samuel Hoffenstein
SOAP - short oligonucleotide analysis package - BGI
http://www.eurekalert.org/pub_releases/2011-07/ai-baf070611.php
Beijing Genomics Institute (BGI) announces first release of updated bioinformatics software
Kind words
"Kind words can be short and easy to speak, but their echoes are truly endless."
--Mother Teresa
Saturday, July 9, 2011
Bioinformatics training and challenges
Bioinformatics training: a review of challenges, actions and support requirements
http://bib.oxfordjournals.org/content/11/6/544.abstract
Bioinformatics challenges for genome-wide association studies
http://bioinformatics.oxfordjournals.org/content/26/4/445.abstract
http://www.sp-consultant.com/SurveyResults
Grub timeout not working
Edit /boot/grub/grub.cfg. Find this section and change "-1" to "5" (shown here after the change):
if [ "${recordfail}" = 1 ]; then
set timeout=5
else
set timeout=10
fi
http://ubuntuforums.org/showthread.php?t=1598854&page=4
sudo grub-editenv create
Friday, July 8, 2011
Interdisciplinary Research
https://commonfund.nih.gov/interdisciplinary/
The goal of the Common Fund’s Interdisciplinary Research (IR) program is to change academic research culture such that interdisciplinary approaches and team science spanning various biomedical and behavioral specialties are encouraged and rewarded. The program includes the following components:
- Interdisciplinary Research Consortia
- Interdisciplinary Training Programs
- Innovation in Interdisciplinary Technology and Methods
- Multiple Principal Investigator (Multi-PI) Policy
MOUSE GENETICS LEADS TO NEW CLUES FOR HUMAN PSYCHIATRIC DISORDERS
Several psychiatric disorders, including attention deficit hyperactivity disorder (ADHD), drug and alcohol addiction, and schizophrenia, are characterized by poor impulse control and difficulty inhibiting certain behaviors. These traits are referred to as behavioral inflexibility, which is thought to be partially under genetic control. However, the genes responsible have been difficult to identify in humans. In a paper available online March 10, 2011 in the journal Biological Psychiatry, researchers in the Common Fund’s Interdisciplinary Research program’s Consortium for Neuropsychiatric Phenomics report that they have identified several genes associated with behavioral inflexibility in mice, and that these findings might be applicable to humans as well.
To identify genes that underlie behavioral inflexibility, Dr. David Jentsch and colleagues from the University of California Los Angeles and the University of Tennessee first tested 51 genetically different strains of mice for the ability to reverse their behavior in a learned task. To successfully complete this task, mice had to learn to poke their nose into an opening either on the left or right side of the cage in order to receive a food reward. Once the mice mastered this skill, they had to unlearn which side to poke their nose into, and re-learn to poke their nose on the opposite side. The number of tries it takes a mouse to reverse its behavior indicates how much behavioral flexibility and impulse control the mouse has.
The researchers reasoned that by looking at both the genes and behaviors of the mice, they could find genetic differences that were associated with the behavioral differences. Indeed, the researchers zeroed in on a region of the mouse chromosome 10 that contains several genes that influence behavioral flexibility. One gene, Syn3, regulates chemical communication in the brain and has been inconclusively linked to schizophrenia in humans. Another gene, Nt5dc3, is a gene of unknown function that has been associated with ADHD. The current research suggests that both of these genes should be investigated further to discover what role they may play in human psychiatric disorders, and also demonstrates a new way to use mouse behavior and genetics to find genes that may contribute to complex behaviors in humans.
Ubuntu does not unmount cleanly during shutdown
$ sudo apt-get install --reinstall libc6
$ sudo apt-get install --reinstall libc6-dev
$ sudo apt-get install --reinstall libc6-i386
$ sudo apt-get install --reinstall sysvinit-utils
$ sudo apt-get install --reinstall upstart
https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/672177
http://askubuntu.com/questions/39384/use-upstart-to-umount-nfs-at-shutdown-restart
https://wiki.ubuntu.com/MaverickMeerkat/ReleaseNotes
Upstart 0.6.7
http://www.ubuntuupdates.org/packages/show/268020?page=2
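The reinstall commands above could be wrapped in a small script. This is only a minimal sketch, assuming the same five packages listed above. The DRY_RUN variable and the reinstall_cmds function are names made up for this example and are not part of apt-get or Ubuntu. By default it only prints the commands so they can be checked first.

```shell
#!/bin/sh
# Packages taken from the reinstall commands listed above.
PACKAGES="libc6 libc6-dev libc6-i386 sysvinit-utils upstart"

# reinstall_cmds (hypothetical helper): print one apt-get command per package.
reinstall_cmds() {
    for pkg in $PACKAGES; do
        echo "sudo apt-get install --reinstall $pkg"
    done
}

# DRY_RUN is an assumption of this sketch: it defaults to 1, which only
# prints the commands. Set DRY_RUN=0 to pipe them to sh and run them.
if [ "${DRY_RUN:-1}" = "1" ]; then
    reinstall_cmds
else
    reinstall_cmds | sh
fi
```

Run it once as-is to review the printed commands, then again with DRY_RUN=0 to actually reinstall.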
Mentoring: On the right path -- Alison McCook
http://www.nature.com/naturejobs/2011/110630/full/nj7353-667a.html?WT.ec_id=NATUREjobs-20110707
Maybe academia isn't where you want to be
Box 1: Advice for mentors: How to help
Principal investigators hoping to help their postdocs to find suitable positions should consider these tips.
- Set aside five minutes a few times a year to check in on your postdocs' career plans and progress.
- Know your limits. If your postdocs are interested in careers that you know nothing about, such as patent law, introduce them to someone who can help: someone in the institution's faculty-affairs or postdoctoral office, for example.
- Give them experience with peer review: tell journals that you want your postdocs to review papers with you.
- Don't rely on job ads alone. If a postdoc is exceptional, try calling colleagues and collaborators to ask about positions soon to open up that haven't yet been advertised.
- Sell their strengths when employers call. If your postdoc does well in presentations, don't just say that. Better to say he or she is a good communicator, a talent that may help to get grants.
- Don't help too much; let postdocs manage some of the process alone. Lucy Shapiro, a developmental biologist at Stanford University in California, says that she won't help a postdoc to write the talk that maps out his or her research plan. “To me, that's a very important do-it-on-your-own measure, so people know what they're dealing with when they decide to offer someone a job,” she says. A.M.
Box 2: Advice for postdocs: Going it alone
If your principal investigator is unwilling or unable to help with your job search, try these steps to make progress on your own.
- Craft a plan of your goals and timelines. Even if you don't show it to anyone, it is a good way to analyse your strengths and weaknesses, and to give yourself direction.
- Plan ahead. Try to pick an adviser whose postdocs are typically successful, says Jodi Lubetsky, a manager of science policy at the Association of American Medical Colleges in Washington DC.
- If you've already joined a lab and your adviser is “missing in action”, schedule a time to talk, she says. If that isn't working and you still want to get your principal investigator involved, find a neutral person to whom you feel safe talking.
- Publish, but don't obsess. Some employers want several papers per year; some are fine with fewer in top-tier journals. Don't worry too much about quantity, says Ron Vale, a cell biologist at the University of California, San Francisco. One strong paper is often good enough, he says.
- Get outside funding or fellowships to show employers that you can compete successfully for money. Awards can come from sources such as local governments, foundations or professional societies.
- Multiply your mentors. Even if your principal investigator is helpful, it is a good idea to establish relationships with 2–4 experienced scientists, who will then be able to answer personalized questions during a phone call from employers, and contribute more than just generic recommendation letters. A.M.
Writing tips
http://www.nature.com/naturejobs/2011/110707/full/nj7354-129a.html?WT.ec_id=NATUREjobs-20110707
So stop waiting to feel ready. Get started with some short and regular writing snacks. What you write won't be perfect at first, but you will be on your way to becoming a prolific academic writer.