Wednesday, July 20, 2011

Vienna ISMB ECCB 2011 Accepted Posters

https://www.iscb.org/cms_addon/conferences/ismbeccb2011/posterlist.php?cat=A

A large scale analysis in the human proteome detects correlation among disease associated mutations and perturbation of protein stability

Rita Casadio University of Bologna
Valentina Indio (University of Bologna, Laboratory of Biocomputing, Computational Biology Network & “Giorgio Prodi” Center (CIRC)); Pier Luigi Martelli (University of Bologna, Laboratory of Biocomputing, Computational Biology Network); Marco Vassura (University of Bologna, Laboratory of Biocomputing, Computational Biology Network & Department of Computer Science); Piero Fariselli (University of Bologna, Laboratory of Biocomputing, Computational Biology Network & Department of Computer Science)

Short Abstract: Technological advancements constantly increase the number of mutations that need annotation in translated regions of the human genome. Single residue mutations in proteins are known to affect protein stability and function. As a consequence, they can be disease-associated. Available computational methods predict, from protein sequence or structure, whether residue mutations are conducive to disease or, alternatively, to instability of the folded protein structure. However, the relationship between stability changes in proteins and their involvement in human diseases still needs to be established. Here we try to rationalize the complexity of the question by generalizing over information already stored in public databases. To this end, we derive for each Single Amino Acid Polymorphism (SAP) type the probability of being disease-related (Pd) and compute from thermodynamic data three indexes indicating the probability that it decreases (P-), increases (P+) or perturbs (Pp) the stability of the protein structure. Statistically validated analysis of the different P/Pd correlations indicates that Pd best correlates with Pp. Pp/Pd correlation values are as high as 0.49, and increase up to 0.67 when data variability is taken into consideration.
This indicates a medium-to-good correlation between Pd and Pp and corroborates the assumption that protein stability changes can be associated with disease at the proteome level.
All the probabilities are listed in a feature table that can be used to label SAPs as more or less frequently associated with disease or protein perturbation in the current databases.
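
The Pd/Pp comparison described above can be sketched in a few lines of Python. This is only an illustration: the abstract does not say which correlation coefficient was used, so Pearson's r is an assumption here, and the SAP type labels, the counts and the function names are invented for the example.

```python
from math import sqrt


def probability_table(counts):
    """Turn per-SAP-type counts into (Pd, Pp) estimates.

    counts maps a SAP type to the tuple
    (n_disease, n_annotated, n_perturbing, n_measured).
    """
    table = {}
    for sap_type, (n_dis, n_ann, n_pert, n_meas) in counts.items():
        table[sap_type] = (n_dis / n_ann, n_pert / n_meas)
    return table


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical counts, invented for illustration only.
toy_counts = {
    "A>V": (10, 100, 5, 50),
    "G>R": (60, 100, 30, 50),
    "L>P": (80, 100, 45, 50),
    "I>V": (5, 100, 10, 50),
}

table = probability_table(toy_counts)
p_d = [row[0] for row in table.values()]
p_p = [row[1] for row in table.values()]
r = pearson(p_d, p_p)
print(f"Pearson r between Pd and Pp on toy data: {r:.2f}")
```

The resulting table plays the role of the feature table mentioned in the abstract: one row per SAP type, holding its estimated Pd and Pp.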


The functional importance and detection of regulatory sequence variants

Virginie Bernard Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute
Wyeth Wasserman (Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics - University of British Columbia); David Arenillas (Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics - University of British Columbia)

Short Abstract: The convergence of high-throughput technologies for sequencing individual exomes and genomes and rapid advances in genome annotation are driving a neo-revolution in human genetics. This wave of family-based genetics analysis is revealing causal mutations responsible for striking phenotypes. By mapping the reads to the human genome reference and by searching for variations relative to the reference, a list of small nucleotide variations and structural variations is obtained. Analysis is required to reveal those variations most likely to contribute to a disease phenotype within a family. Existing software scores the severity of changes that arise in protein-encoding exons. However, most mutations within a family are situated in the 98% of the genome that controls the developmental and physiological profile of gene activity - the sequences that control when and where a gene will be active.

Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. With full genome sequencing becoming accessible to medical researchers, the need to identify potential causal mutations in regulatory DNA is becoming imperative. We are implementing a software system to enable genetics researchers to characterize regulatory DNA changes within individual genome sequences. We are combining reference databases of known regulatory elements, experimental archives of protein-DNA interactions and computational predictions within an integrated analysis package. With our software, researchers will have greater capacity to identify variations potentially causal for disease.

The poster introduces the challenges and approaches of regulatory sequence variation analysis.

A guide to web tools to prioritize candidate genes

Yves Moreau Katholieke Universiteit Leuven
Léon-Charles Tranchevent (Katholieke Universiteit Leuven); Francisco Bonachela Capdevila (Katholieke Universiteit Leuven, Department of Computer Science); Daniela Nitsch (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD); Bart De Moor (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD); Patrick De Causmaecker (Katholieke Universiteit Leuven, Department of Computer Science); Yves Moreau (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD)

Short Abstract: Finding the most promising genes among large lists of candidate genes has been defined as the gene prioritization problem. It is a recurrent problem in genetics in which genetic conditions are reported to be associated with chromosomal regions. In the last decade, several different computational approaches have been developed to tackle this challenging task. In this study, we review 19 computational solutions for human gene prioritization that are freely accessible as web tools and illustrate their differences. We summarize the various biological problems to which they have been successfully applied. Finally, we describe several research directions that could increase the quality and applicability of the tools. In addition, we developed a website (http://www.esat.kuleuven.be/gpp) containing detailed information about these and other tools, which is regularly updated. Together, this review and the associated website constitute a guide to help users select the gene prioritization strategy that best suits their needs.

Network-based gene prioritization from expression data by diffusing through protein interaction networks

Daniela Nitsch KU Leuven
Léon-Charles Tranchevent (KU Leuven, ESAT-SCD); Joana Gonçalves (INESC-ID, Knowledge Discovery and Bioinformatics (KDBIO) group); Yves Moreau (KU Leuven, ESAT-SCD)

Short Abstract: Discovering novel disease genes is challenging for diseases for which no prior knowledge is available. Genetic studies frequently result in large lists of candidate genes, of which only a few can be followed up for further investigation. In the past couple of years, several gene prioritization methods have been proposed. Most of them use a guilt-by-association concept, and are therefore not applicable when little is known about the phenotype or no disease genes are available.

We have proposed a method that overcomes this limitation by replacing prior knowledge about the biological process with experimental data on differential gene expression between affected and healthy individuals. At the core of the method are a protein interaction network and disease-specific expression data. Our approach propagates the expression data over the network using an extended random walk approach based on kernel methods, because including indirect associations compensates for network sparsity and small-world effects. It relies on the assumption that strong candidate genes tend to be surrounded by many differentially expressed neighboring genes in a protein interaction network.
We have benchmarked our approach, and results showed that it clearly outperforms other gene prioritization approaches with an average ranking position of 8 out of 100 genes, and an AUC value of 92.3%.
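
The propagation idea can be illustrated with a small sketch. Note that this is not the authors' kernel-based method: a plain random walk with restart is swapped in as a simpler stand-in, and the toy network, the expression scores and the `restart` value are all invented for the example.

```python
def random_walk_with_restart(adjacency, seed_scores, restart=0.3,
                             tol=1e-10, max_iter=1000):
    """Diffuse seed scores over an undirected network.

    adjacency maps each node to a list of its neighbours; seed_scores maps
    nodes to non-negative weights (for example, absolute log fold changes).
    Returns a dict of stationary visiting probabilities that sums to 1.
    """
    nodes = sorted(adjacency)
    total = sum(seed_scores.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one node needs a positive seed score")
    start = {n: seed_scores.get(n, 0.0) / total for n in nodes}
    current = dict(start)
    for _ in range(max_iter):
        # Every step, a fraction `restart` of the walkers jumps back to the seeds.
        updated = {n: restart * start[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Isolated node: the walking mass has nowhere to go, keep it.
                updated[n] += (1 - restart) * current[n]
                continue
            share = (1 - restart) * current[n] / len(neighbours)
            for m in neighbours:
                updated[m] += share
        change = sum(abs(updated[n] - current[n]) for n in nodes)
        current = updated
        if change < tol:
            break
    return current


# Hypothetical network: candidate C has three differentially expressed
# neighbours (A, B, D); candidate F is only connected to the unchanged gene E.
toy_network = {
    "A": ["C"], "B": ["C"], "C": ["A", "B", "D"],
    "D": ["C", "E"], "E": ["D", "F"], "F": ["E"],
}
toy_expression = {"A": 2.0, "B": 1.5, "D": 1.0}  # invented |log fold change|

scores = random_walk_with_restart(toy_network, toy_expression)
candidates = sorted(["C", "F"], key=scores.get, reverse=True)
print(candidates)
```

In this toy setting, candidate C ranks above F because more of the diffused expression signal reaches it, which mirrors the assumption stated in the abstract.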

We recently developed the web server PINTA, which implements our gene prioritization approach, to make it available to clinicians and other researchers.

Association Rule Mining with Prior Knowledge for Alzheimer's Disease

Peter Li Mayo Clinic
Gyorgy Simon (Mayo Clinic, Health Sciences Research)

Short Abstract: As we migrate to modeling diseases as a multi-factorial problem, the ability to analyze any given genomic data set is limited by the combinatorial explosion of false discoveries. The statistical solution is to require increased significance (e.g. Bonferroni correction), but this increases false negatives. Another approach is to use prior knowledge, such as pathways and networks. Most methods fail to account for population heterogeneity. In this work, we present a novel approach that integrates prior knowledge and population heterogeneity with a two-stage association rule mining technique, whose behavior is different from traditional testing.

We evaluated this method using GWAS data from the Joint Aging, Addiction and Mental Health (JAAMH) Alzheimer's Disease (AD) data set of 1237 cases and 1254 controls. A combined interaction network was built from Reactome, BioGRID, IntAct, MINT, DIP and HPRD. In the first stage, we generate haplotype blocks and then apply predictive association rule mining to each block. In the second stage, we discover combinations of predictive haplotypes whose corresponding genes are on average ≤k hops away from the nearest known AlzGene gene on the network.
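
The network constraint in the second stage can be sketched as a breadth-first search followed by a filter. This is a minimal illustration under stated assumptions: the threshold is read as "at most k hops on average", and the network, gene names and function names are hypothetical placeholders rather than anything from the study.

```python
from collections import deque


def hops_to_nearest_seed(adjacency, seeds):
    """Multi-source BFS: hop distance from each reachable node to its nearest seed."""
    distance = {s: 0 for s in seeds if s in adjacency}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def within_k_hops(gene_sets, distance, k):
    """Keep gene combinations whose mean distance to a seed is at most k."""
    kept = []
    for genes in gene_sets:
        if not genes or any(g not in distance for g in genes):
            continue  # empty combination, or a gene unreachable from any seed
        if sum(distance[g] for g in genes) / len(genes) <= k:
            kept.append(genes)
    return kept


# Hypothetical network: S1 and S2 stand in for known disease genes.
toy_network = {
    "S1": ["G1"], "G1": ["S1", "G2"], "G2": ["G1", "G3"], "G3": ["G2"],
    "S2": ["G4"], "G4": ["S2"], "G5": [],
}
distance = hops_to_nearest_seed(toy_network, ["S1", "S2"])

toy_combinations = [("G1", "G4"), ("G1", "G2"), ("G2", "G3"),
                    ("G5", "G1"), ("S1", "G2")]
kept = within_k_hops(toy_combinations, distance, k=1)
print(kept)
```

Starting the search from all known disease genes at once gives the distance to the nearest one in a single pass over the network.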

We found that at k=1, we discovered 50% fewer patterns than we would have without the use of prior knowledge, yet we recovered 93% of the significant patterns and 89% of the unique genes. The lower total number of patterns allows for a less stringent Bonferroni correction, leading to a 10% increase in the number of significant patterns. The predictive capability of the discovered genes is higher than that of individual SNPs or haplotypes.
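
The effect of testing fewer patterns on the Bonferroni threshold can be shown with a short worked example. The p-values and test counts below are invented, and for simplicity the example assumes that all listed patterns survive the filtering step.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def count_significant(p_values, alpha, n_tests):
    """Number of p-values below the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return sum(1 for p in p_values if p < threshold)


# Hypothetical p-values of patterns that remain after filtering.
toy_p_values = [1e-6, 2e-5, 3e-5, 4e-5, 1e-3]

all_tests = 2000                  # patterns tested without prior knowledge
filtered_tests = all_tests // 2   # 50% fewer patterns after filtering

before = count_significant(toy_p_values, 0.05, all_tests)       # threshold 0.05 / 2000
after = count_significant(toy_p_values, 0.05, filtered_tests)   # threshold 0.05 / 1000
print(before, after)
```

Halving the number of tests doubles the per-test threshold, so borderline patterns that failed the stricter correction can become significant.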
