Each human exome contains thousands of nonsynonymous single-nucleotide variants (nSNVs) that have unknown biological effects. The potential impact of nSNVs on biological function is now routinely assessed using computational methods for application in biomedical research and clinical genome profiling reports. Of the variants receiving a non-neutral (function-damaging) prediction, those at evolutionarily conserved sites are frequently of heightened interest for scientists and clinicians because such sites are among the most critical for proper protein function. Indeed, a majority of amino acid mutations that have been investigated experimentally are located at ultraconserved sites (ref. 1), which show no amino acid residue difference among diverse species spanning over 500 million years of evolution (Supplementary Fig. 1). Functionally damaging mutants at these sites are likely to have significant consequences for health and disease.
For these ultraconserved sites, we estimated the false positive rate (FPR) of two state-of-the-art computational tools, Condel (ref. 2) and PolyPhen-2 (ref. 3), by using the standard collection of neutral variants (HumVar; ref. 3) that was used to train and test these two tools (Table 1). Our analysis revealed a high FPR for Condel (89%) and PolyPhen-2 (75%). For 73% of the neutral nSNVs in HumVar, both tools produced a function-damaging prediction. Additionally, the overall accuracy of these tools at ultraconserved positions was low (55% and 60%, respectively). Therefore, predictions produced by current computational tools may mislead downstream experimental and clinical investigations aimed at studying functionally important sites.
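
To make the two metrics concrete, the following minimal Python sketch shows how an FPR and an overall accuracy could be computed from known labels and tool predictions. It is an illustration under stated assumptions, not the procedure used in the analysis: the names `Confusion`, `tally`, `false_positive_rate` and `accuracy` are hypothetical, accuracy is assumed to mean the fraction of all variants classified correctly, and the six toy variants are invented for the example rather than taken from HumVar.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of prediction outcomes (hypothetical helper, not from the paper)."""
    tp: int  # damaging variants predicted damaging
    fp: int  # neutral variants predicted damaging
    tn: int  # neutral variants predicted neutral
    fn: int  # damaging variants predicted neutral


def tally(is_damaging, predicted_damaging):
    """Count outcomes from parallel lists of true labels and predictions."""
    if len(is_damaging) != len(predicted_damaging):
        raise ValueError("labels and predictions must have the same length")
    c = Confusion(tp=0, fp=0, tn=0, fn=0)
    for truth, pred in zip(is_damaging, predicted_damaging):
        if truth and pred:
            c.tp += 1
        elif truth and not pred:
            c.fn += 1
        elif pred:
            c.fp += 1
        else:
            c.tn += 1
    return c


def false_positive_rate(c):
    """FPR = FP / (FP + TN): share of neutral variants called damaging."""
    neutral = c.fp + c.tn
    if neutral == 0:
        raise ValueError("no neutral variants to evaluate")
    return c.fp / neutral


def accuracy(c):
    """Assumed definition: (TP + TN) / all variants."""
    total = c.tp + c.fp + c.tn + c.fn
    if total == 0:
        raise ValueError("no variants to evaluate")
    return (c.tp + c.tn) / total


# Toy data, invented for illustration only: four neutral and two damaging variants.
labels = [False, False, False, False, True, True]
predictions = [True, True, True, False, True, False]

counts = tally(labels, predictions)
print(f"FPR = {false_positive_rate(counts):.2f}")
print(f"accuracy = {accuracy(counts):.2f}")
```

In this toy set, three of the four neutral variants are called damaging, so the FPR depends only on the neutral variants, whereas accuracy also reflects how the damaging variants were classified.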