Journal of Computational Biology
Linear Regression Models for Solvent Accessibility Prediction in Proteins

To cite this article:
Michael Wagner, Rafał Adamczak, Aleksey Porollo, Jarosław Meller. Journal of Computational Biology. April 2005, 12(3): 355-369. doi:10.1089/cmb.2005.12.355.

Published in Volume 12, Issue 3: April 21, 2005



Michael Wagner
Division of Biomedical Informatics, Cincinnati Children's Hospital Research Foundation, 3333 Burnet Avenue, Cincinnati, OH 45229.
Rafał Adamczak
Division of Biomedical Informatics, Cincinnati Children's Hospital Research Foundation, 3333 Burnet Avenue, Cincinnati, OH 45229.
Aleksey Porollo
Division of Biomedical Informatics, Cincinnati Children's Hospital Research Foundation, 3333 Burnet Avenue, Cincinnati, OH 45229.
Jarosław Meller
Division of Biomedical Informatics, Cincinnati Children's Hospital Research Foundation, 3333 Burnet Avenue, Cincinnati, OH 45229.
Department of Informatics, Nicholas Copernicus University, 87-100 Toruń, Poland.

The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent-exposed surface area of this residue in relative terms. The problem of predicting RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether the RSA of a given residue exceeds some (arbitrary) threshold, in which case the residue is predicted to be "exposed" as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques that provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance appears to be a significant improvement over previously published approaches, these neural network (NN)-based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction that are computationally much less expensive, involve orders of magnitude fewer parameters, and are still competitive in terms of prediction quality. Specifically, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial for optimizing the prediction accuracy for buried residues.
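To make the regression framing and the two-class projection concrete, here is a minimal, dependency-free Python sketch of standard linear least-squares (LS) regression on real-valued RSA, followed by thresholding of the predictions into "exposed" and "buried." It assumes each residue has already been encoded as a numeric feature vector; the encoding, the helper names, and the 0.25 threshold are hypothetical choices for illustration, not the authors' setup. The weights are obtained by solving the normal equations (X^T X) w = X^T y.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations update a and b together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_least_squares(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [w_1, ..., w_d, b] minimizing sum_i (y_i - w.x_i - b)^2."""
    rows = [list(x) + [1.0] for x in X]  # constant column carries the bias b
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    return solve_linear_system(xtx, xty)


def predict_rsa(weights: Sequence[float], x: Sequence[float]) -> float:
    """Linear prediction clipped to [0, 1], since RSA is a relative quantity."""
    raw = sum(w * v for w, v in zip(weights, x)) + weights[-1]
    return min(1.0, max(0.0, raw))


def two_class_accuracy(pred: Sequence[float], true: Sequence[float],
                       threshold: float = 0.25) -> float:
    """Fraction of residues whose exposed/buried label (RSA > threshold) agrees.

    The 0.25 default is a hypothetical cutoff; the abstract notes the choice is arbitrary.
    """
    agree = sum((p > threshold) == (t > threshold) for p, t in zip(pred, true))
    return agree / len(true)


if __name__ == "__main__":
    # Synthetic toy data (not from the paper): RSA = 0.5*x0 - 0.2*x1 + 0.3.
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
    y = [0.5 * a - 0.2 * b + 0.3 for a, b in X]
    w = fit_least_squares(X, y)
    pred = [predict_rsa(w, x) for x in X]
    print(w, two_class_accuracy(pred, y))
```

Because the real-valued prediction is kept until the last step, the same fitted model can be scored at any threshold without retraining, which is the practical advantage of the regression framing over training one classifier per cutoff.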
We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction.
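The two SVR metaparameters mentioned in the abstract, the error insensitivity (epsilon) and the error penalization (C), can be seen in a small sketch of a linear SVR. Here "L1" is read as an L1-norm penalty on the weight vector, giving the objective ||w||_1 + C * sum_i max(0, |y_i - w.x_i - b| - epsilon); that reading is an assumption. An exact solver (for example a linear-programming formulation) would normally be used for this objective; the sketch swaps in plain subgradient descent to stay dependency-free. Function names and default values are hypothetical.

```python
from typing import List, Sequence, Tuple


def _predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear model f(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def svr_objective(w, b, X, y, epsilon, C) -> float:
    """||w||_1 + C * sum of epsilon-insensitive losses max(0, |y - f(x)| - epsilon)."""
    loss = sum(max(0.0, abs(t - _predict(w, b, x)) - epsilon) for x, t in zip(X, y))
    return sum(abs(wi) for wi in w) + C * loss


def fit_l1_svr(X, y, epsilon=0.05, C=1.0, steps=2000,
               lr0=0.1) -> Tuple[List[float], float]:
    """Minimize svr_objective by subgradient descent with step lr0 / sqrt(k).

    Subgradient steps do not decrease the objective monotonically, so the best
    iterate seen so far is returned. Defaults are hypothetical, not tuned values.
    """
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    best = (svr_objective(w, b, X, y, epsilon, C), list(w), b)
    for k in range(1, steps + 1):
        # Subgradient of ||w||_1, using sign(0) = 0.
        gw = [float((wi > 0) - (wi < 0)) for wi in w]
        gb = 0.0
        for x, t in zip(X, y):
            r = t - _predict(w, b, x)
            if abs(r) > epsilon:  # residuals inside the epsilon-tube cost nothing
                s = -1.0 if r > 0 else 1.0  # derivative of |t - f| with respect to f
                for j in range(d):
                    gw[j] += C * s * x[j]
                gb += C * s
        lr = lr0 / k ** 0.5
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
        obj = svr_objective(w, b, X, y, epsilon, C)
        if obj < best[0]:
            best = (obj, list(w), b)
    return best[1], best[2]
```

With epsilon set to 0 the data term becomes the plain absolute error; a larger epsilon ignores small deviations entirely, while a larger C shifts the balance from sparse weights toward fitting the training RSA values. The model has only d + 1 parameters, which illustrates why a linear SVR is far cheaper than a network with thousands of weights.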

