Paper excerpt (partial content)
Background: In recent years, secreted proteins have been identified as markers for disease typing and staging, or for drug development. Computational identification of blood-secretory human proteins, especially proteins whose genes are highly and abnormally expressed in diseased human tissues such as cancers, can provide useful information to proteomic studies aimed at discovering disease biomarkers in serum.

Methods: In this study, we used support vector machines (SVMs) to predict blood-secretory human proteins. A dataset of 305 known blood-secretory human proteins served as the positive dataset. As negative data, we randomly selected two datasets from Pfam protein families that contain none of the positive proteins; each negative dataset contains 400 protein sequences.

Results: Using amino acid composition as the only input vector, we achieved 89.8% accuracy with 89.0% sensitivity in the jackknife test. Incorporating dipeptide composition and the hydropathy distribution into the input vector further improved the prediction to 93.0% accuracy with 92.4% sensitivity in the jackknife test.

Conclusions: We hope that these promising results, obtained with novel descriptors, will improve the identification of blood-secretory human proteins. The high accuracy should be helpful for further experimental study.
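The evaluation described above can be illustrated with a minimal sketch. It assumes a 20-dimensional amino acid composition vector and a 400-dimensional dipeptide composition vector over the 20 standard residues, concatenated into one input vector. It also shows a jackknife (leave-one-out) loop that reports accuracy, (TP + TN) / N, and sensitivity, TP / (TP + FN). All function names are hypothetical. The hydropathy-distribution descriptor is omitted because the abstract does not define it. `nearest_centroid` is only a stand-in so the sketch runs without dependencies; it is not the SVM used in the study.

```python
from itertools import product

# Assumption: only the 20 standard residues are counted; other symbols (e.g. X) are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aa_composition(seq):
    """Fraction of each standard amino acid in the sequence (20-dim vector)."""
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair (400-dim vector)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing nonstandard residues are skipped
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def features(seq):
    """Hypothetical combined input vector: composition + dipeptide composition (420-dim)."""
    return aa_composition(seq) + dipeptide_composition(seq)


def nearest_centroid(X_train, y_train, x):
    """Placeholder classifier (NOT an SVM): predict the label whose mean vector is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [r for r, t in zip(X_train, y_train) if t == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_predictions(X, y, fit_predict):
    """Leave-one-out: for each sample i, train on all other samples and predict sample i."""
    preds = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        preds.append(fit_predict(X_train, y_train, X[i]))
    return preds


def accuracy_and_sensitivity(y_true, y_pred):
    """Labels: 1 = blood-secretory (positive), 0 = negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true), tp / (tp + fn)
```

In practice `fit_predict` would wrap an SVM, for example one that fits scikit-learn's `SVC` on `X_train` and calls `predict` on `[x]`. Leave-one-out is expensive because it retrains once per sequence, but with roughly 700 to 1,100 sequences it stays tractable and makes full use of a small dataset.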