Accurate Prediction of Protein Disordered Regions

Source: Fifth National Conference on Bioinformatics and Systems Biology
  Background: Intrinsically disordered proteins (IDPs), which do not possess stable secondary and tertiary structures, are crucial for the function of numerous proteins. Even though these proteins lack intrinsic structure, they can bind to many different macromolecular partners in protein-protein interactions. Their prevalence is also implicated in a number of human diseases. Accurate prediction of disordered regions from protein sequence is a necessary prerequisite for further understanding the principles and mechanisms of protein function, and is key to elaborating a structural and functional hierarchy of proteins. Prediction of IDPs has therefore attracted the attention of many researchers, and a number of prediction methods have been developed. Disorder predictions play a major role in directing the laboratory experiments that lead to the discovery of disordered proteins, creating a positive feedback loop in the investigation of these proteins.

  Methods: In this work, we propose a novel predictor, DRPred, which is based on a disorder position-specific scoring matrix (DPSSM). We develop a strategy for constructing the DPSSM, which is generated by sequence alignment and whose elements are disorder profiles. DRPred uses a non-redundant database of 35,811 entries containing amino acids and the corresponding disorder, near-disorder or order state information. A query sequence is aligned against this database to produce its DPSSM. The training set was compiled by Pierre and colleagues. It contains 723 protein sequences with 215,612 residues, of which 13,909 (6.5%) are disordered. Our approach uses a custom-designed set of features based on sequence profiles, predicted secondary structure, a structural position-specific scoring matrix (SPSSM), the DPSSM, and predicted shape strings and their profiles.

  Results: The PDB723 set is used as the training set for a Conditional Random Fields (CRFs) classifier. Both short and long disordered regions are predicted with this single predictor. In predicting disordered regions, our approach achieved a Matthews correlation coefficient (MCC) of 72.55%, sensitivity (SN) of 70.63%, specificity (SP) of 98.63% and accuracy (AC) of 96.82%.

  Conclusions: DRPred outperformed well-established, publicly available disordered-region predictors in prediction accuracy. It can be especially useful for the prediction of disordered regions.
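The abstract reports MCC, SN, SP and AC without stating how they are computed. The sketch below uses the standard per-residue definitions, treating "disordered" as the positive class. The function name `disorder_metrics` and the `D`/`O` label encoding are illustrative assumptions, not part of DRPred, and the toy input is not data from this work.

```python
from math import sqrt


def disorder_metrics(true_states, pred_states, positive="D"):
    """Per-residue SN, SP, AC and MCC for disorder prediction.

    Assumption: each residue is labelled "D" (disordered) or "O" (ordered);
    this encoding is hypothetical and chosen only for the example.
    """
    if len(true_states) != len(pred_states):
        raise ValueError("label sequences must have the same length")

    tp = fp = tn = fn = 0
    for t, p in zip(true_states, pred_states):
        if t == positive and p == positive:
            tp += 1          # disordered residue predicted as disordered
        elif t == positive:
            fn += 1          # disordered residue predicted as ordered
        elif p == positive:
            fp += 1          # ordered residue predicted as disordered
        else:
            tn += 1          # ordered residue predicted as ordered

    total = tp + fp + tn + fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SN": tp / (tp + fn) if (tp + fn) else 0.0,    # sensitivity
        "SP": tn / (tn + fp) if (tn + fp) else 0.0,    # specificity
        "AC": (tp + tn) / total if total else 0.0,     # accuracy
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Toy 10-residue example: 4 disordered and 6 ordered residues.
    metrics = disorder_metrics("DDDDOOOOOO", "DDDOOOOOOD")
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Because only 6.5% of the training residues are disordered, accuracy and specificity are dominated by the ordered class; MCC and sensitivity are the more informative figures for such an imbalanced set.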