ReliefF-RFE Feature Ranking Method in Bioinformatics

Source: The 5th National Conference on Bioinformatics and Systems Biology
Background: Bioinformatics data, such as genomics and metabolomics data, are usually high-dimensional with small sample sizes. This makes it difficult for researchers to interpret the data and understand the underlying complex biological processes. Therefore, many multivariate techniques and machine learning methods have been applied to analyze bioinformatics data and extract meaningful information from them, in order to gain a deeper understanding of the biological processes.

Methods: ReliefF is an efficient filter-type feature ranking algorithm; it runs fast and scales to very high-dimensional datasets. It calculates a weight for each feature and sorts the features in descending order of weight. Because high-dimensional bioinformatics data usually contain noise and irrelevant information, these can distort the distances among samples and thereby influence the feature weights calculated by ReliefF. Here we propose a ReliefF recursive feature elimination algorithm (ReliefF-RFE). In each iteration, it eliminates the features that have negative weights or, if all feature weights are positive, the single feature with the smallest weight. This process continues until the feature set is empty. The final feature ranking is obtained from the elimination order.

Results: One metabolomics data set on liver disease (60 samples in 3 classes, 1459 features) and two public data sets (leukemia and lymphoma, available at http://www.gems-system.org/) were used to validate ReliefF-RFE. For each data set, 10-fold cross-validation was run 10 times. The accuracy of KNN classification was obtained by sequential forward search over the feature ranking. The results show that ReliefF-RFE outperforms ReliefF.

Conclusions: We proposed ReliefF-RFE, a feature ranking algorithm that can rank features more accurately than ReliefF according to the feature information related to the problem.
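The elimination loop described in the Methods can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (relieff_weights, relieff_rfe), the simplified ReliefF (range-normalised Manhattan distance, every sample used as a reference, k nearest neighbours per class) and the toy data are all hypothetical. Two further assumptions fill gaps in the abstract: features removed in the same iteration are ordered by weight, and the ranking is taken as the reverse of the elimination order, so the last feature eliminated is ranked first.

```python
def relieff_weights(X, y, features, k=3):
    """Simplified ReliefF weights, computed only on the given feature indices."""
    n = len(X)
    # Per-feature value range, used to normalise differences to [0, 1].
    span = {}
    for f in features:
        col = [row[f] for row in X]
        span[f] = (max(col) - min(col)) or 1.0

    def diff(f, a, b):
        return abs(X[a][f] - X[b][f]) / span[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in features)

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    w = {f: 0.0 for f in features}
    for i in range(n):  # every sample serves as a reference instance
        for c in classes:
            others = [j for j in range(n) if j != i and y[j] == c]
            neigh = sorted(others, key=lambda j: dist(i, j))[:k]
            if not neigh:
                continue
            # Nearest hits lower the weight; nearest misses raise it,
            # scaled by the prior probability of the miss class.
            scale = -1.0 if c == y[i] else prior[c] / (1.0 - prior[y[i]])
            for f in features:
                total = sum(diff(f, i, j) for j in neigh)
                w[f] += scale * total / (len(neigh) * n)
    return w


def relieff_rfe(X, y, k=3):
    """Return feature indices ranked from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        # Recompute the weights on the surviving features only.
        w = relieff_weights(X, y, remaining, k)
        negative = [f for f in remaining if w[f] < 0]
        if negative:
            # Assumption: within one iteration, most negative goes first.
            drop = sorted(negative, key=lambda f: w[f])
        else:
            drop = [min(remaining, key=lambda f: w[f])]
        eliminated.extend(drop)
        remaining = [f for f in remaining if f not in drop]
    # Assumption: the last feature eliminated is ranked first.
    return eliminated[::-1]


# Hypothetical toy data: feature 0 separates the classes, 1 and 2 are noise.
X_toy = [[0.00, 0.3, 0.9], [0.10, 0.8, 0.2], [0.05, 0.5, 0.6],
         [0.90, 0.4, 0.8], [1.00, 0.7, 0.1], [0.95, 0.6, 0.5]]
y_toy = [0, 0, 0, 1, 1, 1]
rank = relieff_rfe(X_toy, y_toy, k=2)
print(rank)
```

On the toy data, feature 0 is the only one that separates the two classes, so it survives every round and comes first in the ranking. Recomputing the weights after each removal is the point of the recursion: once noisy features are gone, they no longer distort the sample distances that determine the nearest hits and misses.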