Data mining is one of the most promising and fastest-growing frontiers in database systems. Data mining, also popularly referred to as knowledge discovery in databases (KDD), is the automated or convenient extraction of patterns representing knowledge implicitly stored in large databases, data warehouses, and other massive information repositories. Our capabilities for both generating and collecting data have increased rapidly over the last several decades. Contributing factors include the widespread use of bar codes for most commercial products, the computerization of many business, scientific, and government transactions, and advances in data collection tools ranging from scanned text and image platforms to satellite remote sensing systems. In addition, the popular use of the World Wide Web as a global information system has flooded us with a tremendous amount of data and information. This explosive growth in stored data has generated an urgent need for new techniques and automated tools that can intelligently assist us in transforming the vast amounts of data into useful information and knowledge.
Clustering is a main tool in data mining. Clustering algorithms try to partition a set of unlabeled input vectors into a number of subsets (clusters) such that data in the same subset are more similar to each other than to data in other subsets. Unsupervised clustering algorithms are of two kinds: crisp and fuzzy. Crisp clustering assigns each vector to a unique cluster, while fuzzy clustering allows a vector to belong to several clusters with different membership degrees. This study develops several fast crisp clustering approaches, including cover-based, grid-based, and kernel-based ones, to address two open problems: how can a clustering approach identify arbitrarily shaped and density-skewed clusters, and how can clustering be performed in linear time complexity? In addition, fuzzy clustering and approximation hold a special position in data mining when the unlabeled vectors carry uncertainty. We examine its clustering mechanism and fields of application.
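The difference between crisp and fuzzy assignment can be illustrated with a minimal Python sketch. It is not one of the cover-based, grid-based, or kernel-based approaches developed in the study. It assumes fixed, already-known cluster centers and borrows the standard fuzzy c-means membership formula; the function names, the fuzzifier `m`, and the sample points are hypothetical and chosen only for illustration.

```python
import math

def crisp_assign(x, centers):
    """Crisp clustering: return the index of the single nearest center."""
    distances = [math.dist(x, c) for c in centers]
    return distances.index(min(distances))

def fuzzy_memberships(x, centers, m=2.0):
    """Fuzzy clustering: return one membership degree per center, summing to 1.

    Uses the fuzzy c-means membership formula with fuzzifier m > 1
    (an assumption of this sketch, not a method taken from the study).
    """
    distances = [math.dist(x, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    zero_hits = [i for i, d in enumerate(distances) if d == 0.0]
    if zero_hits:
        return [1.0 / len(zero_hits) if i in zero_hits else 0.0
                for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]

# Hypothetical example: two centers and one point that is closer to the first.
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
crisp_label = crisp_assign(point, centers)
fuzzy_degrees = fuzzy_memberships(point, centers)
print(crisp_label, [round(u, 3) for u in fuzzy_degrees])
```

The crisp result is a single cluster index, whereas the fuzzy result keeps a degree for every cluster, which is what makes fuzzy clustering suitable when the input vectors carry uncertainty.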
The Choquet fuzzy integral has become an increasingly popular fusion approach. Because it is based on a fuzzy measure, however, two main problems arise when fuzzy measures are used in practical applications. One is the lack of a clear understanding of the meaning of the measures; until now, no consensus has been reached about what the numbers mean. The other is that it is typically not possible to obtain a complete specification of the measure's value for all subsets in the domain, but only for a reduced number of them. To address these problems, we propose several designs: an approach based on TS fuzzy modeling is used to estimate the parameters of the fuzzy measure in the Choquet fuzzy integral, and some new fuzzy integral models are contributed to increase the flexibility of the original ones. Finally, we identify several characteristics of the existing methods for calculating the Choquet fuzzy integral and analyze the structure of neural networks consisting of Choquet fuzzy integral units.
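To make the role of the fuzzy measure concrete, the following minimal Python sketch computes the standard discrete Choquet integral of a few non-negative scores. It is not the TS-based estimation approach or any of the new integral models proposed in the study. The function name, the source labels, and the measure values are hypothetical, and the sketch assumes the measure is specified for every subset it needs, which is exactly the requirement that is hard to meet in practice.

```python
def choquet_integral(values, measure):
    """Discrete Choquet integral of non-negative scores.

    values:  dict mapping each information source to its score
    measure: dict mapping frozenset(sources) to a fuzzy measure value,
             with the full set mapped to 1.0 (assumed to be given)
    """
    # Visit the sources in ascending order of their scores.
    order = sorted(values, key=values.get)
    total = 0.0
    previous = 0.0
    for i, source in enumerate(order):
        # Coalition of all sources whose score is at least the current one.
        coalition = frozenset(order[i:])
        total += (values[source] - previous) * measure[coalition]
        previous = values[source]
    return total

# Hypothetical example with two sources "a" and "b".
scores = {"a": 0.2, "b": 0.6}
additive_measure = {
    frozenset({"a", "b"}): 1.0,
    frozenset({"a"}): 0.3,
    frozenset({"b"}): 0.7,
}
fused = choquet_integral(scores, additive_measure)
print(round(fused, 6))
```

With an additive measure, as in this example, the result reduces to the ordinary weighted mean of the scores; a non-additive measure lets the integral model interaction between sources, at the cost of one parameter per subset.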