As one of the key tasks in mining interval data, clustering faces great difficulty in measuring the similarity or distance between objects. When traditional clustering methods are extended to interval data, they typically rely on geometric distances that consider only the bounds (endpoints) of the intervals and neglect the information inside them. We therefore take the probability distribution of interval values into account, using the interval data in each cluster to estimate that cluster's probability density function. To this end, we propose a new kernel density estimation approach, a nonparametric estimator for interval data. We then redefine the distance between interval objects through the probability density function produced by the new estimator. Finally, we construct an adaptive clustering method for interval data. Experimental results show that the proposed method is effective, and they also indicate that defining distance through the probability distribution of interval values is more reasonable than using only the endpoints of the intervals.
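The abstract does not give the estimator or distance formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each interval [a, b] is a uniform distribution smoothed by a Gaussian kernel of bandwidth `h`, which gives the closed-form per-interval density (Φ((x − a)/h) − Φ((x − b)/h)) / (b − a). The hypothetical dissimilarity between an interval and a cluster is the negative log of the cluster density averaged over the interval. The clustering loop is a plain k-means-style reassignment and does not reproduce the paper's adaptive scheme. All function names (`interval_kde`, `interval_distance`, `cluster_intervals`) are illustrative.

```python
import math
from typing import Callable, List, Optional, Tuple

Interval = Tuple[float, float]  # (lower bound a, upper bound b), a <= b


def _norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_kde(data: List[Interval], h: float) -> Callable[[float], float]:
    """Assumed KDE for interval data: each interval is treated as Uniform[a, b]
    convolved with a N(0, h^2) kernel; the estimate averages these densities."""
    if not data:
        raise ValueError("need at least one interval")
    if h <= 0:
        raise ValueError("bandwidth h must be positive")

    def density(x: float) -> float:
        total = 0.0
        for a, b in data:
            if b > a:
                # Closed form of Uniform[a, b] smoothed by a Gaussian kernel
                total += (_norm_cdf((x - a) / h) - _norm_cdf((x - b) / h)) / (b - a)
            else:
                # Degenerate interval (a point): ordinary Gaussian kernel
                z = (x - a) / h
                total += math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))
        return total / len(data)

    return density


def interval_distance(obj: Interval, density: Callable[[float], float], m: int = 50) -> float:
    """Hypothetical dissimilarity: negative log of the cluster density averaged
    over the interval, approximated with the midpoint rule on m points."""
    a, b = obj
    if b > a:
        step = (b - a) / m
        avg = sum(density(a + (k + 0.5) * step) for k in range(m)) / m
    else:
        avg = density(a)
    return -math.log(avg + 1e-300)  # tiny constant avoids log(0)


def cluster_intervals(data: List[Interval], k: int, h: float, max_iter: int = 20) -> List[int]:
    """k-means-style loop: estimate one density per cluster, then reassign each
    interval to the cluster with the smallest dissimilarity."""
    n = len(data)
    # Initial partition: sort by midpoint and split into k contiguous groups
    order = sorted(range(n), key=lambda i: (data[i][0] + data[i][1]) / 2)
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * k // n
    for _ in range(max_iter):
        densities: List[Optional[Callable[[float], float]]] = []
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            densities.append(interval_kde(members, h) if members else None)
        new_labels = []
        for obj in data:
            dists = [interval_distance(obj, f) if f is not None else math.inf
                     for f in densities]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


if __name__ == "__main__":
    data = [(0, 1), (0.5, 1.5), (1, 2), (10, 11), (10.5, 11.5), (11, 12)]
    print(cluster_intervals(data, k=2, h=0.5))  # → [0, 0, 0, 1, 1, 1]
```

Unlike an endpoint-based distance, this dissimilarity depends on where the cluster's estimated density mass lies across the whole interval. The bandwidth `h` and the number of quadrature points `m` are free parameters of the sketch.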