An unsupervised grid-based approach for clustering analysis

Source: Science China (Information Sciences)
In recent years, the growing volume of data in numerous clustering tasks has pushed existing clustering algorithms to deal with very large datasets. K-means has been one of the most popular clustering algorithms because of its simplicity and ease of application, but its efficiency and effectiveness on large datasets are often unacceptable. In contrast to the K-means algorithm, most existing grid-clustering algorithms have linear time and space complexities and thus can perform well on large datasets. In this paper, we propose a grid-based partitional algorithm to overcome the drawbacks of the K-means clustering algorithm. This new algorithm is based on two major concepts: 1) maximizing the average density of a group of grids instead of minimizing the squared error, which is the criterion applied in the K-means algorithm, and 2) using grid-clustering algorithms to thoroughly reformulate the object-driven assignment in the K-means algorithm into a new grid-driven assignment. Consequently, our proposed algorithm achieves an average speed-up of about 10 to 100 times and produces better partitions than the K-means algorithm. Also, unlike the K-means algorithm, our proposed algorithm is able to partition a dataset when the number of clusters is unknown. The effectiveness of our proposed algorithm has been demonstrated by successfully clustering datasets with different features, in comparison with three other typical clustering algorithms in addition to the K-means algorithm.
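The abstract does not give the algorithm itself, so the following is only a minimal sketch of generic grid-based clustering, not the method proposed in the paper. It illustrates why grid approaches scale roughly linearly: each object is visited once to find its cell, and all later work is done on cells rather than on objects. The function name `grid_cluster`, the parameters `cell_size` and `min_count`, and the rule of merging adjacent dense cells are assumptions made for illustration; in particular, the paper's average-density criterion is not implemented here.

```python
from collections import Counter, deque
from itertools import product


def grid_cluster(points, cell_size, min_count):
    """Generic grid-based clustering sketch (hypothetical, not the paper's algorithm).

    Returns (labels, n_clusters); points in sparse cells get label -1.
    """
    # Single pass over the objects: map each point to its integer cell index.
    cell_of = [tuple(int(c // cell_size) for c in p) for p in points]

    # Count objects per cell and keep only the dense cells.
    counts = Counter(cell_of)
    dense = {cell for cell, n in counts.items() if n >= min_count}

    # Offsets to all neighbouring cells (excluding the cell itself).
    dim = len(points[0])
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    # Grid-driven grouping: flood-fill over adjacent dense cells.
    cell_label = {}
    n_clusters = 0
    for start in dense:
        if start in cell_label:
            continue
        cell_label[start] = n_clusters
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for off in offsets:
                nb = tuple(c + o for c, o in zip(cell, off))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = n_clusters
                    queue.append(nb)
        n_clusters += 1

    # Each point inherits the label of its cell.
    labels = [cell_label.get(cell, -1) for cell in cell_of]
    return labels, n_clusters
```

Note that the number of clusters is an output of this sketch rather than an input, which mirrors the abstract's remark about partitioning when the number of clusters is unknown. How the paper actually chooses groups of grids is not recoverable from the abstract alone.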