Sun Li-juan①② Chen Xiao-dong① Han Chong① Guo Jian①②
①(College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China) ②(Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
Clustering data streams is challenging because processing time and memory are both limited. To address this problem, this paper presents a new fuzzy clustering algorithm called Weight Decay Streaming Micro Clustering (WDSMC). The algorithm uses a modified weighted Fuzzy C-Means (FCM) algorithm and improves clustering quality through micro-cluster structures and weight decay. Experimental results show that WDSMC achieves higher accuracy than the Stream Weight Fuzzy C-Means (SWFCM) and StreamKM++ algorithms.