Traditional clustering algorithms perform poorly when the available data are insufficient or noisy. To address this problem, a Knowledge Transfer Clustering Algorithm with Privacy Protection (KTCAPP) is proposed. It builds on the classical Fuzzy C-Means (FCM) algorithm and leverages two kinds of knowledge: the historical class centers and the historical class memberships. This auxiliary knowledge from historical datasets guides the current clustering task and improves performance when the current data are insufficient or noisy. KTCAPP also offers good privacy protection, because it uses only the historical class centers and class memberships, which do not expose the raw historical data. Experimental results show that the proposed algorithm is effective.
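The idea of guiding FCM with historical class centers can be illustrated with a minimal sketch. It is not the KTCAPP objective itself, which the abstract does not state. As an assumption made purely for illustration, the standard FCM objective is extended with a penalty `lam * sum_i ||v_i - h_i||^2` that pulls each current center `v_i` toward a historical center `h_i`. The function names `memberships` and `transfer_fcm` and the parameter `lam` are hypothetical, and the historical memberships are not used here. Only the historical centers enter the computation, never the historical raw data.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(points, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update; each row sums to 1."""
    power = 1.0 / (m - 1.0)
    u = []
    for p in points:
        inv = [(1.0 / (sq_dist(p, c) + eps)) ** power for c in centers]
        total = sum(inv)
        u.append([w / total for w in inv])
    return u


def transfer_fcm(points, hist_centers, lam=1.0, m=2.0, iters=50):
    """FCM with an assumed quadratic pull toward historical centers.

    Assumed objective (illustrative, not taken from the paper):
        sum_ij u_ij^m ||x_j - v_i||^2 + lam * sum_i ||v_i - h_i||^2
    """
    centers = [list(c) for c in hist_centers]  # start from historical centers
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]
            denom = sum(w) + lam
            # Closed-form minimizer of the assumed objective w.r.t. v_i
            centers[i] = [
                (sum(wj * p[d] for wj, p in zip(w, points))
                 + lam * hist_centers[i][d]) / denom
                for d in range(dim)
            ]
    return centers, memberships(points, centers, m)


# Toy data: a few current points and two historical centers
points = [(0.1, 0.0), (-0.1, 0.2), (5.2, 4.9), (4.8, 5.1)]
hist_centers = [(0.0, 0.0), (5.0, 5.0)]
centers, u = transfer_fcm(points, hist_centers, lam=0.5)
```

Under this assumed penalty, `lam = 0` reduces to plain FCM initialized at the historical centers, while a very large `lam` keeps the centers essentially fixed at the historical values.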