The large scale and coarse classification granularity of resources in literature knowledge bases cause disorientation and information overload when learners retrieve and read papers. This paper proposes a mechanism for knowledge clustering and knowledge statistics based on MapReduce. First, it presents a co-occurrence matrix building algorithm based on MapReduce (MR-CoMatrix). Second, it combines the co-occurrence matrix with a similarity coefficient to build the similarity matrix. Third, the similarity matrix is standardized with Z scores. Finally, knowledge clusters are constructed with Ward's method. After knowledge clustering, the paper introduces a knowledge statistics algorithm based on MapReduce (MR-Statistics) to mine the hidden information in each cluster. The experimental results show that a literature knowledge base using MR-CoMatrix and MR-Statistics achieves accurate, fine-grained clustering and multi-dimensional statistics, with high computational efficiency and lower time cost.
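The first three steps of this pipeline can be illustrated with a minimal, single-process sketch. It is not the MR-CoMatrix implementation itself: the map and reduce phases are emulated with plain Python generators rather than a Hadoop job, the Ochiai (cosine) coefficient is assumed as the similarity coefficient because the abstract does not name one, the Z scores are assumed to be computed column by column, and the function names and toy keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def map_phase(keywords):
    """Map: for one paper, emit ((a, b), 1) for every unordered keyword pair,
    plus ((a, a), 1) so the diagonal accumulates each keyword's frequency."""
    terms = sorted(set(keywords))
    for t in terms:
        yield (t, t), 1
    for a, b in combinations(terms, 2):  # a < b because terms is sorted
        yield (a, b), 1


def reduce_phase(pairs):
    """Reduce: sum the values emitted for each key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def co_occurrence(papers):
    """Run the emulated map and reduce phases over all papers."""
    emitted = (kv for paper in papers for kv in map_phase(paper))
    return reduce_phase(emitted)


def similarity_matrix(counts, terms):
    """Ochiai coefficient (an assumption): c_ab / sqrt(c_aa * c_bb)."""
    def c(a, b):
        return counts[(a, b) if a <= b else (b, a)]  # missing pairs count as 0
    return [[c(a, b) / sqrt(c(a, a) * c(b, b)) for b in terms] for a in terms]


def z_scores(matrix):
    """Standardize each column to zero mean and unit (population) deviation."""
    stats = [(mean(col), pstdev(col)) for col in zip(*matrix)]
    return [[(x - mu) / sd if sd else 0.0 for x, (mu, sd) in zip(row, stats)]
            for row in matrix]


# Toy input: each inner list holds the keywords of one paper (hypothetical data).
papers = [
    ["clustering", "mapreduce", "hadoop"],
    ["clustering", "mapreduce"],
    ["clustering", "ward"],
]
counts = co_occurrence(papers)
terms = sorted({t for paper in papers for t in paper})
sim = similarity_matrix(counts, terms)
standardized = z_scores(sim)
```

The rows of `standardized` could then be passed to a hierarchical clustering routine that uses Ward's criterion, for example `scipy.cluster.hierarchy.linkage` with `method="ward"`, to obtain the knowledge clusters.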