Reviews on Group Detection in Online Social Networks
PAN Li, WU Peng, HUANG Danhua
(School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China)
(National Engineering Laboratory for Information Content Analysis Technology, Shanghai 200240, China)
Abstract: Groups are important mesoscopic structures of Online Social Networks (OSNs). Group detection has both theoretical significance and a wide range of applications, and it promotes the application and development of online social networks. This paper reviews group detection techniques for online social networks. Starting from an analysis of the formation mechanism of social groups, it defines online social network groups and introduces the group detection problem. Group detection methods are then classified by the features they use: methods based on attribute features only, and methods that combine attribute features with structure features, are analyzed in turn. In particular, malicious behavior group detection methods are reviewed in detail through their feature selection mechanisms and detection models. Finally, directions for further research on group detection in online social networks are discussed.
PAN Li, WU Peng, HUANG Danhua. Reviews on Group Detection in Online Social Networks[J]. Journal of Electronics & Information Technology (JEIT), 2017, 39(9): 2097-2107.