HUANG Chengquan①② WANG Shitong① JIANG Yizhang① DONG Aimei①③
①(School of Digital Media, Jiangnan University, Wuxi 214122, China) ②(Engineering Training Center, Guizhou Minzu University, Guiyang 550025, China) ③(School of Information, Qilu University of Technology, Jinan 250353, China)
Coordinate Descent (CD) is a promising method for large-scale pattern classification problems, offering straightforward operation and fast convergence. In this paper, inspired by the v-soft margin Support Vector Machine (v-SVM) for pattern classification, a new v-Soft Margin Logistic Regression Classifier (v-SMLRC) is proposed to enhance the generalization performance of the Logistic Regression Classifier (LRC). The dual of v-SMLRC can be regarded as an equality-constrained CDdual problem, and on this basis a new large-scale pattern classification method, called v-SMLRC-CDdual, is proposed. The proposed v-SMLRC-CDdual maximizes the inner-class margin and effectively enhances the generalization performance of LRC. Empirical results on large-scale document datasets demonstrate that the proposed method is effective and comparable to other related methods.
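To illustrate the general CDdual idea the abstract builds on (not the paper's v-SMLRC-CDdual itself), the following is a minimal sketch of dual coordinate descent for plain L2-regularized logistic regression in the style of Yu, Huang, and Lin (2011, reference [14]): each dual variable is updated in turn by a one-dimensional Newton solve while the primal vector w is maintained incrementally. All names and parameter choices here are illustrative assumptions.

```python
import numpy as np

def dual_cd_logreg(X, y, C=1.0, outer_iters=50, newton_iters=20, eps=1e-12):
    """Illustrative dual coordinate descent for L2-regularized logistic
    regression (a sketch after Yu, Huang & Lin, 2011).  Labels y in {-1, +1}.

    Dual problem:
        min_a  1/2 a^T Q a + sum_i [ a_i log a_i + (C - a_i) log(C - a_i) ]
        s.t.   0 < a_i < C,   with Q_ij = y_i y_j x_i^T x_j
    and the primal vector is recovered as w = sum_i a_i y_i x_i.
    """
    n, _ = X.shape
    a = np.full(n, C / 2.0)                # start strictly inside (0, C)
    w = X.T @ (a * y)                      # maintain w = sum_i a_i y_i x_i
    Qii = np.einsum('ij,ij->i', X, X)      # diagonal of Q (y_i^2 = 1)
    rng = np.random.default_rng(0)         # fixed seed for reproducibility
    for _ in range(outer_iters):
        for i in rng.permutation(n):
            ai, qi = a[i], Qii[i]
            # Constant part of the 1-D gradient: sum_{j != i} Q_ij a_j
            b = y[i] * (w @ X[i]) - qi * ai
            z = ai
            for _ in range(newton_iters):  # Newton on the 1-D subproblem
                g = qi * z + b + np.log(z / (C - z))      # gradient
                h = qi + C / (z * (C - z))                # curvature
                z = np.clip(z - g / h, eps, C - eps)      # stay inside (0, C)
            w += (z - ai) * y[i] * X[i]    # incremental update of w
            a[i] = z
    return w
```

The key design point mirrored from the CDdual literature is that each coordinate update costs O(d) thanks to the maintained w, which is what makes the approach attractive for large sparse document collections.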
[1]
BOTTOU L and BOUSQUET O. The tradeoffs of large scale learning[C]. Proceedings of Advances in Neural Information Processing Systems, Cambridge, 2008: 151-154.
[2]
LIN C Y, TSAI C H, LEE C P, et al. Large-scale logistic regression and linear support vector machines using Spark[C]. Proceedings of 2014 IEEE International Conference on Big Data, Washington DC, 2014: 519-528. doi: 10.1109/BigData.2014.7004269.
[3]
AGERRI R, ARTOLA X, BELOKI Z, et al. Big data for natural language processing: A streaming approach[J]. Knowledge-Based Systems, 2015, 79: 36-42. doi: 10.1016/j.knosys.2014.11.007.
[4]
DARROCH J N and RATCLIFF D. Generalized iterative scaling for log-linear models[J]. The Annals of Mathematical Statistics, 1972, 43(5): 1470-1480. doi: 10.1214/aoms/1177692379.
[5]
DELLA PIETRA S, DELLA PIETRA V, and LAFFERTY J. Inducing features of random fields[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(4): 380-393. doi: 10.1109/34.588021.
[6]
GOODMAN J. Sequential conditional generalized iterative scaling[C]. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, 2002: 9-16. doi: 10.3115/1073083.1073086.
[7]
JIN R, YAN R, ZHANG J, et al. A faster iterative scaling algorithm for conditional exponential model[C]. Proceedings of the 20th International Conference on Machine Learning, New York, 2003: 282-289.
[8]
HUANG F L, HSIEH C J, CHANG K W, et al. Iterative scaling and coordinate descent methods for maximum entropy models[J]. Journal of Machine Learning Research, 2010, 11(2): 815-848.
[9]
MINKA T P. A comparison of numerical optimizers for logistic regression[OL]. http://research.microsoft.com/en-us/um/people/minka/papers/logreg/minka-logreg.pdf. 2007.
[10]
KOMAREK P and MOORE A W. Making logistic regression a core data mining tool: a practical investigation of accuracy, speed, and simplicity[R]. Technical report TR-05-27, Robotics Institute of Carnegie Mellon University, Pittsburgh, 2005.
[11]
LIN C J, WENG R C, and KEERTHI S S. Trust region Newton method for large-scale logistic regression[J]. Journal of Machine Learning Research, 2008, 9(4): 627-650.
[12]
KEERTHI S S, DUAN K B, SHEVADE S K, et al. A fast dual algorithm for kernel logistic regression[J]. Machine Learning, 2005, 61(1-3): 151-165. doi: 10.1007/s10994-005-0768-5.
[13]
PLATT J C. Fast training of support vector machines using sequential minimal optimization[C]. Proceedings of Advances in Kernel Methods: Support Vector Learning, Cambridge, 1999: 185-208.
[14]
YU H F, HUANG F L, and LIN C J. Dual coordinate descent methods for logistic regression and maximum entropy models[J]. Machine Learning, 2011, 85(1/2): 41-75. doi: 10.1007/s10994-010-5221-8.
[15]
GU X, WANG S T, and XU M. A new cross-multidomain classification algorithm and its fast version for large datasets[J]. Acta Automatica Sinica, 2014, 40(3): 531-547. doi: 10.3724/SP.J.1004.2014.00531.
[16]
GU X and WANG S T. Fast cross-domain classification method for large multisources/small target domains[J]. Journal of Computer Research and Development, 2014, 51(3): 519-535. doi: 10.7544/issn1000-1239.2014.20120652.
[17]
ZHANG X F, CHEN B, WANG P H, et al. A target recognition method based on dirichlet process latent variable support vector machine model[J]. Journal of Electronics & Information Technology, 2015, 37(1): 29-36. doi: 10.11999/JEIT140129.
[18]
JI X R, HOU C Q, and HOU Y B. Research on the distributed training method for linear SVM in WSN[J]. Journal of Electronics & Information Technology, 2015, 37(3): 708-714. doi: 10.11999/JEIT140408.
[19]
GAO F R, WANG J J, XI X G, et al. Gait recognition for lower extremity electromyographic signals based on PSO-SVM method[J]. Journal of Electronics & Information Technology, 2015, 37(5): 1154-1159. doi: 10.11999/JEIT141083.
[20]
HSIEH C J, CHANG K W, LIN C J, et al. A dual coordinate descent method for large-scale linear SVM[C]. Proceedings of the 25th International Conference on Machine Learning, New York, 2008: 408-415. doi: 10.1145/1390156.1390208.
[21]
CHEN P H, LIN C J, and SCHÖLKOPF B. A tutorial on v-support vector machines[J]. Applied Stochastic Models in Business and Industry, 2005, 21(2): 111-136. doi: 10.1002/asmb.537.
[22]
PENG X J, CHEN D J, and KONG L Y. A clipping dual coordinate descent algorithm for solving support vector machines[J]. Knowledge-Based Systems, 2014, 71: 266-278. doi: 10.1016/j.knosys.2014.08.005.
[23]
TSAI C H, LIN C Y, and LIN C J. Incremental and decremental training for linear classification[C]. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, 2014: 343-352. doi: 10.1145/2623330.2623661.