摘要 传统支持向量机分类过程的计算量和支持向量的个数成正比,当支持向量较多时,其分类过程的计算比较耗时。该文基于支持向量的稀疏性,证明了对支持向量压缩时,收紧新的快速决策函数和原始决策函数之间的误差等价于在样本空间对原始支持向量进行K均值聚类操作,据此提出了一种约简支持向量的快速分类算法FD-SVM(Fast Decision algorithm of Support Vector Machine),该算法首先对原始的支持向量进行特定比例的K均值聚类操作,聚类的中心为约简后新的支持向量,按照分类误差最小的原则构建优化模型,用二次规划方法求解得到新的支持向量的系数。标准数据集上的实验表明,保持分类精度的损失在统计意义上不明显的前提下,FD-SVM可以有效压缩支持向量的数量,提高分类速度。
Abstract:The number of Support Vectors (SVs) of SVM is usually large and this results in a substantially slower classification speed than many other approaches. The less SVs means the more sparseness and higher classification speed. How to reduce the number of SVs but without loss of generalization performance becomes a significant problem both theoretically and practically. Basing on the sparsity of SVs, it is proven that when clustering original SVs, the minimal upper bound of the error between the original decision function and the fast decision function can be achieved by K-means clustering the original SVs in input space, then a new algorithm called Fast Decision algorithm of Support Vector Machine (FD-SVM) is proposed, which employs K-means to cluster a dense SVs set to a sparse set and the cluster centers are used as the new SVs, then aiming to minimize the classification gap between SVM and FD-SVM, a quadratic programming model is built for obtaining the optimal coefficients of the new sparse SVs. Experiments on toy and real-world data sets demonstrate that compared with original SVM, the number of SVs decreases and the speed of classification increases, while the loss of accuracy is acceptable at the 0.05 significant level.