Existing protocol analysis methods based on network traces consider only the plaintext information in message payloads, so they are not suitable for security protocols, whose messages carry large amounts of ciphertext. Security protocol messages mix plaintext and ciphertext, and the position of the ciphertext varies from message to message. To exploit the characteristics of ciphertext data in security protocols with unknown specifications, this paper proposes CFIA (Ciphertext Field Identification Approach), a ciphertext field identification method based on entropy estimation. Building on extracted keyword sequences, CFIA uses byte sample entropy to describe the distribution of bytes in a network flow. Because ciphertext is random, CFIA uses entropy estimation to pre-locate the interval in which ciphertext fields lie. It then searches for the ciphertext length field, locates the boundaries of the ciphertext field, and thereby identifies it. Experimental results show that, without dynamic binary analysis, the method effectively identifies protocol ciphertext fields purely from network traces, with high accuracy.
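The two core steps described above, entropy-based pre-location followed by a length-field search that fixes the ciphertext boundary, can be illustrated with a minimal Python sketch. It is not the CFIA implementation. It skips the keyword-sequence step and uses a plain plug-in Shannon entropy over a sliding window instead of whatever entropy estimator the paper applies. It also assumes that the length field is a 1-, 2- or 4-byte big-endian integer counting the bytes from just after itself to the end of the message, which is only one of several possible layouts. All function names (`window_entropy`, `prelocate_high_entropy`, `find_length_field`, `identify_ciphertext_field`) and the parameters `win=32` and `ratio=0.9` are hypothetical.

```python
import math
from collections import Counter


def window_entropy(data: bytes) -> float:
    """Plug-in Shannon entropy, in bits, of the byte values in `data`."""
    n = len(data)
    counts = Counter(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def prelocate_high_entropy(msg: bytes, win: int = 32, ratio: float = 0.9):
    """Return (start, end) of the longest run of sliding windows whose entropy
    reaches `ratio` of the maximum a `win`-byte window can attain, or None."""
    # A window of `win` bytes holds at most min(win, 256) distinct values, so
    # even truly random bytes cannot reach 8 bits per byte here (assumed threshold).
    limit = ratio * math.log2(min(win, 256))
    flags = [window_entropy(msg[i:i + win]) >= limit
             for i in range(len(msg) - win + 1)]
    best, i = None, 0
    while i < len(flags):
        if not flags[i]:
            i += 1
            continue
        j = i
        while j < len(flags) and flags[j]:
            j += 1
        span = (i, j - 1 + win)  # windows i..j-1 cover bytes [i, j-1+win)
        if best is None or span[1] - span[0] > best[1] - best[0]:
            best = span
        i = j
    return best


def find_length_field(msg: bytes, start: int, win: int = 32, widths=(1, 2, 4)):
    """Look within `win` bytes of `start` for a big-endian integer equal to the
    number of bytes that follow it up to the end of the message (assumed layout)."""
    matches = []
    for off in range(max(0, start - win), min(start + win, len(msg))):
        for w in widths:
            if off + w > len(msg):
                continue
            value = int.from_bytes(msg[off:off + w], "big")
            if value > 0 and off + w + value == len(msg):
                matches.append((off, w))
    if not matches:
        return None
    # Every match puts the ciphertext right after the field; keep the start
    # closest to the entropy estimate and report all (offset, width) readings.
    ct_start = min({off + w for off, w in matches}, key=lambda s: abs(s - start))
    fields = sorted(m for m in matches if m[0] + m[1] == ct_start)
    return ct_start, len(msg), fields


def identify_ciphertext_field(msg: bytes, win: int = 32, ratio: float = 0.9):
    """Pre-locate a high-entropy region, then refine it with a length field."""
    region = prelocate_high_entropy(msg, win, ratio)
    if region is None:
        return None
    return find_length_field(msg, region[0], win)


if __name__ == "__main__":
    header = b"HDR" + bytes(29)                               # low-entropy plaintext
    cipher = bytes((167 * k + 13) % 256 for k in range(100))  # distinct bytes, stand-in for ciphertext
    msg = header + len(cipher).to_bytes(2, "big") + cipher
    print(prelocate_high_entropy(msg))     # (27, 134)
    print(identify_ciphertext_field(msg))  # (34, 134, [(30, 4), (32, 2), (33, 1)])
```

In this toy message the entropy step alone places the ciphertext start at byte 27, because windows that straddle the header and the ciphertext still look random. The length-field search corrects the start to byte 34. Since the high-order bytes of the length are zero, 1-, 2- and 4-byte readings all match. The sketch therefore reports the ciphertext boundary, which they agree on, and does not attempt to decide the true field width.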