A motif discovery algorithm for multi-attribute uncertain data streams is proposed on the basis of MEME (Multiple Expectation-maximization for Motif Elicitation). It borrows the idea of sequence pattern discovery from bioinformatics to solve the problem of frequent pattern discovery in multi-attribute uncertain data streams. A new method for updating the uncertain sliding window is designed based on a mixed-type model, the SAX (Symbolic Aggregate approXimation) symbolization strategy is improved, and a similarity analysis method for multi-attribute motifs under different sliding windows is put forward. The functional correctness of the proposed algorithm is verified on a set of uncertain data streams from a wireless sensor network for air and missile defense, and its accuracy is measured by planting different numbers of motifs. Furthermore, a comparison with a previous algorithm, with the tuples’ valid probability set to 1, shows that the proposed algorithm can precisely discover frequent patterns in multi-attribute uncertain data streams.