The main drawback of sparse-representation-based Single-Channel Blind Source Separation (SCBSS) is interference between sub-dictionaries. To alleviate this drawback, an extra sub-dictionary, called the common sub-dictionary, is added to the traditional union dictionary. Each source is reconstructed by linearly combining the sparsely activated atoms of its own sub-dictionary and of the common sub-dictionary. Because the information shared by different sources is gathered into the common sub-dictionary, the discriminative information in each source's specific sub-dictionary is purified. Optimization of the objective function involves three steps: sparse representation, dictionary updating, and weight-coefficient optimization; the three steps are iterated a specified number of times or until convergence. In the test stage, each source is separated by combining the atoms of its corresponding sub-dictionary and the common sub-dictionary, weighted by the sparse coefficients of the mixed signal over the union dictionary. Experimental results on a speech dataset show that, compared with traditional and state-of-the-art algorithms, the proposed algorithm improves separation performance by up to 1 dB.
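The test-stage separation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dictionaries are random stand-ins rather than learned ones, the sparse code is obtained with a simple Orthogonal Matching Pursuit, and, as a simplifying assumption, the common sub-dictionary's contribution is attributed in full to each source (the paper instead splits it via learned weight coefficients).

```python
import numpy as np

def omp(D, y, k):
    """Greedy Orthogonal Matching Pursuit: select up to k atoms of D to explain y."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)  # refit on support
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

def normalize(A):
    return A / np.linalg.norm(A, axis=0)  # unit-norm atoms (columns)

rng = np.random.default_rng(0)
m, n = 64, 40                                  # signal dimension, atoms per sub-dictionary
D1 = normalize(rng.standard_normal((m, n)))    # source-1 sub-dictionary (stand-in)
D2 = normalize(rng.standard_normal((m, n)))    # source-2 sub-dictionary (stand-in)
Dc = normalize(rng.standard_normal((m, n // 2)))  # common sub-dictionary (stand-in)
D = np.hstack([D1, D2, Dc])                    # union dictionary

y = rng.standard_normal(m)                     # mixed signal (stand-in)
x = omp(D, y, k=10)                            # sparse code over the union dictionary

# Split the code by sub-dictionary and reconstruct each source from its own
# atoms plus the common atoms.
x1, x2, xc = x[:n], x[n:2 * n], x[2 * n:]
s1_hat = D1 @ x1 + Dc @ xc
s2_hat = D2 @ x2 + Dc @ xc
```

By construction, `s1_hat + s2_hat` double-counts the common part: their sum minus `Dc @ xc` equals the overall sparse approximation `D @ x` of the mixture.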
[1] VANEPH A, MCNEIL E, RIGAUD F, et al. An automated source separation technology and its practical applications[C]. Audio Engineering Society Convention 140, Paris, France, 2016: 181-182.
[2] DU Jian, GONG Kexian, and GE Lindong. Low complexity algorithm on blind separation of paired carrier multiple access signals based on single way timing accuracy[J]. Journal of Electronics & Information Technology, 2014, 36(8): 1872-1877. doi: 10.3724/SP.J.1146.2013.01459.
[3] LOPEZ A R, ONO N, REMES U, et al. Designing multichannel source separation based on single-channel source separation[C]. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, 2015: 469-473. doi: 10.1109/ICASSP.2015.7178013.
[4] WU Di, TAO Rui, ZHANG Xiaojun, et al. Perception auditory scene analysis for speaker recognition[J]. Acta Acustica, 2016, 41(2): 260-272. doi: 10.15949/j.cnki.0371-0025.2016.02.015.
[5] YANG Lidong, WANG Jing, XIE Xiang, et al. Low rank tensor completion for recovering missing data in multi-channel audio signal[J]. Journal of Electronics & Information Technology, 2016, 38(2): 394-399. doi: 10.11999/JEIT150589.
[6] JANG G J, LEE T W, and OH Y H. Single-channel signal separation using time-domain basis functions[J]. IEEE Signal Processing Letters, 2003, 10(6): 168-171. doi: 10.1109/LSP.2003.811630.
[7] WANG Gang and SUN Bin. Research on blind signal separation technology and algorithm[J]. Aerospace Electronic Warfare, 2015, 31(4): 53-56. doi: 10.16328/j.htdz8511.2015.04.015.
[8] SCHMIDT M N and OLSSON R K. Single-channel speech separation using sparse non-negative matrix factorization[C]. ISCA International Conference on Spoken Language Processing (INTERSPEECH), Pittsburgh, Pennsylvania, 2006: 2614-2617.
[9] KING B J and ATLAS L. Single-channel source separation using complex matrix factorization[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(8): 2591-2597. doi: 10.1109/TASL.2011.2156786.
[10] GRAIS E M and ERDOGAN H. Single channel speech music separation using nonnegative matrix factorization with sliding window and spectral masks[C]. Annual Conference of the International Speech Communication Association (INTERSPEECH), Florence, Italy, 2011: 1773-1776.
[11] GRAIS E M and ERDOGAN H. Discriminative nonnegative dictionary learning using cross-coherence penalties for single channel source separation[C]. Annual Conference of the International Speech Communication Association (INTERSPEECH), Lyon, France, 2013: 808-812.
[12] WENINGER F, LE ROUX J, HERSHEY J R, et al. Discriminative NMF and its application to single-channel source separation[C]. Annual Conference of the International Speech Communication Association (INTERSPEECH), Singapore, 2014: 865-869.
[13] BAO G, XU Y, and YE Z. Learning a discriminative dictionary for single-channel speech separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(7): 1130-1138. doi: 10.1109/TASLP.2014.2320575.
[14] WANG Z and SHA F. Discriminative non-negative matrix factorization for single-channel speech separation[C]. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 2014: 3749-3753. doi: 10.1109/ICASSP.2014.6854302.
[15] ZHANG Chunmei, YIN Zhongke, and XIAO Mingxia. Signal over-complete representation and sparse decomposition based on redundant dictionary[J]. Chinese Science Bulletin, 2006, 51(6): 628-633.
[16] AHARON M, ELAD M, and BRUCKSTEIN A. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation[J]. IEEE Transactions on Signal Processing, 2006, 54(11): 4311-4322. doi: 10.1109/TSP.2006.881199.
[17] COOKE M, BARKER J, CUNNINGHAM S, et al. An audio-visual corpus for speech perception and automatic speech recognition[J]. The Journal of the Acoustical Society of America, 2006, 120(5): 2421-2424. doi: 10.1121/1.2229005.
[18] VINCENT E, GRIBONVAL R, and FEVOTTE C. Performance measurement in blind audio source separation[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(4): 1462-1469. doi: 10.1109/TSA.2005.858005.
[19] THOMAS S, SAON G, KUO H, et al. The IBM BOLT speech transcription system[C]. Annual Conference of the International Speech Communication Association (INTERSPEECH), Dresden, Germany, 2015: 3150-3153.
[20] NORRIS D, MCQUEEN J M, and CUTLER A. Prediction, Bayesian inference and feedback in speech recognition[J]. Language, Cognition and Neuroscience, 2016, 31(1): 4-18. doi: 10.1080/23273798.2015.1081703.