Neural Network Language Modeling Using an Improved Topic Distribution Feature
LIU Chang①② ZHANG Yike①② ZHANG Pengyuan①② YAN Yonghong①②③
①(Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China) ②(University of Chinese Academy of Sciences, Beijing 100049, China) ③(Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China)
Abstract: Attaching topic features to the input of Recurrent Neural Network (RNN) models is an effective way to exploit distant contextual information. To cope with the problem that topic distributions may vary greatly across documents, this paper proposes an improved topic feature built from the topic distributions of documents and applies it to a Long Short-Term Memory (LSTM) recurrent language model. Experiments show that the proposed feature achieved an 11.8% relative perplexity reduction on the Penn TreeBank (PTB) dataset, and 6.0% and 6.8% relative Word Error Rate (WER) reductions on the SWitch BoarD (SWBD) and Wall Street Journal (WSJ) speech recognition tasks, respectively. On the WSJ task, an RNN with this feature matches the performance of an LSTM on the eval92 test set.
LIU Chang, ZHANG Yike, ZHANG Pengyuan, YAN Yonghong. Neural network language modeling using an improved topic distribution feature[J]. Journal of Electronics & Information Technology (JEIT), 2018, 40(1): 219-225.
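The mechanism described in the abstract, appending a document-level topic distribution (e.g., from LDA) to the word embedding at every input step of an LSTM language model, can be illustrated with the minimal sketch below. This is not the authors' implementation: it assumes gensim for LDA and PyTorch for the LSTM, and names such as TopicLSTMLM and document_topic_vector are hypothetical.

```python
# Minimal sketch (illustrative only): an LSTM language model whose input at each
# time step is the word embedding concatenated with a fixed per-document LDA
# topic distribution. Assumes gensim and PyTorch; not the paper's actual code.
import torch
import torch.nn as nn
from gensim.corpora import Dictionary
from gensim.models import LdaModel

def document_topic_vector(lda, dictionary, tokens):
    """Return one document's dense topic distribution as a float tensor."""
    bow = dictionary.doc2bow(tokens)
    dist = lda.get_document_topics(bow, minimum_probability=0.0)
    return torch.tensor([p for _, p in dist], dtype=torch.float)

class TopicLSTMLM(nn.Module):
    def __init__(self, vocab_size, emb_dim, topic_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # LSTM input = word embedding + topic feature appended at every step
        self.lstm = nn.LSTM(emb_dim + topic_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, topic_vec, state=None):
        # word_ids: (batch, seq_len); topic_vec: (batch, topic_dim)
        emb = self.embed(word_ids)
        topics = topic_vec.unsqueeze(1).expand(-1, emb.size(1), -1)
        out, state = self.lstm(torch.cat([emb, topics], dim=-1), state)
        return self.proj(out), state

# Usage: fit LDA on the training documents, then feed each document's topic
# distribution alongside its word sequence (toy data for illustration).
docs = [["stocks", "fell", "on", "wall", "street"],
        ["the", "game", "went", "to", "overtime"]]
dictionary = Dictionary(docs)
lda = LdaModel([dictionary.doc2bow(d) for d in docs], num_topics=2,
               id2word=dictionary, passes=5)
topic_vec = document_topic_vector(lda, dictionary, docs[0]).unsqueeze(0)
model = TopicLSTMLM(vocab_size=len(dictionary), emb_dim=8, topic_dim=2,
                    hidden_dim=16)
logits, _ = model(torch.tensor([[0, 1, 2, 3]]), topic_vec)
```

The design choice mirrors the abstract: the topic vector is computed once per document and held constant over the word sequence, so the recurrent model receives long-range (document-level) context without changing its recurrence.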