Neural Network Language Modeling Using an Improved Topic Distribution Feature
LIU Chang①② ZHANG Yike①② ZHANG Pengyuan①② YAN Yonghong①②③
①(Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China) ②(University of Chinese Academy of Sciences, Beijing 100049, China) ③(Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China)
Abstract: Attaching topic features to the input of Recurrent Neural Network (RNN) models is an effective way to exploit distant contextual information. To cope with the problem that topic distributions may vary greatly across documents, this paper proposes an improved topic feature built from the topic distributions of documents and applies it to a Long Short-Term Memory (LSTM) recurrent language model. Experiments show that the proposed feature achieved an 11.8% relative perplexity reduction on the Penn TreeBank (PTB) dataset, and 6.0% and 6.8% relative Word Error Rate (WER) reductions on the SWitch BoarD (SWBD) and Wall Street Journal (WSJ) speech recognition tasks, respectively. On the WSJ task, an RNN with this feature matches the performance of an LSTM on the eval92 test set.
LIU Chang, ZHANG Yike, ZHANG Pengyuan, YAN Yonghong. Neural network language modeling using an improved topic distribution feature[J]. Journal of Electronics & Information Technology (JEIT), 2018, 40(1): 219-225.
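The mechanism described in the abstract, appending a document-level topic distribution (e.g., from LDA) to the word embedding at every input step of an LSTM language model, can be illustrated with the minimal sketch below. This is not the authors' implementation: it assumes gensim for LDA and PyTorch for the LSTM, and names such as TopicLSTMLM and document_topic_vector are hypothetical.

```python
# Minimal sketch (illustrative only): an LSTM language model whose input at each
# time step is the word embedding concatenated with a fixed per-document LDA
# topic distribution. Assumes gensim and PyTorch; not the paper's actual code.
import torch
import torch.nn as nn
from gensim.corpora import Dictionary
from gensim.models import LdaModel

def document_topic_vector(lda, dictionary, tokens):
    """Return one document's dense topic distribution as a float tensor."""
    bow = dictionary.doc2bow(tokens)
    dist = lda.get_document_topics(bow, minimum_probability=0.0)
    return torch.tensor([p for _, p in dist], dtype=torch.float)

class TopicLSTMLM(nn.Module):
    def __init__(self, vocab_size, emb_dim, topic_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # LSTM input = word embedding + topic feature appended at every step
        self.lstm = nn.LSTM(emb_dim + topic_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, topic_vec, state=None):
        # word_ids: (batch, seq_len); topic_vec: (batch, topic_dim)
        emb = self.embed(word_ids)
        topics = topic_vec.unsqueeze(1).expand(-1, emb.size(1), -1)
        out, state = self.lstm(torch.cat([emb, topics], dim=-1), state)
        return self.proj(out), state

# Usage: fit LDA on the training documents, then feed each document's topic
# distribution alongside its word sequence (toy data for illustration).
docs = [["stocks", "fell", "on", "wall", "street"],
        ["the", "game", "went", "to", "overtime"]]
dictionary = Dictionary(docs)
lda = LdaModel([dictionary.doc2bow(d) for d in docs], num_topics=2,
               id2word=dictionary, passes=5)
topic_vec = document_topic_vector(lda, dictionary, docs[0]).unsqueeze(0)
model = TopicLSTMLM(vocab_size=len(dictionary), emb_dim=8, topic_dim=2,
                    hidden_dim=16)
logits, _ = model(torch.tensor([[0, 1, 2, 3]]), topic_vec)
```

The design choice mirrors the abstract: the topic vector is computed once per document and held constant over the word sequence, so the recurrent model receives long-range (document-level) context without changing its recurrence.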