Cloud Reasoning Model-Based Exploration for Deep Reinforcement Learning

doi:10.11999/JEIT170347


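Summary: Reinforcement learning acquires a decision policy for a task by interacting with the environment, and is characterized by self-learning and online learning. However, this "interactive trial-and-error" mechanism often makes the algorithms inefficient and slow to converge. Knowledge embodies human experience and the regularities by which people understand things, so using knowledge to guide the agent's learning is an effective way to address these problems. This paper introduces qualitative rule knowledge into reinforcement learning: the qualitative rules are represented with a cloud reasoning model and used as an exploration strategy that guides the agent's action selection, reducing the blindness of the agent's exploration of the state-action space. OpenAI Gym is used as the test environment, and experiments in a custom "CartPole-v2" environment verify the effectiveness of the proposed cloud reasoning model-based exploration strategy, which improves the learning efficiency of reinforcement learning and speeds up convergence.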


Keywords: cloud reasoning, deep reinforcement learning, knowledge, exploration strategy

Abstract: Reinforcement learning, which is self-improving and learns online, acquires a task policy through interaction with the environment. However, its "trial-and-error" mechanism usually requires a large number of training episodes. Knowledge captures human experience and cognition of the environment. This paper introduces qualitative rules into reinforcement learning and represents them with the cloud reasoning model, which then serves as a heuristic exploration strategy to guide action selection. Empirical evaluation in a custom OpenAI Gym environment called "CartPole-v2" shows that the exploration strategy based on the cloud reasoning model significantly improves the performance of the learning process.

Key words: Cloud reasoning; Deep reinforcement learning; Knowledge; Exploration strategy
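
The abstract does not give the rule base or the cloud parameters, so the following is only a minimal sketch of how a cloud-model rule could stand in for uniform random exploration. It assumes the standard normal cloud model, in which a qualitative concept is described by an expectation Ex, an entropy En and a hyper-entropy He, and an X-condition generator returns a certainty degree for a given input. The two rules, their numeric parameters, and the function names (`x_condition_cloud`, `cloud_suggested_action`, `select_action`) are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def x_condition_cloud(x, ex, en, he, rng):
    """Certainty degree of x for a normal cloud concept (Ex, En, He)."""
    en_prime = rng.gauss(en, he)  # entropy perturbed by the hyper-entropy
    if abs(en_prime) < 1e-12:     # guard against division by zero
        return 1.0 if x == ex else 0.0
    return math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))


# Hypothetical qualitative rules for a cart-pole-like task (assumed values):
#   "if the pole leans left,  push left  (action 0)"
#   "if the pole leans right, push right (action 1)"
RULES = [
    {"ex": -0.1, "en": 0.05, "he": 0.005, "action": 0},
    {"ex": 0.1, "en": 0.05, "he": 0.005, "action": 1},
]


def cloud_suggested_action(angle, rng):
    """Return the action of the rule whose antecedent is activated most strongly."""
    degrees = [
        x_condition_cloud(angle, r["ex"], r["en"], r["he"], rng) for r in RULES
    ]
    best = max(range(len(RULES)), key=lambda i: degrees[i])
    return RULES[best]["action"]


def select_action(q_values, angle, epsilon, rng):
    """Epsilon-greedy variant: the exploratory branch asks the rules
    instead of drawing a uniformly random action."""
    if rng.random() < epsilon:
        return cloud_suggested_action(angle, rng)
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In this sketch the Q-values would come from whatever value network the agent uses; only the exploratory branch changes, which is one plausible reading of "exploration strategy" in the abstract.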

Received: 2017-04-18; Published: 2017-11-01

Classification (CLC): TP18

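Funding: Key Pre-Research Fund of the China Electronics group (中电集团) (6141B08010101); China Postdoctoral Science Foundation (2015T81081, 2016M602974); Natural Science Foundation of Jiangsu Province for Young Scholars (BK20140075)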

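Corresponding author: LI Chenxi, male, born 1989, PhD candidate; research interests: command information system engineering and reinforcement learning. E-mail: streamorning@qq.com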

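About the authors: LI Chenxi, male, born 1989, PhD candidate; research interests: command information system engineering and reinforcement learning. CAO Lei, male, born 1965, professor; research interest: command information system engineering. CHEN Xiliang, male, born 1985, lecturer; research interest: command information system engineering. ZHANG Yongliang, male, born 1982, lecturer; research interest: command information system engineering.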

Cite this article:

LI Chenxi, CAO Lei, CHEN Xiliang, ZHANG Yongliang, XU Zhixiong, PENG Hui, DUAN Liwen. Cloud Reasoning Model-Based Exploration for Deep Reinforcement Learning[J]. Journal of Electronics & Information Technology (JEIT), 2018, 40(1): 244-248.

Link to this article:

http://jeit.ie.ac.cn/CN/10.11999/JEIT170347 or http://jeit.ie.ac.cn/CN/Y2018/V40/I1/244
