Affiliation: College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Abstract:
Solving the optimization problem of approaching a Nash Equilibrium point plays an important role in imperfect-information games, e.g., StarCraft and poker. Neural Fictitious Self-Play (NFSP) is an effective algorithm that learns an approximate Nash Equilibrium of imperfect-information games purely from self-play, without prior domain knowledge. However, it needs to train a neural network in an off-policy manner to approximate the action values. For games with large search spaces, the training may suffer from unnecessary exploration and sometimes fails to converge. In this paper, we propose a new Neural Fictitious Self-Play algorithm, called MC-NFSP, that combines Monte Carlo tree search with NFSP to improve performance in real-time zero-sum imperfect-information games. With experiments and empirical analysis, we demonstrate that the proposed MC-NFSP algorithm can approximate a Nash Equilibrium in games with large-scale search depth, while NFSP cannot. Furthermore, we develop an Asynchronous Neural Fictitious Self-Play (ANFSP) framework. It uses an asynchronous and parallel architecture to collect game experience and improves both training efficiency and policy quality. Experiments on games with hidden state information (Texas Hold'em) and on FPS (first-person shooter) games demonstrate the effectiveness of our algorithms.