Unity ML-Agents Toolkit v0.4 and Udacity Deep Reinforcement Learning Nanodegree
We are happy to announce the release of the latest version of the ML-Agents Toolkit: v0.4. It contains a number of features that we hope everyone will enjoy.
It includes the option to train your environments directly from the editor, rather than as built executables, making iteration time much quicker. In addition, we are introducing a set of new, challenging environments, as well as algorithmic improvements to help agents learn to solve tasks that could previously be learned only with great difficulty, or in some cases not at all. You can try out the new release by going to our GitHub release page. More exciting news: we are partnering with Udacity to launch an online education program, the Deep Reinforcement Learning Nanodegree. Read on below to learn more.
Environments
We include two new environments with our latest release: Walker and Pyramids. Walker is a physics-based humanoid ragdoll, and Pyramids is a complex sparse-reward environment.
Walker
The first new example environment we are including is called “Walker.” It contains agents which are humanoid ragdolls. They are completely physics-based, so the goal is for the agent to learn to control its limbs in a way that allows it to walk forward. It learns this… with somewhat humorous results. Since there are many degrees of freedom in the agent’s body, we think this can serve as a great benchmark for Reinforcement Learning algorithms that researchers might develop.
Pyramids
The second new environment is called “Pyramids.” It features the return of our favorite blue cube agent. Rather than collecting bananas or hopping over walls, this time around the agent has to get to a golden brick atop a pyramid of other bricks. The trick, however, is that this pyramid only appears once a randomly placed switch has been activated. The agent only gets a positive reward upon reaching the brick, making this a very sparse-reward environment.
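To make “sparse” concrete: the agent receives essentially no learning signal until it has completed the full switch, pyramid, brick sequence. A minimal sketch of that kind of reward structure is below (the function name and reward values are illustrative assumptions for this post, not the actual environment code, which is implemented as a C# Agent inside Unity):

```python
def pyramids_step_reward(reached_gold_brick: bool) -> float:
    """Illustrative sparse-reward structure for a Pyramids-style task.

    There is no shaping reward for flipping the switch or approaching the
    pyramid; the only meaningful feedback arrives when the task is finished.
    """
    if reached_gold_brick:
        return 2.0   # assumed success bonus, ends the episode
    return -0.001    # assumed tiny per-step penalty to encourage speed
```

With almost every step returning the same near-zero value, random exploration alone rarely stumbles onto the rewarding sequence, which is what motivates the curiosity bonus described later in this post.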
Additional environment variations
Additionally, we are providing visual observation and imitation learning versions of many of our existing environments. The visual observation environments, in particular, are designed as a challenge for researchers interested in benchmarking neural network models which utilize convolutional neural networks (CNNs).
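To give a sense of what “visual observation” means on the model side: instead of a vector of hand-picked state values, the policy receives camera pixels and typically passes them through a small convolutional encoder first. The following is a rough sketch in TensorFlow 1.x; the layer sizes and the 84x84 input resolution are assumptions for illustration, not the exact architecture the toolkit uses.

```python
import tensorflow as tf

def visual_encoder(visual_obs):
    """Minimal CNN encoder sketch for a batch of RGB camera observations."""
    h = tf.layers.conv2d(visual_obs, filters=16, kernel_size=8, strides=4,
                         activation=tf.nn.elu)   # coarse spatial features
    h = tf.layers.conv2d(h, filters=32, kernel_size=4, strides=2,
                         activation=tf.nn.elu)   # finer features
    flat = tf.layers.flatten(h)
    return tf.layers.dense(flat, 256, activation=tf.nn.elu)  # embedding for the policy

# Example usage: a placeholder for 84x84 RGB frames (resolution is an assumption).
camera = tf.placeholder(tf.float32, shape=[None, 84, 84, 3])
features = visual_encoder(camera)
```

Benchmarking on these environments then amounts to swapping in different encoder architectures while keeping the reinforcement learning algorithm fixed.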
To learn more about our provided example environments, follow this link.
Improved learning with Curiosity
To help agents solve tasks in which the rewards are few and far between, we’ve added an optional augmentation to our PPO algorithm. That augmentation is an implementation of the Intrinsic Curiosity Module, as described in this research paper from last year. In essence, the addition allows the agent to reward itself using an intrinsic reward signal based on how surprised it is by the outcome of its actions. This enables it to more easily and frequently solve very sparse-reward environments, such as the Pyramids environment described above.
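To give a concrete picture of how such an intrinsic reward can work, here is a minimal, framework-agnostic sketch: a learned forward model tries to predict the (encoded) next observation, and the agent rewards itself in proportion to how wrong that prediction turns out to be. The function names and the scaling constant are illustrative placeholders, not the ML-Agents implementation.

```python
import numpy as np

def curiosity_bonus(encode, forward_model, obs, action, next_obs, strength=0.01):
    """Curiosity-style intrinsic reward: large where the forward model is surprised.

    `encode` maps raw observations to a learned feature vector, and
    `forward_model` predicts the next feature vector from (features, action);
    both stand in for trained networks and are placeholders here.
    """
    phi_next = encode(next_obs)                    # actual next-state features
    phi_pred = forward_model(encode(obs), action)  # predicted next-state features
    surprise = 0.5 * np.sum((phi_pred - phi_next) ** 2)
    return strength * surprise                     # added to the extrinsic reward
```

During training, this bonus is simply summed with the environment reward before the PPO update, so transitions the model already predicts well quickly stop paying off, and the agent is pushed toward parts of the environment it has not yet mastered.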
In-Editor training
One feature which has been requested since the announcement of the ML-Agents toolkit is the ability to perform training from within the Unity Editor. We are happy to be taking the first step toward that goal in this release. It is now possible to simply launch the learn.py script and then press the “play” button from within the editor to perform training. This allows training to happen without having to build an executable and makes for faster iterations. We think this will save our users a lot of time, as well as narrow the gap between traditional game development workflows and the ML-Agents training process. This is made possible by a revamping of our communication system. Our improvements to the developer workflow will not stop here, though. This is just the first step toward even closer integration with the Unity Editor, which will be rolling out throughout 2018.
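As a usage sketch, assuming the v0.4 layout where learn.py lives in the repository's python/ directory (check the release documentation for the exact flags available):

```bash
# Launch the trainer without pointing it at a built executable...
python learn.py --run-id=editor-run --train
# ...then press the Play button in the Unity Editor when the script prompts you.
```

From the training script's point of view, the editor then behaves just like a built executable.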
TensorFlowSharp upgrade
Lastly, we are happy to share that the TensorFlowSharp plugin has now been upgraded from 1.4 to 1.7.1. This means that developers and researchers can now use Unity ML-Agents Toolkit with models built using the near-latest version of TensorFlow and maintain compatibility between the models they train and the models they can embed into Unity projects. We have also improved our documentation around creating Android and iOS executables which take advantage of ML-Agents toolkit. You can check it out here.
Udacity Deep Reinforcement Learning Nanodegree
We are proud to announce that we are partnering with Udacity on a new nanodegree to help students and our community of users who want a deeper understanding of reinforcement learning. This Udacity course uses ML-Agents toolkit as a way to illustrate and teach the various concepts. If you’ve been using ML-Agents toolkit or want to know the math, algorithms, and theories behind reinforcement learning, sign up.
Feedback
In addition to the features described above, we’ve also improved the performance of PPO, fixed a number of bugs, and improved the quality of tests provided with the ML-Agents codebase. As always, we welcome any feedback which you might have. Feel free to reach out to us on our GitHub issues page, or email us directly at ml-agents@unity3d.com.
Translated from: https://blogs.unity3d.com/2018/06/19/unity-ml-agents-toolkit-v0-4-and-udacity-deep-reinforcement-learning-nanodegree/
