A Brief Note about Boltzmann/Softmax Exploration Strategy

2017-04-28

One method that is often used in combination with RL algorithms is the Boltzmann (or softmax) exploration strategy. Action selection is still random, but the selection probabilities are weighted by the actions' relative Q-values. This makes the agent more likely to choose good actions, while two actions with similar Q-values have almost the same probability of being selected. Its general form is

$$P(a) = \frac{e^{Q(s,a)/T}}{\sum_i e^{Q(s,a_i)/T}}$$

in which P(a) is the probability of selecting action a and T is the temperature parameter. Higher values of T move the selection towards a purely random strategy, and lower values move it towards a fully greedy strategy.
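The selection rule can be sketched in a few lines of Python. This is a minimal illustration and not code from the original note: the function names `boltzmann_probabilities` and `select_action` are hypothetical, the Q-values are assumed to be a plain list holding one value per action for the current state, and the maximum Q-value is subtracted before exponentiating, which leaves the probabilities unchanged but avoids overflow at small T.

```python
import math
import random


def boltzmann_probabilities(q_values, temperature):
    """Return softmax selection probabilities for one state's Q-values.

    q_values: list of Q(s, a_i), one entry per action (assumed layout).
    temperature: the parameter T; must be strictly positive.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Shifting by the maximum does not change the ratios, but it keeps
    # every exponent <= 0 so math.exp cannot overflow.
    q_max = max(q_values)
    exp_q = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exp_q)
    return [e / total for e in exp_q]


def select_action(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    q = [1.0, 2.0, 2.0]  # made-up Q-values for three actions
    for t in (0.1, 1.0, 10.0):
        print(t, boltzmann_probabilities(q, t))
    print(select_action(q, temperature=1.0))
```

Running the script for several temperatures shows the behaviour described above: the two actions with equal Q-values always receive equal probability, a large T pushes all probabilities towards uniform, and a small T concentrates nearly all probability on the highest-valued actions.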

止于至玄
