A Brief Note about Boltzmann/Softmax Exploration Strategy


One method that is often used in combination with RL algorithms is the Boltzmann, or softmax, exploration strategy.
Action selection is still random, but the selection probabilities are weighted by the actions' relative Q-values. This makes the agent more likely to choose actions with high Q-values, while two actions with similar Q-values have almost the same probability of being selected. Its general form is

$$P(a) = \frac{e^{Q(s,a)/T}}{\sum_i e^{Q(s,a_i)/T}}$$

in which P(a) is the probability of selecting action a in state s, the sum runs over all actions a_i available in s, and T > 0 is the temperature parameter. Higher values of T move the selection toward a purely random (uniform) strategy, and lower values move it toward a fully greedy strategy.
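
As a rough illustration of the formula above, here is a minimal Python sketch of Boltzmann action selection. The function names `boltzmann_probabilities` and `select_action` are hypothetical, chosen only for this example, and the Q-values are assumed to be given as a plain list for a single state. Subtracting the maximum before exponentiating is a common numerical-stability step that is not part of the original note; it leaves the resulting probabilities unchanged.

```python
import math
import random


def boltzmann_probabilities(q_values, temperature):
    """Return P(a) = exp(Q(s,a)/T) / sum_i exp(Q(s,a_i)/T) for each action."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [q / temperature for q in q_values]
    # Shift by the maximum so exp() cannot overflow; the ratio is unaffected.
    shift = max(scaled)
    weights = [math.exp(x - shift) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Calling `select_action([1.0, 2.0, 3.0], temperature=0.5)` returns an index in 0..2, most often the last one. With a very large temperature the three probabilities become nearly equal, and with a very small one almost all of the probability mass falls on the action with the highest Q-value.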