[Path Planning] Patrol path planning for warehouse robots based on a Bayesian network, with MATLAB code


1 Overview

Introduction

In a search and rescue scenario, a robot must navigate through a disaster area, identifying survivors to be rescued while preserving its own safety. Many tradeoffs must be weighed, such as whether to exploit a known area or explore a new one. Acquired knowledge about the safety levels of certain areas, the likelihood of finding survivors, and the urgency of attention for each survivor are all factors that must be accounted for in the robot's route planning.

This applied scenario is essentially a foraging task in which beliefs about foraging sites and safety of areas can be modeled by Bayesian inference methods. In this project, I simulate small environments with different arrangements and configurations of foraging sites, and observe the effects of explorative and exploitative agents searching the environment.

Methods

The project was completed in two stages. The first stage focuses on getting the agent to effectively learn the changing danger levels in an environment. The second stage adds the element of foraging.

Experiment 1: Inferring the safety levels of different regions of the map


Figure 1

As seen in Figure 1, the environment is simulated as a square 2D grid. Gray squares indicate off-limit areas, while any other colored square is part of the valid path. Squares of increasing redness indicate the danger level of the area, a value that ranges from 1 to 100. A random destination, marked by a star, is set for the agent, marked by a dot, which starts out in the home square, marked by a dark outline and star pattern. The left side of the figure represents the true state of the world, whereas the right side represents the agent's beliefs about the world. As we see in the initial maps, the agent starts out with no knowledge of the dangers on the map, but it is already aware of the entire physical layout and able to calculate multiple routes to reach the assigned destination.

Each step in the simulation has a defined sequence of events. First, all possible routes from the current location to the destination are calculated. Then, the accumulated danger of each route is found by summing the danger values of each square involved in the route. The agent takes a weighted sum of the distance and accumulated danger of each route and chooses the ideal route to take. It takes one step along this preferred route. In the case of this small map, the danger rating is multiplied by .1 to achieve a reasonable effect, such that a path of length 3 containing a danger spot of rating 40 would amount to a route desirability score of 7. In the new location, the agent samples the area for danger and updates its beliefs about the danger rating of that square. It also does the same for all neighboring locations of the current square. If the destination is now reached, a new random destination is assigned, and the process starts over. The sequence of random destination points allows the agent to roam the entire environment for our observation.
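To make the scoring concrete, the following is a minimal MATLAB sketch of the route desirability calculation described above; the helper name route_score and its argument layout are assumptions for illustration, not part of the posted code.

% Route desirability = distance + danger_wt * accumulated danger (lower is better).
% route is an n-by-2 list of [row col] squares along the path; danger_wt = 0.1 in the small-map example.
function score = route_score(route, belief_grid, danger_wt)
    idx = sub2ind(size(belief_grid), route(:,1), route(:,2)); % linear indices of the route's squares
    accumulated_danger = sum(belief_grid(idx));               % sum of believed danger ratings along the route
    distance = size(route,1);                                 % path length in squares
    score = distance + danger_wt * accumulated_danger;
end

With danger_wt = 0.1, a route of length 3 passing through a single square of danger 40 scores 3 + 0.1 × 40 = 7, matching the example above.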

For each square, the agent holds a belief about the square's danger level, which can range from 1 to 100. The prior belief about this value is represented as a normal probability distribution over the danger level, with mean parameter μ.

The belief about μ is updated by Bayes' rule using the maximum a posteriori (MAP) estimate. The prior is represented as a normal probability distribution, as in the following equation. Initially μ0 is set to 0 and σ0 to 1.

$$ p(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\!\left( -\frac{(\mu - \mu_0)^2}{2\sigma_0^2} \right) $$

Likelihood is also represented as a normal distribution, with x as a new observation of danger rating.

$$ p(x \mid \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) $$

Hypotheses about μ are updated using Bayes rule.

$$ p(\mu \mid x) = \frac{p(x \mid \mu)\, p(\mu)}{p(x)} \propto p(x \mid \mu)\, p(\mu) $$

To calculate the maximum a posteriori estimate, we note that the likelihood is set to a distribution of N(μ, 1) while the prior has a distribution of N(μ0, σ0). The product of the two normal distributions is another normal distribution, with parameters μ and σ given by

$$ \mu_{\text{post}} = \frac{\sigma^2 \mu_0 + \sigma_0^2 \sum_{i=1}^{n} x_i}{\sigma^2 + n\sigma_0^2}, \qquad \sigma_{\text{post}}^2 = \frac{\sigma^2 \sigma_0^2}{\sigma^2 + n\sigma_0^2} $$

Since the posterior is itself normal, its mode is simply its mean μ_post, so we can plug in values to determine the updated belief. Setting σ0 to 1 for simplicity and n to 1 (a memoryless strategy), we end up with a simple average of the old belief and the new observation of danger.

$$ \mu_{\text{new}} = \frac{\mu_0 + x}{2} $$
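In MATLAB this memoryless update is one line per square. The snippet below is a minimal sketch, assuming r and c index the square being sampled and that belief_grid and real_grid are as in the code excerpt of section 2.

% Memoryless MAP update for one square (sigma = sigma0 = 1, n = 1):
% the posterior mode reduces to the average of the prior belief and the new sample.
mu0 = belief_grid(r, c);              % prior belief about the danger at square (r, c)
x   = real_grid(r, c) + randn;        % noisy observation of the true danger (sigma = 1)
belief_grid(r, c) = (mu0 + x) / 2;    % updated belief = posterior mode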

Figure 2

With this simple model of averaging, the agent is able to effectively learn the danger ratings of each area. Figure 2 shows the exploring agent at the initial stages of exploration. It has learned the danger ratings of a few areas.

 

Figure 3

After more time has passed, the entire map is learned.

 

Figure 4

The agent can adapt to new dangers in the environment. A dangerous spot added in the center is immediately detected, and its value will eventually be completely learned.

 

Figure 5

The agent also properly strategizes its route. In Figure 5, there are two reasonably short routes from the current location to the destination. After the agent has properly learned the danger mapping, it takes the slightly longer route in order to avoid crossing the highly dangerous red strip. We can see that it effectively balances distance and safety.

Experiment 2: Collecting resources on the map

 

Figure 6

The next step is to add a foraging component to the search and rescue. The stars on the map now represent resources, with darker shades of blue representing higher reward values in the range 1-100. Similar to the safety levels, the reward levels are sampled when the agent crosses a resource on its path.

 

Figure 7

The agent's new goal is to pick its own destinations, whether that means going to a new foraging site or returning home. The agent now has a health value that is decremented by 1 for each step taken and further decremented by a sample of the danger in that area. If health drops below zero, the agent dies and starts over at the home square with no recollection of past observations. The agent can only restore health by returning to the home square, but must also strive to maximize foraged rewards. An exploration parameter is also added. This parameter directly influences three things: the prior belief that the resources have high reward, the minimum health buffer that the agent maintains before heading home, and the selection of the destination. An explorative agent will simply pick a random resource to visit, whereas an exploitative agent will pick one that it has seen to provide the greatest reward.
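The posted code excerpt leaves the health update as a TODO comment, so the following is only a sketch of the bookkeeping described above; max_health and the other variable names are assumptions.

% Health bookkeeping (sketch): -1 per step plus a sample of the local danger.
danger_sample = max(0, real_grid(curr_loc(1), curr_loc(2)) + randn); % sampled local danger
health = health - 1 - danger_sample;
if health < 0
    curr_loc = home;            % the agent dies and restarts at the home square
    health   = max_health;      % with no memory of past observations
elseif isequal(curr_loc, home)
    health   = max_health;      % health is restored only at home
end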

Rather than a normal distribution, the rewards are sampled from an exponential distribution to ensure that all values are positive. The goal of the agent is then to learn the decay rate λ of the exponential distribution for each square. For ease of computation, the conjugate prior, a gamma distribution, is used to represent the prior belief about λ.

$$ p(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha-1} e^{-\beta\lambda} $$

The likelihood is represented by an exponential distribution

$$ p(x \mid \lambda) = \lambda e^{-\lambda x} $$

Hypotheses about λ are updated using Bayes' rule:

$$ p(\lambda \mid x) = \frac{p(x \mid \lambda)\, p(\lambda)}{p(x)} \propto p(x \mid \lambda)\, p(\lambda) $$

Noting that our prior has a distribution of Gamma(α,β) and our likelihood has distribution Exponential(λ), the conjugate pair of distributions results in a posterior of Gamma as well, with α and β calculated as

$$ \alpha_{\text{post}} = \alpha + n, \qquad \beta_{\text{post}} = \beta + \sum_{i=1}^{n} x_i . $$

The mode of a Gamma distribution is

$$ \text{mode} = \frac{\alpha - 1}{\beta}, \quad \alpha > 1 . $$

Therefore, plugging in the α and β of the posterior into the mode equation results in the following:

$$ \hat{\lambda} = \frac{\alpha + n - 1}{\beta + \sum_{i=1}^{n} x_i} $$
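As a minimal MATLAB sketch of this conjugate update for a single foraging site, assuming one new reward observation per visit (variable names such as real_lambda and reward_belief are illustrative, not from the posted code):

% Gamma-Exponential conjugate update for one site, one observation (n = 1).
x = -log(rand) / real_lambda;        % sample a reward from the true Exponential(real_lambda) by inverse transform
alpha = alpha + 1;                   % posterior shape: alpha + n
beta  = beta + x;                    % posterior rate: beta + sum of observed rewards
lambda_hat = (alpha - 1) / beta;     % MAP estimate of lambda (mode of the Gamma posterior)
reward_belief = 1 / lambda_hat;      % expected reward implied by the MAP decay rate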

Exploration is a value in the range (0,1) that affects three parts of the decision-making process:

1) The initial value of α in the belief. A highly explorative agent starts with an α that shapes the distribution so that its mean is very high; the agent assumes all foraging locations are very promising.

2) The health buffer that the agent maintains before deciding to head home.

3) Picking a destination. A random number is drawn in the range (0,1); if it is less than the exploration value, the agent picks a random foraging site to head towards rather than a site it already knows to be good (see the sketch below).
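A minimal sketch of the destination choice in point 3; it assumes the known foraging sites are stored as rows of [row col expected_reward] in belief_resources, a layout suggested by (but not confirmed in) the code excerpt.

% Epsilon-greedy style destination selection (exploration is the parameter above).
if rand < exploration
    pick = randi(size(belief_resources, 1));     % explore: any known site, uniformly at random
else
    [~, pick] = max(belief_resources(:, 3));     % exploit: the site with the highest believed reward
end
dest_loc = belief_resources(pick, 1:2);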

Results

For each map, five values of the exploration parameter are tested for a fixed, equal amount of time (50 steps). For each map, the number of deaths and the reward earned in each life are recorded.

Map 1 is designed for the benefit of a more explorative agent. Foraging spots of very high value are hidden within dangerous areas that an explorative agent would be more likely to reach. Map 2 is designed more to the benefit of an exploitative agent, as the hidden rewards are of very low value. Map 3 is somewhere in between 1 and 2, with some hidden foraging spots worth a lot and some worth a little.

 

Map 1: Environment beneficial for exploration

 

Map 2: Environment beneficial for exploitation

 

2 Partial Code

% Route Planning
% Iteration 1: Only learns the safety level of each location
% Runs route planning simulation
% Parameters: sigma, safety_wt, exploration?, exploitation?
% Variables: real grid, belief grid, location, destination
% 0: neutral, >0: safe, <0: unsafe, NaN: not a valid location
function bayesianNavigation()
    clear all;

    % UI functions and variables
    speed = .2;
    quit = false;

    function set_real_grid(points)
        points = round(points);
        real_grid(points(1,2),points(1,1)) = str2double(get(danger_edit,'String'));
    end
    function set_speed(sp)
        speed = sp;
    end
    function qt()
        quit = true;
    end

    % Set up figure/UI
    f = figure('Position', [100, 100, 1200, 500]);
    danger_edit = uicontrol('Parent',f,'Style','edit','Position',[60,475,40,23],'String','0');
    danger_text = uicontrol('Parent',f,'Style','text','Position',[10,475,40,23],'String','Danger Rating:');
    qt_ui = uicontrol('Parent',f,'Style','pushbutton','Position',[110,475,40,23],'String','Quit','Callback',@(~,~)qt());
    spd_ui = uicontrol('Parent',f,'Style','slider','Position',[170,473,200,23],'Min',0,'Max',10,'SliderStep',[1 1]./10,...
        'Value',5,'Callback',@(h,~)set_speed(get(h,'Value')));
    health_text = uicontrol('Parent',f,'Style','text','Position',[425,475,40,23],'String','Health: 1000');
    reward_text = uicontrol('Parent',f,'Style','text','Position',[475,475,40,23],'String','Reward: 30');

    % Set parameters
    sigma = 1;
    danger_wt = .5;

    % Set real and belief grid
    %real_grid = [0 0 50; 0 NaN 0; 0 0 0];
    %belief_grid = [0 0 0; 0 NaN 0; 0 0 0];
    real_grid = [0 0 50 0 0; 0 NaN NaN NaN 0; 0 0 0 0 20; 0 NaN NaN NaN 0; 50 0 0 0 30];
    belief_grid = [0 0 0 0 0; 0 NaN NaN NaN 0; 0 0 0 0 0; 0 NaN NaN NaN 0; 0 0 0 0 0];

    % Set home, real resources, belief resources
    home = [1 1];
    real_resources = [1 1 0];
    belief_resources = [1 1 0];

    % Set health counter
    health_counter = 1000;
    % Set reward counter
    reward_counter = 0;

    % Set current location and destination
    %curr_loc = [3 3];
    %dest_loc = [1 1];
    curr_loc = [1 1];
    dest_loc = [5 5];

    % Loop through time until whenever
    while true
        if quit
            break
        end

        % Plot and wait
        s1 = subplot(1,2,1);
        imgsc = plot_grid(real_grid,curr_loc,dest_loc,home,real_resources);
        ax = imgca;
        set(imgsc,'ButtonDownFcn',@(~,~)set_real_grid(get(ax,'CurrentPoint')),'HitTest','on')
        s2 = subplot(1,2,2);
        plot_grid(belief_grid, curr_loc, dest_loc, home, belief_resources);
        pause(1-speed/11);

        % Check whether goal has been reached
        if isequal(curr_loc,dest_loc)
            % Compare journey with optimal journey and record results
            % Set new destination
            dest_loc = new_dest(real_grid);
        else
            % Make safety observation and update belief grid
            observation = randn + real_grid(curr_loc(1),curr_loc(2));
            curr_belief = belief_grid(curr_loc(1),curr_loc(2));
            belief_grid(curr_loc(1),curr_loc(2)) = (curr_belief+observation)/2; % assume sigma = 1

            % Make safety observation and update belief grid for neighbors
            [ neighbors ] = find_neighbors(curr_loc, belief_grid);
            for i = 1:size(neighbors,1)
                observation = randn + real_grid(neighbors(i,1),neighbors(i,2));
                curr_belief = belief_grid(neighbors(i,1),neighbors(i,2));
                belief_grid(neighbors(i,1),neighbors(i,2)) = (curr_belief+observation)/2; % assume sigma = 1
            end

            % Take resource and update belief_resources
            % Update health counter

            % Calculate routes
            routes = calculate_routes(curr_loc,dest_loc,belief_grid);

            % Calculate safety belief of each route
            danger_ratings = zeros(1,size(routes,1));
            for i = 1:size(routes,1)
                route = routes{i};
                idx = sub2ind(size(belief_grid), route(:,1), route(:,2));
                danger_ratings(i) = sum(belief_grid(idx));
            end

            % Calculate distance of each route
            distances = zeros(1,size(routes,1));
            for i = 1:size(routes,1)
                route = routes{i};
                distances(i) = size(route,1);
            end

            % Calculate desired route based on safety and distance
            desirability = danger_wt .* danger_ratings + distances;
            [b,ind] = sort(desirability,'ascend');

            % Take a step in the desired direction
            curr_loc = routes{ind(1)}(2,:);
        end
    end
    % Plot difference in safety/efficiency of actual vs optimal route taken as a function of time
end

3 Simulation Results

4 References

[1] Mao Ruirui. Optimal Traffic Path Planning Based on Bayesian Networks [D]. Xi'an Polytechnic University.

About the author: experienced in MATLAB simulation of intelligent optimization algorithms, neural network prediction, signal processing, cellular automata, image processing, path planning, UAVs, and related fields. Questions about the MATLAB code are welcome via private message.

Parts of the theory are drawn from online sources; please contact the author for removal in case of any infringement.

