[Path Planning] Warehouse Robot Patrol Path Planning Based on a Bayesian Network, with MATLAB Code

1 Introduction

Introduction

In a search and rescue scenario, a robot must navigate through a disaster area, identifying survivors to be rescued while preserving its own safety. Many tradeoffs must be weighed, such as whether to exploit a known area or explore a new one. Acquired knowledge about the safety levels of certain areas, the likelihood of finding survivors, and the urgency of attention each survivor needs are all factors that must be accounted for in the robot's route planning.

This applied scenario is essentially a foraging task in which beliefs about foraging sites and safety of areas can be modeled by Bayesian inference methods. In this project, I simulate small environments with different arrangements and configurations of foraging sites, and observe the effects of explorative and exploitative agents searching the environment.

Methods

The project was completed in two stages. The first focuses on getting the agent to learn the changing danger levels in an environment effectively. The second adds the element of foraging.

Experiment 1: Inferring the safety levels of different regions of the map


Figure 1

As seen in Figure 1, the environment is simulated as a square 2D grid. Gray squares indicate off-limit areas, while any other colored square is part of the valid path. Increasingly red squares indicate the danger level of the area, a value ranging from 1 to 100. A random destination, marked by a star, is set for the agent, marked by a dot, which starts out in the home square, marked by a dark outline and star pattern. The left side of the figure shows the true state of the world, whereas the right side shows the agent's beliefs about the world. As the initial maps show, the agent starts out with no knowledge of the dangers on the map, but it is already aware of the entire physical layout and can calculate multiple routes to the assigned destination.

Each step of the simulation has a defined sequence of events. First, all possible routes from the current location to the destination are calculated. Then the accumulated danger of each route is found by summing the danger values of every square on the route. The agent scores each route as a weighted sum of its distance and accumulated danger, chooses the best one, and takes one step along it. On this small map the danger rating is weighted by 0.1 to achieve a reasonable effect, so a route of length 3 containing a single danger spot of rating 40 has a desirability score of 3 + 0.1 × 40 = 7. In the new location, the agent samples the area for danger and updates its belief about that square's danger rating; it does the same for all neighboring squares. If the destination has been reached, a new random destination is assigned and the process starts over. The sequence of random destinations lets the agent roam the entire environment for our observation.
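As a concrete illustration of this scoring step, the sketch below scores each candidate route as distance plus weighted accumulated danger and steps along the best one. The helper name choose_route is illustrative; routes, belief_grid, and danger_wt mirror the code excerpt in Section 2.

function [best_route, next_loc] = choose_route(routes, belief_grid, danger_wt)
% Score each candidate route as distance + danger_wt * accumulated believed danger,
% then pick the lowest-scoring route and return the next square along it.
desirability = zeros(1, numel(routes));
for k = 1:numel(routes)
    route = routes{k};                                            % N-by-2 list of [row col] waypoints
    idx = sub2ind(size(belief_grid), route(:,1), route(:,2));     % linear indices of route squares
    desirability(k) = size(route, 1) + danger_wt * sum(belief_grid(idx));
end
[~, best] = min(desirability);
best_route = routes{best};
next_loc = best_route(2, :);   % one step ahead (assumes the route has at least two waypoints)
end

With danger_wt = 0.1, a length-3 route crossing a single square of danger 40 scores 3 + 0.1 × 40 = 7, matching the example above.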

For each square, the agent holds a belief about the square's danger level, which can range from 1 to 100. This belief is represented as a normal probability distribution over the danger level, with the danger level playing the role of the mean parameter μ.

The belief about μ is updated by Bayes' rule using the maximum a posteriori (MAP) estimate. The prior is represented as a normal probability distribution, as in the following equation; initially μ₀ is set to 0 and σ₀ to 1.

$$p(\mu) = \frac{1}{\sqrt{2\pi\sigma_0^{2}}} \exp\!\left(-\frac{(\mu-\mu_0)^{2}}{2\sigma_0^{2}}\right)$$

The likelihood is also represented as a normal distribution, with x being a new observation of the danger rating (σ = 1):

$$p(x \mid \mu) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$

Hypotheses about μ are updated using Bayes' rule:

$$p(\mu \mid x) = \frac{p(x \mid \mu)\, p(\mu)}{p(x)} \propto p(x \mid \mu)\, p(\mu)$$

To calculate the maximum a posteriori estimate, note that the likelihood is the distribution N(μ, 1) while the prior is N(μ₀, σ₀²). The product of two normal densities is proportional to another normal density; after n observations x₁, …, xₙ its parameters are

$$\mu' = \frac{\sigma^{2}\mu_0 + \sigma_0^{2}\sum_{i=1}^{n} x_i}{\sigma^{2} + n\sigma_0^{2}}, \qquad \sigma'^{2} = \frac{\sigma^{2}\sigma_0^{2}}{\sigma^{2} + n\sigma_0^{2}}$$

Since the posterior is normal, its mode is simply μ′, so the MAP estimate can be read off directly. Setting σ₀ to 1 for simplicity and n to 1 (a memoryless strategy), and recalling σ = 1, the update reduces to a simple average of the old belief and the new observation:

$$\mu' = \frac{\mu_0 + x}{2}$$
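In code, this update is a single averaging step per square. The sketch below wraps it in a hypothetical helper; the same averaging appears inline in the Section 2 excerpt, where real_grid holds the true danger values and belief_grid the agent's beliefs.

function belief_grid = update_danger_belief(belief_grid, real_grid, r, c)
% Memoryless MAP update (sigma = sigma0 = 1, n = 1): average the old belief for
% square (r, c) with a fresh noisy observation of its true danger.
observation = real_grid(r, c) + randn;                      % noisy danger sample
belief_grid(r, c) = (belief_grid(r, c) + observation) / 2;  % new belief = average of old belief and sample
end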

Figure 2

With this simple model of averaging, the agent is able to effectively learn the danger ratings of each area. Figure 2 shows the exploring agent at the initial stages of exploration. It has learned the danger ratings of a few areas.

 

Figure 3

After more time has passed, the entire map is learned.

 

Figure 4

The agent can adapt to new dangers in the environment. A dangerous spot added at the center is detected immediately, and its value is eventually learned completely.

 

Figure 5

The agent also properly strategizes its route. In Figure 5, there are two reasonably short routes from the current location to the destination. After the agent has properly learned the danger mapping, it takes the slightly longer route in order to avoid crossing the highly dangerous red strip. We can see that it effectively balances distance against safety.

Experiment 2: Collecting resources on the map

 

Figure 6

The next step is to add a foraging component to the search and rescue. The stars on the map now represent resources, with darker shades of blue representing higher reward values in the range 1-100. As with the safety levels, the reward level is sampled whenever the agent crosses a resource on its path.

 

Figure 7

The agent's new goal is to pick its own destinations, whether that means heading to a new foraging site or returning home. The agent now has a health value that is decremented by 1 for each step of distance and further decremented by a sample of the danger in the current area. If health drops below zero, the agent dies and starts over at the home square with no recollection of past observations. The agent can only restore health by returning to the home square, but must also strive to maximize foraged rewards. An exploration parameter is also added. This parameter directly influences three things: the prior belief that resources have high reward, the minimum health buffer the agent maintains before heading home, and the selection of the destination. An explorative agent will simply pick a random resource to visit, whereas an exploitative agent will pick the one it has observed to provide the greatest reward.
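The health bookkeeping is only marked by a comment in the Section 2 excerpt, so the following is a hypothetical sketch of how one step of it could look; the reset value of 1000 matches the excerpt's initial health counter, and the rest follows the description above.

function [health, curr_loc, belief_grid] = update_health(health, curr_loc, home, real_grid, belief_grid)
% Hypothetical health bookkeeping for one step: lose 1 point per step plus a sample
% of the local danger; dying resets the agent to home with its beliefs wiped, and
% reaching home restores health (reset value of 1000 assumed from the Section 2 UI text).
danger_sample = max(0, real_grid(curr_loc(1), curr_loc(2)) + randn);  % sampled local danger, clipped at 0 for this sketch
health = health - 1 - danger_sample;
if health < 0
    curr_loc = home;                          % the agent dies and restarts at home
    health = 1000;                            % assumed starting health
    belief_grid(~isnan(belief_grid)) = 0;     % no recollection of past observations
elseif isequal(curr_loc, home)
    health = 1000;                            % health can only be restored at home
end
end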

Rather than a normal distribution, the rewards are sampled from an exponential distribution to ensure that all values are positive. The goal of the agent is then to learn the rate λ of the exponential distribution for each square. For ease of computation, the conjugate prior, a gamma distribution, is used to represent the prior belief about λ:

$$p(\lambda) = \mathrm{Gamma}(\lambda \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\lambda^{\alpha-1} e^{-\beta\lambda}$$

The likelihood is represented by an exponential distribution:

$$p(x \mid \lambda) = \lambda e^{-\lambda x}, \qquad x \ge 0$$

Hypotheses about λ are updated using Bayes' rule:

$$p(\lambda \mid x) \propto p(x \mid \lambda)\, p(\lambda)$$

Noting that our prior has distribution Gamma(α, β) and our likelihood has distribution Exponential(λ), this conjugate pair yields a Gamma posterior as well, with parameters

$$\alpha' = \alpha + n, \qquad \beta' = \beta + \sum_{i=1}^{n} x_i .$$

The mode of a Gamma(α, β) distribution (for α ≥ 1) is

$$\frac{\alpha - 1}{\beta}.$$

Therefore, plugging the posterior's α' and β' into the mode equation gives the maximum a posteriori estimate of λ:

$$\hat{\lambda} = \frac{\alpha + n - 1}{\beta + \sum_{i=1}^{n} x_i}$$
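A minimal sketch of this reward-belief update for a single foraging site, assuming each site keeps its own (α, β) pair; the function and variable names are illustrative, as this part is not included in the Section 2 excerpt.

function [alpha, beta, lambda_map] = update_reward_belief(alpha, beta, reward_obs)
% Hypothetical conjugate Gamma-Exponential update for one foraging site.
% One new reward observation: alpha' = alpha + 1, beta' = beta + x.
alpha = alpha + 1;                    % n = 1 new observation
beta  = beta + reward_obs;            % add the observed reward
lambda_map = (alpha - 1) / beta;      % MAP estimate: mode of Gamma(alpha, beta), valid for alpha >= 1
end

Since the mean of an Exponential(λ) distribution is 1/λ, a smaller estimated λ corresponds to a site believed to yield larger rewards.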

Exploration is a value in the range (0, 1) that affects three parts of the decision-making process:

1) The initial value of α in the belief. A highly explorative agent starts with an α that shapes the distribution so that its mean is very high; the agent thinks all foraging locations are very promising.

2) The safety-net value of health that the agent maintains before deciding to head home.

3) Picking a destination. A random number is drawn from (0, 1); if it is less than the exploration value, the agent picks a random foraging site to head towards rather than a site it already knows to be good (see the sketch after this list).
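A minimal sketch of this destination-selection rule, assuming belief_resources stores one row [row col believed_reward] per known foraging site (as in the Section 2 excerpt) and exploration is the parameter described above; the helper name is illustrative.

function dest_loc = pick_destination(belief_resources, exploration)
% Illustrative destination selection driven by the exploration parameter.
% belief_resources: one row [row col believed_reward] per foraging site.
if rand < exploration
    pick = randi(size(belief_resources, 1));       % explore: a random foraging site
else
    [~, pick] = max(belief_resources(:, 3));       % exploit: the best-believed site
end
dest_loc = belief_resources(pick, 1:2);
end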

Results

For each map, five settings of the exploration parameter are tested for a fixed, equal amount of time (50 steps). For each map, the number of deaths and the reward earned in each life are recorded.

Map 1 is designed to benefit a more explorative agent: foraging spots of very high value are hidden within dangerous areas that an explorative agent is more likely to reach. Map 2 is designed to benefit an exploitative agent, as the hidden rewards have very low value. Map 3 lies somewhere between 1 and 2, with some hidden foraging spots worth a lot and some worth a little.

 

Map 1: Environment beneficial for exploration

 

Map 2: Environment beneficial for exploitation

 

2 Code Excerpt

%Route Planning
%Iteration 1: Only learns the safety level of each location
%Runs route planning simulation
%Parameters: sigma, danger_wt, exploration?, exploitation?
%Variables: real grid, belief grid, location, destination
%Grid values: 0 = neutral, larger values = more dangerous, NaN = not a valid location
function bayesianNavigation()
    clear all;

    %UI functions and variables
    speed = .2;
    quit = false;

    function set_real_grid(points)
        points = round(points);
        real_grid(points(1,2),points(1,1)) = str2double(get(danger_edit,'String'));
    end
    function set_speed(sp)
        speed = sp;
    end
    function qt()
        quit = true;
    end

    %Set up figure/UI
    f = figure('Position', [100, 100, 1200, 500]);
    danger_edit = uicontrol('Parent',f,'Style','edit','Position',[60,475,40,23],'String','0');
    danger_text = uicontrol('Parent',f,'Style','text','Position',[10,475,40,23],'String','Danger Rating:');
    qt_ui = uicontrol('Parent',f,'Style','pushbutton','Position',[110,475,40,23],'String','Quit','Callback',@(~,~)qt());
    spd_ui = uicontrol('Parent',f,'Style','slider','Position',[170,473,200,23],'Min',0,'Max',10,'SliderStep',[1 1]./10,...
        'Value',5,'Callback',@(h,~)set_speed(get(h,'Value')));
    health_text = uicontrol('Parent',f,'Style','text','Position',[425,475,40,23],'String','Health: 1000');
    reward_text = uicontrol('Parent',f,'Style','text','Position',[475,475,40,23],'String','Reward: 30');

    %Set parameters
    sigma = 1;
    danger_wt = .5;

    %Set real and belief grid
    %real_grid = [0 0 50; 0 NaN 0; 0 0 0];
    %belief_grid = [0 0 0; 0 NaN 0; 0 0 0];
    real_grid = [0 0 50 0 0; 0 NaN NaN NaN 0; 0 0 0 0 20; 0 NaN NaN NaN 0; 50 0 0 0 30];
    belief_grid = [0 0 0 0 0; 0 NaN NaN NaN 0; 0 0 0 0 0; 0 NaN NaN NaN 0; 0 0 0 0 0];

    %Set home, real resources, belief resources
    home = [1 1];
    real_resources = [1 1 0];
    belief_resources = [1 1 0];

    %Set health counter
    health_counter = 1000;
    %Set reward counter
    reward_counter = 0;

    %Set current location and destination
    %curr_loc = [3 3];
    %dest_loc = [1 1];
    curr_loc = [1 1];
    dest_loc = [5 5];

    %Loop through time
    while true
        if quit
            break
        end

        %Plot and wait
        s1 = subplot(1,2,1);
        imgsc = plot_grid(real_grid,curr_loc,dest_loc,home,real_resources);
        ax = imgca;
        set(imgsc,'ButtonDownFcn',@(~,~)set_real_grid(get(ax,'CurrentPoint')),'HitTest','on')
        s2 = subplot(1,2,2);
        plot_grid(belief_grid, curr_loc, dest_loc, home, belief_resources);
        pause(1-speed/11);

        %Check whether the goal has been reached
        if isequal(curr_loc,dest_loc)
            %Compare journey with optimal journey and record results
            %Set new destination
            dest_loc = new_dest(real_grid);
        else
            %Make safety observation and update belief grid
            observation = randn + real_grid(curr_loc(1),curr_loc(2));
            curr_belief = belief_grid(curr_loc(1),curr_loc(2));
            belief_grid(curr_loc(1),curr_loc(2)) = (curr_belief+observation)/2; %assume sigma = 1

            %Make safety observation and update belief grid for neighbors
            [ neighbors ] = find_neighbors(curr_loc, belief_grid);
            for i = 1:size(neighbors,1)
                observation = randn + real_grid(neighbors(i,1),neighbors(i,2));
                curr_belief = belief_grid(neighbors(i,1),neighbors(i,2));
                belief_grid(neighbors(i,1),neighbors(i,2)) = (curr_belief+observation)/2; %assume sigma = 1
            end

            %Take resource and update belief_resources
            %Update health counter

            %Calculate routes
            routes = calculate_routes(curr_loc,dest_loc,belief_grid);

            %Calculate safety belief of each route
            danger_ratings = zeros(1,size(routes,1));
            for i = 1:size(routes,1)
                route = routes{i};
                idx = sub2ind(size(belief_grid), route(:,1), route(:,2));
                danger_ratings(i) = sum(belief_grid(idx));
            end

            %Calculate distance of each route
            distances = zeros(1,size(routes,1));
            for i = 1:size(routes,1)
                route = routes{i};
                distances(i) = size(route,1);
            end

            %Calculate desired route based on safety and distance
            desirability = danger_wt .* danger_ratings + distances;
            [b,ind] = sort(desirability,'ascend');

            %Take a step in the desired direction
            curr_loc = routes{ind(1)}(2,:);
        end
    end
    %Plot difference in safety/efficiency of actual vs optimal route taken as a function of time
end

3 Simulation Results

4 References

[1]毛蕊蕊. 基于贝叶斯网络的最佳交通路径规划[D]. 西安工程大学.

About the author: experienced in MATLAB simulation across intelligent optimization algorithms, neural network prediction, signal processing, cellular automata, image processing, path planning, UAVs, and related fields; questions about the accompanying MATLAB code are welcome via private message.

Parts of the theory cite online literature; if any infringement is involved, please contact the author for removal.

