Internet of Vehicles – Window Querying with Alibaba Cloud RDS for PostgreSQL-阿里云开发者社区

Internet of Vehicles – Window Querying with Alibaba Cloud RDS for PostgreSQL

2018-03-09 2065

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

云原生数据库 PolarDB 分布式版，标准版 2核8GB

云原生数据库 PolarDB PostgreSQL 版，标准版 2核4GB 50GB

云原生数据库 PolarDB MySQL 版，通用型 2核4GB 50GB

简介： Internet of Vehicles (IoV) is a popular topic of research in the field of IoT. One of the biggest issues facing IoV is collecting vehicle's travel tracks in real time.

Background

Internet of Vehicles (IoV) is one of the hottest topic of research in the field of Internet of Things (IoT). A typical scenario for Internet of Vehicles applications is collecting vehicle's travel tracks, but tracks of vehicles are usually not reported in real time. Several track records may be accumulated or reported at intervals.

A typical data structure is as follows:

(car_id, pos geometry, crt_time timestamp)

Heavy traffic and traffic signals often occur during the vehicle traveling process. The reported track records may be as follows:

1, position 1, '2017-01-01 12:00:00'  
1, position 1, '2017-01-01 12:00:05 PM'  
1, position 1, '2017-01-01 12:00:10 PM'  
1, position 1, '2017-01-01 12:00:15 PM'  
1, position 1, '2017-01-01 12:00:20 PM'  
1, position 2, '2017-01-01 12:00:30 PM'

That is, multiple records in the same position may be uploaded due to heavy traffic or traffic signals.

Therefore, there is a requirement to clean unnecessary wait records from the database. We keep at most two records for a given point, indicating arrival and departure.

This operation can be executed via window function.

Surely, in terms of providing the best efficiency, it is more reasonable to clean tracks from a terminal. Only two records will be kept for the start point of a position.

Implementation Example

1.Design a table structure

create table car_trace (cid int, pos point, crt_time timestamp);

2.Generate 10 million pieces of test data and assume there are 1,000 vehicles (in order to make it easy to repeat the data and test the effect, 25 points are used at the position).

insert into car_trace select random()*999, point((random()*5)::int, (random()*5)::int), clock_timestamp() from generate_series(1,10000000);

3.Create an index

create index idx_car on car_trace (cid, crt_time);

4.Query a data layout

select * from car_trace where cid=1 order by crt_time limit 1000;  

  
   1 | (3,1) | 2017-07-22 21:30:09.84984  
   1 | (1,4) | 2017-07-22 21:30:09.850297  
   1 | (1,4) | 2017-07-22 21:30:09.852586  
   1 | (1,4) | 2017-07-22 21:30:09.854155  
   1 | (1,4) | 2017-07-22 21:30:09.854425  
   1 | (3,1) | 2017-07-22 21:30:09.854493

As shown on the list, several pieces of data are repetitive.

5.Filter the records at a single position via the window. At most two records for arrival and departure from the position will be kept.

Two window functions are used here: lead and lag. Lag indicates the previous record of the current record and Lead indicates the next record of the current record.

The method of determining arrival and departure points is as follows:

•The current position is not equal to the previous position, indicating that the record is the current position's arrival point.

•The current position is not equal to the next position, indicating that the record is the current position's leaving point.

•The previous position is empty, indicating that the current record is the first record.

•The next position is empty, indicating that the current record is the last record.

select * from   
(  
select   
  *,   
  lag(pos) over (partition by cid order by crt_time) as lag,   
  lead(pos) over (partition by cid order by crt_time) as lead   
from car_trace   
  where cid=1   
  and crt_time between '2017-07-22 21:30:09.83994' and '2017-07-22 21:30:09.859735'  
) t  
  where pos <> lag  
  or pos <> lead  
  or lag is null  
  or lead is null;  
  
 cid |  pos  |          crt_time          |  lag  | lead    
-----+-------+----------------------------+-------+-------  
   1 | (2,1) | 2017-07-22 21:30:09.83994  |       | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.839953 | (2,1) | (5,2)  
   1 | (5,2) | 2017-07-22 21:30:09.840704 | (3,1) | (4,4)  
   1 | (4,4) | 2017-07-22 21:30:09.84179  | (5,2) | (5,2)  
   1 | (5,2) | 2017-07-22 21:30:09.843787 | (4,4) | (1,5)  
   1 | (1,5) | 2017-07-22 21:30:09.844165 | (5,2) | (0,5)  
   1 | (0,5) | 2017-07-22 21:30:09.84536  | (1,5) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.845896 | (0,5) | (3,3)  
   1 | (3,3) | 2017-07-22 21:30:09.846958 | (4,1) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.84984  | (3,3) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.850297 | (3,1) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.854425 | (1,4) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.854493 | (1,4) | (3,2)  
   1 | (3,2) | 2017-07-22 21:30:09.854541 | (3,1) | (2,0)  
   1 | (2,0) | 2017-07-22 21:30:09.855297 | (3,2) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.857592 | (2,0) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.857595 | (4,1) | (0,4)  
   1 | (0,4) | 2017-07-22 21:30:09.857597 | (4,1) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.858996 | (0,4) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.859735 | (3,1) |   
(20 rows)

When track cleaning is not performed, the obtained results are as follows:

select   
  *,   
  lag(pos) over (partition by cid order by crt_time) as lag,   
  lead(pos) over (partition by cid order by crt_time) as lead   
from car_trace   
  where cid=1   
  and crt_time between '2017-07-22 21:30:09.83994' and '2017-07-22 21:30:09.859735';  
  
 cid |  pos  |          crt_time          |  lag  | lead    
-----+-------+----------------------------+-------+-------  
   1 | (2,1) | 2017-07-22 21:30:09.83994  |       | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.839953 | (2,1) | (5,2)  
   1 | (5,2) | 2017-07-22 21:30:09.840704 | (3,1) | (4,4)  
   1 | (4,4) | 2017-07-22 21:30:09.84179  | (5,2) | (5,2)  
   1 | (5,2) | 2017-07-22 21:30:09.843787 | (4,4) | (1,5)  
   1 | (1,5) | 2017-07-22 21:30:09.844165 | (5,2) | (0,5)  
   1 | (0,5) | 2017-07-22 21:30:09.84536  | (1,5) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.845896 | (0,5) | (3,3)  
   1 | (3,3) | 2017-07-22 21:30:09.846958 | (4,1) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.84984  | (3,3) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.850297 | (3,1) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.852586 | (1,4) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.854155 | (1,4) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.854425 | (1,4) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.854493 | (1,4) | (3,2)  
   1 | (3,2) | 2017-07-22 21:30:09.854541 | (3,1) | (2,0)  
   1 | (2,0) | 2017-07-22 21:30:09.855297 | (3,2) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.857592 | (2,0) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.857595 | (4,1) | (0,4)  
   1 | (0,4) | 2017-07-22 21:30:09.857597 | (4,1) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.858996 | (0,4) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.859735 | (3,1) |   
(22 rows)

Lag and lead are used to clear records in the stay process.

Database Optimization for IoV

In a typical scenario, a lot of vehicle IDs are involved in the business. Data gathered for different vehicles is usually written into a database. If no optimization is made, after entering the database, the data of different vehicles may be staggered. That is, the data of different vehicles may be stored in one data block.

A lot of data blocks will be scanned (scanning IO amplification) when the track of a single vehicle is queried. There are two optimization methods to speed up the querying process.

1.Write into the database after the business end gathers grouping and sorting.

For example, after receiving the data submitted by the vehicle terminal, the program groups vehicle IDs, sorts them by time, and writes them into the database (using insert into tbl values (),(),...();). In this way, the data of the same vehicle will fall into the same data block as much as is possible.

2.Use a partition to reorganize the data in the database.

For example, we can store data based on the vehicle ID, every vehicle, or vehicle HASH partition.

The two methods both relate to reorganizing the data based on query requirements to achieve the purpose of decreasing scanning IO.

Internet of Vehicles – Window Querying with Alibaba Cloud RDS for PostgreSQL

Background

Implementation Example

Database Optimization for IoV

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像