PostgreSQL 内核扩展之 - 管理十亿级3D扫描数据(基于Lidar产生的point cloud数据)-阿里云开发者社区

背景知识

还记得成龙演的那部《十二生肖》里用3D扫描和打印技术复制的生肖吗？
screenshot
3D打印是近几年兴起的一种技术，除了存储 物体表面的位置信息，还有颜色，密度等信息 。
而3D扫描其实在军用领域很早以前就有了。
如果使用普通的数据库来存储，得把这些属性拆开来存。
而在PostgreSQL中，你完全不需要把这些属性拆开，他们本来就是一体的，用好PG的扩展接口就好了。
PostgreSQL 扩展指南：
https://yq.aliyun.com/articles/55981

什么是Lidar

3D扫描的基础知识，来自百度百科
LiDAR——Light Detection And Ranging，即激光探测与测量。
是利用GPS（Global Position System）和IMU（Inertial Measurement Unit，惯性测量装置）机载激光扫描。
其所测得的数据为数字表面模型(Digital Surface Model, DSM)的离散点表示，数据中含有 空间三维信息 和 激光强度 信息。
应用分类(Classification)技术在这些原始数字表面模型中移除建筑物、人造物、覆盖植物等测点，即可获得数字高程模型(Digital Elevation Model, DEM)，并同时得到地面覆盖物的高度。

机载LIDAR技术在国外的发展和应用已有十几年的历史，但是我国在这方面的研究和应用还只是刚刚起步,其中利用航空激光扫描探测数据进行困难地区DEM、DOM、DLG数据产品生产是当今的研究热点之一。
该技术在地形测绘、环境检测、三维城市建模等诸多领域具有广阔的发展前景和应用需求，有可能为测绘行业带来一场新的技术革命。
针对不同的应用领域及成果要求，结合灵活的搭载方式，LiDAR技术可以广泛应用于基础测绘、道路工程、电力电网、水利、石油管线、海岸线及海岛礁、数字城市等领域，提供高精度、大比例尺（1:500至1:10000）的空间数据成果。

例子
激光扫描设备装置可记录一个单发射脉冲返回的首回波﹑中间多个回波与最后回波，通过对每个回波时刻记录，可同时获得多个高程信息，将IMU/DGPS系统和激光扫描技术进行集成，飞机向前飞行时，扫描仪横向对地面发射连续的激光束，同时接受地面反射回波，IMU/DGPS系统记录每一个激光发射点的瞬间空间位置和姿态，从而可计算得到激光反射点的空间位置。
b03533fa828ba61e24f4d8e14134970a314e59e1

激光雷达的激光束扫瞄机场的滑道，探测飞机将遇到的风切变，即逆风的改变。
023b5bb5c9ea15ce520c2114b6003af33b87b2c9

什么是point cloud

https://en.wikipedia.org/wiki/Point_cloud

前面提到了LIDAR扫描的数据包括了位置信息，以及其他传感器在对应这个位置采集的其他信息如 RGB、时间、温度、湿度等等（你可以大开脑洞想，什么都有）。
也就是说，每一个点都包含了大量的信息。

The variables captured by LIDAR sensors varies by sensor and capture process.   
Some data sets might contain only X/Y/Z values.   
Others will contain dozens of variables: X, Y, Z; intensity and return number;   
red, green, and blue values; return times; and many more.

        
          
        
        
        
          
          AI 代码解读

point cloud则是很多点的集合，所以信息量是非常庞大的。

A point cloud is a set of data points in some coordinate system.  

In a three-dimensional coordinate system, these points are usually defined by X, Y, and Z coordinates, and often are intended to represent the external surface of an object.  

Point clouds may be created by 3D scanners. These devices measure a large number of points on an object's surface, and often output a point cloud as a data file.   
The point cloud represents the set of points that the device has measured.

        
          
        
        
        
          
          AI 代码解读

point cloud如何存入PostgreSQL

PostgreSQL 存储的point cloud格式与PDAL（Point Data Abstraction Library）库一致。
非常好的复用了PDAL的数据结构，操作方法等。
因为PDAL的通用性，所以兼容性也非常的好。

PostgreSQL Pointcloud deals with all this variability by using a "schema document" to describe the contents of any particular LIDAR point.   
Each point contains a number of dimensions, and each dimension can be of any data type, with scaling and/or offsets applied to move between the actual value and the value stored in the database.   
The schema document format used by PostgreSQL Pointcloud is the same one used by the PDAL library.

        
          
        
        
        
          
          AI 代码解读

PostgreSQL Point Cloud支持的对象类型

PcPoint

基本类型，样例

{  
    "pcid" : 1,  
      "pt" : [0.01, 0.02, 0.03, 4]  
}

        
          
        
        
        
          
          AI 代码解读

PcPatch

PcPoint的聚合类型，一个PcPatch存储的是一些临近PcPoint的聚合。

The structure of database storage is such that storing billions of points as individual records in a table is not an efficient use of resources. 
Instead, we collect a group of PcPoint into a PcPatch. Each patch should hopefully contain points that are near together.

        
          
        
        
        
          
          AI 代码解读

样例

{  
    "pcid" : 1,  
     "pts" : [  
              [0.02, 0.03, 0.05, 6],  
              [0.02, 0.03, 0.05, 8]  
             ]  
}

        
          
        
        
        
          
          AI 代码解读

PostgreSQL Point Cloud支持的Functions

PC_MakePoint(pcid integer, vals float8[]) returns pcpoint

构造pcpoint

SELECT PC_MakePoint(1, ARRAY[-127, 45, 124.0, 4.0]);  
010100000064CEFFFF94110000703000000400  

INSERT INTO points (pt)  
SELECT PC_MakePoint(1, ARRAY[x,y,z,intensity])  
FROM (  
  SELECT    
  -127+a/100.0 AS x,   
    45+a/100.0 AS y,  
         1.0*a AS z,  
          a/10 AS intensity  
  FROM generate_series(1,100) AS a  
) AS values;  
pcid是唯一标示一个点的ID

        
          
        
        
        
          
          AI 代码解读

PC_AsText(p pcpoint) returns text

pcpoint转换成人类可读的文本

SELECT PC_AsText('010100000064CEFFFF94110000703000000400'::pcpoint);  

{"pcid":1,"pt":[-127,45,124,4]}

        
          
        
        
        
          
          AI 代码解读

PC_PCId(p pcpoint) returns integer (from 1.1.0)

取pcid

SELECT PC_PCId('010100000064CEFFFF94110000703000000400'::pcpoint));  

1

        
          
        
        
        
          
          AI 代码解读

PC_AsBinary(p pcpoint) returns bytea

将pcpoint转换为OGC WKB编码的二进制

SELECT PC_AsBinary('010100000064CEFFFF94110000703000000400'::pcpoint);  

\x01010000800000000000c05fc000000000008046400000000000005f40

        
          
        
        
        
          
          AI 代码解读

PC_Get(pt pcpoint, dimname text) returns numeric

读取pcpoint包含的维度(指定属性)信息

SELECT PC_Get('010100000064CEFFFF94110000703000000400'::pcpoint, 'Intensity');  

4

        
          
        
        
        
          
          AI 代码解读

PC_Get(pt pcpoint) returns float8[] (from 1.1.0)

读取pcpoint的属性，返回ARRAY

SELECT PC_Get('010100000064CEFFFF94110000703000000400'::pcpoint);  

{-127,45,124,4}

        
          
        
        
        
          
          AI 代码解读

PC_Patch(pts pcpoint[]) returns pcpatch

把多个pcpoint聚合成一个pcpatch

INSERT INTO patches (pa)  
SELECT PC_Patch(pt) FROM points GROUP BY id/10;

        
          
        
        
        
          
          AI 代码解读

PC_NumPoints(p pcpatch) returns integer

pcpatch中包含多少个pcpoint

SELECT PC_NumPoints(pa) FROM patches LIMIT 1;  

9

        
          
        
        
        
          
          AI 代码解读

PC_PCId(p pcpatch) returns integer (from 1.1.0)

返回pcpatch包含的pcpoint的pcid

SELECT PC_PCId(pa) FROM patches LIMIT 1;  

1

        
          
        
        
        
          
          AI 代码解读

PC_Envelope(p pcpatch) returns bytea

将pcpatch转换为OGC WKB编码的二进制

SELECT PC_Envelope(pa) FROM patches LIMIT 1;  

\x0103000000010000000500000090c2f5285cbf5fc0e17a  
14ae4781464090c2f5285cbf5fc0ec51b81e858b46400ad7  
a3703dba5fc0ec51b81e858b46400ad7a3703dba5fc0e17a  
14ae4781464090c2f5285cbf5fc0e17a14ae47814640

        
          
        
        
        
          
          AI 代码解读

PC_AsText(p pcpatch) returns text

将pcpatch转换为文本

SELECT PC_AsText(pa) FROM patches LIMIT 1;  

{"pcid":1,"pts":[  
 [-126.99,45.01,1,0],[-126.98,45.02,2,0],[-126.97,45.03,3,0],  
 [-126.96,45.04,4,0],[-126.95,45.05,5,0],[-126.94,45.06,6,0],  
 [-126.93,45.07,7,0],[-126.92,45.08,8,0],[-126.91,45.09,9,0]  
]}

        
          
        
        
        
          
          AI 代码解读

PC_Summary(p pcpatch) returns text (from 1.1.0)

返回json格式的文本

SELECT PC_Summary(pa) FROM patches LIMIT 1;  

{"pcid":1, "npts":9, "srid":4326, "compr":"dimensional","dims":[{"pos":0,"name":"X","size":4,"type":"int32_t","compr":"sigbits","stats":{"min":-126.99,"max":-126.91,"avg":-126.95}},{"pos":1,"name":"Y","size":4,"type":"int32_t","compr":"sigbits","stats":{"min":45.01,"max":45.09,"avg":45.05}},{"pos":2,"name":"Z","size":4,"type":"int32_t","compr":"sigbits","stats":{"min":1,"max":9,"avg":5}},{"pos":3,"name":"Intensity","size":2,"type":"uint16_t","compr":"rle","stats":{"min":0,"max":0,"avg":0}}]}

        
          
        
        
        
          
          AI 代码解读

PC_Uncompress(p pcpatch) returns pcpatch

解压pcpatch

SELECT PC_Uncompress(pa) FROM patches   
   WHERE PC_NumPoints(pa) = 1;  

01010000000000000001000000C8CEFFFFF8110000102700000A00

        
          
        
        
        
          
          AI 代码解读

PC_Union(p pcpatch[]) returns pcpatch

多个pcpatch聚合为一个pcpatch

-- Compare npoints(sum(patches)) to sum(npoints(patches))  
SELECT PC_NumPoints(PC_Union(pa)) FROM patches;  
SELECT Sum(PC_NumPoints(pa)) FROM patches;  

100

        
          
        
        
        
          
          AI 代码解读

PC_Intersects(p1 pcpatch, p2 pcpatch) returns boolean

判断两个pcpatch是否有交叉

-- Patch should intersect itself  
SELECT PC_Intersects(  
         '01010000000000000001000000C8CEFFFFF8110000102700000A00'::pcpatch,  
         '01010000000000000001000000C8CEFFFFF8110000102700000A00'::pcpatch);  

t

        
          
        
        
        
          
          AI 代码解读

PC_Explode(p pcpatch) returns SetOf[pcpoint]

将pcpatch转换为pcpoint

SELECT PC_AsText(PC_Explode(pa)), id   
FROM patches WHERE id = 7;  

              pc_astext               | id   
--------------------------------------+----  
 {"pcid":1,"pt":[-126.5,45.5,50,5]}   |  7  
 {"pcid":1,"pt":[-126.49,45.51,51,5]} |  7  
 {"pcid":1,"pt":[-126.48,45.52,52,5]} |  7  
 {"pcid":1,"pt":[-126.47,45.53,53,5]} |  7  
 {"pcid":1,"pt":[-126.46,45.54,54,5]} |  7  
 {"pcid":1,"pt":[-126.45,45.55,55,5]} |  7  
 {"pcid":1,"pt":[-126.44,45.56,56,5]} |  7  
 {"pcid":1,"pt":[-126.43,45.57,57,5]} |  7  
 {"pcid":1,"pt":[-126.42,45.58,58,5]} |  7  
 {"pcid":1,"pt":[-126.41,45.59,59,5]} |  7

        
          
        
        
        
          
          AI 代码解读

PC_PatchAvg(p pcpatch, dimname text) returns numeric

求pcpatch中包含的某个维度的信息的平均值

SELECT PC_PatchAvg(pa, 'intensity')   
FROM patches WHERE id = 7;  

5.0000000000000000

        
          
        
        
        
          
          AI 代码解读

PC_PatchMax(p pcpatch, dimname text) returns numeric

求pcpatch中包含的某个维度的信息的最大值

PC_PatchMin(p pcpatch, dimname text) returns numeric

求pcpatch中包含的某个维度的信息的最小值

PC_PatchAvg(p pcpatch) returns pcpoint (from 1.1.0)

求pcpatch中所有pcpoint的所有维度的平均值

PC_PatchMax(p pcpatch) returns pcpoint (from 1.1.0)

求pcpatch中所有pcpoint的所有维度的最大值

PC_PatchMin(p pcpatch) returns pcpoint (from 1.1.0)

求pcpatch中所有pcpoint的所有维度的最小值

PC_FilterGreaterThan(p pcpatch, dimname text, float8 value) returns pcpatch

返回pcpatch中在指定维度上大于指定值的pcpoint

SELECT PC_AsText(PC_FilterGreaterThan(pa, 'y', 45.57))   
FROM patches WHERE id = 7;  

 {"pcid":1,"pts":[[-126.42,45.58,58,5],[-126.41,45.59,59,5]]}

        
          
        
        
        
          
          AI 代码解读

PC_FilterLessThan(p pcpatch, dimname text, float8 value) returns pcpatch

返回pcpatch中在指定维度上小于指定值的pcpoint

PC_FilterBetween(p pcpatch, dimname text, float8 value1, float8 value2) returns pcpatch

返回pcpatch中在指定维度上在指定范围的pcpoint

PC_FilterEquals(p pcpatch, dimname text, float8 value) returns pcpatch

返回pcpatch中在指定维度上等于指定值的pcpoint

PC_Compress(p pcpatch,global_compression_scheme text,compression_config text) returns pcpatch (from 1.1.0)

压缩pcpatch

Allowed global compression schemes are:  
auto -- determined by pcid  
ght -- no compression config supported  
laz -- no compression config supported  
dimensional configuration is a comma-separated list of per-dimension compressions from this list:  
  auto -- determined automatically, from values stats  
  zlib -- deflate compression  
  sigbits -- significant bits removal  
  rle -- run-length encoding

        
          
        
        
        
          
          AI 代码解读

PC_PointN(p pcpatch, n int4) returns pcpoint

返回pcpatch中第n个pcpoint，正值从头开始计数，负值反向计数。

point cloud和PostGIS结合使用

CREATE EXTENSION postgis;  
CREATE EXTENSION pointcloud;  
CREATE EXTENSION pointcloud_postgis;

        
          
        
        
        
          
          AI 代码解读

PC_Intersects(p pcpatch, g geometry) returns boolean

PC_Intersects(g geometry, p pcpatch) returns boolean

判断pcpatch和geometry是否有相交

SELECT PC_Intersects('SRID=4326;POINT(-126.451 45.552)'::geometry, pa)  
FROM patches WHERE id = 7;  

t

        
          
        
        
        
          
          AI 代码解读

PC_Intersection(pcpatch, geometry) returns pcpatch

返回pcpatch中与geometry相交的点组成的一个新的pcpatch

SELECT PC_AsText(PC_Explode(PC_Intersection(  
      pa,   
      'SRID=4326;POLYGON((-126.451 45.552, -126.42 47.55, -126.40 45.552, -126.451 45.552))'::geometry  
)))  
FROM patches WHERE id = 7;  

             pc_astext                 
--------------------------------------  
 {"pcid":1,"pt":[-126.44,45.56,56,5]}  
 {"pcid":1,"pt":[-126.43,45.57,57,5]}  
 {"pcid":1,"pt":[-126.42,45.58,58,5]}  
 {"pcid":1,"pt":[-126.41,45.59,59,5]}

        
          
        
        
        
          
          AI 代码解读

Geometry(pcpoint) returns geometry

pcpoint::geometry returns geometry

将pcpatch中的位置属性的信息转换为geometry类型

SELECT ST_AsText(PC_MakePoint(1, ARRAY[-127, 45, 124.0, 4.0])::geometry);  

POINT Z (-127 45 124)

        
          
        
        
        
          
          AI 代码解读

point cloud的压缩

PostgreSQL point cloud，使用document输入时，可以指定压缩方法。
写法如下，

<pc:metadata>  
  <Metadata name="compression">dimensional</Metadata>  
</pc:metadata>

        
          
        
        
        
          
          AI 代码解读

支持的压缩方法如下

None,   
  which stores points and patches as byte arrays using the type and formats described in the schema document.  

Dimensional,   
  which stores points the same as 'none' but stores patches as collections of dimensional data arrays, with an "appropriate" compression applied.   
Dimensional compression makes the most sense for smaller patch sizes, since small patches will tend to have more homogeneous dimensions.  

GHT or "GeoHash Tree",   
  which stores the points in a tree where each node stores the common values shared by all nodes below.   
For larger patch sizes, GHT should provide effective compression and performance for patch-wise operations.   
You must build Pointcloud with libght support to make use of the GHT compression.  

LAZ or "LASZip".   
  You must build Pointcloud with LAZPERF support to make use of the LAZ compression.  

If no compression is declared in <pc:metadata>, then a compression of "none" is assumed.

        
          
        
        
        
          
          AI 代码解读

point cloud的二进制格式

The point and patch binary formats start with a common header, which provides:

endianness flag, to allow portability between architectures
pcid number, to look up the schema information in the pointcloud_formats table

Point Binary

byte:     endianness (1 = NDR, 0 = XDR)  
uint32:   pcid (key to POINTCLOUD_SCHEMAS)  
uchar[]:  pointdata (interpret relative to pcid)

        
          
        
        
        
          
          AI 代码解读

The patch binary formats have additional standard header information:

the compression number, which indicates how to interpret the data
the number of points in the patch

Patch Binary (Uncompressed)

byte:         endianness (1 = NDR, 0 = XDR)  
uint32:       pcid (key to POINTCLOUD_SCHEMAS)  
uint32:       0 = no compression  
uint32:        npoints  
pointdata[]:  interpret relative to pcid

        
          
        
        
        
          
          AI 代码解读

pcpatch的压缩格式的二进制表述请参考
https://github.com/pgpointcloud/pointcloud

如果将数据导入point cloud

有两种格式导入
From WKB

From PDAL

参考
https://github.com/pgpointcloud/pointcloud

pcpoint和pcpatch类型的SQL定义

CREATE TYPE pcpoint (  
    internallength = variable,  
    input = pcpoint_in,  
    output = pcpoint_out,  
    -- send = geometry_send,  
    -- receive = geometry_recv,  
    typmod_in = pc_typmod_in,  
    typmod_out = pc_typmod_out,  
    -- delimiter = ':',  
    -- alignment = double,  
    -- analyze = geometry_analyze,  
    storage = external -- do not try to compress it please  
);  

CREATE TYPE pcpatch (  
    internallength = variable,  
    input = pcpatch_in,  
    output = pcpatch_out,  
    -- send = geometry_send,  
    -- receive = geometry_recv,  
    typmod_in = pc_typmod_in,  
    typmod_out = pc_typmod_out,  
    -- delimiter = ':',  
    -- alignment = double,  
    -- analyze = geometry_analyze,  
    storage = external  
);  

CREATE TYPE pointcloud_abs (  
    internallength = 8,  
    input = pointcloud_abs_in,  
    output = pointcloud_abs_out,  
    alignment = double  
);

        
          
        
        
        
          
          AI 代码解读

pcpoint 数据类型输入输出对应的C函数

PG_FUNCTION_INFO_V1(pcpoint_in);  
Datum pcpoint_in(PG_FUNCTION_ARGS)  
{  
    char *str = PG_GETARG_CSTRING(0);  
    /* Datum pc_oid = PG_GETARG_OID(1); Not needed. */  
    int32 typmod = 0;  
    uint32 pcid = 0;  
    PCPOINT *pt;  
    SERIALIZED_POINT *serpt = NULL;  

    if ( (PG_NARGS()>2) && (!PG_ARGISNULL(2)) )  
    {  
        typmod = PG_GETARG_INT32(2);  
        pcid = pcid_from_typmod(typmod);  
    }  

    /* Empty string. */  
    if ( str[0] == '\0' )  
    {  
        ereport(ERROR,(errmsg("pcpoint parse error - empty string")));  
    }  

    /* Binary or text form? Let's find out. */  
    if ( str[0] == '0' )  
    {  
        /* Hex-encoded binary */  
        pt = pc_point_from_hexwkb(str, strlen(str), fcinfo);  
        pcid_consistent(pt->schema->pcid, pcid);  
        serpt = pc_point_serialize(pt);  
        pc_point_free(pt);  
    }  
    else  
    {  
        ereport(ERROR,(errmsg("parse error - support for text format not yet implemented")));  
    }  

    if ( serpt ) PG_RETURN_POINTER(serpt);  
    else PG_RETURN_NULL();  
}  

PG_FUNCTION_INFO_V1(pcpoint_out);  
Datum pcpoint_out(PG_FUNCTION_ARGS)  
{  
    PCPOINT *pcpt = NULL;  
    PCSCHEMA *schema = NULL;  
    SERIALIZED_POINT *serpt = NULL;  
    char *hexwkb = NULL;  

    serpt = PG_GETARG_SERPOINT_P(0);  
    schema = pc_schema_from_pcid(serpt->pcid, fcinfo);  
    pcpt = pc_point_deserialize(serpt, schema);  
    hexwkb = pc_point_to_hexwkb(pcpt);  
    pc_point_free(pcpt);  
    PG_RETURN_CSTRING(hexwkb);  
}

        
          
        
        
        
          
          AI 代码解读

pcpatch 数据类型输入输出对应的C函数

PG_FUNCTION_INFO_V1(pcpatch_in);  
Datum pcpatch_in(PG_FUNCTION_ARGS)  
{  
    char *str = PG_GETARG_CSTRING(0);  
    /* Datum geog_oid = PG_GETARG_OID(1); Not needed. */  
    uint32 typmod = 0, pcid = 0;  
    PCPATCH *patch;  
    SERIALIZED_PATCH *serpatch = NULL;  

    if ( (PG_NARGS()>2) && (!PG_ARGISNULL(2)) )  
    {  
        typmod = PG_GETARG_INT32(2);  
        pcid = pcid_from_typmod(typmod);  
    }  

    /* Empty string. */  
    if ( str[0] == '\0' )  
    {  
        ereport(ERROR,(errmsg("pcpatch parse error - empty string")));  
    }  

    /* Binary or text form? Let's find out. */  
    if ( str[0] == '0' )  
    {  
        /* Hex-encoded binary */  
        patch = pc_patch_from_hexwkb(str, strlen(str), fcinfo);  
        pcid_consistent(patch->schema->pcid, pcid);  
        serpatch = pc_patch_serialize(patch, NULL);  
        pc_patch_free(patch);  
    }  
    else  
    {  
        ereport(ERROR,(errmsg("parse error - support for text format not yet implemented")));  
    }  

    if ( serpatch ) PG_RETURN_POINTER(serpatch);  
    else PG_RETURN_NULL();  
}  

PG_FUNCTION_INFO_V1(pcpatch_out);  
Datum pcpatch_out(PG_FUNCTION_ARGS)  
{  
    PCPATCH *patch = NULL;  
    SERIALIZED_PATCH *serpatch = NULL;  
    char *hexwkb = NULL;  
    PCSCHEMA *schema = NULL;  

    serpatch = PG_GETARG_SERPATCH_P(0);  
    schema = pc_schema_from_pcid(serpatch->pcid, fcinfo);  
    patch = pc_patch_deserialize(serpatch, schema);  
    hexwkb = pc_patch_to_hexwkb(patch);  
    pc_patch_free(patch);  
    PG_RETURN_CSTRING(hexwkb);  
}

        
          
        
        
        
          
          AI 代码解读

参考

https://en.wikipedia.org/wiki/Lidar
http://baike.baidu.com/view/2922098.htm
https://github.com/pgpointcloud/pointcloud
http://pointcloud.org/
https://en.wikipedia.org/wiki/Point_cloud
http://www.pdal.io/

探索云世界

热门

云计算

大数据

云原生

人工智能

数据库

开发与运维

PostgreSQL 内核扩展之 - 管理十亿级3D扫描数据(基于Lidar产生的point cloud数据)