PostgreSQL 作为图数据库存储引擎-阿里云开发者社区

PostgreSQL 作为图数据库存储引擎

2016-01-18 12018

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

简介： PostgreSQL 与开源图数据库CayLey的结合。

CayLey是GO语言写的一个图数据库引擎，支持RESTful API，内置查询编辑器和可视化，支持MQL和JAVASCRIPT查询接口，后端存储支持文件格式，PostgreSQL，mongodb，LevelDB，Bolt。模块化设计，扩展后端存储非常容易。

本文将以PostgreSQL为例，演示一下CayLey的使用。

安装go：

yum install -y go

执行以下命令，克隆cayley和依赖：

mkdir -p ~/cayley && cd ~/cayley

export GOPATH=`pwd`

export PATH=$PATH:~/cayley/bin

mkdir -p bin pkg src/github.com/google

cd src/github.com/google

git clone https://github.com/google/cayley

cd cayley

go get github.com/tools/godep

godep restore

go build ./cmd/cayley

样本数据：

$ ll data

-rw-rw-r--. 1 postgres postgres 26M Jan 17 21:45 30kmoviedata.nq.gz

-rw-rw-r--. 1 postgres postgres 463 Jan 17 21:45 testdata.nq

$ gunzip 30kmoviedata.nq.gz

cayley使用帮助：

$ ./cayley --help

No command --help

Usage:

cayley COMMAND [flags]

Commands:

init Create an empty database.

load Bulk-load a quad file into the database.

http Serve an HTTP endpoint on the given host and port.

dump Bulk-dump the database into a quad file.

repl Drop into a REPL of the given query language.

version Version information.

Flags:

-alsologtostderr=false: log to standard error as well as files

-assets="": Explicit path to the HTTP assets.

-config="": Path to an explicit configuration file.

-db="memstore": Database Backend.

-dbpath="/tmp/testdb": Path to the database.

-dump="dbdump.nq": Quad file to dump the database to (".gz" supported, "-" for stdout).

-dump_type="quad": Quad file format ("json", "quad", "gml", "graphml").

-format="cquad": Quad format to use for loading ("cquad" or "nquad").

-host="127.0.0.1": Host to listen on (defaults to all).

-ignoredup=false: Don't stop loading on duplicated key on add

-ignoremissing=false: Don't stop loading on missing key on delete

-init=false: Initialize the database before using it. Equivalent to running `cayley init` followed by the given command.

-load_size=10000: Size of quadsets to load

-log_backtrace_at=:0: when logging hits line file:N, emit a stack trace

-log_dir="": If non-empty, write log files in this directory

-logstashtype="": enable logstash logging and define the type

-logstashurl="172.17.42.1:5042": logstash url and port

-logtostderr=false: log to standard error instead of files

-port="64210": Port to listen on.

-prof="": Output profiling file.

-quads="": Quad file to load before going to REPL.

-query_lang="gremlin": Use this parser as the query language.

-read_only=false: Disable writing via HTTP.

-replication="single": Replication method.

-stderrthreshold=0: logs at or above this threshold go to stderr

-timeout=30s: Elapsed time until an individual query times out.

-v=0: log level for V logs

-vmodule=: comma-separated list of pattern=N settings for file-filtered logging

假设已有一个PostgreSQL数据库。

IP : 192.168.150.132

PORT : 1921

DBNAME : postgres

USER : digoal

PWD : digoal_pwd

初始化

./cayley init -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable"

导入数据

./cayley load -quads="data/" -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable"

50亿测试数据约2TB。

开启repl或http接口服务。

./cayley repl -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable" -host="0.0.0.0" -port="64210"

或

./cayley http -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable" -host="0.0.0.0" -port="64210"

使用http接口的图例：

Query Shape：

后端是PostgreSQL时，Cayley自动将MQL或JAVASCRIPT自动转换成SQL到数据库查询，并返回结果。

对于PostgreSQL作为后端的场景，优化的手段：

1. 使用GPU加速HASH JOIN和数据扫描。

2. 使用分区表，减少无用块扫描。

3. 其他通用的PG优化手段

如果数据量大到单库的计算资源和IO资源性能支撑不住，可以用Greenplum来实现分布式查询。

查询接口：

Javascript/Gremlin API documentation

图对象

根据节点ID，检索，返回路径

路径对象

路径相交，节点匹配等

查询路径对象

数值转换，等。

[参考]

1. https://github.com/google/cayley

PostgreSQL 作为图数据库存储引擎

关系型数据库

热门文章

最新文章

相关产品

相关课程

相关电子书

推荐镜像