PostgreSQL as a Storage Engine for a Graph Database

Posted by jopen 8 years ago | 8K reads | PostgreSQL, database servers

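Cayley is a graph database engine written in Go. It exposes a RESTful API, ships with a built-in query editor and visualizer, and offers MQL and JavaScript query interfaces. Its storage backends include flat files, PostgreSQL, MongoDB, LevelDB, and Bolt. The design is modular, so adding a new storage backend is straightforward.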

This article uses PostgreSQL as the example backend to demonstrate how to use Cayley.


Install Go:

yum install -y go


Run the following commands to clone Cayley, restore its dependencies, and build it:

mkdir -p ~/cayley && cd ~/cayley

export GOPATH=`pwd`

export PATH=$PATH:~/cayley/bin

mkdir -p bin pkg src/github.com/google

cd src/github.com/google

git clone https://github.com/google/cayley

cd cayley

go get github.com/tools/godep

godep restore

go build ./cmd/cayley


Sample data:

$ ll data

-rw-rw-r--. 1 postgres postgres 26M Jan 17 21:45 30kmoviedata.nq.gz

-rw-rw-r--. 1 postgres postgres 463 Jan 17 21:45 testdata.nq

$ gunzip 30kmoviedata.nq.gz
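For readers who have not seen a `.nq` file before: each line is one quad, made up of a subject, a predicate, an object, and an optional label, terminated by a period. The sketch below writes a tiny file in that shape and counts its statements. The file name `sample.nq` and its contents are invented for illustration (they are not the contents of `testdata.nq`), and the short `<alice>`-style names assume the relaxed form that Cayley's `cquad` reader is expected to accept.

```shell
# Write a hypothetical three-quad file (contents are illustrative only).
cat > sample.nq <<'EOF'
<alice> <follows> <bob> .
<bob> <follows> <carol> .
<bob> <status> "cool_person" <status_graph> .
EOF

# Count non-empty, non-comment lines, i.e. the number of quads in the file.
QUAD_COUNT=$(grep -v -e '^[[:space:]]*$' -e '^#' sample.nq | wc -l | tr -d ' ')
echo "quads: $QUAD_COUNT"
```

The third line carries a fourth field, the label, which is what distinguishes a quad from a plain triple.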


Cayley usage help:

$ ./cayley --help

No command --help


Usage:

  cayley COMMAND [flags]


Commands:

  init      Create an empty database.

  load      Bulk-load a quad file into the database.

  http      Serve an HTTP endpoint on the given host and port.

  dump      Bulk-dump the database into a quad file.

  repl      Drop into a REPL of the given query language.

  version   Version information.


Flags:

  -alsologtostderr=false: log to standard error as well as files

  -assets="": Explicit path to the HTTP assets.

  -config="": Path to an explicit configuration file.

  -db="memstore": Database Backend.

  -dbpath="/tmp/testdb": Path to the database.

  -dump="dbdump.nq": Quad file to dump the database to (".gz" supported, "-" for stdout).

  -dump_type="quad": Quad file format ("json", "quad", "gml", "graphml").

  -format="cquad": Quad format to use for loading ("cquad" or "nquad").

  -host="127.0.0.1": Host to listen on (defaults to all).

  -ignoredup=false: Don't stop loading on duplicated key on add

  -ignoremissing=false: Don't stop loading on missing key on delete

  -init=false: Initialize the database before using it. Equivalent to running `cayley init` followed by the given command.

  -load_size=10000: Size of quadsets to load

  -log_backtrace_at=:0: when logging hits line file:N, emit a stack trace

  -log_dir="": If non-empty, write log files in this directory

  -logstashtype="": enable logstash logging and define the type

  -logstashurl="172.17.42.1:5042": logstash url and port

  -logtostderr=false: log to standard error instead of files

  -port="64210": Port to listen on.

  -prof="": Output profiling file.

  -quads="": Quad file to load before going to REPL.

  -query_lang="gremlin": Use this parser as the query language.

  -read_only=false: Disable writing via HTTP.

  -replication="single": Replication method.

  -stderrthreshold=0: logs at or above this threshold go to stderr

  -timeout=30s: Elapsed time until an individual query times out.

  -v=0: log level for V logs

  -vmodule=: comma-separated list of pattern=N settings for file-filtered logging


Assume a PostgreSQL database already exists with the following connection parameters:

IP : 192.168.150.132

PORT : 1921

DBNAME : postgres

USER : digoal

PWD : digoal_pwd
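The `-dbpath` value passed to Cayley in the rest of this article is a PostgreSQL connection URL assembled from these five parameters. As a minimal sketch, the URL can be built once in a shell variable so the password is not repeated on every command line. The variable names (`PG_IP`, `CAYLEY_DBPATH`, and so on) are made up here and are not something Cayley itself reads.

```shell
# Connection parameters from the article (variable names are hypothetical).
PG_IP="192.168.150.132"
PG_PORT="1921"
PG_DBNAME="postgres"
PG_USER="digoal"
PG_PWD="digoal_pwd"

# Assemble the URL; sslmode=disable turns off TLS for this connection.
CAYLEY_DBPATH="postgres://${PG_USER}:${PG_PWD}@${PG_IP}:${PG_PORT}/${PG_DBNAME}?sslmode=disable"
echo "$CAYLEY_DBPATH"
```

The variable can then stand in for the literal URL, for example `./cayley init -db=sql -dbpath="$CAYLEY_DBPATH"`. A password containing characters such as `@` or `/` would need to be percent-encoded before being placed in the URL.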


Initialize the database:

./cayley init -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable"


Load the data:

./cayley load -quads="data/" -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable"

Five billion rows of test data take up about 2 TB.


Start the REPL or the HTTP service:

./cayley repl -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable" -host="0.0.0.0" -port="64210"

./cayley http -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable" -host="0.0.0.0" -port="64210"


Screenshots of the HTTP interface:

[Image: HTTP interface]

Query Shape:

[Image: query shape]

When the backend is PostgreSQL, Cayley automatically translates the MQL or JavaScript query into SQL, runs it against the database, and returns the result.


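Optimization options when PostgreSQL is the backend:

1. Use GPU acceleration for hash joins and data scans.

2. Use partitioned tables to avoid scanning irrelevant blocks.

3. Apply the other general-purpose PostgreSQL tuning techniques.


If the data grows beyond what the compute and I/O resources of a single database can handle, Greenplum can be used to run the queries in a distributed fashion.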


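Query interfaces:

Javascript/Gremlin API documentation

Graph object

  Looks up nodes by node ID and returns a path.

Path object

  Path intersection, node matching, and so on.

Query path object

  Value conversion and so on.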
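To make the three object types concrete, here is a minimal sketch of sending one JavaScript/Gremlin query to the HTTP service with `curl`. Several things are assumptions rather than facts from this article: the server address `127.0.0.1:64210`, the endpoint path `/api/v1/query/gremlin`, and the `GetLimit` call, all of which are taken from memory of the Cayley documentation of that period and should be checked against the version in use. The script only prints the command unless `RUN_QUERY=1` is set.

```shell
# Assumed address of the service started with `cayley http`.
CAYLEY_URL="http://127.0.0.1:64210"
# Assumed endpoint path for JavaScript/Gremlin queries.
ENDPOINT="${CAYLEY_URL}/api/v1/query/gremlin"

# g is the graph object; V() yields a path over all nodes;
# GetLimit(5) is the query step that runs the path and returns at most 5 results.
QUERY='g.V().GetLimit(5)'

# Print the command so it can be inspected before anything is sent.
CURL_CMD="curl -s -X POST -d '${QUERY}' ${ENDPOINT}"
echo "$CURL_CMD"

# Set RUN_QUERY=1 to actually send the request to a running server.
if [ "${RUN_QUERY:-0}" = "1" ]; then
  curl -s -X POST -d "$QUERY" "$ENDPOINT"
fi
```

Keeping the query in a single-quoted shell variable avoids having to escape the double quotes that Gremlin queries commonly contain.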


[Reference]

Source: http://yq.aliyun.com/articles/2987