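Using PostgreSQL as the Storage Backend for a Graph Database
Cayley is a graph database engine written in Go. It exposes a RESTful API, ships with a built-in query editor and visualizer, and supports MQL and JavaScript query interfaces. Its storage backends include plain files, PostgreSQL, MongoDB, LevelDB, and Bolt. Thanks to its modular design, adding a new storage backend is straightforward.
This article uses PostgreSQL as the backend to demonstrate how Cayley is used.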

Install Go:
yum install -y go
Run the following commands to clone Cayley and its dependencies, then build it:
mkdir -p ~/cayley && cd ~/cayley
export GOPATH=`pwd`
export PATH=$PATH:~/cayley/bin
mkdir -p bin pkg src/github.com/google
cd src/github.com/google
git clone https://github.com/google/cayley
cd cayley
go get github.com/tools/godep
godep restore
go build ./cmd/cayley
Sample data:
$ ll data
-rw-rw-r--. 1 postgres postgres 26M Jan 17 21:45 30kmoviedata.nq.gz
-rw-rw-r--. 1 postgres postgres 463 Jan 17 21:45 testdata.nq
Decompress the movie data set:
$ gunzip data/30kmoviedata.nq.gz
Cayley's built-in help:
$ ./cayley --help
No command --help
Usage:
cayley COMMAND [flags]
Commands:
init Create an empty database.
load Bulk-load a quad file into the database.
http Serve an HTTP endpoint on the given host and port.
dump Bulk-dump the database into a quad file.
repl Drop into a REPL of the given query language.
version Version information.
Flags:
-alsologtostderr=false: log to standard error as well as files
-assets="": Explicit path to the HTTP assets.
-config="": Path to an explicit configuration file.
-db="memstore": Database Backend.
-dbpath="/tmp/testdb": Path to the database.
-dump="dbdump.nq": Quad file to dump the database to (".gz" supported, "-" for stdout).
-dump_type="quad": Quad file format ("json", "quad", "gml", "graphml").
-format="cquad": Quad format to use for loading ("cquad" or "nquad").
-host="127.0.0.1": Host to listen on (defaults to all).
-ignoredup=false: Don't stop loading on duplicated key on add
-ignoremissing=false: Don't stop loading on missing key on delete
-init=false: Initialize the database before using it. Equivalent to running `cayley init` followed by the given command.
-load_size=10000: Size of quadsets to load
-log_backtrace_at=:0: when logging hits line file:N, emit a stack trace
-log_dir="": If non-empty, write log files in this directory
-logstashtype="": enable logstash logging and define the type
-logstashurl="172.17.42.1:5042": logstash url and port
-logtostderr=false: log to standard error instead of files
-port="64210": Port to listen on.
-prof="": Output profiling file.
-quads="": Quad file to load before going to REPL.
-query_lang="gremlin": Use this parser as the query language.
-read_only=false: Disable writing via HTTP.
-replication="single": Replication method.
-stderrthreshold=0: logs at or above this threshold go to stderr
-timeout=30s: Elapsed time until an individual query times out.
-v=0: log level for V logs
-vmodule=: comma-separated list of pattern=N settings for file-filtered logging
Assume a PostgreSQL database already exists with the following connection parameters:
IP : 192.168.150.132
PORT : 1921
DBNAME : postgres
USER : digoal
PWD : digoal_pwd
Initialize the database:
./cayley init -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable"
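The same connection URL is repeated in every command that follows. A minimal sketch, assuming a hypothetical helper function `cayley_dbpath` and a hypothetical variable `DBPATH` (neither is part of Cayley), assembles it once from the parameters listed above:

```shell
#!/bin/sh
# Hypothetical helper (not part of Cayley): build the -dbpath URL once.
# Arguments: user password host port dbname
cayley_dbpath() {
    printf 'postgres://%s:%s@%s:%s/%s?sslmode=disable\n' "$1" "$2" "$3" "$4" "$5"
}

# Values taken from the example database described above.
DBPATH=$(cayley_dbpath digoal digoal_pwd 192.168.150.132 1921 postgres)
echo "$DBPATH"
```

With this in place, `./cayley init -db=sql -dbpath="$DBPATH"` should be equivalent to the command above. Note that a password containing URL-special characters such as `@` or `/` would need percent-encoding, which this sketch does not attempt.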
Load the data:
./cayley load -quads="data/" -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable"
A test data set of 5 billion records occupies about 2 TB.
Start the REPL or the HTTP service:
./cayley repl -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable" -host="0.0.0.0" -port="64210"
or
./cayley http -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable" -host="0.0.0.0" -port="64210"
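Once the HTTP service is running, queries can be sent from the command line as well as from the built-in editor. The sketch below assumes the `/api/v1/query/gremlin` endpoint exposed by Cayley releases of this period (later releases renamed the query language, so the path may differ), a server reachable on `127.0.0.1`, and a `name` predicate as used in the movie sample data. `cayley_query_url` and `cayley_query` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Cayley).

# Build the query endpoint URL. Arguments: host port query-language
# ASSUMPTION: the endpoint path is /api/v1/query/<language>.
cayley_query_url() {
    printf 'http://%s:%s/api/v1/query/%s\n' "$1" "$2" "$3"
}

# POST a query string to the endpoint and print the response body.
# Arguments: url query
cayley_query() {
    curl -s --data-binary "$2" "$1"
}

URL=$(cayley_query_url 127.0.0.1 64210 gremlin)
echo "$URL"

# Uncomment once the HTTP service is listening:
# cayley_query "$URL" 'g.V().Has("name", "Casablanca").All()'
```

Keeping the URL construction separate from the `curl` call makes it easy to point the same query at a different host, port, or query language.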
Screenshots of the HTTP interface in use:


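When PostgreSQL is the backend, the following optimizations are available:
1. Use GPU acceleration for hash joins and data scans.
2. Use partitioned tables to avoid scanning irrelevant blocks.
3. Apply the other general-purpose PostgreSQL tuning techniques.
If the data grows beyond what the compute and I/O resources of a single instance can sustain, Greenplum can be used to run the queries in a distributed fashion.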
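Query interface:
Javascript/Gremlin API documentation
Graph object
Looks up nodes by ID and returns a path.
Path object
Path intersection, node matching, and so on.
Query path object
Value conversion and so on.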
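To make the three object types more concrete, here are a few queries in the Gremlin-style JavaScript dialect, wrapped in a shell function so they can be printed or fed to the REPL or the HTTP service. The node and predicate names (`alice`, `bob`, `follows`) are made-up placeholders, and `gremlin_examples` is a hypothetical helper. The method names follow the Cayley Gremlin API documentation as far as I know it and should be checked against the installed version.

```shell
#!/bin/sh
# Print illustrative Gremlin-dialect queries, one per line.
# ASSUMPTION: node and predicate names are placeholders, not real data.
gremlin_examples() {
    cat <<'EOF'
g.V("alice").Out("follows").All()
g.V("alice").Out("follows").Intersect(g.V("bob").Out("follows")).All()
g.V("alice").Out("follows").GetLimit(5)
EOF
}

gremlin_examples
```

The first query starts from the graph object with a node ID and follows one predicate, which yields a path. The second combines two paths with an intersection. The third ends the path with a query method that returns at most five results instead of all of them.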