關于PostgreSQL與MongoDB在NoSQL方面的簡單對比
來自: http://my.oschina.net/rocky0202/blog/614041
之前我們在和開發團隊所設計框架里面的MongoDB標準組件PK時,曾經找了一些PG與MongoDB的對比材料。
今天得空又具體看了下,簡單總結一下,供大家借鑒參考:
一、先搞清楚它們支持的數據類型。
PG支持的數據類型叫JSON,從PostgreSQL 9.3版本開始,JSON已經成為內置數據類型,不僅僅是一個擴展了。
PG從9.4開始,又推出了新的JSONB的數據類型。
Document Database – JSON
Document database capabilities in Postgres advanced significantly
when support for the JSON data type was introduced in 2012 as part of
Postgres 9.2. JSON (JavaScript Object Notation) is one of the most
popular data-interchange formats on the web. It is supported by
virtually every programming language in use today, and continues to
gain traction. Some NoSQL-only systems, such as MongoDB, use
JSON (or its more limited binary cousin BSON) as their native data
interchange format.
Postgres offers robust support for JSON. Postgres has a JSON data
type, which validates and stores JSON data and provides functions for
extracting elements from JSON values. And, it offers the ability to
easily encode query result sets using JSON. This last piece of
functionality is particularly important, as it means that applications that
prefer to work natively with JSON can easily obtain their data from
Postgres in JSON.
Below are some examples of using JSON data in Postgres:
In addition to the native JSON data type, Postgres v9.3, released in
2013, added a JSON parser and a variety of JSON functions. This
means web application developers don't need translation layers in
the code between the database and the web framework that uses
JSON. JSON-formatted data can be sent directly to the database
where Postgres will not only store the data, but properly validate it
as well. With JSON functions, Postgres can read relational data
from a table and return it to the application as valid JSON formatted
strings. And, the relational data can be returned as JSON for either a single value or an entire record.
JSONB – Binary JSON
Postgres 9.4 introduces JSONB, a second JSON type with a binary
storage format. There are some significant differences between
JSONB in Postgres and BSON, which is used by one of the largest
document-only database providers. JSONB uses an internal
storage format that is not exposed to clients; JSONB values are
sent and received using the JSON text representation. BSON
stands for Binary JSON, but in fact not all JSON values can be
represented using BSON. For example, BSON cannot represent an
integer or floating-point number with more than 64 bits of precision,
whereas JSONB can represent arbitrary JSON values. Users of
BSON-based solutions should be aware of this limitation to avoid
data loss.
引用Francs的blog:jsonb 的出現帶來了更多的函數, 更多的索引創建方式, 更多的操作符和更高的性能。
關于PG的JSON和JSONB的具體介紹和測試,參考Francs的blog:
http://francs3.blog.163.com/blog/static/40576727201452293027868/
http://francs3.blog.163.com/blog/static/40576727201442264738357/
http://francs3.blog.163.com/blog/static/40576727201341613630793/
MongoDB支持的數據類型叫BSON
其實BSON就是JSON的一個擴展,BSON是一種類json的一種二進制形式的存儲格式,簡稱Binary JSON,它和JSON一樣,支持內嵌的文檔對象和數組對象,但是BSON有JSON沒有的一些數據類型,如Date和BinData類型。
二者的區別參考:
http://www.tuicool.com/articles/iUNbyi
BSON是由10gen開發的一個數據格式,目前主要用于MongoDB中,是MongoDB的數據存儲格式。BSON基于JSON格式,選擇JSON進行改造的原因主要是JSON的通用性及JSON的schemaless的特性。
BSON主要會實現以下三點目標:
1.更快的遍歷速度
對JSON格式來說,太大的JSON結構會導致數據遍歷非常慢。在JSON中,要跳過一個文檔進行數據讀取,需要對此文檔進行掃描才行,需要進行麻煩的數據結構匹配,比如括號的匹配,而BSON對JSON的一大改進就是,它會將JSON的每一個元素的長度存在元素的頭部,這樣你只需要讀取到元素長度就能直接seek到指定的點上進行讀取了。
2.操作更簡易
對JSON來說,數據存儲是無類型的,比如你要修改基本一個值,從9到10,由于從一個字符變成了兩個,所以可能其后面的所有內容都需要往后移一位才可以。而使用BSON,你可以指定這個列為數字列,那么無論數字從9長到10還是100,我們都只是在存儲數字的那一位上進行修改,不會導致數據總長變大。當然,在MongoDB中,如果數字從整形增大到長整型,還是會導致數據總長變大的。
3.增加了額外的數據類型
JSON是一個很方便的數據交換格式,但是其類型比較有限。BSON在其基礎上增加了“byte array”數據類型。這使得二進制的存儲不再需要先base64轉換后再存成JSON。大大減少了計算開銷和數據大小。
當然,在有的時候,BSON相對JSON來說也并沒有空間上的優勢,比如對{“field”:7},在JSON的存儲上7只使用了一個字節,而如果用BSON,那就是至少4個字節(32位)
二、PG和MongoDB在讀寫方面的測試對比:
PG的商業版EnterpriseDB公司在2014.9月發布了一份針對MongoDB v2.6 to Postgres v9.4 beta的對比報告(如果其對比MongoDB 3.0版本可能測試結果會有差別), 簡單翻譯其測試報告結果如下:
5000萬文檔型數據查詢、加載、插入時:
1, 大量數據的加載,PG比MongoDB快2.1倍
2, 同樣的數據量,MongoDB的占用空間是PG的1.33倍。
3, Insert操作MongoDB比PG慢3倍
4, Select操作MongoDB比PG慢2.5倍。
詳細可以參閱:
當然,EnterpriseDB的測試報告有商業宣傳意味,而且據國內數據庫同仁skykiker的blog的模擬測試的結果,可以看得出EnterpriseDB是有水分的。參考《 為什么PostgreSQL比MongoDB還快? 》:
http://blog.chinaunix.net/uid-20726500-id-4960138.html
但是, EDB的這份報告里面的下面這段話與我們之前對開發的說法是一致的,也簡單翻譯了一下,這個我個人理解其實也是企業內復雜應用環境下面,選擇使用PG對比MongoDB的核心優勢,大家可以對開發這樣去灌輸這些理念:
對于應用開發的自由性:
PG的最新版本引領了一個新的開發靈活性的領域, 超過了原來只使用NoSQL帶來的開發自由性。對于像MongoDB這樣在針對性優勢而細分出來的市場中的使用一直在增長,因為開發人員需要從傳統的關系型數據庫的結構化數據中解放出來,他們需要快速交付并且處理新的數據類型。他們選擇強大但是有限制的解決方案可以快速滿足需求,讓他們不用在做任何變動的時候等待DBA。
但是,很多企業發現成功的應用經常需要結構化的數據來落地(down the road這句話我認為用得很好),這種數據對企業更加有價值。PG給到開發人員新的有力的解決方案,從非結構化數據開始著手,當需求變化或增多時,可以將結構化和非結構化的數據在同一個數據庫里面進行有機結合,而且是在一個具有ACID特性的環境。
原文如下:
Developer Freedom
With the newest version, PostgreSQL has ushered in a new era of developer flexibility exceeding the freedom they discovered with NoSQL-only solutions. The use of niche solutions, like MongoDB, increased because developers needed freedom from the structured data model required by relational databases. They needed to move quickly and work with new data types. They choose powerful but limited solutions that addressed immediate needs, that let them make changes without having to wait for a DBA.
However, many organizations have discovered that successful applications often require structure down the road, as data becomes more valuable across the organization. Postgres gives developers broad new powers to start out unstructured, and then when the need arises, combine unstructured and structured data using the same database engine and within an ACID-compliant environment.
- See more at: http://www.enterprisedb.com/postgres-plus-edb-blog/marc-linster/postgres-outperforms-mongodb-and-ushers-new-developer-reality#sthash.wPT3crV7.dpuf
對于我們DBA面對的開發人員來講,即使把大量關系型數據同步到hadoop來進行運算、分析,但是最終產出分析報表,還是要回到關系型數據庫來做精細化的加工處理。從rdbms -> hadoop -> rdbms可以看到要經過hadoop中轉一道,一方面存在數據同步時效(時間和效率)的問題,另一方面提供實時查詢功能不足,另外可能還需要hive的開發技能。而如果使用PG例如Greenplumb時,直接全部都可以在一個PG集群里面完成,上述問題都迎刃而解。
上面這段話是我現在經常用來向開發推銷GP的一種說辭。
以下是2個PG對于關系型數據和非關系型數據整合操作的例子:
1,Combining ANSI SQL Queries and JSON Queries
One of Postgres’ key strengths is the easy integration of conventional
SQL statements, for ANSI SQL tables and records, with JSON and
HSTORE references pointing documents and key-value pairs. Because
JSON and HSTORE are extensions of the underlying Postgres model,
the queries use the same syntax, run in the same ACID transactional
environment, and rely on the same query planner, optimizer and
indexing technologies as conventional SQL-only queries.
這真是一個大殺器,可以將傳統SQL語言和JSON查詢語言聯合一起實現對結構化和非結構化數據的訪問!
2,Bridging Between ANSI SQL and JSON
Postgres provides a number of functions to bridge between JSON and
ANSI SQL. This is an important capability when applications and data
models mature, and designers start to recognize emerging data
structures and relationships.
Postgres can create a bridge between ANSI SQL and JSON, for
example by making a ANSI SQL table look like a JSON data set. This
capability allows developers and DBAs to start with an unstructured
data set, and as the project progresses, adjust the balance between
structured and unstructured data.
更多PG支持的NoSQL的特性和場景, 參考:
http://www.enterprisedb.com/nosql-for-enterprise?src=edb-blog