Introducing Starbase, a Python API Module for HBase


Source: http://openskill.cn/article/269

The following guest post is provided by Artur Barseghyan, a web developer currently employed by Goldmund, Wyldebeast & Wunderliebe in The Netherlands.

Python is my personal (and primary) programming language of choice and also happens to be the primary programming language at my company. So, when starting to work with a new technology, I prefer to use a clean and easy (Pythonic!) API.

After studying tons of articles on the web, reading (and writing) white papers, and doing basic performance tests (sometimes hard if you’re on a tight schedule), my company recently selected Cloudera for our Big Data platform (including using Apache HBase as our data store for Apache Hadoop), with Cloudera Manager serving a role as “one console to rule them all.”

However, I was surprised shortly thereafter to learn that there was no working Python wrapper around the REST API for HBase (aka Stargate). I decided to write one in my free time, and the result, ladies and gentlemen, was Starbase (GPL).

In this post, I will provide some code samples and briefly explain what work has been done on Starbase. I assume that the reader already has a basic understanding of HBase (that is, of tables, column families, qualifiers, and so on).

1. Installation

Next, I’ll show you some frequently used commands and use cases. But first, install the current version of Starbase from the CheeseShop (PyPI).

$ pip install starbase

Import the module:

>>> from starbase import Connection

…and create a connection instance. Starbase defaults to 127.0.0.1:8000; if your settings are different, specify them here.

>>> c = Connection()

2. API Usage Examples


2.1 Listing all tables

Assuming two tables named table1 and table2 already exist, the following call lists them:

>>> c.tables()
['table1', 'table2']

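2.2 Table schema operations

Whenever you need to operate on a table, first create a table instance.

Create a table instance (note that this step does not yet create the table in HBase):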

>>> t = c.table('table3')

Create a new table with the columns 'column1', 'column2' and 'column3' (this is the step where the table is actually created):

>>> t.create('column1', 'column2', 'column3')
201

Check whether the table exists:

>>> t.exists()
True

Show the table's columns:

>>> t.columns()
['column1', 'column2', 'column3']

Add columns ('column4', 'column5', 'column6', 'column7') to the table:

>>> t.add_columns('column4', 'column5', 'column6', 'column7')
200

Drop columns ('column6', 'column7'):

>>> t.drop_columns('column6', 'column7')
201

Drop the entire table:

>>> t.drop()
200
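The integers echoed after each schema call above look like HTTP status codes passed through from the REST gateway (201 Created, 200 OK). Assuming that reading is correct, a script can check them instead of ignoring them. The helper below is a minimal sketch; `ensure_ok` is a hypothetical name and not part of Starbase.

```python
from http.client import responses  # stdlib mapping of status code -> reason phrase


def ensure_ok(status, action):
    """Return the reason phrase for a 2xx status; raise for anything else.

    Hypothetical helper: it assumes the integer is an HTTP status code.
    """
    if not 200 <= status < 300:
        raise RuntimeError('%s failed with HTTP status %s' % (action, status))
    return responses.get(status, 'Unknown')


print(ensure_ok(201, 'create table'))  # Created
print(ensure_ok(200, 'drop table'))    # OK
```

With a live connection, the call would wrap a schema operation, for example `ensure_ok(t.create('column1'), 'create table')`.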

2.3 Table data operations

Insert data into a single row:

>>> t.insert(
...     'my-key-1',
...     {
...         'column1': {'key11': 'value 11', 'key12': 'value 12', 'key13': 'value 13'},
...         'column2': {'key21': 'value 21', 'key22': 'value 22'},
...         'column3': {'key31': 'value 31', 'key32': 'value 32'}
...     }
... )
200

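Note that you may also use the "native" way of naming columns and cells (qualifiers). The following call stores the same cells as the previous example, this time under the row key 'my-key-1a'.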

>>> t.insert(
...     'my-key-1a',
...     {
...         'column1:key11': 'value 11', 'column1:key12': 'value 12', 'column1:key13': 'value 13',
...         'column2:key21': 'value 21', 'column2:key22': 'value 22',
...         'column3:key31': 'value 31', 'column3:key32': 'value 32'
...     }
... )
200
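The relationship between the nested form and the native `family:qualifier` form can be sketched in a few lines of plain Python. This only illustrates the mapping between the two payload shapes; `flatten_row` is a hypothetical helper and does not claim to show how Starbase converts them internally.

```python
def flatten_row(nested):
    """Turn {'family': {'qualifier': value}} into {'family:qualifier': value}."""
    flat = {}
    for family, cells in nested.items():
        for qualifier, value in cells.items():
            flat['%s:%s' % (family, qualifier)] = value
    return flat


nested = {
    'column1': {'key11': 'value 11', 'key12': 'value 12'},
    'column2': {'key21': 'value 21'},
}
print(flatten_row(nested))
```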

Update the data in a row:

>>> t.update(
...     'my-key-1',
...     {'column4': {'key41': 'value 41', 'key42': 'value 42'}}
... )
200

Remove a row cell (qualifier):

>>> t.remove('my-key-1', 'column4', 'key41')
200

Remove a row column (column family):

>>> t.remove('my-key-1', 'column4')
200

Remove an entire row:

>>> t.remove('my-key-1')
200

Fetch a single row with all columns:

>>> t.fetch('my-key-1')
{
  'column1': {'key11': 'value 11', 'key12': 'value 12', 'key13': 'value 13'},
  'column2': {'key21': 'value 21', 'key22': 'value 22'},
  'column3': {'key31': 'value 31', 'key32': 'value 32'}
}

Fetch a single row with selected columns (limit to the 'column1' and 'column2' columns):

>>> t.fetch('my-key-1', ['column1', 'column2'])
{
  'column1': {'key11': 'value 11', 'key12': 'value 12', 'key13': 'value 13'},
  'column2': {'key21': 'value 21', 'key22': 'value 22'}
}

Narrow the result set even more (limit to cells 'key11' and 'key13' of column 'column1' and cell 'key32' of column 'column3'):

>>> t.fetch('my-key-1', {'column1': ['key11', 'key13'], 'column3': ['key32']})
{
  'column1': {'key11': 'value 11', 'key13': 'value 13'},
  'column3': {'key32': 'value 32'}
}

Note that you may also use the native means of naming the columns and cells (qualifiers). The example below does exactly the same thing as the example above.

>>> t.fetch('my-key-1', ['column1:key11', 'column1:key13', 'column3:key32'])
{
  'column1': {'key11': 'value 11', 'key13': 'value 13'},
  'column3': {'key32': 'value 32'}
}
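Since the dictionary filter and the native list filter select the same cells, one can be rewritten as the other. The sketch below shows that rewrite in plain Python; `to_native_columns` is a hypothetical helper used for illustration, not a Starbase function.

```python
def to_native_columns(columns):
    """Normalize a column filter to a list of 'family' or 'family:qualifier' strings.

    Accepts either a dict of {family: [qualifier, ...]} or a list that is
    already in native form.
    """
    if isinstance(columns, dict):
        return ['%s:%s' % (family, qualifier)
                for family, qualifiers in columns.items()
                for qualifier in qualifiers]
    return list(columns)


print(to_native_columns({'column1': ['key11', 'key13'], 'column3': ['key32']}))
# ['column1:key11', 'column1:key13', 'column3:key32']
```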

If you set the perfect_dict argument to False, you’ll get the native data structure:

>>> t.fetch('my-key-1', ['column1:key11', 'column1:key13', 'column3:key32'], perfect_dict=False)
{
  'column1:key11': 'value 11', 'column1:key13': 'value 13',
  'column3:key32': 'value 32'
}
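If you work with the native structure and later want the nested one back, the regrouping is straightforward. The following is a minimal sketch of that step, assuming that column family names contain no colon so that the first colon separates family from qualifier; `nest_row` is a hypothetical name.

```python
def nest_row(flat):
    """Group {'family:qualifier': value} into {'family': {'qualifier': value}}."""
    nested = {}
    for column, value in flat.items():
        # Split on the first colon only; qualifiers may contain further colons.
        family, _, qualifier = column.partition(':')
        nested.setdefault(family, {})[qualifier] = value
    return nested


flat = {
    'column1:key11': 'value 11', 'column1:key13': 'value 13',
    'column3:key32': 'value 32',
}
print(nest_row(flat))
```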

2.4 對表數據批處理操作

Batch operations (insert and update) work similarly to routine insert and update, but are done in a batch. You are advised to operate in batch as much as possible.

In the example below, we will insert 5,000 records in a batch:  

>>> data = {
...     'column1': {'key11': 'value 11', 'key12': 'value 12', 'key13': 'value 13'},
...     'column2': {'key21': 'value 21', 'key22': 'value 22'},
... }
>>> b = t.batch()
>>> for i in range(0, 5000):
...     b.insert('my-key-%s' % i, data)
...
>>> b.commit(finalize=True)
{'method': 'PUT', 'response': [200], 'url': 'table3/bXkta2V5LTA='}
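The last segment of the `url` value in that output is the base64 encoding of the first row key, 'my-key-0'. The snippet below verifies this with the standard library; `encode_row_key` is a hypothetical helper name, and the snippet makes no claim about where in Starbase the encoding happens.

```python
import base64


def encode_row_key(key):
    """Base64-encode a row key given as text."""
    return base64.b64encode(key.encode('utf-8')).decode('ascii')


def decode_row_key(encoded):
    """Inverse of encode_row_key."""
    return base64.b64decode(encoded).decode('utf-8')


print(encode_row_key('my-key-0'))      # bXkta2V5LTA=
print(decode_row_key('bXkta2V5LTA='))  # my-key-0
```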

In the example below, we will update 5,000 records in a batch:

>>> data = {
...     'column3': {'key31': 'value 31', 'key32': 'value 32'},
... }
>>> b = t.batch()
>>> for i in range(0, 5000):
...     b.update('my-key-%s' % i, data)
...
>>> b.commit(finalize=True)
{'method': 'POST', 'response': [200], 'url': 'table3/bXkta2V5LTA='}

Note: The table batch method accepts an optional size argument (int). If set, an auto-commit is fired each time the stack is full.
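To make the auto-commit behaviour concrete, here is a toy model of a size-limited batch. It is a sketch of the idea only, not Starbase's implementation: the `SketchBatch` class and its `send` callback are hypothetical stand-ins so that the example runs without an HBase server.

```python
class SketchBatch:
    """Toy stand-in for a size-limited batch (NOT Starbase's own class)."""

    def __init__(self, send, size=None):
        self._send = send    # callable that receives a list of (key, data) pairs
        self._size = size    # auto-commit threshold; None disables auto-commit
        self._stack = []

    def insert(self, key, data):
        self._stack.append((key, data))
        # Fire an automatic commit as soon as the stack is full.
        if self._size is not None and len(self._stack) >= self._size:
            self.commit()

    def commit(self):
        # Send whatever is queued, then start over with an empty stack.
        if self._stack:
            self._send(list(self._stack))
            self._stack.clear()


sent = []  # each element is one committed chunk
b = SketchBatch(sent.append, size=2000)
for i in range(5000):
    b.insert('my-key-%s' % i, {'column1': {'key11': 'value 11'}})
b.commit()  # flush the rows left over after the last auto-commit
print([len(chunk) for chunk in sent])  # [2000, 2000, 1000]
```

The final explicit commit matters: with a size of 2,000 and 5,000 rows, the last 1,000 rows never fill the stack and would otherwise stay queued.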

2.5 Searching table data (row scanning)

The table scanning feature is still in development. At the moment it is only possible to fetch all rows from a table (a full table scan); row key range scans are not yet supported. The result set is returned as a generator.

>>> t.fetch_all_rows()
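Because the result is a generator, rows are produced lazily and can be consumed only once. The sketch below shows the usual ways to work with such a result; `fake_rows` is a hypothetical stand-in for the generator so that the example runs without a server, and the shape of each row is an assumption.

```python
from itertools import islice


def fake_rows(count):
    """Hypothetical stand-in for the generator returned by fetch_all_rows()."""
    for i in range(count):
        yield {'column1': {'key11': 'value %s' % i}}


rows = fake_rows(5000)

# Peek at the first few rows without materializing the whole table.
first_three = list(islice(rows, 3))
print(len(first_three))  # 3

# The generator continues where it left off; count what remains.
remaining = sum(1 for _ in rows)
print(remaining)  # 4997
```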

