Getting Started with Cassandra Operations

Posted by foreverlys 10 years ago | 58K reads | Apache Cassandra, NoSQL databases

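Source: http://blog.csdn.net/fenglibing/article/details/9411021

1. What is Cassandra

Apache Cassandra is an open-source distributed NoSQL database system. It was originally developed at Facebook to store simply structured data such as inbox messages, and it combines the data model of Google BigTable with the fully distributed architecture of Amazon Dynamo. Facebook open-sourced Cassandra in 2008; since then, thanks to its good scalability, it has been adopted by well-known Web 2.0 sites such as Digg and Twitter and has become a popular solution for distributed structured data storage.

For details, see: http://zh.wikipedia.org/wiki/Cassandra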

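2. Downloading, installing, and running the server and client

Download: http://cassandra.apache.org/download/

Installation: Cassandra is written in Java, so in principle it can run on any machine with JDK 6 or later. The JDKs officially tested are OpenJDK and Sun's JDK.

Running the server: on Windows, no files need to be modified; just run bin/cassandra.bat.

On Linux, if you leave the configuration files unchanged, make sure the directories "/var/log/cassandra" and "/var/lib/cassandra" exist and that you have the necessary permissions on them, then run bin/cassandra.

Running the client:

On Windows run bin/cassandra-cli.bat; on Linux run bin/cassandra-cli. If no error is reported and a prompt like the following appears, the connection succeeded: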

[default]

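3. Configuration files

conf/cassandra.yaml: this is the core configuration file. It covers the various strategies and the locations where data, logs, and cached data are stored; for example, the data file location is set by "data_file_directories". Above we started Cassandra without changing anything, so the default log and data directories are:

Windows:

In the root of the drive Cassandra runs from, a var directory is created, with log and lib subdirectories beneath it for logs and data respectively.

Linux:

Logs and data are stored in "/var/log/cassandra" and "/var/lib/cassandra" respectively.

For the remaining options, consult the configuration file itself.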

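4. Operation examples

4.1 Overview

Cassandra's commands resemble those of the relational databases we use every day, and anyone familiar with MySQL will recognize them: create to create an object, drop to delete one, update to modify one, show to list objects, and use to select a keyspace. They are easy to remember. If this is your first time using Cassandra, it is still worth reading the official getting-started guide: http://wiki.apache.org/cassandra/GettingStarted

4.2 Creating a keyspace

Cassandra's storage abstractions map onto those of a relational database: a keyspace corresponds to a database or schema, and a column family corresponds to a table. So, just as with a relational database, the first step after connecting is to create a keyspace (note: if you are unsure how a command works, type help, which shows usage for most of them):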

create keyspace myspace
    with placement_strategy='org.apache.cassandra.locator.SimpleStrategy'
    and strategy_options={replication_factor:1};

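The first line is straightforward: it creates a keyspace named myspace. The second line sets the replica placement strategy, and the third line gives the options for that strategy. There are three placement strategies: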

org.apache.cassandra.locator.SimpleStrategy
org.apache.cassandra.locator.NetworkTopologyStrategy
org.apache.cassandra.locator.OldNetworkTopologyStrategy

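SimpleStrategy is intended for multiple storage nodes within a single data center. The replication_factor in strategy_options is the number of replicas of the data kept across the nodes. Replica nodes are chosen by moving clockwise around the ring within the data center.

NetworkTopologyStrategy handles multiple data centers. It guards against all the nodes in one data center failing at the same time, for example in a power outage.

OldNetworkTopologyStrategy is rarely needed now. Its support for the number of data centers and the number of replicas is limited, and NetworkTopologyStrategy makes it unnecessary.

For details, see: http://www.datastax.com/docs/1.0/cluster_architecture/replication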
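
The clockwise rule described for SimpleStrategy can be pictured with a small toy model. The sketch below is not Cassandra's implementation: the class name SimpleRingSketch, the node names, and the tiny token values are all made up for illustration, and it assumes one token per node. It picks the first node whose token is at or after the key's token, then keeps walking clockwise until it has replication_factor nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of clockwise replica placement; names and tokens are hypothetical.
final class SimpleRingSketch {

    // token -> node address, kept sorted so the map can be walked like a ring
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String address) {
        ring.put(token, address);
    }

    // Returns the nodes that would hold the replicas for a key with the given token.
    List<String> replicasFor(long keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        // There cannot be more distinct replicas than nodes.
        int wanted = Math.min(replicationFactor, ring.size());
        // First replica: the first node whose token is >= the key's token,
        // wrapping to the lowest token when the key falls past the last node.
        Long token = ring.ceilingKey(keyToken);
        if (token == null) {
            token = ring.firstKey();
        }
        while (replicas.size() < wanted) {
            replicas.add(ring.get(token));
            // Move to the next node clockwise, wrapping at the end of the ring.
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        SimpleRingSketch ring = new SimpleRingSketch();
        ring.addNode(0L, "node-a");
        ring.addNode(100L, "node-b");
        ring.addNode(200L, "node-c");
        System.out.println(ring.replicasFor(150L, 2)); // [node-c, node-a]
        System.out.println(ring.replicasFor(250L, 2)); // [node-a, node-b]
    }
}
```

With replication_factor set to 1, as in the create keyspace command above, only the first node found would hold the data.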

4.3 Creating a column family

First select the keyspace we just created:

use myspace;

Create the column family:

create column family mycolumn               
    with key_validation_class = 'UTF8Type'    
    and comparator = 'UTF8Type'               
    and default_validation_class = 'UTF8Type';

4.4 Inserting and retrieving data

Insert data:

set mycolumn[1][name1]=tom;

Retrieve data:

get mycolumn[1];

The output looks like this:

[default@myspace] get mycolumn[1];
=> (name=name1, value=tom, timestamp=1374485996562000)
Returned 1 results.
Elapsed time: 7.99 msec(s).
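
One way to picture what set and get do here is to treat a column family as a map from row key to a sorted map of columns. The sketch below is only a mental model in plain Java, not how Cassandra stores data: the class ColumnFamilySketch is hypothetical, it ignores the timestamp shown in the output above, and it sorts column names with Java's String ordering as a stand-in for the UTF8Type comparator.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory picture of a column family: row key -> (column name -> value).
final class ColumnFamilySketch {

    // Inner TreeMap keeps the columns of each row sorted by column name.
    private final Map<String, TreeMap<String, String>> rows = new TreeMap<>();

    // Roughly what "set mycolumn[rowKey][column]=value;" does in this model.
    void set(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Roughly what "get mycolumn[rowKey];" returns: all columns of one row.
    Map<String, String> get(String rowKey) {
        TreeMap<String, String> columns = rows.get(rowKey);
        return columns == null ? new TreeMap<>() : columns;
    }

    public static void main(String[] args) {
        ColumnFamilySketch mycolumn = new ColumnFamilySketch();
        mycolumn.set("1", "name1", "tom");
        System.out.println(mycolumn.get("1")); // {name1=tom}
    }
}
```

The point of the model is that a row is not a fixed set of fields: each row key can carry its own set of named columns.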

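4.5 Working with Cassandra from Java

Hector is a good choice and is fully open source; its source code is on GitHub: https://github.com/rantav/hector. Below is a Hector-based CRUD example. The packages it depends on can be found in Cassandra's lib directory. Note that the example uses a keyspace named AuthDB and a column family named authCollection, rather than the myspace and mycolumn created above, so these must exist before it is run: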

package test.cassandra;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.beans.Rows;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.MultigetSliceQuery;
import me.prettyprint.hector.api.query.QueryResult;
import me.prettyprint.hector.api.query.SliceQuery;

public class CassandraExample {

    // The string serializer translates the byte[] to and from String using
    // utf-8 encoding
    private static StringSerializer stringSerializer = StringSerializer.get();

    public static void insertData() {
        try {
            // Create a cluster object from your existing Cassandra cluster
            Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");

            // Create a keyspace object from the existing keyspace we created
            // using CLI
            Keyspace keyspace = HFactory.createKeyspace("AuthDB", cluster);

            // Create a mutator object for this keyspace using utf-8 encoding
            Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

            // Use the mutator object to insert a column and value pair to an
            // existing key
            mutator.insert("sample", "authCollection", HFactory.createStringColumn("username", "admin"));
            mutator.insert("sample", "authCollection", HFactory.createStringColumn("password", "admin"));

            System.out.println("Data Inserted");
            System.out.println();
        } catch (Exception ex) {
            System.out.println("Error encountered while inserting data!!");
            ex.printStackTrace();
        }
    }

    public static void retrieveData() {
        try {
            // Create a cluster object from your existing Cassandra cluster
            Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");

            // Create a keyspace object from the existing keyspace we created
            // using CLI
            Keyspace keyspace = HFactory.createKeyspace("AuthDB", cluster);
            SliceQuery<String, String, String> sliceQuery = HFactory.createSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
            sliceQuery.setColumnFamily("authCollection").setKey("sample");
            sliceQuery.setRange("", "", false, 4);

            QueryResult<ColumnSlice<String, String>> result = sliceQuery.execute();
            System.out.println("\nInserted data is as follows:\n" + result.get());
            System.out.println();
        } catch (Exception ex) {
            System.out.println("Error encountered while retrieving data!!");
            ex.printStackTrace();
        }
    }

    public static void updateData() {
        try {

            // Create a cluster object from your existing Cassandra cluster
            Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");

            // Create a keyspace object from the existing keyspace we created
            // using CLI
            Keyspace keyspace = HFactory.createKeyspace("AuthDB", cluster);

            // Create a mutator object for this keyspace using utf-8 encoding
            Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

            // Use the mutator object to update a column and value pair to an
            // existing key
            mutator.insert("sample", "authCollection", HFactory.createStringColumn("username", "administrator"));

            // Check if data is updated
            MultigetSliceQuery<String, String, String> multigetSliceQuery = HFactory.createMultigetSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
            multigetSliceQuery.setColumnFamily("authCollection");
            multigetSliceQuery.setKeys("sample");

            // The 3rd parameter returns the columns in reverse order if true
            // The 4th parameter in setRange determines the maximum number of
            // columns returned per key
            multigetSliceQuery.setRange("username", "", false, 1);
            QueryResult<Rows<String, String, String>> result = multigetSliceQuery.execute();
            System.out.println("Updated data..." + result.get());

        } catch (Exception ex) {
            System.out.println("Error encountered while updating data!!");
            ex.printStackTrace();
        }
    }

    public static void deleteData() {
        try {

            // Create a cluster object from your existing Cassandra cluster
            Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");

            // Create a keyspace object from the existing keyspace we created
            // using CLI
            Keyspace keyspace = HFactory.createKeyspace("AuthDB", cluster);

            // Create a mutator object for this keyspace using utf-8 encoding
            Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

            // Use the mutator object to delete row
            mutator.delete("sample", "authCollection", null, stringSerializer);

            System.out.println("Data Deleted!!");

            // try to retrieve data after deleting
            SliceQuery<String, String, String> sliceQuery = HFactory.createSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
            sliceQuery.setColumnFamily("authCollection").setKey("sample");
            sliceQuery.setRange("", "", false, 4);

            QueryResult<ColumnSlice<String, String>> result = sliceQuery.execute();
            System.out.println("\nTrying to Retrieve data after deleting the key 'sample':\n" + result.get());

            // close connection
            cluster.getConnectionManager().shutdown();

        } catch (Exception ex) {
            System.out.println("Error encountered while deleting data!!");
            ex.printStackTrace();
        }
    }

    public static void main(String[] args) {

        insertData();
        retrieveData();
        updateData();
        deleteData();

    }
}

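5. Setting up and verifying a multi-node cluster

Cassandra is built on the Gossip protocol, which makes horizontal scaling very convenient: new nodes can be added without restarting the service, and nodes discover each other automatically. Setting up a multi-node cluster is therefore simple and takes only a few steps:

5.1 In conf/cassandra.yaml, list the IPs of the existing nodes under "seed_provider". These must be the addresses of the machines' actual network interfaces, not 127.0.0.1 or the like: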

seeds: "192.168.26.128,192.168.2.204"

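The separator is a comma, and several IPs can be given at once.

5.2 Set listen_address, the address on which this node listens for other nodes. It must be the IP address of the current node's network interface, for example 192.168.26.128.

5.3 Set rpc_address, the address on which the node listens for clients. Because a server may have several network interfaces, this can be set to the same value as listen_address, or to 0.0.0.0 to listen on all interfaces.

That completes the configuration of one storage node. To add more nodes, copy the Cassandra installation from this node to each new server and change listen_address and rpc_address to the new node's network interface IP; seeds does not need to change.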
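
Since a loopback address in seeds or listen_address is an easy mistake to make when copying a configuration between machines, a small check can catch it before a node is started. The following is a minimal sketch, assuming the seeds value is available as the comma-separated string shown in 5.1; the class SeedAddressCheck and its method are hypothetical helpers, not part of Cassandra.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: flags loopback entries in a comma-separated address list.
final class SeedAddressCheck {

    // Returns the entries of the list that are loopback addresses (e.g. 127.0.0.1).
    static List<String> loopbackEntries(String commaSeparatedIps) {
        List<String> loopback = new ArrayList<>();
        for (String entry : commaSeparatedIps.split(",")) {
            String ip = entry.trim();
            if (ip.isEmpty()) {
                continue;
            }
            try {
                // For a literal IP address, getByName only parses the text.
                if (InetAddress.getByName(ip).isLoopbackAddress()) {
                    loopback.add(ip);
                }
            } catch (UnknownHostException e) {
                throw new IllegalArgumentException("Not a usable address: " + ip, e);
            }
        }
        return loopback;
    }

    public static void main(String[] args) {
        // The seeds value from 5.1 passes the check.
        System.out.println(loopbackEntries("192.168.26.128,192.168.2.204")); // []
        // A loopback entry is reported.
        System.out.println(loopbackEntries("192.168.26.128, 127.0.0.1")); // [127.0.0.1]
    }
}
```

The same method can be applied to a single listen_address value, since a one-element list is handled the same way.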

That is all the setup required. Next, let us verify it.

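5.4 Verifying the multi-node cluster
Cassandra ships with a very useful tool, nodetool, which sends commands to Cassandra over JMX and returns the results. Currently nodetool can only be run on a node that has a Cassandra environment, because it shares some of Cassandra's own configuration files, such as the log4j configuration. Running nodetool requires the IP and JMX port; the command format is "nodetool -host <host> -port <JMX_PORT> <command>", for example: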

nodetool -host 192.168.26.128 -port 7199 ring

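Note: the JMX_PORT variable is set in cassandra-env.sh, where its value is 7199. No such setting appears in the Windows configuration files, so the default there is presumably also 7199.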
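
Because nodetool talks to the node over JMX, the host and port it is given end up in a JMX service URL. The sketch below builds such a URL in the conventional RMI form and parses it with the JDK's JMXServiceURL class. That the node exposes JMX at exactly this URL is an assumption made here for illustration, and the class NodetoolJmxUrl is a hypothetical helper; nothing is connected to, so it runs without a live cluster.

```java
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: builds the JMX URL for a host and JMX port.
final class NodetoolJmxUrl {

    // Conventional RMI-based JMX URL; assumed, not confirmed, to match the node's setup.
    static String jmxUrlFor(String host, int jmxPort) {
        return String.format("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi", host, jmxPort);
    }

    public static void main(String[] args) throws Exception {
        String url = jmxUrlFor("192.168.26.128", 7199);
        System.out.println(url);
        // Parsing checks that the string is a well-formed JMX service URL.
        JMXServiceURL parsed = new JMXServiceURL(url);
        System.out.println(parsed.getProtocol()); // rmi
    }
}
```

If a connection to this URL is refused, the usual suspects are the JMX port value and whether the node accepts remote JMX connections.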
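Commonly used nodetool commands include:

ring: shows information about the nodes in the cluster. The name comes from consistent hashing, in which the nodes form a circle usually called the ring.

The output of ring includes the nodes currently in the cluster, each node's status (Up or Down), its load (amount of data), its position on the ring, and so on.

Sample output: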

Starting NodeTool
Note: Ownership information does not include topology; for complete information, specify a keyspace

Datacenter: datacenter1
==========
Address         Rack        Status State   Load            Owns                Token
                                                                               7160946931665707836
192.168.26.128  rack1       Up     Normal  78.18 KB        43.86%              -3195122621607553968
192.168.2.204   rack1       Up     Normal  81.56 KB        56.14%              7160946931665707836

This example shows two nodes, both currently Up.
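
The Owns column can be reproduced from the Token column. The sketch below assumes the tokens are signed 64-bit values, so the ring has 2^64 positions, and that each node owns the span from the previous node's token (exclusive) up to its own token (inclusive). The class RingOwnership is a hypothetical helper written for this worked example.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

// Hypothetical helper: derives the "Owns" percentage from two adjacent tokens.
final class RingOwnership {

    // Size of the token space, assuming signed 64-bit tokens (-2^63 .. 2^63-1).
    private static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(64);

    // Share of the ring owned by the node at `token`, given the previous node's token.
    static BigDecimal ownsPercent(long previousToken, long token) {
        // Distance walked clockwise from the previous token, wrapping around the ring.
        BigInteger span = BigInteger.valueOf(token)
                .subtract(BigInteger.valueOf(previousToken))
                .mod(RING_SIZE);
        if (span.signum() == 0) {
            span = RING_SIZE; // a single node owns the whole ring
        }
        return new BigDecimal(span)
                .multiply(BigDecimal.valueOf(100))
                .divide(new BigDecimal(RING_SIZE), 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        long tokenOf128 = -3195122621607553968L; // 192.168.26.128
        long tokenOf204 = 7160946931665707836L;  // 192.168.2.204
        System.out.println(ownsPercent(tokenOf204, tokenOf128)); // 43.86
        System.out.println(ownsPercent(tokenOf128, tokenOf204)); // 56.14
    }
}
```

The two results match the 43.86% and 56.14% in the sample output, which is why the node with the larger gap behind its token is listed as owning more of the ring.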

info: shows information about a single node, including its current load (amount of data), uptime, memory usage, and so on.

Sample output:

Starting NodeTool
Token            : -3195122621607553968
ID               : 1c65f178-0742-4379-bd8d-9011b9f7c4a3
Gossip active    : true
Thrift active    : true
Load             : 78.18 KB
Generation No    : 1374563151
Uptime (seconds) : 3802
Heap Memory (MB) : 18.18 / 1022.44
Data Center      : datacenter1
Rack             : rack1
Exceptions       : 0
Key Cache        : size 952 (bytes), capacity 53477376 (bytes), 43 hits, 59 requests, 0.729 recent hit rate, 14400 save period in seconds
Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds

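cfstats: shows detailed information about each column family, including read and write counts, latencies, memtables, sstables, and so on.

The output is long, so no sample is included here.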
