Monitoring a MongoDB Sharded Cluster with Nagios: A Hands-On Walkthrough

Posted by jopen 10 years ago | 24K views | Nagios, system monitoring


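1, Downloading the monitoring plugin

The MongoDB plugin can be fetched with: git clone git://github.com/mzupan/nagios-plugin-mongodb.git. I did not have git installed when I started, so I asked a fellow user (草根) to download it for me and then uploaded it to a CSDN resource page; the new download address is: http://download.csdn.net/detail/mchdba/8019077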

 

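2, Adding the new mongodb check commands

The mongodb service shares a physical machine with a MySQL slave, and basic Nagios monitoring plus MySQL service monitoring were already in place, so only the mongodb commands and services need to be added on top of the existing setup. For Nagios monitoring of MySQL, see http://blog.itpub.net/26230597/viewspace-760141/ and http://blog.itpub.net/26230597/viewspace-1217246/. The mongodb check commands to add are as follows: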

[root@wgq objects]# cd /usr/local/nagios/etc/objects
[root@wgq objects]# vim commands.cfg
define command {
    command_name check_mongodb
    command_line $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$
}

define command {  
    command_name check_mongodb_database  
    command_line $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$ -d $ARG5$  
}  

define command {  
    command_name check_mongodb_collection  
    command_line $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$ -d $ARG5$ -c $ARG6$  
}  

define command {  
    command_name check_mongodb_replicaset  
    command_line $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$ -r $ARG5$  
}  

define command {  
    command_name check_mongodb_query  
    command_line $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$ -q $ARG5$  
}
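To make the link between these command definitions and the check_command lines used later more concrete, here is a minimal Python 3 sketch of how a `!`-separated check_command maps onto the `$ARGn$` macros. It is an illustrative stand-in, not Nagios code; the function name `expand_check_command`, the `$USER1$` value `/usr/local/nagios/libexec`, and the host address are assumptions chosen for the example.

```python
import shlex

# Command template copied from the check_mongodb definition above.
COMMAND_LINE = (
    "$USER1$/nagios-plugin-mongodb/check_mongodb.py "
    "-H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$"
)


def expand_check_command(check_command, command_line, user1, host_address):
    """Hypothetical helper: mimic Nagios macro substitution for one check."""
    # The first '!'-separated field names the command; the rest become
    # $ARG1$, $ARG2$, ... in order.
    name, *args = check_command.split("!")
    expanded = command_line.replace("$USER1$", user1)
    expanded = expanded.replace("$HOSTADDRESS$", host_address)
    for index, value in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{index}$", value)
    # Split into an argv list the way a shell would.
    return name, shlex.split(expanded)


if __name__ == "__main__":
    # Assumed values: the plugin directory and host address are examples only.
    name, argv = expand_check_command(
        "check_mongodb!connect!30000!2!5",
        COMMAND_LINE,
        "/usr/local/nagios/libexec",
        "192.168.12.62",
    )
    print(name)
    print(" ".join(argv))
```

Read this way, `check_mongodb!connect!30000!2!5` means action `connect`, port `30000`, warning threshold `2`, critical threshold `5`.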


3, Adding the mongodb monitoring services

The mongodb services also have to be added separately, as follows:


# Check the MongoDB connection time: warn above 2 seconds, critical above 5 seconds
define service{
        host_name dbm1slave1
        service_description Mongo Connect Check
        check_command check_mongodb!connect!30000!2!5
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check MongoDB connection usage: warn when 70% of the connection pool is in use, critical at 80%
define service{  
        host_name dbm1slave1  
        service_description Mongo Free Connections  
        check_command check_mongodb!connections!27017!70!80  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }  


# Check MongoDB replication lag to make sure the primary and the secondary stay in sync: warn above 15 seconds, critical above 30 seconds
define service{  
        host_name dbm1slave1  
        service_description Mongo Replication Lag  
        check_command check_mongodb!replication_lag!27017!15!30  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }  

# Check MongoDB memory usage; the thresholds depend on the total memory of the machine running mongodb
define service{  
        host_name dbm1slave1  
        service_description Mongo Memory Usage  
        check_command check_mongodb!memory!27017!20!28  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }  

# Check MongoDB mapped memory usage; the thresholds depend on the total memory of the machine running mongodb
define service{  
        host_name dbm1slave1  
        service_description Mongo Mapped Memory Usage  
        check_command check_mongodb!memory_mapped!27017!20!28  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }  

# Check the lock time percentage: warn when lock time reaches 5% of mongo's running time, critical above 10%
define service{  
        host_name dbm1slave1  
        service_description Mongo Lock Percentage  
        check_command check_mongodb!lock!27017!5!10  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }  

# Check Average Flush Time: checks the average flush time of the mongo server
define service{  
        host_name dbm1slave1  
        service_description Mongo Flush Average  
        check_command check_mongodb!flushing!27017!100!200  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }  

# Check Last Flush Time: checks the most recent flush time; warn above 200 ms, critical above 400 ms
define service{  
        host_name dbm1slave1  
        service_description Mongo Last Flush Time  
        check_command check_mongodb!last_flush_time!27017!200!400  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }  

# Check status of mongodb replicaset: checks the state of mongo replication
define service{  
        host_name dbm1slave1  
        service_description MongoDB state  
        check_command check_mongodb!replset_state!27017!0!0  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }  

# Check status of index miss ratio: checks how often index lookups miss
define service{  
        host_name dbm1slave1  
        service_description MongoDB Index Miss Ratio  
        check_command check_mongodb!index_miss_ratio!27017!.005!.01  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }  

# Check number of databases and number of collections  
define service{  
        host_name dbm1slave1  
        service_description MongoDB Number of databases  
        check_command check_mongodb!databases!27017!300!500  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }  
define service{  
        host_name dbm1slave1  
        service_description MongoDB Number of collections  
        check_command check_mongodb!collections!27017!300!500  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }          

# Check size of a database
define service{  
        host_name dbm1slave1  
        service_description MongoDB Database size your-database  
        check_command check_mongodb_database!database_size!27017!300!500!your-database  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }                  

# Check index size of a database
define service{  
        host_name dbm1slave1  
        service_description MongoDB Database index size your-database  
        check_command check_mongodb_database!database_indexes!27017!50!100!your-database  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }              

# Check index size of a collection
define service{
        host_name dbm1slave1
        service_description MongoDB Collection index size your-database your-collection
        check_command check_mongodb_collection!collection_indexes!27017!50!100!your-database!your-collection  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }  

# Check the primary server of replicaset
define service{  
        host_name dbm1slave1  
        service_description MongoDB Replicaset Master Monitor: your-replicaset  
        check_command check_mongodb_replicaset!replica_primary!27017!0!1!your-replicaset   
        # Example: check_command check_mongodb_replicaset!replica_primary!27017!0!1!shard2
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }  


# Check the number of queries per second (here, updates)
define service{  
        host_name dbm1slave1  
        service_description MongoDB Updates per Second  
        check_command check_mongodb_query!queries_per_second!27017!200!150!update  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }  

# Check Primary Connection: checks the connection time to the replica set primary; warn above 2 seconds, critical above 4 seconds
define service{
        host_name dbm1slave1
        service_description Mongo Connect Primary Check
        check_command check_mongodb!connect_primary!27017!2!4  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }  

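# Check Collection State: runs against every host in the mongo server group and can verify the availability of an important collection (locks, timeouts, config server availability); it alerts as soon as a query fails.
# Note: check_mongodb as defined in commands.cfg passes its 3rd and 4th arguments as -W and -C, so the database and collection names in the check_command of this service would arrive as thresholds; a command that passes -d and -c (such as check_mongodb_collection, with thresholds added) is probably what is needed here.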
define service{  
        host_name dbm1slave1  
        service_description Mongo Collection State  
        check_command check_mongodb!collection_state!27017!your-database!your-collection  
        max_check_attempts 5  
        normal_check_interval 3  
        retry_check_interval 2  
        check_period 24x7  
        notification_interval 10  
        notification_period 24x7  
        notification_options w,u,c,r  
        contact_groups ops  
        }
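Every service above carries a warning threshold and a critical threshold. As a rough illustration of how such a pair is usually turned into a Nagios state, the sketch below assumes a simple "higher is worse" comparison with the critical threshold tested first. This is an assumption made for illustration, not a copy of the plugin's logic, and `evaluate` is a hypothetical name.

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(value, warning, critical):
    """Map a measured value to a Nagios state, assuming higher is worse."""
    # The critical threshold is tested first so it wins when both match.
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


if __name__ == "__main__":
    # Connect check from above: warning at 2 seconds, critical at 5 seconds.
    for seconds in (1.2, 3.0, 6.5):
        print(seconds, evaluate(seconds, 2, 5))
```

Under this assumed rule, a pair such as the `200!150` used in the queries_per_second service (warning above critical) would never produce a WARNING, because any value that reaches 200 has already crossed 150. It is worth checking that the order of those two numbers is intended.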





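4, Viewing some of the monitored items

After configuring the services on the Nagios side, restart with service nagios restart. After a few minutes the Nagios web interface shows the complete set of mongo service entries.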




5, Determining the mongodb architecture from ps

[root@db-m1-slave-1 ~]# ps -eaf|grep mongo

mongodb   2457     1  0  2013 ?        2-03:39:08 ./mongod --configsvr --dbpath /home/data/mongodb/config --port 20000 --logpath /home/data/mongodb/config.log --logappend --fork

mongodb   2804     1  0  2013 ?        1-10:02:33 mongos --configdb 192.168.12.62:20000,192.168.12.63:20000,192.168.12.72:20000 --port 30000 --chunkSize 64 --logpath /home/data/mongodb/mongos.log --logappend --fork

mongodb   3072     1  0  2013 ?        1-10:17:20 mongod --shardsvr --replSet shard1 --port 27017 --dbpath /home/data/mongodb/shard11 --oplogSize 2048 --logpath /home/data/mongodb/shard11.log --logappend --fork

root     11179  9391  0 11:14 pts/1    00:00:00 grep mongo

mongodb  30414     1  0 Feb14 ?        1-06:20:50 mongod --shardsvr --replSet shard2 --port 27018 --dbpath /home/data/mongodb/shard21 --oplogSize 2048 --logpath /home/data/mongodb/shard21.log --logappend --fork

[root@db-m1-slave-1 ~]#

 

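There are four mongo processes:

a)         A process started with “--configdb” is the entry point of the cluster;

b)         Shard Server: a process started with “--shardsvr --replSet” is a member of one shard's replica set and stores the actual data chunks. These are the mongodb instances on ports 27017 and 27018. To find out which member of the replica set on port 27017 is the primary and which is a secondary, log in on port 27017 and run rs.status().

c)         Config Server: a process started with “--configsvr”. It stores the metadata of the whole cluster, including chunk information. This is the mongodb instance on port 20000.

d)         Route Server: a process started as “mongos --configdb”. It is the front-end router that clients connect to, and it makes the whole cluster look like a single database that applications can use transparently. This is the mongodb instance on port 30000.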
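The rules in the list above can be written down as a small classifier over the command lines printed by ps. This is only a sketch of that reasoning in Python 3; the function names `classify_mongo_process` and `port_of` and the role labels are made up for the example.

```python
import shlex


def classify_mongo_process(command_line):
    """Guess the cluster role of a process from its command line."""
    argv = shlex.split(command_line)
    if not argv:
        return "unknown"
    # Strip any directory prefix, so "./mongod" is treated like "mongod".
    program = argv[0].rsplit("/", 1)[-1]
    if program == "mongos":
        return "route server"
    if program == "mongod":
        if "--configsvr" in argv:
            return "config server"
        if "--shardsvr" in argv:
            return "shard server"
    return "unknown"


def port_of(command_line):
    """Return the value following --port as an int, or None if absent."""
    argv = shlex.split(command_line)
    if "--port" in argv:
        position = argv.index("--port")
        if position + 1 < len(argv):
            return int(argv[position + 1])
    return None


if __name__ == "__main__":
    # Command lines shortened from the ps output above.
    samples = [
        "./mongod --configsvr --dbpath /home/data/mongodb/config --port 20000",
        "mongos --configdb 192.168.12.62:20000 --port 30000 --chunkSize 64",
        "mongod --shardsvr --replSet shard1 --port 27017",
        "mongod --shardsvr --replSet shard2 --port 27018",
    ]
    for line in samples:
        print(port_of(line), classify_mongo_process(line))
```

The classifier only tells shard members apart from the config server and the router; deciding which shard member is the primary still requires rs.status(), as described in item b).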


6, Errors encountered while debugging

Error 1:

[root@wgq nagios ~]# tail -f /usr/local/nagios/var/nagios.log

[1412819956] Warning: Return code of 13 for check of service 'Mongo Memory Usage' on host 'dbm1slave1' was out of bounds.

[1412819956] SERVICE ALERT: dbm1slave1;Mongo Memory Usage;CRITICAL;SOFT;1;(Return code of 13 is out of bounds)

[1412819975] Warning: Return code of 13 for check of service 'Mongodb Connect Check' on host 'dbm1slave1' was out of bounds.

[1412819975] SERVICE ALERT: dbm1slave1;Mongodb Connect Check;CRITICAL;SOFT;1;(Return code of 13 is out of bounds)

[1412820058] Warning: Return code of 13 for check of service 'Mongo Free Connections' on host 'dbm1slave1' was out of bounds.

 

The nagios user needs to own the plugin file and have execute permission on it:

chmod 770 /usr/lib/nagios/plugins/check_mongodb.py

chown nagios:nagios /usr/lib/nagios/plugins/check_mongodb.py
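For background on this error: Nagios expects a plugin to exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN), and reports anything else as out of bounds. The number 13 is also the value of errno EACCES (permission denied) on Linux, which fits the permission fix above, although that link is an inference rather than something the log states. The Python 3 sketch below names a return code and checks the owner's execute bit; both function names are hypothetical.

```python
import errno
import os
import stat

# Exit codes that Nagios accepts from a plugin.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def describe_return_code(code):
    """Name a plugin exit code, or say why Nagios calls it out of bounds."""
    if code in NAGIOS_STATES:
        return NAGIOS_STATES[code]
    # Outside 0-3: see whether the number happens to match an errno value.
    name = errno.errorcode.get(code)
    if name is not None:
        return "out of bounds (same number as errno %s: %s)" % (
            name, os.strerror(code))
    return "out of bounds"


def owner_can_execute(path):
    """Return True if the file's owner has the execute bit set."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


if __name__ == "__main__":
    print(describe_return_code(13))
    # Path taken from the chmod command above; adjust it to where the
    # plugin actually lives on your system.
    plugin = "/usr/lib/nagios/plugins/check_mongodb.py"
    if os.path.exists(plugin):
        print(owner_can_execute(plugin))
```

Running the check as the nagios user is the more reliable test, since this sketch only inspects the owner's bit and not group or directory permissions.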

 

Error 2:

The Status Information column of the monitoring page shows the error "No module named pymongo".

This message means the pymongo module is not installed. Running easy_install pymongo installs it, as shown below:

[root@wgq objects]# easy_install pymongo

Searching for pymongo

Reading http://pypi.python.org/simple/pymongo/

Best match: pymongo 2.7.2

......

zip_safe flag not set; analyzing archive contents...

Adding pymongo 2.7.2 to easy-install.pth file

 

Installed /usr/lib/python2.6/site-packages/pymongo-2.7.2-py2.6-linux-x86_64.egg

Processing dependencies for pymongo

Finished processing dependencies for pymongo
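After the installation it can be useful to confirm that the interpreter which runs the plugin can actually import the module. The sketch below uses a plain import attempt, which should behave the same on old and new Python versions; `module_available` is a hypothetical helper name.

```python
def module_available(name):
    """Return True if the named top-level module can be imported."""
    try:
        __import__(name)
    except ImportError:
        # This is the condition behind "No module named ...".
        return False
    return True


if __name__ == "__main__":
    print("pymongo importable:", module_available("pymongo"))
```

Run it with the same python binary that check_mongodb.py uses, because a module installed for one interpreter is not necessarily visible to another.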

----------------------------------------------------------------------------------------------------------------

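<All rights reserved. This article may be reposted, but the source address must be credited with a link; otherwise legal liability will be pursued.>
Original blog address: http://blog.itpub.net/26230597/viewspace-1293589/
Original author: 黃杉 (mchdba)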

----------------------------------------------------------------------------------------------------------------

 

Reference: https://github.com/mzupan/nagios-plugin-mongodb/blob/master/README.md
