MMM結合Semisync機制實現Mysql Master-Master高可用

jopen 10年前發布 | 23K 次閱讀 MySQL 數據庫服務器

架構:

兩個Master(主備模式),一個或多個Slave(也可以沒有Slave,只有主備Master):

1、Monitor運行MMM Daemon程序,實現所有Mysql服務器的監控和故障切換工作;

2、Master1和Master2互為主備,同時只有一個主可用于寫操作(也可同時分擔讀操作),另一個作為備用,可以分擔讀操作,讀寫分離需要應用程序實現;

3、Slave機器與當前active Master同步,如當前active Master故障后,Master將切換到passive Master,同時MMM修改Slave與新的Master同步;

4、Application通過write和read ip進行讀寫操作;

MMM結合Semisync機制實現Mysql Master-Master高可用

 

環境:

主機名 服務器IP地址 Write IP Read IP 備注
mysql01 10.0.60.100 10.0.60.160 10.0.60.161 默認為active Master,運行mmm agent
mysql02 10.0.60.101      10.0.60.162 默認為passive  Master,運行mmm agent
mysql03 10.0.60.102      10.0.60.163 Slave,由MMM維護,運行mmm agent
mysql04 10.0.60.103           監控機,運行MMM  Daemon程序

 

軟件信息:

Mysql:5.6.17-log MySQL Community Server (GPL)

MMM:mysql-mmm-2.2.1

OS:CentOS release 6.4 (Final),kernel 2.6.32-358.el6.x86_64

 

一、配置復制環境

這里是全新配置,如果是已經存在了單master和slave環境,將配置不一樣,可以結合xtrabackup工具實現數據的備份和恢復,配置主備master環境。

前提要求:

1、所有mysql實例開啟read_only=1;

2、主備master需要開啟log_bin;

3、所有mysql實例配置不同的server_id以及不同的二進制日志、relay日志文件名;

 

參考配置參數:

mysql01的特殊配置參數:

mysql01>\! grep -E "log_bin|server_id|read_only" my.cnf
log_bin = mysql01-bin
server_id = 1
read_only
mysql01>

 

mysql02的特殊配置參數:

mysql02>\! grep -E "log_bin|server_id|read_only" my.cnf
log_bin = mysql02-bin
server_id = 2
read_only
mysql02>

 

mysql03的特殊配置參數:

mysql03>\! grep -E "log_bin|server_id|read_only" my.cnf
log_bin = mysql_bin
server_id = 3
read_only

 

配置主從:

1、配置mysql01和mysql02互為主從:

在mysql01和mysql02上創建同樣的復制賬號:

grant replication slave on *.* to 'repl'@'10.0.60.%' identified by 'repl';

 

查看master狀態:

show master status;

MMM結合Semisync機制實現Mysql Master-Master高可用

 

MMM結合Semisync機制實現Mysql Master-Master高可用

 

在每個節點執行CHANGE MASTER TO語句:

mysql01change master to master_host = '10.0.60.101'
master_user='repl'
master_password='repl'
master_log_file='mysql02-bin.000001'
master_log_pos=545;

 

mysql02> change master to master_host = '10.0.60.100'
master_user='repl'
master_password='repl'
master_log_file='mysql01-bin.000001'
master_log_pos=545;

 

在兩個節點開啟slave:

start slave;


查看slave狀態是否正常:

mysql02>show slave status\G;
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

 

2、配置mysql03為mysql01的從服務器

mysql03> change master to master_host = '10.0.60.100'
master_user='repl'
master_password='repl'
master_log_file='mysql01-bin.000001'
master_log_pos=545;

 

驗證slave狀態正常后,開始下面的步驟。

 

二、配置半同步

使用半同步機制,可以確保至少一臺slave收到master的二進制日志,在一定程度上保證了數據的一致性,減少了當master當機時,造成數據丟失。

半同步機制由google貢獻,從mysql 5.5開始原生支持該特性。

 

前提要求:

1、主備master都要安裝并開啟semisync master和slave,因mmm不能進行semisync配置和管理;

2、slave需要安裝并開啟semisync slave;

 

配置步驟:

1、mysql01和mysql02安裝semisync master和slave插件:

mysql01>INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
Query OK, 0 rows affected (0.01 sec)

mysql01>INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
Query OK, 0 rows affected (0.00 sec)

 

mysql02>INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
Query OK, 0 rows affected (0.05 sec)

mysql02>INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
Query OK, 0 rows affected (0.01 sec)

 

2、mysql01和mysql02開啟semisync master和slave:

mysql01>SET GLOBAL rpl_semi_sync_master_enabled = 1;
Query OK, 0 rows affected (0.00 sec)

mysql01>SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Query OK, 0 rows affected (0.00 sec)

 

mysql02>SET GLOBAL rpl_semi_sync_master_enabled = 1;
Query OK, 0 rows affected (0.00 sec)

mysql02>SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Query OK, 0 rows affected (0.00 sec)

 

同時將參數寫入到配置文件,以mysql實例開啟時自動開啟半同步:

mysql02>\! cat my.cnf|grep semi
rpl_semi_sync_master_enabled = 1
rpl_semi_sync_slave_enabled = 1

 

3、mysql03安裝并開啟semisync slave插件:

mysql03>INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
Query OK, 0 rows affected (0.01 sec)

mysql03>SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Query OK, 0 rows affected (0.00 sec)

 

同時將參數寫入到配置文件,以mysql實例開啟時自動開啟半同步:

mysql03>\! cat my.cnf|grep semi
rpl_semi_sync_slave_enabled = 1
mysql03>

 

4、所有mysql實例停止slave并開啟slave,使半同步機制生效:

stop slave;start slave;

 

5、查看semisync狀態

mysql01>show status like '%emi%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 2     |
| Rpl_semi_sync_master_net_avg_wait_time     | 0     |
| Rpl_semi_sync_master_net_wait_time         | 0     |
| Rpl_semi_sync_master_net_waits             | 0     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 0     |
| Rpl_semi_sync_master_tx_wait_time          | 0     |
| Rpl_semi_sync_master_tx_waits              | 0     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 0     |
| Rpl_semi_sync_slave_status                 | ON    |
+--------------------------------------------+-------+
15 rows in set (0.00 sec)

 

三、配置MMM

Multi Master Replication Manager for Mysql(MMM)是一套開源的perl腳本,對Mysql Master-Master復制環境(在任何時刻只有一個節點可寫)進行監控、故障恢復以及管理。同時能根據復制的延時情況管理讀負載均衡,通過遷移read虛擬IP地址。同時也能用于數據備份,以及節點之間重同步。

主要由三個腳本組成:

1、mmm_mod:監控daemon程序,進行監控工作,并決定讀、寫角色的遷移;最好運行在專用的監控服務器上,可以管理多套Master-Slave集群。

2、mmm_agentd:客戶端daemon程序,運行在所有mysql實例服務器,與監控節點進行簡單的遠程通信。

3、mmm_control:用于管理mmm_mond進程的命令行腳本。

 

前提需求:

1、支持環境:

兩個節點的Master-Master環 境,MMM需要5個IP地址(每個節點一個固定IP地址,一個write IP地址,兩個read IP地址,write和read IP依據節點的可用性進行自動的遷移),正常情況下,active master有一個write IP和一個read IP地址,standby master有一個read IP地址,如果當前active master故障,write和read IP地址將遷移到standby master;

兩個節點的Master-Master,以及一個或多個slave的環境,同時也是大多數企業使用的方案(可以更好的擴展讀,同時有冗余的Slave可用于備份等工作,防止阻塞正常的事務)。

2、n+1個主機:n個運行mysql實例的服務器,一個機器用于運行MMM監控daemon程序;

3、2*(n+1) IP地址:每個主機一個固定IP地址,同時每臺mysql實例一個read IP地址以及一個write IP地址;

4、monitor數據庫用戶:需要REPLICATION CLIENT權限,用于MMM監控(mmm_mond);

5、agent數據庫用戶:需要SUPER、REPLICATION CLIENT、PROCESS權限,用于MMM 客戶端(mmm_agentd),可以只針對本機IP進行授權;

6、relication數據庫用戶:需要REPLICATION SLAVE權限,用于mysql復制;

7、tools數據庫用戶:需要SUPER、REPLICATION CLIENT、RELOAD權限,用于MMM tools(如mmm_backup、mmm_clone、mmm_restore)


1、在mysql實例服務器安裝依賴包和mmm

安裝依賴包:

yum -y install perl iproute perl-Algorithm-Diff perl-DBI perl-Class-Singleton perl-DBD-MySQL perl-Log-Log4perl perl-Log-Dispatch perl-Proc-Daemon perl-MailTools perl-Time-HiRes perl-Mail-Sendmail perl-Mail-Sender perl-Email-Date-Format perl-MIME-Lite perl-Net-ARP

 

如果在標準軟件倉庫和EPEL軟件倉庫沒有,需要單獨下載,可以去以下網址下載:

http://rpm.pbone.net

http://search.cpan.org/

http://www.rpmfind.net/

 

安裝mmm:

tar xvf mysql-mmm-2.2.1.tar.gz
cd mysql-mmm-2.2.1
make install

 

2、在mysql實例服務器配置mmm agent

mmm_agentd使用mmm_agent.conf配置文件:

# cat /etc/mysql-mmm/mmm_agent.conf   
include mmm_common.conf #包含這個公用配置文件
#Description: name of this host,可以不是主機名,每臺mysql實例的host不同(如mysql01設置為db1,mysql02設置為db2,mysql03設置為db3)
this db1
#Description: Enable debug mode,設置1,打印日志到前臺,按ctrl+c將結束進程
debug 0
#Description: Maximum number of retries when killing threads to prevent further
#writes during the removal of the active_master_role.
max_kill_retries 10

 

公用配置文件:mmm_common.conf,每個實例一樣,并要拷貝到監控服務器供mmm_mond使用,進行網卡接口的定義,每個主機的描述,復制和mmm agent的用戶名和密碼配置,以及讀寫規則等

# cat /etc/mysql-mmm/mmm_common.conf 
#Description: name of the role for which identifies the active master,定義活動master為可寫
active_master_role                              writer

<host default> #默認段
        #Description: network interface on which the IPs of the roles should be configured,用于綁定ip的網絡接口
        cluster_interface                       eth0

        pid_path                                /var/run/mmm_agentd.pid
        bin_path                                /usr/lib/mysql-mmm/
        #Description: Port on which mmm agentd listens
        agent_port                              9989
        #Description: Port on which mysqld is listening
        mysql_port                              3306

        #Description: mysql user used for replication
        replication_user                        repl
        #Description: mysql password used for replication
        replication_password                    repl
        #Description: mysql user for MMM Agent
        agent_user                              mmm_agent
        #Description: mysql password for MMM Agent
        agent_password                          mmm_agent
</host>

<host db1> #命名段,指定每個mysql實例主機
        #Description: IP of host
        ip                                      10.0.60.100
        #Description: Mode of host. Either master or slave.
        mode                                    master
        #Description: Name of peer host (if mode is master)
        peer                                    db2
</host>

<host db2>
        ip                                      10.0.60.101
        mode                                    master
        peer                                    db1
</host>

<host db3>
        ip                                      10.0.60.102
        mode                                    slave
</host>


<role writer> #定義write角色
        #Description: Hosts which may take over the role
        hosts                                   db1, db2
        #Description: One or multiple IPs associated with the role,指定浮動write IP地址
        ips                                     10.0.60.160
        #Description: Mode of role. Either balanced or exclusive
        mode                                    exclusive
        #Description: The preferred host for this role. Only allowed for exclusive roles.
        #prefer                                 -
</role>

<role reader>  #定義read角色
        hosts                                   db1, db2, db3
        ips                                     10.0.60.161,10.0.60.162,10.0.60.163 #浮動read IP地址
        mode                                    balanced
</role>

 

3、啟動mmm agent服務

/etc/init.d/mysql-mmm-agent start
chkconfig --level 2345 mysql-mmm-agent on


4、在監控服務器(mysql04)安裝依賴包和mmm

安裝依賴包:

yum -y install perl iproute perl-Algorithm-Diff perl-DBI perl-Class-Singleton perl-DBD-MySQL perl-Log-Log4perl perl-Log-Dispatch perl-Proc-Daemon perl-MailTools perl-Time-HiRes perl-Mail-Sendmail perl-Mail-Sender perl-Email-Date-Format perl-MIME-Lite perl-Net-Ping

 

安裝mmm:

tar xvf mysql-mmm-2.2.1.tar.gz
cd mysql-mmm-2.2.1
make install

 

5、配置mmm 監控配置文件

mmm_mond和mmm_control使用mmm_mon.conf或mmm_mon_CLUSTER.conf配置文件

mmm_mon.conf配置文件參考:

# cat /etc/mysql-mmm/mmm_mon.conf 
include mmm_common.conf

#The monitor section is required by mmm_mond and mmm_control
<monitor>
        #Description: IP on which mmm_mond listens
        ip                                              127.0.0.1
        #Description: Port on which mmm mond listens
        port                                    9988
        #Description: Location of pid-file
        pid_path                                /var/run/mmm_mond.pid
        #Description: Path to directory containing MMM binaries
        bin_path                                /usr/lib/mysql-mmm/
        #Description: Location of of status file
        status_path                             /var/lib/misc/mmm_mond.status
        #Description: Break between network checks
        ping_interval                           1
        #Description: IPs used for network checks,指定所有mysql服務器IP,write和read IP地址,用于進行ping檢查
        ping_ips                                10.0.60.100, 10.0.60.101, 10.0.60.102, 10.0.60.160, 10.0.60.161, 10.0.60.162, 10.0.60.163
        #Description: Duration in seconds for flap detection. See flap_count
        flap_duration                           3600
        #Description: Maximum number of downtimes within flap_duration seconds after
        #which a host is considered to be flapping.
        flap_count                              3
        #Description: How many seconds to wait before switching node status from
        #AWAITING_RECOVERY to ONLINE. 0 = disabled.
        auto_set_online                         0
        #Description: Binary used to kill hosts if roles couldn’t be removed because the agent
        #was not reachable. You have to provide a custom binary for this which
        #takes the hostname as first argument and the state of check ping (1 -ok; 0 - not ok) as second argument.
        kill_host_bin                           /usr/lib/mysql-mmm/monitor/kill_host
        #Description: Startup carefully i.e. switch into passive mode when writer role is
        #configured on multiple hosts
        careful_startup                         0
        #Description: Default mode of monitor.
        mode                                    active
        #Description: How many seconds to wait for other master to become ONLINE before
        #switching from mode WAIT to mode ACTIVE. 0 = infinite.
        wait_for_other_master                   120
</monitor>

<host default>
        #Description: mysql user for MMM Monitor
        monitor_user                    mmm_agent
        #Description: mysql password for MMM Monitor
        monitor_password                mmm_agent
</host>

<check  mysql> #check段,mmm執行ping、mysql、rep_threads、rep_backlog四種檢查,可以分別進行檢查間隔等參數配置。
        #Description: Perform check every 5 seconds
        check_period                    5
        #Description: Check is considered as failed if it doesn’t succeed for at least
        #trap period seconds.
        trap_period                     10
        #Description: Check times out after timeout seconds
        timeout                         2
        #Description: Restart checker process after restart after checks
        restart_after                   10000
        #Description: Maximum backlog for check rep_backlog.
        max_backlog                     60
</check>

#設置為1,開啟調試模式,打印日志到前臺,ctrl+c將結束進程,對于調試有幫助
debug 0

 

6、開啟mmm monitor監控

/etc/init.d/mysql-mmm-monitor start
chkconfig --level 2345 mysql-mmm-monitor on


7、使用mmm_control查看狀態

# mmm_control show
  db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163), writer(10.0.60.160)
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162)

 

注:

當節點第一次開啟,狀態為等待恢復。

設置節點online:

# mmm_control set_online db1

 

MMM如何工作:

當故障發生時,mmm迅速的遷移故障節點的IP地址從一個節點到另一個節點,并使用Net::ARP Perl模塊更新ARP表

處理過程:

在故障active master節點:

1、mysql 設置為read_only(set global read_only=1),防止寫事務;

2、中斷活動連接;

3、移除寫ip;

 

在新master節點:

1、運行在passive master的mmm進程被通知即將成為active write;

2、slave將嘗試從master的二進制日志抓取任何剩余事務;

3、關閉read_only(set global read_only=0);

4、綁定write ip,并發生arp通告;

 

四、測試

1、測試mysql01 mysql實例故障

手動關閉mysql01服務器上的mysql實例,期望master將遷移到mysql02

停止mysql01的mysqld進程:也可以使用"killall -15 mysqld"結束mysqld進程

mysql01>\! sh stop.sh

 

查看mmm_mond的日志:總共經過10s時間完成遷移

# tail -f /var/log/mysql-mmm/mmm_mond.log 
2014/05/27 14:19:13  WARN Check 'rep_backlog' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111
2014/05/27 14:19:13  WARN Check 'rep_threads' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111
2014/05/27 14:19:22 ERROR Check 'mysql' on 'db1' has failed for 10 seconds! Message: ERROR: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111
2014/05/27 14:19:23 FATAL State of host 'db1' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2014/05/27 14:19:23  INFO Removing all roles from host 'db1':
2014/05/27 14:19:23  INFO     Removed role 'reader(10.0.60.163)' from host 'db1'
2014/05/27 14:19:23  INFO     Removed role 'writer(10.0.60.160)' from host 'db1'
2014/05/27 14:19:23  INFO Orphaned role 'writer(10.0.60.160)' has been assigned to 'db2' #可以看到寫IP已經遷移到mysql02
2014/05/27 14:19:23  INFO Orphaned role 'reader(10.0.60.163)' has been assigned to 'db3'


使用mmm_control命令查看狀態:

[root@mysql04 ~]# mmm_control show
  db1(10.0.60.100) master/HARD_OFFLINE. Roles: #mysql01狀態已經變為HARD_OFFLINE,意外著ping錯誤或mysql故障
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)


檢查mysql02的read_only變量是否改變:

mysql02>show global variables like 'read_only'; #默認在passive master時,read_only為ON
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only     | OFF   |
+---------------+-------+
1 row in set (0.00 sec)

mysql02>


檢查mysql03是否已經將mysql02作為主:

mysql03>show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.60.101 #已經從Mysql02同步
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysql02-bin.000014
          Read_Master_Log_Pos: 120
               Relay_Log_File: mysql03-relay-bin.000002
                Relay_Log_Pos: 285
        Relay_Master_Log_File: mysql02-bin.000014
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

 

當再次啟動mysql01的mysql實例,db1的狀態將由HARD_OFFLINE改變為AWAITING_RECOVERY:

[root@mysql04 ~]# mmm_control show
  db1(10.0.60.100) master/AWAITING_RECOVERY. Roles: 
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)


需要手動設置為online,mmm才會分配read ip給mysql01,并與mysql02同步:

[root@mysql04 ~]# mmm_control set_online db1
OK: State of 'db1' changed to ONLINE. Now you can wait some time and check its new roles!


mysql01>show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.60.101 #mysql01已經從Mysql02同步
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysql02-bin.000015
          Read_Master_Log_Pos: 120
               Relay_Log_File: mysql01-relay.000027
                Relay_Log_Pos: 285
        Relay_Master_Log_File: mysql02-bin.000015
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes


查看mysql02的semisync狀態:

mysql02>show status like 'Rpl_semi%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 2     |
| Rpl_semi_sync_master_net_avg_wait_time     | 1053  |
| Rpl_semi_sync_master_net_wait_time         | 2106  |
| Rpl_semi_sync_master_net_waits             | 2     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 1015  |
| Rpl_semi_sync_master_tx_wait_time          | 1015  |
| Rpl_semi_sync_master_tx_waits              | 1     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 1     |
| Rpl_semi_sync_slave_status                 | ON    |
+--------------------------------------------+-------+
15 rows in set (0.00 sec)


2、模擬mysql02(Active Master服務器) kernel panic,期望進行遷移

執行上面的測試后,當前active master為mysql02,使用下面命令模擬kernel panic:

mysql02>\! echo "c" > /proc/sysrq-trigger 


查看mmm_mond日志:總共經過了20s的時間完成遷移

# tail -f /var/log/mysql-mmm/mmm_mond.log
2014/05/27 14:44:42  WARN Check 'rep_threads' on 'db2' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.101:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.101' (4)
2014/05/27 14:44:42  WARN Check '
rep_backlog' on 'db2' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.101:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.101' (4)
2014/05/27 14:44:46 FATAL Can't reach agent on host 'db2'
2014/05/27 14:44:49 ERROR Check '
ping' on 'db2' has failed for 11 seconds! Message: ERROR: Could not ping 10.0.60.101 #ping檢查錯誤
2014/05/27 14:44:55 ERROR Check '
mysql' on 'db2' has failed for 14 seconds! Message: ERROR: Connect error (host = 10.0.60.101:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.101' (4) #mysql檢查錯誤,不能連接
2014/05/27 14:44:59  INFO Check 'ping' on 'db2' is ok!
2014/05/27 14:45:02 FATAL State of host 'db2' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK) #改變mysql02的狀態為HARD_OFFLINE
2014/05/27 14:45:02  INFO Removing all roles from host 'db2':  #移除mysql02的角色
2014/05/27 14:45:02  INFO     Removed role 'reader(10.0.60.161)' from host 'db2'
2014/05/27 14:45:02  INFO     Removed role 'writer(10.0.60.160)' from host 'db2'
2014/05/27 14:45:02 FATAL Agent on host 'db2' is reachable again
2014/05/27 14:45:02  INFO Orphaned role 'writer(10.0.60.160)' has been assigned to 'db1' #分配角色到其他機器,write IP分配到mysql01,永遠不會分配到mysql03
2014/05/27 14:45:02  INFO Orphaned role 'reader(10.0.60.161)' has been assigned to 'db3'

 

使用mmm_control查看狀態:

[root@mysql04 ~]# mmm_control show
  db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163)
  db2(10.0.60.101) master/HARD_OFFLINE. Roles: 
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162)

 

3、模擬active master服務器網絡不通,期望進行遷移,但是網絡恢復后,將不會重啟slave;

當前active master為mysql01,在mysql01上禁用網卡:

mysql01>\! cat down_net.sh
ifdown eth0
sleep 600
ifup eth0
mysql01>\! sh down_net.sh

 

查看mmm_mond日志:總共經過了9s完成遷移

2014/05/27 15:02:35  WARN Check 'rep_threads' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.100' (4)
2014/05/27 15:02:35  WARN Check '
rep_backlog' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.100' (4)
2014/05/27 15:02:41 FATAL Can't reach agent on host 'db1'
2014/05/27 15:02:41 ERROR Check '
ping' on 'db1' has failed for 11 seconds! Message: ERROR: Could not ping 10.0.60.100  #ping檢查錯誤
2014/05/27 15:02:44 FATAL State of host '
db1' changed from ONLINE to HARD_OFFLINE (ping: not OK, mysql: OK) #改變狀態
2014/05/27 15:02:44  INFO Removing all roles from host '
db1': #移除角色
2014/05/27 15:02:44  INFO     Removed role '
reader(10.0.60.163)' from host 'db1'  
2014/05/27 15:02:44  INFO     Removed role '
writer(10.0.60.160)' from host 'db1'
2014/05/27 15:02:44 ERROR Can'
t send offline status notification to 'db1' - killing it!
2014/05/27 15:02:44 FATAL Could not kill host 'db1' - there may be some duplicate ips now! (There's no binary configured for killing hosts.)
2014/05/27 15:02:44  INFO Orphaned role '
writer(10.0.60.160)' has been assigned to 'db2'
2014/05/27 15:02:44  INFO Orphaned role '
reader(10.0.60.163)' has been assigned to 'db3'
2014/05/27 15:02:48 ERROR Check '
mysql' on 'db1' has failed for 14 seconds! Message: ERROR: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.100' (4)

 

使用mmm_control查看狀態:

[root@mysql04 ~]# mmm_control show
# Warning: agent on host db1 is not reachable
  db1(10.0.60.100) master/HARD_OFFLINE. Roles: 
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)


當網絡恢復后,mmm會修改mysql01的slave配置,修改主為mysql02,但是沒有重啟slave,造成不能進行數據同步,需要手工重新開啟slave。

使用mmm_control檢查狀態:

[root@mysql04 ~]# mmm_control show
  db1(10.0.60.100) master/AWAITING_RECOVERY. Roles: 
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)

[root@mysql04 ~]# mmm_control set_online db1  #網絡恢復后,手動設置為online
OK: State of 'db1' changed to ONLINE. Now you can wait some time and check its new roles!
[root@mysql04 ~]# mmm_control show
  db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163)
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162)


檢查mysql01的slave狀態:看上去正常的

mysql01>show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.60.101
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysql02-bin.000015
          Read_Master_Log_Pos: 328
               Relay_Log_File: mysql01-relay.000027
                Relay_Log_Pos: 493
        Relay_Master_Log_File: mysql02-bin.000015
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes


但是在mysql02(當前active master)插入數據,mysql01不能從mysql02同步:

mysql02> insert into t1 values(2);
Query OK, 1 row affected (0.02 sec)
mysql02>select * from t1;
+----+
| id |
+----+
|  1 |
|  2 |
+----+
2 rows in set (0.00 sec)


mysql03已經同步了數據:

mysql03>select * from t1;
+----+
| id |
+----+
|  1 |
|  2 |
+----+
2 rows in set (0.00 sec)

 

而mysql01沒有同步數據:

mysql01>select * from t1;
+----+
| id |
+----+
|  1 |
+----+
1 row in set (0.00 sec)

 

解決方法:先停止slave,然后啟動slave;

mysql01>stop slave;start slave; 
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql01>select * from t1;
+----+
| id |
+----+
|  1 |
|  2 |
+----+
2 rows in set (0.00 sec)


截圖:

MMM結合Semisync機制實現Mysql Master-Master高可用


4、模擬slave線程故障,不管是io或sql線程故障,期望進行遷移,恢復時如果在flap_duration時間內超過了flap_count次數的故障,將不會自動恢復,狀態由REPLICATION_FAIL改為 AWAITING_RECOVERY (because it's flapping)

當前active master為mysql02。

停止active master(mysql02)的slave,不會造成遷移:

mmm_mond的日志:已經檢測到db2(mysql02)復制線程錯誤

2014/05/27 15:39:02 ERROR Check 'rep_threads' on 'db2' has failed for 10 seconds! Message: ERROR: Replication is broken

 

使用mmm_control查看狀態:

[root@mysql04 ~]# mmm_control show
  db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.161)
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.162), writer(10.0.60.160)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.163)

 

如果當其他slave(mysql01、mysql03)的線程(不管是io還是sql線程)故障將會發生遷移:

手工停止io線程:

mysql01>stop slave io_thread;
Query OK, 0 rows affected (0.01 sec)

 

查看mmm_mond日志:

2014/05/27 15:43:28 ERROR Check 'rep_threads' on 'db1' has failed for 10 seconds! Message: ERROR: Replication is broken
2014/05/27 15:43:31 FATAL State of host 'db1' changed from ONLINE to REPLICATION_FAIL
2014/05/27 15:43:31  INFO Removing all roles from host 'db1':
2014/05/27 15:43:31  INFO     Removed role 'reader(10.0.60.161)' from host 'db1'  #移除角色
2014/05/27 15:43:31  INFO Orphaned role 'reader(10.0.60.161)' has been assigned to 'db3'

 

使用mmm_control查看狀態:

[root@mysql04 ~]# mmm_control show
  db1(10.0.60.100) master/REPLICATION_FAIL. Roles: 
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.162), writer(10.0.60.160)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.161), reader(10.0.60.163)

 

當重新開啟io線程后,mmm將自動恢復db1,并重新遷移read IP到db1(mysql01)上,如果故障超過:

重新開啟線程:

mysql01>start slave io_thread;  
Query OK, 0 rows affected (0.00 sec)

 

查看mmm_mond日志:

2014/05/27 15:45:23  INFO Check 'rep_threads' on 'db1' is ok!
2014/05/27 15:45:25 FATAL State of host 'db1' changed from REPLICATION_FAIL to ONLINE
2014/05/27 15:45:25  INFO Moving role 'reader(10.0.60.163)' from host 'db3' to host 'db1'

 

使用mmm_control查看狀態:

[root@mysql04 ~]# mmm_control show
  db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163)
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.162), writer(10.0.60.160)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.161)

 

5、復制延時

延時檢查有max_backlog控制,默認為60;

復制延時或錯誤,如果故障時間少于60s,狀態為ONLINE,單會遷移,故障恢復后,mmm自動恢復read IP。如果rep_backlog和rel_threads同時錯誤,狀態將為REPLICATION_FAIL。

 

6、mmm agent或monitor故障

不會遷移角色,如果此時有master或slave故障,也將不能遷移角色

 

參考:

MMM官網:http://mysql-mmm.org/

MMM博客:http://blog.mysql-mmm.org/

 

 


來自為知筆記(Wiz)

 本文由用戶 jopen 自行上傳分享,僅供網友學習交流。所有權歸原作者,若您的權利被侵害,請聯系管理員。
 轉載本站原創文章,請注明出處,并保留原始鏈接、圖片水印。
 本站是一個以用戶分享為主的開源技術平臺,歡迎各類分享!