MySQL High Availability: 5.7 GTID Master-Replica Replication + orchestrator Configuration and Basic Testing

 

Reference: https://github.com/openark/orchestrator/tree/master/docs

Download: https://github.com/openark/orchestrator/releases

 

Contents

--1. Environment information

--2. Installing orchestrator (on all three servers)

--3. Configuring orchestrator and the managed databases (on all three servers)

--4. Startup (start orchestrator on all three servers, so that it keeps itself highly available via the raft consensus algorithm)

--4-1. Starting orchestrator in debug mode

--4-2. Logging in to the web UI to manage orchestrator

--5. orchestrator-client: managing orchestrator from the command line

--6. High availability of orchestrator itself

--7. High availability of the managed MySQL service

--7-1. Master outage with automatic recovery

--7-2. Graceful master promotion (master-replica switchover)

--7-3. Manual recovery

--7-4. Manual forced recovery

--7-5. Replica 2 (192.168.56.60:3306) outage

Appendix: errors when starting orchestrator, and fixes


 

--1. Environment information

--1-1. Versions:
OS version: CentOS 7
MySQL version: MySQL 5.7
orchestrator version: orchestrator-3.1.4    Download: https://github.com/openark/orchestrator/releases

--1-2. Environment already configured:
--MySQL, one master and two replicas (these are the three managed MySQL databases):
Master:    hostname: hostmysql80  IP: 192.168.56.80  port: 3306  server_id: 1113306  read-write
Replica 1: hostname: hostmysql61  IP: 192.168.56.61  port: 3306  server_id: 2223306  read-only
Replica 2: hostname: hostmysql60  IP: 192.168.56.60  port: 3306  server_id: 3333306  read-only
--Required my.cnf settings: GTID replication, ROW binlog format, log_slave_updates enabled, slave_net_timeout tuned for your environment (the 60 s default makes reconnecting to the master far too slow)
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_format = ROW
log_slave_updates = 1
slave_net_timeout = 4

--Orchestrator's backend MySQL databases store orchestrator's own state. The three nodes here form a raft high-availability group (reusing the existing servers: a second MySQL instance on port 3307 on each server serves as orchestrator's backend).
--The backend store can be either MySQL or SQLite; MySQL is used here, and all parameters below are configured for the MySQL backend.
--存储orch的信息 可以选择mysql库 也可以选择sqlite库。 这里选择了mysql库 所有的参数配置也是按照mysql库的来配置
192.168.56.80  3307        server_id: 1113307
192.168.56.61  3307        server_id: 2223307
192.168.56.60  3307        server_id: 3333307

--1-3. On all three servers, configure /etc/hosts as follows:
[root@hostmysql80 mysql_setup]# vi /etc/hosts
192.168.56.80  hostmysql80
192.168.56.61  hostmysql61 
192.168.56.60  hostmysql60 

 

--2. Installing orchestrator (on all three servers)

--2-1. Install orchestrator's dependencies
[root@hostmysql80 mysql_setup]# rpm -ivh oniguruma-5.9.5-3.el7.x86_64.rpm
[root@hostmysql80 mysql_setup]# rpm -ivh jq-1.6-1.el7.x86_64.rpm

--2-2. Install orchestrator
[root@hostmysql80 mysql_setup]# rpm -ivh orchestrator-3.1.4-1.x86_64.rpm

--Inspect the installed files
[root@hostmysql80 ~]# cd /usr/local/orchestrator
[root@hostmysql80 orchestrator]# ll
total 19480
-rwxr-xr-x. 1 root root 19927832 Jan 26 22:05 orchestrator                                // the binary
-rw-rw-r--. 1 root root     5436 Jan 26 22:04 orchestrator-sample.conf.json               // sample config template; the live config file is orchestrator.conf.json
-rw-rw-r--. 1 root root     5060 Jan 26 22:04 orchestrator-sample-sqlite.conf.json        // config template for a SQLite backend
drwxr-xr-x. 7 root root       82 Apr  2 16:07 resources                                   // orchestrator support files

 

--3. Configuring orchestrator and the managed databases (on all three servers)

--3-1. Configure the parameter file /usr/local/orchestrator/orchestrator.conf.json; see https://github.com/openark/orchestrator/blob/master/docs/configuration.md
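--For orientation, a minimal sketch of orchestrator.conf.json assembled from the values used throughout this article (the key names are orchestrator's, but treat this as a sketch, not a complete tested file; start from orchestrator-sample.conf.json and merge). RaftBind must be each node's own IP, so it differs per server:
{
  "Debug": true,
  "ListenAddress": ":3000",
  "MySQLTopologyUser": "orch_topology_user",
  "MySQLTopologyPassword": "orch_topology_password",
  "MySQLOrchestratorHost": "127.0.0.1",
  "MySQLOrchestratorPort": 3307,
  "MySQLOrchestratorDatabase": "orchestrator",
  "MySQLOrchestratorUser": "orchestrator_srv",
  "MySQLOrchestratorPassword": "orc_server_password",
  "DiscoverByShowSlaveHosts": true,
  "SkipMaxScaleCheck": true,
  "MySQLTopologyUseMixedTLS": false,
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "RecoverMasterClusterFilters": ["*"],
  "RecoverIntermediateMasterClusterFilters": ["*"],
  "RaftEnabled": true,
  "RaftDataDir": "/usr/local/orchestrator",
  "RaftBind": "192.168.56.80",
  "DefaultRaftPort": 10008,
  "RaftNodes": ["192.168.56.80", "192.168.56.61", "192.168.56.60"]
}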
--3-2. On the orchestrator backend database (port 3307), create the orchestrator schema and an admin user
--(create the orchestrator_srv admin user; restricting its host to 127.0.0.1 for local access is sufficient)
--Relevant parameters: MySQLOrchestratorHost, MySQLOrchestratorPort, MySQLOrchestratorDatabase, MySQLOrchestratorUser, MySQLOrchestratorPassword, MySQLOrchestratorCredentialsConfigFile
[root@hostmysql80 orchestrator]# mysql -uroot -pRoot123$ -h127.0.0.1  -P3307
CREATE DATABASE IF NOT EXISTS orchestrator;
CREATE USER 'orchestrator_srv'@'127.0.0.1' IDENTIFIED BY 'orc_server_password';
GRANT ALL PRIVILEGES ON orchestrator.* TO 'orchestrator_srv'@'127.0.0.1';

--3-3. On the managed one-master-two-replica MySQL databases (port 3306), create the user orchestrator connects with
--(create the managed-side user orch_topology_user; restricting its host to orchestrator's subnet, 192.168.56.%, is sufficient. orchestrator connects to managed databases as 'orch_topology_user'@'hostname'; /etc/hosts was configured earlier)
--Relevant parameters: MySQLTopologyUser, MySQLTopologyPassword
[root@hostmysql80 orchestrator]# mysql -uroot -pRoot123$ -h127.0.0.1  -P3306
CREATE USER 'orch_topology_user'@'192.168.56.%' IDENTIFIED BY 'orch_topology_password';
GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orch_topology_user'@'192.168.56.%';
GRANT SELECT ON mysql.slave_master_info TO 'orch_topology_user'@'192.168.56.%';
GRANT SELECT ON perconadb.* TO 'orch_topology_user'@'192.168.56.%';                       // grant access to tables holding monitoring data
--GRANT SELECT ON ndbinfo.processes TO 'orch_topology_user'@'192.168.56.%';               // only for NDB Cluster environments

--3-4. On the managed one-master-two-replica MySQL databases (port 3306), adjust the replication parameters on the replicas
--slave_net_timeout is how long (in seconds) a replica waits for data from the master before reconnecting; the default is 60 s since 5.7.7.
--MASTER_CONNECT_RETRY is the interval (in seconds) between reconnection attempts to the master; default 60 s.
--MASTER_RETRY_COUNT is the number of reconnection attempts; default 86400.
--MASTER_HEARTBEAT_PERIOD is the interval at which the master sends heartbeats to the replica; it defaults to half of slave_net_timeout (30 s).
--With the defaults, if a replica receives nothing from the master (including the heartbeat every 30 s) for slave_net_timeout (60) seconds, it reconnects, retrying every MASTER_CONNECT_RETRY (60) seconds until it succeeds or gives up after MASTER_RETRY_COUNT (86400) attempts.
--Waiting 60 s between retries is far too long, so change the parameters as follows: if a replica receives nothing (including the heartbeat every 2 s) for 4 seconds, it reconnects every 1 second until it succeeds or gives up after 86400 attempts.
set global slave_net_timeout = 4;                                                               // also update /etc/my.cnf
CHANGE MASTER TO MASTER_CONNECT_RETRY=1, MASTER_RETRY_COUNT=86400, MASTER_HEARTBEAT_PERIOD=2;   // requires stop slave first
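--Putting 3-4 together, the full sequence to run on each replica looks like this (afterwards verify that Slave_IO_Running and Slave_SQL_Running are both Yes):
mysql> set global slave_net_timeout = 4;
mysql> stop slave;
mysql> CHANGE MASTER TO MASTER_CONNECT_RETRY=1, MASTER_RETRY_COUNT=86400, MASTER_HEARTBEAT_PERIOD=2;
mysql> start slave;
mysql> show slave status\G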

 

 

--4. Startup (start orchestrator on all three servers, so that it keeps itself highly available via the raft consensus algorithm)

--4-1. Start orchestrator in debug mode


[root@hostmysql80 ~]# cd /usr/local/orchestrator
[root@hostmysql80 orchestrator]# ./orchestrator --debug --config=/usr/local/orchestrator/orchestrator.conf.json http
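--The foreground debug session above is fine for a first run. For longer-running tests, one option (a habit of mine, not something the rpm sets up for you) is to background the process on each node:
[root@hostmysql80 orchestrator]# nohup ./orchestrator --config=/usr/local/orchestrator/orchestrator.conf.json http >> /tmp/orchestrator.log 2>&1 &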

--4-2. Log in to the web UI to manage orchestrator:

http://192.168.56.60:3000   (any one of the three orchestrator nodes works)

--Add a managed database (discovering any one instance is enough; orchestrator finds the rest of the topology)

--View the replication status of the managed databases

--View the status of the three orchestrator nodes

 

 

--5. orchestrator-client: managing orchestrator from the command line

--5-1. Install orchestrator-client (any single server will do; it need not be an orchestrator node or a managed MySQL server)
[root@hostmysql80 mysql_setup]# rpm -ivh orchestrator-client-3.1.4-1.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:orchestrator-client-1:3.1.4-1    ################################# [100%]

--5-2. Set the environment variable
[root@hostmysql80 ~]# vi /etc/profile.d/orchestrator-client.sh
export ORCHESTRATOR_API="hostmysql80:3000/api hostmysql61:3000/api hostmysql60:3000/api"
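--orchestrator-client is a wrapper over the HTTP API endpoints listed in ORCHESTRATOR_API, routing requests to whichever node is the raft leader. You can probe the nodes yourself; as a sketch, /api/leader-check should answer HTTP 200 only on the current leader (which also makes it a convenient load-balancer health check):
[root@hostmysql80 ~]# for h in hostmysql80 hostmysql61 hostmysql60; do echo -n "$h: "; curl -s -o /dev/null -w "%{http_code}\n" http://$h:3000/api/leader-check; done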
--5-3. View the help
[root@hostmysql80 ~]# orchestrator-client --help
Usage: orchestrator-client -c <command> [flags...]
Example: orchestrator-client -c which-master -i some.replica
Options:
-h, --help                          print this help
-c <command>, --command <command>   indicate the operation to perform (see listing below)
-a <alias>, --alias <alias>         cluster alias
......

--5-4. Basic commands
--List all clusters: clusters
[root@hostmysql80 ~]# orchestrator-client -c clusters
hostmysql80:3306
--List all instances in a given cluster: which-cluster-instances
[root@hostmysql80 orchestrator]# orchestrator-client -c which-cluster-instances -i  hostmysql80:3306
hostmysql60:3306
hostmysql61:3306
hostmysql80:3306

--Show the topology of a replication cluster, tabulated: topology-tabulated
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql61:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

--Search for instances: search
[root@hostmysql80 orchestrator]# orchestrator-client -c search -i  host
hostmysql61:3306
hostmysql80:3306
hostmysql60:3306

--Forget a given instance: forget   (forget-cluster forgets an entire cluster)
[root@hostmysql80 ~]# orchestrator-client -c forget -i hostmysql60:3306

--Discover a given instance: discover   (async-discover discovers asynchronously)
[root@hostmysql80 ~]# orchestrator-client -c discover -i hostmysql60:3306
hostmysql60:3306
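--As an aside, each -c command above is a thin wrapper over an HTTP API endpoint (this is why jq was installed as a dependency in step 2-1); for example, the clusters command corresponds roughly to:
[root@hostmysql80 ~]# curl -s http://hostmysql80:3000/api/clusters | jq .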

 

--6. High availability of orchestrator itself

--orchestrator uses the raft distributed consensus protocol to keep itself highly available. A raft quorum needs a majority, floor(n/2)+1 nodes, so with 3 nodes orchestrator stays usable with at most 1 node failed; with 5 nodes, at most 2 nodes failed.

--Below we demonstrate the 3-node cluster with 1 node failed, then with 2 nodes failed.

--6-1. Check the current raft state: healthy, and the leader is 192.168.56.60:10008
[root@hostmysql80 ~]#  orchestrator-client -c raft-leader
192.168.56.60:10008
[root@hostmysql80 ~]#  orchestrator-client -c raft-health
healthy

--6-2. Kill the orchestrator process on the current leader node (192.168.56.60:10008)
[root@hostmysql60 ~]# ps -ef|grep orch
root      5981  5776  2 15:42 pts/1    00:03:29 ./orchestrator --config=/usr/local/orchestrator/orchestrator.conf.json http
root      7867  6138  0 18:27 pts/2    00:00:00 grep --color=auto orch
[root@hostmysql60 ~]# kill -9 5981

--6-3. Check raft again: still healthy (1 of 3 nodes failed is fine), and the leader is now 192.168.56.80:10008
[root@hostmysql80 ~]#  orchestrator-client -c raft-leader
192.168.56.80:10008
[root@hostmysql80 ~]#  orchestrator-client -c raft-health
healthy

--6-4. Kill the orchestrator process on the new leader node (192.168.56.80:10008)
[root@hostmysql80 ~]# ps -ef|grep orch
root      7402  7310  1 15:31 pts/2    00:03:17 ./orchestrator --debug --config=/usr/local/orchestrator/orchestrator.conf.json http
root      9722  7687  0 18:29 pts/1    00:00:00 grep --color=auto orch
[root@hostmysql80 ~]# kill -9 7402

--6-5. Check raft again: now unhealthy (2 of 3 nodes failed; orchestrator is down). The web UI greys out likewise; orchestrator is unavailable.
[root@hostmysql80 ~]# orchestrator-client -c raft-leader
orchestrator-client[9738]: Cannot determine leader from hostmysql80:3000/api hostmysql61:3000/api hostmysql60:3000/api
[root@hostmysql80 ~]# orchestrator-client -c raft-health
orchestrator-client[9770]: Cannot determine leader from hostmysql80:3000/api hostmysql61:3000/api hostmysql60:3000/api

 

--7. High availability of the managed MySQL service

--Reference: https://github.com/openark/orchestrator/blob/master/docs/topology-recovery.md#automated-recovery

--High-availability scenarios: 1. master outage with automatic recovery; 2. graceful master promotion (switchover); 3. manual recovery; 4. manual forced recovery

--7-1. Master outage with automatic recovery

--When the master dies, orchestrator automatically promotes a replica to new master according to the promotion rules (the old master is then standalone; you can manually set read_only=1 on it and pull it back into the replication cluster). With "ApplyMySQLPromotionAfterMasterFailover": true, orchestrator automatically runs reset slave all and set read_only=0 on the new master (note that /etc/my.cnf is NOT updated).
--The following must be set in the orchestrator.conf.json parameter file:
"FailureDetectionPeriodBlockMinutes": 60,                       // a repeated failure within this window is not re-detected
"RecoveryPeriodBlockSeconds": 3600,                             // a repeated failure within this window triggers no failover, avoiding conflicting recoveries
"RecoveryIgnoreHostnameFilters": [],                            // recovery ignores matching hosts
"RecoverMasterClusterFilters": ["*"],                           // enable automatic master failover; only matching clusters are recovered ("*" matches everything)
"RecoverIntermediateMasterClusterFilters": ["*"],               // enable automatic intermediate-master failover; only matching clusters are recovered ("*" matches everything)

--Set the promotion rule: prefer promoting hostmysql61:3306 to master
[root@hostmysql80 ~]# orchestrator-client -c register-candidate -i hostmysql61:3306 --promotion-rule prefer
hostmysql61:3306
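--Besides prefer, register-candidate accepts other promotion rules (to my knowledge: prefer, neutral, prefer_not, must_not), and a registration expires after a while, so a long-lived preference should be re-registered periodically, e.g. from cron. For example, to discourage ever promoting hostmysql60:3306:
[root@hostmysql80 ~]# orchestrator-client -c register-candidate -i hostmysql60:3306 --promotion-rule prefer_not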
--List the clusters
[root@hostmysql80 orchestrator]# orchestrator-client -c clusters
hostmysql80:3306
--Check the cluster's replication topology: all healthy
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql61:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

--The master (hostmysql80:3306) goes down
[root@hostmysql80 ~]# systemctl stop mysqld.service
--Check the topology: the master (hostmysql80:3306) shows as unavailable, but the replicas are ok
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306  |unknown|invalid|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|     0s|ok     |5.7.27-log|ro|ROW|>>,GTID
+ hostmysql61:3306|     0s|ok     |5.7.27-log|ro|ROW|>>,GTID
--The master has been put into downtimed (maintenance) state
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306|unknown|invalid|5.7.27-log|rw|ROW|>>,GTID,downtimed
--List instances in downtimed (maintenance) state
[root@hostmysql80 orchestrator]# orchestrator-client -c downtimed
hostmysql80:3306
--Analyze problems in the replication topology: the master (hostmysql80:3306) is dead, with no replicas still attached to it
[root@hostmysql80 orchestrator]# orchestrator-client -c replication-analysis
hostmysql80:3306 (cluster hostmysql80:3306): DeadMasterWithoutSlaves

--Query the topology through any replica: hostmysql61:3306 has become the new master
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql60:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

--Check the new master's (hostmysql61:3306) state: because "ApplyMySQLPromotionAfterMasterFailover": true is set, reset slave all and set read_only=0 were executed on it.
mysql> show slave status\G
Empty set (0.00 sec)

mysql> show variables like 'read_only';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only     | OFF   |
+---------------+-------+
1 row in set (0.00 sec)

--To trace the automatic recovery flow above, check the /tmp/recovery.log written by the hook parameters below (see the chapter "MySQL replication 08: HA + orchestrator, part 2: parameter file"):
OnFailureDetectionProcesses               # runs on failure detection (fires once per detection, hence the two "Detected" lines below)
PreFailoverProcesses                      # runs immediately before recovery actions
PostMasterFailoverProcesses               # runs at the end of a successful master recovery
PostFailoverProcesses                     # runs at the end of any successful recovery
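--The recovery.log lines shown below are produced by hook definitions along these lines in orchestrator.conf.json (a sketch modeled on orchestrator-sample.conf.json; placeholders such as {failureType} and {failureCluster} are expanded by orchestrator before the command runs):
"OnFailureDetectionProcesses": [
  "echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
],
"PreFailoverProcesses": [
  "echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
],
"PostMasterFailoverProcesses": [
  "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"PostFailoverProcesses": [
  "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],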
[root@hostmysql61 ~]# more /tmp/recovery.log
Detected UnreachableMaster on hostmysql80:3306. Affected replicas: 2
Detected DeadMaster on hostmysql80:3306. Affected replicas: 2
Will recover from DeadMaster on hostmysql80:3306
Recovered from DeadMaster on hostmysql80:3306. Failed: hostmysql80:3306; Promoted: hostmysql61:3306
(for all types) Recovered from DeadMaster on hostmysql80:3306. Failed: hostmysql80:3306; Successor: hostmysql61:3306

--Bring the old master (192.168.56.80:3306) back up
[root@hostmysql80 ~]# systemctl start mysqld.service

--Analyzing the topology again shows no problems, but hostmysql80:3306 is now a standalone instance
[root@hostmysql80 orchestrator]# orchestrator-client -c replication-analysis
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql60:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306|0s|ok|5.7.27-log|rw|ROW|>>,GTID

--Try to put hostmysql80:3306 back into the replication cluster as a replica of hostmysql61:3306; orchestrator says it is too complex and must be done manually
[root@hostmysql80 orchestrator]# orchestrator-client -c relocate -i hostmysql80:3306 -d hostmysql61:3306
2020-04-08 18:11:33 ERROR Relocating hostmysql80:3306 below hostmysql61:3306 turns to be too complex; please do it manually

--Check, against the replication rules, whether hostmysql80:3306 can replicate from hostmysql61:3306
[root@hostmysql80 orchestrator]# orchestrator-client -c can-replicate-from -i hostmysql80:3306 -d hostmysql61:3306
hostmysql61:3306

--Manually add hostmysql80:3306 to the replication cluster as a replica of hostmysql61:3306
--On hostmysql80:3306, set read_only = 1 (also in /etc/my.cnf), run reset slave all, and point it at the new master
mysql> set global read_only = 1;
mysql> stop slave;
mysql> reset slave all;
mysql> CHANGE MASTER TO MASTER_HOST='hostmysql61', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='replPwd8!', MASTER_AUTO_POSITION=1, MASTER_CONNECT_RETRY=1, MASTER_RETRY_COUNT=86400, MASTER_HEARTBEAT_PERIOD=2;
mysql> start slave;

--Check the topology: hostmysql80 has rejoined the replication cluster
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql61:3306  |0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

 

--7-2. Graceful master promotion (master-replica switchover)

--Gracefully promote a replica to new master. Before the promotion, the old master is automatically set read_only=1 and then turned into a replica of the new master (left in maintenance state; you must run start slave on it manually). With "ApplyMySQLPromotionAfterMasterFailover": true, reset slave all and set read_only=0 are executed on the new master (again, /etc/my.cnf is NOT updated).

--Check the topology: all healthy; hostmysql61:3306 is the master
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql61:3306
hostmysql61:3306  |0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

--Gracefully promote hostmysql80:3306 to master, with hostmysql61:3306 becoming its replica
[root@hostmysql80 ~]# orchestrator-client -c graceful-master-takeover -a hostmysql61:3306 -d hostmysql80:3306
hostmysql80:3306

--Check the topology: hostmysql80:3306 has been promoted to master, and hostmysql61:3306 is now a replica (downtimed, maintenance state)
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql61:3306
hostmysql80:3306  |  0s|ok            |5.7.27-log|rw|ROW|>>,GTID          
+ hostmysql60:3306|  0s|ok            |5.7.27-log|ro|ROW|>>,GTID          
- hostmysql61:3306|null|nonreplicating|5.7.27-log|ro|ROW|>>,GTID,downtimed

--To trace the graceful promotion flow above, check /tmp/recovery.log; the hooks below fire in order (see the chapter "MySQL replication 08: HA + orchestrator, part 2: parameter file"):
PreGracefulTakeoverProcesses
OnFailureDetectionProcesses
PreFailoverProcesses
PostMasterFailoverProcesses
PostFailoverProcesses
PostGracefulTakeoverProcesses

[root@hostmysql80 ~]# more /tmp/recovery.log
Planned takeover about to take place on hostmysql61:3306. Master will switch to read_only
Detected DeadMaster on hostmysql61:3306. Affected replicas: 1
Will recover from DeadMaster on hostmysql61:3306
Recovered from DeadMaster on hostmysql61:3306. Failed: hostmysql61:3306; Promoted: hostmysql80:3306
(for all types) Recovered from DeadMaster on hostmysql61:3306. Failed: hostmysql61:3306; Successor: hostmysql80:3306
Planned takeover complete

--Manually restart replication on hostmysql61
mysql> start slave;

--Check the topology: replication on hostmysql61 is healthy again
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql61:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

 

 

--7-3. Manual recovery

--When automatic recovery is disabled or blocked, the replication cluster can be recovered manually. The flow is largely the same as automatic recovery, with one extra manual recover step.

--To disable automatic recovery, set the following in orchestrator.conf.json:
"FailureDetectionPeriodBlockMinutes": 60,
"RecoveryPeriodBlockSeconds": 3600,
"RecoveryIgnoreHostnameFilters": [],
"RecoverMasterClusterFilters": ["_auto_failure_"],                    // just needs to match nothing
"RecoverIntermediateMasterClusterFilters": ["_auto_failure_"],        // just needs to match nothing

--The master goes down
[root@hostmysql80 orchestrator]# systemctl stop mysqld.service

--Check the topology: the cluster is broken (and with automatic recovery disabled, it stays broken)
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql60:3306
hostmysql80:3306  |unknown|invalid       |5.7.27-log|rw|ROW|>>,GTID
- hostmysql60:3306|   null|nonreplicating|5.7.27-log|ro|ROW|>>,GTID
- hostmysql61:3306|   null|nonreplicating|5.7.27-log|ro|ROW|>>,GTID

--Run a manual recovery against the failed instance
[root@hostmysql80 orchestrator]# orchestrator-client -c recover -i hostmysql80:3306
hostmysql61:3306

--Check the topology: the failed instance (the old master) has been dropped from the cluster, hostmysql61:3306 has been promoted to master, and replication is healthy again
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql60:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

--To trace the manual recovery flow above, check the /tmp/recovery.log written by the hook parameters below (see the chapter "MySQL replication 08: HA + orchestrator, part 2: parameter file"):
OnFailureDetectionProcesses               # runs on failure detection (fires once per detection)
PreFailoverProcesses                      # runs immediately before recovery actions
PostMasterFailoverProcesses               # runs at the end of a successful master recovery
PostFailoverProcesses                     # runs at the end of any successful recovery

[root@hostmysql80 orchestrator]# more /tmp/recovery.log
Detected UnreachableMaster on hostmysql80:3306. Affected replicas: 2
Detected DeadMasterAndSomeSlaves on hostmysql80:3306. Affected replicas: 2
Will recover from DeadMasterAndSomeSlaves on hostmysql80:3306
Recovered from DeadMasterAndSomeSlaves on hostmysql80:3306. Failed: hostmysql80:3306; Promoted: hostmysql60:3306
(for all types) Recovered from DeadMasterAndSomeSlaves on hostmysql80:3306. Failed: hostmysql80:3306; Successor: hostmysql60:3306

--Bring the old master (192.168.56.80:3306) back up
[root@hostmysql80 ~]# systemctl start mysqld.service

--Analyzing the topology again shows no problems, but hostmysql80:3306 is now a standalone instance
[root@hostmysql80 orchestrator]# orchestrator-client -c replication-analysis
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql60:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306|0s|ok|5.7.27-log|rw|ROW|>>,GTID

--Try to put hostmysql80:3306 back into the replication cluster as a replica of hostmysql61:3306; again too complex, must be done manually
[root@hostmysql80 orchestrator]# orchestrator-client -c relocate -i hostmysql80:3306 -d hostmysql61:3306
2020-04-08 18:11:33 ERROR Relocating hostmysql80:3306 below hostmysql61:3306 turns to be too complex; please do it manually

--Check, against the replication rules, whether hostmysql80:3306 can replicate from hostmysql61:3306
[root@hostmysql80 orchestrator]# orchestrator-client -c can-replicate-from -i hostmysql80:3306 -d hostmysql61:3306
hostmysql61:3306

--Manually add hostmysql80:3306 to the replication cluster as a replica of hostmysql61:3306
--On hostmysql80:3306, set read_only = 1 (also in /etc/my.cnf), run reset slave all, and point it at the new master
mysql> set global read_only = 1;
mysql> stop slave;
mysql> reset slave all;
mysql> CHANGE MASTER TO MASTER_HOST='hostmysql61', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='replPwd8!', MASTER_AUTO_POSITION=1, MASTER_CONNECT_RETRY=1, MASTER_RETRY_COUNT=86400, MASTER_HEARTBEAT_PERIOD=2;
mysql> start slave;

--Check the topology: hostmysql80 has rejoined the replication cluster
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql61:3306  |0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

 

 

--7-4. Manual forced recovery

--Force a failover, regardless of what orchestrator itself thinks

--Check the topology: hostmysql80 is the cluster's master
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306  |0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql61:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

--Forcibly make hostmysql61:3306 the master, overriding orchestrator's own judgment
[root@hostmysql80 ~]# orchestrator-client -c force-master-failover -i hostmysql61:3306
hostmysql61:3306

--The previous master, hostmysql80:3306, is now a standalone instance
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306  
hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
--The new master, hostmysql61:3306, now heads the replication cluster
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql61:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

--To trace the forced recovery flow above, check the /tmp/recovery.log written by the hook parameters (see the chapter "MySQL replication 08: HA + orchestrator, part 2: parameter file"):
OnFailureDetectionProcesses
PreFailoverProcesses
PostMasterFailoverProcesses
PostFailoverProcesses

[root@hostmysql60 ~]# more /tmp/recovery.log
Detected DeadMaster on hostmysql80:3306. Affected replicas: 2
Will recover from DeadMaster on hostmysql80:3306
Recovered from DeadMaster on hostmysql80:3306. Failed: hostmysql80:3306; Promoted: hostmysql61:3306
(for all types) Recovered from DeadMaster on hostmysql80:3306. Failed: hostmysql80:3306; Successor: hostmysql61:3306

--Try to put hostmysql80:3306 back into the replication cluster as a replica of hostmysql61:3306; too complex, must be done manually
[root@hostmysql80 orchestrator]# orchestrator-client -c relocate -i hostmysql80:3306 -d hostmysql61:3306
2020-04-08 18:11:33 ERROR Relocating hostmysql80:3306 below hostmysql61:3306 turns to be too complex; please do it manually

--Check, against the replication rules, whether hostmysql80:3306 can replicate from hostmysql61:3306
[root@hostmysql80 orchestrator]# orchestrator-client -c can-replicate-from -i hostmysql80:3306 -d hostmysql61:3306
hostmysql61:3306

--Manually add hostmysql80:3306 to the replication cluster as a replica of hostmysql61:3306
--On hostmysql80:3306, set read_only = 1 (also in /etc/my.cnf), run reset slave all, and point it at the new master
mysql> set global read_only = 1;
mysql> stop slave;
mysql> reset slave all;
mysql> CHANGE MASTER TO MASTER_HOST='hostmysql61', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='replPwd8!', MASTER_AUTO_POSITION=1, MASTER_CONNECT_RETRY=1, MASTER_RETRY_COUNT=86400, MASTER_HEARTBEAT_PERIOD=2;
mysql> start slave;

--Check the topology: hostmysql80 has rejoined the replication cluster
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql61:3306  |0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

 

 

--7-5. Replica 2 (192.168.56.60:3306) outage

--Stop replica 2 (192.168.56.60:3306)
[root@hostmysql60 ~]# systemctl stop mysqld.service

--Check the topology: replica 2 (192.168.56.60:3306) shows as unavailable
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql61:3306
hostmysql61:3306  |     0s|ok     |5.7.27-log|rw|ROW|>>,GTID
- hostmysql60:3306|unknown|invalid|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|     0s|ok     |5.7.27-log|ro|ROW|>>,GTID

--Bring replica 2 (192.168.56.60:3306) back up
[root@hostmysql60 ~]# systemctl start mysqld.service

--The replica automatically rejoins orchestrator's monitoring
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql61:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

 

Appendix: errors when starting orchestrator, and fixes

Problem 1: 2020-04-07 10:01:11 ERROR ReadTopologyInstance(192.168.56.80:3306) show variables like 'maxscale%': QueryRowsMap unexpected error: runtime error: invalid memory address or nil pointer dereference
Fix 1: set "SkipMaxScaleCheck": true, in the config file

Problem 2: 2020-04-07 10:12:42 ERROR ReadTopologyInstance(192.168.56.80:3306) show global status like 'Uptime': TLS requested but server does not support TLS
Fix 2: set "MySQLTopologyUseMixedTLS": false, in the config file

Problem 3: 2020-04-08 09:45:34 ERROR ReadTopologyInstance(hostmysql80:3306) show slave hosts: ReadTopologyInstance(hostmysql80:3306) 'show slave hosts' returned row with : <,3306>
Fix 3: set "DiscoverByShowSlaveHosts": true, in the config file, and add report_host on the replicas

--On the master, show slave hosts returns no Host values
mysql> show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID                           |
+-----------+------+------+-----------+--------------------------------------+
|   2223306 |      | 3306 |   1113306 | c959bdb9-f94c-11e9-b3b8-0800277143f5 |
|   3333306 |      | 3306 |   1113306 | 78a79555-758e-11ea-a2cc-080027c2dbc3 |
+-----------+------+------+-----------+--------------------------------------+
2 rows in set (0.00 sec)

--Add to each replica's my.cnf respectively (192.168.56.61 on replica 1, 192.168.56.60 on replica 2):
report_host=192.168.56.61
report_port=3306

--On the master, show slave hosts now returns the Host values, with no more errors
mysql> show slave hosts;
+-----------+---------------+------+-----------+--------------------------------------+
| Server_id | Host          | Port | Master_id | Slave_UUID                           |
+-----------+---------------+------+-----------+--------------------------------------+
|   2223306 | 192.168.56.61 | 3306 |   1113306 | c959bdb9-f94c-11e9-b3b8-0800277143f5 |
|   3333306 | 192.168.56.60 | 3306 |   1113306 | 78a79555-758e-11ea-a2cc-080027c2dbc3 |
+-----------+---------------+------+-----------+--------------------------------------+
2 rows in set (0.00 sec)

 

