MySQL High Availability: 5.7 GTID Master-Replica Replication + orchestrator Configuration and Basic Testing

 

Reference: https://github.com/openark/orchestrator/tree/master/docs

Download: https://github.com/openark/orchestrator/releases

 

Contents

--1. Environment information

--2. Installing orchestrator (on all three servers)

--3. Configuring orchestrator and the managed databases (on all three servers)

--4. Startup (start orchestrator on all three servers, so that it keeps itself highly available via the raft consensus algorithm)

--4-1. Starting orchestrator in debug mode

--4-2. Logging in to the web UI to manage orchestrator

--5. orchestrator-client: managing orchestrator from the command line

--6. High availability of orchestrator itself

--7. High availability of the managed MySQL service

--7-1. Master outage with automatic recovery

--7-2. Graceful master promotion (master-replica switchover)

--7-3. Manual recovery

--7-4. Manual forced recovery

--7-5. Replica 2 (192.168.56.60:3306) outage

Appendix: errors when starting orchestrator, and fixes


 

--1. Environment information

--1-1. Versions:
OS version: CentOS 7
MySQL version: MySQL 5.7
orchestrator version: orchestrator-3.1.4    Download: https://github.com/openark/orchestrator/releases

--1-2. Environment already configured:
--MySQL, one master and two replicas (these are the three managed MySQL databases):
Master:    hostname: hostmysql80  IP: 192.168.56.80  port: 3306  server_id: 1113306  read-write
Replica 1: hostname: hostmysql61  IP: 192.168.56.61  port: 3306  server_id: 2223306  read-only
Replica 2: hostname: hostmysql60  IP: 192.168.56.60  port: 3306  server_id: 3333306  read-only
--Required my.cnf settings: GTID replication, ROW binlog format, log_slave_updates enabled, slave_net_timeout tuned for your environment (the 60 s default makes reconnecting to the master far too slow)
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_format = ROW
log_slave_updates = 1
slave_net_timeout = 4

--Orchestrator's backend MySQL databases store orchestrator's own state. The three nodes here form a raft high-availability group (reusing the existing servers: a second MySQL instance on port 3307 on each server serves as orchestrator's backend).
--The backend store can be either MySQL or SQLite; MySQL is used here, and all parameters below are configured for the MySQL backend.
--存储orch的信息 可以选择mysql库 也可以选择sqlite库。 这里选择了mysql库 所有的参数配置也是按照mysql库的来配置
192.168.56.80  3307        server_id: 1113307
192.168.56.61  3307        server_id: 2223307
192.168.56.60  3307        server_id: 3333307

--1-3. On all three servers, configure /etc/hosts as follows:
[root@hostmysql80 mysql_setup]# vi /etc/hosts
192.168.56.80  hostmysql80
192.168.56.61  hostmysql61 
192.168.56.60  hostmysql60 

 

--2. Installing orchestrator (on all three servers)

--2-1. Install orchestrator's dependencies
[root@hostmysql80 mysql_setup]# rpm -ivh oniguruma-5.9.5-3.el7.x86_64.rpm
[root@hostmysql80 mysql_setup]# rpm -ivh jq-1.6-1.el7.x86_64.rpm

--2-2. Install orchestrator
[root@hostmysql80 mysql_setup]# rpm -ivh orchestrator-3.1.4-1.x86_64.rpm

--Inspect the installed files
[root@hostmysql80 ~]# cd /usr/local/orchestrator
[root@hostmysql80 orchestrator]# ll
total 19480
-rwxr-xr-x. 1 root root 19927832 Jan 26 22:05 orchestrator                                // the binary
-rw-rw-r--. 1 root root     5436 Jan 26 22:04 orchestrator-sample.conf.json               // sample config template; the live config file is orchestrator.conf.json
-rw-rw-r--. 1 root root     5060 Jan 26 22:04 orchestrator-sample-sqlite.conf.json        // config template for a SQLite backend
drwxr-xr-x. 7 root root       82 Apr  2 16:07 resources                                   // orchestrator support files

 

--3. Configuring orchestrator and the managed databases (on all three servers)

--3-1. Configure the parameter file /usr/local/orchestrator/orchestrator.conf.json; see https://github.com/openark/orchestrator/blob/master/docs/configuration.md
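--For orientation, a minimal sketch of orchestrator.conf.json assembled from the values used throughout this article (the key names are orchestrator's, but treat this as a sketch, not a complete tested file; start from orchestrator-sample.conf.json and merge). RaftBind must be each node's own IP, so it differs per server:
{
  "Debug": true,
  "ListenAddress": ":3000",
  "MySQLTopologyUser": "orch_topology_user",
  "MySQLTopologyPassword": "orch_topology_password",
  "MySQLOrchestratorHost": "127.0.0.1",
  "MySQLOrchestratorPort": 3307,
  "MySQLOrchestratorDatabase": "orchestrator",
  "MySQLOrchestratorUser": "orchestrator_srv",
  "MySQLOrchestratorPassword": "orc_server_password",
  "DiscoverByShowSlaveHosts": true,
  "SkipMaxScaleCheck": true,
  "MySQLTopologyUseMixedTLS": false,
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "RecoverMasterClusterFilters": ["*"],
  "RecoverIntermediateMasterClusterFilters": ["*"],
  "RaftEnabled": true,
  "RaftDataDir": "/usr/local/orchestrator",
  "RaftBind": "192.168.56.80",
  "DefaultRaftPort": 10008,
  "RaftNodes": ["192.168.56.80", "192.168.56.61", "192.168.56.60"]
}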
--3-2. On the orchestrator backend database (port 3307), create the orchestrator schema and an admin user
--(create the orchestrator_srv admin user; restricting its host to 127.0.0.1 for local access is sufficient)
--Relevant parameters: MySQLOrchestratorHost, MySQLOrchestratorPort, MySQLOrchestratorDatabase, MySQLOrchestratorUser, MySQLOrchestratorPassword, MySQLOrchestratorCredentialsConfigFile
[root@hostmysql80 orchestrator]# mysql -uroot -pRoot123$ -h127.0.0.1  -P3307
CREATE DATABASE IF NOT EXISTS orchestrator;
CREATE USER 'orchestrator_srv'@'127.0.0.1' IDENTIFIED BY 'orc_server_password';
GRANT ALL PRIVILEGES ON orchestrator.* TO 'orchestrator_srv'@'127.0.0.1';

--3-3. On the managed one-master-two-replica MySQL databases (port 3306), create the user orchestrator connects with
--(create the managed-side user orch_topology_user; restricting its host to orchestrator's subnet, 192.168.56.%, is sufficient. orchestrator connects to managed databases as 'orch_topology_user'@'hostname'; /etc/hosts was configured earlier)
--Relevant parameters: MySQLTopologyUser, MySQLTopologyPassword
[root@hostmysql80 orchestrator]# mysql -uroot -pRoot123$ -h127.0.0.1  -P3306
CREATE USER 'orch_topology_user'@'192.168.56.%' IDENTIFIED BY 'orch_topology_password';
GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orch_topology_user'@'192.168.56.%';
GRANT SELECT ON mysql.slave_master_info TO 'orch_topology_user'@'192.168.56.%';
GRANT SELECT ON perconadb.* TO 'orch_topology_user'@'192.168.56.%';                       // grant access to tables holding monitoring data
--GRANT SELECT ON ndbinfo.processes TO 'orch_topology_user'@'192.168.56.%';               // only for NDB Cluster environments

--3-4. On the managed one-master-two-replica MySQL databases (port 3306), adjust the replication parameters on the replicas
--slave_net_timeout is how long (in seconds) a replica waits for data from the master before reconnecting; the default is 60 s since 5.7.7.
--MASTER_CONNECT_RETRY is the interval (in seconds) between reconnection attempts to the master; default 60 s.
--MASTER_RETRY_COUNT is the number of reconnection attempts; default 86400.
--MASTER_HEARTBEAT_PERIOD is the interval at which the master sends heartbeats to the replica; it defaults to half of slave_net_timeout (30 s).
--With the defaults, if a replica receives nothing from the master (including the heartbeat every 30 s) for slave_net_timeout (60) seconds, it reconnects, retrying every MASTER_CONNECT_RETRY (60) seconds until it succeeds or gives up after MASTER_RETRY_COUNT (86400) attempts.
--Waiting 60 s between retries is far too long, so change the parameters as follows: if a replica receives nothing (including the heartbeat every 2 s) for 4 seconds, it reconnects every 1 second until it succeeds or gives up after 86400 attempts.
set global slave_net_timeout = 4;                                                               // also update /etc/my.cnf
CHANGE MASTER TO MASTER_CONNECT_RETRY=1, MASTER_RETRY_COUNT=86400, MASTER_HEARTBEAT_PERIOD=2;   // requires stop slave first
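--Putting 3-4 together, the full sequence to run on each replica looks like this (afterwards verify that Slave_IO_Running and Slave_SQL_Running are both Yes):
mysql> set global slave_net_timeout = 4;
mysql> stop slave;
mysql> CHANGE MASTER TO MASTER_CONNECT_RETRY=1, MASTER_RETRY_COUNT=86400, MASTER_HEARTBEAT_PERIOD=2;
mysql> start slave;
mysql> show slave status\G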

 

 

--4. Startup (start orchestrator on all three servers, so that it keeps itself highly available via the raft consensus algorithm)

--4-1. Start orchestrator in debug mode


[root@hostmysql80 ~]# cd /usr/local/orchestrator
[root@hostmysql80 orchestrator]# ./orchestrator --debug --config=/usr/local/orchestrator/orchestrator.conf.json http
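--The foreground debug session above is fine for a first run. For longer-running tests, one option (a habit of mine, not something the rpm sets up for you) is to background the process on each node:
[root@hostmysql80 orchestrator]# nohup ./orchestrator --config=/usr/local/orchestrator/orchestrator.conf.json http >> /tmp/orchestrator.log 2>&1 &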

--4-2. Log in to the web UI to manage orchestrator:

http://192.168.56.60:3000   (any one of the three orchestrator nodes works)

--Add a managed database (discovering any one instance is enough; orchestrator finds the rest of the topology)

--View the replication status of the managed databases

--View the status of the three orchestrator nodes

 

 

--5. orchestrator-client: managing orchestrator from the command line

--5-1. Install orchestrator-client (any single server will do; it need not be an orchestrator node or a managed MySQL server)
[root@hostmysql80 mysql_setup]# rpm -ivh orchestrator-client-3.1.4-1.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:orchestrator-client-1:3.1.4-1    ################################# [100%]

--5-2. Set the environment variable
[root@hostmysql80 ~]# vi /etc/profile.d/orchestrator-client.sh
export ORCHESTRATOR_API="hostmysql80:3000/api hostmysql61:3000/api hostmysql60:3000/api"
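--orchestrator-client is a wrapper over the HTTP API endpoints listed in ORCHESTRATOR_API, routing requests to whichever node is the raft leader. You can probe the nodes yourself; as a sketch, /api/leader-check should answer HTTP 200 only on the current leader (which also makes it a convenient load-balancer health check):
[root@hostmysql80 ~]# for h in hostmysql80 hostmysql61 hostmysql60; do echo -n "$h: "; curl -s -o /dev/null -w "%{http_code}\n" http://$h:3000/api/leader-check; done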
--5-3. View the help
[root@hostmysql80 ~]# orchestrator-client --help
Usage: orchestrator-client -c <command> [flags...]
Example: orchestrator-client -c which-master -i some.replica
Options:
-h, --help                          print this help
-c <command>, --command <command>   indicate the operation to perform (see listing below)
-a <alias>, --alias <alias>         cluster alias
......

--5-4. Basic commands
--List all clusters: clusters
[root@hostmysql80 ~]# orchestrator-client -c clusters
hostmysql80:3306
--List all instances in a given cluster: which-cluster-instances
[root@hostmysql80 orchestrator]# orchestrator-client -c which-cluster-instances -i  hostmysql80:3306
hostmysql60:3306
hostmysql61:3306
hostmysql80:3306

--Show the topology of a replication cluster, tabulated: topology-tabulated
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql61:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

--Search for instances: search
[root@hostmysql80 orchestrator]# orchestrator-client -c search -i  host
hostmysql61:3306
hostmysql80:3306
hostmysql60:3306

--Forget a given instance: forget   (forget-cluster forgets an entire cluster)
[root@hostmysql80 ~]# orchestrator-client -c forget -i hostmysql60:3306

--Discover a given instance: discover   (async-discover discovers asynchronously)
[root@hostmysql80 ~]# orchestrator-client -c discover -i hostmysql60:3306
hostmysql60:3306
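--As an aside, each -c command above is a thin wrapper over an HTTP API endpoint (this is why jq was installed as a dependency in step 2-1); for example, the clusters command corresponds roughly to:
[root@hostmysql80 ~]# curl -s http://hostmysql80:3000/api/clusters | jq .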

 

--6. High availability of orchestrator itself

--orchestrator uses the raft distributed consensus protocol to keep itself highly available. A raft quorum needs a majority, floor(n/2)+1 nodes, so with 3 nodes orchestrator stays usable with at most 1 node failed; with 5 nodes, at most 2 nodes failed.

--Below we demonstrate the 3-node cluster with 1 node failed, then with 2 nodes failed.

--6-1. Check the current raft state: healthy, and the leader is 192.168.56.60:10008
[root@hostmysql80 ~]#  orchestrator-client -c raft-leader
192.168.56.60:10008
[root@hostmysql80 ~]#  orchestrator-client -c raft-health
healthy

--6-2. Kill the orchestrator process on the current leader node (192.168.56.60:10008)
[root@hostmysql60 ~]# ps -ef|grep orch
root      5981  5776  2 15:42 pts/1    00:03:29 ./orchestrator --config=/usr/local/orchestrator/orchestrator.conf.json http
root      7867  6138  0 18:27 pts/2    00:00:00 grep --color=auto orch
[root@hostmysql60 ~]# kill -9 5981

--6-3. Check raft again: still healthy (1 of 3 nodes failed is fine), and the leader is now 192.168.56.80:10008
[root@hostmysql80 ~]#  orchestrator-client -c raft-leader
192.168.56.80:10008
[root@hostmysql80 ~]#  orchestrator-client -c raft-health
healthy

--6-4. Kill the orchestrator process on the new leader node (192.168.56.80:10008)
[root@hostmysql80 ~]# ps -ef|grep orch
root      7402  7310  1 15:31 pts/2    00:03:17 ./orchestrator --debug --config=/usr/local/orchestrator/orchestrator.conf.json http
root      9722  7687  0 18:29 pts/1    00:00:00 grep --color=auto orch
[root@hostmysql80 ~]# kill -9 7402

--6-5. Check raft again: now unhealthy (2 of 3 nodes failed; orchestrator is down). The web UI greys out likewise; orchestrator is unavailable.
[root@hostmysql80 ~]# orchestrator-client -c raft-leader
orchestrator-client[9738]: Cannot determine leader from hostmysql80:3000/api hostmysql61:3000/api hostmysql60:3000/api
[root@hostmysql80 ~]# orchestrator-client -c raft-health
orchestrator-client[9770]: Cannot determine leader from hostmysql80:3000/api hostmysql61:3000/api hostmysql60:3000/api

 

--7. High availability of the managed MySQL service

--Reference: https://github.com/openark/orchestrator/blob/master/docs/topology-recovery.md#automated-recovery

--High-availability scenarios: 1. master outage with automatic recovery; 2. graceful master promotion (switchover); 3. manual recovery; 4. manual forced recovery

--7-1. Master outage with automatic recovery

--When the master dies, orchestrator automatically promotes a replica to new master according to the promotion rules (the old master is then standalone; you can manually set read_only=1 on it and pull it back into the replication cluster). With "ApplyMySQLPromotionAfterMasterFailover": true, orchestrator automatically runs reset slave all and set read_only=0 on the new master (note that /etc/my.cnf is NOT updated).
--The following must be set in the orchestrator.conf.json parameter file:
"FailureDetectionPeriodBlockMinutes": 60,                       // a repeated failure within this window is not re-detected
"RecoveryPeriodBlockSeconds": 3600,                             // a repeated failure within this window triggers no failover, avoiding conflicting recoveries
"RecoveryIgnoreHostnameFilters": [],                            // recovery ignores matching hosts
"RecoverMasterClusterFilters": ["*"],                           // enable automatic master failover; only matching clusters are recovered ("*" matches everything)
"RecoverIntermediateMasterClusterFilters": ["*"],               // enable automatic intermediate-master failover; only matching clusters are recovered ("*" matches everything)

--Set the promotion rule: prefer promoting hostmysql61:3306 to master
[root@hostmysql80 ~]# orchestrator-client -c register-candidate -i hostmysql61:3306 --promotion-rule prefer
hostmysql61:3306
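--Besides prefer, register-candidate accepts other promotion rules (to my knowledge: prefer, neutral, prefer_not, must_not), and a registration expires after a while, so a long-lived preference should be re-registered periodically, e.g. from cron. For example, to discourage ever promoting hostmysql60:3306:
[root@hostmysql80 ~]# orchestrator-client -c register-candidate -i hostmysql60:3306 --promotion-rule prefer_not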
--List the clusters
[root@hostmysql80 orchestrator]# orchestrator-client -c clusters
hostmysql80:3306
--Check the cluster's replication topology: all healthy
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql61:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

--The master (hostmysql80:3306) goes down
[root@hostmysql80 ~]# systemctl stop mysqld.service
--Check the topology: the master (hostmysql80:3306) shows as unavailable, but the replicas are ok
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306  |unknown|invalid|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|     0s|ok     |5.7.27-log|ro|ROW|>>,GTID
+ hostmysql61:3306|     0s|ok     |5.7.27-log|ro|ROW|>>,GTID
--The master has been put into downtimed (maintenance) state
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306|unknown|invalid|5.7.27-log|rw|ROW|>>,GTID,downtimed
--List instances in downtimed (maintenance) state
[root@hostmysql80 orchestrator]# orchestrator-client -c downtimed
hostmysql80:3306
--Analyze problems in the replication topology: the master (hostmysql80:3306) is dead, with no replicas still attached to it
[root@hostmysql80 orchestrator]# orchestrator-client -c replication-analysis
hostmysql80:3306 (cluster hostmysql80:3306): DeadMasterWithoutSlaves

--Query the topology through any replica: hostmysql61:3306 has become the new master
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql60:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

--Check the new master's (hostmysql61:3306) state: because "ApplyMySQLPromotionAfterMasterFailover": true is set, reset slave all and set read_only=0 were executed on it.
mysql> show slave status\G
Empty set (0.00 sec)

mysql> show variables like 'read_only';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only     | OFF   |
+---------------+-------+
1 row in set (0.00 sec)

--To trace the automatic recovery flow above, check the /tmp/recovery.log written by the hook parameters below (see the chapter "MySQL replication 08: HA + orchestrator, part 2: parameter file"):
OnFailureDetectionProcesses               # runs on failure detection (fires once per detection, hence the two "Detected" lines below)
PreFailoverProcesses                      # runs immediately before recovery actions
PostMasterFailoverProcesses               # runs at the end of a successful master recovery
PostFailoverProcesses                     # runs at the end of any successful recovery
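--The recovery.log lines shown below are produced by hook definitions along these lines in orchestrator.conf.json (a sketch modeled on orchestrator-sample.conf.json; placeholders such as {failureType} and {failureCluster} are expanded by orchestrator before the command runs):
"OnFailureDetectionProcesses": [
  "echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
],
"PreFailoverProcesses": [
  "echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
],
"PostMasterFailoverProcesses": [
  "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"PostFailoverProcesses": [
  "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],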
[root@hostmysql61 ~]# more /tmp/recovery.log
Detected UnreachableMaster on hostmysql80:3306. Affected replicas: 2
Detected DeadMaster on hostmysql80:3306. Affected replicas: 2
Will recover from DeadMaster on hostmysql80:3306
Recovered from DeadMaster on hostmysql80:3306. Failed: hostmysql80:3306; Promoted: hostmysql61:3306
(for all types) Recovered from DeadMaster on hostmysql80:3306. Failed: hostmysql80:3306; Successor: hostmysql61:3306

--Bring the old master (192.168.56.80:3306) back up
[root@hostmysql80 ~]# systemctl start mysqld.service

--Analyzing the topology again shows no problems, but hostmysql80:3306 is now a standalone instance
[root@hostmysql80 orchestrator]# orchestrator-client -c replication-analysis
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql60:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306|0s|ok|5.7.27-log|rw|ROW|>>,GTID

--Try to put hostmysql80:3306 back into the replication cluster as a replica of hostmysql61:3306; orchestrator says it is too complex and must be done manually
[root@hostmysql80 orchestrator]# orchestrator-client -c relocate -i hostmysql80:3306 -d hostmysql61:3306
2020-04-08 18:11:33 ERROR Relocating hostmysql80:3306 below hostmysql61:3306 turns to be too complex; please do it manually

--Check, against the replication rules, whether hostmysql80:3306 can replicate from hostmysql61:3306
[root@hostmysql80 orchestrator]# orchestrator-client -c can-replicate-from -i hostmysql80:3306 -d hostmysql61:3306
hostmysql61:3306

--Manually add hostmysql80:3306 to the replication cluster as a replica of hostmysql61:3306
--On hostmysql80:3306, set read_only = 1 (also in /etc/my.cnf), run reset slave all, and point it at the new master
mysql> set global read_only = 1;
mysql> stop slave;
mysql> reset slave all;
mysql> CHANGE MASTER TO MASTER_HOST='hostmysql61', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='replPwd8!', MASTER_AUTO_POSITION=1, MASTER_CONNECT_RETRY=1, MASTER_RETRY_COUNT=86400, MASTER_HEARTBEAT_PERIOD=2;
mysql> start slave;

--Check the topology: hostmysql80 has rejoined the replication cluster
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql61:3306  |0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

 

--7-2. Graceful master promotion (master-replica switchover)

--Gracefully promote a replica to new master. Before the promotion, the old master is automatically set read_only=1 and then turned into a replica of the new master (left in maintenance state; you must run start slave on it manually). With "ApplyMySQLPromotionAfterMasterFailover": true, reset slave all and set read_only=0 are executed on the new master (again, /etc/my.cnf is NOT updated).

--Check the topology: all healthy; hostmysql61:3306 is the master
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql61:3306
hostmysql61:3306  |0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

--Gracefully promote hostmysql80:3306 to master, with hostmysql61:3306 becoming its replica
[root@hostmysql80 ~]# orchestrator-client -c graceful-master-takeover -a hostmysql61:3306 -d hostmysql80:3306
hostmysql80:3306

--Check the topology: hostmysql80:3306 has been promoted to master, and hostmysql61:3306 is now a replica (downtimed, maintenance state)
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql61:3306
hostmysql80:3306  |  0s|ok            |5.7.27-log|rw|ROW|>>,GTID          
+ hostmysql60:3306|  0s|ok            |5.7.27-log|ro|ROW|>>,GTID          
- hostmysql61:3306|null|nonreplicating|5.7.27-log|ro|ROW|>>,GTID,downtimed

--To trace the graceful promotion flow above, check /tmp/recovery.log; the hooks below fire in order (see the chapter "MySQL replication 08: HA + orchestrator, part 2: parameter file"):
PreGracefulTakeoverProcesses
OnFailureDetectionProcesses
PreFailoverProcesses
PostMasterFailoverProcesses
PostFailoverProcesses
PostGracefulTakeoverProcesses

[root@hostmysql80 ~]# more /tmp/recovery.log
Planned takeover about to take place on hostmysql61:3306. Master will switch to read_only
Detected DeadMaster on hostmysql61:3306. Affected replicas: 1
Will recover from DeadMaster on hostmysql61:3306
Recovered from DeadMaster on hostmysql61:3306. Failed: hostmysql61:3306; Promoted: hostmysql80:3306
(for all types) Recovered from DeadMaster on hostmysql61:3306. Failed: hostmysql61:3306; Successor: hostmysql80:3306
Planned takeover complete

--Manually restart replication on hostmysql61
mysql> start slave;

--Check the topology: replication on hostmysql61 is healthy again
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql61:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

 

 

--7-3. Manual recovery

--When automatic recovery is disabled or blocked, the replication cluster can be recovered manually. The flow is largely the same as automatic recovery, with one extra manual recover step.

--To disable automatic recovery, set the following in orchestrator.conf.json:
"FailureDetectionPeriodBlockMinutes": 60,
"RecoveryPeriodBlockSeconds": 3600,
"RecoveryIgnoreHostnameFilters": [],
"RecoverMasterClusterFilters": ["_auto_failure_"],                    // just needs to match nothing
"RecoverIntermediateMasterClusterFilters": ["_auto_failure_"],        // just needs to match nothing

--The master goes down
[root@hostmysql80 orchestrator]# systemctl stop mysqld.service

--Check the topology: the cluster is broken (and with automatic recovery disabled, it stays broken)
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql60:3306
hostmysql80:3306  |unknown|invalid       |5.7.27-log|rw|ROW|>>,GTID
- hostmysql60:3306|   null|nonreplicating|5.7.27-log|ro|ROW|>>,GTID
- hostmysql61:3306|   null|nonreplicating|5.7.27-log|ro|ROW|>>,GTID

--Run a manual recovery against the failed instance
[root@hostmysql80 orchestrator]# orchestrator-client -c recover -i hostmysql80:3306
hostmysql61:3306

--Check the topology: the failed instance (the old master) has been dropped from the cluster, hostmysql61:3306 has been promoted to master, and replication is healthy again
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql60:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

--To trace the manual recovery flow above, check the /tmp/recovery.log written by the hook parameters below (see the chapter "MySQL replication 08: HA + orchestrator, part 2: parameter file"):
OnFailureDetectionProcesses               # runs on failure detection (fires once per detection)
PreFailoverProcesses                      # runs immediately before recovery actions
PostMasterFailoverProcesses               # runs at the end of a successful master recovery
PostFailoverProcesses                     # runs at the end of any successful recovery

[root@hostmysql80 orchestrator]# more /tmp/recovery.log
Detected UnreachableMaster on hostmysql80:3306. Affected replicas: 2
Detected DeadMasterAndSomeSlaves on hostmysql80:3306. Affected replicas: 2
Will recover from DeadMasterAndSomeSlaves on hostmysql80:3306
Recovered from DeadMasterAndSomeSlaves on hostmysql80:3306. Failed: hostmysql80:3306; Promoted: hostmysql60:3306
(for all types) Recovered from DeadMasterAndSomeSlaves on hostmysql80:3306. Failed: hostmysql80:3306; Successor: hostmysql60:3306

--Bring the old master (192.168.56.80:3306) back up
[root@hostmysql80 ~]# systemctl start mysqld.service

--Analyzing the topology again shows no problems, but hostmysql80:3306 is now a standalone instance
[root@hostmysql80 orchestrator]# orchestrator-client -c replication-analysis
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql60:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306|0s|ok|5.7.27-log|rw|ROW|>>,GTID

--Try to put hostmysql80:3306 back into the replication cluster as a replica of hostmysql61:3306; again too complex, must be done manually
[root@hostmysql80 orchestrator]# orchestrator-client -c relocate -i hostmysql80:3306 -d hostmysql61:3306
2020-04-08 18:11:33 ERROR Relocating hostmysql80:3306 below hostmysql61:3306 turns to be too complex; please do it manually

--Check, against the replication rules, whether hostmysql80:3306 can replicate from hostmysql61:3306
[root@hostmysql80 orchestrator]# orchestrator-client -c can-replicate-from -i hostmysql80:3306 -d hostmysql61:3306
hostmysql61:3306

--Manually add hostmysql80:3306 to the replication cluster as a replica of hostmysql61:3306
--On hostmysql80:3306, set read_only = 1 (also in /etc/my.cnf), run reset slave all, and point it at the new master
mysql> set global read_only = 1;
mysql> stop slave;
mysql> reset slave all;
mysql> CHANGE MASTER TO MASTER_HOST='hostmysql61', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='replPwd8!', MASTER_AUTO_POSITION=1, MASTER_CONNECT_RETRY=1, MASTER_RETRY_COUNT=86400, MASTER_HEARTBEAT_PERIOD=2;
mysql> start slave;

--Check the topology: hostmysql80 has rejoined the replication cluster
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql61:3306  |0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

 

 

--7-4. Manual forced recovery

--Force a failover, regardless of what orchestrator itself thinks

--Check the topology: hostmysql80 is the cluster's master
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306  |0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql61:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

--Forcibly make hostmysql61:3306 the master, overriding orchestrator's own judgment
[root@hostmysql80 ~]# orchestrator-client -c force-master-failover -i hostmysql61:3306
hostmysql61:3306

--The previous master, hostmysql80:3306, is now a standalone instance
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306  
hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
--The new master, hostmysql61:3306, now heads the replication cluster
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql61:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

--To trace the forced recovery flow above, check the /tmp/recovery.log written by the hook parameters (see the chapter "MySQL replication 08: HA + orchestrator, part 2: parameter file"):
OnFailureDetectionProcesses
PreFailoverProcesses
PostMasterFailoverProcesses
PostFailoverProcesses

[root@hostmysql60 ~]# more /tmp/recovery.log
Detected DeadMaster on hostmysql80:3306. Affected replicas: 2
Will recover from DeadMaster on hostmysql80:3306
Recovered from DeadMaster on hostmysql80:3306. Failed: hostmysql80:3306; Promoted: hostmysql61:3306
(for all types) Recovered from DeadMaster on hostmysql80:3306. Failed: hostmysql80:3306; Successor: hostmysql61:3306

--Try to put hostmysql80:3306 back into the replication cluster as a replica of hostmysql61:3306; too complex, must be done manually
[root@hostmysql80 orchestrator]# orchestrator-client -c relocate -i hostmysql80:3306 -d hostmysql61:3306
2020-04-08 18:11:33 ERROR Relocating hostmysql80:3306 below hostmysql61:3306 turns to be too complex; please do it manually

--Check, against the replication rules, whether hostmysql80:3306 can replicate from hostmysql61:3306
[root@hostmysql80 orchestrator]# orchestrator-client -c can-replicate-from -i hostmysql80:3306 -d hostmysql61:3306
hostmysql61:3306

--Manually add hostmysql80:3306 to the replication cluster as a replica of hostmysql61:3306
--On hostmysql80:3306, set read_only = 1 (also in /etc/my.cnf), run reset slave all, and point it at the new master
mysql> set global read_only = 1;
mysql> stop slave;
mysql> reset slave all;
mysql> CHANGE MASTER TO MASTER_HOST='hostmysql61', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='replPwd8!', MASTER_AUTO_POSITION=1, MASTER_CONNECT_RETRY=1, MASTER_RETRY_COUNT=86400, MASTER_HEARTBEAT_PERIOD=2;
mysql> start slave;

--Check the topology: hostmysql80 has rejoined the replication cluster
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql61:3306  |0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

 

 

--7-5. Replica 2 (192.168.56.60:3306) outage

--Stop replica 2 (192.168.56.60:3306)
[root@hostmysql60 ~]# systemctl stop mysqld.service

--Check the topology: replica 2 (192.168.56.60:3306) shows as unavailable
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql61:3306
hostmysql61:3306  |     0s|ok     |5.7.27-log|rw|ROW|>>,GTID
- hostmysql60:3306|unknown|invalid|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|     0s|ok     |5.7.27-log|ro|ROW|>>,GTID

--Bring replica 2 (192.168.56.60:3306) back up
[root@hostmysql60 ~]# systemctl start mysqld.service

--The replica automatically rejoins orchestrator's monitoring
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql61:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

 

Appendix: errors when starting orchestrator, and fixes

Problem 1: 2020-04-07 10:01:11 ERROR ReadTopologyInstance(192.168.56.80:3306) show variables like 'maxscale%': QueryRowsMap unexpected error: runtime error: invalid memory address or nil pointer dereference
Fix 1: set "SkipMaxScaleCheck": true, in the config file

Problem 2: 2020-04-07 10:12:42 ERROR ReadTopologyInstance(192.168.56.80:3306) show global status like 'Uptime': TLS requested but server does not support TLS
Fix 2: set "MySQLTopologyUseMixedTLS": false, in the config file

Problem 3: 2020-04-08 09:45:34 ERROR ReadTopologyInstance(hostmysql80:3306) show slave hosts: ReadTopologyInstance(hostmysql80:3306) 'show slave hosts' returned row with : <,3306>
Fix 3: set "DiscoverByShowSlaveHosts": true, in the config file, and add report_host on the replicas

--On the master, show slave hosts returns no Host values
mysql> show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID                           |
+-----------+------+------+-----------+--------------------------------------+
|   2223306 |      | 3306 |   1113306 | c959bdb9-f94c-11e9-b3b8-0800277143f5 |
|   3333306 |      | 3306 |   1113306 | 78a79555-758e-11ea-a2cc-080027c2dbc3 |
+-----------+------+------+-----------+--------------------------------------+
2 rows in set (0.00 sec)

--Add to each replica's my.cnf respectively (192.168.56.61 on replica 1, 192.168.56.60 on replica 2):
report_host=192.168.56.61
report_port=3306

--On the master, show slave hosts now returns the Host values, with no more errors
mysql> show slave hosts;
+-----------+---------------+------+-----------+--------------------------------------+
| Server_id | Host          | Port | Master_id | Slave_UUID                           |
+-----------+---------------+------+-----------+--------------------------------------+
|   2223306 | 192.168.56.61 | 3306 |   1113306 | c959bdb9-f94c-11e9-b3b8-0800277143f5 |
|   3333306 | 192.168.56.60 | 3306 |   1113306 | 78a79555-758e-11ea-a2cc-080027c2dbc3 |
+-----------+---------------+------+-----------+--------------------------------------+
2 rows in set (0.00 sec)

 

