Redis Sentinel Cluster

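I. Introduction

The previous article covered Redis master-slave replication in detail. That setup is simple to configure: only the Slave nodes need changes, and the Master's configuration stays as it is. One caveat is that the persistence settings on the Master and the Slaves should be kept as consistent as possible, otherwise odd problems can appear. This article turns to the second Redis clustering mode, Sentinel. Sentinel first shipped with Redis 2.6, but that implementation was unstable; it only became stable from Redis 2.8 onward, so production deployments that use Sentinel should run Redis 2.8 or later. Master-slave replication and Sentinel share one limitation: neither can scale horizontally, and the high availability they provide is still constrained by the memory of the Master node. Configuring Sentinel is also not trivial and can be rather tedious. For these reasons neither mode is recommended for industrial use unless you have solutions ready for these problems, so that they do not cause trouble later.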

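II. Introduction to Redis Sentinel

A Sentinel process monitors the working state of the Master server in a Redis deployment. When the Master fails, Sentinel switches the roles of Master and Slave to keep the system highly available. Sentinel has shipped with Redis since 2.6 and became stable as of 2.8, so Redis 2.8 or later is recommended for production. Sentinel is a distributed system: you can run several Sentinel processes in one architecture. They use gossip protocols to exchange information about whether the Master is offline, and agreement protocols (voting) to decide whether to perform an automatic failover and which Slave to promote to Master.

Each Sentinel process periodically sends messages to the other Sentinels, the Master, and the Slaves to check that they are alive. If a peer does not respond within a configurable time, the Sentinel provisionally considers it down. This is the Subjectively Down state, SDOWN for short. Where there is a subjective state there is also an objective one: when enough Sentinels (at least the configured quorum) have each judged the Master to be SDOWN and have confirmed this with one another through the SENTINEL is-master-down-by-addr command, the Master is considered Objectively Down, ODOWN for short. The Sentinels then vote, pick one of the remaining Slaves to promote to Master, update the relevant configuration automatically, and carry out the failover.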

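Although Sentinel has its own executable, redis-sentinel, it is really just a Redis server running in a special mode; you can also start it by passing the --sentinel option when launching an ordinary Redis server. Some of Sentinel's design ideas closely resemble ZooKeeper's.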

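Sentinels communicate with one another, exchange what they know about the state of the Redis nodes, and act on it. The subjectively down and objectively down states matter most here, because they determine whether a failover takes place. You can also subscribe to specific channels so that an administrator is notified when a server fails. A client can treat Sentinel as a Redis server that offers only the subscribe side of Pub/Sub: you cannot use PUBLISH to send messages to it, but you can use SUBSCRIBE or PSUBSCRIBE to receive event notifications from a given channel. A channel receives the events whose name matches the channel name; for example, the +sdown channel receives every event for an instance entering the subjectively down (SDOWN) state.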
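A subscriber still has to interpret the message it receives. For instance-level events such as +sdown, the payload follows the form "<instance-type> <name> <ip> <port> @ <master-name> <master-ip> <master-port>", where the part from "@" onward is absent when the instance is itself a master. The sketch below parses only that form; the InstanceEvent type, the parse_instance_event helper, and the sample payload are illustrative assumptions, and events with a different layout (such as +switch-master) are not handled.

```python
from typing import NamedTuple, Optional


class InstanceEvent(NamedTuple):
    """Instance details carried by a Sentinel event such as +sdown."""
    kind: str                     # "master", "slave" or "sentinel"
    name: str
    ip: str
    port: int
    master_name: Optional[str]    # None when the instance is itself a master
    master_ip: Optional[str]
    master_port: Optional[int]


def parse_instance_event(payload: str) -> InstanceEvent:
    """Parse '<type> <name> <ip> <port> [@ <master> <ip> <port>]'."""
    head, _, tail = payload.partition(" @ ")
    # Keep only the first four tokens; some events append extra details
    kind, name, ip, port = head.split()[:4]
    if tail:
        master_name, master_ip, master_port = tail.split()[:3]
        return InstanceEvent(kind, name, ip, int(port),
                             master_name, master_ip, int(master_port))
    return InstanceEvent(kind, name, ip, int(port), None, None, None)


# Hypothetical payload received on the +sdown channel
event = parse_instance_event(
    "slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380")
print(event.kind, event.port, event.master_name)  # slave 6381 mymaster
```

An alerting script could feed each Pub/Sub message through a parser like this and notify the administrator with the instance address and the master it belongs to.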

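1. What the Sentinel process does:

   1) Monitoring: Sentinel continuously checks whether your Master and Slaves are working properly.

   2) Notification: when a monitored Redis node has a problem, Sentinel can notify an administrator or another application through an API.

   3) Automatic failover: when a Master stops working properly, Sentinel starts an automatic failover. It promotes one of the failed Master's Slaves to be the new Master and reconfigures the other Slaves to replicate from it. Clients that ask Sentinel for the Master's address are given the address of the new Master, so the deployment carries on with the new Master in place of the failed one. After the switch, the Master's redis.conf, the Slaves' redis.conf, and sentinel.conf all change accordingly: the old Master's redis.conf gains a slaveof line, and the monitoring target in sentinel.conf is switched to the new Master.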
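Clients usually learn the current Master's address by sending the SENTINEL get-master-addr-by-name command to a Sentinel. As a rough illustration of what travels over the wire, the sketch below encodes that command in the Redis protocol (RESP) and decodes a two-element reply. The helper names, the master name "mymaster", and the sample reply bytes are assumptions made for the example; a real client library would normally handle this for you.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse_master_addr(reply: bytes):
    """Parse the two-element array reply into (ip, port); None for a null reply."""
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2":
        return None
    # lines: *2, $<len>, <ip>, $<len>, <port>, ''
    return lines[2].decode(), int(lines[4])


request = encode_command("SENTINEL", "get-master-addr-by-name", "mymaster")

# Hypothetical reply a Sentinel might send once another instance was promoted
reply = b"*2\r\n$9\r\n127.0.0.1\r\n$4\r\n6382\r\n"
print(parse_master_addr(reply))  # ('127.0.0.1', 6382)
```

The request bytes would be written to a socket connected to a Sentinel; the sketch stops short of the network I/O so that it runs without a server.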


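2. How the Sentinel process works:

   1) Once per second, each Sentinel sends a PING command to the Master, the Slaves, and the other Sentinel processes in the deployment.

   2) If the time since an instance's last valid reply to PING exceeds the value of the down-after-milliseconds option, the Sentinel marks that instance as subjectively down (SDOWN).

   3) If a Master is marked as subjectively down (SDOWN), every Sentinel monitoring that Master checks once per second whether the Master has indeed entered the subjectively down state.

   4) When enough Sentinels (at least the number given in the configuration file) confirm within the specified time window that the Master is subjectively down (SDOWN), the Master is marked as objectively down (ODOWN).

   5) Under normal conditions, each Sentinel sends an INFO command to every Master and Slave once every 10 seconds.

   6) When a Master is marked as objectively down (ODOWN), the Sentinels send INFO to all Slaves of that Master once per second instead of once every 10 seconds.

   7) If not enough Sentinels agree that the Master is down, the Master's objectively down state is removed. If the Master returns a valid reply to a Sentinel's PING again, its subjectively down state is removed.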
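Steps 2 to 4 above boil down to two small checks, sketched here under simplifying assumptions: every Sentinel is represented only by the timestamp of the last valid PING reply it received from the Master, and the function names, timestamps, and Sentinel labels are hypothetical. The real implementation also exchanges the votes over the network and applies time windows that this sketch leaves out.

```python
def is_sdown(last_valid_reply_ms: int, now_ms: int, down_after_ms: int) -> bool:
    """Subjectively down: no valid PING reply for longer than down-after-milliseconds."""
    return now_ms - last_valid_reply_ms > down_after_ms


def is_odown(sdown_votes: list, quorum: int) -> bool:
    """Objectively down: at least `quorum` Sentinels report the Master as SDOWN."""
    return sum(1 for vote in sdown_votes if vote) >= quorum


now_ms = 100_000
down_after_ms = 30_000

# Hypothetical time (ms) of the Master's last valid PING reply, as seen by each Sentinel
last_replies = {"sentinel-a": 60_000, "sentinel-b": 65_000, "sentinel-c": 95_000}

votes = [is_sdown(t, now_ms, down_after_ms) for t in last_replies.values()]
print(votes, is_odown(votes, quorum=2))  # [True, True, False] True
```

With a quorum of 2, two of the three Sentinels seeing the Master as SDOWN is enough to mark it ODOWN; with a quorum of 3 the same votes would not be.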

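3. The Sentinel test environment:

   1) Master server: IP 192.168.127.128, port 6379, OS Linux

   2) Slave server: IP 192.168.127.129, port 6379, OS Linux

   3) One Sentinel process is installed on the Slave server and one on the Master server.

Both Redis servers run on Linux and swap roles when the Master fails, so each Redis port must be open in the firewall on both machines, or the firewalls must be turned off. Otherwise the Slave reports master_link_status:down because it cannot reach the Master. There are two remedies: disable the firewall on both Linux machines, or add the port of each Redis service to the firewall so that traffic on that port is allowed. For the second, first run "firewall-cmd --query-port=6379/tcp"; if it prints no, run "firewall-cmd --add-port=6379/tcp", which prints success once the port has been added. The two Redis servers can then switch roles and Sentinel can do its work.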

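Sentinel is the high-availability solution for Redis: a Sentinel system made up of one or more Sentinel instances can monitor any number of master servers together with all of their slave servers, and when a monitored master goes offline it automatically promotes one of that master's slaves to be the new master.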

For example, after Server1 goes offline, Server2 is promoted to be the new master server.

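2. Redis master-slave separation

Before covering the Sentinel cluster, let us first set up a simple master-slave (read/write splitting) deployment.

We assume Redis is already installed. Copy redis.conf several times and create one directory per copy to keep the Redis services apart.

Each directory has its own redis.conf. We start with the master server's configuration file.
I. Configuring the Master

   1. Change the port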

# Accept connections on the specified port, default is 6379 (IANA #815344).
# If port 0 is specified Redis will not listen on a TCP socket.
port 6380

The default Redis port is 6379; here the master server is set to port 6380.

   2. Change the pidfile

# If a pid file is specified, Redis writes it where specified at startup
# and removes it at exit.
#
# When the server runs non daemonized, no pid file is created if none is
# specified in the configuration. When the server is daemonized, the pid file
# is used even if not specified, defaulting to "/var/run/redis.pid".
#
# Creating a pid file is best effort: if Redis is not able to create it
# nothing bad happens, the server will start and run normally.
pidfile /var/run/redis_6380.pid

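The pidfile is the file into which Redis writes its process ID at startup. If every instance keeps the same path, the instances share one pid file, which can interfere with starting and managing the later Redis services, so give each instance its own.

   3. Start Redis

After starting Redis, we can see that it is listening on port 6380.

Connect with the client: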

redis-cli -p 6380
127.0.0.1:6380> info
...
# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
...

The output shows that this Redis instance is running in the master role.
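When the same check has to be scripted rather than read by eye, the INFO output is easy to parse, since every data line has the form key:value. The following is a minimal sketch; the parse_info helper is our own name, and the sample text is a shortened copy of the output above.

```python
def parse_info(text: str) -> dict:
    """Turn 'key:value' lines of INFO output into a dict, skipping '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


sample = """\
# Replication
role:master
connected_slaves:0
master_repl_offset:0
"""

info = parse_info(sample)
print(info["role"], int(info["connected_slaves"]))  # master 0
```

All values are kept as strings, so numeric fields such as connected_slaves need an explicit int() conversion.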

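II. Configuring the Slaves

As with the master, change the port and the pid file. After that there are two ways to make an instance a slave.

   1. Configure the slave in the configuration file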

################################# REPLICATION #################################

# Master-Slave replication. Use slaveof to make a Redis instance a copy of
# another Redis server. A few things to understand ASAP about Redis replication.
#
# 1) Redis replication is asynchronous, but you can configure a master to
#    stop accepting writes if it appears to be not connected with at least
#    a given number of slaves.
# 2) Redis slaves are able to perform a partial resynchronization with the
#    master if the replication link is lost for a relatively small amount of
#    time. You may want to configure the replication backlog size (see the next
#    sections of this file) with a sensible value depending on your needs.
# 3) Replication is automatic and does not need user intervention. After a
#    network partition slaves automatically try to reconnect to masters
#    and resynchronize with them.
#
# slaveof <masterip> <masterport>

slaveof 127.0.0.1 6380

We can set slaveof directly in the configuration file, giving the master server's IP address and port. If the master is password-protected, set masterauth to supply the password:

# If the master is password protected (using the "requirepass" configuration
# directive below) it is possible to tell the slave to authenticate before
# starting the replication synchronization process, otherwise the master will
# refuse the slave request.
#
# masterauth <master-password>
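The settings that differ per slave so far are port, pidfile, slaveof, and optionally masterauth. When several slaves are needed, these lines can be generated instead of edited by hand. The helper below is a sketch of that idea; replica_overrides is a hypothetical name, and the pidfile path simply follows the /var/run/redis_<port>.pid pattern used for the master above.

```python
from typing import Optional


def replica_overrides(port: int, master_host: str, master_port: int,
                      master_password: Optional[str] = None) -> str:
    """Return the redis.conf lines that differ for one slave instance."""
    lines = [
        f"port {port}",
        f"pidfile /var/run/redis_{port}.pid",
        f"slaveof {master_host} {master_port}",
    ]
    if master_password is not None:
        # Only needed when the master sets requirepass
        lines.append(f"masterauth {master_password}")
    return "\n".join(lines) + "\n"


print(replica_overrides(6381, "127.0.0.1", 6380), end="")
```

The returned text would replace the corresponding directives in that instance's copy of redis.conf.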

　　

Start the Redis service:

Two services are now running. Connect to the client on port 6381 and check its status:

# Replication
role:slave
master_host:127.0.0.1
master_port:6380
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:71
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

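This Redis instance is now a slave connected to the service on 6380.

   2. Configure after the service has started

After editing the configuration file of the server on port 6382, start the service.

Connect with the client and check the server's current status: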

# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

The server is currently running as a master. Next we change its role:

127.0.0.1:6382> slaveof 127.0.0.1 6380

// status after the change
# Replication
role:slave
master_host:127.0.0.1
master_port:6380
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:617
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

　　　　
   3. Summary

First, look at the master's current status:

# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6381,state=online,offset=785,lag=0
slave1:ip=127.0.0.1,port=6382,state=online,offset=785,lag=0
master_repl_offset:785
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:784

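Both slaves are now connected to the master. The two approaches differ in what happens after a slave goes down and is restarted. If the slave was set up in the configuration file, it reconnects to the master by itself and resynchronizes the master's data. If it was attached manually with the slaveof command, then after the restart it restores its data from its own local RDB file and does not reconnect to the master automatically.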
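The slave0 and slave1 lines in the master's status output carry enough detail to check each slave from a script. As a sketch, the helper below extracts those lines and compares each slave's offset with master_repl_offset; parse_replicas is a hypothetical name and the sample text is a shortened copy of the output above.

```python
def parse_replicas(info_text: str) -> list:
    """Extract slaveN:ip=...,port=...,... lines from a master's INFO replication output."""
    replicas = []
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Match slave0, slave1, ... but not fields such as slave_priority
        if sep and key.startswith("slave") and key[5:].isdigit():
            replicas.append(dict(item.split("=", 1) for item in value.split(",")))
    return replicas


sample = """\
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6381,state=online,offset=785,lag=0
slave1:ip=127.0.0.1,port=6382,state=online,offset=785,lag=0
master_repl_offset:785
"""

master_offset = 785  # master_repl_offset from the same output
replicas = parse_replicas(sample)
for r in replicas:
    print(r["port"], r["state"], master_offset - int(r["offset"]))
```

A difference of 0 means the slave has received everything the master has written so far.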

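To set up read/write splitting, make sure the slaves are read-only, which is controlled by this setting in each slave's configuration: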

# Note: read only slaves are not designed to be exposed to untrusted clients
# on the internet. It's just a protection layer against misuse of the instance.
# Still a read only slave exports by default all the administrative commands
# such as CONFIG, DEBUG, and so forth. To a limited extent you can improve
# security of read only slaves using 'rename-command' to shadow all the
# administrative / dangerous commands.
slave-read-only yes

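3. Sentinel

   1. Configure the port

The port directive in sentinel.conf sets the port Sentinel listens on. At least three Sentinels are normally used to monitor Redis, and we can start several Sentinel services by giving each a different port.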

# port <sentinel-port>
# The port that this sentinel instance will run on
port 26379


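   2. Configure the master server's IP and port

Point the monitor directive at port 6380 and set the last argument to 2. That value is the quorum: the number of Sentinels that must agree the master is unreachable before it is marked objectively down. It is not a weight used to choose which server gets promoted.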

# sentinel monitor <master-name> <ip> <redis-port> <quorum>
#
# Tells Sentinel to monitor this master, and to consider it in O_DOWN
# (Objectively Down) state only if at least <quorum> sentinels agree.
#
# Note that whatever is the ODOWN quorum, a Sentinel will require to
# be elected by the majority of the known Sentinels in order to
# start a failover, so no failover can be performed in minority.
#
# Slaves are auto-discovered, so you don't need to specify slaves in
# any way. Sentinel itself will rewrite this configuration file adding
# the slaves using additional configuration options.
# Also note that the configuration file is rewritten when a
# slave is promoted to master.
#
# Note: master name should not include special characters or spaces.
# The valid charset is A-z 0-9 and the three characters ".-_".
sentinel monitor mymaster 127.0.0.1 6380 2
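Since each Sentinel needs its own port but the same monitor line, the files for several Sentinels can be generated. The sketch below also computes the majority that, per the comment in sentinel.conf above, a Sentinel needs before it may start a failover. The function names and the ports 26380 and 26381 are assumptions for illustration; only 26379 appears in the configuration above.

```python
def sentinel_conf(port: int, master_name: str, master_ip: str,
                  master_port: int, quorum: int) -> str:
    """Render the two sentinel.conf directives covered above for one Sentinel."""
    return (f"port {port}\n"
            f"sentinel monitor {master_name} {master_ip} {master_port} {quorum}\n")


def failover_majority(total_sentinels: int) -> int:
    """Votes a Sentinel needs from the known Sentinels before it may start a failover."""
    return total_sentinels // 2 + 1


ports = [26379, 26380, 26381]   # hypothetical ports for three local Sentinels
configs = {p: sentinel_conf(p, "mymaster", "127.0.0.1", 6380, quorum=2) for p in ports}

print(configs[26380], end="")
print("majority needed:", failover_majority(len(ports)))
```

With three Sentinels, a quorum of 2 and a majority of 2 coincide, so losing a single Sentinel still allows both ODOWN detection and failover.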

　
   3. Start Sentinel

/sentinel$ redis-sentinel sentinel.conf

Once Sentinel has started, it detects one master server and two slave servers.

When we shut down one of the slaves, the log shows:

10894:X 30 Dec 16:27:03.670 # +sdown slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380

The log shows that Sentinel has marked the slave on 6381 as subjectively down (+sdown). Now we bring 6381 back up:

10894:X 30 Dec 16:28:43.288 * +reboot slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380
10894:X 30 Dec 16:28:43.365 # -sdown slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380

　　
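   4. Shutting down the Master

After we shut the Master down manually, the Sentinels first confirm that it really is unreachable, then vote and appoint a new master server.

After 6380 went down, Sentinel chose 6382 as the new master server.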
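How Sentinel picks among the slaves is not shown in the logs here. According to the Redis documentation, the candidates are ordered by slave priority (lower is preferred, and 0 excludes a slave), then by replication offset (the slave that has processed more data wins), then by run ID. The sketch below is a simplified model of that ordering only; it ignores the check that drops slaves disconnected for too long, and the Replica type, offsets, and run IDs are made-up values.

```python
from typing import NamedTuple, Optional


class Replica(NamedTuple):
    port: int
    priority: int      # slave_priority; 0 means "never promote"
    repl_offset: int   # how much of the master's stream was processed
    run_id: str


def pick_new_master(replicas: list) -> Optional[Replica]:
    """Choose the promotion candidate, or None when no slave is eligible."""
    candidates = [r for r in replicas if r.priority > 0]
    if not candidates:
        return None
    # Lower priority first, then larger offset, then smaller run ID
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


# Hypothetical state of the two slaves at the moment the master went down
replicas = [
    Replica(6381, 100, 13700, "b" * 40),
    Replica(6382, 100, 13751, "c" * 40),
]
print(pick_new_master(replicas).port)  # 6382
```

Both slaves here use the default priority of 100, so in this made-up scenario the larger replication offset decides the outcome.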

Connect to the client on 6382 and check its status:

# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6381,state=online,offset=13751,lag=0
master_repl_offset:13751
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:13750

6382 has been promoted from slave to master.

127.0.0.1:6382> set name jaycekon
OK

The instance that previously could not accept writes now can.

　　
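   5. Reconnecting the old Master

You may wonder whether the old master takes its position back once it reconnects. It does not. Think of a boss whose position was taken by a subordinate: the subordinate is not going to hand it back. So when the old master returns, it can only serve as a slave.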

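4. Sentinel summary
I. What Sentinel does:

A. Monitors the state of the Master.

B. If the Master fails, performs a Master-Slave switch: one of the Slaves becomes the Master, and the former Master becomes a Slave.

C. After the switch, the contents of master_redis.conf, slave_redis.conf, and sentinel.conf all change: master_redis.conf gains a slaveof line, and the monitoring target in sentinel.conf is switched accordingly.

II. How Sentinel works:

1) Each Sentinel sends a PING command once per second to every Master, Slave, and other Sentinel instance it knows about.
2) If the time since an instance's last valid reply to PING exceeds the value of the down-after-milliseconds option, the Sentinel marks that instance as subjectively down.
3) If a Master is marked as subjectively down, every Sentinel monitoring that Master checks once per second whether the Master has indeed entered the subjectively down state.
4) When enough Sentinels (at least the number given in the configuration file) confirm within the specified time window that the Master is subjectively down, the Master is marked as objectively down.
5) Under normal conditions, each Sentinel sends an INFO command once every 10 seconds to every Master and Slave it knows about.
6) When a Master is marked as objectively down, the Sentinels send INFO to all Slaves of that Master once per second instead of once every 10 seconds.
7) If not enough Sentinels agree that the Master is down, the Master's objectively down state is removed.
If the Master returns a valid reply to a Sentinel's PING again, its subjectively down state is removed.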

2018-06-12 13:14