Deploying Redis Cluster with Kubernetes

Introduction

This post deploys a Redis Cluster with 3 masters and 3 slaves on Kubernetes. To keep each master and its slave on different nodes, at least 2 nodes are required.

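Deployment design

Redis Cluster runs in Docker containers, and Kubernetes manages the Docker cluster. This lets Kubernetes features handle node selection for the Redis Cluster instances, recovery after a crash, and similar tasks.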
kubernetes:

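  1. Use the Kubernetes ReplicaSet (RS) type, the successor to ReplicationController. It has better selector support, and changing the RS configuration when the Redis Cluster is scaled up or down does not restart the existing pods, so the Redis instances are not disrupted.
  2. Create 2 pod replicas per RS, which serve as master and slave respectively. Use Kubernetes pod anti-affinity to place the 2 replicas on different nodes.
  3. Use environment variables to configure the port, maxmemory, requirepass, and other settings each Redis instance needs. They are passed into the Docker container and rendered into redis.conf.
  4. Constrain the CPU and memory of the Docker containers through Kubernetes.
  5. The Redis Cluster documentation recommends Docker host network mode, which has low network overhead and is easy to manage.
  6. Because Redis can persist data, each instance's persistence files must be mounted onto the Docker host via hostPath so they can be used for recovery.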

docker:

  1. Package the redis server into the Docker image.
  2. Use dockerize inside the Docker container to generate the redis.conf file.

Deployment procedure

1. Environment and versions

node ip      role
10.10.0.1    kubernetes master
10.10.0.2    kubernetes proxy

server       version
docker       1.12.3
kubernetes   1.6.4
redis        3.2.9

2. Build the Docker image

Write the Dockerfile. The image includes dockerize, which generates the redis.conf configuration file from the redis.tmpl template.

FROM centos:7
MAINTAINER The centos project for redis
ENV REDIS_VERSION 3.2.9
COPY dockerize /usr/local/bin/
COPY redis.tmpl /opt/
COPY redis-3.2.9 /opt/redis-3.2.9
RUN mkdir -p /opt/data

redis.tmpl

bind 0.0.0.0
protected-mode no
port {{ .Env.REDIS_PORT }}
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize no
supervised no
pidfile /var/run/redis.pid
loglevel notice
logfile "/opt/data/logs/{{ .Env.REDIS_CLUSTER_ID }}_redis.log"
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename {{ .Env.REDIS_CLUSTER_ID }}_dump.rdb
dir /opt/data/dump
masterauth {{ default .Env.REDIS_PASS "sohuRedis"}}
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
slave-priority 100
requirepass {{ default .Env.REDIS_PASS "sohuRedis" }}
appendonly yes
appendfilename "{{ .Env.REDIS_CLUSTER_ID }}_appendonly.aof"
appendfsync everysec
maxmemory {{ default .Env.REDIS_MAXMEM "8589934592" }}
maxmemory-policy {{ default .Env.REDIS_POLICY "volatile-lru" }}
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
cluster-enabled yes
cluster-config-file /opt/nodes.conf
cluster-node-timeout 3000
cluster-require-full-coverage yes
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes

Run the docker command to build the image

docker build -t centos/redis-cluster .

3. Create the Redis Cluster pods

Three ReplicaSets are needed, using ports 7000, 7001, and 7002.
Create command

kubectl create -f redis-cluster.yaml

redis-cluster.yaml file

apiVersion: extensions/v1beta1
kind: ReplicaSet
metadata:
  name: redis-7000
  annotations:
    description: "redis cluster service"
    clusterId: "7000"
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: redis
        name: redis-7000
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: name
                operator: In
                values:
                - redis-7000
            topologyKey: kubernetes.io/hostname
      hostNetwork: true
      containers:
      - image: centos/redis-cluster:latest
        name: redis-cluster
        imagePullPolicy: Never
        env:
        - name: REDIS_CLUSTER_ID
          value: "7000"
        - name: REDIS_PORT
          value: "7000"
        - name: REDIS_MAXMEM
          valueFrom:
            configMapKeyRef:
              name: redis-config
              key: memory
        - name: REDIS_PASS
          valueFrom:
            configMapKeyRef:
              name: redis-config
              key: password
        command: ["/bin/bash", "-c"]
        args: [" sleep 60;cd /opt/data/dump/; for i in `ls|grep -P '^$(REDIS_CLUSTER_ID)'`;do name=`date +%T`;mv $i $name$i;done; cd /opt/data/conf/ ; for k in `ls|grep -P '^$(REDIS_CLUSTER_ID)'`;do name=`date +%T`; mv $k $name$k; done; dockerize -template /opt/redis.tmpl:/opt/redis-3.2.9/redis-cluster.conf /opt/redis-3.2.9/src/redis-server /opt/redis-3.2.9/redis-cluster.conf"]
        resources:
          requests:
            cpu: 1
            memory: 6144Mi
        volumeMounts:
        - mountPath: /opt/data
          name: redis-volume
      volumes:
      - name: redis-volume
        hostPath:
          path: /opt/redisData
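
Only the manifest for port 7000 is listed here. One possible way to obtain the other two ReplicaSets is to substitute the port with sed. This is a minimal sketch, assuming the 7001 and 7002 manifests differ from the 7000 one only in the port and cluster ID (the original does not show them). `render_manifest` and the `redis-cluster-<port>.yaml` file names are hypothetical.

```shell
#!/bin/bash
# Sketch: derive the manifests for ports 7001 and 7002 from the 7000 manifest.
# Assumption: every occurrence of "7000" in the file is the port/cluster ID.

# render_manifest TEMPLATE PORT: print TEMPLATE with 7000 replaced by PORT.
render_manifest() {
  sed "s/7000/$2/g" "$1"
}

# Write one manifest per extra port, but only if the template is present.
if [ -f redis-cluster.yaml ]; then
  for port in 7001 7002; do
    render_manifest redis-cluster.yaml "$port" > "redis-cluster-$port.yaml"
  done
fi
```

Each generated file could then be created the same way as the original, for example with kubectl create -f redis-cluster-7001.yaml.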

redis-configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
data:
  memory: "33554432"
  password: "redisPassword"
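
The memory key reaches redis.tmpl through REDIS_MAXMEM and is written as a plain byte count. As a small aid when choosing a value, the sketch below converts mebibytes to bytes. `mib_to_bytes` is a hypothetical helper, not part of the original setup.

```shell
#!/bin/bash
# mib_to_bytes MIB: print MIB mebibytes as a byte count (hypothetical helper).
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 32     # prints 33554432, the ConfigMap value above
mib_to_bytes 8192   # prints 8589934592, the default in redis.tmpl
```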

4. Form the Redis Cluster, assign slots, and add slave nodes

Form the Redis Cluster

/home/redis-3.2.9/src/redis-cli -h 10.10.0.1 -p 7000 -a RedisPass cluster meet 10.10.0.1 7001
/home/redis-3.2.9/src/redis-cli -h 10.10.0.1 -p 7000 -a RedisPass cluster meet 10.10.0.1 7002
/home/redis-3.2.9/src/redis-cli -h 10.10.0.1 -p 7000 -a RedisPass cluster meet 10.10.0.2 7000
/home/redis-3.2.9/src/redis-cli -h 10.10.0.1 -p 7000 -a RedisPass cluster meet 10.10.0.2 7001
/home/redis-3.2.9/src/redis-cli -h 10.10.0.1 -p 7000 -a RedisPass cluster meet 10.10.0.2 7002
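
The five commands above follow one pattern: the seed instance 10.10.0.1:7000 meets every other instance. A minimal sketch that generates the same commands with a loop is shown below. It only prints them (a dry run), and `meet_commands` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: print the CLUSTER MEET commands for every instance except the seed.
REDIS_CLI=/home/redis-3.2.9/src/redis-cli
SEED_HOST=10.10.0.1
SEED_PORT=7000

meet_commands() {
  local host port
  for host in 10.10.0.1 10.10.0.2; do
    for port in 7000 7001 7002; do
      # The seed does not need to meet itself.
      if [ "$host" = "$SEED_HOST" ] && [ "$port" = "$SEED_PORT" ]; then
        continue
      fi
      echo "$REDIS_CLI -h $SEED_HOST -p $SEED_PORT -a RedisPass cluster meet $host $port"
    done
  done
}

meet_commands
```

Piping the output to bash would execute the commands instead of printing them.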

Slot assignment script

#!/bin/bash
# Assign the 16384 hash slots (0-16383) to the three masters on 10.10.0.1.

# Slots 0-5461 go to the master on port 7000.
for i in `seq 0 5461`
do
    slot1=${slot1}" "${i}
done
echo ${slot1}
/home/redis-3.2.9/src/redis-cli -h 10.10.0.1 -p 7000 -a RedisPass cluster addslots ${slot1}

# Slots 5462-10922 go to the master on port 7001.
for a in `seq 5462 10922`
do
    slot2=${slot2}" "${a}
done
echo ${slot2}
/home/redis-3.2.9/src/redis-cli -h 10.10.0.1 -p 7001 -a RedisPass CLUSTER ADDSLOTS ${slot2}

# Slots 10923-16383 go to the master on port 7002.
for k in `seq 10923 16383`
do
    slot3=${slot3}" "${k}
done

/home/redis-3.2.9/src/redis-cli -h 10.10.0.1 -p 7002 -a RedisPass CLUSTER ADDSLOTS ${slot3}
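
The step of adding the slave nodes is not listed in the original. A minimal sketch follows, assuming each instance on 10.10.0.2 becomes the slave of the 10.10.0.1 instance on the same port (an assumption based on the ReplicaSet pairing, not something the original states). It uses the Redis CLUSTER NODES and CLUSTER REPLICATE commands. `self_node_id` and `attach_slave` are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: make one instance the slave of another with CLUSTER REPLICATE.
REDIS_CLI=/home/redis-3.2.9/src/redis-cli

# self_node_id: read CLUSTER NODES output on stdin and print the ID of the
# line whose flags field (third column) contains "myself".
self_node_id() {
  awk '$3 ~ /myself/ { print $1 }'
}

# attach_slave MASTER_HOST MASTER_PORT SLAVE_HOST SLAVE_PORT
attach_slave() {
  local master_id
  master_id=$("$REDIS_CLI" -h "$1" -p "$2" -a RedisPass cluster nodes | self_node_id)
  "$REDIS_CLI" -h "$3" -p "$4" -a RedisPass cluster replicate "$master_id"
}

# Usage under the pairing assumption above (not executed here):
#   attach_slave 10.10.0.1 7000 10.10.0.2 7000
#   attach_slave 10.10.0.1 7001 10.10.0.2 7001
#   attach_slave 10.10.0.1 7002 10.10.0.2 7002
```

The node ID is looked up at run time because Redis generates it per instance, so it cannot be hard-coded in advance.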

Problems encountered during deployment

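  1. After a Redis Cluster master fails, its slave is not automatically promoted to master.
    Cause: the pod, or the container in the pod, restarts very quickly, so the master is back before the Redis cluster has elected a new master, and the slave is never promoted. For the election process, see the Redis Cluster specification.
    Temporary workaround: delay the start of the redis service in the container (for example, the sleep 60 at the start of the container args in redis-cluster.yaml).
  2. When the Redis cluster is scaled down, the maxmemory environment variable in the RS must be updated. If a container then fails and restarts, it does not pick up the updated RS configuration (the new configuration does take effect after a delete pod operation).
    Solution: supply the environment variables through a ConfigMap.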