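HA Cluster:

Cluster types: LB (lvs/nginx (http/upstream, stream/upstream)), HA, HP

SPoF: Single Point of Failure

System availability formula: A = MTBF / (MTBF + MTTR)

A lies in the interval (0, 1), e.g. 95%

The "number of nines" metric: 99%, 99.5%, ..., 99.999%, 99.9999%;

99% means 1% downtime; 99.9% means 0.1% downtime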
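
To make the formula and the "nines" concrete, here is a minimal sketch in bash. The function names `availability` and `downtime_hours_per_year` are made up for this illustration, and a 365-day year is assumed.

```shell
#!/bin/bash
# availability MTBF MTTR
# Prints A = MTBF / (MTBF + MTTR) with six decimals (both in the same unit).
availability() {
    awk -v mtbf="$1" -v mttr="$2" 'BEGIN { printf "%.6f\n", mtbf / (mtbf + mttr) }'
}

# downtime_hours_per_year PERCENT
# Prints the downtime, in hours, that an availability of PERCENT allows
# over an assumed 365-day year.
downtime_hours_per_year() {
    awk -v pct="$1" 'BEGIN { printf "%.2f\n", (100 - pct) / 100 * 365 * 24 }'
}

availability 999 1             # MTBF 999 h, MTTR 1 h  -> 0.999000
downtime_hours_per_year 99     # two nines             -> 87.60
downtime_hours_per_year 99.9   # three nines           -> 8.76
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why lowering MTTR through redundancy matters so much.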

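System failures:

Hardware failures: design defects, wear out, natural disasters, ...

Software failures: design defects, ...

One way to improve system availability is to reduce MTTR:

Means: redundancy (redundant)

active/passive (master/backup), active/active (dual master)

active --> HEARTBEAT --> passive

active <--> HEARTBEAT <--> active

What is made highly available is the "service":

HA nginx service:

vip/nginx process[/shared storage]

Resources: the "components" that make up a highly available service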

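keepalived:

A software implementation of the VRRP protocol, originally designed to make the ipvs service highly available

Uses the VRRP protocol to float the address between nodes;

Generates ipvs rules on the node that holds the VIP (predefined in the configuration file);

Performs health checks on each RS of the ipvs cluster;

Provides a script-calling interface: executing a script carries out whatever the script defines, which in turn influences cluster behavior;

Components:

Core components

vrrp stack

ipvs wrapper

checkers

Control component: configuration file parser

I/O multiplexer

Memory management component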

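Prerequisites for configuring an HA Cluster:

(1) Time must be synchronized across all nodes;

ntp, chrony

(2) Make sure iptables and SELinux do not get in the way;

(3) Nodes can reach each other by hostname (not strictly required for keepalived)

Using the /etc/hosts file is recommended;

(4) Make sure the interface each node uses for the cluster service supports MULTICAST communication

Class D addresses: first octet 224-239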
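
Prerequisite (4) can be checked from the interface flags. The sketch below assumes the iproute2 `ip` tool is installed; the helper name `supports_multicast` is invented for this example, and `ens33` is simply the interface name used elsewhere in these notes.

```shell
#!/bin/bash
# supports_multicast
# Reads the output of `ip link show dev IFACE` on stdin and succeeds
# (exit status 0) when the flag list in <...> contains MULTICAST.
supports_multicast() {
    grep -q '<[^>]*MULTICAST[^>]*>'
}

# Typical use on a cluster node:
#   ip link show dev ens33 | supports_multicast && echo "multicast OK"
# If the flag is missing it can be switched on with:
#   ip link set dev ens33 multicast on
```

Parsing stdin rather than calling `ip` inside the function keeps the check easy to try against saved output from another node.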

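keepalived installation and configuration:

CentOS 6.4+: provided in the base repository

Program environment:

Main configuration file: /etc/keepalived/keepalived.conf

Main program file: /usr/sbin/keepalived

Unit file: keepalived.service

Environment file for the unit file: /etc/sysconfig/keepalived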

Configuration file sections:

TOP HIERARCHY

GLOBAL CONFIGURATION

Global definitions

Static routes/addresses

VRRPD CONFIGURATION

VRRP synchronization group(s): VRRP sync groups;

VRRP instance(s): each VRRP instance is one VRRP router;

LVS CONFIGURATION

Virtual server group(s)

Virtual server(s): the VS and RS of the ipvs cluster;

Example: highly available ipvs cluster

! Configuration File for keepalived
global_defs {
    notification_email {
        root@xiang
    }
    notification_email_from keepalived@xiang
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id xiang
    vrrp_mcast_group4 224.1.101.33
}
vrrp_instance VI_1 {
    state MASTER
    interface ens33
    virtual_router_id 33
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass kav5hsNF
    }
    virtual_ipaddress {
        192.168.0.111/16 dev ens33 label ens33:0
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}

How to use notification scripts

Example notification script:

#!/bin/bash
# Mail a notice whenever this node's VRRP state changes.
contact='root@localhost'
notify() {
    local mailsubject="$(hostname) to be $1, vip floating"
    local mailbody="$(date +'%F %T'): vrrp transition, $(hostname) changed to be $1"
    echo "$mailbody" | mail -s "$mailsubject" "$contact"
}
case $1 in
master) notify master ;;
backup) notify backup ;;
fault)  notify fault ;;
*)
    echo "Usage: $(basename "$0") {master|backup|fault}"
    exit 1
    ;;
esac

How the script is invoked:

notify_master "/etc/keepalived/notify.sh master"

notify_backup "/etc/keepalived/notify.sh backup"

notify_fault "/etc/keepalived/notify.sh fault"

Virtual server:

Configuration syntax:

virtual_server IP port |
virtual_server fwmark int
{
    ...
    real_server {
        ...
    }
    ...
}

Common parameters:

delay_loop <INT>: interval between health-check polls;

lb_algo rr | wrr | lc | wlc | lblc | sh | dh: scheduling algorithm;

lb_kind NAT | DR | TUN: cluster (forwarding) type;

persistence_timeout <INT>: persistent-connection duration;

protocol TCP: service protocol; only TCP is supported;

sorry_server <IPADDR> <PORT>: fallback server address;

real_server <IPADDR> <PORT>

{

weight <INT>

notify_up <STRING> | <QUOTED-STRING>

notify_down <STRING> | <QUOTED-STRING>

HTTP_GET | SSL_GET | TCP_CHECK | SMTP_CHECK | MISC_CHECK { ... }: health-check method for this host

}

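HTTP_GET | SSL_GET: application-layer checks

HTTP_GET | SSL_GET {

url {

path <URL_PATH>: the URL to monitor;

status_code <INT>: response code that counts as healthy for this check;

digest <STRING>: checksum of the response content that counts as healthy for this check;

}

nb_get_retry <INT>: number of retries;

delay_before_retry <INT>: delay before each retry;

connect_ip <IP ADDRESS>: which IP address of this RS receives the health-check request

connect_port <PORT>: which port of this RS receives the health-check request

bindto <IP ADDRESS>: source address used when sending the health-check request

bind_port <PORT>: source port used when sending the health-check request;

connect_timeout <INTEGER>: timeout for the connection request;

}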
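
The `digest` value has to be computed ahead of time from a known-good response. The sketch below assumes the digest is the MD5 hex digest of the response body (keepalived ships a `genhash` utility for this purpose; the pipeline here is only a stand-in, not that tool). The helper name `body_digest` and the URL in the comment are placeholders.

```shell
#!/bin/bash
# body_digest
# Reads a response body on stdin and prints its MD5 hex digest.
# Assumption: this matches what the `digest` parameter expects.
body_digest() {
    md5sum | awk '{ print $1 }'
}

# Hypothetical use against a real server (placeholder address and path):
#   curl -s http://192.168.0.11/index.html | body_digest
printf 'hello' | body_digest   # prints 5d41402abc4b2a76b9719d911017c592
```

A digest check is stricter than `status_code`: any change to the monitored page marks the RS as unhealthy, so it fits static pages best.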

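TCP_CHECK {

connect_ip <IP ADDRESS>: which IP address of this RS receives the health-check request

connect_port <PORT>: which port of this RS receives the health-check request

bindto <IP ADDRESS>: source address used when sending the health-check request;

bind_port <PORT>: source port used when sending the health-check request;

connect_timeout <INTEGER>: timeout for the connection request;

}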

TCP_CHECK usage example:

TCP_CHECK {
nb_get_retry 3
delay_before_retry 2
connect_timeout 3
}

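keepalived can call external helper scripts to monitor resources and dynamically adjust the node's priority based on the result

Two steps: (1) define a script; (2) reference the script;

vrrp_script <SCRIPT_NAME> {
    script ""
    interval INT
    weight -INT
}
track_script {
    SCRIPT_NAME_1
    SCRIPT_NAME_2
    ...
}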
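
The effect of `weight` on failover can be sketched in a few lines of bash. This is a simplified model, not keepalived's own code: it assumes a negative weight is added to the base priority while the script is failing and a positive weight is added while it is succeeding, and it ignores the clamping of priorities to the valid range. The function name `effective_priority` and the BACKUP priority of 98 are invented for the example.

```shell
#!/bin/bash
# effective_priority BASE WEIGHT SCRIPT_OK
# SCRIPT_OK is 1 if the tracked script last succeeded, 0 if it failed.
# Assumed rule: weight < 0 lowers the priority on failure,
#               weight > 0 raises the priority on success.
effective_priority() {
    local base=$1 weight=$2 ok=$3 prio=$1
    if (( weight < 0 && ok == 0 )); then
        prio=$(( base + weight ))
    elif (( weight > 0 && ok == 1 )); then
        prio=$(( base + weight ))
    fi
    echo "$prio"
}

# MASTER with priority 100 and weight -5; hypothetical BACKUP at priority 98.
effective_priority 100 -5 1   # script passing -> 100, MASTER keeps the VIP
effective_priority 100 -5 0   # script failing -> 95, below 98, BACKUP takes over
```

The weight only triggers a takeover if it moves the MASTER below the BACKUP, so the gap between the two configured priorities must be smaller than the absolute weight.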

Example: highly available nginx service

! Configuration File for keepalived
global_defs {
    notification_email {
        root@xiang
    }
    notification_email_from keepalived@xiang
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id xiang
    vrrp_mcast_group4 224.0.100.19
}
vrrp_script chk_down {
    script "[[ -f /etc/keepalived/down ]] && exit 1 || exit 0"
    interval 1
    weight -5
}
vrrp_script chk_nginx {
    script "killall -0 nginx && exit 0 || exit 1"
    interval 1
    weight -5
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state MASTER
    interface ens33
    virtual_router_id 14
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 561f97b2
    }
    virtual_ipaddress {
        10.1.0.93/16 dev ens33
    }
    track_script {
        chk_down
        chk_nginx
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}