HA configuration notes
[ 2013/11/26 15:03:34 ]
1. Configure the ha.cf file. These keys need to be modified:
    ucast eth0 10.0.38.33   # must be the IP address of the other machine
    node dc_13              # add one node line for every node in the cluster
    ping 10.0.38.156        # a reference IP, used only to test whether the network has failed
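
Put together, the ha.cf on one machine of a two-node cluster might look like the sketch below. Only the ucast, node dc_13 and ping lines come from these notes. The second node name dc_14 and the keepalive and deadtime values are assumptions for illustration, and "crm yes" is assumed because the later steps work with cib.xml.

```text
# /etc/ha.d/ha.cf on machine A (sketch, not a tested configuration)
keepalive 2              # assumed value: seconds between heartbeat packets
deadtime 30              # assumed value: seconds of silence before the peer is declared dead
ucast eth0 10.0.38.33    # from the notes: IP address of the other machine
node dc_13               # from the notes: should match the output of `uname -n` on that node
node dc_14               # hypothetical name of the second node
ping 10.0.38.156         # from the notes: reference IP used only as a connectivity test
crm yes                  # assumed: enables the CRM, which reads cib.xml
```

On machine B the ucast line would point back at machine A's address instead.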
2. Configure the haresources file.
    Each line has three columns.
    The first column is the machine name of the primary node.
    The second is an IP address that is not used by any other host on this network.
    The third is the application you want heartbeat to call. It is usually a script
    in /etc/init.d (call it "any_server").
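
As a sketch, a haresources line with these three columns could look as follows. The address 10.0.38.200 is a hypothetical unused IP chosen for illustration; dc_13 and any_server are the names used in these notes.

```text
# /etc/ha.d/haresources (sketch)
# primary-node   service-IP    resource-script
dc_13            10.0.38.200   any_server
```

The file is expected to be identical on every node in the cluster.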
3. Configure any_server.
    It is a script in /etc/init.d and is called by heartbeat.
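
A minimal sketch of such a script is shown below, assuming heartbeat calls it with start, stop or status. The daemon path and pidfile location are hypothetical defaults. As far as I recall, haresources mode looks for "running" or "OK" in the status output, while the CRM relies on LSB exit codes, so the sketch tries to satisfy both.

```shell
#!/bin/sh
# Sketch of /etc/init.d/any_server. DAEMON and PIDFILE are hypothetical defaults.
DAEMON="${DAEMON:-/usr/local/bin/any_server_daemon}"
PIDFILE="${PIDFILE:-/var/run/any_server.pid}"

is_running() {
    # Running means: the pidfile exists and the recorded process is alive.
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

any_server() {
    case "$1" in
        start)
            is_running && return 0       # starting an already running service succeeds
            # DAEMON is left unquoted on purpose so it may carry arguments.
            $DAEMON >/dev/null 2>&1 &
            echo $! > "$PIDFILE"
            ;;
        stop)
            if is_running; then kill "$(cat "$PIDFILE")"; fi
            rm -f "$PIDFILE"             # stopping an already stopped service succeeds
            ;;
        status)
            if is_running; then echo "running"; return 0; fi
            echo "stopped"
            return 3                     # LSB exit code 3: program is not running
            ;;
        *)
            echo "Usage: any_server {start|stop|status}" >&2
            return 2
            ;;
    esac
}

# Dispatch only when an action is given, so the file can also be sourced.
if [ $# -gt 0 ]; then any_server "$1"; fi
```

Making start and stop succeed when there is nothing to do matters here, because the cluster manager may call them repeatedly while it decides where the resource should run.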
4. Update the cib.xml file.
    After you modify the configuration files, run /usr/lib64/heartbeat/haresources2cib.py.
    It generates the cib.xml file again.
5. Fix the problem of the master thread switching between the primary and backup machines.
    The problem:
    When heartbeat on the primary (machine A) restarts:
    1. When heartbeat on A stops, HA makes machine B the primary server.
    2. When heartbeat on A starts again, HA makes machine A the primary server again.
    This double switch causes problems such as data information not being retrieved and the master thread not starting.
    Solution:
    Limit the switch back to machine A through configuration. With a stickiness of INFINITY, a resource stays on the node where it is currently running instead of moving back when A rejoins.
    Modify the following configuration item in /var/lib/heartbeat/crm/cib.xml:
    <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="INFINITY"/>
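
For orientation, the sketch below shows where this nvpair usually sits in a Heartbeat 2.x cib.xml. The surrounding element layout is an assumption based on what haresources2cib.py typically generates; check it against your own file rather than copying it.

```xml
<!-- Sketch only: other cluster options are omitted -->
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <attributes>
      <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="INFINITY"/>
    </attributes>
  </cluster_property_set>
</crm_config>
```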
6. Some problems with HA on 64-bit systems
    1. libnet version problem: [...] 64-bit rpm [...] send.c [...]
    2. [...] /etc/ha.d/shellfunc [...] ha_bin [...] /usr/lib64/ [...] copy
    [...] 32-bit [...] /usr/lib/ [...]
    3. [...]

Notes:
1. The logic of fsimage and fslog synchronization
    When the slave master starts, the primary master sends the fsimage to the slave master server. After that, the primary master does not send
    the fsimage again; it sends only the fslog to the slave master, so the fslog on the slave master keeps growing.
    When the master switches, the slave master updates the fsimage file with the fslog.
2. The wait time of heartbeat is 5s.
3. When /etc/init.d/delcae start fails, the reason may be that the master thread already exists. The message it prints is not very clear.
4. About the master switch. There are two masters, A and B. A is configured as the primary master and B as the slave master.
    4.1 When the master thread on A is killed, HA switches the primary master to B. (Sometimes HA restarts the thread; in that case, kill it again until it is really down.)
    At this point, monitoring HA with /usr/sbin/crm_mon -i 5 shows the master thread on A in an error status, so HA will not
    restart this thread on A. To make A work again, restart HA on A (/etc/init.d/heartbeat restart).
    Otherwise, A will not take over even if the B master goes down.