1.实验环境
实验使用三个linux设备
a.abc.com ip 192.168.10.99
b.abc.com ip 192.168.10.100
test 192.168.10.200
拓扑图:
检查内核
[root@a ~]# uname -r
2.6.18-164.el5 #此版本内核默认未添加drbd内核模块安装软件
在 a.abc.com 与b.abc.com 上同时执行安装操作
[root@love ~]# ll
总计 3088 -rw------- 1 root root 1192 10-05 18:43 anaconda-ks.cfg drwxr-xr-x 2 root root 4096 10-05 19:56 Desktop -rw-r--r-- 1 root root 221868 10-05 21:57 drbd83-8.3.8-1.el5.centos.i386.rpm -rw-r--r-- 1 root root 1637238 10-05 21:58 heartbeat-2.1.4-9.el5.i386.rpm -rw-r--r-- 1 root root 293349 10-05 21:58 heartbeat-devel-2.1.4-9.el5.i386.rpm -rw-r--r-- 1 root root 230890 10-05 21:58 heartbeat-gui-2.1.4-9.el5.i386.rpm -rw-r--r-- 1 root root 111742 10-05 21:58 heartbeat-ldirectord-2.1.4-9.el5.i386.rpm -rw-r--r-- 1 root root 92070 10-05 21:58 heartbeat-pils-2.1.4-10.el5.i386.rpm -rw-r--r-- 1 root root 179199 10-05 21:58 heartbeat-stonith-2.1.4-10.el5.i386.rpm -rw-r--r-- 1 root root 35236 10-05 18:42 install.log -rw-r--r-- 1 root root 3995 10-05 18:40 install.log.syslog -rw-r--r-- 1 root root 125974 10-05 21:57 kmod-drbd83-8.3.8-1.el5.centos.i686.rpm -rw-r--r-- 1 root root 56817 10-05 21:58 libnet-1.1.4-3.el5.i386.rpm -rw-r--r-- 1 root root 92071 10-05 21:58 perl-MailTools-1.77-1.el5.noarch.rpm[root@love ~]# yum localinstall *.rpm --nogpgcheck –y
设置在ssh通道中的无验证传输
在 a.abc.com 上创建密钥并将密钥转移到b.abc.com上
[root@a ~]# ssh-keygen -t rsa
Generating public/private rsa key pair. Enter file in which to save the key (/root/.ssh/id_rsa): /root/.ssh/id_rsa already exists. Overwrite (y/n)? y Enter passphrase (empty for no passphrase): Enter same passphrase again: Your identification has been saved in /root/.ssh/id_rsa. Your public key has been saved in /root/.ssh/id_rsa.pub. The key fingerprint is: 49:6f:9d:0c:03:78:94:7c:52:b1:91:4e:35:34:51:f3[root@a .ssh]# ssh-copy-id -i id_rsa.pub b.abc.com
10 The authenticity of host 'a.abc.com (192.168.10.99)' can't be established. RSA key fingerprint is 0a:78:89:da:1e:1d:97:95:0b:8b:03:22:e7:af:22:5c. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added 'a.abc.com,192.168.10.99' (RSA) to the list of known hosts. root@a.abc.com's password: Now try logging into the machine, with "ssh 'b.abc.com'", and check in:.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
此时就可以在a.abc.com上将任意的文件使用 “scp 文件或目录 b.abc.com:/目录”进行移动
使用“ssh b.abc.com ‘命令’” #在“b.abc.com” 主机上执行命令
同步时钟频率
[root@a ~]# date
2012年 10月 06日 星期六 11:17:11 CST [root@a ~]# hwclock -s [root@a ~]# date 2012年 10月 06日 星期六 11:24:08 CST[root@a ~]# ssh b.abc.com 'hwclock -s'
[root@a ~]# ssh b.abc.com 'hwclock' 2012年10月06日 星期六 11时24分33秒 -0.080474 seconds[root@a ~]#modprobe drbd
[root@a ~]#ssh b.abc.com 'modprobe drbd'
新建磁盘分区。在b.abc.com 上进行相同的操作
[root@a ~]# fdisk /dev/sdaThe number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024, and could in certain setups cause problems with: 1) software that runs at boot time (e.g., old versions of LILO) 2) booting and partitioning software from other OSs (e.g., DOS FDISK, OS/2 FDISK)Command (m for help): n
Command action e extended p primary partition (1-4) e Selected partition 4 First cylinder (1420-2610, default 1420): Using default value 1420 Last cylinder or +size or +sizeM or +sizeK (1420-2610, default 2610): Using default value 2610Command (m for help): n
First cylinder (1420-2610, default 1420): Using default value 1420 Last cylinder or +size or +sizeM or +sizeK (1420-2610, default 2610): Using default value 2610Command (m for help): w
The partition table has been altered!Calling ioctl() to re-read partition table.
WARNING: Re-reading the partition table failed with error 16: 设备或资源忙.
The kernel still uses the old table. The new table will be used at the next reboot. Syncing disks. [root@a ~]# partprobe /dev/sda [root@a ~]# cd /usr/share/doc/drbd83-8.3.8/ [root@a drbd83-8.3.8]# ll 总计 64 -rw-r--r-- 1 root root 31183 2010-06-02 ChangeLog -rw-r--r-- 1 root root 17990 2008-11-24 COPYING -rw-r--r-- 1 root root 133 2010-06-02 drbd.conf -rw-r--r-- 1 root root 22 2010-06-04 file.list -rw-r--r-- 1 root root 425 2010-03-02 README [root@a drbd83-8.3.8]# cp drbd.conf /etc/ cp:是否覆盖“/etc/drbd.conf”? y [root@a drbd83-8.3.8]# scp drbd.conf b.abc.com:/etc/ drbd.conf 100% 133 0.1KB/s [root@a drbd83-8.3.8]# cd [root@a ~]# vim /etc/drbd.conf# You can find an example in /usr/share/doc/drbd.../drbd.conf.example
include "drbd.d/global_common.conf";
include "drbd.d/*.res"; [root@a ~]# cd /etc/drbd.d [root@a drbd.d]# ll 总计 4 -rwxr-xr-x 1 root root 1418 2010-06-04 global_common.conf [root@a drbd.d]# cp global_common.conf global_common.conf.bak [root@a drbd.d]# ll 总计 8 -rwxr-xr-x 1 root root 1418 2010-06-04 global_common.conf -rwxr-xr-x 1 root root 1418 10-06 11:37 global_common.conf.bak [root@a drbd.d]# vim global_common.conf1 global {
2 usage-count no; 3 } 4 common { 5 protocol C; 6 startup { 7 wfc-timeout 120; 8 degr-wfc-timeout 120; 9 } 10 disk { 11 on-io-error detach; 12 fencing resource-only; 13 } 14 net { 15 cram-hmac-alg "sha1"; 16 shared-secret "mydrbdlab"; 17 } 18 syncer { 19 rate 100M; 20 } 21 }[root@a drbd.d]# vim web.res
1 resource web {
2 on a.abc.com { 3 device /dev/drbd0; 4 disk /dev/sda5; 5 address 192.168.10.99:7789; 6 meta-disk internal; 7 } 8 on b.abc.com { 9 device /dev/drbd0; 10 disk /dev/sda5; 11 address 192.168.10.100:7789; 12 meta-disk internal; 13 } 14 } [root@a drbd.d]# scp * b.abc.com:/etc/drbd.d/ global_common.conf 100% 425 0.4KB/s global_common.conf.bak 100% 1418 1.4KB/s web.res 100% 315 0.3KB/s [root@a drbd.d]# drbdadm create-md web Writing meta data... initializing activity log NOT initialized bitmap New drbd meta data block successfully created. [root@a drbd.d]# ssh b.abc.com 'drbdadm create-md web' NOT initialized bitmap Writing meta data... initializing activity log New drbd meta data block successfully created. [root@a drbd.d]# service drbd start Starting DRBD resources: [ web Found valid meta data in the expected location, 9796272128 bytes into /dev/sda5. d(web) s(web) n(web) ].......... *************************************************************** DRBD's startup script waits for the peer node(s) to appear. - In case this node was already a degraded cluster before the reboot the timeout is 120 seconds. [degr-wfc-timeout] - If the peer was available before the reboot the timeout will expire after 120 seconds. [wfc-timeout] (These values are for resource 'web'; 0 sec -> wait forever) To abort waiting enter 'yes' [ 28]: . [root@a drbd.d]# drbdadm -- --overwrite-data-of-peer primary web [root@a drbd.d]# service drbd status ]drbd driver loaded OK; device status: version: 8.3.8 (api:88/proto:86-94) GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10. 2010-06-04 08:04:16 m:res cs ro ds p mount ... sync'ed: 0.9% (9260/9340)M delay_probe: 0:web SyncSource Primary/Secondary UpToDate/Inconsistent C [root@a drbd.d]# watch -n 1 'cat /proc/drbd' [root@a drbd.d]# mkfs -t ext3 -L drbdweb /dev/drbd0 mke2fs 1.39 (29-May-2006) Filesystem label=drbdweb OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) 1196032 inodes, 2391587 blocks 119579 blocks (5.00%) reserved for the super user First data block=0 Maximum filesystem blocks=2449473536 73 block groups 32768 blocks per group, 32768 fragments per group 16384 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632Writing inode tables: done
Creating journal (32768 blocks): done Writing superblocks and filesystem accounting information: doneThis filesystem will be automatically checked every 32 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.nfs共享服务相关设置
[root@a drbd.d]# vim /etc/exports [root@a drbd.d]# scp /etc/exports b.abc.com:/etc/ exports 100% 50 0.1KB/s [root@a drbd.d]# exportfs -rv exporting *:/web [root@a drbd.d]# ssh b.abc.com 'exportfs -rv' exporting *:/web [root@a drbd.d]# service portmap start 启动 portmap: [确定][root@a drbd.d]# chkconfig portmap on
[root@a drbd.d]# ssh b.abc.com 'service portmap start' 启动 portmap:[确定] [root@a drbd.d]# ssh b.abc.com 'chkconfig portmap on' [root@a drbd.d]# service nfs start 启动 NFS 服务: [确定] 关掉 NFS 配额: [确定] 启动 NFS 守护进程: [确定] 启动 NFS mountd: [确定] [root@a drbd.d]# chkconfig nfs on [root@a drbd.d]# ssh b.abc.com 'service nfs start' 启动 NFS 服务: [确定] 关掉 NFS 配额:[确定] 启动 NFS 守护进程:[确定] 启动 NFS mountd:[确定] [root@a drbd.d]# ssh b.abc.com 'chkconfig nfs on' [root@a drbd.d]# vim /etc/init.d/nfs #在b.abc.com主机上执行同样的操作122 killproc nfsd –9
配置heartbeat的配置文件
[root@a drbd.d]# cd /usr/share/doc/heartbeat-2.1.4/ [root@a heartbeat-2.1.4]# ls apphbd.cf DirectoryMap.txt HardwareGuide.html heartbeat_api.txt rsync. authkeys faqntips.html HardwareGuide.txt logd.cf starts AUTHORS faqntips.txt haresources README ChangeLog GettingStarted.html hb_report.html Requirements.html COPYING GettingStarted.txt hb_report.txt Requirements.txt COPYING.LGPL ha.cf heartbeat_api.html rsync.html [root@a heartbeat-2.1.4]# cp authkeys ha.cf haresources /etc/ha.d/ [root@a heartbeat-2.1.4]# cd /etc/ha.d [root@a ha.d]# ll 总计 48 -rw-r--r-- 1 root root 645 10-06 11:58 authkeys -rw-r--r-- 1 root root 10539 10-06 11:58 ha.cf -rwxr-xr-x 1 root root 745 2009-07-25 harc -rw-r--r-- 1 root root 5905 10-06 11:58 haresources drwxr-xr-x 2 root root 4096 10-06 11:34 rc.d -rw-r--r-- 1 root root 692 2009-07-25 README.config drwxr-xr-x 2 root root 4096 10-06 11:34 resource.d -rw-r--r-- 1 root root 7862 2009-07-25 shellfuncs [root@a ha.d]# vim ha.cf24 debugfile /var/log/ha-debug
29 logfile /var/log/ha-log
34 logfacility local0
48 keepalive 2
56 deadtime 10
76 udpport 694
95 bcast eth0 #heartbeat的监控网卡
214 node a.abc.com
215 node b.abc.com [root@a ha.d]# echo "a.abc.com IPaddr::192.168.10.101/24/eth0 drbddisk::web Filesydrbd0::/web::ext3 killnfsd" >> /etc/ha.d/haresources [root@a ha.d]# vim authkeys27 auth 3
28 3 md5 123456 [root@a ha.d]# vim haresources [root@a resource.d]# echo "killall -9 nfsd ; /etc/init.d/nfs restart ; exit 0" >> /etc/ha.d/resource.d/killnfsd [root@a resource.d]# cd ../ [root@a ha.d]# chmod 600 /etc/ha.d/authkeys [root@a ha.d]# chmod 755 /etc/ha.d/resource.d/killnfsd [root@a ha.d]# ll 总计 48 -rw------- 1 root root 665 10-06 12:04 authkeys -rw-r--r-- 1 root root 10576 10-06 12:01 ha.cf -rwxr-xr-x 1 root root 745 2009-07-25 harc -rw-r--r-- 1 root root 6003 10-06 12:05 haresources drwxr-xr-x 2 root root 4096 10-06 11:34 rc.d -rw-r--r-- 1 root root 692 2009-07-25 README.config drwxr-xr-x 2 root root 4096 10-06 12:06 resource.d -rw-r--r-- 1 root root 7862 2009-07-25 shellfuncs [root@a ha.d]# scp ha.cf authkeys haresources b.abc.com:/etc/ha.d/ ha.cf 100% 10KB 10.3KB/s 00:00 authkeys 100% 665 0.7KB/s 00:00 haresources 100% 6003 5.9KB/s 00:00 [root@a ha.d]# scp resource.d/killnfsd b.abc.com:/etc/ha.d/resource.d/ killnfsd 100% 51 0.1KB/s 00:00 [root@a ha.d]# service heartbeat restart Stopping High-Availability services: [确定] Waiting to allow resource takeover to complete: [确定] Starting High-Availability services: 2012/10/06_12:14:54 INFO: Resource is stopped [确定] [root@a ha.d]# ssh b.abc.com 'service heartbeat restart' Stopping High-Availability services: [确定] Waiting to allow resource takeover to complete: [确定] Starting High-Availability services: 2012/10/06_12:15:31 INFO: Resource is stopped [确定][root@a ha.d]# drbdadm -- --overwrite-data-of-peer primary web
[root@a ha.d]# service drbd status drbd driver loaded OK; device status: version: 8.3.8 (api:88/proto:86-94) GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16 m:res cs ro ds p mounted fstype 0:web Connected Primary/Secondary UpToDate/UpToDate C [root@a ha.d]# mount /dev/drbd0 /web [root@a ha.d]# mount /dev/sda2 on / type ext3 (rw) proc on /proc type proc (rw) sysfs on /sys type sysfs (rw) devpts on /dev/pts type devpts (rw,gid=5,mode=620) /dev/sda1 on /boot type ext3 (rw) tmpfs on /dev/shm type tmpfs (rw) none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw) sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw) /dev/hdc on /mnt/cdrom type iso9660 (ro) nfsd on /proc/fs/nfsd type nfsd (rw) /dev/drbd0 on /web type ext3 (rw)test测试机上进行的配置
[root@love ~]# mkdir /mnt/nfs
[root@love ~]#mount 192.168.10.101:/web /mnt/nfs
[root@love ~]# cd /mnt/nfs
[root@love nfs]# vim /mnt/test.sh
while true
do echo ----\>trying touch x:`date` touch x echo \<-----done touch x:`date` echo sleep 2 done[root@love nfs]# bash /mnt/test.sh #在当前目录下执行每隔两秒创建 文件“x” 的脚本
---->trying touch x:2012年 10月 06日 星期六 03:02:38 CST
<-----done touch x:2012年 10月 06日 星期六 03:02:39 CST---->trying touch x:2012年 10月 06日 星期六 03:02:41 CST touch: 无法触碰 “x”: Stale NFS file handle <-----done touch x:2012年 10月 06日 星期六 03:02:41 CST
#关闭a.abc.com 主机上的heartbeat服务。终端一次,立刻恢复正常
---->trying touch x:2012年 10月 06日 星期六 03:02:52 CST
<-----done touch x:2012年 10月 06日 星期六 03:02:52 CST