Procedure for Splitting and Hardening /dg File System

Printed: November 24, 1999

C:\SETH\SPECS\HARDWARE\DGSPLIT.WP

Overview

The Datagate application currently runs from the /dg File System, which is striped across eight disks, spanning two disk controllers in Tray 1 on the SSA, all of which are mirrored to a matching configuration in Tray 2. A number of factors indicate that separate file systems for the various functions of the Datagate application may improve performance.

The plan is to split the /dg file system into three file systems:

/dg                 2 disks
/dg/dghome/queue    5 disks
/dg/dghome/log      3 disks

The overview of the procedure follows:

Advance Steps - can be done before 11 pm

1. Apply the newest SSA patches on testdg: 105223-05, 105356-09; reboot testdg

Preparation - At 11 pm Monday

1. Detach first tray from /dg file system

2. Add new disks to tray 1 of SSA

3. Prepare new file systems in tray 1, mount on /a, /a/dghome/log, /a/dghome/queue

4. Copy /dg file system to new file system space

Implementation

1. Change /etc/vfstab on testdg to include /dg, /dg/dghome/log, /dg/dghome/queue

2. Change ha.env for TAKEOVER_MOUNTS: /dg, /dg/dghome/log, /dg/dghome/queue

3. Shut down datagate running on proddg

4. Copy queues and work directory to new file systems

5. Unmount new file systems: /a, /a/dghome/log, /a/dghome/queue

6. Idle HA - this will release SSA diskset

7. Download firmware into SSA from testdg

8. Bring up Production using Failover procedure

9. Verify functionality

10. Add new disks to tray 2 of SSA - without downtime. Resync disks.

Failback

1. Change /etc/vfstab on proddg

2. Change ha.env for PRIMARY_MOUNTS: /dg, /dg/dghome/log, /dg/dghome/queue

3. Apply SSA patches on proddg: 105223-05, 105356-09; Reboot proddg

4. Shut down production on testdg

5. Failback

Follow-up

1. Attach mirror (second tray) to all components

2. Follow-up later to confirm re-syncing (4 hours)
 
 

Advance Steps - 10:30 p.m.

1 Install patches on testdg: 105223-05 and 105356-09
2
3 Reboot testdg
4  
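After the reboot it may help to confirm that both patch revisions are actually installed. The function below is a sketch, not part of the original procedure: patches_installed is a hypothetical name, it assumes the Solaris `showrev -p` line format ("Patch: 105223-05 Obsoletes: ..."), and it needs a POSIX shell (ksh on Solaris rather than the Bourne /bin/sh). It matches exact revisions only, so a newer revision such as -06 is reported as missing.

```sh
# Sketch: confirm that specific patch revisions appear in `showrev -p` output.
# Assumes each installed patch is listed on a line starting "Patch: <id>-<rev>".
# Reads showrev -p output on stdin; returns 0 only if every requested patch is found.
patches_installed() {
    showrev_out=$(cat)          # keep stdin so it can be searched once per patch
    missing=0
    for id in "$@"; do
        if printf '%s\n' "$showrev_out" |
            awk -v id="$id" '$1 == "Patch:" && $2 == id { found = 1 } END { exit (found ? 0 : 1) }'
        then
            echo "installed: $id"
        else
            echo "MISSING:   $id"
            missing=1
        fi
    done
    return $missing
}

# On testdg after the reboot:
# showrev -p | patches_installed 105223-05 105356-09
```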
 
 

PRE-IMPLEMENTATION STEPS - 11:00 pm

5 Verify the current state of the mirror on proddg:

metastat -s ssa1 | more

If any part of the mirror is not in the Okay state, there is a problem: STOP until it is resolved.
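The visual check of the metastat output can be backed by a small filter. This is a sketch under stated assumptions: mirror_states_ok is a hypothetical function, it relies on DiskSuite printing mirror and submirror status as "State: Okay" (or "Needs maintenance", "Resyncing", and so on), it does not inspect the per-component rows of stripe tables, and it uses awk's sub(), so on Solaris change awk to nawk.

```sh
# Sketch: check every "State:" line in metastat output and flag any that is not Okay.
# Component rows inside stripe tables are NOT checked by this filter.
# Reads metastat output on stdin; returns 0 only if at least one State: line
# was seen and all of them say Okay.
mirror_states_ok() {
    awk '
        /State:/ {
            seen = 1
            state = $0
            sub(/^.*State:[ \t]*/, "", state)   # keep only the text after "State:"
            sub(/[ \t]+$/, "", state)           # drop trailing blanks
            if (state != "Okay") { print "NOT OK: " $0; bad = 1 }
        }
        END { exit ((bad || !seen) ? 1 : 0) }
    '
}

# On proddg:
# metastat -s ssa1 | mirror_states_ok || echo "STOP: mirror is not healthy"
```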

6 Note: Disks will be added to tray 1 without preserving the half-mirror on tray 1, since that half-mirror will be destroyed shortly in any case.

7 Break the Mirror on the SSA (on proddg):

metadetach -s ssa1 d66 d24

Note! This unmirrors the SSA without recording the changes that would need to be applied to the detached half-mirror. If data is written in this state, the detached half-mirror is no longer valid.

8 Verify the current state of the mirror on proddg:

metastat -s ssa1 | more

Save the configuration, in md.tab format, to a file:

cd /etc/opt/SUNWmd

metastat -s ssa1 -p > md.tab.halfmirror

9 Verify that tray 1 is no longer in use:

iostat 5 5 

I/O on 1/md24 should be 0
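Reading `iostat 5 5` by eye works, but the extended form `iostat -x 5 5` prints one device per row, which is easier to filter. The sketch below assumes that layout (device name, then r/s, then w/s). md_io_rate is a hypothetical helper, and the device name must be passed exactly as iostat prints it. The same check fits steps 11B and 43B, where the rate for md42 should be non-zero. It needs a POSIX shell (ksh on Solaris).

```sh
# Sketch: print reads/s + writes/s for one device from the LAST report in
# `iostat -x` output (each later report overwrites the earlier value, so the
# since-boot first report does not count). Reads iostat output on stdin.
# Exits 2 if the device never appears.
md_io_rate() {
    awk -v dev="$1" '
        $1 == dev { found = 1; rate = $2 + $3 }   # column 2 = r/s, column 3 = w/s
        END {
            if (!found) exit 2
            print rate
        }
    '
}

# Step 9 (expect 0) and steps 11B/43B (expect non-zero for md42):
# iostat -x 5 5 | md_io_rate md24
```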

10 Delete the old d24:

metaclear -s ssa1 d24

11 Shut down disks in tray 1

ssaadm stop -t 1 c3

11B Verify that tray 2 is functioning

iostat 5 5 

I/O on 1/md42 should be non-0

12 Wait for the light on tray 1 to go off, then add the new disks to tray 1
13 Start Disk Access to tray 1

ssaadm start -t 1 c3

14 Enable new disks - on testdg

drvconfig

disks

Verify that new disks are visible:

ls /dev/rdsk/c3t*d4s0

/dev/rdsk/c3t0d4s0 /dev/rdsk/c3t1d4s0 /dev/rdsk/c3t4d4s0
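Instead of comparing the `ls` output by eye, each expected node can be tested directly. This is a sketch: check_dev_nodes is a hypothetical name, and the device directory is a parameter only so the check can be tried against a scratch directory. The same call applies in steps 15, 47 and 53 with the node lists shown there. It needs a POSIX shell (ksh on Solaris).

```sh
# Sketch: report any expected raw disk node that does not exist.
# Usage: check_dev_nodes DIR NODE...   (returns 1 if any node is missing)
check_dev_nodes() {
    dir=$1; shift
    missing=0
    for node in "$@"; do
        # -e follows the /dev/rdsk symlinks into /devices; a dangling link counts as missing
        if [ ! -e "$dir/$node" ]; then
            echo "missing: $dir/$node"
            missing=1
        fi
    done
    return $missing
}

# On testdg and proddg after drvconfig and disks:
# check_dev_nodes /dev/rdsk c3t0d4s0 c3t1d4s0 c3t4d4s0
```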

15 Enable new disks - on proddg

drvconfig

disks

Verify that new disks are visible:

ls /dev/rdsk/c3t*d4s0

/dev/rdsk/c3t0d4s0 /dev/rdsk/c3t1d4s0 /dev/rdsk/c3t4d4s0

16 Add new disks to metaset ssa1 (from proddg):

metaset -s ssa1 -a c3t0d4 c3t1d4 

Verify:

metaset -s ssa1 | more

17 Create 3 slices on the disks in tray 1:

slice 7: cylinders 0-1; slice 6: 64 MB; slice 0: balance of disk

This can be done with the script ~/reformat

Use format to verify the new partitions. If they are not correct, adjust vtoc.ssa and run the following command, which writes vtoc.ssa to each disk in turn (fmthard takes one raw device per invocation):

for d in c3t0d0 c3t0d1 c3t0d2 c3t0d3 c3t0d4 c3t1d0 c3t1d1 c3t1d2 c3t1d3 c3t1d4; do fmthard -s vtoc.ssa /dev/rdsk/${d}s2; done
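Instead of hand-editing vtoc.ssa, the layout can be copied from a disk that format already shows as correct, using the common `prtvtoc | fmthard -s -` idiom (fmthard reads the VTOC from standard input when the data file is `-`). This is a sketch that assumes every target disk has the same geometry as the source; copy_vtoc is a hypothetical name, and it needs a POSIX shell (ksh on Solaris).

```sh
# Sketch: copy the partition table of one disk to each of the other disks.
# Usage: copy_vtoc SOURCE_DISK TARGET_DISK...   (disk names like c3t0d0)
# Stops at the first fmthard failure.
copy_vtoc() {
    src=$1; shift
    for disk in "$@"; do
        echo "writing VTOC of $src to $disk"
        prtvtoc "/dev/rdsk/${src}s2" | fmthard -s - "/dev/rdsk/${disk}s2" || return 1
    done
}

# Example, once c3t0d0 has been verified with format:
# copy_vtoc c3t0d0 c3t0d1 c3t0d2 c3t0d3 c3t0d4 c3t1d0 c3t1d1 c3t1d2 c3t1d3 c3t1d4
```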

STEPS TO CREATE NEW FILE SYSTEMS - 11:10 pm
18 Create metadevices for new /dg file system:

./init_new_dg

If the above does not succeed, the following commands are used:

metainit -s ssa1 d23 1 2 c3t0d0s0 c3t0d1s0 -i 64k (2-column stripe)

metainit -s ssa1 d55 -m d23 (one-way mirror of the stripe)

metainit -s ssa1 d50 1 1 c3t1d2s6 (1-column stripe for trans device)

metainit -s ssa1 d54 -m d50 (one-way mirror for trans device)

metainit -s ssa1 d56 -t d55 d54 (Attach trans device to data device)
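The contents of ./init_new_dg are not given here. As a hypothetical sketch only, a script of that kind might run the manual metainit commands above in order and stop at the first failure, so a partially built device is noticed immediately. init_new_dg as written below and the RUN dry-run variable are illustrative names, not the real script; it needs a POSIX shell (ksh on Solaris). The log and queue scripts would follow the same pattern with the commands from steps 19 and 20.

```sh
# Hypothetical sketch of a script like ./init_new_dg: run the step 18 metainit
# commands in order, stopping at the first failure.
# Set RUN=echo to print the commands instead of running them (dry run).
init_new_dg() {
    set -- \
        "d23 1 2 c3t0d0s0 c3t0d1s0 -i 64k" \
        "d55 -m d23" \
        "d50 1 1 c3t1d2s6" \
        "d54 -m d50" \
        "d56 -t d55 d54"
    for args in "$@"; do
        # $args is deliberately unquoted so it splits into separate arguments
        $RUN metainit -s ssa1 $args || { echo "metainit failed: $args" >&2; return 1; }
    done
}

# Dry run first, then the real thing on proddg:
# RUN=echo; init_new_dg
# RUN=;     init_new_dg
```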

19 Create new /dg/dghome/log file system:

./init_new_log

If the above does not succeed, it can be done manually:

metainit -s ssa1 d13 1 3 c3t1d2s0 c3t1d3s0 c3t1d4s0 -i 64k (3-column stripe)

metainit -s ssa1 d44 -m d13 (one-way mirror of the stripe)

metainit -s ssa1 d40 1 1 c3t0d0s6 (1-column stripe for trans device)

metainit -s ssa1 d43 -m d40 (one-way mirror for trans device)

metainit -s ssa1 d45 -t d44 d43 (Attach trans device to data device)

20 Create new /dg/dghome/queue file system:

./init_new_queue

If the above does not succeed, it can be done manually:

metainit -s ssa1 d26 1 5 c3t0d2s0 c3t1d0s0 c3t0d3s0 c3t1d1s0 c3t0d4s0 -i 8k (5-column stripe)

metainit -s ssa1 d88 -m d26 (one-way mirror of the stripe)

metainit -s ssa1 d80 1 1 c3t0d1s6 (1-column stripe for trans device)

metainit -s ssa1 d87 -m d80 (Create 1-way mirror for trans device)

metainit -s ssa1 d89 -t d88 d87 (Attach trans device to data device)

20B Verify the new metadevices:

metastat -s ssa1 | more

21 Build the file systems. This can be done in parallel, using 3 separate windows or by running the commands in the background:

newfs -m 1 /dev/md/ssa1/rdsk/d56 > /tmp/d56.out 2>&1 & (Build new /dg file system)

newfs -m 1 /dev/md/ssa1/rdsk/d45 > /tmp/d45.out 2>&1 & (Build new log file system)

newfs -m 1 /dev/md/ssa1/rdsk/d89 > /tmp/d89.out 2>&1 & (Build new queue file system)

Follow-up note: newfs will prompt for confirmation if run at the keyboard, even in the background. If the above commands are in a script, there will be no prompting. To respond to a prompt, use "fg" (foreground); to put the job in the background again, use ^Z and then "bg" (background).
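When the three newfs runs are started from a script, it is easy to miss one that failed. The sketch below starts all three in the background, waits for each by process ID, and reports the result per device. It assumes, as the note above says, that newfs does not prompt when run from a script; build_all and the NEWFS override (for dry runs) are hypothetical names, and it needs a POSIX shell (ksh on Solaris).

```sh
# Sketch: build the three new file systems in parallel and report each result.
# Each newfs writes stdout and stderr to /tmp/<device>.out.
# NEWFS can be set to another command (e.g. true) for a dry run; default is newfs.
build_all() {
    pids=""
    for dev in d56 d45 d89; do
        ${NEWFS:-newfs} -m 1 "/dev/md/ssa1/rdsk/$dev" > "/tmp/$dev.out" 2>&1 &
        pids="$pids $!:$dev"           # remember "pid:device" for each job
    done
    status=0
    for entry in $pids; do
        pid=${entry%%:*}; dev=${entry#*:}
        if wait "$pid"; then
            echo "$dev: ok"
        else
            echo "$dev: FAILED, see /tmp/$dev.out"
            status=1
        fi
    done
    return $status
}

# On proddg:
# build_all
```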

22 Mount the new file systems:

mount /dev/md/ssa1/dsk/d56 /a (Temporarily mount new file system)

mkdir /a/dghome (Create dghome directory)

mkdir /a/dghome/log (Create datagate log directory)

mkdir /a/dghome/queue (Create datagate queue directory)

mount /dev/md/ssa1/dsk/d45 /a/dghome/log (Temporarily mount new log file system)

mount /dev/md/ssa1/dsk/d89 /a/dghome/queue (Temporarily mount new queue file system)

23 Copy /dg file system to new file system space:

cd /a (Change dir to the target directory)

ufsdump 0f - /dg | ufsrestore xf - (Dump/restore through a pipe)

24 Delete copied queues and logs:

\rm -rf /a/dghome/log/*

\rm -rf /a/dghome/queue/*

25 Verify that the file system contents are complete
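One quick completeness check is to count the entries in both trees. This sketch ignores lost+found (created by newfs) and the contents of dghome/log and dghome/queue, which were deleted in step 24 and are copied again later. Equal counts do not prove an exact copy, but a mismatch shows that something was missed. compare_trees and count_entries are hypothetical names; the functions need a POSIX shell (ksh on Solaris).

```sh
# Sketch: count entries under a directory, excluding lost+found and the
# contents of dghome/log and dghome/queue (the directories themselves are kept).
count_entries() {
    (cd "$1" && find . -print) |
        grep -v 'lost+found' |
        grep -v '^\./dghome/log/' |
        grep -v '^\./dghome/queue/' |
        wc -l | tr -d ' '
}

# Sketch: print both counts; return 0 only if they match.
compare_trees() {
    src=$(count_entries "$1")
    dst=$(count_entries "$2")
    echo "$1: $src entries, $2: $dst entries"
    [ "$src" -eq "$dst" ]
}

# compare_trees /dg /a
```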
 

Implementation - 11:30 p.m.

26 Freeze and lock High Availability on testdg, to ensure we don't fail over too soon:

(Follow-up Note: Since we just rebooted testdg, no action was needed for this step)

HAmon 

(M)anage

(F)reeze 

(a) testdg

(L)ock

(a) testdg

27 Change /etc/vfstab on testdg to include /dg, /dg/dghome/log, /dg/dghome/queue
28 On testdg, change ha.env for TAKEOVER_MOUNTS for /dg, /dg/dghome/log, /dg/dghome/queue
29 Change backup script on testdg to include /dg/dghome/log and /dg/dghome/queue
30 Stop Cron on proddg:

/opt/VRTSfw/bin/startup.d/S99_MMC_160_start_stop_cron stop

31 Shut down datagate on proddg - RECORD TIME
32 Install ext_prog:

Log in as dg36

rm /dg/3.6/sparc-solaris2.6/bin/ext_prog

cd src/dgext

make install

Verify that /dg/3.6/sparc-solaris2.6/bin/ext_prog has current date

Copy to the new /a file system:

cp /dg/3.6/sparc-solaris2.6/bin/ext_prog /a/3.6/sparc-solaris2.6/bin

33 Wait for all datagate processes to stop
34 Copy queues and work directory to new file systems:

(Follow-up note: During implementation, the log directory was moved after failover, not here)

cp -rp /dg/dghome/log/* /a/dghome/log

cp -rp /dg/dghome/queue/* /a/dghome/queue

Check for new files in work directories:

ls -ltr /dg/dghome/work

ls -ltr /dg/dghome/tables/lwmra/work

ls -ltr /dg/dghome/tables/lwtmsrpt/work

If any files have changed since 10pm, copy them, for example:

cp /dg/dghome/tables/lwmra/work/* /a/dghome/tables/lwmra/work
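Rather than scanning `ls -ltr` listings, the files changed since 10 pm can be listed with find and a reference file. This is a sketch: changed_since is a hypothetical name, the time stamp uses the touch -t format [[CC]YY]MMDDhhmm, and the example stamp below (10 pm on November 22, 1999) is only an illustration to be replaced with the actual night of the change. It needs a POSIX shell (ksh on Solaris).

```sh
# Sketch: list regular files modified after a given time.
# Usage: changed_since STAMP DIR...   (STAMP in touch -t format)
changed_since() {
    stamp=$1; shift
    ref=/tmp/changed_since.$$                   # temporary reference file
    touch -t "$stamp" "$ref" || return 1        # give it the requested mtime
    find "$@" -type f -newer "$ref" -print      # files strictly newer than the reference
    rm -f "$ref"
}

# Example (illustrative stamp):
# changed_since 199911222200 /dg/dghome/work /dg/dghome/tables/lwmra/work /dg/dghome/tables/lwtmsrpt/work
```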

VERIFY THAT DIRECTORY CONTENTS MATCH PRODUCTION DIRECTORY
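The comparison can be done with `diff -r`, which reports files that differ or exist on only one side; no output for a directory means its copy matches. This sketch loops over the copied directories; verify_copies is a hypothetical name, the log directory is left out because it was moved after failover, and an "Only in" line for lost+found on the new file systems is expected. It needs a POSIX shell (ksh on Solaris).

```sh
# Sketch: recursively compare each directory under SRC_ROOT with its copy
# under DST_ROOT. Usage: verify_copies SRC_ROOT DST_ROOT RELATIVE_DIR...
# Returns 1 if any pair differs.
verify_copies() {
    src_root=$1; dst_root=$2; shift 2
    status=0
    for d in "$@"; do
        diff -r "$src_root/$d" "$dst_root/$d" || status=1
    done
    return $status
}

# verify_copies /dg /a dghome/queue dghome/work dghome/tables/lwmra/work dghome/tables/lwtmsrpt/work
```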
35 Unmount new file systems (in bottom-up order):

umount /a/dghome/log /a/dghome/queue /a

36 IDLE High Availability on proddg (unmount /dg and release SSA diskset)

HAmon

Manage

Idle

Proddg

37 Download the SSA firmware (installed by the SSA patches) into the SSA, from testdg:

ssaadm download -f /usr/lib/firmware/ssa/ssafirmware c3

Reset the SSA using the reset ("Sys OK") button

Another command to accomplish the download is:

luxadm download -f /usr/lib/firmware/ssa/ssafirmware c3 

38 Verify that disks can be accessed
39 Verify that HA is ready to take over on testdg

Fail over production to testdg using the HA failover procedure

40 Verify that production is now running on testdg - RECORD TIME
40B Mount the old /dg file system:

mount /dev/md/ssa1/dsk/d66 /a

 

Move the logs written before failover from the old file system to the new log file system:

cd /a/dghome/log

mkdir Nov-23

mv *.* Nov-23

mv * /dg/dghome/log &

The last command runs in the background and can be checked for completion later.
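Checking for the background move with `ps | grep mv` can match unrelated processes. A sketch of an alternative is to record the process ID when the move starts and test it later with `kill -0`, which sends no signal and only reports whether the process still exists. start_bg, bg_done and the PID file path are hypothetical names; the functions need a POSIX shell (ksh on Solaris).

```sh
# Sketch: run a command in the background and save its process ID to a file.
# Usage: start_bg PIDFILE COMMAND [ARGS...]
start_bg() {
    pidfile=$1; shift
    "$@" &
    echo $! > "$pidfile"
}

# Sketch: return 0 once the recorded process no longer exists.
bg_done() {
    # kill -0 delivers no signal; it fails when the process is gone
    ! kill -0 "$(cat "$1")" 2>/dev/null
}

# cd /a/dghome/log && start_bg /tmp/mvlog.pid mv * /dg/dghome/log
# Later, in step 42B:
# bg_done /tmp/mvlog.pid && echo "log move finished"
```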

 

 

Steps to be done while Production is running on testdg - 11:45 p.m.

41 Install SSA patches on proddg - 105223-05, 105356-09
42
42B Verify that the move of the previous logs (from the old file system to the new file system) is done:

ps -eaf | grep '[m]v '

ls -l /a/dghome/log

total 0

43 Shut down disks in tray 2

ssaadm stop -t 2 c3

(This was moved to before the reboot, because step 45 can happen during reboot)

43B Verify on testdg that tray 2 is functioning

iostat 5 5 

I/O on 1/md42 should be non-0

44 Reboot proddg
45 Wait for the light on the second tray to go off, then add the new disks to the second tray
46 Start Disk Access to tray 2

ssaadm start -t 2 c3

46B Enable Fast-Writes on new SSA disks:

ssaadm fast_write -s -e c3

Verify:

ssaadm display c3

47 Enable new disks - on proddg

drvconfig

disks

Verify that new disks are visible:

ls /dev/rdsk/c3t*d4s0

/dev/rdsk/c3t0d4s0 /dev/rdsk/c3t1d4s0 /dev/rdsk/c3t2d4s0 /dev/rdsk/c3t3d4s0 /dev/rdsk/c3t4d4s0

 

 

Failback - 12:00 am

48A  
48B Change /etc/vfstab on proddg to include /dg, /dg/dghome/log, /dg/dghome/queue
49 Change ha.env for PRIMARY_MOUNTS: /dg, /dg/dghome/log, /dg/dghome/queue
50 Change Backup script on proddg to include /dg/dghome/log and /dg/dghome/queue
51 Shut down Datagate running on testdg
52 Failback Production onto proddg
 

 

Follow-Up

53 Enable new disks - on testdg

drvconfig

disks

Verify that new disks are visible:

ls /dev/rdsk/c3t*d4s0

/dev/rdsk/c3t0d4s0 /dev/rdsk/c3t1d4s0 /dev/rdsk/c3t2d4s0 /dev/rdsk/c3t3d4s0 /dev/rdsk/c3t4d4s0

54 On proddg, add new disks to metaset: 

metaset -s ssa1 -a c3t2d4 c3t3d4

55 Create the 64 MB slice 6 on the disks in tray 2:

./reformat2

56 Use format to verify the new partitions. If they are not correct, adjust vtoc.ssa and run the following command, which writes vtoc.ssa to each disk in turn (fmthard takes one raw device per invocation):

for d in c3t2d0 c3t2d1 c3t2d2 c3t2d3 c3t2d4 c3t3d0 c3t3d1 c3t3d2 c3t3d3 c3t3d4 c3t4d4 c3t5d0; do fmthard -s vtoc.ssa /dev/rdsk/${d}s2; done

57 Create and attach mirror (second tray) to all components

./init_mirror

58 If the above reports errors, create the metadevices as needed manually:

metainit -s ssa1 d32 1 2 c3t2d0s0 c3t2d1s0 -i 64k

metainit -s ssa1 d31 1 3 c3t3d2s0 c3t3d3s0 c3t3d4s0 -i 64k

metainit -s ssa1 d62 1 5 c3t2d2s0 c3t3d0s0 c3t2d3s0 c3t3d1s0 c3t2d4s0 -i 8k

metainit -s ssa1 d41 1 1 c3t2d0s6 (1-column stripe for trans device)

metainit -s ssa1 d51 1 1 c3t3d2s6 (1-column stripe for trans device)

metainit -s ssa1 d81 1 1 c3t2d1s6 (1-column stripe for trans device)

59 Attach the new metadevices to the half-mirrors created earlier

metattach -s ssa1 d43 d41

metattach -s ssa1 d54 d51

metattach -s ssa1 d87 d81

metattach -s ssa1 d44 d31

metattach -s ssa1 d55 d32

metattach -s ssa1 d88 d62

60 Verify that all the mirrors are resyncing:

metastat -s ssa1 | more

61 Follow-up later to confirm re-syncing (4 hours)
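The follow-up check can also be automated by polling metastat until no resync remains. This is a sketch under assumptions: resync_pending is a hypothetical name, and it assumes DiskSuite reports an active resync with lines such as "State: Resyncing" or "Resync in progress: 15 % done". It needs a POSIX shell (ksh on Solaris), and the 15-minute interval is an arbitrary choice.

```sh
# Sketch: return 0 (and print the matching lines) while metastat output still
# shows a resync in progress; return 1 when none is found. Reads stdin.
resync_pending() {
    awk '/Resyncing|Resync in progress/ { print; found = 1 }
         END { exit (found ? 0 : 1) }'
}

# On proddg, poll every 15 minutes until all mirrors have finished resyncing:
# while metastat -s ssa1 | resync_pending; do sleep 900; done; echo "resync finished"
```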

Note! This procedure does not include creating hot spare pools or associating them with the devices. That can be done on the fly at a later time, using metatool -s ssa1.