Printed: November 24, 1999
C:\SETH\SPECS\HARDWARE\DGSPLIT.WP
Overview
The Datagate application currently runs from the /dg File System, which is striped across eight disks, spanning two disk controllers in Tray 1 on the SSA, all of which are mirrored to a matching configuration in Tray 2. A number of factors indicate that separate file systems for the various functions of the Datagate application may improve performance.
The plan is to split the /dg file system into three file systems:
/dg              2 disks
/dg/dghome/queue 5 disks
/dg/dghome/log   3 disks
The overview of the procedure follows:
Advance Steps - can be done before 11 pm
1. Apply newest SSA patches on testdg: 105223-05, 105356-09; reboot testdg
Preparation - At 11 pm Monday
1. Detach first tray from /dg file system
2. Add new disks to tray 1 of SSA
3. Prepare new file systems in tray 1, mount on /a, /a/dghome/log, /a/dghome/queue
4. Copy /dg file system to new file system space
Implementation
1. Change /etc/vfstab on testdg to include /dg, /dg/dghome/log, /dg/dghome/queue
2. Change ha.env for TAKEOVER_MOUNTS:/dg, /dg/dghome/log, /dg/dghome/queue
3. Shut down datagate running on proddg
4. Copy queues and work directory to new file systems
5. Unmount new file systems: /a, /a/dghome/log, /a/dghome/queue
6. Idle HA - this will release SSA diskset
7. Download firmware into SSA from testdg
8. Bring up Production using Failover procedure
9. Verify functionality
10. Add new disks to tray 2 of SSA - without downtime. Resync disks.
Failback
1. Change /etc/vfstab on proddg
2. Change ha.env for PRIMARY_MOUNTS: /dg, /dg/dghome/log, /dg/dghome/queue
3. Apply SSA patches on proddg: 105223-05, 105356-09; Reboot proddg
4. Shut down production on testdg
5. Failback
Follow-up
1. Attach mirror (second tray) to all components
2. Follow-up later to confirm re-syncing (4 hours)
Advance Steps - 10:30 p.m. |

1 | Install patches on testdg: 105223-05 and 105356-09 |
3 | Reboot testdg |

PRE-IMPLEMENTATION STEPS - 11:00 pm |
5 | Verify the current state of the mirror on proddg:
metastat -s ssa1 | more
If any part of the mirror is not OK, there is a problem - STOP until it is resolved |
6 | Note - Disks will be added to tray 1 without preserving the half-mirror on tray 1, since that half-mirror will shortly be destroyed in any case. |
7 | Break the mirror on the SSA (on proddg):
metadetach -s ssa1 d66 d24
Note! This unmirrors the SSA without saving changes that would need to be applied to the detached half-mirror. If data is written in this state, the half-mirror is no longer valid. |
8 | Verify the current state of the mirror on proddg:
metastat -s ssa1 | more
Save md.tab info in a file:
cd /etc/opt/SUNWmd
metastat -s ssa1 -p > md.tab.halfmirror |
9 | Verify that tray 1 is no longer in use:
iostat 5 5
I/O on 1/md24 should be 0 |
10 | Delete the old d24:
metaclear -s ssa1 d24 |
11 | Shutdown disks in tray 1
ssaadm stop -t 1 c3 |
11B | Verify that tray 2 is functioning:
iostat 5 5
I/O on 1/md42 should be non-0 |
12 | Wait for light to go off on tray 1. Add new disks to tray 1 |
13 | Start Disk Access to tray 1
ssaadm start -t 1 c3 |
14 | Enable new disks - on testdg:
drvconfig; disks
Verify that new disks are visible:
ls /dev/rdsk/c3t*d4s0
/dev/rdsk/c3t0d4s0 /dev/rdsk/c3t1d4s0 /dev/rdsk/c3t4d4s0 |
15 | Enable new disks - on proddg:
drvconfig; disks
Verify that new disks are visible:
ls /dev/rdsk/c3t*d4s0
/dev/rdsk/c3t0d4s0 /dev/rdsk/c3t1d4s0 /dev/rdsk/c3t4d4s0 |
16 | Add new disks to metaset ssa1 (from proddg):
metaset -s ssa1 -a c3t0d4 c3t1d4
Verify: metaset -s ssa1 | more |
17 | Create 3 slices on each disk in tray 1:
slice 7: cylinders 0-1
slice 6: 64MB
slice 0: balance of disk
This can be done with the script ~/reformat
Use format to verify the new partitions. If they are
not correct, adjust vtoc.ssa and apply it to each disk:
for d in c3t0d0 c3t0d1 c3t0d2 c3t0d3 c3t0d4 c3t1d0 c3t1d1 c3t1d2 c3t1d3 c3t1d4; do
fmthard -s vtoc.ssa /dev/rdsk/${d}s2
done |
|
STEPS TO CREATE NEW FILE SYSTEMS - 11:10 pm | |
18 | Create metadevices for new /dg file system:
./init_new_dg |
If the above does not succeed, the following
commands are used:
metainit -s ssa1 d23 1 2 c3t0d0s0 c3t0d1s0 -i 64k   (2-column stripe)
metainit -s ssa1 d55 -m d23   (one-way mirror of the stripe)
metainit -s ssa1 d50 1 1 c3t1d2s6   (1-column stripe for trans device)
metainit -s ssa1 d54 -m d50   (one-way mirror for trans device)
metainit -s ssa1 d56 -t d55 d54   (attach trans device to data device) |
|
19 | Create new /dg/dghome/log file system:
./init_new_log |
If the above does not succeed, it can be done
manually:
metainit -s ssa1 d13 1 3 c3t1d2s0 c3t1d3s0 c3t1d4s0 -i 64k   (3-column stripe)
metainit -s ssa1 d44 -m d13   (one-way mirror of the stripe)
metainit -s ssa1 d40 1 1 c3t0d0s6   (1-column stripe for trans device)
metainit -s ssa1 d43 -m d40   (one-way mirror for trans device)
metainit -s ssa1 d45 -t d44 d43   (attach trans device to data device) |
|
20 | Create new /dg/dghome/queue file system:
./init_new_queue |
If the above does not succeed, it can be done
manually:
metainit -s ssa1 d26 1 5 c3t0d2s0 c3t1d0s0 c3t0d3s0 c3t1d1s0 c3t0d4s0 -i 8k   (5-column stripe)
metainit -s ssa1 d88 -m d26   (one-way mirror of the stripe)
metainit -s ssa1 d80 1 1 c3t0d1s6   (1-column stripe for trans device)
metainit -s ssa1 d87 -m d80   (one-way mirror for trans device)
metainit -s ssa1 d89 -t d88 d87   (attach trans device to data device) |
|
20B | Verify new file systems:
metastat -s ssa1 | more |
21 | Build file systems. This can be done in parallel,
using 3 separate windows or background jobs:
newfs -m 1 /dev/md/ssa1/rdsk/d56 > /tmp/d56.out 2>&1 &   (build new /dg file system)
newfs -m 1 /dev/md/ssa1/rdsk/d45 > /tmp/d45.out 2>&1 &   (build new log file system)
newfs -m 1 /dev/md/ssa1/rdsk/d89 > /tmp/d89.out 2>&1 &   (build new queue file system)
Follow-up note: newfs will prompt for confirmation if run at the keyboard, even in the background. If the above commands are run from a script, there is no prompting. To respond to a prompt, bring the job forward with 'fg'; to put it in the background again, use ^Z and 'bg'. |
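A note on shell redirection order for the background newfs commands above: `cmd 2>&1 > file` duplicates stderr onto the terminal before stdout is redirected, so errors never reach the log file; `cmd > file 2>&1` captures both streams. A minimal demonstration with a stand-in command (not newfs itself):

```shell
#!/bin/sh
# emit writes one line to stdout and one to stderr.
emit() {
    echo "out"          # normal output
    echo "err" >&2      # error output
}

emit 2>&1 > /tmp/wrong.out    # stderr -> terminal; only stdout reaches the file
emit > /tmp/right.out 2>&1    # stdout -> file first, then stderr joins it

grep -c err /tmp/right.out    # prints 1: the file captured stderr
```

With the wrong order, /tmp/wrong.out contains only "out", and any newfs error messages would scroll by on the console instead of landing in the log.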
22 | Mount the new file systems:
mount /dev/md/ssa1/dsk/d56 /a   (temporarily mount new /dg file system)
mkdir /a/dghome                 (create dghome directory)
mkdir /a/dghome/log             (create datagate log directory)
mkdir /a/dghome/queue           (create datagate queue directory)
mount /dev/md/ssa1/dsk/d45 /a/dghome/log    (temporarily mount new log file system)
mount /dev/md/ssa1/dsk/d89 /a/dghome/queue  (temporarily mount new queue file system) |
23 | Copy /dg file system to new file system space:
cd /a                               (change dir to the target directory)
ufsdump 0f - /dg | ufsrestore xf -  (dump/restore through a pipe) |
24 | Delete copied queues and logs:
\rm -rf /a/dghome/log/*
\rm -rf /a/dghome/queue/* |
25 | Verify that the file system contents are complete |
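One quick, hedged way to sanity-check the copy in step 25 (an illustration, not the procedure's own verification method) is to compare regular-file counts between the source and target trees:

```shell
#!/bin/sh
# Count regular files in a tree; on the night this would compare /dg against /a.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Demonstration on throwaway directories (stand-ins for /dg and /a):
src=$(mktemp -d); dst=$(mktemp -d)
touch "$src/a" "$src/b"
cp "$src"/* "$dst"

[ "$(count_files "$src")" = "$(count_files "$dst")" ] && echo "file counts match"
rm -rf "$src" "$dst"
```

Matching counts do not prove the contents are identical, but a mismatch is an immediate STOP signal before the old file system is released.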
Implementation - 11:30 p.m. |
|
26 | Freeze + Lock High Availability on TESTDG - to ensure we don't fail over too soon:
(Follow-up note: since we had just rebooted testdg, no action was needed for this step)
HAmon
(M)anage
(F)reeze (a) testdg
(L)ock (a) testdg |
27 | Change /etc/vfstab on testdg to include /dg, /dg/dghome/log, /dg/dghome/queue |
28 | On testdg, change ha.env for TAKEOVER_MOUNTS for /dg, /dg/dghome/log, /dg/dghome/queue |
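For reference, the additions behind steps 27 and 28 would look roughly like the following. The metadevice names come from the steps above, but the vfstab field values (fsck pass, mount-at-boot, options) and the exact TAKEOVER_MOUNTS syntax in ha.env are assumptions, not copied from the live files:

```
# /etc/vfstab additions on testdg (field values assumed):
# device to mount        device to fsck          mount point       FS   pass boot options
/dev/md/ssa1/dsk/d56     /dev/md/ssa1/rdsk/d56   /dg               ufs  2    no   -
/dev/md/ssa1/dsk/d45     /dev/md/ssa1/rdsk/d45   /dg/dghome/log    ufs  2    no   -
/dev/md/ssa1/dsk/d89     /dev/md/ssa1/rdsk/d89   /dg/dghome/queue  ufs  2    no   -

# ha.env change (variable syntax assumed):
TAKEOVER_MOUNTS="/dg /dg/dghome/log /dg/dghome/queue"
```

Mount-at-boot is shown as "no" on the assumption that the HA software, not the boot sequence, mounts the takeover file systems.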
29 | Change backup script on testdg to include /dg/dghome/log and /dg/dghome/queue |
30 | Stop Cron on proddg:
/opt/VRTSfw/bin/startup.d/S99_MMC_160_start_stop_cron stop |
31 | Shut down datagate on proddg - RECORD TIME |
32 | Install ext_prog:
Log in as dg36
rm /dg/3.6/sparc-solaris2.6/bin/ext_prog
cd src/dgext
make install
Verify that /dg/3.6/sparc-solaris2.6/bin/ext_prog has the current date
Copy it to the new /a file system:
cp /dg/3.6/sparc-solaris2.6/bin/ext_prog /a/3.6/sparc-solaris2.6/bin |
33 | Wait for all datagate processes to stop |
34 | Copy queues and work directory to new file systems:
(Follow-up note: during implementation, the log directory was moved after failover, not here)
cp -rp /dg/dghome/log/* /a/dghome/log
cp -rp /dg/dghome/queue/* /a/dghome/queue
Check for new files in work directories:
ls -ltr /dg/dghome/work
ls -ltr /dg/dghome/tables/lwmra/work
ls -ltr /dg/dghome/tables/lwtmsrpt/work
If any files have changed since 10pm, copy them, for example:
cp /dg/dghome/tables/lwmra/work/* /a/dghome/tables/lwmra/work |
VERIFY THAT DIRECTORY CONTENTS MATCH PRODUCTION DIRECTORY | |
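The "changed since 10pm" check in step 34 can be expressed with `find -newer` against a reference timestamp file rather than eyeballing `ls -ltr`. A sketch with stand-in paths and an artificial cutoff, not the commands used on the night:

```shell
#!/bin/sh
# List files modified after a reference instant, as in the "changed since 10pm" check.
# The directory and cutoff here are throwaway stand-ins for /dg/dghome/.../work and 10 pm.
workdir=$(mktemp -d)
touch "$workdir/old.dat"       # exists before the cutoff
ref=$(mktemp)                  # reference file marks the cutoff instant
sleep 1
touch "$workdir/new.dat"       # modified after the cutoff

find "$workdir" -type f -newer "$ref" -print   # lists only new.dat
rm -rf "$workdir"; rm -f "$ref"
```

In the real procedure the reference file would be created (or `touch -t`-stamped) at 10 pm, and any paths the find prints would be recopied to the matching /a directory.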
35 | Unmount new file systems (in bottom-up order):
umount /a/dghome/log /a/dghome/queue /a |
36 | IDLE High Availability on proddg (unmount /dg
and release SSA diskset)
HAmon Manage Idle Proddg |
37 | Download firmware into the SSA - from testdg:
ssaadm download -f /usr/lib/firmware/ssa/ssafirmware c3
Reset the SSA using the reset ('Sys OK') button.
Another command that accomplishes the download is:
luxadm download -f /usr/lib/firmware/ssa/ssafirmware c3 |
38 | Verify that disks can be accessed |
39 | Verify that HA is ready to take over on testdg
Failover Production to testdg using HA Failover procedure |
40 | Verify that production is now running on testdg - RECORD TIME |
40B | Mount the old /dg file system:
mount /dev/md/ssa1/dsk/d66 /a
Move logs written to the old file system before failover into the new file system:
cd /a/dghome/log
mkdir Nov-23
mv *.* Nov-23
mv * /dg/dghome/log &
The last command runs in the background and can be checked for completion later. |
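The background mv above is later checked with `ps -eaf | grep mv` (step 42B). A more direct pattern, sketched here with a stand-in command, is to record the job's PID and `wait` on it:

```shell
#!/bin/sh
# Track a background job by PID instead of polling ps -eaf | grep mv.
# 'sleep 2' stands in for the long-running background mv.
sleep 2 &
mv_pid=$!            # PID of the job just backgrounded

# ... other steps of the procedure could run here ...

wait "$mv_pid"       # blocks until the background job exits
echo "background job finished with status $?"
```

`wait` also surfaces the job's exit status, so a failed mv is noticed rather than silently assumed complete.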
|
|
Steps to be done while Production is running on testdg - 11:45 p.m. |
41 | Install SSA patches on proddg - 105223-05, 105356-09 |
42B | Verify that the move of the previous logs (from
the old file system to the new file system) is done:
ps -eaf | grep mv
ls -l /a/dghome/log
Expected: total 0 |
43 | Shut down disks in tray 2:
ssaadm stop -t 2 c3
(This was moved to before the reboot, because step 45 can happen during the reboot) |
43B | Verify on testdg that tray 2 is functioning:
iostat 5 5
I/O on 1/md42 should be non-0 |
44 | Reboot proddg |
45 | Wait for light to go off on second tray. Add new disks to second tray |
46 | Start Disk Access to tray 2
ssaadm start -t 2 c3 |
46B | Enable fast writes on new SSA disks:
ssaadm fast_write -s -e c3
Verify: ssaadm display c3 |
47 | Enable new disks - on proddg:
drvconfig; disks
Verify that new disks are visible:
ls /dev/rdsk/c3t*d4s0
/dev/rdsk/c3t0d4s0 /dev/rdsk/c3t1d4s0 /dev/rdsk/c3t2d4s0 /dev/rdsk/c3t3d4s0 /dev/rdsk/c3t4d4s0 |
|
Failback - 12:00 am |
48B | Change /etc/vfstab on proddg |
49 | Change ha.env for PRIMARY_MOUNTS: /dg, /dg/dghome/log, /dg/dghome/queue |
50 | Change Backup script on proddg to include /dg/dghome/log and /dg/dghome/queue |
51 | Shut down Datagate running on testdg |
52 | Failback Production onto proddg |
|
Follow-Up |
53 | Enable new disks - on testdg:
drvconfig; disks
Verify that new disks are visible:
ls /dev/rdsk/c3t*d4s0
/dev/rdsk/c3t0d4s0 /dev/rdsk/c3t1d4s0 /dev/rdsk/c3t2d4s0 /dev/rdsk/c3t3d4s0 /dev/rdsk/c3t4d4s0 |
54 | On proddg, add new disks to metaset:
metaset -s ssa1 -a c3t2d4 c3t3d4 |
55 | Create the 64MB slice 6 on the disks in tray 2:
./reformat2 |
56 | Use format to verify the new partitions. If they are
not correct, adjust vtoc.ssa and apply it to each disk:
for d in c3t2d0 c3t2d1 c3t2d2 c3t2d3 c3t2d4 c3t3d0 c3t3d1 c3t3d2 c3t3d3 c3t3d4 c3t4d4 c3t5d0; do
fmthard -s vtoc.ssa /dev/rdsk/${d}s2
done |
57 | Create and attach mirror (second tray) to all
components
./init_mirror |
58 | If the above reports errors, create the metadevices
as needed manually:
metainit -s ssa1 d32 1 2 c3t2d0s0 c3t2d1s0 -i 64k   (2-column stripe)
metainit -s ssa1 d31 1 3 c3t3d2s0 c3t3d3s0 c3t3d4s0 -i 64k   (3-column stripe)
metainit -s ssa1 d62 1 5 c3t2d2s0 c3t3d0s0 c3t2d3s0 c3t3d1s0 c3t2d4s0 -i 8k   (5-column stripe)
metainit -s ssa1 d41 1 1 c3t2d0s6   (1-column stripe for trans device)
metainit -s ssa1 d51 1 1 c3t3d2s6   (1-column stripe for trans device)
metainit -s ssa1 d81 1 1 c3t2d1s6   (1-column stripe for trans device) |
59 | Attach the new metadevices to the half-mirrors
created earlier:
metattach -s ssa1 d43 d41
metattach -s ssa1 d54 d51
metattach -s ssa1 d87 d81
metattach -s ssa1 d44 d31
metattach -s ssa1 d55 d32
metattach -s ssa1 d88 d62 |
60 | Verify that all the mirrors are resyncing:
metastat -s ssa1 | more |
61 | Follow-up later to confirm re-syncing (4 hours) |
Note! This procedure does not include creation and association of hot-spare
pools for the devices.
That can be done on the fly at a later time, using metatool -s ssa1