Help troubleshooting bare-metal restore procedure.

From: Kevin Counts (counts@digicat.org)
Date: Mon Feb 23 2004 - 16:08:50 EST


I am having an issue with a bare-metal recovery procedure
that I am trying to get ready ASAP (before the system
goes live), and I would appreciate any help diagnosing what is going wrong.

In summary, the recovery appears to complete successfully;
however, on reboot the kernel hangs at some point after
parsing the /etc/system file (I see it attempt to forceload the md modules).

The last line reported w/ boot -v:

cpu1: SUNW,UltraSPARC-III+ (upaid 1 impl 0x15 ver 0x23 clock 1015 MHz)

(full log below)

--
I have a jumpstart image created with NetBackup and a custom
shell script installed which I am using to automate a bare-metal
recovery procedure for a SunFire 280R. 

The procedure starts with "boot net - recover" from the ok prompt.
This is a custom-modified boot image, per the Sun BluePrints document
(http://www.sun.com/solutions/blueprints/0802/816-7587-10.pdf).

Once it boots up I run the recover script I wrote
(attached at the bottom of this message) which:
 - writes a saved prtvtoc slice table to the 36GB HD (fmthard)
 - creates new filesystems on the slices (newfs)
 - mounts them under /a
 - recovers /, /var, and /opt from backup
 - undoes disksuite (edits vfstab and /etc/system)
 - installs a bootblock
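
As a side check on the first step, the slice table that the script feeds
to fmthard can be tested for internal consistency before it is written.
The sketch below is only an illustration: check_vtoc is a hypothetical
helper name (not a Solaris utility), and it assumes the prtvtoc column
order of slice, tag, flags, first sector, sector count, last sector.

```shell
#!/bin/sh
# Sketch: verify that every row of a prtvtoc-style slice table satisfies
#   last_sector = first_sector + sector_count - 1
# check_vtoc is a hypothetical helper, not part of Solaris.
check_vtoc() {
    awk '
        # Skip blank lines and the "*" comment lines that prtvtoc emits.
        /^[ \t]*\*/ || NF == 0 { next }
        {
            expected = $4 + $5 - 1
            if ($6 != expected) {
                printf "slice %s: last sector %s, expected %d\n", $1, $6, expected
                bad = 1
            }
        }
        END { exit bad }
    '
}

# The table below is copied from the recover script in this message.
check_vtoc <<EOF && echo "slice table is consistent"
       0      2    00          0   8389656   8389655
       1      3    01    8389656   8389656  16779311
       2      5    00          0  71127180  71127179
       3      7    00   16779312  16779312  33558623
       4      0    00   33558624  37516554  71075177
       6      0    00   71075178     26001  71101178
       7      0    00   71101179     26001  71127179
EOF
```

This only checks the arithmetic of each row; it does not detect overlapping
slices or compare the table against the disk geometry.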

Everything appears to recover correctly. After the recovery I mounted the
slices to do a basic sanity check and confirmed that /, /var, and /opt
were populated.

To help rule out a bad copy of the backup, I repeated the
procedure with a backup from two weeks earlier and encountered
the same hang on reboot.

Are there any steps anyone can suggest to further diagnose
what is happening when it hangs during boot? Has anyone
seen anything like this?

I want to mention that the files being restored were
backed up from a disk that I have verified boots correctly.
(I have split off the mirror and am keeping one of
 the disks with the OS loaded on my desk while I test
 with the other disk, which I wiped clean before doing
 this.)
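
The basic sanity check mentioned above could be scripted along the
following lines. This is a minimal sketch, not the check that was actually
run: check_restored_root is a hypothetical helper, and the list of paths is
an assumption about what a bootable Solaris root normally contains. It uses
"ls -d" rather than "test -e" so that it also works in an older Bourne shell.

```shell
#!/bin/sh
# Sketch: report expected paths that are missing under a restored root.
# check_restored_root and the path list are illustrative assumptions.
check_restored_root() {
    root=$1
    missing=0
    for p in etc/vfstab etc/system etc/path_to_inst \
             kernel dev devices var/adm opt; do
        # ls -d succeeds for files and directories alike.
        if ls -d "$root/$p" >/dev/null 2>&1; then
            :
        else
            echo "missing: $root/$p"
            missing=1
        fi
    done
    return $missing
}

# Usage, with the restored slices mounted under /a as in the script:
#   check_restored_root /a && echo "basic layout is present"
```

The check only confirms that the paths exist; it says nothing about whether
their contents are complete or match the running system they were backed up
from.
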
Thank you,
Kevin Counts
--
Here is the console capture of the boot process displaying where
the system is hanging:
{0} ok boot -v
Boot device: /pci@8,600000/SUNW,qlc@4/fp@0,0/disk@0,0  File and args: -v
Size: 348576+90674+76674 Bytes
SunOS Release 5.8 Version Generic_108528-23 64-bit
Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
Ethernet address = 0:3:ba:29:b2:72
NOTICE: socal: 64-bit driver module not found
NOTICE: socal: 64-bit driver module not found
mem = 4194304K (0x100000000)
avail mem = 4118323200
root nexus = Sun Fire 280R (2 X UltraSPARC-III+) 
pcisch0 at root: SAFARI 0x8 0x700000
pcisch0 is /pci@8,700000
pcisch1 at root: SAFARI 0x8 0x600000
pcisch1 is /pci@8,600000
PCI-device: SUNW,qlc@4, qlc0
qlc0 is /pci@8,600000/SUNW,qlc@4
fp0 is /pci@8,600000/SUNW,qlc@4/fp@0,0
ssd1 at fp0: name w21000004cfe9a3ca,0, bus address ef
ssd1 is /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cfe9a3ca,0
        <SUN36G cyl 24620 alt 2 hd 27 sec 107>
/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cfe9a3ca,0 (ssd1) online
root on /pci@8,600000/SUNW,qlc@4/fp@0,0/disk@w21000004cfe9a3ca,0:a fstype ufs
WARNING: forceload of misc/md_trans failed
WARNING: forceload of misc/md_raid failed
WARNING: forceload of misc/md_hotspares failed
WARNING: forceload of misc/md_sp failed
WARNING: forceload of misc/md_stripe failed
WARNING: forceload of misc/md_mirror failed
PCI-device: ebus@5, ebus0
todds12870 at ebus0: offset 1,300070
todds12870 is /pci@8,700000/ebus@5/rtc@1,300070
mc-us30 at root: SAFARI 0x0 0x400000 ...
mc-us30 is /memory-controller@0,400000
mc-us31 at root: SAFARI 0x1 0x400000 ...
mc-us31 is /memory-controller@1,400000
se0 at ebus0: offset 1,400000
se0 is /pci@8,700000/ebus@5/serial@1,400000
cpu0: SUNW,UltraSPARC-III+ (upaid 0 impl 0x15 ver 0x23 clock 1015 MHz)
cpu1: SUNW,UltraSPARC-III+ (upaid 1 impl 0x15 ver 0x23 clock 1015 MHz)
 -------------  [SYSTEM HANGS HERE] ----------------
Here is the script:
#!/bin/sh
#------------------------------------------------------------------------
# $Id$
#------------------------------------------------------------------------
# Custom script to restore egate2 (run from jumpstart recovery image).
#-------------------------------------------------------------------------
/usr/sbin/fmthard -s - /dev/rdsk/c1t0d0s2 <<EOF
       0      2    00          0   8389656   8389655
       1      3    01    8389656   8389656  16779311
       2      5    00          0  71127180  71127179
       3      7    00   16779312  16779312  33558623
       4      0    00   33558624  37516554  71075177
       6      0    00   71075178     26001  71101178
       7      0    00   71101179     26001  71127179
EOF
echo "y" | /usr/sbin/newfs /dev/rdsk/c1t0d0s0
echo "y" | /usr/sbin/newfs /dev/rdsk/c1t0d0s3
echo "y" | /usr/sbin/newfs /dev/rdsk/c1t0d0s4
/usr/sbin/fsck /dev/rdsk/c1t0d0s0
/usr/sbin/fsck /dev/rdsk/c1t0d0s3
/usr/sbin/fsck /dev/rdsk/c1t0d0s4
mount /dev/dsk/c1t0d0s0 /a
mkdir -p /a/var
mkdir -p /a/opt
mount /dev/dsk/c1t0d0s3 /a/var
mount /dev/dsk/c1t0d0s4 /a/opt
#------------------------------------------------------------------------
server=veritas
log=/var/tmp/bprestore.log
rename=/var/tmp/bprestore.rename
filelist=/var/tmp/bprestore.filelist
extra_opt="-e 1/02/2004 -C egate2"
#  extra_opt="-C egate2"
cat <<EOF > ${filelist}
/
!/egate
EOF
cat <<EOF > ${rename}
change / to /a
EOF
cat /dev/null > ${log}
cat <<EOF
--------------------------------------------------------------------
 Running bprestore in foreground.                                   
 View logfile: $log in another login session for status.
 
 (A message will appear in this window when the restore is complete)
--------------------------------------------------------------------
EOF
echo \
/usr/openv/netbackup/bin/bprestore -w                \
                                   -S ${server}      \
                                   -L ${log}         \
                                   -R ${rename}      \
                                   ${extra_opt}      \
                                   -f ${filelist}
/usr/openv/netbackup/bin/bprestore -w                \
                                   -S ${server}      \
                                   -L ${log}         \
                                   -R ${rename}      \
                                   ${extra_opt}      \
                                   -f ${filelist}
#-------------------------------------------------------------------------
# Make excluded /egate mountpoint
#-------------------------------------------------------------------------
mkdir -p /a/egate
#-------------------------------------------------------------------------
# Unconfigure disksuite mirror
#-------------------------------------------------------------------------
mv /a/etc/lvm/mddb.cf /a/etc/lvm/mddb.cf.bak
sed -e 's!md/!!g'         \
    -e 's!d10!c1t0d0s0!g' \
    -e 's!d20!c1t0d0s1!g' \
    -e 's!d30!c1t0d0s3!g' \
    -e 's!d40!c1t0d0s4!g' \
/a/etc/vfstab > /a/etc/vfstab.tmp
cp /a/etc/vfstab     /a/etc/vfstab.bak
cp /a/etc/vfstab.tmp /a/etc/vfstab
sed -e '/^rootdev/ s/^/*/' \
    -e '/^set md/  s/^/*/' \
/a/etc/system > /a/etc/system.tmp
cp /a/etc/system     /a/etc/system.bak
cp /a/etc/system.tmp /a/etc/system
umount /a/var
umount /a/opt
umount /a
/usr/sbin/installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t0d0s0
echo "--------------------------------------------------------------------"
echo " Restore complete - type \"reboot\" to reboot the system."
echo "--------------------------------------------------------------------"
#-------------------------------------------------------------------------
# End.
#-------------------------------------------------------------------------
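
For reference, the effect of the two sed expressions applied to
/a/etc/system can be reproduced on a small sample. The sketch below is
illustrative only: the sample lines are assumed (they are not copied from
egate2), and comment_md_settings and comment_md_all are hypothetical helper
names. It shows that lines beginning with "rootdev" and "set md" are
commented out with "*", while any "forceload: misc/md_..." lines are left
in place, which would be consistent with the kernel still attempting the
forceloads shown in the console capture. Whether that is related to the
hang is not established here.

```shell
#!/bin/sh
# Sketch: the same sed expressions the recover script runs on /a/etc/system.
# "*" at the start of a line marks a comment in /etc/system.
comment_md_settings() {
    sed -e '/^rootdev/ s/^/*/' \
        -e '/^set md/  s/^/*/'
}

# Hypothetical variant (an assumption, not part of the original script):
# also comment out the forceload lines for the md modules.
comment_md_all() {
    sed -e '/^rootdev/ s/^/*/' \
        -e '/^set md/  s/^/*/' \
        -e '/^forceload: misc\/md_/ s/^/*/'
}

# Assumed sample input; real DiskSuite entries may differ.
comment_md_settings <<'EOF'
forceload: misc/md_mirror
rootdev:/pseudo/md@0:0,10,blk
set md:mddb_bootlist1="ssd:7:16"
EOF
# prints:
#   forceload: misc/md_mirror
#   *rootdev:/pseudo/md@0:0,10,blk
#   *set md:mddb_bootlist1="ssd:7:16"
```

Feeding the same sample to comment_md_all would prefix all three lines
with "*".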
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

