boot hangs after patch cluster, but works if start things manually from single-user mode

From: Jeffrey P. Elliott (jpelliot@umd.umich.edu)
Date: Wed Jun 11 2003 - 11:24:23 EDT


Hello All,

This past weekend, we applied the latest 8_Recommended cluster to an
E220R (which appeared to be an original Sol 8 install, and had never
been patched before - lucky me). After the installation and reboot, the
system hangs after checking the filesystems, i.e.

...
/dev/dsk/md/d20 is clean
/dev/dsk/md/d30 is stable

and just stops here. The longest I let it go was probably 20 minutes,
just to see if it would eventually do anything. If we boot into
single-user mode and start up all of the things we need by hand,
however, the system works just fine, as do all services. (It's a
Real/Helix streaming server.)

I'm guessing that there is probably an issue with an rc script, since I
can mount the file systems and start services by hand, including an NFS
mount. I'm not familiar enough with the boot sequence to know exactly
the route to take from rcS to rc2 (or even rc3) to have walked through
the required scripts.
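
For anyone wanting to walk the scripts by hand, here is a minimal sketch of
how the order could be worked out. It assumes (I have not confirmed this)
that the rc2/rc3 scripts run the S* files in each rcN.d directory in
sorted name order. The function names are made up for illustration, and
any script names other than S69inet and S72inetsvc are placeholders.

```python
import re

def start_order(names):
    """Return the start scripts (S##name) in sorted name order.

    Assumption: the rc script walks the S* files of its rcN.d directory
    in the same order a shell glob would list them.
    """
    starts = [n for n in names if re.match(r"S\d\d", n)]
    return sorted(starts)

def scripts_after(names, last_seen):
    """Return the start scripts that would run after `last_seen`."""
    order = start_order(names)
    return order[order.index(last_seen) + 1:]

if __name__ == "__main__":
    # Placeholder listing; only S69inet and S72inetsvc come from this post.
    listing = ["README", "K28example", "S99example", "S72inetsvc", "S69inet"]
    for name in scripts_after(listing, "S72inetsvc"):
        print(name)
```

On the real box the listing could come from os.listdir("/etc/rc2.d"), and
the scripts after the last one known to have run are the first candidates
to try by hand from single-user mode.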

I don't know if this will help, but here is the vfstab, just in case
(and yes, I am also not a fan of these mount points, but I inherited the
box :) :

fd                 -                 /dev/fd                        fd     -  no   -
/proc              -                 /proc                          proc   -  no   -
/dev/dsk/c0t0d0s1  -                 -                              swap   -  no   -
/dev/md/dsk/d0     /dev/md/rdsk/d0   /                              ufs    1  no   -
/dev/md/dsk/d10    /dev/md/rdsk/d10  /var                           ufs    2  yes  -
/dev/md/dsk/d20    /dev/md/rdsk/d20  /usr/local/                    ufs    3  yes  -
/dev/md/dsk/d30    /dev/md/rdsk/d30  /usr/local/RealServer/Content/ ufs    4  yes  -
swap               -                 /tmp                           tmpfs  -  yes  -
nfs.host:/home     -                 /home                          nfs    -  yes  soft,quota,bg

I'm wondering if anyone might have an idea, based on where the boot is
hanging, which scripts I can check for problems. I realize that there
could be mounting issues with the /usr/local items if done out of order
- however, the boot sequence shows them being checked in order, so I am
assuming (incorrectly, maybe?) that they would be mounted in that order.
(And again, they mount fine by hand.)
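
To rule out the ordering question, a minimal sketch of a check that every
nested mount point is listed after its parent in the vfstab. It assumes
the usual seven whitespace-separated fields per entry, with each entry on
a single line; the function names are made up for illustration. Root is
ignored as a parent, since it is already mounted before the vfstab is
processed.

```python
def parse_vfstab(text):
    """Parse vfstab text into a list of dicts, skipping comments and short lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 7:
            continue  # incomplete or wrapped line, ignore it
        entries.append({
            "device": fields[0],
            "mount_point": fields[2],
            "fstype": fields[3],
            "at_boot": fields[5],
        })
    return entries

def nested_out_of_order(entries):
    """Return (child, parent) pairs where the child is listed before its parent."""
    points = [e["mount_point"].rstrip("/") or "/"
              for e in entries if e["mount_point"] != "-"]
    problems = []
    for i, child in enumerate(points):
        for parent in points[i + 1:]:
            # Root is skipped: it is mounted before vfstab is read.
            if parent != "/" and child.startswith(parent + "/"):
                problems.append((child, parent))
    return problems
```

Fed the entries listed above, one per line, nested_out_of_order() returns
an empty list, so the file order itself does not look like the problem.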

Oh, I should also mention that it appears that some services are
starting, as the box will respond to a ping from a different subnet, so
it must be getting its routing and network configuration. dmesg confirms
this. So does this indicate that the boot is getting as far as
rc2.d/S69inet and S72inetsvc? It never makes it to any of the other
network-related services, though, such as ssh or the Helix server.

dmesg also shows a dump to swap that I am unsure about.

Jun 7 23:04:36 nova genunix: [ID 936769 kern.info] hme0 is /pci@1f,4000/network@1,1
Jun 7 23:04:40 nova hme: [ID 517527 kern.info] SUNW,hme0 : Internal Transceiver Selected.
Jun 7 23:04:40 nova hme: [ID 517527 kern.info] SUNW,hme0 : 100 Mbps Full-Duplex Link Up
Jun 7 23:04:42 nova genunix: [ID 454863 kern.info] dump on /dev/dsk/c0t0d0s1 size 1000 MB

Any helpful pointers/suggestions/ideas appreciated.

Thanks
jef
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


