Re: Server is hanging - Apparent INITTAB problem

From: Bill Verzal (Bill_Verzal@BCBSIL.COM)
Date: Mon Sep 09 2002 - 16:47:28 EDT


Perhaps a hung NFS mount?
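One quick way to test that theory, once you can get a shell on the box (for example from maintenance mode, or after booting with a trimmed inittab), is to stat each mount point with a time limit. A stat() against a hard-mounted NFS filesystem whose server isn't answering can block indefinitely. The sketch below assumes a Python interpreter is available on the system; `mount_responds` is a hypothetical helper, not an AIX tool. It runs the check in a daemon thread so the script itself can't hang.

```python
import os
import sys
import threading


def mount_responds(path: str, timeout: float = 5.0) -> bool:
    """Return True if os.stat(path) finishes within `timeout` seconds.

    Hypothetical helper: the stat runs in a daemon thread, so if it blocks
    (as it can on an unreachable hard NFS mount) we simply stop waiting
    and the interpreter can still exit.
    """
    def probe() -> None:
        try:
            os.stat(path)
        except OSError:
            pass  # an error reply (e.g. ENOENT) still means "not hung"

    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout)
    return not worker.is_alive()


if __name__ == "__main__":
    # Mount points come from the command line; "/" is only a default.
    for mount_point in sys.argv[1:] or ["/"]:
        ok = mount_responds(mount_point)
        print(f"{mount_point}: {'ok' if ok else 'NOT RESPONDING (possible hung NFS mount)'}")
```

Passing the NFS mount points from /etc/filesystems on the command line would flag any that don't answer within the timeout.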
--------------------------------------------------------------------------------------------------------

Bill Verzal
Technical Consultant
Forbes Technical Consulting
(312) 653-3684
bill_verzal@bcbsil.com
MailStop: 27.201C

From: "Theresa Sarver" <IFMC.tsarver@SDPS.ORG>
Sent by: "IBM AIX Discussion List" <aix-l@Princeton.EDU>
To: aix-l@Princeton.EDU
cc:
Subject: Server is hanging - Apparent INITTAB problem
Date: 09/09/2002 03:01 PM
Please respond to "IBM AIX Discussion List"

Hi all;

I've got an S80 running AIX 4.3.3 ML 10 that has decided it doesn't want to
finish booting.

A little background...Due to the specifics of this contract I am going
through a pretty rigorous audit trying to get this server "certified"...in
doing so I've had to make TONS of changes from last Wednesday afternoon
through Friday (when the auditors arrived). I've had to disable services,
modify group/owner/permissions on system files, enable system auditing,
enable syslogging, and some other stuff that I'm sure I've forgotten. All the
changes I've made are documented, and I do have mksysbs up the wazoo I
can restore back to if it comes down to that. But I hope it doesn't.

Anyway, after varying on all VGs and mounting all filesystems, the last
thing I see is "Multi User Initialization Complete" (the last line of the
/etc/rc script)...and it just sits there. Assuming it is walking down the
inittab (and as I have no /etc/firstboot file), it appears to be hanging on
srcmstr, though I'm not really sure why it would hang there. I can ping the
box, but I can't telnet/ftp because the TCP/IP daemons aren't started.
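For reference, init works through the inittab in file order. For each `wait` entry at the current run level it starts the command and does not continue until that command exits; `respawn` and `once` entries are started without waiting. A stall right after /etc/rc therefore points at the next `wait` entry that never returns. The sketch below uses hypothetical helper names and assumes one entry per physical line plus the AIX convention that a leading ':' comments a line out; it lists the blocking entries in order.

```python
import sys
from typing import List, NamedTuple


class InittabEntry(NamedTuple):
    ident: str
    runlevels: str
    action: str
    command: str


def parse_inittab(text: str) -> List[InittabEntry]:
    """Split inittab text into Identifier:RunLevel:Action:Command entries.

    Blank lines and lines starting with ':' (commented out) are skipped, as
    are lines with fewer than four fields (e.g. mail-wrapped fragments).
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(":"):
            continue
        fields = line.split(":", 3)  # the command itself may contain ':'
        if len(fields) == 4:
            entries.append(InittabEntry(*fields))
    return entries


def blocking_entries(entries: List[InittabEntry],
                     runlevel: str = "2") -> List[InittabEntry]:
    """Return, in file order, the 'wait' entries init runs at `runlevel`.

    An empty run-level field is treated as "all run levels".
    """
    return [e for e in entries
            if e.action == "wait"
            and (not e.runlevels or runlevel in e.runlevels)]


if __name__ == "__main__":
    # Usage: python blocking_entries.py /etc/inittab
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            for entry in blocking_entries(parse_inittab(f.read())):
                print(f"{entry.ident:12} {entry.command}")
```

Against the inittab posted below, the first blocking entries after `rc` are `fbcheck` and then `rctcpip`; `srcmstr` is a `respawn` entry, so init does not wait on it.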

This first happened Friday night. I finally had to reboot the box into
maintenance mode, remove all but the "brc", "init", and "cons" lines from the
inittab, and then the server booted just fine. I then manually executed
everything else in the inittab and encountered NO problems. NOTHING
hung...NOTHING errored out. Also, no filesystems were full and there were
no errors to speak of in the errpt or the bootlog. I rebooted after I
verified everything was up and running, and the server REBOOTED JUST FINE!
?????
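Running the remaining entries by hand can be made a bit more systematic. The sketch below replays a list of (identifier, command) pairs in order, much as init does for `wait` entries, and gives each one a time limit so that a command that would have blocked the boot gets reported instead of hanging the session. `replay_with_timeout` is a hypothetical helper, not an AIX utility, and because it runs from an ordinary root shell rather than at boot time, it won't reproduce the boot environment exactly (environment variables, what is already mounted or started).

```python
import subprocess
from typing import Iterable, List, Tuple


def replay_with_timeout(commands: Iterable[Tuple[str, str]],
                        timeout: float = 60.0) -> List[Tuple[str, str]]:
    """Run each (ident, command) through /bin/sh in order, with a time limit.

    Returns (ident, outcome) pairs, where outcome is "exit N" or "TIMED OUT".
    Input and output are detached; a real run might log output instead.
    """
    results = []
    for ident, command in commands:
        try:
            proc = subprocess.run(command, shell=True, timeout=timeout,
                                  stdin=subprocess.DEVNULL,
                                  stdout=subprocess.DEVNULL,
                                  stderr=subprocess.DEVNULL)
            results.append((ident, f"exit {proc.returncode}"))
        except subprocess.TimeoutExpired:
            # subprocess.run kills the shell it started; children the
            # command spawned may keep running in the background.
            results.append((ident, "TIMED OUT"))
    return results


if __name__ == "__main__":
    demo = [("quick", "true"), ("slow", "sleep 5"), ("fails", "exit 3")]
    for ident, outcome in replay_with_timeout(demo, timeout=1):
        print(ident, outcome)
```

In practice the pairs would be the identifier and command fields of the inittab entries under test; the demo list above only uses harmless shell commands.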

On Saturday I had to disable SNMPd as well as its associated daemons, dpid2
and muxatmd (we're not using ATM or SNMP), and I also had to comment the 2
ATM lines out of the inittab. Oh, and I had to move nfs-mountd onto a
reserved port. The server is scheduled to reboot every Monday at 5 AM; this
morning I came in and it was hung at the same spot. So I repeated Friday
night's procedure and it worked just fine, though I haven't rebooted a second
time to see whether it would come back up.
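Commenting entries out the way the two ATM lines appear in the inittab below (a leading ':') keeps the change easy to undo. The sketch below is a hypothetical pair of helpers that operate on the file's text; AIX also ships `lsitab`, `chitab` and `rmitab` for inspecting and editing inittab entries, so treat this as an illustration of the edit rather than a replacement for those commands.

```python
from typing import Iterable


def comment_out(inittab_text: str, idents: Iterable[str]) -> str:
    """Prefix ':' to each active entry whose identifier is in `idents`.

    Lines already commented out, blank lines, and other entries are left
    exactly as they were, so the change can be reversed by deleting the ':'.
    """
    targets = set(idents)
    out = []
    for line in inittab_text.splitlines(keepends=True):
        ident = line.split(":", 1)[0]
        if line.strip() and not line.startswith(":") and ident in targets:
            line = ":" + line
        out.append(line)
    return "".join(out)


def uncomment(inittab_text: str, idents: Iterable[str]) -> str:
    """Reverse comment_out(): drop one leading ':' from the named entries."""
    targets = set(idents)
    out = []
    for line in inittab_text.splitlines(keepends=True):
        if line.startswith(":") and line[1:].split(":", 1)[0] in targets:
            line = line[1:]
        out.append(line)
    return "".join(out)
```

Keeping a backup copy of /etc/inittab before writing the result back is still a good idea.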

If anyone has any insight into what might be going on I'd sure appreciate
it.
Thanks;
Theresa

INITTAB:

init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | alog -tboot > /dev/console # Power Failure Detection
:mkatmpvc:2:once:/usr/sbin/mkatmpvc >/dev/console 2>&1
:atmsvcd:2:once:/usr/sbin/atmsvcd >/dev/console 2>&1
load64bit:2:wait:/etc/methods/cfg64 >/dev/console 2>&1 # Enable 64-bit execs
rc:2:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
fbcheck:2:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # run /etc/firstboot
srcmstr:2:respawn:/usr/sbin/srcmstr # System Resource Controller
rctcpip:2:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
adsmsmext:2:wait:/etc/rc.adsmhsm > /dev/console 2>&1 # TSM SpaceMan
rcnfs:2:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
cron:2:respawn:/usr/sbin/cron
piobe:2:wait:/usr/lib/lpd/pio/etc/pioinit >/dev/null 2>&1 # pb cleanup
qdaemon:2:wait:/usr/bin/startsrc -sqdaemon
writesrv:2:wait:/usr/bin/startsrc -swritesrv
uprintfd:2:respawn:/usr/sbin/uprintfd
diagd:2:once:/usr/lpp/diagnostics/bin/diagd >/dev/console 2>&1
pmd:2:wait:/usr/bin/pmd > /dev/console 2>&1 # Start PM daemon
logsymp:2:once:/usr/lib/ras/logsymptom # for system dumps
httpdlite:2:once:/usr/IMNSearch/httpdlite/httpdlite -r /etc/IMNSearch/httpdlite/httpdlite.conf & >/dev/console 2>&1
imnss:2:once:/usr/IMNSearch/bin/imnss -start imnhelp >/dev/console 2>&1
imqss:2:once:/usr/IMNSearch/bin/imq_start >/dev/console 2>&1
sybase:2:wait:su - sybase -c /stars/sybase/11.9.2/install/sybase start 2>&1
dt:2:wait:/etc/rc.dt
autoacs:2:once:/usr/tivoli/tsm/devices/bin/rc.acs_ssi quiet >/dev/console 2>&1 # Start the ssi agent
cons:0123456789:respawn:/usr/sbin/getty /dev/console
autosrvr:2:respawn:/usr/tivoli/tsm/server/bin/rc.adsmserv >/dev/console 2>&1
adsm:2:respawn:/usr/bin/dsmc sched > /dev/null 2>&1 # TSM scheduler
tty0:2:off:/usr/sbin/getty /dev/tty0
tty1:2:off:/usr/sbin/getty /dev/tty1


