Error on reboot

From: Karen R McArthur (kmcarthu@bates.edu)
Date: Fri Oct 15 2004 - 09:54:03 EDT


2 x DS20E cluster (with SAN storage)
Tru64 5.1b
         Patches installed on the system came from following patch kits:
         --------------------------------------------------------------

         - T64KIT0020545-V51BB24-E-20031104 OSF540
         - T64V51BB24AS0003-20030929 OSF540
         - T64V51BB24AS0003-20030929 TCR540
         - TCRKIT0020500-V51BB24-20031029 TCR540

The two small internal drives hold only local storage and some swap (4 GB).
  The boot devices are on the SAN, along with some more swap (1.57 GB).

During a reboot of node 2, just after the node declared itself up and
printed "Waiting for cluster mount to complete", we received multiple
"SCSI0: SCSI Bus reset; cam_logger: SCSI event packet; cam_logger: bus0;
itpsa SCSI HBA; SCSI Bus was reset" messages, followed by these errors:

drd_get_disk_attributes(180): ksm_get_attributes failed 19
drd_handle_unconfiguted disk: Cannot setup server for disk 180. Error: 19
drd_get_disk_attributes(180): ksm_get_attributes failed 19
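
A note on the trailing 19: assuming it is a standard errno value (the
messages themselves do not say so), it would correspond to ENODEV on most
Unix systems. A minimal sketch for decoding it follows; describe_errno is a
hypothetical helper, and the lookup reflects the host running the script,
not necessarily Tru64 itself:

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno value.

    Assumption: the number printed by the drd_* messages is a plain errno.
    The tables consulted here belong to the machine running this script.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_errno(19)
    print(name, "-", message)
```

If that reading is right, the messages would mean the cluster device layer
could not find disk 180 at all, rather than finding it and failing to read it.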

I assumed this was a problem with our internal disks, so I halted the
system, removed the disks, and tried another reboot. This time there was
no SCSI bus reset, but the system complained of missing swapon devices
(expected, since some swap was on those drives), then stopped with the
same drd errors as above.

The boot process appears to hang at that point - we left it there all
night - approximately 13 hours.

Node 1 boots and mounts all disks with no problems. All services are
running and stable.

As this happens so early in the boot process, we cannot get to
single-user mode to diagnose this issue.

Any ideas?

-- 
Karen R. McArthur, Systems Administrator
Bates College, Information and Library Services
Lewiston, Maine 04240
(207) 786-8236 fax:(207) 786-6057
kmcarthu@bates.edu



