SunFire 480, booting problem

From: Mumdziev, Marijan (marijan.mumdziev@siemens.com)
Date: Wed Jan 24 2007 - 12:33:25 EST


Hello SunManagers,

        I have a hardware problem with a SunFire 480 machine (4 CPUs and
2GB RAM). For some reason, the machine cannot boot Solaris anymore.
The error message I am getting is "Can't locate boot device". I have
two mirrored disks, so I tried swapping their slots, but that made no
difference.

1) When I try the "boot" command from the ok prompt, I get the following:
        ok> boot
        Initializing 12MB of memory at addr b0ff000000
        Initializing 4080MB of memory at addr b000000000
        Initializing 4GB of memory at addr a000000000
        Boot device: net File and args:
        Can't locate boot device

2) Second approach:
        ok> boot disk
        Boot device: /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@0,0 File and args:
        Can't locate boot device

3) I get the same result for:
        ok>boot disk1
        ok>boot disk0
        ok>boot /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@0,0
        ...
        ok>boot cdrom

4) An additional error message that I get:
        Probing /pci@8,600000 Device 1 Corrected ECC Error
        {3} ok Corrected ECC Error

        In addition, I have now changed diag-level to max, and that
partially solved the problem. The boot command now works in general, but
the disks no longer seem to be bootable. I get the following errors:

        A) Boot device: /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@0,0 File and args:
           The file just loaded does not appear to be executable.

        B) mount: I/O error
           mount: cannot mount /dev/dsk/c1t0d0s5
               /sbin/rcS: /usr/bin/loadkeys: not found
           WARNING: /proc could not be mounted
           /sbin/swapadd: expr: not found
           /sbin/swapadd: swap: not found

           WARNING - /usr/sbin/fsck not found. Most likely the
           mount of /usr failed or the /usr filesystem is badly
           damaged. The system is being halted. Either reinstall
           the system or boot with the -b option in an attempt
           to recover.

        At least "boot cdrom -s" can be executed now. Does anyone know
what the problem might be? I need to repair this machine quickly, but I
also need a long-term solution: I do not want to set up the whole system
again only to find that it is still unstable afterwards.

Thank you very much in advance for your efforts. I will summarize the
solution/conclusion as soon as I have one.

Kind regards,
Marijan Mumdziev

P.S. Please scroll down for additional symptoms.
____________________________
SIEMENS
Siemens d.d. - Program and System Engineering
Siemensstrasse 92, AT-1210 Wien, Austria

Marijan Mumdziev
OSS/BSS Project Integration / Emergency Service

mobile +43 67651212 33
tel +43 (0)51707 24354
mailto: marijan.mumdziev@siemens.com

Additional information for troubleshooting:
===========================================

Devalias output:
----------------
ok> devalias
disk1 /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@1,0
disk0 /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@0,0
disk /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@0,0
ide /pci@8,700000/ide@6
scsi /pci@9,600000/SUNW,qlc@2
cdrom /pci@8,700000/ide@6/cdrom@0,0:f
net /pci@9,700000/network@2
net1 /pci@9,600000/network@1
net0 /pci@9,700000/network@2
flash /pci@9,700000/ebus@1/flashprom@0,0
idprom /pci@9,700000/ebus@1/i2c@1,2e/idprom@4,a4
nvram /pci@9,700000/ebus@1/i2c@1,2e/nvram@4,a4
i2c1 /pci@9,700000/ebus@1/i2c@1,30
i2c0 /pci@9,700000/ebus@1/i2c@1,2e
bbc /pci@9,700000/ebus@1/bbc@1,0
rsc-console /pci@9,700000/ebus@1/rsc-console@1,3083f8
rsc-control /pci@9,700000/ebus@1/rsc-control@1,3062f8
ttya /pci@9,700000/ebus@1/serial@1,400000:a
pci9b /pci@9,700000
pci9a /pci@9,600000
pci8b /pci@8,700000
pci8a /pci@8,600000
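
The alias table above determines which device path each "boot <alias>" attempt actually addresses. As a minimal sketch, assuming the devalias listing has been pasted in as text (the helper names parse_devalias and resolve_boot_devices are hypothetical and exist only for illustration), the following Python expands the boot-device value "disk net" into full device paths:

```python
# A subset of the devalias listing from this message, pasted as text.
DEVALIAS_TEXT = """\
disk1 /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@1,0
disk0 /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@0,0
disk /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@0,0
cdrom /pci@8,700000/ide@6/cdrom@0,0:f
net /pci@9,700000/network@2
"""


def parse_devalias(text):
    """Map each alias name to its device path (one 'alias path' pair per line)."""
    aliases = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            aliases[parts[0]] = parts[1].strip()
    return aliases


def resolve_boot_devices(boot_device, aliases):
    """Expand a space-separated boot-device value; unknown names pass through."""
    return [aliases.get(name, name) for name in boot_device.split()]


aliases = parse_devalias(DEVALIAS_TEXT)
for path in resolve_boot_devices("disk net", aliases):
    print(path)
```

Note that "disk" and "disk0" resolve to the same path, so "boot disk" and "boot disk0" are the same attempt; only "disk1" addresses the second drive.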

Printenv output:
----------------
Variable Name           Value                          Default Value

test-args
diag-passes             1                              1
local-mac-address?      false                          false
fcode-debug?            false                          false
silent-mode?            false                          false
scsi-initiator-id       7                              7
oem-logo                                               No default
oem-logo?               false                          false
oem-banner                                             No default
oem-banner?             false                          false
ansi-terminal?          true                           true
screen-#columns         80                             80
screen-#rows            34                             34
ttya-rts-dtr-off        false                          false
ttya-ignore-cd          true                           true
ttya-mode               9600,8,n,1,-                   9600,8,n,1,-
output-device           ttya                           ttya
input-device            ttya                           ttya
auto-boot-on-error?     false                          false
load-base               16384                          16384
auto-boot?              true                           true
boot-command            boot                           boot
diag-file
diag-device             disk net                       net
boot-file
boot-device             disk net                       disk net
use-nvramrc?            false                          false
nvramrc
security-mode           none                           No default
security-password                                      No default
security-#badlogins     0                              No default
diag-out-console        false                          false
post-trigger            error-reset power-on-res ...   error-reset power-on-res ...
diag-script             normal                         normal
diag-level              min                            min
diag-switch?            false                          false
obdiag-trigger          error-reset power-on-res ...   error-reset power-on-res ...
error-reset-recovery    boot                           boot

test-all output:
----------------
ok> test-all
Testing /pci@9,600000/SUNW,qlc@2
Testing /pci@9,600000/network@1
Testing /pci@9,700000/network@2
Testing /pci@9,700000/usb@1,3
Testing /pci@9,700000/ebus@1
Testing /pci@9,700000/ebus@1/serial@1,400000
Testing /pci@9,700000/ebus@1/rsc-control@1,3062f8
Testing /pci@9,700000/ebus@1/pmc@1,300700
Testing /pci@9,700000/ebus@1/rtc@1,300070
Testing /pci@9,700000/ebus@1/i2c@1,30
Testing /pci@9,700000/ebus@1/i2c@1,2e
Testing /pci@9,700000/ebus@1/bbc@1,0
Testing /pci@9,700000/ebus@1/flashprom@0,0
Testing /pci@8,600000/pci@2/scsi@5
No targets found
Testing /pci@8,600000/pci@2/scsi@4
No targets found
Testing /pci@8,600000/pci@1/SUNW,qfe@3,1
Hme register test --- succeeded.
Internal loopback test -- succeeded.
Transceiver check -- Using Onboard Transceiver - Timeout waiting for
AutoNegotiation Status to be updated.
Timeout reading Link status. Check cable and try again.
Timeout waiting for AutoNegotiation Status to be updated.
Timeout reading Link status. Check cable and try again.
Timeout waiting for AutoNegotiation Status to be updated.
Timeout reading Link status. Check cable and try again.
AutoNegotiation Timeout.
Check Cable or Contact your System Administrator.
Link Down.
failed
Doing more loopback tests -- Did not receive expected loopback packet
failed
/pci@8,600000/pci@1/SUNW,qfe@3,1 selftest failed, return code = -1
Testing /pci@8,600000/pci@1/SUNW,qfe@2,1
Hme register test --- succeeded.
Internal loopback test -- succeeded.
Transceiver check -- Using Onboard Transceiver - Timeout waiting for
AutoNegotiation Status to be updated.
Timeout reading Link status. Check cable and try again.
Timeout waiting for AutoNegotiation Status to be updated.
Timeout reading Link status. Check cable and try again.
Timeout waiting for AutoNegotiation Status to be updated.
Timeout reading Link status. Check cable and try again.
AutoNegotiation Timeout.
Check Cable or Contact your System Administrator.
Link Down.
failed
Doing more loopback tests -- Did not receive expected loopback packet
failed
/pci@8,600000/pci@1/SUNW,qfe@2,1 selftest failed, return code = -1
Testing /pci@8,600000/pci@1/SUNW,qfe@1,1
Hme register test --- succeeded.
Internal loopback test -- succeeded.
Transceiver check -- Using Onboard Transceiver - Timeout waiting for
AutoNegotiation Status to be updated.
Timeout reading Link status. Check cable and try again.
Timeout waiting for AutoNegotiation Status to be updated.
Timeout reading Link status. Check cable and try again.
Timeout waiting for AutoNegotiation Status to be updated.
Timeout reading Link status. Check cable and try again.
AutoNegotiation Timeout.
Check Cable or Contact your System Administrator.
Link Down.
failed
Doing more loopback tests -- Did not receive expected loopback packet
failed
/pci@8,600000/pci@1/SUNW,qfe@1,1 selftest failed, return code = -1
Testing /pci@8,600000/pci@1/SUNW,qfe@0,1
Hme register test --- succeeded.
Internal loopback test -- succeeded.
Transceiver check -- Using Onboard Transceiver - Timeout waiting for
AutoNegotiation Status to be updated.
Timeout reading Link status. Check cable and try again.
Timeout waiting for AutoNegotiation Status to be updated.
Timeout reading Link status. Check cable and try again.
Timeout waiting for AutoNegotiation Status to be updated.
Timeout reading Link status. Check cable and try again.
AutoNegotiation Timeout.
Check Cable or Contact your System Administrator.
Link Down.
failed
Doing more loopback tests -- Did not receive expected loopback packet
failed
/pci@8,600000/pci@1/SUNW,qfe@0,1 selftest failed, return code = -1
Testing /pci@8,700000/ide@6
Testing /pci@8,700000/SUNW,m64B@4

Display not installed

Test hardware registers - passed Ok
Test RamDAC - passed Ok
Test Frame buffer - passed Ok
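
The test-all listing above is long, but only four devices report a failed selftest. All four are qfe ports under /pci@8,600000/pci@1, and each failure follows a "Link Down" message. As a small sketch, assuming the output has been saved as text (failed_selftests is a hypothetical helper, and SAMPLE is a shortened excerpt of the log above), the failing device paths can be pulled out like this:

```python
import re

# Shortened excerpt of the test-all output from this message.
SAMPLE = """\
Testing /pci@8,600000/pci@1/SUNW,qfe@3,1
Link Down.
failed
/pci@8,600000/pci@1/SUNW,qfe@3,1 selftest failed, return code = -1
Testing /pci@8,700000/ide@6
"""

# Matches lines such as "<device path> selftest failed, return code = -1".
FAILED_RE = re.compile(r"^(\S+) selftest failed, return code = (-?\d+)$")


def failed_selftests(log_text):
    """Return (device_path, return_code) for every failed selftest line."""
    failures = []
    for line in log_text.splitlines():
        match = FAILED_RE.match(line.strip())
        if match:
            failures.append((match.group(1), int(match.group(2))))
    return failures


for device, code in failed_selftests(SAMPLE):
    print(device, code)
```

In the full listing, the qlc controller that the disk aliases point to is tested without any failure message.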

Additional output:
------------------
...
0>Diag level set to MAX.
...
0>INFO: 512MB Bank 0
0>INFO: 512MB Bank 1
0>INFO: 512MB Bank 2
0>INFO: 512MB Bank 3

Probing Corrected ECC Error
Can't locate boot device
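
The memory figures quoted in this message can be cross-checked with simple arithmetic. The sketch below is plain Python with nothing system-specific; it sums the three regions from the "boot" output and the four banks from the diagnostic output. Which of the two totals reflects the installed memory is not something this message settles.

```python
# Sizes in MB, taken from the output quoted in this message.
boot_regions_mb = [12, 4080, 4 * 1024]  # 12MB, 4080MB and 4GB from "ok> boot"
post_banks_mb = [512, 512, 512, 512]    # Bank 0 to Bank 3 from the diag output

boot_total = sum(boot_regions_mb)
post_total = sum(post_banks_mb)

print("boot output total (MB):", boot_total)
print("diag banks total (MB): ", post_total)
```

The four banks add up to 2048MB, which matches the 2GB stated at the top of the message, while the regions initialized at boot add up to 8188MB, just under 8GB.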
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


