"kernel stack not valid halt" / CDROM device name corruption problem

From: Iain Barker (ibarker@aastra.com)
Date: Mon Mar 01 2004 - 14:21:55 EST


I have a problem on a DS10L while attempting to boot the Tru64 5.1b install CDROM.
If I boot from the hard disk, the CDROM drive works OK when mounted from Tru64.

When I power on the system, the CD drive is reported correctly by 'show dev' at SRM:

        dqb0.0.1.13.0 DQB0 CD-224E 9.5B

If I try to boot the GENERIC 5.1b boot-linked kernel from CDROM, I get an error "kernel stack not valid halt":

>>>boot dqb0 -fl a -fi GENERIC
        (boot dqb0.0.1.13.0 -file GENERIC -flags a)
        block 0 of dqb0.0.1.13.0 is a valid boot block
        reading 15 blocks from dqb0.0.1.13.0
        bootstrap code read in
        base = 2c0000, image_start = 0, image_bytes = 1e00(7680)
        initializing HWRPB at 2000
        initializing page table at 1ffee000
        initializing machine state
        setting affinity to the primary CPU
        jumping to bootstrap code
                                                                                        
        UNIX boot - Wednesday October 16, 2002
                                                                                        
        Loading GENERIC ...
        Loading at fffffc0000310000
        Linking 205 objects: 205
        halted CPU 0
                                                                                        
        halt code = 2
        kernel stack not valid halt
        PC = 0

Now if I do 'show dev' again, I get a weird corruption of the device name:

        dqb0.0.1.13.0 DQB0 CD/224G " " " " " " " " ;.7B" "
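
The corruption is easier to judge when the two 'show dev' strings are compared character by character. This is a minimal sketch, assuming the fields line up as model "CD-224E" and firmware "9.5B" in both listings; diff_chars is a hypothetical helper, not part of SRM or Tru64.

```python
def diff_chars(before, after):
    """Return (offset, before_char, after_char, xor) for each differing position."""
    return [
        (i, b, a, ord(b) ^ ord(a))
        for i, (b, a) in enumerate(zip(before, after))
        if b != a
    ]


# Field values copied from the two 'show dev' listings above
# (assumption: model and firmware fields start at the same offsets).
pairs = [("CD-224E", "CD/224G"), ("9.5B", ";.7B")]

for before, after in pairs:
    for offset, b, a, xor in diff_chars(before, after):
        print(f"{before!r} offset {offset}: {b!r} -> {a!r}, xor = 0x{xor:02x}")
```

Every changed character differs from the original by the single bit 0x02, and the changes all fall on even offsets; the padding spaces turning into '"' fit the same pattern. That may point to one data bit reading as set rather than to random corruption, but this reading is only an inference from the two listings.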

If I do an 'init' after the problem has occurred, I get a subsequent self-test failure:

        Testing the Disks (read only)
                                                                                
        *** Hard Error - Error #8 -
        Diagnostic Name ID Device Pass Test Hard/Soft 1-JAN-2000
        exer_kid 00000317 dqb0.0.1.13.0 0 0 1 0 12:00:01
        Buffer counts differ - buf1:0, buf2:512, location:2a00
                                                                                
        *** End of Error ***
                                                                                

I thought this might be a problem with the CDROM drive, so I swapped it for a drive from another system that works OK, but the problem remains. I've also tried changing the drive's ribbon cable, but that didn't help either.

The only way to get back to a 'valid' device name is to power-cycle the system.

Any ideas? I'm wondering if this is a fault on the motherboard.

thanks,
        Iain


