System crash: invalid memory read access from kernel

From: Florent Boucher (Florent.Boucher@cnrs-imn.fr)
Date: Wed Nov 26 2003 - 10:15:27 EST


Dear Managers,
We have just upgraded the memory on an AlphaServer DS20 (2 x 500 MHz)
running Digital UNIX 4.0F. Since the upgrade, the system crashes every two
or three days with the following error reported by the uerf command:

********************************* ENTRY 6. *********************************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 302. PANIC
SEQUENCE NUMBER 316.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Wed Nov 26 13:08:37 2003
OCCURRED ON SYSTEM cubitus
SYSTEM ID x00080022
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000000
MESSAGE panic (cpu 0): kernel memory fault

I was able to extract the following information from the crash data:

trap: invalid memory read access from kernel mode

    faulting virtual address: 0x00000000000001e9
    pc of faulting instruction: 0xfffffc000025a1d8
    ra contents at time of fault: 0xfffffc0000878750
    sp contents at time of fault: 0xfffffffe8f67f2d8

panic (cpu 0): kernel memory fault
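
One observation, offered only as a guess: the faulting virtual address is very small (0x1e9), which usually looks more like a field read through a NULL structure pointer than a random corrupted address. A minimal sketch of that check follows, assuming the 8 KB page size used on Alpha; the function name looks_like_null_deref is hypothetical and not part of any system tool.

```python
# Assumption: Alpha systems use 8 KB pages, so any address below 8192
# falls in the first page, where NULL-pointer field accesses land.
ALPHA_PAGE_SIZE = 8192


def looks_like_null_deref(fault_va, page_size=ALPHA_PAGE_SIZE):
    """Return True when the faulting address lies in the first page."""
    return 0 <= fault_va < page_size


# Values copied from the trap report above.
fault_va = 0x00000000000001E9
fault_pc = 0xFFFFFC000025A1D8

print("fault address:", hex(fault_va), "=", fault_va, "bytes from NULL")
print("first-page access:", looks_like_null_deref(fault_va))
print("pc in first page:", looks_like_null_deref(fault_pc))
```

This does not by itself distinguish bad memory from a software bug: a corrupted pointer that happens to be zero produces the same pattern.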

Can anybody help us understand in more depth what is happening on the
system? Is it a memory problem or a software problem?
Thanks a lot for your help.
With my best regards
Florent Boucher

PS: Please find below more details on the crash

#
# Crash Data Collection (Version 1.4)
#
_crash_data_collection_time: Wed Nov 26 15:02:06 MET 2003
_current_directory: /
_crash_kernel: /var/adm/crash/vmunix.15
_crash_core: /var/adm/crash/vmzcore.15
_crash_arch: alpha
_crash_os: Digital UNIX
TruCluster Software
_host_version: Digital UNIX V4.0F (Rev. 1227); Tue May 21 15:11:23 MET DST 2002
TruCluster Software V1.6-12 (Rev. 225); 04/05/99 15:11
_crash_version: Digital UNIX V4.0F (Rev. 1227); Tue May 21 15:11:23 MET DST 2002
TruCluster Software V1.6-12 (Rev. 225); 04/05/99 15:11

_crashtime: struct {
    tv_sec = 1069848517
    tv_usec = 388787
}
_boottime: struct {
    tv_sec = 1069661446
    tv_usec = 779696
}
_config: struct {
    sysname = "OSF1"
    nodename = "cubitus"
    release = "V4.0"
    version = "1227"
    machine = "alpha"
}
_cpu: 57
_system_string: 0xffffffffff800b30 = "AlphaServer DS20 500 MHz"
_ncpus: 2
_avail_cpus: 2
_partial_dump: 1
_physmem(MBytes): 2303
_panic_string: 0xfffffc000080bc20 = "kernel memory fault"
_paniccpu: 0
_panic_thread: 0xfffffc007461ca80
_preserved_message_buffer_begin:
struct {
    hdr = struct {
        msg_magic = 0x880524
        msg_bufx = 0x10c8
        msg_bufr = 0xed1
        msg_size = 0x3fe0
    }
    msg_bufc = "Alpha boot: available memory from 0x356e000 to 0x8fffa000
Digital UNIX V4.0F (Rev. 1227); Tue May 21 15:11:23 MET DST 2002
physical memory = 2304.00 megabytes.
available memory = 2249.82 megabytes.
using 8836 buffers containing 69.03 megabytes of memory
Master cpu at slot 0.
Firmware revision: 6.5-13
PALcode: Digital UNIX version 1.92-74
AlphaServer DS20 500 MHz
pci1 at nexus
isp0 at pci1 slot 7
isp0: QLOGIC ISP1040B/V2
isp0: Firmware revision 5.57 (loaded by console)
isp0: Fast RAM timing enabled.
scsi0 at isp0 slot 0
rz5 at scsi0 target 5 lun 0 (LID=0) (DEC RRD47 (C) DEC 1206)
tu0: DECchip 21140: Revision: 2.2
tu0: auto negotiation capable device
tu0 at pci1 slot 9
tu0: DEC TULIP (10/100) Ethernet Interface, hardware address: 00-00-F8-07-C3-BF
tu0: auto negotiation off: selecting 100BaseTX (UTP) port: full duplex
gpc0 at isa0
PCI device at bus 0, slot 8, function 0 could not be configured:
Vendor ID 0x14e4, Device ID 0x16a7, Base class 0x2, Sub class 0x0
Sub-VID 0xe11 Sub-DID 0x601b
    has no matching entry in the PCI option table
pci0 at nexus
isa0 at pci0
gpc1 not probed
gpc1 not probed
ace0 at isa0
ace1 at isa0
lp0 at isa0
fdi0 at isa0
fd0 at fdi0 unit 0
ata0 at pci0 slot 105 (slot 5, function 1)
ata0: CYPRESS 82C693
scsi1 at ata0 slot 0
ata1 at pci0 slot 205 (slot 5, function 2)
ata1: CYPRESS 82C693
scsi2 at ata1 slot 0
usb0 at pci0 slot 305 (slot 5, function 3)
isp1 at pci0 slot 7
isp1: QLOGIC ISP1040B/V2
isp1: Firmware revision 5.57 (loaded by console)
isp1: Fast RAM timing enabled.
scsi3 at isp1 slot 0
rz24 at scsi3 target 0 lun 0 (LID=1) (DEC RZ1CB-CA (C) DEC LYJ0) (Wide16)
rz26 at scsi3 target 2 lun 0 (LID=2) (DEC RZ1CB-CS (C) DEC 0844) (Wide16)
rz27 at scsi3 target 3 lun 0 (LID=3) (DEC RZ2DA-LA (C) DEC N1H1) (Wide16)
rz28 at scsi3 target 4 lun 0 (LID=4) (SEAGATE ST318406LC 010A) (Wide16)
tz29 at scsi3 target 5 lun 0 (LID=5) (DEC TLZ10 (C) DEC 04a8)
mchan0: Module revision = 34
mchan0: jumpered as HUB configuration
mchan0 at pci0 slot 9
Created FRU table binary error log packet
lvm0: configured.
lvm1: configured.
kernel console: ace0
dli: configured
ATM Subsystem configured with 2 restart threads
ATM IFMP: configured
clubase: configured
dlmsl: configured
drd: configured.
cnxagent: configured
dlm: configured.
ATMUNI: configured
ATMSIG: 3.x (module=uni3x) configured
ILMI: 3.x (module=ilmi) configured
ATM IP: configured
ATM LANE: configured.
i2c: Server Management Hardware Present
ADVFS: using 21039 buffers containing 164.36 megabytes of memory
vm_swap_init: warning /sbin/swapdefault swap device not found
vm_swap_init: swap is set to lazy (over commitment) mode
Starting secondary cpu 1
rm_sw_init: begin MC initialization.
rm_boot_am_i_alone: entered
checking for existing memory channel nodes
rm_slave_init
slave unit boot phase 0: checking cables
slave unit boot phase 1: request data ...
slave unit boot phase 2: get lock data from all nodes
slave unit boot phase 3: update request ...
memory channel software inited - node 0 on mc0
memory channel - adding node 2
ccomsub: configured
mcnet: configured
MEMORY CHANNEL API - initializing
Environmental Monitoring Subsystem Configured.
chk_bf_quota: user quota underflow for user 402 on fileset /
memory channel - removing node 2
rm_remove_node: removal took 0x0 ticks
MEMORY CHANNEL API - node 2 has left the cluster
MEMORY CHANNEL API - cleaning up after node 2
ccomsub: Successfully reconfigured for member 2 down
ccomsub: state change detected by this node via callback
memory channel request from node 2
memory channel update request from node 2
memory channel - adding node 2
MEMORY CHANNEL API - node 2 has joined the cluster
chk_bf_quota: group quota underflow for group 7 on fileset /
chk_bf_quota: group quota underflow for group 7 on fileset /

trap: invalid memory read access from kernel mode

    faulting virtual address: 0x00000000000001e9
    pc of faulting instruction: 0xfffffc000025a1d8
    ra contents at time of fault: 0xfffffc0000878750
    sp contents at time of fault: 0xfffffffe8f67f2d8

panic (cpu 0): kernel memory fault
syncing disks... device string for dump = SCSI 0 7 0 0 0 0 0.
DUMP.prom: dev SCSI 0 7 0 0 0 0 0, block 300000
device string for dump = SCSI 0 7 0 0 0 0 0.
DUMP.prom: dev SCSI 0 7 0 0 0 0 0, block 300000
"
}
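
As a cross-check on the crash interval, the _boottime and _crashtime values in the dump above can be subtracted to get the uptime before this panic. The sketch below only redoes that arithmetic with the tv_sec and tv_usec fields copied from the dump; the variable names are mine.

```python
from datetime import timedelta

# tv_sec / tv_usec pairs copied from _boottime and _crashtime in the dump.
boot = timedelta(seconds=1069661446, microseconds=779696)
crash = timedelta(seconds=1069848517, microseconds=388787)

# Time the system stayed up between boot and panic.
uptime = crash - boot

print("uptime:", uptime)
print("uptime in days: %.2f" % (uptime.total_seconds() / 86400.0))
```

The result is a little over two days, which matches the crash frequency described at the top of this message.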

-- 
 --------------------------------------------------------------------------
| Florent BOUCHER                    |                                     |
| Institut des Matériaux Jean Rouxel | Mailto:Florent.Boucher@cnrs-imn.fr  |
| 2, rue de la Houssinière           | Phone: (33) 2 40 37 39 24           |
| BP 32229                           | Fax:   (33) 2 40 37 39 95           |
| 44322 NANTES CEDEX 3 (FRANCE)      | http://www.cnrs-imn.fr              |
 --------------------------------------------------------------------------

