SUMMARY: ES40 Server Crash

From: Copper, Steve (scopper@westernpower.co.uk)
Date: Mon Oct 13 2003 - 06:43:43 EDT


Apologies for the late summary, but it has taken this long to narrow down
where the problem lies.

Thanks to Kevin Raubenolt, Dr. Thomas Blinn, Mike Kirkland and Peter Jakobs,
who all pointed out that it is either a memory, CPU or backplane problem.

After running tests (including memexer_mp) from the boot prompt, it looks
likely that there is a memory problem with some of the DIMMs, which will be
replaced in due course.

Thanks again.

Steve

-----Original Message-----
From: Copper, Steve [mailto:scopper@westernpower.co.uk]
Sent: 30 September 2003 13:42
To: Tru64 (E-mail)
Subject: ES40 Server Crash


Hi all,

I have an ES40 running V5.1A with pk4 applied which, for no apparent reason,
crashes/halts/hangs. The only way to recover the server is to switch the
power off and then back on again. This does the trick and the server boots up
just fine. When it hangs, there is absolutely no response from the console;
the only way we see the server has gone is that it disappears off the
network.

I have been through various logfiles and there is absolutely nothing in them
that points to why the server hangs. Compaq Analyze is installed on this
server, and the only thing it reported just before the hang was the entry
below relating to a Time Stamp Message. However, on the previous occasions
that the server hung there was no Time Stamp Message. In fact there are
several Time Stamp Messages at times when the system did not hang, so I doubt
this is the cause.

The system is running an Oracle v 8.1.7.3 database. Looking through the
database alert log, nothing was recorded until the system was restarted.

Does anyone have any clues as to what is going on or what else I can do to
prevent this from happening again?

Thanks in advance

Steve Copper

Description: Tru64 UNIX Time Stamp Message at Sep 30, 2003 1:29:28 PM
GMT+01:00 from wpde7 in file /var/adm/binary.errlog
File: /var/adm/binary.errlog
================================================================================
Event_Leader xFFFF FFFE
Header_Length 260
Event_Length 272
Header_Rev_Major 2
Header_Rev_Minor 0
OS_Type 1 -- Tru64 UNIX
Hardware_Arch 4 -- Alpha
CEH_Vendor_ID 3,564 -- Compaq Computer Corp
Hdwr_Sys_Type 34 -- Tsunami/Typhoon Corelogic
Logging_CPU 0 -- CPU Logging this Event
CPUs_In_Active_Set 4
Major_Class 310
Minor_Class 255
Entry_Type 310 -- Tru64 UNIX Time Stamp Message
DSR_Msg_Num 1,820 -- Compaq AlphaServer ES40
                                               .... CPU Slots: 4 (667Mhz)
                                               .... PCI Slots: 10
                                               .... MMB Slots: 8 (DIMMs)
Chip_Type 11 -- EV67 - 21264A
CEH_Device 255
CEH_Device_ID_0 x0000 03FF
CEH_Device_ID_1 x0000 0007
CEH_Device_ID_2 x0000 0007
Unique_ID_Count 295
Unique_ID_Prefix 8,996
Num_Strings 5
TLV_Time_as_Local Sep 30, 2003 1:29:28 PM GMT+01:00
TLV_Computer_Name wpde7
TLV_DSR_String Compaq AlphaServer ES40
TLV_OS_Version Compaq Tru64 UNIX V5.1A (Rev. 1885)
TLV_Sys_Serial_Num AY04406981
Entry_Type 310
                       NOTE
   - TIME STAMP encountered in Event Log File.
   - Tru64 UNIX Time Stamp Event contains only an event header,
      as seen above, and no content following the event header.
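
For anyone wanting to check many decoded entries at once rather than by eye,
the following is a minimal Python sketch that pulls the "Name value --
description" fields out of a decoded entry like the one above. The field
pattern (names containing underscores, an optional " -- " description) is an
assumption taken from this excerpt only, not a documented format, and the
function names are hypothetical.

```python
import re

# Assumed field layout, inferred from the excerpt: "Name value -- description".
# Field names are assumed to start with a capital and contain an underscore,
# which skips prose lines such as the NOTE text.
FIELD_RE = re.compile(
    r"^(?P<name>[A-Z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)+)\s+"
    r"(?P<value>.*?)(?:\s+--\s+(?P<desc>.*))?$"
)


def parse_event_header(text):
    """Return {field name: (value, description or None)} for one decoded entry."""
    fields = {}
    for raw_line in text.splitlines():
        match = FIELD_RE.match(raw_line.strip())
        if not match:
            continue  # continuation lines, separators and notes are ignored
        # Keep the first occurrence; Entry_Type appears twice in the dump.
        fields.setdefault(
            match.group("name"),
            (match.group("value").strip(), match.group("desc")),
        )
    return fields


def is_time_stamp(fields):
    """True when Entry_Type is 310, the Time Stamp value shown in the excerpt."""
    value, _ = fields.get("Entry_Type", ("", None))
    return value == "310"
```

Feeding each decoded entry through parse_event_header and filtering with
is_time_stamp would give the list of Time Stamp times (TLV_Time_as_Local) to
compare against the times the server was power-cycled.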

________________________________________________________________________
This email has been scanned for all viruses by the MessageLabs Email
Security System. For more information on a proactive email security
service working around the clock, around the globe, visit
http://www.messagelabs.com
________________________________________________________________________




This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:49:38 EDT