Summary: How to analyze a crash to know if it is hardware or software?

From: Ashraf Baker (ABaker@DANA.com.jo)
Date: Sun May 05 2002 - 10:46:15 EDT


Hi Admins,
 I'm sorry for being late in summarizing this; I left the case under
monitoring, and now it is closed.

Thanks very much to all who responded, especially
 Olle Eriksson, Emmanuel Bove, Lawrie Smith, Raj, Stan Horwitz, Gavin
Kreuiter, alan, Mackun Roberto & Shaukat Riaz

The original message was:

> I have Digital 4100 with 5.1a operating system with patch kit 1.
> the system always crashes and gives different messages one of them is cpu0
> panic : uncorrectable machine error
> I want to ask how could I analyze crash files to know exactly what cause
> this problem,
> how could I know if the problem HW or SW is there diagnostics could I run
> to check Server HW ?
>

 In fact, "cpu0 panic : uncorrectable machine error" was a hardware error.
The problem was found in memory: after replacing the memory boards and
monitoring for 2 weeks, the server worked fine.

 I thought there might be a diagnostics CD to run a full hardware test, but
you can run the test command from the console prompt (>>>) to test the
hardware:
>>>test
  

 Regarding log files, we can make use of the following:

/var/adm/messages                              "ASCII error log"
/usr/sbin/uerf -t s:01-apr-2002 e:11-apr-2002  "binary error log"
/var/adm/crash/crash-data.nn                   "crash file"
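
As a starting point for the ASCII log, the panic lines can be pulled out with
grep. This is only a minimal sketch: find_panics is a hypothetical helper name
(not a system command), and it assumes the panic text appears in the log with
the word "panic" in it, as in the message quoted above.

```shell
#!/bin/sh
# Sketch: print every line mentioning "panic" from an ASCII system log.
# find_panics is a hypothetical helper, not part of the operating system.
# The default path is the ASCII log named in this summary.
find_panics() {
    logfile=${1:-/var/adm/messages}
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 1
    fi
    # -i: match "panic", "Panic", "PANIC", ...
    grep -i 'panic' "$logfile"
}
```

Calling find_panics with no argument reads /var/adm/messages; a different log
file can be given as the first argument.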

uerf is not the only tool for analyzing the /var/adm/binary.errlog log file;
we can use uerf, dia, or ca.
With Compaq Analyzer (ca) we can use "ca x analyse",
or with dia, "dia -R".
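
Not every system has all three tools installed, so it may help to check which
one is present before starting. The following is a rough sketch only:
first_available is a hypothetical helper name, and it simply reports the first
command from its argument list that the shell can find. It does not run the
tool or assume any of its options.

```shell
#!/bin/sh
# Sketch: print the first command from the argument list that exists on
# this system. first_available is a hypothetical helper, not a system tool.
first_available() {
    for tool in "$@"; do
        # command -v succeeds if the name resolves to something runnable
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    echo "none of: $*" >&2
    return 1
}

# Possible use on the server, with the tool names from this summary:
#   tool=$(first_available ca dia /usr/sbin/uerf) && echo "use $tool"
```

The order of the arguments is the order of preference; the order shown in the
comment is just one choice.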

 Kernel debugging can be used to collect more information; refer to the
Kernel Debugging book.
Besides crash-data.nn, 2 other files are copied to /var/adm/crash/ and used
in analyzing a crash: the kernel and the crash dump. However, I think this is
a complex approach.
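
Before going into kernel debugging, it is worth seeing which crash-data files
exist at all. A minimal sketch, assuming the files follow the crash-data.nn
naming mentioned above; list_crash_data is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the crash-data.* files in a crash directory.
# list_crash_data is a hypothetical helper; the default directory is the
# one named in this summary.
list_crash_data() {
    dir=${1:-/var/adm/crash}
    found=1
    for f in "$dir"/crash-data.*; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -f "$f" ] || continue
        found=0
        echo "$f"
    done
    # Returns 0 if at least one file was listed, 1 otherwise.
    return $found
}
```

Each file it prints is text and can be read with more or any pager.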

 Thanks again to all who responded.

Best regards

ENG. Ashraf Baker
Technical Support Engineer
DANA Information Systems
 
Tech. Dept. : +962 6 5169080
FAX : +962 6 5160663
Mobile : +962 79 5692240

  


