Getting rid of event warning after memory replacement

From: Andreas Höschler (ahoesch@smartsoft.de)
Date: Thu Feb 09 2006 - 10:02:24 EST


Dear managers,

Our Solaris 10 (0305) box reported an error with a memory module:

Feb 9 15:20:37 sun SOURCE: cpumem-diagnosis, REV: 1.3
Feb 9 15:20:37 sun EVENT-ID: 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec
Feb 9 15:20:37 sun DESC: The number of errors associated with this
memory module has exceeded acceptable levels. Refer to
http://sun.com/msg/SUN4U-8000-35 for more information.
Feb 9 15:20:37 sun AUTO-RESPONSE: Pages of memory associated with this
memory module are being removed from service as errors are reported.
Feb 9 15:20:37 sun IMPACT: Total system memory capacity will be
reduced as pages are retired.
Feb 9 15:20:37 sun REC-ACTION: Schedule a repair procedure to replace
the affected memory module. Use fmdump -v -u <EVENT_ID> to identify
the module.

The memory modules were replaced, but after the reboot we still get:

fmdump
TIME UUID SUNW-MSG-ID
Jan 31 17:39:15.6729 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
Jan 31 18:00:25.8867 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
Feb 02 11:42:13.6538 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
Feb 06 10:34:19.7219 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
Feb 06 11:45:28.7726 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
Feb 06 15:44:57.0716 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
Feb 06 21:25:21.5122 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
Feb 06 21:41:56.1696 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
Feb 09 15:20:36.9320 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35

fmdump -v -u 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec
TIME UUID SUNW-MSG-ID
Jan 31 17:39:15.6729 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
    95% fault.memory.bank
          FRU: mem:///component=MB/P0/B0:B0/D0,B0/D1
         rsrc: mem:///component=MB/P0/B0:B0/D0,B0/D1

Jan 31 18:00:25.8867 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
    95% fault.memory.bank
          FRU: mem:///component=MB/P0/B0:B0/D0,B0/D1
         rsrc: mem:///component=MB/P0/B0:B0/D0,B0/D1

Feb 02 11:42:13.6538 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
    95% fault.memory.bank
          FRU: mem:///component=MB/P0/B0:B0/D0,B0/D1
         rsrc: mem:///component=MB/P0/B0:B0/D0,B0/D1

Feb 06 10:34:19.7219 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
    95% fault.memory.bank
          FRU: mem:///component=MB/P0/B0:B0/D0,B0/D1
         rsrc: mem:///component=MB/P0/B0:B0/D0,B0/D1

Feb 06 11:45:28.7726 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
    95% fault.memory.bank
          FRU: mem:///component=MB/P0/B0:B0/D0,B0/D1
         rsrc: mem:///component=MB/P0/B0:B0/D0,B0/D1

Feb 06 15:44:57.0716 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
    95% fault.memory.bank
          FRU: mem:///component=MB/P0/B0:B0/D0,B0/D1
         rsrc: mem:///component=MB/P0/B0:B0/D0,B0/D1

Feb 06 21:25:21.5122 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
    95% fault.memory.bank
          FRU: mem:///component=MB/P0/B0:B0/D0,B0/D1
         rsrc: mem:///component=MB/P0/B0:B0/D0,B0/D1

Feb 06 21:41:56.1696 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
    95% fault.memory.bank
          FRU: mem:///component=MB/P0/B0:B0/D0,B0/D1
         rsrc: mem:///component=MB/P0/B0:B0/D0,B0/D1

Feb 09 15:20:36.9320 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
    95% fault.memory.bank
          FRU: mem:///component=MB/P0/B0:B0/D0,B0/D1
         rsrc: mem:///component=MB/P0/B0:B0/D0,B0/D1

The latest event (Feb 09 15:20:36.9320) was logged after the replacement.
Is this a sign that something is still wrong, or is this normal?
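
To make the timing question concrete, here is a minimal Python sketch that
parses the fmdump summary lines shown above and lists the events logged after
a given cutoff. Two things in it are assumptions: the year (2006, taken from
the date of this posting, since fmdump does not print one) and the cutoff
time, which is a placeholder because the actual replacement time is not
recorded here. The function names are made up for illustration.

```python
from datetime import datetime

# Summary lines copied from the fmdump output above.
FMDUMP_OUTPUT = """\
TIME UUID SUNW-MSG-ID
Jan 31 17:39:15.6729 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
Jan 31 18:00:25.8867 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
Feb 02 11:42:13.6538 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
Feb 06 10:34:19.7219 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
Feb 06 11:45:28.7726 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
Feb 06 15:44:57.0716 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
Feb 06 21:25:21.5122 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
Feb 06 21:41:56.1696 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
Feb 09 15:20:36.9320 45744a3d-3fd9-c77f-c09a-f6cf8aacf8ec SUN4U-8000-35
"""


def parse_fmdump(text, year):
    """Return (timestamp, uuid, msg_id) tuples from fmdump summary output.

    The year must be supplied by the caller because the output omits it.
    Month names are parsed with %b, so an English/C locale is assumed.
    """
    events = []
    for line in text.splitlines():
        fields = line.split()
        # Event lines have five fields; the header has three and is skipped.
        if len(fields) != 5:
            continue
        month, day, clock, uuid, msg_id = fields
        # %f accepts one to six fractional digits, so ".9320" parses fine.
        stamp = datetime.strptime(
            f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S.%f"
        )
        events.append((stamp, uuid, msg_id))
    return events


def events_after(events, cutoff):
    """Return only the events logged strictly after the cutoff."""
    return [event for event in events if event[0] > cutoff]


# ASSUMPTION: placeholder replacement time; substitute the real one.
replacement_time = datetime(2006, 2, 9, 12, 0)

events = parse_fmdump(FMDUMP_OUTPUT, year=2006)
late = events_after(events, replacement_time)
for stamp, uuid, msg_id in late:
    print(stamp.isoformat(), uuid, msg_id)
```

With the placeholder cutoff, only the Feb 09 15:20:36 entry is listed. Note
that it carries the same UUID as the eight earlier entries.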

Thanks a lot!

Regards,

   Andreas
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


