Warning: Time problems on SunFire 6800, 4800, etc.

From: Tyler Brenden (tyler.brenden@bankfirstcorp.com)
Date: Wed Aug 06 2003 - 11:46:54 EDT


Hi Sun Managers,

Just wanted to alert you all to a not-so-fun problem I ran into
yesterday that may begin to affect others as well. There is a bug in
the System Controller (SC) firmware on SunFire 6800s (and, I assume,
4800s as well) that causes the system time to become erratic after
530 days of continuous SC uptime. The Sun Bug ID for this is 4876369.
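
For planning purposes, the 530-day figure above can be turned into a
calendar date. The following is only a rough sketch in Python: the
530-day limit is the number reported in this post, the function names
are made up for illustration, and the SC boot time is assumed to be
something you look up yourself and pass in.

```python
from datetime import datetime, timedelta

# Uptime after which the problem was observed, as reported in this post.
SC_UPTIME_LIMIT = timedelta(days=530)


def sc_trouble_date(sc_boot_time):
    """Return the date and time at which the SC reaches 530 days of uptime."""
    return sc_boot_time + SC_UPTIME_LIMIT


def days_remaining(sc_boot_time, now):
    """Return whole days left before the SC reaches the limit.

    A negative result means the limit has already passed.
    """
    return (sc_trouble_date(sc_boot_time) - now).days
```

For example, sc_trouble_date(datetime(2002, 1, 1)) gives the date by
which an SC started on 1 January 2002 should have been restarted.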

Here's the symptom: once the magic 530 days go by, your system time
jumps backwards several weeks and an odd number of hours. After that,
even if you set the system clock manually, the system time moves
backwards one hour exactly every hour. This is a fun one to
troubleshoot! All domains of the SunFire are affected at the same
time.
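
One way to spot the hourly backward step described above is to compare
the wall clock against a clock that is not supposed to be adjusted.
The sketch below is a minimal illustration in Python, not a tested
monitoring tool. It assumes the monotonic clock on the affected domain
is not disturbed by the SC bug, which this post does not confirm, and
the function names and the 5-second tolerance are arbitrary choices.

```python
import time


def clock_stepped_back(wall_before, mono_before, wall_after, mono_after,
                       tolerance=5.0):
    """Return True if the wall clock advanced noticeably less than the
    monotonic clock over the same interval (for example, a backward step)."""
    wall_elapsed = wall_after - wall_before
    mono_elapsed = mono_after - mono_before
    return (mono_elapsed - wall_elapsed) > tolerance


def watch(interval=60.0):
    """Poll both clocks forever and report whenever the wall clock lags."""
    wall, mono = time.time(), time.monotonic()
    while True:
        time.sleep(interval)
        new_wall, new_mono = time.time(), time.monotonic()
        if clock_stepped_back(wall, mono, new_wall, new_mono):
            lag = (new_mono - mono) - (new_wall - wall)
            print(f"wall clock fell behind by about {lag:.0f} seconds")
        wall, mono = new_wall, new_mono
```

Calling watch() in a background session on one domain would print a
line each time the wall clock loses time relative to the monotonic
clock. A deliberate manual clock change would trigger it as well.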

Solution: There is no patch for this problem yet, but Sun is aware of
it. The current workaround is to either a) restart the SCs (which
corrects the problem for another 530 days) or b) add "set tod_broken=1"
to the /etc/system file of each domain and reboot. I'm sure they will
have a firmware patch for the SC soon, but in the meantime either
method above works. The second method tells the OS not to sync its
clock with the hardware clock, which may introduce time drift, so make
sure you use ntp to correct for the drift.
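
If you go the /etc/system route on many domains, it can help to check
that the setting is really present and not commented out. The sketch
below is an illustration only: the function name is hypothetical, it
works on the file's text rather than touching the system, and it
assumes comment lines start with "*" or "#".

```python
import re

# Matches an active line such as: set tod_broken=1  or  set tod_broken = 1
_TOD_BROKEN = re.compile(r"^set\s+tod_broken\s*=\s*(\S+)$")


def tod_broken_enabled(system_file_text):
    """Return True if the last active tod_broken setting in the text is 1."""
    enabled = False
    for raw_line in system_file_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("*") or line.startswith("#"):
            continue
        match = _TOD_BROKEN.match(line)
        if match:
            # A later setting replaces an earlier one in this sketch.
            enabled = match.group(1) == "1"
    return enabled
```

Typical use would be to read /etc/system on each domain, for example
with open("/etc/system").read(), and pass the text to the function.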

I'm assuming more and more people will run into this problem as
their SCs reach this magic uptime. I hope this saves some of you from
beating your heads against the wall for a few hours!

Tyler Brenden
Systems Architect
BANKFIRST
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


