SUMMARY: What happens on a T3ES SUN Raid array when 0 days battery life left?

From: Bernhard Sadlowski (sadlowsk@mathematik.uni-bielefeld.de)
Date: Thu Oct 20 2005 - 16:03:01 EDT


Hello,

The answer was not clear at first. Todd reported that he has seen T3s
fail when the battery life expires.

Chuck has reset the counters and can live with it, as these are not super
critical systems for him. Eugene had no problems with the batteries even
after 2 more years. Replacing the batteries costs a lot of $$$$.

Allan and Jeremy suggested that the answer b) is right: the write back cache
will be disabled. The result is severely inhibited performance.

Jeremy and others also suggested extending the battery life cycle to 3
years. This is supported by SUN, and a "T3 extender" script is even available
from SUN with patches 109115 (T3) and 112276 (T3+). All it does is:

    .id write blife u1pcu1 36
    .id write blife u1pcu2 36

etc.
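
For a partner pair, the "etc." presumably covers the remaining power/cooling
units. A minimal sketch that only builds the command strings (it does not
talk to the array): the helper name blife_commands is hypothetical, the
unit names u1pcu1 ... u2pcu2 are taken from the log messages in the original
question, and reading the value 36 as months (3 years) is an assumption.

```python
# Hypothetical helper: builds the command strings shown above for every
# power/cooling unit. It only prints text; nothing is sent to a T3.
def blife_commands(units=(1, 2), pcus=(1, 2), value=36):
    # value=36 mirrors the commands above; interpreting it as months
    # (3 years) is an assumption.
    return [f".id write blife u{u}pcu{p} {value}" for u in units for p in pcus]


for cmd in blife_commands():
    print(cmd)
```

The printed lines would still have to be entered by hand at the T3 prompt.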

We don't have a support contract for the T3 storage array and we also don't
plan to replace the 2 year old (but still good!) batteries, so this is our
solution!

Disclaimer: if your batteries fail (i.e. fail the refresh cycle), you had
better get them replaced, or you will be punished with bad performance at
the very least...

Many thanks go to

Allan West
Todd Marine
Aaron Lineberger
Chuck
Jeremy Ahl
Anjan Dave
Sean Burke
JV
Charles Mengel
Eugene Schmidt

Thanks also to Stephen Moccio, Stefan Pohl, Bill Heese, Orville Lewis,
Juergen Waiblinger, Robert Murray, Florian Laws for the informative
Out-of-Office replies... ;-)

Bernhard

* Allan West wrote:

The answer is b. I'll forward my SUMMARY from the time I had to
research them separately. Allan

* Todd Marine wrote:

You NEED to change the batteries.
The T3s 'fail' when the battery life expires.
(I have seen it first hand)

Todd also wrote:

Resetting the counter *might* help since there is really nothing wrong with
the batteries.

"The counter prevails over reality." -JSS

I guess the question you need to ask yourself is about potential downtime.

If the T3s are hosting a business critical application which can not have
downtime I would suggest planning some off-hours downtime to change the
batteries.

The situation I was personally involved in...once the battery life had
expired EVERY user (hundreds) accessing the information on the T3s came to a
screeching halt. :-\

* Chuck wrote:

I had some T3's that sat for a while and all claim to have dead batteries. I
just went through and reset the counters, and the amber lights went out and
all "seems fine". These batteries are expected to run the drives if the array
loses power until the data can be written. If they are under maintenance,
have the PS replaced, as the batteries are in the PS. My T3's are no longer
on super critical systems, so I can live with the small risk.

* Jeremy Ahl wrote:

After the counter reaches zero, the cache will be disabled, and performance
will be severely inhibited. There is a T3 extender script which will extend
the battery to 3 years [...]
All the script does is: .id write blife u1pcu1 36 etc

* Anjan Dave wrote:

I've replaced a bunch of these.

Nothing happens, even months after the battery is completely dead. Yes, you
won't have cache.

Battery life is 2 years max and you'll reset the date upon installing a new
one. It will take about 12 hours for the new one to start working (fully
charged) and the controller will automatically start using the cache.

You can either buy the batteries only (remove each PS one by one, array
stays online if both PSs were working fine) or buy the entire PS, which
includes the battery.

* "JV" wrote:

When the counter reaches zero, it will initiate the self-destruct sequence
with a loud siren and explode.

(J/K), no just B) will happen.

And Sun does replace T3 batteries for those shops with support contracts.

* Charles Mengel wrote:

All LUNS should still work, but write-through will be turned on.
Depending on the version of FW, the .bat command works. In 2.x.x and before
the command should work. In 3.1.x, you have to enter a new "superuser" mode
to use the command. At the prompt, type "sun" and give it the password of
"arrayservice". The .bat commands I ran after replacing a battery were:

".bat -s u1pcu1" to verify status
".bat -i u1pcu1" to reset the status from failed to normal.

Make sure to let the new battery charge for 12 hours before changing the
status.
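
The timing advice can be sketched as a small calculation. This is only an
illustration: the function names earliest_reset and reset_commands are
hypothetical, the 12-hour figure comes from the reply above, and the code
merely builds the two command strings quoted there without contacting an
array.

```python
from datetime import datetime, timedelta

CHARGE_TIME = timedelta(hours=12)  # charge period mentioned in the reply


def earliest_reset(installed_at):
    """Earliest time at which the status reset should be attempted."""
    return installed_at + CHARGE_TIME


def reset_commands(pcu):
    """Command strings from the reply: verify status, then reset it."""
    return [f".bat -s {pcu}", f".bat -i {pcu}"]


installed = datetime(2005, 10, 20, 9, 0)  # example install time (made up)
print("Do not reset before:", earliest_reset(installed))
for cmd in reset_commands("u1pcu1"):
    print(cmd)
```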

* Eugene Schmidt wrote:

Depending on factors like the period on the shelf before being sold, I have
had T3's (3 x partner pairs) being installed with the battery just short of
its expiry date.

Using the procedure described, I reset the dates and monitored. No ill
effect and all works fine 2 yrs later.

Sun also has a doc out about changing the battery life to 3 yrs based on
field experience that the batteries last longer than 2 yrs.
 
However, it would be silly to spend money (and lots of it $$$) on new
batteries for the sake of extending the life of these arrays.

On 20 Oct 2005 15:15, Bernhard Sadlowski <sadlowsk@mathematik.uni-bielefeld.de> wrote:
> I see the following messages from a T3ES pair:
>
> Oct 19 21:33:57 t3a SCHD[1]: W: u1pcu1 33 days battery life left, Replace battery.
> Oct 19 21:33:57 t3a SCHD[1]: W: u1pcu2 33 days battery life left, Replace battery.
> Oct 19 21:33:57 t3a SCHD[1]: W: u2pcu1 33 days battery life left, Replace battery.
> Oct 19 21:33:57 t3a SCHD[1]: W: u2pcu2 33 days battery life left, Replace battery.
>
> Firmware version: T3B Release 2.01.00 2002/03/22 18:35:03 (x.x.x.x)
>
> What happens, if we don't replace the battery and the counter goes to 0
> days?
>
> a) Nothing?
>
> b) LUNs will still be accessible, but i.e. write-behind caching is turned
> off? See:
>
> http://www.sun.com/storage/midrange/t3es/faq.xml Q.31
>
> "if a problem is detected, future refresh operations are suspended until the
> problem has been fixed, and write-behind caching is turned off automatically
> as a precaution."
>
> c) LUNs will not be accessible or even a T3ES shutdown??
>
> I tried to find the answer in the docs and with google, but didn't succeed.
>
> I would only worry about c), since we would then have to speed up the
> migration of our data from the T3 to a new array. We don't want to change
> the batteries, as the T3 will soon be out of production. There is also no
> support for this array anymore.
>
> Bonus Question: Until now every battery seems to be ok and on each refresh
> cycle "battery passed health check". Is it safe enough for a short period to
> reset the counter for each battery? The commands are:
>
> .bat -n u1pcu1
> .id write busage u1pcu1 0
>
> Thanks!
> Bernhard
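
The warning lines in the quoted message follow a regular pattern, so the
remaining days per battery can be extracted from a saved syslog with a small
script. A minimal sketch, assuming the message format is exactly as shown in
the question; the function name days_left is hypothetical.

```python
import re

# Matches e.g. "u1pcu1 33 days battery life left" as seen in the question.
PATTERN = re.compile(r"\b(u\d+pcu\d+) (\d+) days battery life left")


def days_left(lines):
    """Map each power/cooling unit to the days reported in its warning."""
    result = {}
    for line in lines:
        match = PATTERN.search(line)
        if match:
            result[match.group(1)] = int(match.group(2))
    return result


sample = [
    "Oct 19 21:33:57 t3a SCHD[1]: W: u1pcu1 33 days battery life left, Replace battery.",
    "Oct 19 21:33:57 t3a SCHD[1]: W: u2pcu2 33 days battery life left, Replace battery.",
]
print(days_left(sample))
```
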
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


