Re: p670 availability

From: Bill Verzal (Bill_Verzal@BCBSIL.COM)
Date: Wed Aug 21 2002 - 12:01:11 EDT


I/O drawers are not hot-pluggable. PCI cards that are not "hot-swappable"
cannot be hot-swapped, even though the Regatta itself supports hot-swap.
Microcode updates require a complete outage. CPU and memory changes of
course do as well. Hot-swappable cards can be changed out on the fly, but
each one requires 25 minutes of CE time due to the number of "mounting
screws" on the carrier.

Also, here are some more issues I raised related to these questions, along
with the answers I received. I had some questions on statements that were
made in the sales manual.

BV

First:

     A minimum of 4 GB of system memory is recommended per LPAR.

     Can you clarify this statement ? Why is it "recommended?"

        This is only a recommendation; depending on the environment, 1 or 2
        GB of memory minimum per LPAR might do the job. This recommendation
        will allow for optimal performance with a minimum configured LPAR
        for the average environment. This will allow for applications to
        utilize 2 to 3 GB of memory while leaving enough memory for the
        Hypervisor and AIX. On average, most applications will size 2 - 3
        GB of memory per CPU. This is just an average; some applications
        may require more or less than the 2 - 3 GB average. For example,
        some web-based or routing applications require only 1 - 1 1/2 GB of
        memory per CPU; in this case you would size your total memory for 2
        - 3 GB of memory per CPU. As another example, some databases can
        utilize 2 - 4 GB of memory per CPU; in this case you would size
        your memory for 3 - 5 GB of memory per CPU. If you are running
        multiple applications in the same LPAR or same OS image then your
        memory requirements will be greater. It is recommended to allocate
        1/2 GB to 1 GB of memory for AIX and the Hypervisor. The memory is
        configured in whole numbers, so as a rule of thumb you size the
        memory at least a gigabyte higher than what the application
        requires.
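
        The sizing rule above can be sketched as a small calculation. This
        is only an illustration: the function name, its parameters, and
        the choice to add the AIX/Hypervisor allowance once per LPAR
        (rather than per CPU) are assumptions, not anything taken from the
        sales manual.

```python
import math


def lpar_memory_gb(cpus, app_gb_per_cpu, os_overhead_gb=1.0):
    """Rough whole-GB memory size for one LPAR (hypothetical helper).

    cpus            -- number of CPUs assigned to the LPAR
    app_gb_per_cpu  -- application memory per CPU, in GB
    os_overhead_gb  -- allowance for AIX and the Hypervisor (0.5 to 1 GB
                       per the answer above); assumed to apply once per LPAR
    """
    if cpus <= 0 or app_gb_per_cpu <= 0 or os_overhead_gb < 0:
        raise ValueError("cpus and app_gb_per_cpu must be positive, "
                         "overhead must not be negative")
    needed = cpus * app_gb_per_cpu + os_overhead_gb
    # Memory is configured in whole gigabytes, so round up.
    return math.ceil(needed)


# One CPU, an application using 2.5 GB, 1 GB for AIX and the Hypervisor.
print(lpar_memory_gb(1, 2.5))
# Four CPUs, a light web workload at 1.5 GB per CPU, 0.5 GB overhead.
print(lpar_memory_gb(4, 1.5, os_overhead_gb=0.5))
```

        With one CPU and a 2.5 GB application this gives 4 GB, which lines
        up with the 4 GB minimum quoted from the sales manual.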

Second:

     Minimum of two internal SCSI hard disks are required per p690 server.
     It is recommended that these disks be utilized as mirrored boot
     devices. These disks should be mounted in the first 7040-61D I/O
     drawer. This configuration provides service personnel the maximum
     amount of diagnostic information if the system encounters errors in
     the boot sequence.

     What does this mean ?

        This configuration will allow you to minimize your system downtime
        due to a disk drive failure or service work on an I/O drawer.

     Boot support is also available from local SCSI, SSA, and Fibre Channel
     adapters, or from networks via ENET or token-ring adapters. The
     pSeries 690 does not support booting from FDDI adapters #2741 or #2742
     located in 7040-61D I/O drawers.

     No questions there...

     Consideration should also be given to the placement of AIX rootvg
     volume group in the first I/O drawer. This allows AIX to boot any time
     other I/O drawers are found offline during boot.

     Why would an I/O drawer be offline, and in what scenarios might this
     affect us ?

        The key reason for an I/O drawer to be offline is for maintenance
        or repair.

     If the boot source other than internal disk is configured, the
     supporting adapter should also be in the first I/O drawer.

     What does this mean ?

        The p690 will provide a very highly available environment without
        HACMP, but depending on how a p690 is configured the availability
        of a system may drop. To minimize downtime due to a single failure
        you should spread your dependencies across the I/O subsystem as
        much as possible. Dependencies are things like rootvg disk drives,
        network adapters, and adapters used to access disk drives (Fibre
        Channel, SSA, and SCSI). The first I/O drawer is the first drawer
        to come online or have power applied. There is a remote chance that
        a problem in one I/O drawer can affect I/O drawers downstream. A
        problem could be disconnecting downstream cables for maintenance or
        service. Let's say you configured all of the I/O resources (rootvg
        Disk Drives, Ethernet adapters, Fibre Channel adapters, etc.) for
        LPAR #1 in the first half of drawer #3. Your single point of
        failure in this case would be any failure that would affect the
        first planar board in drawer #3, and would cause LPAR #1 to go
        down. If, on the other hand, the I/O resources were spread half
        and half across two different drawers (drawers 1 and 3), then you
        could pull things like RIO or power cables and LPAR #1 would not
        go down.
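
        The drawer-spreading advice above can be checked with a short
        script. This is a minimal sketch, assuming the LPAR's I/O
        resources are listed by hand as (kind, drawer) pairs; the function
        name and the resource labels are made up for illustration and do
        not correspond to any AIX or HMC command output.

```python
from collections import defaultdict


def single_points_of_failure(resources):
    """Find resource kinds that live in only one I/O drawer.

    resources -- iterable of (kind, drawer) pairs for a single LPAR,
                 e.g. ("rootvg_disk", 3). Labels are hypothetical.
    Returns a dict mapping each such kind to the one drawer it depends on.
    """
    drawers_by_kind = defaultdict(set)
    for kind, drawer in resources:
        drawers_by_kind[kind].add(drawer)
    # A kind confined to a single drawer goes away if that drawer does.
    return {kind: next(iter(drawers))
            for kind, drawers in drawers_by_kind.items()
            if len(drawers) == 1}


# Everything for LPAR #1 sits in drawer 3: every kind is exposed.
all_in_drawer_3 = [
    ("rootvg_disk", 3), ("rootvg_disk", 3),
    ("ethernet", 3), ("fibre_channel", 3),
]
# Resources split between drawers 1 and 3: nothing is exposed.
split_1_and_3 = [
    ("rootvg_disk", 1), ("rootvg_disk", 3),
    ("ethernet", 1), ("ethernet", 3),
    ("fibre_channel", 1), ("fibre_channel", 3),
]

print(single_points_of_failure(all_in_drawer_3))
print(single_points_of_failure(split_1_and_3))
```

        The sketch only looks at whole drawers. Tracking each planar board
        within a drawer as its own failure domain would be a natural
        refinement.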

-----------------------------------------------------------------------------------------------------------

Bill Verzal
Technical Consultant
Forbes Technical Consulting
(312) 653-3684
bill_verzal@bcbsil.com
MailStop: 27.201C

Holger.VanKoll@SWISSCOM.COM
Sent by: IBM AIX Discussion List <aix-l@Princeton.EDU>
08/21/2002 10:50 AM
Please respond to IBM AIX Discussion List

To: aix-l@Princeton.EDU
cc:
Subject: p670 availability

Hello,

I have to find out

- which parts cause a scheduled downtime when they fail (not
hot-pluggable, but redundant)
- which parts cause an unscheduled downtime when they fail (not redundant)
- which parts cause no downtime when they fail (hot-pluggable).

I know I can find this out by reading a few books, but maybe someone has
done this before?

I am also interested in incomplete answers.

Thank you and regards,

Holger



This archive was generated by hypermail 2.1.7 : Wed Apr 09 2008 - 22:16:09 EDT