Re: hdisks missing

From: Klaus Oberle (Klaus.Oberle@LINDE-MH.DE)
Date: Fri Apr 25 2003 - 06:44:25 EDT


Thanks Simon,

> I guess from your post that you have a resource group running on each node,
> in mutual takeover. So some disks are used by one node, some by the other
> when everything's running normally. At the moment, each node is OK - taken
> in isolation - so the actual disk drives must be working.

YES.

They are two MCA Highnodes (Node1, 7x24 production + Node7, testbox)
connected to one 7133-020. Each node has two SSA Enhanced Adapters with
identical FW (3202). On all adapters only the A ports are used, and each
node can see pdisk0 to pdisk7 in one loop and pdisk8 to pdisk15 in a second
loop.

To clarify the HW upgrade:
We inherited a Highnode (fully populated with 8 procs and 4GB of RAM) from
another company. Our Node1 had only 4 procs and 2GB, so we decided -
together with our IBM TA - to replace the complete CPU/RAM section with
the one from the inherited node. The I/O part of Node1 (including cabling)
was therefore left untouched. After this modification, once Node1 had
booted successfully, we plugged the 2GB of RAM from the "old" Node1 into
Node7.

maymap shows the loops correctly on both nodes and lscfg lists all pdisks
on both nodes.

I did a "varyoffvg testvg" on Node7 and removed all hdisks on Node1 owned
by Node7 with rmdev -dl. Then I ran cfgmgr, which brought the disks back:

hdisk4   00061189b103c28c  testvg
hdisk5   00061189b66695a2  testvg
hdisk6   00061189b66699e4  testvg
hdisk7   00061189b66840af  testvg
hdisk8   00201586ae7f0a89  prodvg
hdisk9   00201586ae7f0d9f  prodvg
hdisk10  00201586ae7f10a2  prodvg
hdisk11  00062764f9e07176  prodvg
hdisk12  00061189b0fce6b7  testvg
hdisk13  00061189b0fcfd25  testvg
hdisk14  00061189b0fd03cb  testvg
hdisk15  00061189b10026fd  testvg
hdisk16  00061189b1002a6e  prodvg
hdisk17  0020158654365297  prodvg
hdisk18  00061189b1002e16  prodvg
hdisk19  00062764f9e075b3  prodvg

However, this didn't help on Node7. After cfgmgr the hdisks are still
missing. Of course, I cannot vary off prodvg, but I believe that is not
necessary, is it?
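
A minimal sketch of how the lspv listings from the two nodes could be
compared by PVID, to see exactly which physical volumes have no hdisk on
one node. The function names (parse_lspv, missing_pvids) are hypothetical,
and the Node7 sample is invented for illustration only (it is not real
output from the machine); the Node1 rows are copied from the listing above.

```python
def parse_lspv(text):
    """Map PVID -> (hdisk name, volume group) from lspv-style text."""
    disks = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        name, pvid, vg = fields[0], fields[1], fields[2]
        if pvid.lower() == "none":
            continue  # a disk without a PVID cannot be matched across nodes
        disks[pvid] = (name, vg)
    return disks


def missing_pvids(reference, other):
    """Return the PVIDs present in `reference` but absent from `other`."""
    return sorted(set(reference) - set(other))


# Four rows copied from the Node1 listing above.
NODE1 = """\
hdisk4 00061189b103c28c testvg
hdisk5 00061189b66695a2 testvg
hdisk8 00201586ae7f0a89 prodvg
hdisk9 00201586ae7f0d9f prodvg
"""

# HYPOTHETICAL Node7 listing, made up for this example: only the testvg
# disks are configured, and the hdisk numbers differ from Node1.
NODE7 = """\
hdisk2 00061189b103c28c testvg
hdisk3 00061189b66695a2 testvg
"""

node1 = parse_lspv(NODE1)
node7 = parse_lspv(NODE7)
for pvid in missing_pvids(node1, node7):
    name, vg = node1[pvid]
    print(f"{pvid} ({name} on Node1, {vg}) has no hdisk on Node7")
```

The comparison keys on the PVID rather than the hdisk name because hdisk
numbers are assigned per node and need not match between the two systems.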

/klaus

From: "Green, Simon" <Simon.Green@EU.ALTRIA.COM>
Sent by: IBM AIX Discussion List <aix-l@Princeton.EDU>
Date: 25.04.2003 11:35
Please reply to: IBM AIX Discussion List
To: aix-l@Princeton.EDU
Cc:
Subject: Re: hdisks missing

I guess from your post that you have a resource group running on each node,
in mutual takeover. So some disks are used by one node, some by the other
when everything's running normally. At the moment, each node is OK - taken
in isolation - so the actual disk drives must be working.

I can't really think of anything which would definitely cause the sort of
problem you're seeing, but here are a few things to check: maybe one of
them will suggest something to you.

What sort of SSA drawer is it? If it's a 7133-020 or D40, how is it
cabled and how are the bypass cards set?

What does SSA Link Verification tell you? (From the diagnostic Service
Aids.) Run "maymap" if you have it. Although you have not made any
deliberate changes to the SSA loop it's possible that the cables were
disconnected in order to gain access to the node for the upgrade. Are you
certain everything got put back in the right place?

Do you still have all of the volume groups defined on both systems? (If
you've been deleting and re-defining disks, you'll probably need to export
and re-import some of these.)

What are the microcode levels of the adapters? Make sure that they're both
the same.

Did you re-boot the two nodes simultaneously? I have had problems -
particularly with old MCA nodes using Enhanced 4-port Adapters - where, if
two nodes in the same loop try to configure their SSA devices at the same
time, strange things can happen, including devices going missing. Always
stagger a reboot - even if it's only by half a minute or so.

I think I'd want to shut down both nodes, then reboot just one of them and
examine the SSA devices BEFORE re-starting HACMP. If you have HACMP
starting automatically, disable that temporarily. Once one node is OK,
boot the second one. Only when both nodes' SSA config is OK should you
start HACMP.

Simon Green
Altria ITSC Europe s.a.r.l.

> -----Original Message-----
> From: Klaus Oberle
> Sent: 24 April 2003 12:01
> To: aix-l@Princeton.EDU
> Subject: hdisks missing
>
>
> Hi *,
>
> I have an HACMP cluster consisting of two old SP Highnodes (AIX 4.3.3,
> ML 08) which share one SSA drawer. Recently they were both upgraded by
> adding additional procs and memory from other obsolete Highnodes. After
> the upgrade, both machines came up and the cluster applications run fine.
> The problem is that "lspv" on both nodes only lists hdisks which belong
> to the active VG of that node - hdisks from the other node are no longer
> there. On the other hand, every node can see, besides its own pdisks, the
> pdisks that belong to the other node. (OK - cabling or anything else
> wasn't changed during the hardware upgrade.)
>
> To get the missing hdisks back (for proper failover), I removed them
> first (rmdev -dl hdiskX ..) and ran "cfgmgr" without success. The hdisks
> still remain lost. Any hints how to solve this?

This archive was generated by hypermail 2.1.7 : Wed Apr 09 2008 - 22:16:46 EDT