SUMMARY: Boot problem Tru64 5.0a

From: Christian Wessely (christian.wessely@uni-graz.at)
Date: Tue Jan 24 2006 - 03:01:03 EST


Hello Admin wizards,

As always, the list proved to be very helpful :o)

First, the list of contributors to the solution (in order of appearance):
BL Venkatesh, Johan Brusche, Derek Gatherer, Reiner Dassing, Peter
Stern, Tom Blinn (still with us - whooopeee!), Howard Arnold, Brad Bell,
Denise McCracken, Roberto Mackun and John Lanier.

Thank you all - I realize that it is the same names that help me out
most of the time, which makes me think about a worldwide
tru64-unix-managers mailing list meeting sometime ... where I would have
to buy a lot of drinks for you all ;o)

Consider yourselves warned: this is a LONG summary!

Solution:

Several ways were suggested, mainly proposing to use the unused dsk3c as
additional swap space:

---------------------------------------------------
Johan Brusche:
Use rmfset and rmfdmn to remove the filesets and the domain from dsk3c,
add the disk to rootdg,
use "volassist make layout=mirror" to create swapvol2, and add
swapvol2 to the swap as indicated earlier:

swapon /dev/disk/dsk3c
If the usage of that partition is not marked as swap, then run
disklabel -sF Swap dsk3c.
Finally, for the next reboot, add this swap device to /etc/sysconfigtab:

vm:
swapdevice = /dev/disk/dsk0b,/dev/disk/dsk3c

------------------------------------------------------
I refrained from this, however, because I found the unused partition h
on dsk0 and preferred to put that unused space to work by adding it to
swap.

Tom Blinn:
Using normal LSM commands (I never use LSM myself, so you are on
your own here) create a basic "disk" from unused space. Then just
use the "swapon" command to add it as swap. If you want it to be
added "permanently" you need to edit /etc/sysconfigtab (no longer
/etc/fstab, that was V4.0x) to add the swap device to the list of
swap devices. Then monitor the system. Being out of swap isn't a
good thing, and it can cause critical system processes to fail. In
the normal state of affairs the kernel will survive, so this might
not explain why the system was shut down. Then again, it might: you
can never know for sure what failing privileged processes might do
unless you read all their code very carefully.
-------------------------------------------------------
Peter Stern and Denise McCracken pointed out that it might be
appropriate to change the swap policy of the system.

Here is the longer answer by Peter Stern:

As far as I remember, a paging file has to be a block special device
(e.g. /dev/rz3b) and not an AdvFS fileset. You might check the man
pages for swapon, the command for adding a swap file. Also, note that:

There are two strategies for swap space allocation: immediate mode and
deferred or over-commitment mode. The two strategies differ in the
point in time at which swap space is allocated. If immediate mode is
used, swap space is allocated when modifiable virtual address space is
created. If deferred mode is used, swap space is not allocated until
the system needs to write a modified virtual page to swap space.
Immediate mode is the default swap space allocation strategy.

Immediate mode is more conservative than deferred mode because each
modifiable virtual page is assigned a page of swap space when it is
created. If you use the immediate mode of swap space allocation, you
must allocate a swap space that is at least as large as the total amount
of modifiable virtual address space that will be created on your system.
Immediate mode requires significantly more swap space than deferred
mode because it guarantees that there will be enough swap space if every
modifiable virtual page is modified.

Et cetera (see the man pages for more, including how to determine which
swap space allocation mode is being used).
This might solve your problem in a painless way. If, however, you really
need more physical swap space than you have, bad things can happen (i.e.
the system decides which things get swapped out).
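
As a rough illustration of the sizing rule quoted above (this is not part
of Peter's reply): under immediate mode the configured swap has to cover
all modifiable virtual address space, so a quick sanity check is to
subtract the sizes of the large processes from the swap size. The helper
name swap_headroom_kb is made up for this sketch, and feeding it VSZ
values is an assumption that overestimates the need, since VSZ also
counts read-only and shared mappings.

```shell
#!/bin/sh
# swap_headroom_kb SWAP_KB SIZE_KB...
# Hypothetical helper: prints SWAP_KB minus the sum of the given sizes.
# A negative result suggests a shortfall under immediate mode.
# Assumption: all arguments are whole numbers in kilobytes.
swap_headroom_kb() {
    swap_kb=$1
    shift
    total=0
    for kb in "$@"; do
        total=$((total + kb))
    done
    echo $((swap_kb - total))
}

# Example: 1 GB of swap against a 265M and a 147M process:
#   swap_headroom_kb 1048576 271360 150528
```

If the result is negative or close to zero, either more swap or the
deferred allocation mode is worth considering.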
------------------------------------------------------------
What I actually did was make the extra space on dsk0h usable by LSM;
afterwards, I used the lsmsa utility to resize the swap partition. This
worked painlessly, roughly following the procedure John Lanier suggested:

1. Initialize the disk into LSM:

#voldisk -f init dsk3c type=nopriv

2. Add the disk into LSM:

#voldg adddisk swapvol2=dsk3c
(where "swapvol2"=disk media name and "dsk3c"=disk access name)

3. Create the additional/secondary swap volume:

#volassist -U gen make swapvol02 size_of_dsk3c layout=nolog swapvol2

NOTE:
Swap volumes, if mirrored, should not use dirty region logging (DRL).
After you create a swap volume, you should also modify the volume's
recovery policy, so that LSM will not resynchronize the plexes after a
system failure.

NOTE:
On a standalone system, use the "gen" usage type for secondary swap volumes.

4. Set up the additional/secondary swap volume for use by the system:

#sysconfig -q vm >stanza-file
(the "swapdevice" is defined in the "vm" kernel subsystem)

#vi stanza-file

.........

Add the following line:

swapdevice = /dev/vol/rootdg/swapvol,/dev/vol/rootdg/swapvol02

Save your work:

<esc>
:wq

.........

5. Merge the changes in the stanza-file into the "vm" kernel subsystem:

#sysconfigdb -m -f stanza-file vm

NOTE:
The parameter "vm: swapdevice" requires a reboot to become active; if the
command "sysconfig -q vm swapdevice" is issued after step 6, it will still
report the original swapdevice value until after the next system reboot.

6. To turn on the additional/secondary swap device prior to reboot:

#swapon -a (Installs all paging partitions specified in the
/etc/sysconfigtab file)
#swapon -s (to check current swap device status)
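
Step 4 above edits the swapdevice line by hand. A minimal sketch of a
helper that builds the new value without duplicating a device that is
already listed could look like the following. The function names
append_swapdevice and make_vm_stanza are made up for this example; the
script only prints text and does not touch the system.

```shell
#!/bin/sh
# append_swapdevice CURRENT NEW
# Hypothetical helper: prints the comma-separated device list CURRENT
# with NEW appended, unless NEW is already in the list.
append_swapdevice() {
    current=$1
    new=$2
    case ",$current," in
        *",$new,"*) printf '%s\n' "$current" ;;       # already listed
        ",,")       printf '%s\n' "$new" ;;           # list was empty
        *)          printf '%s\n' "$current,$new" ;;  # append
    esac
}

# make_vm_stanza CURRENT NEW
# Prints a "vm:" stanza with the merged swapdevice line, in the form
# expected for the stanza-file of step 4.
make_vm_stanza() {
    printf 'vm:\n\tswapdevice = %s\n' "$(append_swapdevice "$1" "$2")"
}

# Example:
#   make_vm_stanza /dev/vol/rootdg/swapvol /dev/vol/rootdg/swapvol02 > stanza-file
```

The resulting stanza-file can then be merged as in step 5.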

------------------------------------------------------------------------------

Brad Bell suggested a procedure for finding swap-consuming processes:

If you can get logged in before the swap errors, try this:

ps -eo pid,vsize,pmem,command

This will list all processes, like this:
    PID   VSZ %MEM COMMAND
      0 20.2G  3.4 [kernel idle]
      1  608K  0.0 /sbin/init -sa
      3 1.32M  0.0 /sbin/kloadsrv
      5 3.14M  0.0 /sbin/hotswapd
     58 2.93M  0.0 /usr/sbin/esmd
     69 2.16M  0.0 /sbin/update
    194 4.01M  0.0 /usr/sbin/evmd
    235 3.38M  0.0 /usr/sbin/evmlogger -o /var/run/evmlogger.info -l /var/evm/adm/logfiles/evmlogger.log
    236 2.86M  0.0 /usr/sbin/evmchmgr -l /var/evm/adm/logfiles/evmchmgr.log
    432 2.94M  0.0 /usr/sbin/syslogd -e
    436 3.89M  0.0 /usr/sbin/binlogd
  38630 2.20M  0.0 /usr/sbin/filterlog -l
 198831  265M  1.3 /usr/local/mysql/libexec/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/var --user=mysql --p
 199780 2.68M  0.0 /usr/local/mysql/bin/mysqld_safe --socket=/tmp/mysql.sock
 332208  147M  0.0 /usr/local/apache2/bin/httpd -k start -f /usr/local/apache2/conf/httpd-proxy-main.conf
 337584 2.66M  0.0 -voodoo.gre (ftpd)
 434815 2.70M  0.0 -voodoo.gre (ftpd)

The VSZ and %MEM columns are the most important. Look for processes
with a high %MEM and/or a large VSZ.

You may have a lot of little processes (a runaway forker) or one or a few
processes eating up a lot of the memory.
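
Because the VSZ column mixes K, M and G suffixes, a plain sort does not
order it correctly. The following is a minimal sketch of a filter that
converts VSZ to kilobytes and lists the largest processes first. The
function name rank_by_vsz is made up for this example, and the filter
assumes the column layout shown in the listing above; lines that do not
start with a numeric PID (the header, or command text wrapped onto a
second line) are skipped.

```shell
#!/bin/sh
# rank_by_vsz: reads "PID VSZ %MEM COMMAND" lines on stdin and prints
# "VSZ_in_KB PID %MEM first_word_of_command", largest VSZ first.
# Assumption: VSZ always carries a K, M or G suffix, as in the sample
# output; values without such a suffix are skipped.
rank_by_vsz() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 4 {
            size = $2
            unit = substr(size, length(size), 1)
            num  = substr(size, 1, length(size) - 1) + 0
            if      (unit == "K") kb = num
            else if (unit == "M") kb = num * 1024
            else if (unit == "G") kb = num * 1024 * 1024
            else next
            printf "%d %s %s %s\n", kb, $1, $3, $4
        }
    ' | sort -rn
}

# Example, using the ps invocation suggested above:
#   ps -eo pid,vsize,pmem,command | rank_by_vsz | head -5
```

Counting the output lines per command name would be one way to spot the
"runaway forker" case, where no single process is large.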

-------------------------------------------------------------------------------

Thanks again to all of you!

CW


