Inaccessible Subsystem?

From: Pardy, Brian (BPardy@CuraGen.com)
Date: Wed Jul 10 2002 - 10:33:28 EDT


Hello, gurus...

I've got an E6500 here running Solaris 7 that has been having some very
strange issues lately. The same thing happened about a month ago and
resolved itself without intervention, but now that it is happening again I
would like to figure out the actual problem.

The symptoms:

Every 15-30 minutes, we get a sudden burst of a couple hundred emails sent to
root that look like:

> From root Wed Jul 10 10:10:34 2002
> Return-Path: <root>
> Received: (from root@localhost)
> by denali.curagen.com (8.10.2+Sun/8.10.2) id g6AEAY107658
> for root; Wed, 10 Jul 2002 10:10:34 -0400 (EDT)
> Date: Wed, 10 Jul 2002 10:10:34 -0400 (EDT)
> From: Super-User <root>
> Message-Id: <200207101410.g6AEAY107658@denali.curagen.com>
> Content-Length: 78
>
> Unknown
> c04
> Inaccessible Subsystem, Check cable connections on host denali

In and of itself, this isn't much more than an annoyance, but each time it
happens, several hundred copies of mail.local and sendmail end up in a
massive lock-contention fight, all trying to write to /var/mail/root and
/var/mail/root.lock (as seen through truss).
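
To illustrate why a burst of simultaneous deliveries to one mailbox
serializes so badly, here is a minimal sketch of dot-lock style delivery. It
is not mail.local's actual code: the function names, retry delay, and retry
limit are all made up for illustration, and it assumes only that each writer
must exclusively create "<mailbox>.lock" before appending.

```python
import os
import threading
import time


def deliver(mailbox, message, retry_delay=0.001, max_tries=10000):
    """Append message to mailbox under a dot-lock; return attempts needed.

    Hypothetical helper: the retry policy here is an assumption, not the
    behavior of any real delivery agent.
    """
    lock = mailbox + ".lock"
    tries = 0
    while True:
        tries += 1
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists, so
            # only one writer at a time can get past this point.
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
            break
        except FileExistsError:
            if tries >= max_tries:
                raise TimeoutError("could not lock " + mailbox)
            time.sleep(retry_delay)
    try:
        os.close(fd)
        with open(mailbox, "a") as box:
            box.write(message)
    finally:
        os.unlink(lock)  # always release the lock
    return tries


def simulate(mailbox, writers):
    """Start `writers` concurrent deliveries; return lock attempts per writer."""
    attempts = []

    def worker(i):
        attempts.append(deliver(mailbox, "message %d\n" % i))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return attempts
```

Every writer that loses the race sleeps and retries, so the total number of
lock attempts grows much faster than the number of messages as the burst
gets larger.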

I don't see anything particularly relevant in /var/adm/messages (except for
more than a few SYSERR(root)'s complaining about "No child processes" due to
the fact that there are ~400 sendmails or mail.locals running).

Sendmail is the only mail software running on this machine, and it's not
running as a daemon (or out of inetd.conf -- this system does not receive
mail). /var/mail is on a local partition, so there shouldn't be any NFS
issues.

I've searched Google and SunSolve and found one other report[0] of a
similar problem when a lot of email arrived simultaneously for a single user,
but I haven't seen any resolution reported. The suggestions made in response
to that report (which was on Sun-Managers) were to replace mail.local with
procmail, or to reconfigure sendmail to monitor load averages and/or deliver
to local users only through crontab'd queue runs. I haven't tried these yet,
as they would likely only band-aid the symptom and not fix the actual
problem.

So, does anyone know where this "Inaccessible Subsystem" alert is coming from?
I've run strings on just about everything in /kernel looking for it, and just
can't seem to locate it.
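
Since the alert arrives as ordinary mail from root, it may be produced by a
user-space script or daemon rather than anything under /kernel. As a rough
sketch of widening the search, the following walks a directory tree and
reports every regular file containing a given byte string, much like running
strings and grep over each file. The function name and the example search
roots are hypothetical; they are guesses at where to look, not known
locations of the message.

```python
import os


def files_containing(root, needle, chunk_size=1 << 20):
    """Return a sorted list of regular files under root that contain needle."""
    if isinstance(needle, str):
        needle = needle.encode()
    keep = len(needle) - 1  # bytes to carry over between chunks
    hits = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            # Skip symlinks, devices, FIFOs and other non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    tail = b""
                    while True:
                        chunk = fh.read(chunk_size)
                        if not chunk:
                            break
                        window = tail + chunk
                        if needle in window:
                            hits.append(path)
                            break
                        # Keep the last len(needle) - 1 bytes so a match
                        # that straddles a chunk boundary is still found.
                        tail = window[-keep:] if keep > 0 else b""
            except OSError:
                continue  # unreadable file; move on
    return sorted(hits)


if __name__ == "__main__":
    # Example roots only; adjust for the system being examined.
    for root in ("/opt", "/usr/lib", "/etc"):
        for path in files_containing(root, "Inaccessible Subsystem"):
            print(path)
```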

(Yes, we have checked our cable connections and they all appear fine. Someone
else on site remarked that the Ethernet cable (homemade) looked a little
flaky, but my feeling is that if it were indeed an Ethernet issue I wouldn't
be able to telnet into the system at all on this, the only interface.)

I will happily summarize, and appreciate any help that can be provided. We're
about ready to do the NT thing and reboot the box...

Thanks.

[0] http://www.sunmanagers.org/archives/1999/0826.html

--
Brian J. Pardy / x4332
Unix Systems Administrator

