Notice Of SSH Interop Issues

From: Thor Newman (Thor@airg.com)
Date: Fri Aug 13 2004 - 19:49:22 EDT


Hello Managers,

This is as much an information dump as it is an RFC or a request for
feedback. The issue is relatively minor, but nevertheless aggravating.

Since deploying a Solaris 9 / SPARC system as a database into an
environment with a Linux-driven monitoring service called Nagios, I
noted frequent failures of monitoring scripts checking things like disk
space, load, paging, etc.

Nagios, as with many similar services, operates by return code
(retcode): a retcode of 0 from a check means OK, 1 means WARNING, 2
means CRITICAL, and 3 means UNKNOWN.
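
As a rough illustration of that convention, a check written in plain
/bin/sh might look something like the sketch below. The function name
classify_usage and the thresholds are made up for this example and are
not taken from the actual scripts discussed here.

```shell
#!/bin/sh
# Nagios return codes as described above.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# classify_usage PERCENT WARN CRIT
# Prints a one-line status and returns the matching Nagios state.
# (Hypothetical helper; name and arguments are illustrative only.)
classify_usage() {
    pct=$1 warn=$2 crit=$3
    # Anything that is not a plain non-negative integer is reported as UNKNOWN.
    case $pct in
        ''|*[!0-9]*)
            echo "UNKNOWN - could not read usage"
            return $STATE_UNKNOWN ;;
    esac
    if [ "$pct" -ge "$crit" ]; then
        echo "CRITICAL - ${pct}% used"
        return $STATE_CRITICAL
    elif [ "$pct" -ge "$warn" ]; then
        echo "WARNING - ${pct}% used"
        return $STATE_WARNING
    fi
    echo "OK - ${pct}% used"
    return $STATE_OK
}
```

A real check script would end with something like
"classify_usage "$pct" 80 90; exit $?" so that the state becomes the
script's own exit status.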

The checks are executed locally on the monitored host via SSH;
public-key authentication is used for the Nagios agent user to log in
and launch the appropriate script, and the return code is passed back
to the monitoring process.
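
A minimal sketch of the monitoring-side wrapper for such a setup is
shown below. ssh normally exits with the status of the remote command,
or with 255 if ssh itself fails. The function name run_check, and the
decision to fold any status outside 0-3 into UNKNOWN, are assumptions
made for this example and not a description of how Nagios itself
handles such codes.

```shell
#!/bin/sh
# run_check COMMAND [ARGS...]
# Runs the command and passes its exit status through if it is a valid
# Nagios state (0-3). Any other status is reported as UNKNOWN (3).
# (Hypothetical helper; the mapping is a choice made for this sketch.)
run_check() {
    rc=0
    "$@" || rc=$?
    case $rc in
        0|1|2|3) return $rc ;;
    esac
    echo "UNKNOWN - unexpected exit status $rc"
    return 3
}

# Hypothetical usage; the key path, user, host and script path are
# placeholders:
#   run_check ssh -i /path/to/nagios_key -o BatchMode=yes \
#       nagios@dbhost /path/to/check_disk.sh
```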

What I was seeing was service states flipping back and forth between OK
and UNKNOWN. I couldn't figure that one out as the checks I wrote (all
vanilla /bin/sh scripts) never returned a retcode of 3 no matter what.
Obviously something else was wrong, and after some digging, it appears
that the problem is some interop glitch between the client-side SSH
(SSH-2.0-OpenSSH_3.4p1) and the Solaris server-side SSH
(SSH-2.0-Sun_SSH_1.0.1).

I have no interest in cluttering up the Solaris machine with
third-party builds of SSH or anything else, so the obvious fix of
deploying OpenSSH to the Solaris server is not an option.

I applied the following patch to the Solaris server:

113273-07

It seemed promising, as it specifically mentioned "...4939055 ssh does
not return standard errors..." as one of the bugs fixed, but it made no
difference.

I am about to apply '113273-08', which mentions a PAM closing issue
which may be the cause. I will follow up with a report on that one next
week.

Regardless of the patches, I have found what appears to be a
workaround. In order to troubleshoot the issue further, I changed
LogLevel from 'info' to 'debug' in /etc/ssh/sshd_config. As soon as I
did so, the UNKNOWN responses stopped occurring. Changing LogLevel
back to 'info' brought them back.
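
For anyone wanting to confirm which setting a server is actually
using, the sketch below prints the first uncommented LogLevel line
from a config file. The function name show_loglevel is made up for
this example, and it assumes an awk that provides tolower() (on
Solaris that probably means nawk rather than the default awk).

```shell
#!/bin/sh
# show_loglevel [FILE]
# Prints the value of the first uncommented LogLevel directive in FILE
# (default /etc/ssh/sshd_config), or "unset" if there is none.
# (Hypothetical helper; assumes an awk that supports tolower().)
show_loglevel() {
    file=${1:-/etc/ssh/sshd_config}
    awk 'tolower($1) == "loglevel" { print $2; found = 1; exit }
         END { if (!found) print "unset" }' "$file"
}

# The workaround described above corresponds to a config line such as:
#   LogLevel debug
```

The running sshd has to re-read its configuration before a changed
value takes effect.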

It's possible this is a weird coincidence, but that seems unlikely: the
services all reported properly for days with a loglevel of debug, but
began reporting UNKNOWNs again within a few minutes of the loglevel
being set back.
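
One way to put rough numbers on this kind of intermittent behaviour,
rather than waiting for the monitoring console to flip, is to run the
same check repeatedly and count the results. The sketch below does
that; the function name tally_status is made up for this example, and
it only distinguishes a zero exit status from a non-zero one.

```shell
#!/bin/sh
# tally_status COUNT COMMAND [ARGS...]
# Runs COMMAND COUNT times, discarding its output, and prints how many
# runs exited with status 0 and how many did not.
# (Hypothetical helper; uses expr so it also works in older Bourne shells.)
tally_status() {
    n=$1
    shift
    i=0 ok=0 other=0
    while [ "$i" -lt "$n" ]; do
        if "$@" >/dev/null 2>&1; then
            ok=`expr $ok + 1`
        else
            other=`expr $other + 1`
        fi
        i=`expr $i + 1`
    done
    echo "runs=$n ok=$ok non_ok=$other"
}

# Hypothetical usage; host and script path are placeholders:
#   tally_status 100 ssh nagios@dbhost /path/to/check_load.sh
```

Running it once at each LogLevel setting would give a simple
before-and-after comparison.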

So, for anyone who might be encountering similar strangeness with the
return codes of processes called by (Open)SSH in this manner: adjusting
LogLevel from info to debug may not only give you more debugging info,
it might, bizarrely enough, actually fix the problem.

Cheers

Thor Newman
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


