Solaris truss and pfiles Problem Solving Methodology for Failed Processes under Solaris < 10

From: Steven Sim (steven.sim@faplccc.net)
Date: Wed Dec 07 2005 - 00:59:18 EST


  Hello All;

I have an application which is failing to authenticate itself properly.

However, in this discussion, I am more interested in the truss and
pfiles problem solving methodology than in the actual solution.

There are several questions I would like to pose to this list.

I trussed the offending process using the following switches;

#truss -faled -p <pid>

The truss output is given below;

15771/3: 6.4546 getpid() = 15771 [1]
15771/3: 6.4547 door_call(14, 0xFDC00B78) = 0
15771/3: 6.4548 close(14) = 0
*15771/1: 6.4550 read(13, 0x0016247F, 8) Err#11 EAGAIN*
15771/1: 6.4551 lwp_sema_post(0xFDC01E60) = 0
15771/3: 6.4557 lwp_sema_wait(0xFDC01E60) = 0
15771/1: 6.4558 lwp_mutex_wakeup(0xFF0F3500) = 0
15771/3: 6.4558 lwp_mutex_lock(0xFF0F3500) = 0
*15771/3: 6.4559 write(13, " 0\f020102 e07\n01 04\0".., 14) = 14*
15771/3: 6.4561 fstat(3, 0xFDB81440) = 0
15771/3: 6.4561 time() = 1133923824
15771/3: 6.4562 getpid() = 15771 [1]
15771/3: 6.4563 putmsg(3, 0xFDB80AF8, 0xFDB80AEC, 0) = 0
15771/3: 6.4566 open("/var/run/syslog_door", O_RDONLY) = 14
15771/3: 6.4567 door_info(14, 0xFDB80A30) = 0
15771/3: 6.4568 getpid() = 15771 [1]
15771/3: 6.4569 door_call(14, 0xFDB80A18) = 0
15771/3: 6.4570 close(14) = 0
15771/1: 6.4571 yield() = 0
15771/3: 6.4571 yield() = 0
15771/1: 6.4572 time() = 1133923824
15771/1: poll(0xFE401280, 6, -1) (sleeping...)
15771/2: signotifywait() (sleeping...)
15771/3: lwp_sema_wait(0xFDC01E60) (sleeping...)
15771/4: lwp_cond_wait(0xFF0F34F0, 0xFF0F3500, 0xFF0ECD88) (sleeping...)

What caught my eye immediately was the Err#11 EAGAIN returned from the
read function on fd 13.
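
Spotting such errors by eye gets harder as a trace grows, and truss
output pasted through a mail client is often wrapped so that the Err#
text lands on its own line. A minimal Python sketch of a filter follows.
It assumes each truss entry starts with a "pid/lwp:" prefix, as in the
trace above; the function name failed_calls is hypothetical.

```python
import re

# A truss -f/-l entry starts with "pid/lwp:", e.g. "15771/1:".
# A leading "*" (mail emphasis marker) is tolerated.
ENTRY = re.compile(r"^\*?\d+/\d+:")

def failed_calls(lines):
    """Join wrapped truss lines and return the entries that report an error."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if ENTRY.match(line) or not entries:
            entries.append(line)
        else:
            # No pid/lwp prefix: treat as a continuation of the previous entry.
            entries[-1] += " " + line
    return [e for e in entries if "Err#" in e]

if __name__ == "__main__":
    import sys
    for entry in failed_calls(sys.stdin):
        print(entry)
```

Saved to a file, it could be fed a captured trace on standard input,
which leaves only the failing system calls to inspect.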

I then went to /usr/include/sys and did a

$ grep EAGAIN *
errno.h:#define EAGAIN 11 /* Resource temporarily unavailable */
errno.h:#define EWOULDBLOCK EAGAIN

So now I knew what EAGAIN meant; next I needed to know what resource was
"temporarily unavailable".

So I ran pfiles on the same process, like so:

bash-2.03# pfiles 15771
15771: /opt/openldap/libexec/slapd
  Current rlimit: 256 file descriptors
   0: S_IFCHR mode:0666 dev:85,10 ino:597058 uid:0 gid:3 rdev:13,2
      O_RDWR
   1: S_IFCHR mode:0666 dev:85,10 ino:597058 uid:0 gid:3 rdev:13,2
      O_RDWR
   2: S_IFCHR mode:0666 dev:85,10 ino:597058 uid:0 gid:3 rdev:13,2
      O_RDWR
   3: S_IFCHR mode:0666 dev:85,10 ino:597054 uid:0 gid:3 rdev:21,0
      O_WRONLY FD_CLOEXEC
   4: S_IFIFO mode:0000 dev:306,0 ino:289885 uid:0 gid:1 size:0
      O_RDWR
   5: S_IFIFO mode:0000 dev:306,0 ino:289885 uid:0 gid:1 size:0
      O_RDWR
   6: S_IFSOCK mode:0666 dev:305,0 ino:10053 uid:0 gid:0 size:0
      O_RDWR
        sockname: AF_INET 0.0.0.0 port: 389
   7: S_IFREG mode:0600 dev:85,10 ino:2489566 uid:0 gid:1 size:32768
      O_RDWR|O_LARGEFILE FD_CLOEXEC
   8: S_IFREG mode:0600 dev:85,10 ino:2489565 uid:0 gid:1 size:43838
      O_RDWR|O_LARGEFILE FD_CLOEXEC
   9: S_IFREG mode:0600 dev:85,10 ino:2489567 uid:0 gid:1 size:8192
      O_RDWR|O_LARGEFILE FD_CLOEXEC
  10: S_IFSOCK mode:0666 dev:305,0 ino:57121 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
        sockname: AF_INET 127.0.0.1 port: 389
        peername: AF_INET 127.0.0.1 port: 45556
  11: S_IFSOCK mode:0666 dev:305,0 ino:30118 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
        sockname: AF_INET 127.0.0.1 port: 389
        peername: AF_INET 127.0.0.1 port: 45544
  12: S_IFSOCK mode:0666 dev:305,0 ino:36574 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
        sockname: AF_INET 127.0.0.1 port: 389
        peername: AF_INET 127.0.0.1 port: 45546
  13: S_IFSOCK mode:0666 dev:305,0 ino:51163 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
        sockname: AF_INET 127.0.0.1 port: 389
        peername: AF_INET 127.0.0.1 port: 45568

How do I properly interpret the pfiles output listed above?

fd 13 is an S_IFSOCK. Can the gurus on this list advise me how I can link
the above fd (dev 305,0 ino 51163) to an actual socket or file on the
system?
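
For the socket entries, the sockname and peername lines printed by
pfiles are what identify the connection. A minimal Python sketch that
collects them per descriptor follows; it assumes the pre-Solaris 10
layout shown above, and the function name parse_pfiles is hypothetical.

```python
import re

# "  13: S_IFSOCK mode:0666 ..." starts a descriptor entry.
FD_LINE = re.compile(r"^\s*(\d+):\s+(S_IF\w+)")
# "        peername: AF_INET 127.0.0.1 port: 45568"
ADDR_LINE = re.compile(r"^\s*(sockname|peername):\s+(\S+)\s+(\S+)\s+port:\s+(\d+)")

def parse_pfiles(text):
    """Map each fd to its file type and any socket endpoints listed for it."""
    fds = {}
    current = None
    for line in text.splitlines():
        m = FD_LINE.match(line)
        if m:
            current = int(m.group(1))
            fds[current] = {"type": m.group(2)}
            continue
        m = ADDR_LINE.match(line)
        if m and current is not None:
            # Store (address, port) under "sockname" or "peername".
            fds[current][m.group(1)] = (m.group(3), int(m.group(4)))
    return fds
```

Applied to the listing above, fd 13 would come out as a connection from
127.0.0.1 port 45568 to the local port 389, so the peer is some process
on the same host holding the other end of that connection.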

For that matter, I find it difficult to reconcile the inode numbers given
in pfiles output with actual path and file names in the filesystem.

I know fd 0, 1 and 2 are standard input, output and error, but how do I
establish the file names and path names of the other fds?
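
For the S_IFREG entries, one approach is to search the filesystem for a
file whose device and inode match the dev: and ino: fields, much as
find with -inum would. A minimal Python sketch follows; it assumes the
dev: field is a major,minor pair, and the function name find_by_inode
is hypothetical. Sockets and anonymous pipes have no path to find this
way.

```python
import os

def find_by_inode(root, inode, dev=None):
    """Return paths under root whose inode (and, optionally, device) match.

    dev, if given, is a (major, minor) tuple such as (85, 10).
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is not readable
            if st.st_ino != inode:
                continue
            # Inode numbers are only unique per device, so check it if known.
            if dev is not None and (os.major(st.st_dev), os.minor(st.st_dev)) != dev:
                continue
            matches.append(path)
    return matches
```

For fd 7 above, the call would look like
find_by_inode(mountpoint, 2489566, dev=(85, 10)), where mountpoint is
whichever filesystem lives on device 85,10; the listing does not say
which one that is.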

In the above example, the write function also returned a value > 0 (14).

What is the proper methodology for finding out what the value 14
returned by this write function signifies?

One last question: the return code is in hex, right? Not decimal?
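
On success, write() returns the number of bytes it accepted, not a
status code, and in the trace the requested length and the returned
value are both 14. The time() result in the same trace, 1133923824,
only makes sense as a decimal epoch timestamp for early December 2005,
which suggests the plain return values are decimal and hex values carry
a 0x prefix. A minimal Python sketch of the byte-count behavior,
assuming a POSIX system (the function name is hypothetical):

```python
import os

def bytes_written(payload):
    """Write payload to a pipe and return what write() reported."""
    r, w = os.pipe()
    try:
        # os.write returns the count of bytes actually written.
        return os.write(w, payload)
    finally:
        os.close(r)
        os.close(w)

if __name__ == "__main__":
    n = bytes_written(b"x" * 14)
    print(n, hex(n))
```

A return value smaller than the requested length would indicate a
partial write, which the caller is expected to complete by writing the
remainder.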

Warmest Regards
Steven Sim

p.s. I realized that Solaris 10 pfiles is much improved. See
http://blogs.sun.com/roller/page/eschrock/20040618#shameless_self_promotion

But most systems out there are still < 10 even though we wish they would
all upgrade immediately. So we have to deal with this using pre-10 tools...

Fujitsu Asia Pte. Ltd.
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


