SIGFPE and siginfo_t

From: Eiler, James A. (James.Eiler@alcoa.com)
Date: Fri Jun 27 2003 - 23:51:26 EDT


Hi All,

I apologize, but this is a bit long....

I'm running Tru64 UNIX, several versions (4.0G, 5.1A, 5.1B), various Patch
Kits.

I've got a C program that runs on all of these. This program is timer-driven
and reads analog input data and processes it. This program is fairly large
and was written by numerous folks over a period of years.

Occasionally, one or more of the analog input signals is zero. And when the
program does a divide by zero, a SIGFPE is generated.

We need this program to keep running, so I've put in a signal handler to
process the SIGFPE signal. I know I'm doing a bad thing in the SIGFPE
signal handler in that I'm doing an fprintf to stderr indicating that
the program has done a divide by zero - but at least we know when the
SIGFPE has been generated. But, we'd like to know which line of code is
generating the error.
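
The fprintf in the handler could be swapped for something async-signal-safe.
What follows is only a minimal sketch, assuming POSIX write(2) is available;
format_hex, put and report_fault are hypothetical names made up for this
illustration, not anything from the Tru64 documentation.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes v as "0x..." hex into buf (NUL-terminated)
   and returns the length. buf needs room for 2 + 2*sizeof(v) + 1 bytes.
   Uses no stdio, so it can be called from a signal handler. */
size_t format_hex(unsigned long v, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof v];
    size_t n = 0, len = 0;

    do {                        /* collect digits, least significant first */
        tmp[n++] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0);

    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)               /* reverse the digits into the output */
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}

/* Hypothetical helper: unbuffered write to stderr. */
static void put(const char *s, size_t n)
{
    ssize_t r = write(STDERR_FILENO, s, n);
    (void) r;   /* nothing useful can be done about a failed write here */
}

/* Hypothetical helper: reports an address on stderr using only write(2). */
void report_fault(const void *addr)
{
    static const char prefix[] = "SIGFPE at ";
    char buf[2 + 2 * sizeof(unsigned long) + 1];
    size_t len = format_hex((unsigned long) addr, buf);

    put(prefix, sizeof prefix - 1);
    put(buf, len);
    put("\n", 1);
}
```

Inside a SA_SIGINFO handler, a call such as report_fault(info->si_addr)
would then take the place of the stdio calls.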

If I took out the signal handler, I could very easily determine the exact
line of code from the core file. But, like I said, we need to keep this
program running....
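
One way to get the line number while keeping the program running, without
relying on the signal at all, is to check the divisor before dividing. This is
a different technique from the siginfo_t approach, and only a sketch: it
assumes the divisions can be wrapped one by one, and checked_div, CHECKED_DIV,
div_by_zero_count and the 0.0 fallback value are all hypothetical choices.

```c
#include <stdio.h>

/* Hypothetical counter of rejected divisions, for later inspection. */
unsigned long div_by_zero_count = 0;

/* Hypothetical helper: divides num by den, or logs the call site and
   returns a fallback of 0.0 when den is exactly zero. */
float checked_div(float num, float den, const char *file, int line)
{
    if (den == 0.0f) {
        div_by_zero_count++;
        fprintf(stderr, "divide by zero avoided at %s:%d\n", file, line);
        return 0.0f;    /* assumed fallback; pick what suits the data */
    }
    return num / den;
}

/* Records the source file and line of each call automatically. */
#define CHECKED_DIV(num, den) checked_div((num), (den), __FILE__, __LINE__)
```

A line such as fNum = CHECKED_DIV(42.0f, (float) ii); then reports its own
location in normal code rather than from inside a signal handler.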

I'm trying to use the siginfo_t structure that's described in Section 5.4,
"Realtime Signal Handling", of the Guide to Realtime Programming.
As I read it, when a SIGFPE is generated, siginfo_t should contain the
address of the bad instruction in member si_addr.

I've modified the example code from the section of the manual:

> cat doit.c
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
   pid_t pid;
   sigset_t newmask;
   int ii, jj;
   float fNum;

   struct sigaction action;
   void catchit(int, siginfo_t *, void *);

   sigemptyset(&newmask);
   sigaddset(&newmask, SIGFPE);
   sigemptyset(&action.sa_mask);   /* sa_mask was left uninitialized */
   action.sa_flags = SA_SIGINFO;
   action.sa_sigaction = catchit;
 
   if (sigaction(SIGFPE, &action, NULL) == -1) {
       perror("sigusr: sigaction");
       _exit(1);
   }

   for( ii = 3; ii > -1; ii--) {
      fNum = 42.0 / (float) ii;
   }
}
void catchit(int signo, siginfo_t *info, void *extra)
{
       int int_val = info->si_value.sival_int;
       printf("Signal %d, value %d received\n", signo, int_val);
       printf("si_addr = %lx\n", (unsigned long) info->si_addr );
       fflush( stdout );
       _exit(0);
}

I compile it, link it, and run it:

> cc -g2 doit.c -lrt -o doit
> ./doit
Signal 8, value 0 received
si_addr = 0

I would have thought si_addr should be something other than 0.

If I step through the signal handler, I can see part of the
address (see the ^^^^^ below):

(ladebug) where
>0 0x120001458 in catchit(signo=0x8, info=0x11fffbc98, extra=0x11fffbcf8) "doit.c":34
#1 0x3ff800d5af0 in __sigtramp(...) in /usr/shlib/libc.so
#2 0x120001404 in main() "doit.c":29
    ^^^^^^^^^^^
#3 0x1200012e8 in __start(...) in doit

(ladebug) p *info
struct siginfo {
  si_signo = 0x8;
  si_errno = 0x0;
  si_code = 0x3;
  _sifields = union {
    _sipad = [0] = 0x0,[1] = 0x0,[2] = 0x4,[3] = 0x0,[4] = 0x10,[5] = 0x0,
             [6] = 0x1,[7] = 0x0,[8] = 0xc0004900,[9] = 0x3ff,[10] = 0x0,
             [11] = 0x0,[12] = 0x2,[13] = 0x0,[14] = 0xe8ec3b2,[15] = 0x0,
             [16] = 0x1000,[17] = 0x0,[18] = 0x801fc548,[19] = 0x3ff,
             [20] = 0x0,[21] = 0x0,[22] = 0x0,[23] = 0x0,[24] = 0x20001404,
                                                                ^^^^^^^^^^
             [25] = 0x1,[26] = 0x8,[27] = 0x0;
More (n if no)?
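
The si_code field in the dump above says which kind of arithmetic fault was
raised. Its numeric values are defined by each system's <signal.h>, so the
sketch below compares against the POSIX symbolic names instead of against
literal numbers. fpe_code_name is a hypothetical helper, and the feature-test
macro is an assumption that may not be needed on every system.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/* Hypothetical helper: maps a SIGFPE si_code to the name of its POSIX
   constant. Unrecognized values fall through to "unknown si_code". */
const char *fpe_code_name(int code)
{
    switch (code) {
    case FPE_INTDIV: return "FPE_INTDIV (integer divide by zero)";
    case FPE_INTOVF: return "FPE_INTOVF (integer overflow)";
    case FPE_FLTDIV: return "FPE_FLTDIV (floating-point divide by zero)";
    case FPE_FLTOVF: return "FPE_FLTOVF (floating-point overflow)";
    case FPE_FLTUND: return "FPE_FLTUND (floating-point underflow)";
    case FPE_FLTRES: return "FPE_FLTRES (floating-point inexact result)";
    case FPE_FLTINV: return "FPE_FLTINV (invalid floating-point operation)";
    case FPE_FLTSUB: return "FPE_FLTSUB (subscript out of range)";
    default:         return "unknown si_code";
    }
}
```

Passing info->si_code to this function from the handler, or from a debugger
session, names the fault without guessing at what the raw number means.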

Any help on this will be much appreciated!

Thanks,

Jim


