Summary: "file name too long" crash.

From: Ermanno Polli (Ermanno.Polli@lnf.infn.it)
Date: Tue Mar 18 2003 - 08:30:24 EST


Thanks to (in order of appearance) ;-)

O'Brien, Pat
Dr. Thomas Blinn
Jason Orendorf

As usual, the most complete answer came from Dr. Thomas Blinn, but,
unfortunately, he left me little hope.

"You are probably hitting an NFS bug of some kind, and it might be in
the Linux box."

And then:

"If you got a crash dump, then there should be a crash data file in
/var/adm/crash that would help a support engineer figure out just
what is failing. But since you don't have software support, you
are out of luck. Again, it's almost certainly a bug in the Linux
system's NFS server implementation against which our client side
is not adequately protected. But no one is going to try to make
a patch for this for V5.0A without a "paying customer" problem. I
am sorry but that's reality."

I explained to him some other facts I haven't told this mailing list
yet.. ;-) But I got no answer... :-(

I'm collecting more evidence with tcpdump, and I suspect that packets
are being dropped. For instance, when I access some files from the
Tru64 machine, nfsstat -c on the Linux box shows the badxids counter
increasing.
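
For anyone wanting to watch that counter in a repeatable way, here is a
minimal sketch in Python. It assumes that nfsstat -c prints a line of
counter names followed by a line of values, as it does on many Unix
systems; the exact column name ("badxid" here) and the layout are
assumptions and may differ on your platform. The function names are
made up for illustration.

```python
def parse_counters(text):
    """Return {name: value} from header/value line pairs in nfsstat-style text.

    Assumption: counters appear as a line of names immediately followed
    by a line holding the same number of plain integers.
    """
    counters = {}
    lines = text.splitlines()
    for header, values in zip(lines, lines[1:]):
        names = header.split()
        nums = values.split()
        if not names or len(names) != len(nums):
            continue
        if any(n.isdigit() for n in names):
            continue  # this "header" is really a line of values
        if not all(n.isdigit() for n in nums):
            continue  # e.g. percentage columns or another header line
        for name, num in zip(names, nums):
            # Keep the first occurrence (the RPC section usually comes first).
            counters.setdefault(name, int(num))
    return counters


def counter_delta(before, after, name="badxid"):
    """Return how much one counter grew between two nfsstat snapshots."""
    return parse_counters(after)[name] - parse_counters(before)[name]
```

Save the output of nfsstat -c before and after accessing the files,
then pass both texts to counter_delta(). A delta that grows with every
access suggests duplicated or late replies rather than a one-off
glitch.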

I'll send another mail, shortly, with the other strange things I've
found about NFS.

The original mail, in short, was:

=====================================
I have this problem on Tru64 5.0A, on an ES40 server.
Actually it involves the ES40, a dual Pentium PC with SuSE Linux 7.3,
some AlphaStation 600s with Digital UNIX 3.2C, and a lot of NFS-mounted
disks.
I was doing a copy from one directory to another with tar piped to
another tar (tar c | tar x, in short). The source disk was on an
AlphaStation 600 and the destination disk was on the PC.
In the middle of the copy, the ES40 crashed with "trap: invalid memory
read access from kernel mode". Trying to understand why, I did an ls -lR
on the destination directory. Well, I saw some symlinks whose target,
after the "->", was followed by some text, apparently taken from some
file... There was this "name too long" error and, two or three errors
later, the ES40 crashed.
The same tar pipe, issued from the Linux box, succeeded!

I opened a call with HP (sigh!) but they told me there was no hardware
failure. Since I only have a hardware maintenance contract, they would
not help me any further.. :-(

Yesterday I had a variant.. I tried both a tar pipe from the Linux box
on the Tru64 5.0A machine and a cp -R command. Both met the same fate:
two crashes.

And the one from cp -R had a double crash!
==================================
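
The damaged symlinks described in the original mail were spotted by
reading ls -lR output. As a rough alternative, the sketch below (in
Python) compares every symlink in the source tree with its copy in the
destination tree and reports the ones whose targets differ. The function
name and the two directory arguments are hypothetical; it only uses
os.walk and os.readlink from the standard library, and it assumes both
trees are mounted on the machine where it runs.

```python
import os


def mismatched_symlinks(src_root, dst_root):
    """Yield (relative_path, src_target, dst_target) for differing symlinks.

    dst_target is None when the copy is missing, is not a symlink,
    or cannot be read.
    """
    # os.walk does not follow symlinks by default; links to directories
    # show up in dirnames, links to files in filenames.
    for dirpath, dirnames, filenames in os.walk(src_root):
        for name in dirnames + filenames:
            src_path = os.path.join(dirpath, name)
            if not os.path.islink(src_path):
                continue
            rel = os.path.relpath(src_path, src_root)
            src_target = os.readlink(src_path)
            try:
                dst_target = os.readlink(os.path.join(dst_root, rel))
            except OSError:
                dst_target = None
            if dst_target != src_target:
                yield rel, src_target, dst_target
```

A destination target that starts with the source target and then
continues with unrelated text would match the symptom described above.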

Thank you, everybody.

        Ermanno Polli
        ermanno.polli@lnf.infn.it


