[SUMMARY] Drowning in "lpd[nnnnnn]: ERROR -- lp: cannot lock file lock: No such file or directory" messages

From: Speakman, John H./Epidemiology-Biostatistics (speakmaj@MSKCC.ORG)
Date: Tue Oct 14 2003 - 15:26:50 EDT


All

Finally I got around to calling support; they have reproduced my problem
in the latest 5.1B patch kit (I have 5.1APK3) so I think we are calling
this a big ol' bug. Basically the "on" symbol in printcap, which is
supposed to allow failover of printing services between two nodes (its
argument is a comma-separated list of nodes), has a problem when you try
to print from a node other than the first in the list: it tries to
print to the first node in the list and spawns thousands of lpr
processes, leading to the huge number of "cannot lock file lock"
messages in lpr.log.

So we have taken the "on" symbol out of our printcap - it's not like it
was a big deal anyway.
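
For anyone wanting to script the same workaround across many printers, here
is a minimal Python sketch. It assumes the symbol is written as an
":on=host1,host2:" field in the usual colon-separated printcap layout with
backslash continuation lines; the function name strip_on and the sample
entry are made up for illustration, so check the result against your own
printcap before installing it.

```python
import re

# ASSUMPTION: the "on" symbol appears as a field of the form :on=host1,host2:
# The lookbehind requires a colon right before "on", so fields whose names
# merely end in "on" are left alone.
ON_FIELD = re.compile(r"(?<=:)on=[^:\n]*:")


def strip_on(printcap_text):
    """Return printcap text with every on=... field removed."""
    out = []
    for line in printcap_text.splitlines():
        if line.lstrip().startswith("#"):
            out.append(line)  # leave comments untouched
            continue
        new = ON_FIELD.sub("", line)
        if new != line and new.strip() in (":", ":\\"):
            # The line held nothing but the on field, so drop it entirely.
            if (not new.rstrip().endswith("\\")
                    and out and out[-1].rstrip().endswith("\\")):
                # It was the last line of the entry: end the entry on the
                # previous line by removing its continuation backslash.
                out[-1] = out[-1].rstrip()[:-1].rstrip()
            continue
        out.append(new)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical entry, for illustration only.
    sample = (
        "lp|lp0:\\\n"
        "\t:lp=/dev/null:\\\n"
        "\t:on=host1,host2:\\\n"
        "\t:sd=/usr/spool/lpd:\n"
    )
    print(strip_on(sample), end="")
```

The sketch only transforms text; restarting lpd afterwards
(/sbin/init.d/lpd stop then /sbin/init.d/lpd start) is still a manual step.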

John
Ps: lpdcheck didn't report a problem; I think our printcap configuration
is street-legal.

-----Original Message-----
From: Speakman, John H./Epidemiology-Biostatistics
Sent: Sunday, October 05, 2003 11:51 AM
To: tru64-unix-managers@ornl.gov
Cc: Speakman, John H./Epidemiology-Biostatistics
Subject: [SUMMARY RETRACTED] Drowning in "lpd[nnnnnn]: ERROR -- lp:
cannot lock file lock: No such file or directory" messages

It was too good to be true. The problem came back. Looks like a stop
and restart of lpd gives you temporary respite. We may have to delete
all the printers and put them back. Thanks to Johan at HP who supplied
me with 'lpdcheck'; we will play with this also.

John

-----Original Message-----
From: Speakman, John H./Epidemiology-Biostatistics
Sent: Thursday, October 02, 2003 11:14 AM
To: tru64-unix-managers@ornl.gov
Subject: [SUMMARY] Drowning in "lpd[nnnnnn]: ERROR -- lp: cannot lock
file lock: No such file or directory" messages

Thanks to Kris and Barbara; the consensus was that there was "something
funny" with lpd configuration, either in printcap or the spool
directories. Kris also mentioned the 'lpdcheck' utility, which I
couldn't find anywhere, either on the system or on the internet.

I was planning to remove printcap and start adding all the printers from
scratch (what Kris ended up doing) but I spent a couple of minutes
peering at printcap first. I noticed something about the "on" symbol.
This symbol is new to us (i.e., wasn't in 4.0) and its value is a
comma-separated list of the host(s) in the cluster that the printer is
"on". I noticed that all the errors in syslog were coming from printers
which had no space between the commas ("host1,host2") in printcap and
the ones with a space ("host1, host2") were okay. So I just edited the
printcap to add spaces after the commas in the "on" lines, stopped and
restarted lpd (/sbin/init.d/lpd stop then /sbin/init.d/lpd start) and
since then (10 hours or so) have had no "cannot lock file lock" messages
whereas before we were averaging, I guess, at least one every few
seconds, all the time.
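
To check a printcap for the same pattern without eyeballing it, a rough
Python sketch along these lines might help. It assumes the symbol is written
as an ":on=..." field and that the printer name is the first name on the
entry's first line; on_values and has_unspaced_comma are hypothetical helper
names. It only reports which entries have commas with no following space and
says nothing about whether the spacing actually matters.

```python
import re

# ASSUMPTION: the "on" symbol appears as a field of the form :on=host1,host2:
ON_VALUE = re.compile(r"(?<=:)on=([^:\n]*)")


def on_values(printcap_text):
    """Return {printer_name: on_value} for entries that carry an on field."""
    result = {}
    current = None
    continued = False
    for line in printcap_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if not continued:
            # First line of an entry: the names come before the first colon,
            # separated by "|"; keep the first name.
            current = line.split(":", 1)[0].split("|")[0].strip()
        match = ON_VALUE.search(line)
        if match and current:
            result[current] = match.group(1)
        # A trailing backslash means the entry continues on the next line.
        continued = line.rstrip().endswith("\\")
    return result


def has_unspaced_comma(value):
    """True if any comma in value is followed directly by a non-space."""
    return re.search(r",\S", value) is not None


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    sample = (
        "lp|lp0:\\\n\t:on=host1,host2:\\\n\t:sd=/usr/spool/lpd:\n"
        "lj:\\\n\t:on=host1, host2:\\\n\t:sd=/usr/spool/lj:\n"
    )
    for name, value in on_values(sample).items():
        note = "no space after comma" if has_unspaced_comma(value) else "ok"
        print(name, repr(value), note)
```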

Sounds a little absurd (or of course a bug), and of course it's possible
that there was something wacky about printing that a simple restart of
lpd fixed. But it certainly won't hurt for us to add the space in
printcap, if only out of superstition.

Thanks all
John

-----Original Message-----
From: Speakman, John H./Epidemiology-Biostatistics
[mailto:speakmaj@MSKCC.ORG]
Sent: Monday, September 29, 2003 5:21 PM
To: tru64-unix-managers@ornl.gov
Cc: Speakman, John H./Epidemiology-Biostatistics
Subject: Drowning in "lpd[nnnnnn]: ERROR -- lp: cannot lock file lock:
No such file or directory" messages

Hi all

Two node cluster of GS80s running 5.1apk3. Ever since we set up a
couple of printers, using lpr, we have gotten a lot of "lpd[nnnnnn]:
ERROR -- lp: cannot lock file lock: No such file or directory" messages.
Sometimes lp is another printer; it always seems to be whichever printer
was used last. The funny thing is we don't do much printing; maybe two or three
jobs a day. But the jobs we do print come out fine. This first started
happening months ago; I first noticed because we got a few "high event
activity" e-mails and it turned out to be zillions of these messages.
Printing seems to work just fine, so foolishly I just thought "dumb
machine, bothering me with this stuff" and put a little filter in the
sysman EVM viewer to filter them out, then went back to sleep.

Then last week /var filled up all the way and I realized it was these
wretched messages. I deleted the lpr logs from the subdirectories
within syslog.dated to give us some breathing space, but I need a more
permanent solution. There are 2500 files in the evmlog directory,
mainly consisting of billions of these messages. There are now so many
I can't even run EVM from sysman, or even an evmget for the last five
seconds, without filling up the /tmp directory, i.e., the root
partition, with a jumbo temporary file.
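
For sizing up the flood straight from a plain-text log rather than through
EVM, something like the following Python sketch could be used. It assumes
each log line contains the message text exactly as quoted above; the
timestamp and host prefix in the sample lines are invented, and
count_lock_errors is a hypothetical name.

```python
import re
from collections import Counter

# Matches the message as quoted in this post and captures the printer name.
LOCK_ERROR = re.compile(
    r"lpd\[\d+\]: ERROR -- (\S+): cannot lock file lock"
)


def count_lock_errors(lines):
    """Count "cannot lock file lock" messages per printer name."""
    counts = Counter()
    for line in lines:
        match = LOCK_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log lines; the prefix format is an assumption.
    sample = [
        "Sep 29 17:00:01 node1 lpd[123456]: ERROR -- lp: cannot lock file"
        " lock: No such file or directory",
        "Sep 29 17:00:04 node1 lpd[123460]: ERROR -- lj: cannot lock file"
        " lock: No such file or directory",
        "Sep 29 17:00:05 node1 lpd[123461]: some unrelated message",
    ]
    for printer, n in count_lock_errors(sample).most_common():
        print(printer, n)
```

Because the function takes any iterable of lines, an open file object can be
passed in directly, so the log is read one line at a time rather than loaded
whole.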

Any ideas? Thanks in advance; I will summarize.
John

John Speakman
Manager, Clinical Research Systems
Memorial Sloan-Kettering Cancer Center
New York, NY, USA
646 735 8187 / speakman@biost.mskcc.org

 
 

This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:49:38 EDT