[HPADM] SUMMARY2: awk script to compare "ll" output of similar directories

From: Armanini Oscar (Oscar.Armanini@bci.ch)
Date: Mon Jun 24 2002 - 07:58:29 EDT


Hi all

I am posting a second summary because:
a) three weeks ago (sorry for the delay) I received another useful reply
b) two people asked me to share the awk script (possibly with the whole
list).

######
By Allan Marillier:
You could also look at Midnight Commander, written by Miguel de Icaza.
This is a shell, modelled after Norton Commander on MS-DOS from
many years ago. It has a lot of powerful features, including a directory
compare facility. It's freeware: download the source and compile it for your
platform, which is very easy. It doesn't take a lot of room.
#######
The awk script provided by David Ledger with some changes made by me:
ll -d /dir1/* /dir2/* | awk '
$1 !~ /^-/ { next } # skip non-files
{ name = $9; sub(/^.*\//, "", name) } # extract the name part of all
(name in mod) { # do this bit if name already seen
    date = sprintf("%s %s %s", $6, $7, $8)
    if (($1 != mod[name]) ||
        ($3 != own[name]) ||
        ($4 != grp[name]) ||
        ($5 != siz[name]) ||
        (date != dat[name]))
    {
        printf("%s %s %s %d %s %s | %s %s %s %d %s %s\n",
        $1, $3, $4, $5, date, $9,
        mod[name], own[name], grp[name], siz[name], dat[name], pth[name])
    }
    delete mod[name] # lets us see unique entries
    next
}
# save details of the first seen
{
    mod[name] = $1
    lnk[name] = $2
    own[name] = $3
    grp[name] = $4
    siz[name] = $5
    dat[name] = sprintf("%s %s %s", $6, $7, $8)
    pth[name] = $9
}
# show which names were only seen once
END {
    for (x in mod) {
        printf("%s %s %s %d %s %s\n",
            mod[x], own[x], grp[x], siz[x], dat[x], pth[x])
    }
}'

############

In order to understand what the script did (the use of arrays and hashes), I
spent some hours learning awk.
I found these resources:
http://www.4awk.com/,
 http://www.canberra.edu.au/~sam/whp/awk-guide.html,
 http://sparky.rice.edu/~hartigan/awk.html,
 http://allman.rhon.itam.mx/dcomp/awk.html,
 http://ortega.cs.ucdavis.edu/~byrav/Mozz/awk.doc,
 http://www.cs.hmc.edu/tech_docs/qref/awk.html,
 http://linus.chem.wesleyan.edu/documentation/gawk/gawk_toc.html,
 http://icarus.weber.edu/home/bob/web/cs213/awk/nawk_toc.html,
 http://www.faqs.org/faqs/computer-lang/awk/faq/
and also used:
a) awk man pages,
b) 12 awk pages of O'Reilly "Unix in a nutshell",
c) 16 awk pages of Kernighan, Pike "The Unix Programming Environment".
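
The associative arrays (hashes) that the script relies on can be illustrated
with a minimal sketch. awk arrays are indexed by arbitrary strings, so the
file name itself can be the key. The helper name count_names and the sample
paths are assumptions made up for this illustration; they are not part of
the script above.

```shell
# Hypothetical helper: count how often each base name occurs in a list of paths.
count_names() {
    awk '
    {
        name = $0
        sub(/^.*\//, "", name)    # strip the directory part, as the script does
        seen[name]++              # the first reference creates the element
    }
    END {
        for (name in seen)        # iteration order is unspecified, hence the sort
            print seen[name], name
    }' | sort -k2
}

printf '%s\n' /dir1/a /dir2/a /dir1/b | count_names
```

A name with a count of 2 was present in both directories; a count of 1 marks
a name seen only once, which is the same test the END block of the script
performs with "for (x in mod)".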

###

Oscar Armanini
BCI
Lugano, Switzerland

----------------------------------------------------------------------------
Hi Admins
thanks for all answers.
1) Erik Platzbecker suggested searching for a Perl script,
which should be more suitable than awk for this kind of job.
2) Totsch David suggested using "dircmp(1)" and/or "pdf(4)".
I was not aware of these commands (they seem to be
specific to HP-UX) and I have to investigate,
because for a simple "ll" of files (not using "what" to get
version information) they should be OK.
3) Thierry Itty suggested using diff.
The problem with diff, sdiff, ... is that they don't
recognize when two lines differ but refer to the same
file name (I thought my example was clear).
4) David Ledger sent me some examples which will be very useful
for writing my own script. That's what I needed.

Thanks again
Oscar Armanini
BCI
Lugano, Switzerland

-----------------------------------------------------------------------

Hi Admins
I need help writing an "awk" script which compares the output of the "ll"
command on the same subdirectory of two different users and shows the
differences (i.e. files existing in only one of the two directories, or
files existing in both but differing in permissions, date, bytes, ...).
There are many dirs to check
(/home/user1/dir1 against /home/user2/dir1,
 /home/user1/dir2 against /home/user2/dir2,
 /home/user1/dir3 against /home/user2/dir3,
....)
and each one contains hundreds (sometimes thousands) of files, most of which
exist in both directories.

The expected result for "dir1" should be something like this:
USER1                                 USER2
-rwxr-xr-- 2191 Apr 4 13:49 fileA   | -rwxr-xr-- 2199 Apr 4 21:20 fileA
-rwxr-xrw- 7891 Apr 5 12:29 fileG   <
                                    > -rwxr-xr-- 6591 Apr 9 13:59 fileM
From both "dir1" directories, the most recently updated files will be taken
into consideration.

The minimal approach of using:
$ ll /home/user1/dir1/ \
| awk '{ print $1, $5, $6, $7, $8, $9}'> user1_dir1
$ ll /home/user2/dir1/ \
| awk '{ print $1, $5, $6, $7, $8, $9}'> user2_dir1
$ sdiff -s user1_dir1 user2_dir1 \
   | /usr/bin/egrep " < | > | \| " > sdiff_dir_1
doesn't work as I would like.
The resulting file (only a few records) is this:
-rwxr-xr-x 73948 Sep 15 1999 AD01S19.fmx | -rwxr-xr-x 94036 Feb 21 2001 AD01S19.fmx
-rwxr-xr-x 95732 Feb 21 2001 AD01S61.fmx | -rwxr-xr-x 98428 Feb 21 2001 AD01S7.fmx
-rwxr-xr-x 77876 Sep 15 1999 AD01S7.fmx | -rwxr-xr-x 92644 Feb 21 2001 AD01S8.fmx
-rwxr-xr-x 72132 Sep 15 1999 AD01S8.fmx | -rwxr-xr-x 91828 Feb 21 2001 AD01S93.fmx
-rwxr-xr-x 71596 Sep 15 1999 AD01S9.fmx | -rwxr-xr-x 93844 Feb 21 2001 AD02B1.fmx
-rwxr-xr-x 73852 Sep 15 1999 AD02B1.fmx | -rwxr-xr-x 82596 Apr 5 14:36 AD02B2.fmx
and as you can see, the two columns simply shift out of alignment (up or
down) whenever one file contains a record which doesn't exist in the other.

The right approach should be something like this (using "awk"):
1) put the records from both "dir1" into the same file (adding a label to
   recognize whether they came from USER1 or USER2)
2) sort the file on the "filename" and "label" fields
3) process the file:
  a-if the current record from USER1 is equal to the subsequent one from
    USER2 (except for the "label"), delete both records, else
  b-if the current record from USER1 differs from the subsequent one from
    USER2 (but has the same "filename"), put them on the same line,
    separated by " | ", else
  c-if the current record from USER1 is not followed by another from USER2
    with the same "filename", add a " < " at the end of the line, else
  d-if the current record is from USER2, shift it to the right and add a
    " > " at the beginning
but I don't know "awk" that well.
Maybe other solutions exist, but I have no idea.
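
The label / sort / process steps above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the script that
David Ledger provided: compare_lists and emit_pending are made-up names, the
two input files are assumed to hold one record per line with the file name
as the last field, and file names are assumed to contain no blanks.

```shell
# Hypothetical sketch of the label/sort/process approach.
# Usage: compare_lists user1_dir1 user2_dir1
compare_lists() {
    {
        awk '{ print $NF, 1, $0 }' "$1"   # step 1: prefix name and label 1 (USER1)
        awk '{ print $NF, 2, $0 }' "$2"   #         prefix name and label 2 (USER2)
    } | sort -k1,1 -k2,2n |               # step 2: sort on filename, then label
    awk '
    function emit_pending() {             # a USER1 record that found no partner
        if (have) print prev " <"
        have = 0
    }
    {                                     # step 3: walk the sorted records
        name = $1 ""                      # force string comparison of names
        label = $2
        rec = $0
        sub(/^[^ ]+ [^ ]+ /, "", rec)     # drop the two sort keys again
        if (label == 1) {                 # USER1: hold it until the next record
            emit_pending()
            prev = rec; pname = name; have = 1
            next
        }
        if (have && pname == name) {      # USER2 with the same name as held USER1
            if (prev != rec) print prev " | " rec
            have = 0
        } else {                          # USER2 without a USER1 partner
            emit_pending()
            print "> " rec
        }
    }
    END { emit_pending() }'
}

# Demo data taken from the expected result above (mktemp -d is assumed to exist).
tmp=$(mktemp -d)
printf '%s\n' '-rwxr-xr-- 2191 Apr 4 13:49 fileA' \
              '-rwxr-xrw- 7891 Apr 5 12:29 fileG' > "$tmp/user1_dir1"
printf '%s\n' '-rwxr-xr-- 2199 Apr 4 21:20 fileA' \
              '-rwxr-xr-- 6591 Apr 9 13:59 fileM' > "$tmp/user2_dir1"
compare_lists "$tmp/user1_dir1" "$tmp/user2_dir1"
rm -r "$tmp"
```

With the demo data this should print the same three records as the expected
result above, without the column alignment. Identical records are dropped
silently, which corresponds to case a.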

I don't want to use "find . -newer ...." or "find . -ctime ..."
because I have to check permissions and bits too. Also, the
script should be usable if I don't use "ll" to generate the list of files
but something else
(for example if I use
"$ for i in /home/user1/dir1/*; do what $i >> user1_dir1; done"
and the "what" command gives me 'filename version').
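
For such a generic list, a comparison keyed only on the file name might look
like the sketch below. It assumes each list has already been reduced to one
'filename version' pair per line, that names contain no blanks, and that the
first list is not empty (the NR == FNR test relies on that); compare_versions
is a made-up name.

```shell
# Hypothetical sketch: compare two "filename version" lists by name.
# Usage: compare_versions user1_dir1 user2_dir1
compare_versions() {
    awk '
    NR == FNR { ver[$1] = $2; next }      # first list: remember version per name
    {                                     # second list: look each name up
        if (!($1 in ver))
            print "> " $1, $2             # only in the second list
        else if ((ver[$1] "") != ($2 ""))
            print $1, ver[$1], "|", $2    # in both lists, versions differ (as strings)
        delete ver[$1]                    # what remains exists only in the first list
    }
    END { for (n in ver) print n, ver[n], "<" }
    ' "$1" "$2" | LC_ALL=C sort
}

# Demo with made-up names and versions (mktemp -d is assumed to exist).
tmp=$(mktemp -d)
printf '%s\n' 'fileA 1.1' 'fileG 2.0' 'fileS 4.2' > "$tmp/user1_dir1"
printf '%s\n' 'fileA 1.2' 'fileM 3.0' 'fileS 4.2' > "$tmp/user2_dir1"
compare_versions "$tmp/user1_dir1" "$tmp/user2_dir1"
rm -r "$tmp"
```

Because the lookup goes through an array, neither list has to be sorted
beforehand; the final sort only makes the output order reproducible.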

I realize this is a real challenge.
If there is no simple solution I will write a C program or insert the
records into a database and work with SQL.

I will summarize.
Thanks in advance

Oscar Armanini
BCI
Lugano, Switzerland

--
             ---> Please post QUESTIONS and SUMMARIES only!! <---
        To subscribe/unsubscribe to this list, contact majordomo@dutchworks.nl
       Name: hpux-admin@dutchworks.nl     Owner: owner-hpux-admin@dutchworks.nl
 
 Archives:  ftp.dutchworks.nl:/pub/digests/hpux-admin       (FTP, browse only)
            http://www.dutchworks.nl/htbin/hpsysadmin   (Web, browse & search)

