AWK programming lesson 2

The full syntax used in an awk program is something like

PATTERN {COMMAND(S)}

What this means is,

"For each line of input, go look and see if the PATTERN is present. If it is present, run the stuff between {}"

[If there is no pattern specified, the command gets called for EVERY line]
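
As a small sketch of the no-pattern case, the following feeds three made-up lines to an action that has no pattern in front of it. The input text and the variable name "out" are just for illustration.

```shell
# No pattern before the braces, so the action runs once for every input line.
out=$(printf 'one\ntwo\nthree\n' | awk '{ print "saw a line: " $0 }')
printf '%s\n' "$out"
```

You should see one "saw a line:" message per input line.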

A specific example:


  awk '/#/ {print "Got a comment in the line"}' /etc/hosts

will print out "Got a comment in the line" for every line in /etc/hosts that contains at least one '#', **anywhere in the line**.

The '//' bit in the pattern is one way to specify matching. There are also other ways to specify whether a line matches. For example,


  $1 == "#" {print "got a lone, leading hash"}

will match lines where the first column is a single '#'. The '==' means an EXACT MATCH of the ENTIRE column 1.

On the other hand, if you want a partial match of a particular column, use the '~' operator


  $1 ~ /#/ {print "got a hash, SOMEWHERE in column 1"}
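
To see the difference between '==' and '~' side by side, here is a rough sketch that runs both tests over three invented input lines. The sample lines and the "exact:" / "partial:" labels are made up for this illustration.

```shell
# Line 1: column 1 is exactly "#"      -> both rules fire
# Line 2: column 1 is "#tag"           -> only the ~ rule fires
# Line 3: column 1 is "host"           -> neither rule fires
out=$(printf '%s\n' '# lone hash' '#tag no space' 'host # trailing' |
awk '
  $1 == "#" { print "exact: " $0 }
  $1 ~ /#/  { print "partial: " $0 }
')
printf '%s\n' "$out"
```

The third line contains a '#', but not in column 1, so it prints nothing.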

NOTE THAT THE FIRST COLUMN CAN COME AFTER WHITESPACE. By default, awk skips leading spaces and tabs when it splits a line into columns.

Input of "# comment" will get matched
Input of " # comment" will ALSO get matched

If you specifically want to match "a line that begins with exactly # and a space", you should use


  /^# /  {do something}
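
A quick way to convince yourself of the whitespace behaviour is to try both kinds of test on the two inputs mentioned above. This is only a sketch; the labels and the square brackets around the line are there to make the leading space visible.

```shell
# The first input starts with "#", the second with a space and then "#".
out=$(printf '%s\n' '# comment' ' # comment' |
awk '
  $1 == "#" { print "first column: [" $0 "]" }
  /^# /     { print "line start:   [" $0 "]" }
')
printf '%s\n' "$out"
```

Both inputs pass the column test, but only the unindented one passes the anchored /^# / test.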


Multiple matching

Awk will process ALL PATTERNS that match the current line. So if the following example is used,

  awk '
     /#/ {print "Got a comment"}
     $1 == "#" {print "got comment in first column"}
     /^# /  {print "Found comment at beginning"}
   ' /etc/hosts

you will get THREE printouts, for a line like
# This is a comment
TWO printouts for
  # This is an indented comment
and only one for
1.2.3.4 hostname # a final comment
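
If you want to check those counts without touching /etc/hosts, a sketch like the following feeds the same three sample lines in by hand and counts how many of the patterns fire for each one. The "hits" counter is an addition for this illustration and is not part of the example above.

```shell
out=$(printf '%s\n' \
  '# This is a comment' \
  '  # This is an indented comment' \
  '1.2.3.4 hostname # a final comment' |
awk '
  { hits = 0 }                 # no pattern: reset the counter on every line
  /#/       { hits++ }
  $1 == "#" { hits++ }
  /^# /     { hits++ }
  { print hits ": " $0 }       # no pattern: runs last, after all three tests
')
printf '%s\n' "$out"
```

The counts printed in front of the three lines should be 3, 2 and 1.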

Keeping track of context

Not all lines are created equal, even if they look the same. Sometimes you want to do something with a line, based on lines that came before it.

Here is a quick example that prints "ADDR" lines, if you are not in a "secret" section


   awk '

   /secretstart/  	{ secret=1}
   /ADDR/		{ if(secret==0) print $0 }  # $0 is the entire line
   /secretend/		{ secret=0} '

The above will print out lines that have "ADDR" in them, except while you are between a "secretstart" and a "secretend". ORDER MATTERS. For example, if the above was instead written as

   awk '

   /ADDR/		{ if(secret==0) print $0 }  # $0 is the entire line
   /secretstart/  	{ secret=1}

   /secretend/		{ secret=0} '

and given the following input

ADDR a normal addr
secretstart ADDR a secret addr
ADDR another secret addr
a third secret ADDR
secretend
ADDR normal too

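it would PRINT OUT the first "secret" addr, because on that line the ADDR test runs before "secretstart" gets a chance to set the flag. Whereas the original would keep all three secret lines quiet.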
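
To try this yourself, here is a sketch that runs both orderings over the sample input. The shell variable names "input", "good" and "leaky" are made up for this illustration.

```shell
input='ADDR a normal addr
secretstart ADDR a secret addr
ADDR another secret addr
a third secret ADDR
secretend
ADDR normal too'

# Original order: the flag is set before the ADDR test looks at the line.
good=$(printf '%s\n' "$input" | awk '
  /secretstart/ { secret = 1 }
  /ADDR/        { if (secret == 0) print $0 }
  /secretend/   { secret = 0 }
')

# Swapped order: on the secretstart line, the ADDR test runs while secret is still 0.
leaky=$(printf '%s\n' "$input" | awk '
  /ADDR/        { if (secret == 0) print $0 }
  /secretstart/ { secret = 1 }
  /secretend/   { secret = 0 }
')

printf 'original order:\n%s\n\nswapped order:\n%s\n' "$good" "$leaky"
```

The original order should print only the two normal lines, while the swapped order also lets the "secretstart ADDR a secret addr" line through.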


Author: phil@bolthole.com