AWK programming lesson 5

This is the "everything else" chapter. Currently, I only have one other thing to say on AWK syntax: AWK arrays.

I previously didn't have a reason to use arrays, but since someone recently emailed me an example of when you could use them, I feel inclined to share it with folks.

We have previously covered variables: names that hold a value for you. Arrays are an extension of variables. An array is a variable that holds more than one value. How can it hold more than one value? Because each value gets its own number, in "take a number" fashion.

If you want to hold three numbers, you could say


value1="one"; value2="two"; value3="three";

OR, you could use

values[1]="one"; values[2]="two"; values[3]="three";

You must always give a subscript in brackets when using an array type of variable. You can pick any name for an array name, but from then on, that name can ONLY be used as an array. You CANNOT do

values[1]="one";
values="newvalue";

You CAN reassign values, just like normal variables, however. So the following IS valid:

values[1]="1";
print values[1];
values[1]="one";
print values[1];

The really interesting thing is that, unlike some other languages, you don't have to use just numbers. The [1],[2],[3] above are actually treated as ["1"], ["2"], ["3"]. Which means you can also use other strings as identifiers, and treat the array almost as a single column database.

numbers["one"]=1;
numbers["two"]=2;
print numbers["one"];
value="two";
print numbers[value];
value=$1;
if(numbers[value] == ""){ print "no such number"; }
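
The lookup above can be tried end to end with a sketch like the following. The function name "lookup" and the sample words are made up for illustration. It also swaps the comparison against "" for the "in" operator, which tests whether a subscript exists; merely referencing numbers[value] quietly creates an empty element for that subscript, which "in" avoids.

```shell
# lookup is a hypothetical helper name; it reads one word per line on stdin.
lookup() {
    awk '
    BEGIN { numbers["one"] = 1; numbers["two"] = 2 }
    {
        value = $1
        # "in" checks for the subscript without creating a new element
        if (value in numbers)
            print numbers[value]
        else
            print "no such number: " value
    }'
}

printf 'two\nseven\n' | lookup
```

With that input, the first line is found in the array and the second is not.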


When and how to use arrays

There are different times you might choose to use arrays. As I mentioned, I personally have never needed them :-) But here are some instances that might be relevant to you.

Storing info for later

Personally, I would just print this information out to a temporary file. But if you have a reason to do so, you could save particular words in memory, and print them all out at the end, which would be faster than using a temporary file.

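# lnum must start at 0: an unset variable used as a subscript is "", not 0
BEGIN	{ lnum=0; }
/special/{ savedwords[lnum]=$2; lnum+=1; }
END	{
		# lnum now holds the number of saved words
		for(count=0; count<lnum; count+=1)
		{
			print count,savedwords[count];
		}
	}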
			
Instead of just printing the words out, you could use the END section to do some additional processing that you might need.
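
As one example of such additional processing, here is a sketch that counts how often each saved word occurs and prints the totals from the END section. The function name "count_special" and the sample input are assumptions for illustration. Since a "for (word in seen)" loop visits the subscripts in no particular order, the output is piped through sort.

```shell
# count_special is a hypothetical helper name; it reads lines on stdin.
count_special() {
    awk '
    /special/ { seen[$2] += 1 }      # tally the second field of matching lines
    END {
        for (word in seen)           # visit every subscript, in no fixed order
            print word, seen[word]
    }' | sort
}

printf 'special apple\nplain pear\nspecial fig\nspecial apple\n' | count_special
```

Here the word itself is the subscript, so no separate counter variable is needed.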

Arrays, and the split() function

The other primary reason to use arrays is if you want to do subfields. Let's say you have a line that has some coarse divisions, and some fine divisions. In other words, top level fields are separated by spaces, but then you get smaller words separated by colons.

  This is a variable:field:type line
  There can be multiple:type:values here

In the above, the fourth space-separated field has subfields separated by colons.
Now let's say you wanted to know the value of the second subfield in the fourth major field. One way to deal with this would be to call two awks, piped together:

awk '{print $4}' | awk -F: '{print $2}'

Another way would be to change the field separator variable 'FS', on the fly:

awk '{ newline=$4; fs=FS; FS=":";  $0=newline; print $2 ; FS=fs; }'

But you could also do it with arrays, using the split() function, as follows:

awk '{ newline=$4; split(newline,subfields,":"); print subfields[2]} '
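
split() also returns the number of pieces it found, which makes it easy to loop over every subfield rather than just one. A minimal sketch, using one of the sample lines from above; the function name "show_subfields" is made up for illustration.

```shell
# show_subfields is a hypothetical helper name; it reads lines on stdin.
show_subfields() {
    awk '{
        n = split($4, subfields, ":")    # n is the number of subfields found
        for (i = 1; i <= n; i++)
            print i, subfields[i]
    }'
}

echo "This is a variable:field:type line" | show_subfields
```

Note that split() numbers the pieces starting from 1, not 0.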


As you can see, it is rarely, if ever, *necessary* to use arrays. But I present them here for completeness :-)

This is probably the last of the AWK lessons, simply because I think this is all the syntax AWK has. I'd be happy to give some task-specific example, if you email me with some interesting problem for awk.



AWK summary

All the features I have mentioned make AWK a fairly decent language. Its main drawback is that it is so line-oriented. It would be kind of nice to have a procedural language with all the power of AWK. Which is why Perl was invented.

Unfortunately, Larry then decided to go waaaaay beyond the simple concept, by throwing in the kitchen sink, AND destroying the cleanness of the AWK language syntax, all in the name of reducing the number of keystrokes needed to accomplish something. The extra functions in perl are good. The syntax, however, is disgusting to programmers capable of touch-typing more than 20 words a minute.

Here endeth the lesson


Author: phil@bolthole.com