Stanford University
School of Earth Sciences
  

Shell Programming

last revision November 14, 2002

This discussion applies to scripts written for the C-shell (csh). Bourne shell (sh) scripts are similar, although the exact syntax differs.

Table of Contents

  1. Overview of shell scripts
  2. Basic types of statements in a shell script
  3. How do you execute a shell script?
  4. Examples of simple scripts
  5. Working with script variables, including command-line arguments
  6. Command substitution
  7. Expressions involving variables
  8. Other forms of input to shell variables or commands in a script
  9. Flow-of-control statements

Overview of shell scripts

A shell script is a file containing a set of commands to be executed (run) by the shell in sequence as it reads the file.

In its simplest form, a shell script just saves you from retyping a set of commands that are often executed, such as the initialization commands for your login session that are stored in the .login script file.

Strictly speaking, everything that you can put into a shell script can also be executed interactively by typing on the command line, although the looping constructs can be cumbersome. This gives you the chance to test out the syntax of various shell constructs.

The shell provides tools to make shell scripts more powerful, even full-fledged programs. Shell programming is organized around the concepts of "substitution" and "flow-of-control".

  • Substitution is used to manipulate values within the script. It involves using the values of variables as part of a command, and taking the output of a command as the new value of a variable.
  • Flow-of-control refers to common programming constructs such as loops and if-then-else statements. They control which statements in the script are executed, and in which order, often depending upon the value of a variable, or cause a set of commands to be executed repeatedly with different variable or input values.


Basic types of statements in a shell script

Comments
The "pound" or "hash" sign (#) signals the start of a comment. This sign and anything that follows it up to the end of the line is interpreted as a comment for the author and is not executed.

Commands to be executed in a new process
If a line in the script does not begin with one of the reserved keywords used for variable operations or flow control, it is assumed to be a command that you want to execute as a new child process. The entire line is scanned for variables that need to be substituted and for wildcard characters ("*", "?", etc.) that need to be matched to filenames. After all substitutions are performed, the line is executed as if you had typed it at the terminal.

Setting and substituting variable values
Special keywords such as "set", and the "$" symbol, indicate that a variable is to be set or used.

Flow-of-control: loops, conditionals
Special keywords such as "if" or "foreach" are used to start lines in the script that provide flow control or conditional execution.


How do you execute (run) a shell script?

There are two methods you can use to execute a shell script.

First, you can give the script file name as an argument to an instance of the shell program, that is, type a command like:

    csh filename
"csh" is the name of the C-shell program itself. This command starts up a new C-shell process that executes the commands in the script "filename" and then terminates.

Second, you can use the name of the shell script itself as a command, just like any other program on UNIX. To do this, you first have to let the UNIX kernel recognize that the file is a shell script, using the two steps listed below. Then you can simply type the script's filename as a command to execute it, and the kernel will automatically start up a new C-shell process to run the commands in the script.

Note that your login shell has to be able to find the shell script file when you type its name as a command. The login shell only looks in a set of specific directories, called its "path", to find files that contain programs. On pangea, the default path includes your current working directory, so you can run a shell script in the current directory simply by typing its name. Otherwise, type the absolute pathname of the script (for example, /home/sysop/farrell/programs/addup), or add the directory where the script lives to your standard path. Whenever you add a directory to your standard path, you must run the "rehash" built-in C-shell command to tell your login shell to rebuild its list of programs using your new path definition.

To make your shell script file executable as a program, do these steps:

  • Put the following line as the first line in your script:
      #! /bin/csh -f
    This is a special comment telling the kernel that you want this script to be executed by the C-shell (rather than, for example, by the Bourne shell, named simply "sh"). The "-f" option makes the script start up faster by skipping the initial read of the .cshrc file.
  • Use chmod to set execute permission for your file. For instance, if you want anyone to be able to execute the script file, use
      chmod ugo+x filename


Examples of simple scripts

Here are some simple scripts that just collect commands in a script file to save typing them every time they are needed.

/home/sysop/farrell/bin/atlookall on pangea.

This script runs the "atlook" command, which searches specified AppleTalk network zones for devices, with a long list of zones applicable to Earth Sciences. Making a script saves having to type all those zone names every time I run it.

#! /bin/csh -f
# Run atlook to check all Earth Sciences AppleTalk zones
# insert any options ahead of zone names
# backslashes at ends of lines continue the command to the next line.

atlook $* ES-Ethernet ES-Green-East ES-Green-West ES-Green-Concourse \
	ES-Mitchell-SB ES-Mitchell-B ES-Mitchell-1st ES-Mitchell-3rd \
	ES-Mitchell-4th ES-GeoCorner-1st ES-GeoCorner-2nd ES-GeoCorner-3rd

/local/bin/handcart on pangea.

Typing the command "handcart" on pangea runs this script, which is stored in one of the standard system program directories. All the script does is print information about how to use the hand cart in the Mitchell Building.
#! /bin/csh -f
more << EOF
Combination for the combination lock on the
School of Earth Sciences hand cart:

Turn RIGHT at least three full turns, stop at 8
Turn LEFT 1.75 turns (turn past 8), stop at 38
Turn RIGHT, stop at 4, pull shackle

The cart is parked under the south stairwell
in the subbasement of the Mitchell Building.
Rules for use are posted nearby.  Please
contact Felicia in the Dean's Office (723-5490)
for further help.
EOF

/local/bin/pine on pangea

In order to provide global options that will affect all runs of the "pine" e-mail reader program on pangea, I created a simple script in the standard system program directory /local/bin that first makes the settings I want, and then calls the real pine program, which is stored under another name.
#! /bin/csh -f
#  Driver shell for the "pine" mail handling program.
#  P. Farrell, 30 April 1996
#
#  This allows me to set default options.  Previously, I would
#  set the -z option to allow program suspension; but that is now
#  set in the /local/lib/pine.conf file.  Unset the DISPLAY variable,
#  which pangea sets for all logins, whether X terminal or not,
#  so pine won't try to display MIME images on non-X terminals.
#  X logins can use name "pinex" to run the program directly.
#
unsetenv DISPLAY
#
#  The syntax $*:q will substitute the argument list exactly as 
#  passed, with embedded blanks in words, escaped metacharacters
#  uninterpreted, etc.   Verified by testing.
#
exec /local/bin/pine4.33 $*:q


Working with script variables, including command-line arguments

To make general scripts that can be used with different files or under different circumstances, you need variables. A variable in a shell script, as in any programming language, is like a variable in an algebraic expression: simply a name that can stand for a value which can vary. In shell scripts, you can create variables and set their values by many different methods:

  • Command-line arguments typed after the script name when you run it.
  • Simple set statements within the script itself, for example, to initialize a value that may change later, or to gather all the values that you might want to change into a single list of parameters at the top of the script.
  • set statements that follow a test of some kind: if there is one result, set the variable to one value; if another result from the test, set the variable to a different value.
  • Arithmetic computations that modify the value of a variable.
  • A command substitution that runs another command within the script and captures its output as the value of a variable.

You can use these variables as parts of other commands that you run from within the shell script: as the list of options, or the names of files to be affected, etc.

Specific C-shell implementations may limit the size of variables. The csh program on pangea (a Tru64 UNIX v4.0g system) apparently has no limit on the total size of a variable (the total number of bytes of data you can assign to it), but it does limit each word within the variable's value to no more than 1024 bytes. Here, a word is a string of contiguous characters containing no blanks or tabs (no "white space"); that is, the contents of the variable are broken into words wherever a blank or tab is found.

Command line arguments to scripts

When you start a script from your interactive login shell, you can provide arguments to that script on the command line. These are automatically turned into variables that can be used inside the script.

If the command line contains filename wildcard characters, variable substitution references, or command substitution references, those are expanded or substituted first. Then the command line string is broken into separate arguments at blanks, except that a quoted string can contain embedded blanks.

You refer to these arguments as separate variables within the script itself by using the dollar sign (variable substitution operator) followed by an integer number, for example,

    cp $1 $2
This statement inside a shell script would run the cp program with the first "argument" to the shell script (first word on the command line that started the shell script) passed as the name of the file to copy via $1, and the second argument to the shell script passed as the name of the new copy via $2.

The entire list of command line arguments can be referenced as one string with the syntax

    $*

Specific C-shell implementations may impose limits on the number or size of arguments that can be passed to a shell script. The csh program on pangea uses a memory area 38,912 bytes long to store the expanded list of arguments (after wildcard filename matching and variable or command substitution are done) that can be passed to a shell script, or indeed to any program that is started by the shell. Environment variables are also stored in this same memory area, so if you have many environment variables, you reduce the total length of the argument list that you can use. You can see how many total bytes are used by your environment variables with this command:
      printenv | wc -c
A typical pangea user will use 500 to 1000 bytes for environment variables, reducing the maximum size of the complete argument list for a shell script or other command by that amount.

Making and setting your own variables in a script

In addition to the command line arguments, the shell maintains a table of other user-created or special purpose variables in memory. Each variable has a name and a value.

  • Names - up to 20 letters or digits (start with letter) - case matters!
  • Values are strings of characters or digits of arbitrary length without any intrinsic "type". They are treated as character strings or numeric values, depending upon how they are used.

It is also possible to treat any variable as an array of words and access each word separately (see detailed documentation on the C-shell).

Certain variable names are reserved by the shell for special uses, such as "path" or "term".

You can create any number of variables.

Use the set command to create/assign variables, for example:

    set name=single_word
    set name=(word list)
    set name="string with embedded blanks"

A set command with no value just creates the variable as a flag that is "on": the variable exists with a null value, so a test of its existence with $?name (described below) gives 1, "true". For example:

    set optionflag

The unset command removes a variable completely from memory, for example:

    unset name

Using variables in the script

"Variable substitution" is the process of replacing a reference to the name of a variable with its actual value. This is how we use variables.

The dollar sign ($) is the basic substitution operator when it is used as the prefix for a variable name. Whenever a dollar sign appears in a word of a shell command, the shell expects it to be followed by the name of a variable. If you want the dollar sign to be interpreted as just a simple dollar sign, precede it with the backslash (\) "escape" character. Here are the basic formats for variable substitution:

$?name
This tests whether the name variable exists. If the variable does exist, the shell substitutes the value "1" (one, true); if not, the value "0" (zero, false). Use this form if you are just using the variable as a flag. The result can be used in an "if" statement to conditionally execute some commands.

$name
This form causes the entire word list value of name to be substituted for the reference. If name is not defined (was never set), you get an error.

$#name
This substitutes the number of words contained within the name variable. If the variable has a null value (that is, simply set as a "flag" variable), it substitutes zero. If the variable has never been set, you get an error.

$name[n]
This substitutes the "nth" word (blank separated value) from the name variable. The square brackets are required to enclose the value n that specifies which word is wanted, and must follow the variable name with no intervening spaces. This is a way to treat a variable containing a multi-word value as an array of separate words. If you specify a word index value n that is greater than the actual number of words in the variable, you get an error.

Examples:

set a = ($b)
Sets new variable "a" equal to the word list in existing variable "b".

echo $b
Echoes (prints) the value of existing variable "b" to the standard output (terminal).


Command substitution

The result of any command, meaning whatever it would write to standard output, can be "captured" by the shell and used to set the value of a variable or as part or all of the arguments to another command. This is different from sending the output through a pipe to be the input of another command. Here, the output of one command becomes the arguments of another command, not its standard input. This is called "command substitution".

Capturing the output of a command is requested by enclosing that entire command with its arguments in a set of matching backwards quote marks (`) (also called "accent" mark or "grave" mark). This is not the same character as the apostrophe or single quote, which slants forward (').

In captured output, all newline characters (end-of-line markers) are changed to blanks, so the lines in the captured output are joined into one long string of words. Empirical tests on pangea show that the total length of this captured string can be at least 50,000 bytes. Other UNIX systems may have smaller limits. All should allow at least 4,000 bytes in command substitution captured strings.

The shell stores captured output in a temporary area of memory. Once you have captured the output of a command, of course you want to do something with it. You can assign the output to a variable, or use it as part or all of the arguments to another command.

Examples:

  • If a file contains a list of names, I can assign the contents of that file to a variable by capturing the output of a "cat" command that lists the file to standard output. Here is the kind of line I might use for that in a shell script:
      set names = ` cat file `

  • I could use that captured list of names from the cat command directly as the argument list to another command, such as finger:
      finger ` cat file `
    This example could also be typed as an interactive command.

  • I could tell the "mv" command to move all the files whose names are listed in the file "list" to the new directory "newdir" with this command:
      mv `cat list` newdir
    Again, this example could be typed as an interactive command.


Expressions involving variables

Expressions can be used in assigning values to new variables; as substitutions in command lines in the script; and in flow-of-control statements: "if", "foreach", "while", and "switch".

Expressions are normally enclosed in a pair of parentheses "(" and ")". Put blanks (white space) around the parentheses and around the operators within them to ensure proper parsing.

Arithmetic expressions

Standard arithmetic operators (+ - * /) can be used to operate on integer variable values. To save the result in another variable, however, you should use the "@" command rather than the "set" command for the new variable. For example, you can increment a sum with the value of an argument like this:

    @ sum = 0 # initialize
    @ sum = ( $sum + $1 )
Note that there must be a space between the "@" command and the name of the variable that it is setting.

The C-shell cannot do floating point (decimal) arithmetic, for example, 1.1 * 2.3. However, you can invoke the bc calculator program from within a shell script to perform decimal arithmetic. Probably the simplest way to do that is to define an alias, called, for example, "MATH", that performs decimal arithmetic via the bc program on the variables or constants passed to the alias. For example, you can define this alias in your script:

    # Set MATH alias - takes an arithmetic assignment statement
    # as argument, e.g., newvar = var1 + var2
    # Separate all items and operators in the expression with blanks
    alias MATH 'set \!:1 = `echo "\!:3-$" | bc -l`'
This says that the word "MATH" will be replaced by a set command which assigns to the variable named by the first word after MATH (\!:1) the result that bc -l computes from the third through the last words (\!:3-$); the second word is the "=" sign.

Flow-of-control statements

The foreach type of looping statement is good for repetitively executing the same commands for a set of arguments. For example, you could check all the arguments given to the script to see if they are plain files, and if so, make backup copies, using these statements in a shell script:
    foreach file ( $* )
        if (-f $file) cp $file $file.bak
    end

The foreach command can also be used interactively from the terminal! In this form, you will get question mark prompts (?) to enter your commands. When done, type "end" after one of these prompts. For example, I could type the following commands interactively at my terminal to make backup copies of all Fortran programs in my current directory:

    foreach file ( *.f )
    ? if (-f $file) cp $file $file.bak
    ? end



Copyright Phillip Farrell