
How to write an awk script to create a simple tally report

Updated on July 21, 2015

What's with the name?

Unix began soon after the dawn of the computing age. Within its many stories and fables, the legend of awk comes straight from the famous Bell Labs and is named for its inventors: Aho, Weinberger, and Kernighan. (Yes, that Kernighan. The man!! You'd have to know a little something about the history of C to get that reference ...)

Awk is an interpreted language with a focus on text file analysis. Among its notable strengths are support for regular expressions and associative arrays, features it shares with its successor, Perl. If you have the opportunity to learn more about Perl and awk, you will see how both echo their predecessor, the C programming language. Personally, I learned awk a year or two before beginning with Perl, because I found awk to be a little more approachable.

Within the GNU project, awk was rewritten to be royalty free and renamed to gawk (GNU awk) to distinguish from the original program. With a few exceptions, most awk scripts can be run by gawk and vice versa. For this tutorial, I will ignore the distinction between awk and gawk and simply refer to the interpreter as awk. Let's get started writing this awk script.

Example of awk pattern and response

Getting started

If you use Mac OS or Linux, you have all you need to get started: awk comes preinstalled. If you're on a Windows computer, head over to SourceForge and grab the UnxUtils download. Once you have that unzipped and installed, open your favorite text editor and let's get started.

I think of awk as approaching input files with a focus on columns. (You might want to open this reference in a separate window to keep handy as you read the tutorial.) Input files are implicitly opened for reading. The structure of an awk script is

pattern { response }

where pattern can be a regular expression, or one of the keywords, BEGIN or END. As each line is read from the input file, if pattern matches the line, then awk evaluates response. In the screen cap image "Example of awk pattern and response", for every input line containing the string "JDubya" (case sensitive), the entire input string is printed to stdout.
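That screen-cap example can be reproduced as a one-liner; the input file name here is hypothetical:

```shell
# print every input line containing the string "JDubya" (case sensitive)
awk '/JDubya/ { print $0 }' some-input-file
```

The same program works equally well reading from a pipe instead of a named file.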

In the awk reference, be sure to dive deeper into the idea of field separator (FS) and record separator (RS). These built-in variables tell the interpreter how to split up input files into lines (or records) and how to split those records into fields (or columns).

Quick list of definitions so far:

FS - field separator
RS - record separator
NF - number of fields in the current record

Separating into records and fields

I mentioned that I think of awk as having a focus on columns. Consider an example. Leaving FS and RS at their default values (white space and newline, respectively), the following input line will be parsed into ten distinct fields, addressed as $1, $2, $3, ... $10.

Here are ten words separated by whitespace ended by newline

As awk parses this record, it stops at the newline (default value of RS). Fields are the information found in a record between the field separator (default of whitespace, such as the space or tab character). The example above yields field 1 (or $1) as "Here", $2 "are", $3 "ten" ... $9 "by", $10 "newline".
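You can check that numbering with a quick one-liner, piping the sample line above into awk:

```shell
# NF holds the field count; $1 and $10 are the first and last fields
echo "Here are ten words separated by whitespace ended by newline" |
  awk '{ print NF, $1, $10 }'
# prints: 10 Here newline
```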

Command line settings, and how to invoke your script

To invoke your first awk script, my-first-script.awk, you need to tell the awk interpreter where to find your script with -f, and where to find your input file. Because we're dealing with Unix, input pipes are welcome. (Windows can handle pipes too, although to a lesser degree.) If you don't use redirection, you can explicitly define an input file as a command line parameter.

awk -f my-first-script.awk some-input-file

The built-in variables FS and RS change the way that awk parses input files. The keyword BEGIN tells awk to interpret the response before evaluating any input records. Built-in variables can be set in the BEGIN stanza, but that can get clunky.
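Setting a built-in variable in a BEGIN stanza looks like this (the sample record is made up for illustration):

```shell
# FS is set in BEGIN, before any input is read ... equivalent to -F,
printf 'Red,North,149\n' | awk 'BEGIN { FS = "," } { print $2 }'
# prints: North
```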

A handy shortcut for setting the FS variable on the command line is to pass in -F"FS" where FS is the desired field separator. One of my favorite scenarios for using awk is to parse through CSVs, or text files of comma-separated-values. Let's say my script is called csv.awk, and my csv is called report.csv, and assume I have not adjusted any built-in variables within csv.awk.

awk -F, -f csv.awk report.csv

By invoking the -F command line switch in this way, I've told awk to interpret the instructions in csv.awk using the comma (,) as a field-separator.

team tally
results from team tally

A little more advanced awk

In 2003 I posted to a DHCP mailing list with a link to an awk script for parsing DHCP lease files. (If you look at the archives, you'll see the link to www3. Don't bother trying to click it. That server has been offline for a while.)

In 2005 that same script got published with an update to the DHCP server bundle in the SUSE linux project.

I vaguely remember the conversation with the SUSE project maintainer, but I had completely forgotten about the DHCP mailing list. When I tried to find a link to that 2005 awk script, Google found the mailing list post from 2003. That's a long memory, Google!

I found the script and posted it in a code block at the end of this article. Post a comment if you have any questions about it, or if you're interested in seeing a full hub devoted to how it works.

Generate a simple tally from a CSV

Consider the example CSV in the code section below. Red, Blue, and Green teams are competing to capture points in the North, East, South, and West quadrants. The CSV reports how many points per team per quadrant, but who won?

Look over the data and consider whether the columns are normalized: do they change meaning between records? No. The first column is the team name, the second is the quadrant, and the third is the score for that team-quadrant pair.

Variables in awk don't have to be declared or initialized. Just think up a name and throw it in there. When I tally the score, I don't really care what quadrant they scored in, because I assume that all points are equal and we just want a sum of the total points per team. (See the screen cap with the caption, "team tally".)

Note the pattern I used: NF == 3. By stating the number of fields I expect per record, I tell awk to skip over badly formed records. The statement c[$1] += $3 tells awk to accumulate the current value of column 3 in a slot named for the current value of column 1. For line 1 in the input, that is the same as saying "add 149 to the value stored in slot Red of array c". Since c didn't exist before line 1 in the input, it gets initialized with a value of 149 in a slot named Red. It turns out our scorekeepers were sloppy and gave us repeated entries for each team. We'll just assume they mean for them all to be added together. Line 2 from the input file takes the value of slot Red in array c from 149 to 183.

After all input files are processed, the script moves on to the stanza for keyword END. The for x in y operator creates a variable x to hold the values encountered as it iterates over the slots in y. Likewise, the variable team is created, and as awk walks through the named slots in c, it stores the name of each slot into team. To access the final value of each accumulator, refer to the named slot using c[team].

Try running the script against the sample input, and see if you get the same results I do. (See the table "results from team tally".)
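The screen caps aren't reproduced here, so here is a sketch of what csv.awk likely contains, pieced together from the description above (the original script itself isn't shown in the text; the two sample rows match the 149 and 34 values quoted earlier):

```shell
# tally column 3 per team (column 1), skipping malformed records
printf 'Red,North,149\nRed,East,34\n' |
  awk -F, 'NF == 3 { c[$1] += $3 } END { for (team in c) print team, c[team] }'
# prints: Red 183
```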

One last thought to leave you with: we haven't yet discussed output field separators, but as it turns out, there is such a thing (OFS), with a default value of a single space. There is no dedicated command line switch for OFS the way -F sets FS, though awk's generic -v option (e.g., -v OFS=",") can assign it; otherwise, set it in the BEGIN stanza. In the print statement, OFS is inserted between any variables that are separated by a comma (i.e., print team, c[team] in the example). To print your results as a CSV, set OFS to a comma (,) in the BEGIN stanza.
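A quick way to see OFS in action (the input values are just for demonstration):

```shell
# OFS is inserted wherever print arguments are separated by a comma
echo "Red 183" | awk 'BEGIN { OFS = "," } { print $1, $2 }'
# prints: Red,183
```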

Sample CSV data
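The original sample data did not survive extraction. A plausible stand-in, consistent with the values quoted above (149 and 34 for Red on lines 1 and 2; the remaining rows are invented for illustration), would be:

```
Red,North,149
Red,East,34
Blue,North,120
Blue,South,77
Green,West,95
Green,East,88
```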



#!/usr/bin/awk -f

# Author: Jeff Wilson
# Date:   2002
# License: GPL 3.0 ... no warranty, free to reuse in any way

# Expected input: /var/lib/dhcp/db/dhcpd.leases

# (For best results, pre-process the lease
#  file with 'grep -v "uid \""')

# Usually invoked as, 'leases.awk /var/lib/dhcp/db/dhcpd.leases'

# Format of output (tab-delimited):
#    ip, hardware[ip], compname[ip], state[ip], expiration-time-in-GMT[ip]

# set the RECORD SEPARATOR, RS, to "}" ... records span multiple lines
# set the OUTPUT FIELD SEPARATOR, OFS, to tab
BEGIN { RS = "}"; OFS = "\t" }

# we only care about records that are greater than so-many-characters
# (why 5? I guess a CRLF may be 2 bytes ... isn't it just LF in BSD?)
length($0) > 5 {
  total++

  # only want to record those variables we've captured,
  # so reset values to null
  endtime = ""
  hwaddr = ""
  cn = ""
  st = ""

  for (i = 1; i <= NF; i++) {

    # if this field matches the word "lease"
    if ($i ~ /lease/) {

      # capture the next field into ipaddr
      ipaddr = $(i+1)
    }

    # if this field matches the word "ethernet"
    else if ($i ~ /ethernet/) {

      # get rid of the trailing semi-colon
      hwaddr = $(i+1)
      sub(/;$/, "", hwaddr)
    }

    # if this field matches the word "client-hostname"
    else if ($i ~ /client-hostname/) {

      # get rid of the enclosing quotes (and trailing semi-colon)
      cn = $(i+1)
      gsub(/[";]/, "", cn)
    }

    # if this field matches the word "binding"
    else if ($i ~ /binding/) {

      # we don't care about what the next binding state is
      # so go on and process the rest of this record
      if ($(i-1) ~ /next/) { # do nothing
      }
      else {
        # "binding state <value>;" ... capture the state value
        st = $(i+2)
        sub(/;$/, "", st)
      }
    }

    # give me a timestamp or two
    else if ($i ~ /ends/) {

      # if $i == "ends" then $(i+2) is enddate, and $(i+3) is endtime
      dtstmp = $(i+2)
      split($(i+3), arr, ";")
      tmstmp = arr[1]
      endtime = sprintf("%s %s", dtstmp, tmstmp)
    }
  }

  # store what we captured, keyed by ip address
  if (length(hwaddr)  > 0) hardware[ipaddr]   = hwaddr
  if (length(cn)      > 0) compname[ipaddr]   = cn
  if (length(st)      > 0) state[ipaddr]      = st
  if (length(endtime) > 0) expiration[ipaddr] = endtime
}

# for every ipaddr we captured, display ip, hardware, compname,
# binding state, and expiration time
END { for (ip in hardware) {
        print ip, hardware[ip], compname[ip], state[ip], expiration[ip]
      }
}
