How to write a Perl script to create a simple tally report


Perl makes writing reports easier

I use Perl to automate many of my monitoring and reporting duties as a network administrator. In my next series of hubs, I'd like to show you how to do the same for your environment, even if you're not into networking. My first program with Perl was used to read large text files to produce a quick report. But that's only the tiniest part of what Perl can do. Perl is a "high-level, general-purpose, interpreted, dynamic programming language" invented in 1987 by Larry Wall. If you're familiar with Unix or the C programming language, then much of Perl's syntax will come to you easily. And without the C compiler involved, the time it takes to develop from an idea to a running program is much shorter!

Learning Perl

A classic Perl text: the Llama book

 
[Screencap: The perl shebang]
[Screencap: How to open a file for reading in Perl]

Getting started: open a file and read its contents

Perl users acknowledge each other's creativity with the mantra "There's more than one way to do it." This yields the acronym TMTOWTDI ("tim toady"). It also leads to some confusion, because for every five Perl users, there are probably hundreds of scripts that do mostly the same thing. To the novice, though, let this be a comfort: relax - there's plenty of room for learning, because there's no one right way to do it.

From a high-level perspective, we need to design a program that will open a text file, scan through its contents, and print out a utilization report. We are in luck, because we chose Perl to solve the problem, and this kind of reporting is Perl's original purpose. As for the details, I assume you have some kind of Perl reference handy for syntax particulars, so I won't dive too deeply into them.

Step one, figure out if you have a Perl interpreter on your system. Mac users and most Linux users have it easy - it's pre-installed. Windows users need to check out the tutorials on ActiveState. To verify your Perl installation, open up a command line (Terminal.app for Mac, cmd for Windows) and type "perl -v" (lowercase -v prints a short version message; uppercase -V prints the full build configuration). If you get some kind of version message from the Perl interpreter, you're good to go.

Step two, find a text editor. A Perl program is just a text file that gets interpreted by Perl.

Step three, we begin. Create a new text file and save it as "report.pl"

The first line in the script is a throwback to Unix executables - the shebang. It starts with # like a comment, but the # is immediately followed by an exclamation mark. Then with no spaces, type in the path to the Perl interpreter. (See the "perl shebang" screencap above.) Windows mostly ignores this line for two reasons: one, Windows doesn't use the shebang method for locating a script's interpreter; and two, once Perl is invoked, this line looks like a comment and won't be executed (although Perl itself still scans it for switches such as -w).

In the screencap titled "How to open a file for reading in Perl", with very few characters typed, I've instructed the Perl interpreter on where to find the file, how to open the file (read-only is implied, as opposed to write or append), and what to do with each line of the file (so far, nothing), then to close the file.

I'll assume your log file is named myprogramlogs.txt and is in the same directory as your script.
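The screencaps are not reproduced here, so what follows is only a minimal sketch of what such a skeleton might look like, not the original listing. It assumes the interpreter lives at /usr/bin/perl, and it uses the three-argument form of open with an explicit '<' (read) mode rather than relying on the implied default. The helper name scan_file and the line counter are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# scan_file is a hypothetical helper name, not from the original script.
# It opens the named file for reading, visits every line, and closes it.
sub scan_file {
    my ($path) = @_;

    # '<' requests read-only access; die with the OS error if open fails.
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";

    my $lines = 0;
    local $_;    # keep the loop from clobbering the caller's $_
    while (<$fh>) {
        # Each line arrives in $_; for now we only count it.
        $lines++;
    }
    close($fh);

    return $lines;
}

# Use the file named on the command line, e.g. ./report.pl myprogramlogs.txt
if (@ARGV) {
    my $total = scan_file($ARGV[0]);
    print "Read $total lines from $ARGV[0]\n";
}
```

Counting lines is an addition of mine so the skeleton prints something; the body of the while loop is where the real work goes.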

To run the script on Mac (and similarly on Linux), open the Terminal.app and change to the same directory as your script. For example (don't type any comments that follow the # symbol below):

cd scripts
chmod +x report.pl # change mode of report.pl to executable
./report.pl myprogramlogs.txt # substitute the name of your logfile

On Windows, the ActiveState installer should make your .pl files executable. Open a cmd window, change to the same directory as your script, and run the following:

report.pl myprogramlogs.txt

If for some reason report.pl won't execute on its own, just prepend perl to the command:

perl report.pl myprogramlogs.txt

Pattern matching using "next unless"

So far, kinda boring. Nothing's happening yet. Let's change that!

A common scenario I encounter is a question of how utilized our network is. How many unique IP addresses are in use? How many unique clients have we seen? Depending on how familiar you are with running a DHCP server, you may realize that these two questions can have very different answers.

Let's look at the log output of the popular ISC DHCP server (see the code block below, at the bottom of this article). For this example, we can answer the question definitively by looking only at the DHCPACK transaction logs: there's a timestamp of when the ACK took place, a listing of the IP address, and an acknowledgment of the MAC address. So let's skip everything but DHCPACK messages.

Put these two lines between your { and } in the while loop:

next unless /DHCPACK/;

print "$. $_";

If you run the script I've described against the log file example I've provided, your output will be the line number ($.) and the log entry ($_) for only those lines matching DHCPACK. Try it and see if you get the same output as I did - my output includes lines 2, 4, 5, 7, 11, 13, 15, 20, 22 ... (see screencap below).

[Screencap: log output from "next unless" example]

Clarify the assumptions

Congratulations, we just landed on the island of "been there done that" called grep. But we're one step closer to our goal. We've isolated the log entries we care about. Our next step will be to start pulling apart the information they contain. Before we go further, let's take a sidebar and clarify some things.

In our example script, it may not be immediately clear to the reader how the pattern match works. The same variety of options that makes Perl so powerful for the veteran can seem intimidating to new learners. Many of the complaints I've heard can be summarized as a misunderstanding of Perl's basic assumptions. If a function or operator has multiple parameters or type signatures, what happens when the parameters aren't fully specified? Let's pick apart the assumptions in our example script.

The open function assumes read-only unless otherwise specified. The <> operator (the angle or readline operator) has several uses, but when used as I do in this example, as the condition of a while loop, it reads one line at a time from the filehandle into the default variable $_. By wrapping the angle operator in a while loop, we scan through the entire file one line at a time. In our "grep" example, next unless /DHCPACK/ could be more explicitly stated as next unless ($_ =~ m/DHCPACK/) - this instruction says to skip further processing on the line in $_ if the match fails (=~ binds the string to the pattern, and m// is the match operator). Looking at the last instruction in the while loop, the print statement uses built-in variables for the input file line number ($.) and the default variable ($_). Once we read past the last line of the input file, we fall out of the loop, and clean up our filehandle with the close statement.
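To make those defaults visible, here is a sketch of the same loop with the implied pieces spelled out. The helper name matching_lines is a placeholder of mine, and the loop reads two sample log lines from an in-memory filehandle so it runs without a log file on disk.

```perl
use strict;
use warnings;

# matching_lines is a hypothetical helper: it returns "line-number line"
# strings for every line of the filehandle that contains DHCPACK.
sub matching_lines {
    my ($fh) = @_;
    my @found;

    # Long form of: while (<$fh>) { ... }
    while (defined(my $line = readline($fh))) {
        # Long form of: next unless /DHCPACK/;
        next unless ($line =~ m/DHCPACK/);

        # $. holds the line number of the filehandle read most recently.
        push @found, "$. $line";
    }
    return @found;
}

# Two sample lines taken from the example dhcpd.log.
my $sample = join("\n",
    'Feb  3 09:02:08 server1 dhcpd: DHCPINFORM from 192.168.192.140 via 192.168.192.2',
    'Feb  3 09:02:08 server1 dhcpd: DHCPACK to 192.168.192.140 (90:b1:1c:5c:de:72) via eth0',
) . "\n";

# Opening a reference to a string gives an in-memory filehandle.
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!\n";
print matching_lines($fh);
close($fh);
```

Only the second sample line contains DHCPACK, so it is the only one printed, prefixed with its line number.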

Pattern match and capture

Regular expressions make up some of the most powerful concepts within Perl. Our example match operation uses a literal string match - pretty straightforward. If it matches exactly, win! Otherwise move on. But let's take it a bit further by mixing up exact matches with character classes and captures. By specifying a class of character, and positioning a capture within the context of exact matches, we create a handy tool for moving this report towards the finish line.

There are two distinct DHCPACK patterns for us to look at: "DHCPACK on" and "DHCPACK to". As we approach the problem of describing these patterns, let's only get specific enough to eliminate false positives while still capturing the information we need. A capture in a Perl regex is enclosed in parentheses ( ). To match literal parentheses (or any other special-to-regex symbol) precede with the escape character \ (backslash). To match anything that is not a space, specify the \S class. The + (plus) sign says to match one or more. And check out the crazy in the "DHCPACK to" regex: match and capture everything that is not a close parenthesis, until you reach a close parenthesis. (See the screen cap below.)

One more caveat: regex captures use built-in variables, starting at $1 from the left-most open paren to $n for the nth open paren. See how it's used in the print statement?

And as any worthy Perl programmer should repeatedly tell you, always always use warnings and strict in your scripts. These keep you honest in a variety of ways, and help you shorten the troubleshooting loop on crazy behavior.

Using the script below, you should get the same file line numbers as you did with the first example, but this time with only "ip" and "mac" specified.
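The screencap of that script is not reproduced here. As a rough sketch of the approach described above, and not necessarily the original regexes, the two patterns could be written as follows. The helper name parse_ack and the ip=/mac= output format are my own assumptions.

```perl
use strict;
use warnings;

# parse_ack is a hypothetical helper. Given one log line, it returns
# (ip, mac) for either DHCPACK format, or an empty list if neither matches.
sub parse_ack {
    my ($line) = @_;

    # "DHCPACK on <ip> to <mac>": \S+ captures a run of non-space characters.
    if ($line =~ /DHCPACK on (\S+) to (\S+)/) {
        return ($1, $2);
    }

    # "DHCPACK to <ip> (<mac>)": \( and \) match literal parentheses, and
    # [^)]+ captures everything up to the closing parenthesis.
    if ($line =~ /DHCPACK to (\S+) \(([^)]+)\)/) {
        return ($1, $2);
    }

    return;
}

# One sample line of each format, taken from the example dhcpd.log.
for my $line (
    'Feb  3 09:02:11 server1 dhcpd: DHCPACK on 192.168.29.70 to 00:1f:5b:38:e0:ec (Michael-Bugatti) via 192.168.28.2',
    'Feb  3 09:02:08 server1 dhcpd: DHCPACK to 192.168.192.140 (90:b1:1c:5c:de:72) via eth0',
) {
    # An empty return list makes the assignment false, so we skip the line.
    my ($ip, $mac) = parse_ack($line) or next;
    print "ip=$ip mac=$mac\n";
}
```

In the real report.pl the same two regexes would sit inside the while loop, with $. added to the print statement to show the line number.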

[Screencap: match and capture with regular expressions]
[Screencap: output of first-capture example]

Hash and tally

Perl has three basic types of variables: scalar, array, and hash. The designation is $ for scalar, @ for array (or list) and % for hash (or associative array). I think of them as "one thing", "ordered things", and "named things". The array keeps the original insertion order, but a hash makes no promise about the order of its keys, so sort them if the order matters. To access individual items, use $arr[index] or $hash{key}. To access the container, use @ or % as appropriate: foreach (@arr) or foreach (keys %hash).
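A tiny sketch may help make the three containers concrete. The values are borrowed from the sample log, and the sub name container_demo is made up for illustration.

```perl
use strict;
use warnings;

# container_demo is a hypothetical name; it just exercises the three types.
sub container_demo {
    my $server = 'server1';                                   # one thing
    my @ips    = ('192.168.29.70', '192.168.29.69');          # ordered things
    my %host   = ('00:1f:5b:38:e0:ec' => 'Michael-Bugatti');  # named things

    my @report;
    push @report, "server: $server";
    push @report, "second ip: $ips[1]";                # indexes start at zero
    push @report, "host: $host{'00:1f:5b:38:e0:ec'}";  # look up by key

    # Walk a whole container by using its @ or % form.
    foreach my $ip (@ips) {
        push @report, "ip: $ip";
    }
    foreach my $mac (sort keys %host) {    # sort gives a predictable order
        push @report, "mac: $mac";
    }
    return @report;
}

print "$_\n" foreach container_demo();
```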

Let's make an executive decision here, based on the nature of DHCP. This service hands out a fixed number of IP addresses to a variable number of MAC addresses. Our executive call is that the more interesting number to track is "unique MAC addresses" because that is the same number as "unique devices seen by DHCP".

Because of warnings and strict, we start by declaring the hash before using it. I'll re-use the same regex as before, but this time I copy the captured values into $ip and $mac. Then, I increment a counter for each $mac value seen. After scanning the entire file, I print out the count of how many keys exist in the hash. Another helpful variant would be to print each key and the value of the key's hash entry.
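The screencap of the tally script is not reproduced here, so the following is a sketch of the hash-and-tally step as described, with regexes written to fit the two DHCPACK formats in the sample log. The helper name tally_macs and the hash name %seen are my own assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# tally_macs is a hypothetical helper. It reads log lines from a filehandle
# and returns a hash mapping each MAC address to the number of ACKs seen.
sub tally_macs {
    my ($fh) = @_;
    my %seen;    # declared before use, as strict requires

    local $_;    # keep the loop from clobbering the caller's $_
    while (<$fh>) {
        next unless /DHCPACK/;

        my ($ip, $mac);
        if (/DHCPACK on (\S+) to (\S+)/) {
            ($ip, $mac) = ($1, $2);
        }
        elsif (/DHCPACK to (\S+) \(([^)]+)\)/) {
            ($ip, $mac) = ($1, $2);
        }
        else {
            next;    # a DHCPACK line in a format we do not recognize
        }

        $seen{$mac}++;    # first sighting creates the entry, later ones add 1
    }
    return %seen;
}

# Run against the file named on the command line, e.g. ./report.pl dhcpd.log
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
    my %seen = tally_macs($fh);
    close($fh);

    print "Unique MAC addresses: ", scalar(keys %seen), "\n";

    # Variant: list each MAC with its ACK count.
    foreach my $mac (sort keys %seen) {
        print "$mac $seen{$mac}\n";
    }
}
```

The $ip capture is kept only to mirror the description; this tally does not use it, though a second hash keyed on $ip would answer the "unique IP addresses" question the same way.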

[Screencap: Tally by hash]

Wrapping up

The scalar keyword puts keys in scalar context, so it returns a count rather than a list. I got 19 unique MAC addresses when I ran this script against the sample log output (listed in the code section below). Without scalar, each individual key would be appended to the print output, because print evaluates its arguments in list context. In scalar context, keys returns the number of keys in the hash instead of the keys themselves.
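The difference between list and scalar context can be seen in a few lines. This sketch uses a made-up two-entry hash and a hypothetical helper name, unique_count.

```perl
use strict;
use warnings;

# unique_count is a hypothetical helper: it returns how many keys a hash has.
sub unique_count {
    my (%seen) = @_;
    my $count = keys %seen;    # scalar assignment: keys yields a count
    return $count;
}

my %seen = ('00:1f:5b:38:e0:ec' => 2, '00:1f:5b:38:e0:ed' => 1);

my @macs = keys %seen;         # list assignment: keys yields the keys

print "As a list: @macs\n";    # both MACs, in no particular order
print "Forced to scalar: ", scalar(keys %seen), "\n";
print "Via helper: ", unique_count(%seen), "\n";
```

The last two lines both report 2, while the first prints the MAC addresses themselves.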

I hope this introduction to hash-and-tally has been useful. Let me know in the comments if you have any questions or suggestions.

Example: dhcpd.log

Feb  3 09:02:08 server1 dhcpd: DHCPINFORM from 192.168.192.140 via 192.168.192.2
Feb  3 09:02:08 server1 dhcpd: DHCPACK to 192.168.192.140 (90:b1:1c:5c:de:72) via eth0
Feb  3 09:02:08 server1 dhcpd: DHCPINFORM from 192.168.193.25 via 192.168.192.2
Feb  3 09:02:08 server1 dhcpd: DHCPACK to 192.168.193.25 (18:03:73:e0:8d:c8) via eth0
Feb  3 09:02:09 server1 dhcpd: DHCPACK to 192.168.26.181 (d4:be:d9:01:ac:be) via eth0
Feb  3 09:02:09 server1 dhcpd: DHCPINFORM from 192.168.26.181 via 192.168.26.2
Feb  3 09:02:09 server1 dhcpd: DHCPACK to 192.168.181.6 (b8:ca:3a:a3:89:5f) via eth0
Feb  3 09:02:09 server1 dhcpd: DHCPINFORM from 192.168.181.6 via 192.168.180.2
Feb  3 09:02:09 server1 dhcpd: DHCPDISCOVER from 00:25:90:aa:bb:da via 192.168.32.2: network 192.168.32/22: no free leases
Feb  3 09:02:09 server1 dhcpd: DHCPINFORM from 192.168.166.17 via 192.168.164.2
Feb  3 09:02:09 server1 dhcpd: DHCPACK to 192.168.166.17 (90:b1:1c:85:1d:7a) via eth0
Feb  3 09:02:09 server1 dhcpd: DHCPINFORM from 192.168.193.160 via 192.168.192.2
Feb  3 09:02:09 server1 dhcpd: DHCPACK to 192.168.193.160 (90:b1:1c:5c:e2:29) via eth0
Feb  3 09:02:11 server1 dhcpd: DHCPDISCOVER from 90:b1:1c:f4:36:14 via eth0: network 192.168.0/22: no free leases
Feb  3 09:02:11 server1 dhcpd: DHCPACK to 192.168.163.115 (b8:ac:6f:37:d5:a0) via eth0
Feb  3 09:02:11 server1 dhcpd: DHCPINFORM from 192.168.163.115 via 192.168.162.2
Feb  3 09:02:11 server1 dhcpd: DHCPDISCOVER from e0:46:9a:1b:c0:61 via 192.168.172.2: network 192.168.172/23: no free leases
Feb  3 09:02:11 server1 dhcpd: DHCPDISCOVER from a4:ba:db:3e:cc:26 via 192.168.8.2: network 192.168.8/22: no free leases
Feb  3 09:02:11 server1 dhcpd: DHCPREQUEST for 192.168.29.70 from 00:1f:5b:38:e0:ec (Michael-Bugatti) via 192.168.28.2
Feb  3 09:02:11 server1 dhcpd: DHCPACK on 192.168.29.70 to 00:1f:5b:38:e0:ec (Michael-Bugatti) via 192.168.28.2
Feb  3 09:02:11 server1 dhcpd: DHCPREQUEST for 192.168.29.69 from 00:1f:5b:38:e0:ed (Michael-Bugatti) via 192.168.28.2
Feb  3 09:02:11 server1 dhcpd: DHCPACK on 192.168.29.69 to 00:1f:5b:38:e0:ed (Michael-Bugatti) via 192.168.28.2
Feb  3 09:02:12 server1 dhcpd: DHCPDISCOVER from 90:b1:1c:f4:36:80 via eth0: network 192.168.0/22: no free leases
Feb  3 09:02:12 server1 dhcpd: DHCPACK to 192.168.185.76 (78:2b:cb:99:1c:e0) via eth0
Feb  3 09:02:12 server1 dhcpd: DHCPINFORM from 192.168.185.76 via 192.168.184.2
Feb  3 09:02:12 server1 dhcpd: DHCPDISCOVER from 40:6c:8f:bc:ed:46 via 192.168.36.2: network 192.168.36/22: no free leases
Feb  3 09:02:12 server1 dhcpd: DHCPDISCOVER from 00:60:9f:91:fc:86 via 192.168.32.2
Feb  3 09:02:12 server1 dhcpd: DHCPOFFER on 192.168.35.186 to 00:60:9f:91:fc:86 via 192.168.32.2
Feb  3 09:02:12 server1 dhcpd: DHCPACK to 192.168.22.169 (78:e7:d1:cc:2f:0b) via eth0
Feb  3 09:02:12 server1 dhcpd: DHCPINFORM from 192.168.22.169 via 192.168.20.2
Feb  3 09:02:12 server1 dhcpd: DHCPREQUEST for 192.168.48.176 from 18:03:73:b0:4c:3e (MMH120830013) via eth0
Feb  3 09:02:12 server1 dhcpd: DHCPACK on 192.168.48.176 to 18:03:73:b0:4c:3e (MMH120830013) via eth0
Feb  3 09:02:13 server1 dhcpd: DHCPACK to 192.168.161.241 (90:b1:1c:8e:61:07) via eth0
Feb  3 09:02:13 server1 dhcpd: DHCPINFORM from 192.168.161.241 via 192.168.160.2
Feb  3 09:02:13 server1 dhcpd: DHCPDISCOVER from d4:9a:20:f8:ba:a2 (SRV-100409006) via 192.168.48.2
Feb  3 09:02:13 server1 dhcpd: DHCPOFFER on 192.168.49.175 to d4:9a:20:f8:ba:a2 (SRV-100409006) via 192.168.48.2
Feb  3 09:02:14 server1 dhcpd: DHCPREQUEST for 192.168.49.45 from b8:ca:3a:a3:a9:06 (SRV140805002) via eth0
Feb  3 09:02:14 server1 dhcpd: DHCPACK on 192.168.49.45 to b8:ca:3a:a3:a9:06 (SRV140805002) via eth0
Feb  3 09:02:14 server1 dhcpd: DHCPDISCOVER from 00:0a:f7:01:67:bf via 192.168.192.2: network 192.168.192/23: no free leases
Feb  3 09:02:14 server1 dhcpd: DHCPACK to 192.168.166.165 (18:03:73:bb:db:48) via eth0
Feb  3 09:02:14 server1 dhcpd: DHCPINFORM from 192.168.166.165 via 192.168.164.2
Feb  3 09:02:14 server1 dhcpd: DHCPDISCOVER from 00:25:4b:a1:61:52 via 192.168.176.2: network 192.168.176/23: no free leases
Feb  3 09:02:15 server1 dhcpd: DHCPDISCOVER from e0:46:9a:1b:c0:61 via 192.168.172.2: network 192.168.172/23: no free leases
Feb  3 09:02:15 server1 dhcpd: DHCPDISCOVER from 00:17:4f:0f:92:96 via 192.168.180.2: network 192.168.180/23: no free leases
Feb  3 09:02:16 server1 dhcpd: DHCPDISCOVER from 00:60:9f:98:8b:c5 via 192.168.130.2: network 192.168.130/23: no free leases
Feb  3 09:02:16 server1 dhcpd: DHCPINFORM from 192.168.137.13 via 192.168.136.2
Feb  3 09:02:16 server1 dhcpd: DHCPACK to 192.168.137.13 (00:25:64:c9:60:bc) via eth0
Feb  3 09:02:17 server1 dhcpd: DHCPACK to 192.168.170.227 (78:2b:cb:99:19:06) via eth0
Feb  3 09:02:17 server1 dhcpd: DHCPINFORM from 192.168.170.227 via 192.168.170.2
Feb  3 09:02:17 server1 dhcpd: DHCPACK to 192.168.185.28 (44:37:e6:be:ea:43) via eth0
Feb  3 09:02:17 server1 dhcpd: DHCPINFORM from 192.168.185.28 via 192.168.184.2
Feb  3 09:02:18 server1 dhcpd: DHCPDISCOVER from 00:50:56:6a:89:4b via 192.168.32.2: network 192.168.32/22: no free leases
Feb  3 09:02:18 server1 dhcpd: DHCPDISCOVER from 00:17:4f:0f:92:96 via 192.168.180.2: network 192.168.180/23: no free leases
Feb  3 09:02:18 server1 dhcpd: DHCPINFORM from 192.168.185.78 via 192.168.184.2
Feb  3 09:02:18 server1 dhcpd: DHCPACK to 192.168.185.78 (f0:4d:a2:34:fe:0f) via eth0
Feb  3 09:02:19 server1 dhcpd: DHCPDISCOVER from 14:fe:b5:cb:97:9e via eth0: network 192.168.0/22: no free leases
Feb  3 09:02:19 server1 dhcpd: DHCPDISCOVER from 00:0a:f7:01:67:9d via 192.168.16.2: network 192.168.16/24: no free leases
