Compare and contrast bash and Perl for simple tasks

Bash and Perl go head to head in today's lineup

How does the Bourne-again shell line up against the Practical Extraction and Report Language?

The stories of Perl and Bash overlap in many ways, and each has influenced the other. It's not entirely fair to pit them head-to-head, as they have different goals and strengths. In my experience, many of my Perl scripts began life as shell scripts.

Shell scripting is a great way to automate complicated tasks. Sometimes I throw together a shell script just to remind myself of the syntax for a particularly complicated command line utility. Other times, a shell script grows in complexity to the point that I fall back to my old standby, Perl.

My goal in this hub is not so much a head-to-head lineup as a demonstration of where the chips fall in my particular approach to the daily grind of network and system administration.

Find and solve your pain point

Before we dive too much deeper into the topic, ask yourself: why are you here? No no no, not existentially. Why do you want to learn more about shell scripting and/or Perl? Because you are a problem solver.

So the first question is always: what is the problem? Because you are naturally a smart person, you will find a solution. The next question follows shortly after: is it worth my time to automate the solution? If you expect to spend more than a few minutes of your average work day on the problem at hand, the answer is almost always yes. And if you are insane (my kind of crazy) and love to write a script just for the joy of creating it, then why are we still asking these questions?

Here's an example of the evolution of a problem that begins with a shell script and ends up in Perl. Or as those who know me well enough might say, the story of my life.

Useful options for tcpdump

option  name                  description
-l      line-buffered output  useful for piping output into other apps
-n      no lookup             do not resolve hosts or ports
-e      show ethernet         display fields from the ethernet frame
-i      interface             specify which interface to listen on

Capture and analyze network traffic

There are countless reasons for capturing network traffic. Sometimes you want to verify communication between specific hosts at a strategic capture point in your network. Sometimes, you want to identify who is going crazy with broadcasts on your LAN because it disrupts everyone else's performance.

My primary workstation is a Mac. I use tcpdump from the command line to sniff packets on the LAN. Knowing that the Mac connects to the LAN on interface en0, I fire off the following from the command line:

tcpdump -lnnei en0 not ether host b9:11:57:3a:20:bd

See the sidebar for a quick rundown on what the command line options mean.

The filter expression "not ether host b9:11:57:3a:20:bd" excludes my local adapter's traffic, so I only see broadcast, multicast, or the occasional unicast storm.

We're only interested in a frequency analysis of which MAC address shows up most often, so let's ignore all but the first two columns. We'll do this by piping tcpdump into a quick awk one-liner.

tcpdump -lnnei en0 not ether host b9:11:57:3a:20:bd | awk '{print $1, $2}'

Now instead of a screen full of overwhelming information, I just get the two columns: timestamp and source MAC. But I still don't know who's sending the most broadcasts. Why not modify the awk one-liner to count how many frames are sent by each MAC?

Here's a decision point. If you're comfortable with your command line environment, then maybe you continue firing off one-liners that you modify as you go. Personally, I've got enough invested at this point that I'm willing to dedicate a shell script to the idea.

Transition to shell script

A shell script is just a text file that gets interpreted by a shell. For editing shell scripts, I am a big fan of vi, but any text editor will do.

The first line needs to be a shebang: pound sign, exclamation mark, path to interpreter.

#!/bin/bash

The next line can be a copy/paste of your most recent command line.

tcpdump -lnnei en0 not ether host b9:11:57:3a:20:bd | awk '{print $1, $2}'

Choose your own destiny, but my quirk is that I like to follow the pipe symbol | with a newline.

tcpdump -lnnei en0 not ether host b9:11:57:3a:20:bd |
awk '{print $1, $2}'

That adds some readability, especially when the pipeline grows long.

Let's take another look at the awk piece. To count by MAC address, we need to tally each one we see. No problem.

tcpdump -lnnei en0 not ether host b9:11:57:3a:20:bd |
awk '{tally[$2]++} END {for(mac in tally) {print mac, tally[mac]}}'

Uh oh. Where's the output? Once you interrupt the packet capture with Ctrl-C, why doesn't the END stanza kick in? That's a by-product of the pipeline: Ctrl-C sends the interrupt signal to every process in the chain, not just tcpdump, so awk is killed before it ever sees the end of its input. We need to try a different approach.
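One way to watch the END stanza fire is to give awk input that ends on its own. Here's a minimal sketch under that assumption. The sample frames are made up to resemble tcpdump -e output, and the function names tally_macs and sample_frames are mine, purely for illustration. The trailing sort is only there to make the report order predictable.

```shell
#!/bin/bash
# Sketch: the same awk tally, fed by input that ends by itself.

tally_macs() {
  # count rows per source MAC (column 2), then report at end of input
  awk '{tally[$2]++} END {for (mac in tally) {print mac, tally[mac]}}' |
  sort
}

sample_frames() {
  # made-up lines shaped like tcpdump -e output
  cat <<'EOF'
12:00:01.000001 aa:aa:aa:aa:aa:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60
12:00:01.500000 aa:aa:aa:aa:aa:02 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60
12:00:02.250000 aa:aa:aa:aa:aa:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60
EOF
}

sample_frames | tally_macs

# Live capture variant (not run here): -c makes tcpdump exit after 100 frames,
# so the pipeline ends without Ctrl-C.
# tcpdump -c 100 -lnnei en0 not ether host b9:11:57:3a:20:bd | tally_macs
```

That works for a one-shot snapshot, but it doesn't give a running view of the LAN, which is what I'm really after.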

Since the timestamp changes for each row, let's track how it changes to periodically print out an update. The timestamp has the format of HH:MM:SS.microseconds - if we can lop off the microseconds, the remaining timestamp uniquely identifies which second the frame was received. The substr function in awk will do the trick.
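Here's a minimal sketch of that idea. It is my reconstruction of the approach described above, not necessarily the exact script I ran at the time, and the function name per_second_report and the sample frames are made up for illustration.

```shell
#!/bin/bash
# Sketch: print the running tally each time the timestamp rolls into a new second.

per_second_report() {
  awk '
    {
      sec = substr($1, 1, 8)        # HH:MM:SS, microseconds lopped off
      if (prev != "" && sec != prev) {
        # the second rolled over: print the running totals so far
        for (mac in tally) print mac, tally[mac]
        print ""                    # blank line between updates
      }
      tally[$2]++                   # count this frame for its source MAC
      prev = sec
    }'
}

# demo with made-up frames shaped like tcpdump -e output
printf '%s\n' \
  '12:00:01.000001 aa:aa:aa:aa:aa:01 > ff:ff:ff:ff:ff:ff, length 60' \
  '12:00:01.500000 aa:aa:aa:aa:aa:02 > ff:ff:ff:ff:ff:ff, length 60' \
  '12:00:02.250000 aa:aa:aa:aa:aa:01 > ff:ff:ff:ff:ff:ff, length 60' |
per_second_report

# live use (not run here):
# tcpdump -lnnei en0 not ether host b9:11:57:3a:20:bd | per_second_report
```

Note that an update only prints when a frame arrives in a new second, so a quiet LAN produces no output until the next frame shows up. The order of MACs within each block is whatever awk's for-in loop happens to return.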

print updated information once per second

When to transition to Perl

The output now shows blocks of MAC addresses, with total number of hits, updated once per second. As an exercise, go back and modify the awk piece to output the timestamp to separate the updates. What else would you do to improve the report? I'd like to track the frequency of each MAC per second, as well as how many packets per MAC over time, and report only on the currently active MACs.

With practice, you develop your own sense of style. As I mentioned earlier, it comes down to a matter of identifying and relieving pain points in your process. Right about now, the thought of taking this problem space into a Perl solution becomes appealing. Open a text editor, put together a few lines, save and quit. Remember to set the file's mode to executable:

chmod +x mac-pps.pl

Then pipe the output of tcpdump into the Perl script to see the results.

tcpdump -lnnei en0 not ether host b9:11:57:3a:20:bd | ./mac-pps.pl

print updates on packets per second, per MAC

Sort comparison function explained

Input comparison  Output value
$a < $b           -1
$a == $b          0
$a > $b           1

Taking a simple exercise over the top

From here out, it's all showboating. There may be something below that's useful to you, but at this point, I'm having too much fun to quit.

Read up on Perl package management via CPAN and cpanminus. Install the Curses module.

The code below dives into a few topics I haven't yet covered in any of my hubs.

For example, signal handling lets an application trap a signal (like the interrupt sent by hitting Ctrl-C) and replace the default behavior with its own. In this case (the $SIG{INT} assignment in the script below), I set up an anonymous subroutine to shut down Curses and exit cleanly.
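For comparison, the shell has its own way to catch the same signal: the trap builtin. The sketch below is only an illustration of that counterpart, not part of the MAC counter. The function name run_demo is made up, and the kill command stands in for a human pressing Ctrl-C.

```shell
#!/bin/bash
# Sketch: trap the interrupt signal, tidy up, then exit.

run_demo() {
  # a child shell, so the demo can exit without taking this script with it
  bash -c '
    trap "echo Quitting; exit 0" INT   # cleanup commands would go here
    kill -INT $$                       # stand-in for pressing Ctrl-C
    echo "not reached"                 # skipped, because the trap exits first
  '
}

run_demo
```

The shape is the same in both languages: register the handler early, before the long-running work starts, and make sure the handler restores whatever state the script changed.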

I also open a filehandle on the output of a child process (the open call just before the while loop) to simplify the invocation of the MAC counter. Otherwise, tcpdump would have to be launched outside the Perl script, with its output piped into the script's input.

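I use split to break apart the input along set delimiters. In the first statement inside the while loop, I specify the delimiter as \s+, or "one or more whitespace characters", along with a limit of 3 so that everything past the second column stays in a single chunk that gets discarded.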
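The shell has a rough equivalent in the read builtin, which also splits on whitespace and lets the last variable soak up the remainder of the line. A minimal sketch for comparison, with a made-up function name and a made-up sample frame:

```shell
#!/bin/bash
# Sketch: keep the first two whitespace-separated columns, ignore the rest.

first_two_columns() {
  # ts and mac get one column each; rest swallows everything after them
  while read -r ts mac rest; do
    printf '%s %s\n' "$ts" "$mac"
  done
}

# demo with a made-up frame shaped like tcpdump -e output
printf '%s\n' '12:00:01.000001 aa:aa:aa:aa:aa:01 > ff:ff:ff:ff:ff:ff, length 60' |
first_two_columns
```

Naming a third variable here plays the same role as the limit of 3 passed to split: without it, the second variable would hold the whole tail of the line.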

Perl's built-in sort routine can take a comparison subroutine to override its default behavior. It may be confusing at first blush, but the format of the comparison subroutine expects two inputs ($a and $b) and one output (-1, 0, or 1). See the table "Sort comparison function explained" for more information.
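On the shell side, the sort utility gets a similar "most active first" ordering from its key options rather than a comparison subroutine. A minimal sketch, assuming two-column rows of MAC and count like the ones an awk tally would print. The function name and the sample rows are made up for illustration.

```shell
#!/bin/bash
# Sketch: order "MAC count" rows by count, largest first.

most_active_first() {
  # -k2,2 sorts on the second column only; n compares numerically, r reverses
  sort -k2,2nr
}

# demo with made-up tallies
printf '%s\n' \
  'aa:aa:aa:aa:aa:01 3' \
  'aa:aa:aa:aa:aa:02 17' \
  'aa:aa:aa:aa:aa:03 5' |
most_active_first
```

The key options cover the common cases, but anything fancier (say, sorting on a value looked up in a nested data structure) is where Perl's comparison subroutine earns its keep.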

Many of the unfamiliar functions in the script below (initscr, noecho, cbreak, addstr, refresh, endwin) are documented on the CPAN page for Curses. Importing the Curses module also brings in variables like $LINES and $COLS, which describe the dimensions of the screen.

Curses-based MAC counter

#!/usr/bin/perl

# Author: Jeff Wilson
# Created: 2014
# License: GPL 3.0 ... no warranty, free to re-use in any way

use warnings;
use strict;
use Curses;

# initialize Curses environment
initscr();
noecho();
cbreak();

# register interrupt handler
$SIG{INT} = sub {
  endwin();
  print "Quitting\n";
  exit;
};

my %tally;
my $prev;

# open tcpdump as a process handle; restore the terminal if that fails
open(my $ph, '-|', "tcpdump -lnnei en0 not ether host b9:11:57:3a:20:bd 2>/dev/null")
  or do { endwin(); die "Cannot start tcpdump: $!\n" };
while (<$ph>) {

  # only the first two columns matter, discard the rest
  my ($ts,$mac) = split /\s+/,$_,3;

  # not interested unless there's a MAC in column 2 (also skips short lines)
  next unless (defined($mac) && $mac =~ m/([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}/);

  # grab the first 8 off the timestamp
  $ts = substr($ts,0,8);

  # initialize $prev to current timestamp
  $prev = $ts unless (defined($prev));

  # update screen if this row isn't in the same second
  unless ($prev eq $ts) {

    # clear previous info off the screen
    for (my $row=3; $row < $LINES-2; $row++) {
      addstr($row,0,' ' x $COLS);
    }

    # update timestamp
    addstr(0,0,"$prev");

    # format header row
    addstr(1,0,sprintf("%-17s %5s %10s",qw/MAC pps total/));
    addstr(2,0,sprintf("%17s %5s %10s",'-' x 17, '-' x 5, '-' x 10));

    # keep track of which row to update onscreen 
    # skip first three, since they're already updated
    my $row=3;
    # walk through each MAC, sorting by most active overall
    for my $m (sort {
                $tally{$b}{total} <=> $tally{$a}{total}
        } keys %tally) {
      my $c = 0;
      # get updates if any for this MAC this past second
      if (defined($tally{$m}{$prev})) { 
        # remove the record as we read its value
        $c = delete $tally{$m}{$prev};
      }
      # report MAC's total count with each update
      addstr($row++,0,sprintf("%-17s %5d %10d",$m,$c,$tally{$m}{total}));
      # don't update past the last line onscreen
      last if ($row > $LINES-2);
    }
  }

  # track PPS per MAC
  $tally{$mac}{$ts}++;
  # track total packet count per MAC
  $tally{$mac}{total}++;
  # move previous timestamp forward
  $prev = $ts;

  # push update out to screen
  move(0,0);
  refresh();
}

# only reached if tcpdump exits or fails; restore the terminal before leaving
close($ph);
endwin();
