How to passively track network access using Perl, SNMP, and SQL
What's in your network?
Do you have more than 20 Ethernet switches across your networks? Do you route between more than 5 IP routers? Do you have any idea who's connected to the edge of your network? I run a mid-sized network for a private university. I have picked up a trick or two over the years, and would like to share a couple of ideas with you. I hope to make your life easier by the end of this article.
I get the occasional phone call asking me if I can find where this MAC address is. Or maybe it's an off-site abuse complaint about a certain IP address at a certain point in time. Do you dread that phone call? I look forward to it. I have a plan. I'm prepared. I will show you how I do it.
Readers of this tutorial need to have some background in
- writing Perl scripts
- creating and querying SQL tables
- querying network equipment with SNMP
- router administration
Address Resolution Protocol
I don't know anything about your particular network. That's the beautiful thing about computer networks: there are 1,001 ways to solve any given problem. The most common way I've personally encountered of aggregating and distributing access networks is with the three-tiered approach shown in the side bar.
Whether a network routes at the access edge, or defers routing until the core, IP networks must make the transition from layer 2 to layer 3 somewhere. And when they do, there is a router that tracks which MAC address resolves to which IP address in order to deliver packets across the last hop to the destination. Every IP-speaking Ethernet device has an ARP table, and routers are no different.
The MAC address uniquely identifies a network adapter on a local segment. But what if you're tracking down an IP address? As long as the device you're looking for is currently online, then you're almost done. Log on to the router with the subnet for that IP address, show the arp table, and there is your MAC address. Nothing new here.
But what about going back to see who was on that IP address 36 hours ago? Six weeks ago? Six months ago? Most ARP entries time-out after 5 minutes. Some routers hold on longer, but never more than a couple of hours. If you want historical lookup, you will have to consider committing your ARP to persistent storage. You speak Perl, right? Let's head over to CPAN and get the DBD::SQLite package. (You already have DBI, surely!) While you're installing that piece, keep reading. We'll get back to the SQL setup a little later.
Now we need to figure out ... how to harvest the ARP table from the routers ... and how to transfer this information to a database. It's not that hard! Let's keep going.
Networking 101: ARP explained
Simple Network Management Protocol
Does SNMP live up to its name? "Simple"? I say yes. It's just a database engine that answers queries. The tables consist of rows of keys and values. Both the names of the tables and the keys within the tables are decimal-separated numeric strings, or OIDs.
The nut we want to crack here is a table called ipNetToMediaPhyAddress. That's the SNMP name for the ARP table. Its OID is 126.96.36.199.188.8.131.52.1.2. (Look for a MIB parser to help figure out the name-to-OID lookup.)
Before we go too far, make sure you have SNMP access to your router. If you have snmp utilities installed, point to your router and ask this question:
snmpgetnext -c public 10.10.10.1 184.108.40.206.220.127.116.11.1.2
You'll get back the first row from the router's ARP table. Mine said it like this:
IP-MIB::ipNetToMediaPhysAddress.18.104.22.168.1 = STRING: 00:11:88:05:5f:2e
Sounds like the next step is to bring the SNMP queries into Perl. Let's head back to CPAN and this time, let's install Net::SNMP. Initialize your SNMP object with the right host name and credentials. Then check the picture on the right labeled Setting up ... Right after I run this script, I get the results in results from... The format of the answer is a little different this time. The key is numeric, and the value is printed in hexadecimal notation, without any separators. No worries, we'll get some regex to whip that into shape (see hex_to_mac function below).
Walking down the SNMP table
Without getting too deep into the details of SNMP, you need to know that each snmp_get_next returns the answer to the previous query, as well as a key that points to the next query. See Example of Perl loop ... for a snippet of Perl that feeds the next key into the query.
SQL, the Swiss Army Knife of data
Create a table to represent your IP space. My workplace has a Class B, which means I can capture our entire ARP space in 64K rows. (Less, because we're not fully utilized, but I digress.) Look for the sidebar titled Create a table ... I set initial values to default as I load the IP addresses in Initialize your table ...
The stage is set. All the tools are sitting in front of us. All that's left to do is bring it all together.
- Query the arp table.
- Insert the results into the database.
- Put all that on a schedule and repeat at regular intervals.
Before inserting a row into your database, capture a statement handle as the return value from the prepare function on your DBI object. See the screencap for a suggested insert query. Then as you iterate over the SNMP results, push them into the table by calling the execute function on your statement handle, as shown in the next screencap.
With Linux or Mac OS, you have the built-in cron system to set up a scheduled task. I haven't attempted to query SNMP via Perl on Windows, let alone set it up on a schedule ... so I don't have much to offer there. Setting aside Windows for now, back to the crontab editor, set up your querier script to run against your router every n minutes, where n represents the router's arp timeout setting.
Cache the ARP table
The end goal here is to query the ARP tables from all the client-facing routers, and then keep these records long enough to help out with later inquiries. If your organization hasn't yet established retention policies, just pick a number: 30 days? 6 months? At some point, you run out of disk space and DBMS performance, so don't get too crazy.
SQL has more tricks than we have time to discuss in this tutorial. I'll leave it as an exercise for the reader to figure out how to optimize the database operations beyond this introductory example. Some things to consider:
- the schema presented in this example presents a "most recently seen" MAC per IP
- how could you capture all MAC/IP pairs without bloating your database?
- what kinds of view optimizations are possible using cidr math?
- how do you normalize MAC addresses across various router platforms?
- could a similar idea apply to capturing bridge tables for MAC-to-physical-port?
I hope I got your curiosity up with this lightning flash through the idea of querying and caching arp tables. Let me hear from you in the comment section, and I will do my best to answer any questions you have.