Continuous Monitoring of HTTP Requests in a Home Network

Updated on October 30, 2013

While trying to find a way to log all HTTP requests made from all devices connected to my home LAN, I quickly came across Wireshark. Although it is a great, versatile packet sniffer/analyzer, Wireshark uses lots of memory. In my case, continuous traffic sniffing with Wireshark turned out to be impractical without special measures: the memory taken by the program simply grows over time until it eats up all available space and the program stops working. When your network is busy, this can happen rather quickly. According to the sniffer's manual, it needs a lot of memory "to store packet meta data (e.g. conversation and fragmentation related data) and to display this info on the screen." You could reduce the memory usage drastically by turning off TCP stream desegmentation (in tshark, this is done by adding the "-o tcp.desegment_tcp_streams:FALSE" switch), but then the sniffer will miss too many HTTP requests to be usable for the purpose of this article. For folks like me, the Wireshark developers could simply add a switch to "flush" all used memory, say, every N minutes.

However, as of writing, there is no such switch, so here is an obvious workaround: kill the tshark.exe process every 15 minutes or so and run it again (I use tshark, the command-line version of Wireshark, which offers basically the same features). To do that, I use the free utility System Scheduler from Splinterware, which can run programs on a schedule. Some HTTP requests made while the program is restarting will be lost, but not too many, and for simple monitoring purposes like mine this can be tolerated. There is one more drawback to this approach: it seems that tshark's domain name resolution becomes operational only a few requests after the program starts (correct me if I'm mistaken here or have just used the wrong tshark switch for name resolution). Thus, every 15 minutes, some lines in your log will begin with an IP address instead of a domain name.
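For readers who would rather script the restart cycle than schedule it, here is a minimal Python sketch of the same idea. It is not what I run: the names run_for and supervise are made up for this illustration, the 7-second grace period simply mirrors the delay used in ts.bat further down, and the child is stopped with terminate() instead of the Ctrl+C that sendsignal delivers, so on Windows the sniffer would be ended abruptly rather than asked to quit.

```python
import subprocess


def run_for(cmd, seconds, grace=7.0):
    """Run cmd for at most `seconds`, then stop it and return its exit code.

    Assumption: terminate() stands in for the Ctrl+C used in the article.
    """
    proc = subprocess.Popen(cmd)
    try:
        # The child may exit on its own before the time is up.
        proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.terminate()
        try:
            # Give the child a moment to shut down before forcing it.
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode


def supervise(cmd, seconds=15 * 60, cycles=None):
    """Restart cmd every `seconds`; run forever when cycles is None."""
    done = 0
    while cycles is None or done < cycles:
        run_for(cmd, seconds)
        done += 1
    return done
```

Calling supervise(cmd) with the tshark command line as a list of arguments would restart the sniffer every 15 minutes. The sketch does not handle the pipe into the log parser, which the batch file takes care of.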

To the point

So, here's the setup of my home wireless network: a Windows XP PC is used as a software router. It shares the home cable Internet connection through Windows Internet Connection Sharing, the best method for people who don't want to buy and configure a dedicated router. This "router" PC shares the Internet with the other home devices through an ASUS WiFi card that includes software Access Point functionality.

To be able to monitor all HTTP requests made by all home devices, I installed rpcapd.exe as a Windows service on the "router" PC. It is the part of the WinPcap packet capture driver distribution responsible for remote monitoring. WinPcap installs it by default but leaves it disabled. To enable it, go to Start -> Administrative Tools -> Services, find Remote Packet Capture Protocol, right click, Properties -> Run -> Auto (see Screenshot 1).

Screenshot 1. Activating rpcapd service

NOTE: Wireshark installs WinPcap automatically, but if you've decided to use the portable version of the sniffer, you must install WinPcap manually.

There is one more thing that should be done to allow remote monitoring through rpcapd: create an rpcapd.ini file in the driver's folder. The file should contain just a single line:

NullAuthPermit = YES

This line allows external connections to rpcapd without authentication. There is also an option to set up an authenticated connection; if you wish to use it, please refer to the WinPcap manual.

NOTE1: on Windows 7, you might need to turn off UAC or give rpcapd some special permissions. I usually turn off UAC completely, preferring to use Comodo Personal Firewall, which is free, great and easy to use.

NOTE2: rpcapd can create its config file automatically if you run it with the following command line:
rpcapd.exe -n -s rpcapd.ini
The -s parameter will create rpcapd.ini with NullAuthPermit set to Yes.

As I use null authentication for rpcapd, I have blocked any communication with the driver from outside my local net through Comodo personal firewall (see Screenshot 2).

Screenshot 2. Comodo Firewall rules for rpcapd.exe

Be sure to enforce the exclude rule; otherwise the program will show a warning when someone from outside tries to connect to the driver (this happened to me once, after which I enabled the exclude rule).

At the moment, four devices connect to the "main" Windows XP PC: two ASUS netbooks with Windows 7 and XP respectively, a Lenovo notebook with Win7 x64, and an Amazon Kindle reader.

As it turned out through trial and error, tshark on Win7 eats up memory fastest when uTorrent is downloading content on one PC while online videos from YouTube are being watched on other devices in the home net.
I run tshark on one of the netbooks with 2 GB of RAM, and when the traffic load is at its maximum, the program uses up all the memory in about 20 minutes and then stops functioning. So every 15 minutes I send Ctrl+C to tshark.exe and run it again right away through System Scheduler (see Screenshot 3).

Screenshot 3. Running ts.bat in a hidden state with System Scheduler

System Scheduler lets you hide the programs it executes, so the batch file restarts won't bother you much. To further suppress console windows from showing up, I use the hstart utility.

The following batch file (ts.bat, it's included for reference with the source code of my log parser that is described further down) is used to kill and restart the sniffer:

=== ts.bat Begin ===

C:

cd "c:\Program Files\Wireshark"

FOR /F "usebackq tokens=2" %%i IN (`tasklist ^| findstr /r /b "tshark.exe"`) DO start /MIN sendsignal.exe %%i

ping 127.0.0.1 -n 7 -w 1000

tshark -2 -l -t ad -R "http.request.method == GET" -N nC -i rpcap://[192.168.0.1]:2002/\Device\NPF_{896CE060-483D-483D-83FE-D96DDDFDAE6A} | ts_rdln.exe

=== ts.bat End ===

The first two lines are used to change current directory to the working dir of Wireshark.

The third line finds all running processes called tshark.exe and sends a Ctrl-C to them. For this purpose, I use the sendsignal.exe utility by Louis K. Thomas (you can find the download link further down). Also, be sure to get the Microsoft tasklist.exe utility if you have a Windows distribution that lacks it; otherwise, this line won't work.

The fourth line creates a delay of about 7 seconds, which is just enough for tshark to receive the Ctrl-C signal and quit.

NOTE: both hstart and sendsignal utilities should be placed either in a system search path or in the Wireshark working folder (preferably the latter).

NOTE2. If, like me, you are using one of the latest versions of Comodo Firewall, you may have to add EVERY program involved in the solution described here, including the ts.bat file, to Comodo's trusted files. Otherwise the firewall may prevent System Scheduler from running and killing tshark.

The last line of ts.bat restarts tshark with the following arguments:
-2 (tells tshark to perform a two-pass analysis - this is needed to log HTTP requests)
-l (helps my log parser that I run alongside tshark to get all its output)
-t ad (formats tshark output to print actual date and time each packet was captured)
-R "http.request.method == GET" (makes sure tshark doesn't log any packets except HTTP GET requests)
-N nC (enables the network address resolution along with concurrent DNS lookups)
-i rpcap://[192.168.0.1]:2002/\Device\NPF_{896CE060-483D-483D-83FE-D96DDDFDAE6A} (this parameter specifies the IP address of my "soft router" PC and the name of its main network adapter as exposed by rpcapd. Learning the hexadecimal string is arguably the hard part; one way to do it is through the Wireshark GUI. Run Wireshark and go to Main Menu -> Capture -> Interfaces -> Options -> Manage Interfaces -> Remote Interfaces -> Add. Then enter the address of the machine you installed rpcapd on, e.g. 192.168.0.1, specify port number 2002, and press Ok. After a while, you'll see a list of strings similar to the above. Do not check any checkboxes. Press Apply, then Close. Now you'll see an expanded adapter list that includes the remote adapters. Use the one that faces the Internet. Sadly, there is no way to copy-paste the tricky string, so you'll have to type it manually into your batch file). NOTE: If you usually work on the same PC that is used for Internet Connection Sharing in your home network, then put your local network adapter number there instead of the "rpcap:// ..." string. Again, the number can be learnt either using Wireshark or through trial and error.
| ts_rdln.exe (this is a simple log parser that I wrote in Free Pascal. You can find the ZIP with source code, Win32 binary and a sample config at the bottom of this hub. The pipe redirects tshark's output into my program, which parses the log and implements some formatting and filtering. You may freely use and modify the source code to add more complex filters to your liking. Mind, however, that in its present form the code is very basic and doesn't implement much error or exception handling, so upon an error it just stops working along with tshark.)

ts_rdln creates three text files: a raw log, an alert log, and a filtered log (the latter HTML-formatted for convenience).

The filtered log captures plain host URLs (addresses having no subdirs), as well as URLs with any of the extensions listed in the [FilteredInclude] section of the parser's config file. I, for one, have put the following extensions there: .php, .sht, .asp, .bml, .cfm, and .htm. Mind that the program treats them as substrings, so it's actually *.sht*, *.htm*, etc. So, in my case, it filters out any images, .js files, .css, etc. Furthermore, the parser excludes any URLs whose domain contains more than two dots (unless it is an IP address). The latter rule is hardcoded (lines 146-147 of ts_rdln.pas), but you may change or expand it to your liking by modifying the source. There is also a simple way to expand the filter rules without recompiling the program (described further down). Thus, ideally, the filtered log should include only "meaningful" website hits like www.google.com or instagram.com.
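To make these rules concrete, here is a rough Python rendering of the decision. It is only a sketch of the rules as described above, not a translation of ts_rdln.pas: the names split_url and keep_in_filtered_log are invented, the IPv4 test is a simple pattern match, and matching the extensions against the path part only (rather than the whole URL) is my assumption.

```python
import re

# Extensions from my [FilteredInclude] section, treated as substrings.
INCLUDE_EXTENSIONS = [".php", ".sht", ".asp", ".bml", ".cfm", ".htm"]

# Loose IPv4 pattern; enough to tell 74.113.233.128 from a host name.
_IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def split_url(url):
    """Split 'host/path' (scheme optional) into (host, path)."""
    if "://" in url:
        url = url.split("://", 1)[1]
    host, _, path = url.partition("/")
    return host, path


def keep_in_filtered_log(url, include=INCLUDE_EXTENSIONS):
    """Return True if the URL would appear in the filtered log."""
    host, path = split_url(url)
    # Domains with more than two dots are dropped unless they are IPs.
    if host.count(".") > 2 and not _IPV4.match(host):
        return False
    # Plain host URLs (no subdirs) are always kept.
    if path == "":
        return True
    # Otherwise the path must contain one of the listed extensions.
    return any(ext in path for ext in include)
```

With these rules, www.google.com and example.com/index.html are kept, while example.com/img/logo.png and a.b.c.example.com/index.php are dropped.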

The log parser reads a config file named ts_rdln.conf.
Here are sample contents of the file:

=== ts_rdln.conf sample Start ===

Some random introductory text :)

[AlertInclude]

vimeo.com
74.113.233.128


[FilteredExclude]

contentabc.com
trafficjunky.net
geotrust.net
ocsp.comodoca
adlog.com.com
/Rss_Channels
feed=atom
akamai.net
doubleclick.net
statcounter
/rss
/atom
/feed


[FilteredInclude]

.htm
.php
.asp
.sht
.bml
.cfm


[Config End or whatever you like:)]

Again, random text!

=== ts_rdln.conf sample End ===

The structure of the config is the following:
- Everything that comes before the first section recognized by the program is simply ignored.
- The [AlertInclude] section is a list of keywords; any URLs containing them are included in the alert log (alert.txt). E.g., the sample AlertInclude above lists Vimeo.com along with its IP address.
- The [FilteredExclude] section's entries are keywords used to exclude URLs from the filtered log (filtered.html). Put here everything that's irrelevant to you, like advertising networks, technical URLs, etc. Those are usually learned by checking the raw log (rawlog.txt).
- The [FilteredInclude] section contains a list of extensions for page URLs that you would like to include in filtered.html. This section is processed ahead of FilteredExclude, so it has a higher priority (this can be changed in the source code).
- Whenever the program encounters an unsupported section, it simply ignores its contents (e.g., see the last section of the sample config file).
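The same structure can be expressed as a short Python sketch. This is an illustration of the rules listed above and not the Free Pascal code of ts_rdln: the names parse_config and matches_alert are invented, and treating any line of the form [Name] as a section header is my assumption.

```python
KNOWN_SECTIONS = ("AlertInclude", "FilteredExclude", "FilteredInclude")


def parse_config(text):
    """Collect the keywords of each supported section into lists."""
    sections = {name: [] for name in KNOWN_SECTIONS}
    current = None  # nothing is collected before the first known section
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            # An unsupported section switches collection off.
            current = name if name in sections else None
            continue
        if current is not None:
            sections[current].append(line)
    return sections


def matches_alert(url, sections):
    """True if the URL contains any keyword from [AlertInclude]."""
    return any(keyword in url for keyword in sections["AlertInclude"])
```

Feeding the sample config above to parse_config would give two AlertInclude entries, thirteen FilteredExclude entries and six FilteredInclude entries, with both pieces of random text ignored.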

The config file can be modified on the fly: the program reloads it every time the file is updated, so the filtering rules can be expanded or modified at runtime, without restarting ts_rdln.
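One common way to get this kind of on-the-fly reloading is to compare the file's modification time before each use. The Python sketch below shows that technique only; I am not claiming ts_rdln detects changes this way, and ReloadingConfig, loader and loads are names made up for the illustration.

```python
import os


class ReloadingConfig:
    """Re-read a config file whenever its modification time changes."""

    def __init__(self, path, loader):
        self.path = path
        self.loader = loader  # function turning file text into settings
        self.value = None
        self.loads = 0        # how many times the file has been read
        self._mtime = None

    def get(self):
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            # First call, or the file was saved since the last read.
            with open(self.path, encoding="utf-8") as f:
                self.value = self.loader(f.read())
            self._mtime = mtime
            self.loads += 1
        return self.value
```

A parser loop would call get() before handling each log line, so an edited keyword list takes effect on the next request without a restart.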

An example usage scenario is the following: you inspect the filtered log, notice something that deserves attention, look into the raw log to clarify the matter, and finally put some new keywords into the alert section of the config file. The solution can be further expanded with email notification upon alert log updates (e.g., through the use of the free mailer Blat), etc.

To recap, here's the list of software required to implement the solution (everything is free):

Wireshark/tshark
System Scheduler
hstart
WinPcap
sendsignal
ts_rdln (The link leads to a Google Drive page hosting a zip archive that includes the program's source code and Win32 binary exe, as well as samples of both ts.bat and ts_rdln.conf files)

NOTE: This solution doesn't allow monitoring of HTTPS requests, so no snooping on Google searches by logged-in users, on Facebook surfing, etc.

Any suggestions, corrections to the article, or useful additions to the log parser source code are welcome.

Comments

      McCharm 3 years ago

      I think you should mention that if one uses Comodo Personal Firewall on your gateway PC, he or she could turn on its Website Filtering feature to log some or all http requests from all devices in their home net. However, that log could not be viewed from another PC unless Comodo's log storage folder is not shared.

      Gretech 4 years ago

      I've reproduced the whole setup, and it worked ok)

      Thank you so much, you've saved me so much learning. Your log analyzer template is awesome,too. I think i'm gonna implement some hmm adult filters, its easy, you just create flags on certain urls and delete them some time later, right