
Why use SIEM?

Updated on November 27, 2012

SIEM stands for "Security Information and Event Management".

If you have just had, or are about to have, a security audit of your data structure, the chances of getting pinged on logging are high. There is good reason for this: people often don't collect and read logs.

When I first started programming, like many others, I made log files for any program of significant size. The log files contained a time-stamp and some other structured data, such as a token to describe the category of a logging message. Usually, the contents of this log file were "for a programmer's eyes only". Often, I made a debug mode and an operational mode, and included errors, warnings, and other interesting strings of data. Usually, these were one per line, each terminated by a carriage return.

Other programmers in the team and I used these log files for debugging and for ferreting out operational issues. They were very useful.

If you had access to the original source code, any mysterious events or logs could easily be put into context by searching for the error number (often called a Vendor Message Identifier or VMID). This allowed a different programmer to maintain the operational status of the code and help make enhancements.

While that's all quite brilliant and sensible, it was a monster in the making. You see, every programmer made up their own format—and mostly, they still do. This is why there are now hundreds of thousands of different formats. It makes it impossible for any single mind to comprehend the sum of all logging formats in a modern enterprise.

It's worse than that though. The sheer volume of data now being generated by these log files is truly overwhelming. One can find enterprises that generate more than a billion logs every day. A billion logs a day demands a sustained processing rate of about 12,000 logs per second (1,000,000,000 logs / 86,400 seconds is roughly 11,600).

If one person could inspect 10 logs a second and concentrate for 6 hours a day, this would only cover 216,000 logs. At 12,000 logs per second, you would need 1,200 people reading around the clock. With only 6 productive hours in every 24, that becomes 24/6 × 1,200 = 4,800 people, working 365 days a year just to pick out any significant events from that sea of logs. This of course leaves no time for remediation or mitigation.
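The staffing estimate can be recomputed from its stated assumptions. The following is only a sketch: the constant names are invented for illustration, and it uses the unrounded arrival rate, so its result can differ a little from figures rounded by hand.

```python
# Back-of-envelope staffing estimate from the assumptions stated in the text.
LOGS_PER_DAY = 1_000_000_000         # "more than a billion logs every day"
SECONDS_PER_DAY = 24 * 60 * 60
LOGS_PER_PERSON_PER_SECOND = 10      # reading speed assumed in the text
FOCUSED_HOURS_PER_DAY = 6            # concentration span assumed in the text

# Rate at which logs arrive, averaged over the whole day.
logs_per_second = LOGS_PER_DAY / SECONDS_PER_DAY

# How many logs one person can inspect in a working day.
logs_per_person_per_day = LOGS_PER_PERSON_PER_SECOND * 3600 * FOCUSED_HOURS_PER_DAY

# People needed to inspect every log, with nobody doing anything else.
people_needed = LOGS_PER_DAY / logs_per_person_per_day

print(f"Arrival rate: {logs_per_second:,.0f} logs per second")
print(f"One person covers {logs_per_person_per_day:,} logs per day")
print(f"People needed: {people_needed:,.0f}")
```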

SIEM is a machine-assisted tool to help pick out significant events from millions upon millions of logs.

Components of SIEM and terminology.

That which produces a log file is called a log source. The logs from each source must then be processed by a parsing engine. This engine is responsible for classifying each log, normalising its structured data, and storing the original log with its meta-data in a suitable database.

The idea of normalising data is easily explained by an example. An event that means someone logged on to something is classified as an audit event, then further tagged as an authentication event, and a successful log on. At this level of parsing, the engine creates meta-data in the form of new columns of standardised information like "Origin of the log", "Destination of the log", "User ID", VMID, "Category of event" and so on.

When you are searching for a particular event—like the log on event—you only need to know that it is called "successful log on". This is because your searches and queries are most often done on the meta-data. It's standardised despite the fact that there could be a thousand different representations of a successful log on event inside each of many log sources.
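To make the idea of normalising and then searching on meta-data concrete, here is a minimal sketch. Both raw log formats, the field names, and the `normalise` function are invented for illustration; a real parsing engine ships with thousands of such patterns.

```python
import re
from typing import Optional

# Tags shared by every representation of a successful log on.
LOGON_TAGS = {
    "classification": "audit",
    "category": "authentication",
    "event": "successful log on",
}

# Two invented raw formats that describe the same kind of event.
PATTERNS = [
    # e.g. "2012-11-27T09:15:02 sshd: Accepted password for alice from 10.0.0.5"
    (re.compile(r"^(?P<time>\S+) sshd: Accepted password for "
                r"(?P<user>\S+) from (?P<origin>\S+)$"), LOGON_TAGS),
    # e.g. "27/11/2012 09:16:40|APP42|LOGIN_OK|user=bob|src=10.0.0.9"
    (re.compile(r"^(?P<time>[^|]+)\|(?P<vmid>[^|]+)\|LOGIN_OK\|"
                r"user=(?P<user>[^|]+)\|src=(?P<origin>[^|]+)$"), LOGON_TAGS),
]


def normalise(raw: str) -> Optional[dict]:
    """Return the raw log plus standardised meta-data, or None if unparsed."""
    for pattern, tags in PATTERNS:
        match = pattern.match(raw)
        if match:
            return {"raw": raw, **tags, **match.groupdict()}
    return None


raw_logs = [
    "2012-11-27T09:15:02 sshd: Accepted password for alice from 10.0.0.5",
    "27/11/2012 09:16:40|APP42|LOGIN_OK|user=bob|src=10.0.0.9",
    "kernel: eth0 link up",
]
records = [r for r in map(normalise, raw_logs) if r is not None]

# The search only needs the standardised event name, not either raw format.
logons = [r["user"] for r in records if r["event"] == "successful log on"]
print(logons)  # ['alice', 'bob']
```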

This is a good time to emphasise the difference between a log and an event. A log is a single line or group of lines of raw data. An event is something significant about the particular log. To make SIEM useful, and hence get best value from your logs, you need to suppress uninteresting low-value logs and promote the importance of interesting logs. The latter, we call 'events'. The ratio of events to logs should be typically about 1 in 10.

Instantly, you can now understand how valuable it is to highlight a hundred events from a thousand logs without human intervention. The workforce estimated earlier shrinks to a tenth of its size.

We can do better. For every 100 events, if they are categorised and prioritised well, there should be only the occasional critical event. These might include things like:

  • A program crashed
  • A fan failed
  • A brute force attack succeeded
  • Someone just locked out their account
  • Disk space just hit 95% on server X

To improve on this again, we get the SIEM system to assign a severity to each of these most significant events. Many of the events could rest until appearing in a periodic summary report on a manager's desk. Some however, should be alarms.

An alarm should be assigned to an event that precedes an operational or security-related disaster. I'd guess in many enterprises, it would be worth the cost of SIEM to simply alert on server disk space and fan failures, but it's easily extended to include unauthorised access to specific files, and the appearance of unsanctioned processes to cite only a few business cases.
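The split between events that wait for a periodic report and events that raise an alarm can be sketched as follows. The severity scores, the alarm threshold, and the `Event` type are hypothetical choices for illustration, not values taken from any product.

```python
from dataclasses import dataclass


@dataclass
class Event:
    source: str
    kind: str
    value: float = 0.0   # e.g. percentage of disk used


# Hypothetical severity table (1 = trivial, 10 = disaster imminent).
SEVERITY = {"fan_failure": 9, "disk_usage": 5, "account_lockout": 3}
ALARM_THRESHOLD = 8


def severity(event: Event) -> int:
    """Score an event; unknown kinds get the lowest severity."""
    score = SEVERITY.get(event.kind, 1)
    # Disk usage becomes urgent once it reaches 95%.
    if event.kind == "disk_usage" and event.value >= 95:
        score = 9
    return score


def triage(events):
    """Separate events into immediate alarms and items for the periodic report."""
    alarms, report = [], []
    for event in events:
        (alarms if severity(event) >= ALARM_THRESHOLD else report).append(event)
    return alarms, report


events = [
    Event("server X", "disk_usage", 96.0),
    Event("rack 2", "fan_failure"),
    Event("jsmith", "account_lockout"),
    Event("server Y", "disk_usage", 60.0),
]
alarms, report = triage(events)
print([e.source for e in alarms])   # ['server X', 'rack 2']
print([e.source for e in report])   # ['jsmith', 'server Y']
```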

You can check for activity of any account that is supposed to be retired after a system administrator leaves the company.

SIEM should integrate with your authentication system like an Active Directory (AD) server. In that situation, whole groups of people may be classified in any suitable way. If you have an AD group of known database administrators, then the SIEM could be used to detect when someone NOT in that list is granted elevated privileges to database servers.
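A rough sketch of these two checks follows. The group membership is hard-coded here as a plain set; in practice it would be read from the directory service. All account names, server names, and field names are invented.

```python
# Stand-ins for data a SIEM would pull from the directory service.
DBA_GROUP = {"alice", "bob"}          # known database administrators
RETIRED_ACCOUNTS = {"old_admin"}      # accounts that should be inactive
DATABASE_SERVERS = {"db01", "db02"}


def findings(event: dict) -> list:
    """Return a list of human-readable concerns raised by one normalised event."""
    found = []
    user = event.get("user")
    if user in RETIRED_ACCOUNTS:
        found.append(f"activity from retired account {user}")
    if (event.get("event") == "privilege granted"
            and event.get("destination") in DATABASE_SERVERS
            and user not in DBA_GROUP):
        found.append(
            f"{user} granted elevated privileges on "
            f"{event['destination']} but is not a known DBA"
        )
    return found


print(findings({"event": "privilege granted", "user": "mallory", "destination": "db01"}))
print(findings({"event": "successful log on", "user": "old_admin", "destination": "web01"}))
print(findings({"event": "privilege granted", "user": "alice", "destination": "db02"}))
```

Only the first two events produce findings; the third is a known administrator doing an expected job.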

You can keep track of contractors' time, what resources they use, when they log on, when they log off, and what systems they touch. It's not possible to do this without centralised logging.

With such automation, the remaining workforce can be replaced by typically less than one Full Time Employee (FTE) for most companies, or perhaps three for a truly huge organisation.

Other desirable features.


Like Active Directory integration, an ability to work with abstract containers of arbitrary objects will decrease the amount of maintenance needed by a SIEM administrator. If you categorise all servers that process, say, credit-card data into a single container called "PCI-Servers", then you can write investigative reports and alarms based on that list. You might develop a hundred different alarms and reports over time, but only ever need to update the list. This level of abstraction is essential to the ongoing usability of a SIEM system.
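The benefit of rules that refer to a named list rather than to individual servers can be shown in a few lines. The `LISTS` dictionary, the host names, and the rule below are all hypothetical.

```python
# Named containers maintained by the SIEM administrator.
LISTS = {"PCI-Servers": {"pay01", "pay02"}}


def in_list(list_name: str, item: str) -> bool:
    """True if the item belongs to the named container."""
    return item in LISTS.get(list_name, set())


def failed_logon_on_pci_server(event: dict) -> bool:
    """An alarm rule that mentions only the list name, never a host name."""
    return (event["event"] == "failed log on"
            and in_list("PCI-Servers", event["destination"]))


event = {"event": "failed log on", "destination": "pay03"}

before = failed_logon_on_pci_server(event)   # pay03 is not yet in the list
LISTS["PCI-Servers"].add("pay03")            # a new card-processing server is added
after = failed_logon_on_pci_server(event)    # same rule, no rule edit needed

print(before, after)  # False True
```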


One tenet of good security is to apply job rotation. This means you need to define roles, specify what each role is allowed to do, and then assign people to those roles. Your SIEM system should accommodate this by permitting abstracted roles like 'analyst', 'root admin' or 'read-only auditor'. You should be able to expose certain alarms and reports to one role, and hide them from another. This idea fits with the tried-and-tested "need to know" technique.
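As a small illustration, roles can be modelled as sets of visible alarms and reports, with people assigned to roles rather than to individual items. The role contents and user names below are made up.

```python
# What each role may see. Items are alarm or report names.
ROLE_ITEMS = {
    "root admin": {"disk space alarm", "privilege change alarm",
                   "PCI logon report", "system configuration"},
    "analyst": {"disk space alarm", "privilege change alarm", "PCI logon report"},
    "read-only auditor": {"PCI logon report"},
}

# Who currently fills each role.
ASSIGNMENTS = {"alice": "analyst", "carol": "read-only auditor"}


def can_view(user: str, item: str) -> bool:
    """A user sees an item only if their current role exposes it."""
    role = ASSIGNMENTS.get(user)
    return item in ROLE_ITEMS.get(role, set())


def rotate(user: str, new_role: str) -> None:
    """Job rotation: change the role, and the permissions follow automatically."""
    if new_role not in ROLE_ITEMS:
        raise ValueError(f"unknown role: {new_role}")
    ASSIGNMENTS[user] = new_role


print(can_view("carol", "disk space alarm"))   # False
rotate("carol", "analyst")
print(can_view("carol", "disk space alarm"))   # True
```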

One way agents

There are two accepted methods of log collection. One is to passively wait for a log source to send its logs, and the other is to actively poll the log source. The latter is called 'agent-less' or 'remote' log collection. Both these methods demand a two-way flow of data. Sometimes, this data must flow through a firewall, in which case you will need a secure method of getting the data. In many cases, an agent brokers the traffic on behalf of a particular security zone and packages it into a secure stream to forward to the system that parses the logs.

In rarer but more secure establishments, the security architecture might employ a multi-level system where low-classified data is allowed to travel up to a higher classification, but never the other way around. These types of highly secure systems might employ 'data diodes', which are optical transceivers with one direction disabled. They are housed in tamper-evident, tamper-resistant housings and certified at great cost to the manufacturer. When you use them, you need one-way agents. These are agents that are able to cope without being able to get any form of acknowledgement from the recipient.
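One way to picture a one-way agent is a sender that uses UDP datagrams and never reads from the socket. This is only a sketch under assumptions: the JSON framing and the sequence numbers are my own illustration of how a receiver might notice losses when no acknowledgement can travel back, not a description of any particular product or data diode.

```python
import json
import socket


def encode(seq: int, record: str) -> bytes:
    """Frame one log with a sequence number so the receiver can spot gaps."""
    return json.dumps({"seq": seq, "log": record}).encode("utf-8")


def send_one_way(records: list, host: str, port: int) -> int:
    """Send each record as a UDP datagram and never wait for a reply."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for seq, record in enumerate(records):
            sock.sendto(encode(seq, record), (host, port))
    finally:
        sock.close()
    return len(records)


def find_gaps(received_seqs, expected_count: int) -> list:
    """Receiver side: list the sequence numbers that never arrived."""
    return sorted(set(range(expected_count)) - set(received_seqs))
```

A sender would call something like `send_one_way(lines, "192.0.2.10", 5514)`, where the address and port are placeholders. The receiver can only detect a gap with `find_gaps`; it has no channel to request a resend, which is exactly the constraint a one-way agent has to live with.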

Self-maintaining database.

Since your goal is to minimise FTE figures for SIEM, you should also seek out a system that is self-maintaining. It needs to be a database that does not require a database administrator. Therefore it needs jobs that clean up, rotate, and archive stale data without intervention.
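A scheduled housekeeping job of this kind might look like the sketch below, which uses SQLite purely for illustration. The table names, the 90-day retention period, and the `rotate` function are assumptions; a production SIEM would archive to cheaper storage rather than to a second table.

```python
import sqlite3
import time

RETENTION_SECONDS = 90 * 24 * 3600   # hypothetical 90-day online retention


def create_store(conn: sqlite3.Connection) -> None:
    """Create the online table and the archive table if they do not exist."""
    conn.execute("CREATE TABLE IF NOT EXISTS logs (received REAL, raw TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS archive (received REAL, raw TEXT)")


def rotate(conn: sqlite3.Connection, now: float = None) -> int:
    """Move stale rows to the archive and return how many were moved."""
    now = time.time() if now is None else now
    cutoff = now - RETENTION_SECONDS
    with conn:  # commit both statements together, or roll back on error
        conn.execute(
            "INSERT INTO archive SELECT received, raw FROM logs WHERE received < ?",
            (cutoff,),
        )
        moved = conn.execute(
            "DELETE FROM logs WHERE received < ?", (cutoff,)
        ).rowcount
    return moved


conn = sqlite3.connect(":memory:")
create_store(conn)
now = time.time()
conn.execute("INSERT INTO logs VALUES (?, ?)", (now - 200 * 24 * 3600, "old log"))
conn.execute("INSERT INTO logs VALUES (?, ?)", (now, "fresh log"))
print(rotate(conn, now))  # 1
```

Run from a scheduler, a job like this keeps the online table bounded without anyone having to remember to do it.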

Ease of use

If you can't teach an analyst how to use the major features in a couple of hours, then the SIEM system is destined to be a boat-anchor. Eventually, enough knowledgeable people will leave the company, and no one will know how to use the system unless it is very simple to use. This is another case for a good role-based SIEM, so that you spread the skills lightly through the organisation and minimise the risk of a skills-desert. When evaluating a SIEM solution, a good trick is to examine the length and pre-requisites of a formal vendor-led training course. Similarly, if a system demands hundreds of hours of professional services despite your skilled workforce, then look elsewhere.


As explained, there is no single standard log format. Your SIEM system needs to cope with new formats on a regular basis. It should let you customise existing parsing patterns, and also make it as simple as possible to integrate unusual or specialised user-generated log formats. The system should minimise upgrade overheads by taking new parsing rules from a knowledge-base feed rather than requiring application upgrades.

Standard reporting and compliance.

Your SIEM purchase should come with the benefit of the knowledge of highly skilled engineers and auditors ready-built into the system. Therefore you should check for the availability of canned reports for important and common compliance criteria like:

  • GLB
  • SOX
  • ISO 27001
  • (and more)

Your SIEM needs to be easily tuned to churn out reports in a format directly suitable for any of these auditors. If you can do that automatically and frequently, then suddenly "compliance" is elevated from a paper-accreditation, into a proper grown-up security measure.


Correlation

There are two kinds of correlation. One is static, and the other is near real-time.

A static correlation is a tool that lets you use any particular parameter to re-mine the database of logs to see what else it is associated with. For example, let's say a contractor JJJ raises an excessive logon failure and you correlate on JJJ. You might find that he or she has been pinging scores of servers at odd times of the night for weeks. This kind of behaviour should raise suspicion.
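In code, a static correlation is essentially a pivot: pick one value, pull every stored record that shares it, and summarise what else appears alongside it. The records and field names below are invented, and the `pivot` function is only a sketch of the idea.

```python
from collections import Counter


def pivot(records: list, field: str, value: str):
    """Return all records where field == value, plus counts of what they touch."""
    related = [r for r in records if r.get(field) == value]
    summary = {
        key: Counter(r[key] for r in related if key in r)
        for key in ("destination", "event")
    }
    return related, summary


# A tiny stand-in for the database of normalised logs.
records = [
    {"user": "JJJ", "destination": "srv01", "event": "failed log on", "hour": 2},
    {"user": "JJJ", "destination": "srv02", "event": "failed log on", "hour": 3},
    {"user": "JJJ", "destination": "srv01", "event": "successful log on", "hour": 3},
    {"user": "alice", "destination": "srv01", "event": "successful log on", "hour": 10},
]

related, summary = pivot(records, "user", "JJJ")
print(len(related))                                # 3
print(summary["destination"]["srv01"])             # 2
print(summary["event"]["failed log on"])           # 2
```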

With a near real-time correlation engine, you expect the system to recognise JJJ is doing odd things. You want it to be intelligent enough to know that someone cannot log on to server X from London, then fly to Paris and log on to the same server with the same credentials within one hour. This kind of stateful log-correlation engine should come with many canned rules that are likely to fire when something odd is happening to your network. It should be able to detect slow attacks like slowloris.
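The London and Paris example can be sketched as a small stateful rule. This is a deliberate simplification: locations are plain city labels and the rule fires on any change of city within the window, whereas a real engine would estimate distance and plausible travel speed from geolocation data. The class and its names are hypothetical.

```python
from datetime import datetime, timedelta


class ImpossibleTravelRule:
    """Remember the last log on per (user, server) and flag rapid city changes."""

    def __init__(self, window: timedelta = timedelta(hours=1)):
        self.window = window
        self.last_seen = {}   # (user, server) -> (time, location)

    def observe(self, user: str, server: str, location: str, when: datetime):
        """Feed one successful log on; return an alert string or None."""
        key = (user, server)
        previous = self.last_seen.get(key)
        self.last_seen[key] = (when, location)
        if previous is None:
            return None
        prev_time, prev_location = previous
        # abs() keeps the rule sane if events arrive slightly out of order.
        if prev_location != location and abs(when - prev_time) <= self.window:
            return (f"{user} logged on to {server} from {prev_location} "
                    f"and then {location} within {self.window}")
        return None


rule = ImpossibleTravelRule()
start = datetime(2012, 11, 27, 9, 0)
first = rule.observe("JJJ", "server X", "London", start)
second = rule.observe("JJJ", "server X", "Paris", start + timedelta(minutes=40))
third = rule.observe("JJJ", "server X", "Paris", start + timedelta(minutes=50))
print(first)
print(second)
print(third)
```

Only the second observation produces an alert: neither log on is suspicious on its own, but the pair is.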

You can think of these advanced correlation engines as something that creates significant events out of a collection of otherwise benign events.

How to justify SIEM

There are three main business units that should simultaneously pool budget for SIEM. These are:

  1. Operations
  2. Security
  3. Compliance

The operations team will save money by minimising downtime and pre-empting problems with critical business infrastructure.

The security team has a tool to demonstrate and justify the effectiveness of their existing systems, to enhance the whole security posture with event correlation, and to offer advanced forensics to internal and external bodies as required. If only one virus outbreak or data loss is prevented, many thousands of dollars are saved, because recovery from any security breach is tremendously expensive.

The compliance team continually faces an expensive challenge to gather enough information for an auditor. A good SIEM system drastically reduces the cost in time and money to do this and, more importantly, allows a continuous audit instead of a near-useless quarterly snapshot.


SIEM is essential today. Logs are the closest you can get to the programmer, and most organisations either waste a huge resource by not collecting logs at all, or collect them and leave a vast resource untapped.

This is only a short article that touches on a few areas of a huge topic and so I'll be happy to answer any questions.

