Why use SIEM?
SIEM stands for "Security Information and Event Management".
If you have just had, or are about to have, a security audit of your data infrastructure, the chances of getting pinged on logging are high. There is good reason for this: people often don't collect and read their logs.
When I first started programming, like many others, I made log files for any program of significant size. Each log file contained a time-stamp and some other structured data, such as a token to describe the category of a logging message. Usually, the contents of the log file were "for a programmer's eyes only". Often, I made a debug mode and an operational mode, and included errors, warnings, and other interesting strings of data. Usually, these were one per line, each terminated by a carriage return.
Other programmers in the team and I used these log files for debugging and for ferreting out operational issues. They were very useful.
If you had access to the original source code, any mysterious events or logs could easily be put into context by searching for the error number (often called a Vendor Message Identifier, or VMID). This allowed a different programmer to maintain the operational status of the code and to help make enhancements.
While that's all quite brilliant and sensible, it was a monster in the making. You see, every programmer made up their own format—and mostly, they still do. This is why there are now hundreds of thousands of different formats. It makes it impossible for any single mind to comprehend the sum of all logging formats in a modern enterprise.
It's worse than that though. The sheer volume of data now being generated by these log files is truly overwhelming. One can find enterprises that generate more than a billion logs every day. A billion logs a day demands a sustained processing rate of about 12,000 logs per second.
If one person could inspect 10 logs a second and concentrate for 6 hours a day, that person would only cover 216,000 logs. Keeping pace with 12,000 logs a second takes 1,200 people at any given moment, and because each of them only works 6 hours in every 24, you would need a workforce of 24/6 × 1,200 = 4,800 people, working 365 days a year, just to pick out any significant events from that sea of logs. This of course leaves no time for remediation or mitigation.
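The arithmetic is easy to check in a few lines of Python. This is only a sketch: the constant and function names are mine, and the inputs are the figures quoted above (a billion logs a day, 10 logs a second per person, 6 hours of concentration a day). Working from the unrounded rate rather than a rounded one, the headcount comes out at a little over 4,600.

```python
import math

LOGS_PER_DAY = 1_000_000_000        # "a billion logs every day"
SECONDS_PER_DAY = 24 * 60 * 60
LOGS_PER_PERSON_PER_SECOND = 10     # assumed inspection speed from the text
FOCUSED_HOURS_PER_DAY = 6           # assumed concentration span from the text


def logs_per_second(logs_per_day: int) -> float:
    """Sustained rate needed to keep up with the daily volume."""
    return logs_per_day / SECONDS_PER_DAY


def logs_per_person_per_day() -> int:
    """How many logs one person can inspect in a working day."""
    return LOGS_PER_PERSON_PER_SECOND * FOCUSED_HOURS_PER_DAY * 3600


def workforce_needed(logs_per_day: int) -> int:
    """People required to inspect every log, rounded up."""
    return math.ceil(logs_per_day / logs_per_person_per_day())


if __name__ == "__main__":
    print(round(logs_per_second(LOGS_PER_DAY)))  # 11574
    print(logs_per_person_per_day())             # 216000
    print(workforce_needed(LOGS_PER_DAY))        # 4630
```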
SIEM is a machine-assisted tool to help pick out significant events from millions upon millions of logs.
Components of SIEM and terminology.
Anything that produces a log file is called a log source. The logs from each source must then be processed by a parsing engine. This engine is responsible for breaking each log into categories, normalising structured data, and storing the original log along with its meta-data in a suitable database.
The idea of normalising data is easily explained by an example. An event that means someone logged on to something is classified as an audit event, then further tagged as an authentication event, and finally as a successful log on. At this level of parsing, meta-data is generated that populates new columns of standardised information like "Origin of the log", "Destination of the log", "User ID", "VMID", "Category of event" and so on.
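To make the idea concrete, here is a minimal sketch of a parser that normalises two differently formatted logs into the same meta-data. Both raw formats, the VMID labels, and the field names are invented for illustration; a real parsing engine ships with thousands of such patterns.

```python
import re
from typing import Optional

# Hypothetical patterns for two log sources that report the same thing.
PATTERNS = [
    # e.g. "2024-01-05T10:00:00Z sshd: Accepted password for alice from 10.0.0.5"
    (re.compile(r"Accepted password for (?P<user>\S+) from (?P<origin>\S+)"),
     "SSH-ACCEPT"),
    # e.g. "EventID=4624 Account=bob Workstation=PC-17"
    (re.compile(r"EventID=4624 Account=(?P<user>\S+) Workstation=(?P<origin>\S+)"),
     "WIN-4624"),
]


def normalise(raw: str, destination: str) -> Optional[dict]:
    """Return standardised meta-data for a raw log, or None if unrecognised."""
    for pattern, vmid in PATTERNS:
        match = pattern.search(raw)
        if match:
            return {
                "category": "audit",
                "subcategory": "authentication",
                "event": "successful log on",
                "user_id": match.group("user"),
                "origin": match.group("origin"),
                "destination": destination,
                "vmid": vmid,
                "raw": raw,  # the original log is always kept
            }
    return None
```

Whatever the source wrote, a search for the event "successful log on" now finds both.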
When you are searching for a particular event—like the log on event—you only need to know that it is called "successful log on". This is because your searches and queries are most often done on the meta-data. It's standardised despite the fact that there could be a thousand different representations of a successful log on event inside each of many log sources.
This is a good time to emphasise the difference between a log and an event. A log is a single line or group of lines of raw data. An event is something significant about the particular log. To make SIEM useful, and hence get best value from your logs, you need to suppress uninteresting low-value logs and promote the importance of interesting logs. The latter, we call 'events'. The ratio of events to logs should typically be about 1 in 10.
Instantly, you can now understand how valuable it is to highlight a hundred events from a thousand logs without human intervention. The workforce estimated earlier shrinks to a tenth of its size.
We can do better. For every 100 events, if they are categorised and prioritised well, there should be only the occasional critical event. These might include things like:
- A program crashed
- A fan failed
- A brute force attack succeeded
- Someone just locked out their account
- Disk space just hit 95% on server X
To improve on this again, we get the SIEM system to assign a severity to each of these most significant events. Many of the events could rest until they appear in a periodic summary report on a manager's desk. Some, however, should raise alarms.
An alarm should be assigned to an event that precedes an operational or security-related disaster. I'd guess in many enterprises, it would be worth the cost of SIEM to simply alert on server disk space and fan failures, but it's easily extended to include unauthorised access to specific files, and the appearance of unsanctioned processes to cite only a few business cases.
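A severity rule of this kind might be sketched as follows. The event names, the two-level severity scale, and the 95% threshold are assumptions chosen to match the examples above, not any particular product's rule language.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    REPORT = 1  # can wait for the periodic summary report
    ALARM = 2   # needs attention now


@dataclass
class Event:
    kind: str           # hypothetical event name, e.g. "fan_failed"
    host: str
    value: float = 0.0  # e.g. percentage of disk space used


DISK_ALARM_PERCENT = 95.0
ALARM_KINDS = {"fan_failed", "brute_force_succeeded"}


def severity(event: Event) -> Severity:
    """Decide whether an event is an alarm or merely goes in a report."""
    if event.kind in ALARM_KINDS:
        return Severity.ALARM
    if event.kind == "disk_usage" and event.value >= DISK_ALARM_PERCENT:
        return Severity.ALARM
    return Severity.REPORT
```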
You can check for activity of any account that is supposed to be retired after a system administrator leaves the company.
SIEM should integrate with your authentication system, such as an Active Directory (AD) server. With that in place, whole groups of people may be classified in any suitable way. If you have an AD group of known database administrators, then the SIEM could be used to detect when someone NOT in that group is granted elevated privileges on database servers.
You can keep track of contractors' time, what resources they use, when they log on, when they log off, and what systems they touch. It's not possible to do this without centralised logging.
With such automation, what remains of that workforce can be replaced by anything from less than one Full Time Employee (FTE) for most companies to perhaps three for a truly huge organisation.
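As a rough sketch, the check amounts to a set-membership test on normalised events. Here the group membership and server list are hard-coded sets standing in for what would really be pulled from the directory, and the event and field names are hypothetical.

```python
from typing import Iterable, List, Set


def unexpected_privilege_grants(events: Iterable[dict],
                                dba_group: Set[str],
                                db_servers: Set[str]) -> List[dict]:
    """Return privilege grants on database servers to users outside the DBA group."""
    return [
        e for e in events
        if e["event"] == "privilege granted"
        and e["destination"] in db_servers
        and e["user_id"] not in dba_group
    ]


if __name__ == "__main__":
    dbas = {"alice", "bob"}            # stand-in for the AD group
    servers = {"db01", "db02"}
    sample = [
        {"event": "privilege granted", "user_id": "alice", "destination": "db01"},
        {"event": "privilege granted", "user_id": "mallory", "destination": "db02"},
    ]
    for hit in unexpected_privilege_grants(sample, dbas, servers):
        print("ALARM:", hit["user_id"], "on", hit["destination"])
```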
Other desirable features.
Like Active Directory integration, an ability to work with abstract containers of arbitrary objects will decrease the amount of maintenance needed by a SIEM administrator. If you categorise all servers that process (say) credit-card data into a single container called "PCI-Servers", then you can write investigative reports and alarms based on that list. You might develop a hundred different alarms and reports over time, but only ever need to update the list. This level of abstraction is essential to the ongoing usability of a SIEM system.
One tenet of good security is to apply job rotation. This means you need to define roles, and to specify not only who fills each role but also what that role is allowed to do. Your SIEM system should accommodate this by permitting abstracted roles like 'analyst', 'root admin' or 'read-only auditor'. You should be able to expose certain alarms and reports to one role and hide them from another. This idea fits with the tried-and-tested "need to know" technique.
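The benefit of the container is that rules refer to it by name and look it up when they run. A minimal sketch, with invented server names and a hypothetical rule builder:

```python
from typing import Callable, Dict, Set

# Named containers; only these need maintaining as servers come and go.
LISTS: Dict[str, Set[str]] = {"PCI-Servers": {"pay01", "pay02"}}


def make_rule(list_name: str, event_name: str) -> Callable[[dict], bool]:
    """Build a rule that fires for a given event on any member of a named list."""
    def rule(event: dict) -> bool:
        # The list is looked up each time, so later updates are picked up.
        return (event["event"] == event_name
                and event["destination"] in LISTS[list_name])
    return rule


failed_logon_on_pci = make_rule("PCI-Servers", "failed log on")

if __name__ == "__main__":
    event = {"event": "failed log on", "destination": "pay03"}
    print(failed_logon_on_pci(event))   # False: pay03 is not in the list yet
    LISTS["PCI-Servers"].add("pay03")   # one change to the list...
    print(failed_logon_on_pci(event))   # True: ...and every rule follows
```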
One way agents
There are two accepted methods of log collection. One is to passively wait for a log source to send its logs, and the other is to actively poll the log source. The latter is called 'agent-less' or 'remote' log collection. Both of these methods demand a two-way flow of data. Sometimes this data must flow through a firewall, in which case you will need a secure method of getting the data across. In many cases, an agent brokers the traffic on behalf of a particular security zone and packages it into a secure stream to forward to the system that parses the logs.
In rarer but more secure establishments, the security architecture might employ a multi-level system where low-classified data is allowed to travel up to a higher classification, but never the other way around. These types of highly secure systems might employ 'data diodes', which are optical transceivers with one direction disabled. They are housed in tamper-evident, tamper-resistant housings and certified at great cost to the manufacturer. When you use them, you need one-way agents. These are agents that are able to cope without getting any form of acknowledgement from the recipient.
Since your goal is to minimise FTE figures for SIEM, you should also seek out a system that is self-maintaining. It needs to be a database that does not require a database administrator. It therefore needs scheduled jobs that clean up, rotate, and archive stale data without intervention.
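To give a flavour of what "coping without acknowledgement" means, here is a small sketch of a fire-and-forget forwarder. I have assumed UDP datagrams carrying JSON with a sequence number; that is my illustration, not how any particular diode product works. Since the sender can never be told about a loss, the sequence number lets the receiving side at least notice the gaps.

```python
import json
import socket
from typing import List


class OneWayAgent:
    """Forwards logs without ever reading anything back from the recipient."""

    def __init__(self, host: str, port: int) -> None:
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._seq = 0

    def forward(self, raw_log: str) -> int:
        """Send one log and return the sequence number it was given."""
        self._seq += 1
        payload = json.dumps({"seq": self._seq, "log": raw_log}).encode("utf-8")
        self._sock.sendto(payload, self._addr)  # no reply is expected
        return self._seq

    def close(self) -> None:
        self._sock.close()


def missing_sequences(received: List[int]) -> List[int]:
    """On the receiving side, list the sequence numbers that never arrived."""
    if not received:
        return []
    seen = set(received)
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]
```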
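A housekeeping job of this sort can be very small. The sketch below uses SQLite purely for illustration; the two-table schema and the 90-day online period are my assumptions, and a real SIEM would schedule the job itself.

```python
import sqlite3

ONLINE_DAYS = 90  # assumed period that logs stay in the live table


def create_store(conn: sqlite3.Connection) -> None:
    """Create the hypothetical live and archive tables."""
    conn.execute("CREATE TABLE IF NOT EXISTS logs (ts REAL, raw TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS archive (ts REAL, raw TEXT)")


def rotate(conn: sqlite3.Connection, now: float,
           online_days: int = ONLINE_DAYS) -> int:
    """Move logs older than the online period into the archive table."""
    cutoff = now - online_days * 86400
    with conn:  # one transaction: copy then delete, or neither
        conn.execute(
            "INSERT INTO archive SELECT ts, raw FROM logs WHERE ts < ?",
            (cutoff,))
        moved = conn.execute(
            "DELETE FROM logs WHERE ts < ?", (cutoff,)).rowcount
    return moved
```

Run from a scheduler with the current time, it keeps the live table small without anyone having to remember to do it.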
Ease of use
If you can't teach an analyst how to use the major features in a couple of hours, then the SIEM system is destined to be a boat anchor. Eventually, enough knowledgeable people will leave the company that no one knows how to use the system, unless it is very simple to use. This is another case for a good role-based SIEM, so that you spread the skills lightly through the organisation and minimise the risk of a skills desert. When evaluating a SIEM solution, a good trick is to examine the length of, and prerequisites for, the formal vendor-led training course. Similarly, if a system demands hundreds of hours of professional services despite your skilled workforce, then look elsewhere.
As explained, there is no single standard log format. Your SIEM system needs to cope with new formats on a regular basis. It should let you customise existing parsing patterns, and also make it as simple as possible to integrate unusual or specialised user-generated log formats. The system should minimise upgrade overheads by working off a knowledge-base feed in preference to application upgrades.
Standard reporting and compliance.
Your SIEM purchase should come with the benefit of the knowledge of highly skilled engineers and auditors ready-built into the system. Therefore you should check for the availability of canned reports for important and common compliance criteria like:
- PCI DSS
- ISO 27001
- (and more)
Your SIEM needs to be easily tuned to churn out reports in a format directly suitable for any of these auditors. If you can do that automatically and frequently, then suddenly "compliance" is elevated from a paper-accreditation, into a proper grown-up security measure.
There are two kinds of correlation. One is static, and the other is near real-time.
A static correlation is a tool that lets you use any particular parameter to re-mine the database of logs to see what else it is associated with. For example, let's say a contractor JJJ triggers an excessive logon failure event and you correlate on JJJ. You might find that he or she has been pinging scores of servers at odd times of the night for weeks. This kind of behaviour should raise suspicion.
With a near real-time correlation engine, you expect the system to recognise that JJJ is doing odd things. You want it to be intelligent enough to know that someone cannot log on to server X from London, then fly to Paris and log on to the same server with the same credentials within one hour. This kind of stateful log-correlation engine should come with many canned rules that are likely to fire when something odd is happening on your network. It should also be able to detect slow attacks like Slowloris.
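In its simplest form, a static correlation is a pivot over the stored meta-data. A minimal sketch, assuming events are held as dictionaries with hypothetical field names:

```python
from collections import Counter
from typing import Iterable, List


def correlate_on(events: Iterable[dict], field: str, value: str) -> List[dict]:
    """Re-mine the stored events for everything sharing one parameter."""
    return [e for e in events if e.get(field) == value]


def summarise(events: Iterable[dict], field: str) -> Counter:
    """Count what the correlated events are associated with."""
    return Counter(e[field] for e in events if field in e)


if __name__ == "__main__":
    stored = [
        {"user_id": "JJJ", "event": "ping", "destination": "srv01"},
        {"user_id": "JJJ", "event": "ping", "destination": "srv02"},
        {"user_id": "JJJ", "event": "failed log on", "destination": "srv02"},
        {"user_id": "alice", "event": "successful log on", "destination": "srv01"},
    ]
    hits = correlate_on(stored, "user_id", "JJJ")
    print(summarise(hits, "destination"))
    print(summarise(hits, "event"))
```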
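The London-to-Paris rule can be sketched as a small piece of state kept per user and server. I have assumed that each logon already carries a city (resolved from its origin by some other means) and that any change of city inside the window is suspicious; a real engine would be subtler about distance and time.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

WINDOW_SECONDS = 3600  # "within one hour"


@dataclass
class Logon:
    user: str
    server: str
    city: str   # assumed to be resolved from the origin elsewhere
    ts: float   # seconds since some epoch


class TravelRule:
    """Remembers the last logon per (user, server) and flags impossible moves."""

    def __init__(self, window: int = WINDOW_SECONDS) -> None:
        self._window = window
        self._last: Dict[Tuple[str, str], Logon] = {}

    def observe(self, logon: Logon) -> Optional[str]:
        """Record a logon; return an alarm message if it looks impossible."""
        key = (logon.user, logon.server)
        previous = self._last.get(key)
        self._last[key] = logon
        if (previous is not None
                and previous.city != logon.city
                and abs(logon.ts - previous.ts) < self._window):
            return (f"{logon.user} logged on to {logon.server} from "
                    f"{previous.city} and {logon.city} within "
                    f"{self._window // 60} minutes")
        return None
```

Neither logon is remarkable on its own; it is the pair, seen together, that becomes the event.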
You can think of these advanced correlation engines as something that creates significant events out of a collection of otherwise benign events.
How to justify SIEM
There are three main business units that should simultaneously pool budget for SIEM: operations, security, and compliance.
The operations team will save money by minimising downtime and pre-empting problems with critical business infrastructure.
The security team has a tool to demonstrate and justify the effectiveness of their existing systems, and also to enhance the whole security posture with event correlation, and offer advanced forensics to internal and external bodies as required. If only one virus outbreak or data loss is prevented, many thousands of dollars are saved, because recovery from any security breach is tremendously expensive.
The compliance team continually faces an expensive challenge to gather enough information for an auditor. A good SIEM system drastically reduces the cost in time and money to do this, and more importantly, allows a continuous audit instead of a near useless quarterly snapshot.
SIEM is essential today. Logs are the closest you can get to the programmer, and most organisations are either not collecting logs at all or collecting them and leaving a vast resource untapped.
This is only a short article that touches on a few areas of a huge topic and so I'll be happy to answer any questions.