Avoiding Unplanned Data Center Downtime
The Nightmare of Unplanned Downtime
Unplanned downtime is the worst nightmare for anyone responsible for running a data center. A single unplanned outage can have a significant impact on the business, because businesses are becoming more and more dependent on data centers to conduct their operations. The cost of an unplanned data center outage differs between businesses and incidents, as it depends on the size of the business, the number of transactions, the size of the data center, and the duration of the outage.
Unplanned downtime also brings recovery costs for getting things running again and reputational loss to the business. Though an accurate assessment of outage cost may not be possible, a study conducted by the Ponemon Institute pegs the average cost per outage incident at $690,204.
Causes of Data Center Downtime
The causes of downtime range from predictable ones, such as primary power outages, UPS failures, IT equipment failures, and weather problems, to strange ones, such as a squirrel chewing through telecom wires. While the Ponemon Institute study suggests UPS malfunction as the leading cause of outages, another study suggests human error is the most probable cause of unplanned downtime.
Reaction Time Matters
Whatever the cause of downtime, it is important to get things up and working again in the least possible time. As the duration of downtime increases, so do the losses to the business. The first step in getting a data center running again is to detect and identify the cause of the downtime.
Whether the outage is due to a power cut, a broken cable, a server failure, or anything else, the cause should be identified quickly. In a survey of data center managers, only 26% said they could locate a server that has gone down within minutes. Since a data center has thousands of assets and any of them can go down, it is essential to have real-time access to data about the location and functioning of each asset.
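As a rough illustration of what real-time access to asset data could look like, here is a minimal Python sketch. It assumes each asset reports a periodic heartbeat and that its physical location is stored alongside it. The `Asset` record, the `find_down_assets` helper, the 60-second timeout, and the sample inventory are all hypothetical and are not taken from any particular monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Asset:
    asset_id: str
    location: str             # free-form location string (hypothetical format)
    last_heartbeat: datetime  # time the asset last reported in


def find_down_assets(assets, now, timeout=timedelta(seconds=60)):
    """Return assets whose last heartbeat is older than `timeout`.

    The result is sorted so the asset that has been silent longest comes first.
    """
    down = [a for a in assets if now - a.last_heartbeat > timeout]
    return sorted(down, key=lambda a: a.last_heartbeat)


# Made-up sample inventory, for illustration only.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
inventory = [
    Asset("srv-001", "Hall A / Rack 12 / U20", now - timedelta(seconds=5)),
    Asset("srv-002", "Hall A / Rack 12 / U22", now - timedelta(minutes=10)),
    Asset("pdu-007", "Hall B / Rack 03", now - timedelta(minutes=2)),
]

for asset in find_down_assets(inventory, now):
    print(asset.asset_id, asset.location)
```

With location stored next to status, the question "which asset is down, and where is it?" becomes a single lookup rather than a manual search across the floor.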
Holistic Approach to Capacity Management Is Key
Traditionally, data center capacity management has been divided between the IT and Facilities teams, with the IT team taking care of servers and other IT equipment, and the Facilities team taking ownership of space, power, and cooling. A new approach is now emerging that treats all four components (IT, space, power, and cooling) as means of service delivery and strives to ensure consistent service delivery by keeping all of them functioning properly.
With the new approach, it is easier for data center managers to plan for resources, build redundancies, and increase capacity when needed. They can easily decide whether they need to add more CPU cores, replace a UPS, or increase cooling in a particular section. This holistic approach, along with real-time knowledge of how assets are functioning, helps data center managers make the decisions critical to avoiding unplanned downtime.
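One way to picture the holistic view is a single check that looks at IT, space, power, and cooling together instead of in separate team reports. The Python sketch below is an assumption-laden illustration: the resource names, the capacity and demand figures, and the 80% planning threshold are invented for the example and are not recommendations.

```python
def constrained_resources(capacity, demand, threshold=0.8):
    """Return {resource: utilization} for every resource at or above `threshold`.

    `capacity` and `demand` map resource names to values in the same unit.
    """
    report = {}
    for resource, limit in capacity.items():
        utilization = demand.get(resource, 0) / limit
        if utilization >= threshold:
            report[resource] = utilization
    return report


# Hypothetical figures for one section of a data center.
capacity = {"cpu_cores": 2000, "rack_units": 840, "power_kw": 400, "cooling_kw": 450}
demand = {"cpu_cores": 1300, "rack_units": 500, "power_kw": 360, "cooling_kw": 380}

constrained = constrained_resources(capacity, demand)
for resource, utilization in constrained.items():
    print(f"{resource}: {utilization:.0%} of capacity in use")
```

In this made-up case, CPU cores and rack space have room to spare while power and cooling are near their limits, so adding servers without first addressing power and cooling would raise the risk of an outage.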