What is Redundancy?
Colocation & Redundancy
In previous articles I have looked at various aspects of data centres, how environmentally friendly they are and the features that they often have. When learning about data centres, an important concept to understand is ‘redundancy’. Although the term can be used to describe a form of dismissal from a job, in more technical circles it is a useful concept that can help to provide back-ups and extra security in the event of equipment failure or a power cut. You will often find various forms of redundancy being used in colocation centres.
Colocation is a form of web hosting where space in a data centre is rented out to businesses and individuals. Instead of all the servers and equipment belonging to one company, a colocation centre houses the servers of many different businesses and customers. If you do not have a server of your own, you can often rent one, and you can choose a managed hosting plan where the data centre looks after the maintenance and upkeep of the server for you. The data centre will also provide the bandwidth and the power for the servers. The main aim of any data centre is to achieve high levels of uptime and to make sure that customers’ systems are available at all times. This is where the concept of redundancy is useful.
The Concept of Redundancy
In basic terms, redundancy is where important components of a system are duplicated. By doing this with their equipment, data centres can vastly improve their uptime and the reliability of their whole operation. With redundancy measures in place, if a piece of equipment fails or needs replacing, another component can immediately take its place and the system remains up and running. Without redundancy, a single broken component could bring down the whole network and important data could be lost. Colocation centres will often implement ‘N+1 redundancy’ (or ‘parallel redundancy’). This is where the N units needed to carry the load are joined by one additional unit and the load is shared between all of them. For example, if you have one server, you should add an additional server and both should run at 50% of their capacity. This means that if one server fails, the other can immediately take over without being overloaded. The process scales up, so if you have two servers, you should add a third and the three units should each run at roughly 67% capacity (two servers’ worth of load shared across three machines).
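The arithmetic behind the N+1 examples above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this article, and it assumes that all units are identical and that the load is shared perfectly evenly between them.

```python
def utilisation_per_unit(required_units: int, spare_units: int = 1) -> float:
    """Fraction of its capacity each unit carries when the load is shared evenly.

    required_units is the 'N' in N+1: how many units the load actually needs.
    spare_units is the '+1': how many extra units are installed.
    """
    if required_units < 1 or spare_units < 0:
        raise ValueError("need at least one required unit and no negative spares")
    return required_units / (required_units + spare_units)


def survives_failures(required_units: int, spare_units: int, failed_units: int) -> bool:
    """True if the units still working can carry the full load on their own."""
    remaining = required_units + spare_units - failed_units
    return remaining >= required_units


# One server plus one spare: each runs at half capacity.
print(utilisation_per_unit(1))        # 0.5
# Two servers plus one spare: each runs at about two thirds capacity.
print(round(utilisation_per_unit(2), 2))
# N+1 copes with one failure, but not with two at once.
print(survives_failures(2, 1, 1), survives_failures(2, 1, 2))
```

The last line shows the limit of N+1: it protects against a single failure, so a second unit failing before the first is repaired would still overload the system.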
An Engineering Example
Redundancy is a concept that is also found in engineering and building works as well as in the data centre. Suspension bridges are a particularly good example of redundancy in action. They are held up by a large number of cables, often made of metal, with the weight distributed evenly between them. The weight that each cable supports is less than the maximum it can hold, so each cable has capacity to spare. This means that if a cable breaks or is cut, the remaining cables are able to support the extra weight and the bridge does not collapse. Redundancy in data centres does not need as many additional units as there are cables in a suspension bridge, but it is advised that the capacity or power of a unit should be spread between three separate units, as this will vastly improve reliability.
The main problems that redundancy helps to guard against are power failures and power surges. A power failure can bring the whole centre to a halt and can result in customers’ data being lost or their websites becoming unavailable. Equally, a power cut or a power surge can stop the air conditioning and cooling systems from functioning and could cause the servers, which need to be kept at precise temperatures, to overheat and crash. Redundancy can be implemented to avoid these problems. Most colocation centres will receive their power from at least two separate power supplies, often more. If there is a power cut at one of the power supply stations, the others can make sure that the data centre continues to receive enough power to run. In the very rare case that all external power supplies fail, most centres will have a backup in the form of an ‘uninterruptible power supply’ or ‘UPS’ that can temporarily power the centre until the supply comes back on or backup generators take over. Backup power options such as UPSs and generators will themselves utilise N+1 redundancy, so that if the backup fails, there is a backup for the backup!
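The order in which a centre falls back from one power source to the next can be pictured with a small Python sketch. Everything here is hypothetical and simplified: the `PowerSource` class, the source names and the `select_power_source` function are invented for illustration, and real switchover is handled by electrical equipment, not by a script.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PowerSource:
    name: str        # illustrative label, e.g. "utility feed A"
    available: bool  # whether this source is currently delivering power


def select_power_source(
    external_feeds: Sequence[PowerSource],
    backups: Sequence[PowerSource],
) -> Optional[PowerSource]:
    """Return the first working source, trying external feeds before backups."""
    for source in [*external_feeds, *backups]:
        if source.available:
            return source
    return None  # nothing is delivering power: the centre goes dark


feeds = [PowerSource("utility feed A", False), PowerSource("utility feed B", False)]
backups = [PowerSource("UPS", True), PowerSource("generator", True)]

chosen = select_power_source(feeds, backups)
print(chosen.name if chosen else "no power")  # UPS
```

With both external feeds marked as failed, the sketch falls through to the UPS, which matches the situation described above where the backup only comes into play once every external supply is lost.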
Another area where redundancy can be used is the internet connection, as it is crucial that data centres stay connected to the internet at all times. The vast majority of colocation and data centres will utilise several telecoms providers and will have the centre connected via several different fibre optic cables. This means that if the connection is lost from one provider or one cable, it will not affect the whole centre. Connecting to the internet through both a wired connection and wirelessly is another important form of redundancy. Applying redundancy to the internet connection can help data centres to achieve high levels of availability and can ensure that connectivity is maintained almost 100% of the time.
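A rough way to see why several connections help is to estimate the chance that at least one of them is working. The Python sketch below does this under a strong assumption: that the links fail independently of one another, which may not hold if, for example, two cables share the same duct. The function name and the 99% figures are purely illustrative and are not measurements from any real provider.

```python
from math import prod
from typing import Iterable


def combined_availability(link_availabilities: Iterable[float]) -> float:
    """Probability that at least one link is up, assuming independent failures.

    Each value is the fraction of time a single link is up (0.0 to 1.0).
    The centre is only offline when every link is down at the same moment.
    """
    all_down = prod(1.0 - a for a in link_availabilities)
    return 1.0 - all_down


single = combined_availability([0.99])
dual = combined_availability([0.99, 0.99])
print(f"one link:  {single:.4%}")
print(f"two links: {dual:.4%}")
```

Under these assumptions, adding a second 99% link cuts the expected downtime from about 1% to about 0.01%, which is why centres connect through more than one provider.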
Air Cooling & Conditioning
The final way that redundancy is used in data centres is in the temperature control and air conditioning systems. As data centres are constantly running large numbers of machines, the temperature can quickly rise and servers, which need to be kept at precise temperatures, can easily overheat. So, much like the internet connection and the power supply, the cooling and temperature regulation system needs to be kept online 24/7. The Computer Room Air Conditioner units, or CRACs, will utilise N+1 redundancy so that the centre will stay cool even if a CRAC malfunctions. N+1 redundancy will also be applied to the pipes that are connected to the CRACs and to the ‘chillers’ that support the CRACs. Redundancy, therefore, is a hugely important concept in any form of data centre. Most centres will advertise their high levels of uptime, and redundancy is how they are able to follow through on those claims.
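The N+1 approach to cooling described above can be turned into a simple sizing calculation. The Python sketch below is a simplification with made-up numbers: the function name, the heat load and the unit capacity are all hypothetical, and it assumes that every CRAC has the same capacity.

```python
import math


def cracs_to_install(heat_load_kw: float, unit_capacity_kw: float, spares: int = 1) -> int:
    """Number of identical CRAC units to install for N+spares redundancy.

    N is the smallest number of units whose combined capacity covers the
    heat load; the spares are added on top of that.
    """
    if heat_load_kw <= 0 or unit_capacity_kw <= 0 or spares < 0:
        raise ValueError("load and capacity must be positive, spares non-negative")
    needed = math.ceil(heat_load_kw / unit_capacity_kw)  # the 'N'
    return needed + spares


# A room producing 250 kW of heat, cooled by 100 kW units:
# three units are needed, so N+1 means installing four.
print(cracs_to_install(250, 100))  # 4
```

If one of the four units in this example malfunctions, the remaining three can still handle the full heat load, which is the point of N+1.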