Failstop faults and Byzantine faults in fault-tolerant systems
In computer science and electronic engineering, a great deal of thought has gone into classifying faults in order to meet the requirements of fault-tolerant systems.
Two types of fault are of particular interest in fault-tolerant systems: failstop faults and Byzantine faults.
In many ways these two types of fault represent the extremes of a spectrum. A failstop fault occurs when a faulty process ceases operation and the other processes are notified of the fault. This kind of failure is rare in real-world systems, where processes tend to fail without notice and the remaining processes are left to pick up the pieces as best they can.
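The practical value of the failstop assumption is that a surviving process always knows when it has heard from everyone it needs to hear from. The sketch below illustrates this under simplifying assumptions of our own: execution proceeds in a single broadcast round, any crash happens before the round starts (a real crash mid-broadcast could reach only some receivers), and the names `Process` and `failstop_round` and the "take the minimum" decision rule are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Process:
    pid: int
    value: int
    crashed: bool = False  # True once the process has fail-stopped


def failstop_round(processes):
    """Simulate one broadcast round under the failstop assumption.

    Returns {receiver_pid: (received, complete)} for every live process, where
    `received` maps sender pid -> value and `complete` says whether every
    sender is accounted for (either heard from or reported as crashed).
    """
    everyone = {p.pid for p in processes}
    # Failstop: every crash is announced, so each live process learns this set.
    crashed = {p.pid for p in processes if p.crashed}
    views = {}
    for receiver in processes:
        if receiver.crashed:
            continue  # a fail-stopped process takes no further steps
        # Live senders deliver their values; crashed senders send nothing.
        received = {s.pid: s.value for s in processes if not s.crashed}
        complete = (set(received) | crashed) == everyone
        views[receiver.pid] = (received, complete)
    return views


procs = [Process(0, 7), Process(1, 3, crashed=True), Process(2, 5), Process(3, 9)]
views = failstop_round(procs)
# Identical, complete views plus a deterministic rule (here: minimum) give agreement.
decisions = {pid: min(received.values())
             for pid, (received, complete) in views.items() if complete}
print(decisions)  # {0: 5, 2: 5, 3: 5}
```

Without the failure notification, `complete` could never become true for process 1's peers: a receiver would be unable to tell a crashed sender from a slow one, which is exactly the situation the paragraph above describes for real-world systems.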
At the other end of the scale is a Byzantine fault, in which a faulty process may continue to operate but can send arbitrary messages. The name refers to the Byzantine Generals' Problem, an agreement problem in which the generals of the Byzantine Empire's army must decide unanimously whether to attack an enemy army. The problem is complicated by the geographic separation of the generals, who must communicate by sending messengers to each other, and by the presence of traitors among them. In this model of a distributed system, the traitorous 'generals' have in effect been 'bribed' to disrupt the system's operation. This resembles certain attacks by internal entities, in which several entities may collude to supply false information.
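A classic solution to the Byzantine Generals' Problem is the oral-messages algorithm OM(m) of Lamport, Shostak and Pease (1982), which lets the loyal generals agree despite up to m traitors, provided there are more than 3m generals in total. The following is a minimal recursive sketch of OM(m). The function names, the traitor strategy (a traitor tells even-numbered receivers "attack" and odd-numbered ones "retreat", whatever it was told) and the choice of "retreat" when no strict majority exists are assumptions made for illustration.

```python
from collections import Counter

ATTACK, RETREAT = "attack", "retreat"


def majority(values):
    """Strict-majority vote; falls back to RETREAT when there is none (assumed default)."""
    value, count = Counter(values).most_common(1)[0]
    return value if count * 2 > len(values) else RETREAT


def send(sender, receiver, value, traitors):
    """Value that `receiver` actually gets when `sender` is meant to send `value`.

    Hypothetical traitor strategy: send ATTACK to even receivers, RETREAT to odd ones.
    """
    if sender in traitors:
        return ATTACK if receiver % 2 == 0 else RETREAT
    return value


def om(m, commander, lieutenants, value, traitors):
    """Run OM(m) and return {lieutenant: decided value}."""
    # Step 1: the commander sends its order to every lieutenant.
    received = {lt: send(commander, lt, value, traitors) for lt in lieutenants}
    if m == 0:
        return received
    # Step 2: each lieutenant j relays what it received, acting as commander in OM(m-1).
    relayed = {}
    for j in lieutenants:
        others = [lt for lt in lieutenants if lt != j]
        relayed[j] = om(m - 1, j, others, received[j], traitors)
    # Step 3: lieutenant i takes the majority of its own value and the relayed values.
    decisions = {}
    for i in lieutenants:
        votes = [received[i]] + [relayed[j][i] for j in lieutenants if j != i]
        decisions[i] = majority(votes)
    return decisions


# Four generals, one traitor (n = 4 > 3m = 3): the loyal lieutenants agree.
print(om(1, 0, [1, 2, 3], ATTACK, traitors={3}))  # {1: 'attack', 2: 'attack', 3: 'attack'}
print(om(1, 0, [1, 2, 3], ATTACK, traitors={0}))  # {1: 'retreat', 2: 'retreat', 3: 'retreat'}
# Three generals, one traitor (n = 3, not > 3m): loyal lieutenant 1 is misled.
print(om(1, 0, [1, 2], ATTACK, traitors={2}))     # {1: 'retreat', 2: 'attack'}
```

In the first call the loyal lieutenants obey the loyal commander; in the second a traitorous commander sends conflicting orders, yet all loyal lieutenants still reach the same decision. The last call shows a loyal lieutenant retreating although the loyal commander ordered an attack, consistent with the result that agreement by oral messages needs more than 3m generals. Only the decisions of loyal lieutenants matter; a traitor's own "decision" is irrelevant.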