Naïve Bayes in Python

Updated on July 12, 2012

The Naïve Bayes Algorithm

The naïve Bayes algorithm is a classifier based on Bayes' theorem. It assumes that the features are conditionally independent given the class, which sometimes necessitates pre-processing to decorrelate them (for example, via eigenvalue decomposition of the covariance matrix). It is a supervised learning algorithm: the model is fitted to examples whose classes are already known.
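As an illustration of the pre-processing step mentioned above, here is a minimal sketch that decorrelates features by rotating them onto the eigenvectors of their covariance matrix. The function name `decorrelate` and the synthetic data are assumptions made for this example. Note that this removes only linear correlation, which coincides with independence only in special cases such as jointly Gaussian features.

```python
import numpy as np

def decorrelate(X):
    """Rotate the columns of X onto the eigenvectors of their covariance matrix."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)     # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh is meant for symmetric matrices
    return centered.dot(eigvecs), eigvals

# Synthetic data (hypothetical): the second feature is mostly a copy of the first
rng = np.random.RandomState(0)
base = rng.normal(size=(500, 1))
X = np.hstack([base, 0.8 * base + 0.2 * rng.normal(size=(500, 1))])

Z, variances = decorrelate(X)
# The off-diagonal entries of the new covariance matrix are numerically zero
print(np.round(np.cov(Z, rowvar=False), 6))
```

The rotated columns of `Z` could then be used as the feature vectors for the classifier in place of the raw features.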

As implemented below, the training data (feature vectors paired with their class labels) is supplied to the constructor. Maximum likelihood estimation is used to fit the class-conditional densities under an assumption of normality (other probability densities are possible). In the example, each instance is labeled either 'A' or 'B' and the feature vectors are two-dimensional and real-valued; however, the model is not limited to binary classification, and the feature space may have arbitrarily many dimensions. For convenience, a prediction method and a string representation are included.
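For reference, the standard Gaussian naïve Bayes quantities can be written out as follows. The notation is introduced here for illustration: n is the number of training examples, n_c the number with class c, x_{j,i} the i-th feature of the j-th example, y_j its class, and N the number of features. The variance estimate divides by n_c rather than n_c - 1, which is the maximum likelihood form.

```latex
\begin{align}
\hat{P}(c) &= \frac{n_c}{n}, \\
\hat{\mu}_{c,i} &= \frac{1}{n_c} \sum_{j \,:\, y_j = c} x_{j,i}, \\
\hat{\sigma}_{c,i}^{2} &= \frac{1}{n_c} \sum_{j \,:\, y_j = c} \left(x_{j,i} - \hat{\mu}_{c,i}\right)^{2}, \\
P(c \mid x) &\propto \hat{P}(c) \prod_{i=1}^{N}
  \frac{1}{\sqrt{2\pi\hat{\sigma}_{c,i}^{2}}}
  \exp\!\left(-\frac{\left(x_i - \hat{\mu}_{c,i}\right)^{2}}{2\hat{\sigma}_{c,i}^{2}}\right).
\end{align}
```

The predicted class is the c that maximizes the last expression; dividing by the sum over all classes turns it into a posterior probability.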

Python Implementation of Gaussian Naïve Bayes

```python
from __future__ import division, print_function
import numpy as np
from scipy.stats import norm

class GaussianNaiveBayes:

    def __init__(self, training_set):
        # training_set: sequence of (feature_vector, label) pairs
        self.training_set = training_set

        # Count the number of examples per label
        self.labels = {}
        for label in [t[1] for t in self.training_set]:
            if label in self.labels:
                self.labels[label] += 1
            else:
                self.labels[label] = 1

        self.probl = {}  # prior probability of each label
        self.N = len(self.training_set[0][0])  # number of features
        self.cmean = {}  # per-label, per-feature mean
        self.cstdv = {}  # per-label, per-feature standard deviation
        self.prob = {}   # per-label, per-feature density function
        for label in self.labels.keys():
            self.probl[label] = self.labels[label]/len(self.training_set)
            example_set = [e[0] for e in self.training_set if e[1] == label]
            self.cmean[label] = [0 for i in range(self.N)]
            self.cstdv[label] = [0 for i in range(self.N)]
            self.prob[label] = [0 for i in range(self.N)]
            for i in range(self.N):
                xi = np.array([e[i] for e in example_set])
                self.cmean[label][i] = xi.mean()
                self.cstdv[label][i] = xi.std()
                # Bind the parameters as default arguments: a closure over xi
                # would see only the last xi assigned in these loops.
                self.prob[label][i] = lambda x, m=xi.mean(), s=xi.std(): norm.pdf(x, m, s)

    def predict(self, X):
        # Unnormalized posterior of each label: prior times product of densities
        p = {}
        for l in self.labels.keys():
            cp = np.prod([self.prob[l][i](X[i]) for i in range(self.N)])
            p[l] = self.probl[l]*cp
        results = [(p[l], l) for l in self.labels.keys()]
        results.sort()
        # Return the most probable label and its normalized posterior
        return results[-1][1], results[-1][0]/sum([r[0] for r in results])

    def __str__(self):
        s = 'Gaussian Naive Bayes Model:\n'
        for l in self.labels.keys():
            s += ' P(%s) = %.3f\n' % (l, self.probl[l])
            for i in range(self.N):
                s += ' E(X%s|%s) = %.3f\n' % (i+1, l, self.cmean[l][i])
                s += ' VAR(X%s|%s) = %.3f\n' % (i+1, l, self.cstdv[l][i]**2)
        return s.strip()

if __name__ == '__main__':

    examples = []
    examples.append(((0,5),'A'))
    examples.append(((2,9),'A'))
    examples.append(((1,3),'B'))
    examples.append(((2,4),'B'))
    examples.append(((3,5),'B'))
    examples.append(((4,6),'B'))
    examples.append(((5,7),'B'))

    nb = GaussianNaiveBayes(examples)
    print(nb)
    print(nb.predict((1,3)))
```
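When the feature space is large, the product of densities computed in `predict` can underflow to zero. A minimal sketch of the same calculation carried out in log space follows. The function `log_space_predict` and its arguments are hypothetical and not part of the class above, the parameters are worked out by hand from the seven training examples in the listing, and every standard deviation is assumed to be nonzero.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def log_space_predict(priors, means, stdvs, x):
    """Return (best_label, posterior) using sums of log densities."""
    labels = sorted(priors)
    # log P(label) + sum over features of log N(x_i; mean_i, stdv_i)
    scores = np.array([
        np.log(priors[l]) + norm.logpdf(x, means[l], stdvs[l]).sum()
        for l in labels
    ])
    log_posteriors = scores - logsumexp(scores)  # normalize in log space
    best = int(np.argmax(log_posteriors))
    return labels[best], float(np.exp(log_posteriors[best]))

# Priors, means and (population) standard deviations of the example data
priors = {'A': 2.0 / 7.0, 'B': 5.0 / 7.0}
means = {'A': np.array([1.0, 7.0]), 'B': np.array([3.0, 5.0])}
stdvs = {'A': np.array([1.0, 2.0]), 'B': np.array([np.sqrt(2.0), np.sqrt(2.0)])}

label, posterior = log_space_predict(priors, means, stdvs, np.array([1.0, 3.0]))
print(label, posterior)
```

Summing log densities and normalizing with `logsumexp` gives the same ranking of labels as the product form, while staying numerically stable as the number of features grows.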

Applications of Naïve Bayes

Naïve Bayes classifiers are popular in spam filtering and have also been used to predict the incidence of diseases based on clinical features.
