# Naïve Bayes in Python

## The Naive Bayes Algorithm

The naïve Bayes algorithm is a classifier based on Bayes' theorem. It assumes conditional independence between features, which sometimes necessitates pre-processing (for example, decorrelating the features via eigenvalue decomposition). Formally, it is a supervised learning algorithm.
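The decision rule can be illustrated with made-up numbers (the priors and likelihoods below are hypothetical, purely for demonstration): under the independence assumption, the posterior of each label is proportional to its prior times the product of the per-feature likelihoods.

```python
# P(label | x) ∝ P(label) * Π_i P(x_i | label), by Bayes' theorem
# plus the naive independence assumption over features.
priors = {'A': 0.3, 'B': 0.7}
likelihoods = {'A': [0.5, 0.2], 'B': [0.1, 0.4]}  # P(x_i | label), two features

# Unnormalized scores, then normalize to obtain posteriors.
scores = {l: priors[l] * likelihoods[l][0] * likelihoods[l][1] for l in priors}
total = sum(scores.values())
posteriors = {l: s / total for l, s in scores.items()}
```

Here label 'A' wins despite its smaller prior, because its likelihood product (0.5 × 0.2 = 0.10) outweighs B's (0.1 × 0.4 = 0.04).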

As implemented below, the training data (feature vectors with their corresponding class labels) is supplied to the constructor. Maximum likelihood estimation is used to compute the conditional probabilities under an assumption of normality (other probability densities are possible). In the example, each instance is labeled either 'A' or 'B' and the feature space consists of two-dimensional floating-point vectors; however, the model is not limited to binary classification, and the feature space may have arbitrarily many dimensions. For convenience, a prediction method is included, as well as a string representation.

## Python Implementation of Gaussian Naïve Bayes

```python
import numpy as np
from scipy.stats import norm


class GaussianNaiveBayes:
    def __init__(self, training_set):
        self.training_set = training_set
        # Count the occurrences of each class label.
        self.labels = {}
        for _, label in self.training_set:
            self.labels[label] = self.labels.get(label, 0) + 1
        self.N = len(self.training_set[0][0])  # number of features
        self.probl = {}  # class priors
        self.cmean = {}  # per-class feature means
        self.cstdv = {}  # per-class feature standard deviations
        self.prob = {}   # per-class, per-feature Gaussian densities
        for label in self.labels:
            self.probl[label] = self.labels[label] / len(self.training_set)
            example_set = [e[0] for e in self.training_set if e[1] == label]
            self.cmean[label] = [0.0] * self.N
            self.cstdv[label] = [0.0] * self.N
            self.prob[label] = [None] * self.N
            for i in range(self.N):
                xi = np.array([e[i] for e in example_set], dtype=float)
                self.cmean[label][i] = xi.mean()
                self.cstdv[label][i] = xi.std()
                # Bind the current mean and stdv as default arguments: a bare
                # closure over xi would see only the last loop iteration.
                self.prob[label][i] = (lambda x, m=xi.mean(), s=xi.std():
                                       norm.pdf(x, m, s))

    def predict(self, X):
        p = {}
        for l in self.labels:
            cp = np.prod([self.prob[l][i](X[i]) for i in range(self.N)])
            p[l] = self.probl[l] * cp
        results = sorted((p[l], l) for l in self.labels)
        # Return the winning label and its normalized posterior probability.
        return results[-1][1], results[-1][0] / sum(r[0] for r in results)

    def __str__(self):
        s = 'Gaussian Naive Bayes Model:\n'
        for l in self.labels:
            s += '  P(%s) = %.3f\n' % (l, self.probl[l])
            for i in range(self.N):
                s += '  E(X%s|%s) = %.3f\n' % (i + 1, l, self.cmean[l][i])
                s += '  VAR(X%s|%s) = %.3f\n' % (i + 1, l, self.cstdv[l][i] ** 2)
        return s.strip()


if __name__ == '__main__':
    examples = [((0, 5), 'A'), ((2, 9), 'A'),
                ((1, 3), 'B'), ((2, 4), 'B'), ((3, 5), 'B'),
                ((4, 6), 'B'), ((5, 7), 'B')]
    nb = GaussianNaiveBayes(examples)
    print(nb)
    print(nb.predict((1, 3)))
```
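The density-estimation step can be checked in isolation. As a sketch, fitting the dimension-1 values of class 'B' from the example above: maximum likelihood uses the sample mean and the biased (population) standard deviation, which is NumPy's `std` default.

```python
import numpy as np
from scipy.stats import norm

# Dimension-1 feature values of the class-'B' examples above.
xs = np.array([1, 2, 3, 4, 5], dtype=float)
mu, sigma = xs.mean(), xs.std()  # MLE: biased std (ddof=0)

# The fitted Gaussian, evaluated at its mean, peaks at 1 / (sigma * sqrt(2*pi)).
density = norm.pdf(3.0, mu, sigma)
```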

## Applications of Naïve Bayes

Bayes' classifiers are popular in spam filtering and have also been used to predict the incidence of diseases based on clinical features.
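A minimal sketch of the spam-filtering use case, with made-up token counts: here the Gaussian densities above are replaced by smoothed token frequencies, and log-probabilities are summed to avoid underflow.

```python
from math import log

# Hypothetical token counts from a tiny labeled corpus (illustrative only).
spam_counts = {'free': 20, 'win': 15, 'meeting': 1}
ham_counts = {'free': 2, 'win': 1, 'meeting': 30}
priors = {'spam': 0.4, 'ham': 0.6}
vocab = set(spam_counts) | set(ham_counts)

def score(tokens, counts, prior):
    total = sum(counts.values())
    # Laplace (add-one) smoothing avoids zero probability for unseen tokens.
    return log(prior) + sum(
        log((counts.get(t, 0) + 1) / (total + len(vocab))) for t in tokens)

message = ['free', 'win']
label = max(priors, key=lambda l: score(
    message, spam_counts if l == 'spam' else ham_counts, priors[l]))
```

The same structure underlies the clinical use: symptoms play the role of tokens, and diagnoses play the role of the spam/ham labels.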
