There can be many mathematical expressions for the probability, but one of the most reliable ones, which is commonly used with Naïve Bayes, is the Gaussian distribution (aka normal distribution), which assumes that X is a continuous variable:
P(x \mid Y = y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x - \mu_y)^2}{2\sigma_y^2}\right)
Equation 5: probability of X being class Y when X follows Gaussian distribution
where σ_y² is the variance, μ_y is the mean value of the data, and both are calculated for a given class y of Y.
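As a minimal illustration of Equation 5, the sketch below (Python with NumPy; the feature value, mean, and variance are hypothetical numbers, not statistics from our data) evaluates the Gaussian class-conditional likelihood for a single feature:

```python
import numpy as np

def gaussian_likelihood(x, mean, var):
    """P(x | Y = y) per Equation 5: a univariate Gaussian with the
    per-class mean and variance estimated from the training data."""
    coeff = 1.0 / np.sqrt(2.0 * np.pi * var)
    exponent = -((x - mean) ** 2) / (2.0 * var)
    return coeff * np.exp(exponent)

# Hypothetical per-class statistics for one feature of class y:
# mean 5.0 and variance 1.5.
print(gaussian_likelihood(4.2, mean=5.0, var=1.5))
```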
Gaussian Naïve Bayes is a simple, fast, and very effective algorithm that can even outperform highly complex models. It can handle multiclass datasets, especially ones with categorical labels, and can perform well with less training data than other algorithms, as long as the condition of independence between the variables holds. On the other hand, the probabilistic nature of the algorithm comes with many conditions. If the input variables are not independent (and in real life they rarely are), the model underperforms significantly, as we will also see in our own results. Another big problem is that if a class that is present in the test set has not appeared in the training set, the model assigns that class zero probability.
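For reference, a Gaussian Naïve Bayes classifier of the kind described above can be trained in a few lines. The sketch below uses scikit-learn's GaussianNB on the Iris dataset purely as an illustrative multiclass example; it is not the dataset or setup used in our own experiments:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Illustrative multiclass dataset; our own data and splits differ.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = GaussianNB()          # per-class Gaussian likelihoods as in Equation 5
model.fit(X_train, y_train)   # estimates mean and variance per feature per class
pred = model.predict(X_test)
print(accuracy_score(y_test, pred))
```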
3.5. Multi-Layer Perceptron
The multilayer perceptron (MLP) is a fully connected Artificial Neural Network (ANN). It feeds the input from the input layer to the hidden layer by taking the dot product of the input values with the weight parameters that sit on the interconnections between the nodes of the ANN. Once weighted, the inputs are summed at the entrance of each node of the next layer, where the resulting value passes through an activation function (e.g., sigmoid, ReLU, tanh).
Figure 5: the multilayer perceptron, a fully connected feedforward ANN
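To make the forward pass concrete, the sketch below (Python with NumPy; the layer sizes, random weights, and the ReLU choice are illustrative assumptions, not the network used in this work) propagates one input through a single hidden layer: a dot product with the weight matrix, a summation with the bias, and an activation function:

```python
import numpy as np

def relu(z):
    # One of the activation choices mentioned above (sigmoid, ReLU, tanh).
    return np.maximum(0.0, z)

# Illustrative shapes: 3 input features, 4 hidden nodes, 2 output classes.
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 4))   # weights on the input-to-hidden connections
b_hidden = np.zeros(4)
W_out = rng.normal(size=(4, 2))      # weights on the hidden-to-output connections
b_out = np.zeros(2)

x = np.array([0.5, -1.2, 3.0])       # a single input example

hidden = relu(x @ W_hidden + b_hidden)   # weighted sum, then activation
output = hidden @ W_out + b_out          # output-layer scores
print(output)
```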