In this example, a Bayesian Neural Network, which uses the Bayesian Regularization training algorithm, is compared with a standard Neural Network trained with the Scaled Conjugate Gradient algorithm on a curve-fitting task (fitting a noisy sine wave). The example demonstrates how the generalization property of the Bayesian Neural Network avoids the over-fitting issue.


Note that the Bayesian Neural Network is a normal Neural Network (also known as a multi-layer feed-forward Neural Network) trained with the Bayesian Regularization algorithm.

 

Load data

The training data-set is a collection of noisy sine-wave samples with 1 input and 1 output.
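A data-set of this shape can be sketched in Python as follows; the sample count, input range, and noise level are assumptions for illustration, since the example does not state them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy sine-wave training set: 1 input column, 1 output column.
# n_samples and the noise scale are assumed values, not taken from the example.
n_samples = 100
x = np.linspace(-np.pi, np.pi, n_samples).reshape(-1, 1)   # 1 input
y = np.sin(x) + rng.normal(scale=0.1, size=x.shape)        # 1 noisy output

print(x.shape, y.shape)  # (100, 1) (100, 1)
```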


Figure 4.2: Noisy sine wave training set

Configure Neural Network


Figure 4.3: Configure a Neural Network using Bayesian Regularization (left) and Scaled Conjugate Gradient (right) training algorithms.


Both Neural Networks are configured with the same structure (1 input node, 6 hidden nodes, and 1 output node). The activation function for the hidden layer is "Tansig", and "Purelin" is the activation function for the output layer. The cost function is "Mean Squared Error", and the training data ratio is 75%. "Min Max" with a [-1; 1] range is used for pre-processing, and "Min Max" with a [0; 1] range is used for post-processing. The Bayesian Neural Network (on the left of Figure 4.3) uses the Bayesian Regularization training algorithm, while the multi-layer feed-forward Neural Network (on the right of Figure 4.3) uses the Scaled Conjugate Gradient training algorithm.
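The configuration above can be sketched in plain Python/NumPy; this is a minimal illustration of the 1-6-1 structure, the "Tansig"/"Purelin" activations, the "Mean Squared Error" cost, and the "Min Max" scaling, not the tool's actual implementation (the random weight initialization is an assumption):

```python
import numpy as np

def tansig(z):
    # "Tansig": hyperbolic-tangent activation for the hidden layer
    return np.tanh(z)

def purelin(z):
    # "Purelin": identity (linear) activation for the output layer
    return z

def minmax_scale(v, lo, hi):
    # "Min Max" scaling: map each column of v onto the range [lo, hi]
    vmin, vmax = v.min(axis=0), v.max(axis=0)
    return lo + (hi - lo) * (v - vmin) / (vmax - vmin)

rng = np.random.default_rng(0)
# 1 input node -> 6 hidden nodes -> 1 output node
W1, b1 = rng.standard_normal((1, 6)), np.zeros(6)
W2, b2 = rng.standard_normal((6, 1)), np.zeros(1)

def forward(x):
    return purelin(tansig(x @ W1 + b1) @ W2 + b2)

def mse(y_pred, y_true):
    # "Mean Squared Error" cost function
    return np.mean((y_pred - y_true) ** 2)

# Inputs pre-processed into the [-1; 1] range before the forward pass
x = minmax_scale(np.linspace(0, 10, 50).reshape(-1, 1), -1.0, 1.0)
print(forward(x).shape)  # (50, 1)
```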


Train Neural Network


Figure 4.4: Train the Neural Networks using the Bayesian Regularization (on the left) and Scaled Conjugate Gradient (on the right) training algorithms.


The Bayesian Neural Network (on the left of Figure 4.4) does not need the early stopping technique to avoid the over-fitting issue, while the multi-layer feed-forward Neural Network (on the right of Figure 4.4) does require early stopping to avoid over-fitting.
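The early-stopping rule the Scaled Conjugate Gradient network relies on can be sketched as follows (the function names and the patience value are assumptions for illustration); Bayesian Regularization avoids the need for it by penalizing large weights in the cost function instead of monitoring a validation set:

```python
def train_with_early_stopping(step, val_loss, max_epochs=1000, patience=3):
    """step() runs one training epoch; val_loss() returns validation MSE.

    Stops once the validation loss has not improved for `patience` epochs,
    halting training before over-fitting sets in.
    """
    best, wait = float("inf"), 0
    for epoch in range(max_epochs):
        step()
        loss = val_loss()
        if loss < best:
            best, wait = loss, 0       # new best: reset the patience counter
        else:
            wait += 1
            if wait >= patience:       # validation error stopped improving
                return epoch, best
    return max_epochs - 1, best

# Toy run: synthetic validation losses that bottom out, then creep upward.
losses = iter([1.0, 0.5, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45])
epoch, best = train_with_early_stopping(lambda: None, lambda: next(losses),
                                        max_epochs=8, patience=3)
print(epoch, best)  # stops at epoch 5 with best validation loss 0.4
```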


Evaluate Neural Network

Figure 4.5: Regression curve of the Bayesian Neural Network (on the left) and that of the feed-forward Neural Network (on the right).


As shown in Figure 4.5, a better regression is obtained with the Bayesian Neural Network (on the left of Figure 4.5).
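Regression plots like those in Figure 4.5 are typically summarized by the correlation coefficient R between the network outputs and the targets (R close to 1 means a good fit). A minimal sketch of that statistic, with toy data assumed for illustration:

```python
import numpy as np

def regression_r(targets, outputs):
    # Pearson correlation R between targets and network outputs,
    # the statistic usually reported on a regression plot.
    t, o = targets.ravel(), outputs.ravel()
    return np.corrcoef(t, o)[0, 1]

# Toy check: a near-perfect fit gives R close to 1.
t = np.linspace(-1.0, 1.0, 50)
o = t + 0.01 * np.sin(5 * t)   # outputs with a small systematic error
print(regression_r(t, o))
```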


Test Neural Network

Figure 4.6: Fit plot of the new sine-wave data tested by the trained Bayesian Neural Network (on the left) and the trained multi-layer feed-forward Neural Network (on the right).


Although the early stopping technique is applied to the multi-layer feed-forward Neural Network (on the right of Figure 4.6), it fails to predict the sine wave on the new data-set due to poor generalization. On the other hand, the trained Bayesian Neural Network (on the left of Figure 4.6) demonstrates its generalization property through an excellent prediction result.


Conclusion 

Although the Neural Network trained with Scaled Conjugate Gradient shows poor generalization in this example, other training algorithms such as Levenberg-Marquardt might generalize better when combined with the early stopping technique. However, the generalization property of the Bayesian Neural Network remains attractive, and it is well suited to function-fitting and classification applications where the number of training samples in the data-set is small.