Nonlinear Multilayer Backpropagation Networks

Why?
  1. Linear approximation networks are too restrictive and nonlinear approximation networks offer much greater capacity.
  2. In order to enhance the approximation capabilities, it is critical to expand a single-layer structure to a multilayer network.


The number of layers of a multilayer network is defined as the number of weight layers, not the number of neuron layers. For example, consider the network depicted in the next figure:

This network is called a three-layer network because it has three weight layers. In linear systems, there is no real benefit to cascading multiple layers of linear networks: the equivalent weight matrix of the total system is simply the product of the weight matrices of the individual layers.
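The collapse of cascaded linear layers into a single weight matrix can be checked numerically. The following is a minimal sketch in plain Python; the matrices W1, W2, W3 and the input x are made-up values chosen only for illustration.

```python
def matvec(w, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Hypothetical weights of a three-layer *linear* network: 2 -> 3 -> 2 -> 1 units
W1 = [[1.0, 2.0], [0.0, -1.0], [0.5, 0.5]]
W2 = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.0]]
W3 = [[2.0, -1.0]]

x = [3.0, -2.0]

# Pass the input through the three layers one after another
layered = matvec(W3, matvec(W2, matvec(W1, x)))

# Collapse the three layers into one equivalent weight matrix
W_equiv = matmul(W3, matmul(W2, W1))
collapsed = matvec(W_equiv, x)

print(layered, collapsed)
```

Both routes give the same output, so the extra linear layers add no representational power.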

The situation is quite different if nonlinear hidden neuron units are inserted between the input and the output layers. In this case, it seems natural to assume that the more layers are used, the greater the power the network possesses. However, this is not the case in practice. An excessive number of layers often proves counterproductive, and it may cause slower convergence in backpropagation learning. Two possible reasons are that the error signals may be numerically degraded when propagating across too many layers, and that extra layers tend to create additional local minima. Thus, it is essential to identify the proper number of layers. Generally speaking, a two-layer network should be adequate as a universal approximator of any nonlinear function. It has been further demonstrated that a three-layer network suffices to separate any (convex or nonconvex) polyhedral decision region from its background. In summary, two or three layers should be adequate for most applications.
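One way to picture the numerical degradation of error signals is through the slope of the logistic sigmoid, which never exceeds 0.25. The sketch below is only an illustration of that effect and not necessarily the exact mechanism the text has in mind. It assumes a chain of sigmoid units with unit weights, all sitting at the same net input u; the helper name attenuation is hypothetical.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def sigmoid_prime(u):
    """Slope of the logistic sigmoid; its maximum value 0.25 occurs at u = 0."""
    s = sigmoid(u)
    return s * (1.0 - s)

def attenuation(num_layers, u=0.0):
    """Factor by which a back-propagated signal is scaled after passing
    through num_layers sigmoid units (assumption: unit weights, same u)."""
    return sigmoid_prime(u) ** num_layers

for depth in (1, 2, 3, 5, 10):
    print(depth, attenuation(depth))
```

Even in this best case (u = 0), the scaling factor shrinks geometrically with depth, which is consistent with the recommendation to keep the number of layers small.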

Backpropagation Algorithm

The backpropagation algorithm offers an effective approach to the computation of the gradients. It can be applied to any optimization formulation, including the DBNN formulation.

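A linear basis function (LBF) multilayer network is characterized by the following dynamic equations:

    $$u_i(l) = \sum_{j=1}^{N_{l-1}} w_{ij}(l)\, a_j(l-1) + \theta_i(l)$$
    $$a_i(l) = f(u_i(l)), \qquad 1 \le i \le N_l, \quad 1 \le l \le L$$

where the input units are represented by $x_i \equiv a_i(0)$, the output units by $y_i \equiv a_i(L)$, and where L is the number of layers. Here $N_l$ is the number of units in layer l, $w_{ij}(l)$ and $\theta_i(l)$ are the weights and thresholds of layer l, and $u_i(l)$ and $a_i(l)$ are the net input and the activation of unit i in layer l. The activation function f is very often a sigmoid function. Other dynamic equations are also of interest. For example, it is very popular to use a Gaussian activation function on radial basis functions (RBF).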
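The forward dynamics of an LBF network with sigmoid activations can be sketched in a few lines of plain Python. The function names, the nested-list weight layout, and the numeric values are assumptions made for illustration. The Gaussian unit uses the common form exp(-||x - c||^2 / (2 sigma^2)), which the text does not spell out.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def forward(weights, thresholds, x, f=sigmoid):
    """Return the activations of layers 0..L for input x.

    weights[l-1][i][j] plays the role of w_ij(l), and
    thresholds[l-1][i] the role of theta_i(l)."""
    activations = [list(x)]                      # layer 0 holds the inputs
    for W, theta in zip(weights, thresholds):
        prev = activations[-1]
        # linear basis function: weighted sum of the previous layer plus threshold
        u = [sum(w * a for w, a in zip(row, prev)) + t
             for row, t in zip(W, theta)]
        activations.append([f(ui) for ui in u])
    return activations

def gaussian_rbf(x, center, sigma=1.0):
    """A single Gaussian RBF unit (assumed form, for contrast with the LBF unit)."""
    dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist2 / (2.0 * sigma ** 2))

# Hypothetical two-layer network: 2 inputs -> 2 hidden units -> 1 output
weights = [[[1.0, -1.0], [0.5, 0.5]], [[1.0, -2.0]]]
thresholds = [[0.0, 0.0], [0.0]]
acts = forward(weights, thresholds, [1.0, 1.0])
print(acts[-1])
```

The list acts holds one activation vector per neuron layer, so a network with L weight layers yields L + 1 entries.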

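The objective of this algorithm is to train the weights $w_{ij}(l)$ so as to minimize E. The basic gradient-type learning formula is

    $$w_{ij}^{(m+1)}(l) = w_{ij}^{(m)}(l) + \Delta w_{ij}^{(m)}(l), \qquad \Delta w_{ij}^{(m)}(l) = -\eta\, \frac{\partial E}{\partial w_{ij}^{(m)}(l)}$$

with the m-th training pattern, $a^{(m)}(0)$, and its corresponding teacher, $t^{(m)}$, m = 1, 2, ..., M, presented, and with $\eta$ the learning rate. The derivation of the BP algorithm follows the chain rule:

    $$\frac{\partial E}{\partial w_{ij}(l)} = \frac{\partial E}{\partial a_i(l)}\, \frac{\partial a_i(l)}{\partial w_{ij}(l)} = -\delta_i(l)\, f'(u_i(l))\, a_j(l-1)$$

where $u_i(l)$ is the net input of unit i in layer l, $a_i(l) = f(u_i(l))$ is its activation, and the error signal $\delta_i(l)$ is defined as

    $$\delta_i(l) \equiv -\frac{\partial E}{\partial a_i(l)}$$

Backpropagation Rule for Approximation-Based Networks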

The aforementioned algorithm can be applied to training approximation-based networks. In this case, the objective is to train the weights $w_{ij}(l)$ and the thresholds $\theta_i(l)$ so as to minimize the least-squares error between the teacher and the actual response. That is,

    $$E = \frac{1}{2} \sum_{m=1}^{M} \sum_{i=1}^{N} \left[ t_i^{(m)} - a_i^{(m)}(L) \right]^2$$

where M is the number of training patterns and N is the dimension of the output space. The backpropagation algorithm can be summarized in two steps:

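  1. The error signal $\delta_i(l)$ can be obtained recursively by backpropagation:

     $$\delta_i(l) = \sum_{j=1}^{N_{l+1}} \delta_j(l+1)\, f'(u_j(l+1))\, w_{ji}(l+1), \qquad l = L-1, \ldots, 1$$

     with the initial condition $\delta_i(L) = t_i - a_i(L)$ at the output layer. Here f' is the derivative of the activation function and $u_j(l+1)$ is the net input of unit j in layer l+1.

  2. Based on these equations, the synaptic weights between the lth and (l-1)th layers can be updated recursively (in the order l = L, L-1, ..., 1):

     $$w_{ij}(l) \leftarrow w_{ij}(l) + \eta\, \delta_i(l)\, f'(u_i(l))\, a_j(l-1)$$
     $$\theta_i(l) \leftarrow \theta_i(l) + \eta\, \delta_i(l)\, f'(u_i(l))$$

The recursive formula is the key to backpropagation learning. It allows the error signal of a lower layer, $\delta_i(l)$, to be computed as a linear combination of the error signals of the upper layer, $\delta_j(l+1)$. In this manner, the error signals are back-propagated through all the layers, from the top layer down to the bottom. This also implies that the influence of an upper layer on a lower layer (and vice versa) can only be exerted via the error signals of the intermediate layers.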
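The two steps can be sketched as a small training loop in plain Python. This is a minimal illustration under stated assumptions, not a definitive implementation: it assumes logistic sigmoid activations (so that f'(u) = a(1 - a)), a least-squares error with a factor of 1/2, and per-pattern updates. The helper names, the XOR training set, the network size, the learning rate, and the number of epochs are all made up for the example.

```python
import math
import random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def forward(W, theta, x):
    """Return the activations a(0..L); W[l] and theta[l] belong to layer l+1."""
    a = [list(x)]
    for Wl, tl in zip(W, theta):
        a.append([sigmoid(sum(w * v for w, v in zip(row, a[-1])) + t)
                  for row, t in zip(Wl, tl)])
    return a

def loss(W, theta, patterns):
    """Least-squares error, with the assumed factor of 1/2."""
    return sum(0.5 * sum((ti - yi) ** 2 for ti, yi in zip(t, forward(W, theta, x)[-1]))
               for x, t in patterns)

def backprop(W, theta, x, t):
    """Return g[l][i] = delta_i * f'(u_i) for every weight layer, plus the activations."""
    a = forward(W, theta, x)
    delta = [ti - yi for ti, yi in zip(t, a[-1])]    # error signal at the output layer
    g = [None] * len(W)
    for l in range(len(W) - 1, -1, -1):              # step 1: from the top layer down
        # f'(u) = a * (1 - a) for the logistic sigmoid
        g[l] = [d * ai * (1.0 - ai) for d, ai in zip(delta, a[l + 1])]
        # error signal of the layer below: a linear combination of the one above
        delta = [sum(g[l][j] * W[l][j][i] for j in range(len(W[l])))
                 for i in range(len(a[l]))]
    return g, a

def train(W, theta, patterns, eta=0.5, epochs=2000):
    for _ in range(epochs):
        for x, t in patterns:
            g, a = backprop(W, theta, x, t)
            for l in range(len(W)):                  # step 2: weight and threshold updates
                for i in range(len(W[l])):
                    theta[l][i] += eta * g[l][i]
                    for j in range(len(W[l][i])):
                        W[l][i][j] += eta * g[l][i] * a[l][j]

def init(sizes, rng):
    """Random initial weights and thresholds in [-1, 1] for the given layer sizes."""
    W = [[[rng.uniform(-1, 1) for _ in range(sizes[l])] for _ in range(sizes[l + 1])]
         for l in range(len(sizes) - 1)]
    theta = [[rng.uniform(-1, 1) for _ in range(n)] for n in sizes[1:]]
    return W, theta

xor = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
       ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
W, theta = init([2, 3, 1], random.Random(0))
loss_before = loss(W, theta, xor)
train(W, theta, xor)
loss_after = loss(W, theta, xor)
print(loss_before, loss_after)
```

Because the updates are applied after each pattern, this is the on-line variant of gradient descent. Accumulating the updates over all M patterns before applying them would give the batch variant.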
     
     

