The objective of the learning phase is to determine the best discriminant
functions, which in turn dictate the decision boundaries. The linear perceptron
was designed to separate two classes by a linear decision boundary, and
it has since evolved into a good number of more sophisticated variants.
We will distinguish between the Linear Perceptron
for Binary Classification and the Linear Perceptron
for Multiple Classification.
Linear Perceptron for Binary Classification
The basic structure of a linear perceptron is shown in
this figure, with a linear discriminant function
$f(\mathbf{w}, \mathbf{x}) = \mathbf{w}^T \mathbf{x} + \theta$,
where $\theta$ is the threshold value.
For convenience, we can regard the threshold value
just as an additional weight parameter. Denote $\mathbf{z} = [\mathbf{x}^T, 1]^T$,
that is, $\mathbf{z}$ is the augmented pattern $\mathbf{x}$, and absorb $\theta$
into the weight vector as $\mathbf{w} = [w_1, \ldots, w_n, \theta]^T$.
Now the linear discriminant function can be rewritten as
$f(\mathbf{w}, \mathbf{z}) = \mathbf{w}^T \mathbf{z}.$
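As a tiny numerical check of the augmented notation (the names and values here are our own illustration, not from the source), the rewritten discriminant is a single dot product:

```python
import numpy as np

x = np.array([0.5, -1.2])        # original pattern
w = np.array([1.0, 2.0])         # original weights
theta = 0.3                      # threshold value

z = np.append(x, 1.0)            # augmented pattern z = [x^T, 1]^T
w_aug = np.append(w, theta)      # threshold absorbed as an extra weight

assert np.isclose(w @ x + theta, w_aug @ z)   # f(w, x) == w_aug . z
```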
Recall that the decision value $d$ is binary, that is,
$d = 1$ if $f(\mathbf{w}, \mathbf{z}) > 0$, and $d = 0$ otherwise.
A pattern is classified to $\mathcal{C}_1$ when $d = 1$;
otherwise, it belongs to $\mathcal{C}_2$. The teacher determines
whether the pattern is correctly classified. When and only when a misclassification
occurs, the network will be adjusted.
Upon the presentation of the $m$th training pattern $\mathbf{z}^{(m)}$,
the weight vector is updated as
$\mathbf{w}^{(m+1)} = \mathbf{w}^{(m)} + \eta \, (t^{(m)} - d^{(m)}) \, \mathbf{z}^{(m)},$
where $\eta$ is a positive learning rate and $t^{(m)}$ denotes the teacher value for the pattern.
More precisely, the above learning rule can be viewed from two perspectives:
when a pattern from $\mathcal{C}_1$ is misclassified ($t^{(m)} = 1$, $d^{(m)} = 0$),
the weight vector is reinforced by $+\eta \, \mathbf{z}^{(m)}$; when a pattern from
$\mathcal{C}_2$ is misclassified ($t^{(m)} = 0$, $d^{(m)} = 1$), it is antireinforced
by $-\eta \, \mathbf{z}^{(m)}$.
The training will take as many sweeps
as required; in each sweep, all the $M$
training patterns are presented. At the end of each sweep, the initial
weights for the next sweep are set to
$\mathbf{w}^{(1)} = \mathbf{w}^{(M+1)}$
before the next sweep is started. If
there is no misclassification over one entire sweep,
then no learning occurs in that sweep
and the training process should be terminated.
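To make the sweep-based procedure concrete, here is a minimal Python sketch under the conventions above; the function name train_perceptron, the 0/1 teacher encoding, and the optional initial weight vector w0 are illustrative assumptions, not from the source.

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, max_sweeps=1000, w0=None):
    """Binary linear perceptron trained by repeated sweeps.

    X : (M, n) array of patterns; t : (M,) teacher values in {0, 1}.
    Returns the augmented weight vector (threshold as the last entry)
    and the number of sweeps actually used.
    """
    M, n = X.shape
    Z = np.hstack([X, np.ones((M, 1))])   # augmented patterns z = [x^T, 1]^T
    w = np.zeros(n + 1) if w0 is None else np.asarray(w0, dtype=float).copy()

    for sweep in range(1, max_sweeps + 1):
        errors = 0
        for m in range(M):
            d = 1 if w @ Z[m] > 0 else 0  # binary decision value
            if d != t[m]:                 # adjust only on a misclassification
                w += eta * (t[m] - d) * Z[m]
                errors += 1
        if errors == 0:                   # clean sweep: terminate training
            return w, sweep
    return w, max_sweeps
```

The weights are carried over from sweep to sweep, and the loop stops as soon as a sweep produces no misclassification, exactly as described above.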
Constant Learning Rate
The convergence speed of a constant-rate perceptron varies greatly,
depending on the choice of the learning rate. If the rate is too small, convergence will
be very slow. On the other hand, if it is too large, it can cause numerical
problems. The convergence speed does not depend on how large the region
of feasible solutions in the w-space is.
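To observe the rate dependence empirically, one can rerun the sketch above with different values of $\eta$ starting from a fixed nonzero initial weight vector (with all-zero initial weights, a constant rate would merely rescale $\mathbf{w}$ and leave every decision unchanged). The data below are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two linearly separable clusters (synthetic, for illustration only).
X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
               rng.normal(+2.0, 0.5, size=(50, 2))])
t = np.array([0] * 50 + [1] * 50)

for eta in (0.001, 0.1, 10.0):
    # train_perceptron is the sketch from the previous section
    w, sweeps = train_perceptron(X, t, eta=eta, w0=-np.ones(3))
    print(f"eta = {eta:g}: converged after {sweeps} sweep(s)")
```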
Linear Perceptron for Multiple Classification
The basic perceptron can be extended to the problem of classifying multiple
(e.g., L) classes. For this purpose, the following important features are
incorporated into the general DBNN:
One subnet is designated for one class, that is, an OCON
structure. See this figure.
The linear discriminant functions for the subnets are denoted as
$f(\mathbf{z}, \mathbf{w}_i) = \mathbf{w}_i^T \mathbf{z}$,
for $i = 1, \ldots, L$. The discriminant function provides the score
for each subnet (or each class).
A MAXNET is used to
select the subnet (or class) with the winning score.
The output is usually a symbol labeling the winner of the subnets. See
this figure.
The following mutual training scheme can be used. The output
symbol will be compared with the teacher symbol. If the two symbols match,
then the network will be left alone until a future training pattern is
presented. If the two symbols mismatch, then the weights will be updated by the
reinforced and antireinforced learning rules.
Suppose that $\{\mathbf{z}^{(1)}, \ldots, \mathbf{z}^{(M)}\}$ is a set of given
training patterns, with each element
belonging to one of the $L$ classes $\{\mathcal{C}_1, \ldots, \mathcal{C}_L\}$,
and that the discriminant functions are
$f(\mathbf{z}, \mathbf{w}_i) = \mathbf{w}_i^T \mathbf{z}$ for $i = 1, \ldots, L$. Suppose that the pattern
$\mathbf{z}^{(m)}$ presented is known to belong to class $\mathcal{C}_i$,
and that the winning class for the pattern is denoted by an integer $j$,
that is,
$f(\mathbf{z}^{(m)}, \mathbf{w}_j) \ge f(\mathbf{z}^{(m)}, \mathbf{w}_k)$ for all $k$.
When $j = i$, the pattern is
already correctly classified, so no update will be needed.
When $j \ne i$, that is, $\mathbf{z}^{(m)}$
is still misclassified, then the following update will be performed:
$\mathbf{w}_i^{(m+1)} = \mathbf{w}_i^{(m)} + \eta \, \mathbf{z}^{(m)}$ (reinforced learning),
$\mathbf{w}_j^{(m+1)} = \mathbf{w}_j^{(m)} - \eta \, \mathbf{z}^{(m)}$ (antireinforced learning).
The other weights remain unchanged:
$\mathbf{w}_k^{(m+1)} = \mathbf{w}_k^{(m)}$ for all $k \ne i$ and $k \ne j$.
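The scheme above translates into a short training loop. The following Python sketch is a minimal illustration under the stated rules; the function name train_dbnn_linear and its interface are assumptions, and the MAXNET stage is modeled simply as an argmax over the subnet scores.

```python
import numpy as np

def train_dbnn_linear(Z, labels, L, eta=0.1, max_sweeps=1000):
    """Multiclass linear perceptron with reinforced/antireinforced learning.

    Z : (M, n) array of already-augmented training patterns.
    labels : (M,) integer class indices in {0, ..., L-1}.
    Returns an (L, n) matrix W whose rows are the subnet weight vectors.
    """
    M, n = Z.shape
    W = np.zeros((L, n))                 # one weight vector per subnet

    for _ in range(max_sweeps):
        mismatches = 0
        for m in range(M):
            i = labels[m]                # true class of the pattern
            scores = W @ Z[m]            # subnet scores f(z, w_k)
            j = int(np.argmax(scores))   # MAXNET: subnet with winning score
            if j != i:                   # mismatch between output and teacher
                W[i] += eta * Z[m]       # reinforced learning for class i
                W[j] -= eta * Z[m]       # antireinforced learning for class j
                mismatches += 1
        if mismatches == 0:              # every pattern correctly classified
            break
    return W
```

Note that on each mismatch only the rows for the true class $i$ and the winning class $j$ are touched; all other subnet weights stay unchanged, matching the rule above.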