R. Chrisley, E. McDermott, S. Katagiri
A discrepancy is noted between the error measure implied by the standard objective functions used to train back-propagation networks and the actual performance error of such networks. Specifically, if such a network is used for pattern classification, with one output node per class and the most active output node indicating the network's classification of the input, then standard objective functions will (1) ascribe non-zero error to network states that classify correctly, and (2) modify the network more than is necessary to account for incorrectly classified input, thus violating the "minimal disturbance principle." It is hypothesized that objective functions lacking these two characteristics will more closely reflect the actual recognition error, and that their use in learning will therefore yield better performance (i.e., fewer classification errors). Several such functions are presented, and a few are benchmarked against standard error functions on phoneme recognition tasks. Two of the methods show a consistent improvement in performance on a small (BDG) task, but result in worse performance on a large (all-consonants) task.
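The discrepancy described above can be illustrated with a minimal sketch (not taken from the paper; the function names are hypothetical): a squared-error objective against one-hot targets assigns non-zero error to an output pattern even when the most active node is the correct one, whereas a zero-one classification measure does not.

```python
import numpy as np

def mse_error(outputs, target_index):
    """Squared error against a one-hot target vector.

    Non-zero even when the network classifies correctly, since the
    outputs need not reach the target values of exactly 0 and 1.
    """
    targets = np.zeros_like(outputs)
    targets[target_index] = 1.0
    return float(np.sum((outputs - targets) ** 2))

def classification_error(outputs, target_index):
    """Zero-one measure: zero whenever the correct node is most active."""
    return 0.0 if int(np.argmax(outputs)) == target_index else 1.0

# A correctly classified pattern: node 0 is most active, but not exactly 1.0.
outputs = np.array([0.7, 0.2, 0.1])
print(mse_error(outputs, 0))             # 0.14 -- non-zero despite correct classification
print(classification_error(outputs, 0))  # 0.0
```

Gradient descent on the first measure would keep adjusting the weights for this pattern even though the classification is already correct, which is the behavior the proposed objective functions are designed to avoid.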
This paper is not available online.