
2 A non-superpositional quantum neurocomputer

Hopfield [Hopfield, 1982] popularized the idea that any physical dynamical system could be constrained to serve as a neural network, with fixed points of the system acting as ``memories'' that can be recalled associatively, in a content-addressable manner. However, the learning algorithms available for the attractor-based networks on which Hopfield focused are not as powerful as those for feed-forward networks, to which Hopfield's observation also applies. For this reason, and because the feed-forward back-propagation network is the one most likely to be familiar to a general audience, the example here will be based on a feed-forward network architecture. The principles and insights, however, generalize to other kinds of neural network.

Similarly, many different quantum situations could be used to implement a learning algorithm; I will stick to the situation of a barrier with slits in front of a photo-sensitive plate, because of its familiarity. It is very likely that such an arrangement would prove impractical for obtaining some of the computational advantages (especially those of speed and size) of quantum computing, but since such advantages are relatively obvious, and are not the advantages on which I will be concentrating, this impracticality need not concern us much.

The details of feed-forward networks involve units, activation values, layers, and weights. But what is important at first is that such a network computes a mapping from input vectors to output vectors, that this mapping is parameterized by adjustable weights, and that learning consists of adjusting those weights so as to reduce an error measure.

Figure 1: A feed-forward quantum neurocomputer. See text for key.

A quantum implementation of this kind of network is illustrated in figure 1. A particle beam G is set up facing a barrier B, behind which is a photo-sensitive plate P. The barrier has several slits in it, which are responsible for the famous interference patterns on the plate. Some of the slits are designated to be input slits, and the rest are the weight slits. The interference pattern on the plate P that results from the beam G being aimed at the barrier B with a particular input slit configuration is the output of the system for that input. Thus, use of the system to perform some computation will require two mappings: one (I) from inputs (e.g., character strings, images, queries, etc.) to input slit configurations, and one (O) from interference patterns to outputs (e.g., classifications, stored data, etc.).

Assume that one already has a mapping from inputs to n-vectors, and a mapping from m-vectors to outputs. One could map these input n-vectors to input slit configurations by, for example, dividing the barrier into n sections, each having an input slit, with the relative position of the ith slit within the ith section of the barrier indicating the value of the ith coordinate of the input vector (this is the type of arrangement depicted in figure 1). For binary input vectors, the mapping could be even simpler: there is a slit in the ith section of the barrier if and only if the ith coordinate of the input vector is ``1''.
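
For concreteness, the mapping I might be realized as follows (a minimal Python sketch; the section width and the function names are merely illustrative, not part of the proposal):

    def input_to_slits(x, section_width=1.0):
        """Map an input n-vector x (coordinates in [0, 1]) to input slit
        positions: the ith slit sits within the ith section of the
        barrier, offset in proportion to the ith coordinate of x."""
        return [(i + xi) * section_width for i, xi in enumerate(x)]

    def binary_input_to_slits(x, section_width=1.0):
        """Binary variant: a slit is present (here, centred) in the ith
        section if and only if x[i] is 1; None marks an absent slit."""
        return [(i + 0.5) * section_width if xi else None
                for i, xi in enumerate(x)]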

Less straightforward is the mapping O from interference patterns to outputs. Since interference patterns are of high dimensionality, the mapping must reduce dimension, but almost any dimension-reducing mapping will do. Perhaps the plate could be divided into m sections, with the average intensity within the jth section serving as the jth coordinate of the output vector. For binary output vectors, a soft (sigmoid) thresholding of this value would do (the thresholding must be soft so as to allow differentiation for back-propagation learning; see below).
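
A corresponding sketch of the mapping O, with the number of plate sections m, the sigmoid gain, and the threshold all chosen arbitrarily for illustration:

    import math

    def pattern_to_output(intensities, m, binary=False, gain=1.0, threshold=0.5):
        """Map a recorded interference pattern (a list of intensity
        samples along the plate) to an m-vector: divide the plate into
        m sections, average the intensity within each, and optionally
        apply a soft (sigmoid) threshold for binary outputs."""
        n = len(intensities)
        assert n >= m, "need at least one intensity sample per section"
        out = []
        for j in range(m):
            section = intensities[j * n // m : (j + 1) * n // m]
            avg = sum(section) / len(section)
            if binary:
                avg = 1.0 / (1.0 + math.exp(-gain * (avg - threshold)))
            out.append(avg)
        return out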

The error of the system, E, is defined to be some function of the desired ($\vec{d}$) and actual ($\vec{o}$) output vectors, typically $E = \sum_j (d_j - o_j)^2$. If $F$ is the function that yields an interference pattern $p$ given an input $x$ and a weight slit configuration $w$, then $E = E(\vec{d}, O(F(I(x), w)))$; for a fixed training pair, the error is thus a function of the weight slit configuration alone.
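
Schematically, with F standing in for the physical step (the beam, barrier, and plate, which of course is not itself computed classically), the error as a function of the weight slit configuration might be written as follows, re-using the I and O sketches above (all names here are assumptions made for illustration):

    def error(desired, actual):
        """Sum-of-squares error between desired and actual output vectors."""
        return sum((d - a) ** 2 for d, a in zip(desired, actual))

    def system_error(x, desired, w, F, I, O):
        """E as a function of the weight slit configuration w:
        E = error(desired, O(F(I(x), w))), where F stands in for the
        physical apparatus and I, O are the mappings described above."""
        pattern = F(I(x), w)          # physical step: beam, barrier, plate
        actual = O(pattern, len(desired))
        return error(desired, actual)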

Given some such setup, the system could learn an associative mapping f in the following way: for each training pair, set the input slits according to I, fire the beam, read the interference pattern off the plate, map it through O to obtain the actual output, compare this with the desired output to obtain the error E, and adjust the positions of the weight slits in the direction that reduces E; a sketch of this procedure is given below.
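
A hedged sketch of this training loop, using the system_error function above and a finite-difference estimate of the gradient in place of an analytic derivative (the learning rate, perturbation size, and number of passes are arbitrary assumptions):

    def train(pairs, w, F, I, O, epochs=10, lr=0.1, eps=1e-3):
        """Adjust the weight slit positions w to reduce the error E over
        a training set of (input, desired-output) pairs.  The gradient
        of E with respect to each weight slit is estimated by perturbing
        that slit and re-running the apparatus: a finite-difference
        stand-in for analytic back-propagation."""
        for _ in range(epochs):
            for x, desired in pairs:
                base = system_error(x, desired, w, F, I, O)
                grad = []
                for k in range(len(w)):
                    w_pert = list(w)
                    w_pert[k] += eps
                    grad.append(
                        (system_error(x, desired, w_pert, F, I, O) - base) / eps)
                w = [wk - lr * gk for wk, gk in zip(w, grad)]
        return w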

After several passes through the training set, this procedure will ensure that the system settles into a weight configuration that produces minimal error on the training set. In many cases, this will also result in good performance on other samples drawn from the same source as the training set (i.e., generalization).

This is enough to establish a correspondence between neural nets and a quantum system. The correspondence can then be used to suggest how the many variations on connectionist learning (e.g., recurrency, competitive learning, etc.) also could be implemented in a quantum system.



Ron Chrisley
Wed Nov 20 01:10:59 GMT 1996