Why The Turing Test is AI's Biggest Blind Alley

Blay Whitby 1997
 
 
 

Alan Turing's 1950 paper, _Computing Machinery and Intelligence_ (Turing 1950), and the Turing test suggested in it are rightly seen as inspirational to the inception and development of AI. However, inspiration can soon become distraction in science, and it is not too early to begin to consider whether or not the Turing test is just such a distraction. This chapter argues that it is. AI has had an intimate relationship with the Turing test throughout its brief history. The view of this relationship presented in this paper is that it has developed more or less as follows:

1950 - 1966: A source of inspiration to all concerned with AI.

1966 - 1973: A distraction from some more promising avenues of AI research.

1973 - 1990: By now a source of distraction mainly to philosophers, rather than AI workers.

1990: Consigned to history. 1
 
 

One conclusion that is implied by this view of the history of AI and Turing's 1950 paper is that for most of the period since its publication it has been a distraction. While not detracting from the brilliance of the paper and its central role in the philosophy of AI, it can be argued that Turing's 1950 paper, or perhaps some strong interpretations of it, has, on occasion, hindered both the practical development of AI and the philosophical work necessary to facilitate that development.
Thus one can make the claim that, in an important philosophical sense, Computing Machinery and Intelligence has led AI into a blind alley from which it is only just beginning to extract itself. It is also an implication of the title of this chapter that the Turing test is not the only blind alley in the progress of AI. Although this chapter makes no examination of this implication, it is one that I am happy to accept.
One main source of this distraction has been the common, yet mistaken, reading of Computing Machinery and Intelligence as somehow showing that one can attempt to build an intelligent machine without a prior understanding of the nature of intelligence. If we can, by whatever means, build a computer-based system that deceives a human interrogator for a while into suspecting that it might be human, then we have solved the many philosophical, scientific, and engineering problems of AI! This simplistic reading has, of course, proved both false and misleading in practice. The key to this would seem to be the mistaken view that Turing's paper contains an adequate operational definition of intelligence. A later section of this chapter suggests an interpretation of Computing Machinery and Intelligence and the 'imitation game' in their historical context. This interpretation does not imply the existence of an operational definition of intelligence.
That the paper was almost immediately read as providing an operational definition of intelligence is witnessed by commentators' change from the label 'imitation game' to 'Turing test'. Turing himself was always careful to refer to 'the game'. The suggestion that it might be some sort of test involves an important extension of Turing's claims. This is not some small semantic quibble, but an important indication that Turing's paper was being interpreted as closer to an operational test than he himself intended. If the Turing test is read as something like an operational definition of intelligence, then two very important defects of such a test must be considered. First, it is all or nothing: it gives no indication as to what a partial success might look like. Second, it gives no direct indication as to how success might be achieved. These two defects in turn carry two weak implications. The first is that partial success is impossible, i.e. intelligence in computing machinery is an all-or-nothing phenomenon. The second is that the best route to success is the imitation of human beings. Readers will see the flaws in these arguments without difficulty, no doubt, but it is hard to deny that much AI work has been distracted by a view of intelligence as a holistic phenomenon, demonstrated only by human beings, and only to be studied by the direct imitation of human beings.

To avoid the charge of setting up 'straw men', it will be argued in the remainder of this paper that the general misreadings of Turing's 1950 paper have led to the currency of three specific mistaken assertions, namely:

1) Intelligence in computing machinery is (or is nearly, or includes) being able to deceive a human interlocutor.

2) The best approach to the problem of defining intelligence is through some sort of operational test, of which the 'imitation game' is a paradigm example.

3) Work specifically directed at producing a machine that could perform well in the 'imitation game' is genuine (or perhaps even useful) AI research.

This paper does not pursue the falsity of Assertions 1 and 2 in any great detail. On Assertion 1 it should be sufficient to remark that the comparative success of ELIZA (Weizenbaum 1966), and programs like it, at deceiving human interlocutors could not be held to indicate that they are closer to achieving intelligence than more sophisticated AI work. What we should currently conclude about this sort of AI work is that it represents research into the mechanisms of producing certain sorts of illusion in human beings, rather than anything to do with intelligence, artificial or otherwise.
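To see how thin the mechanism behind such illusions can be, consider a minimal sketch in the style of ELIZA's keyword-and-reflection technique. The patterns and responses below are illustrative inventions, not Weizenbaum's original script, but the principle is the same: match a keyword and reflect the interlocutor's own words back as a question.

```python
import re

# Pronoun reflections: swap first and second person so the user's
# words can be echoed back in a plausible reply.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my", "are": "am"}

# A few keyword rules in the spirit of Weizenbaum's DOCTOR script;
# these particular patterns are invented for illustration.
RULES = [
    (re.compile(r"i need (.*)", re.I), "Why do you need {0}?"),
    (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"because (.*)", re.I), "Is that the real reason?"),
]

def reflect(fragment: str) -> str:
    """Swap pronouns in a captured fragment ('my job' -> 'your job')."""
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(utterance: str) -> str:
    """Return the first matching rule's response, else a stock prompt."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(reflect(match.group(1)))
    return "Tell me more."  # default when no keyword matches

print(respond("I am worried about my exams"))
# -> "How long have you been worried about your exams?"
```

Nothing here models meaning, belief, or goals; whatever success such a program enjoys with human interlocutors is a fact about human conversational expectations, not about machine intelligence.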
Other writers have convincingly attacked Assertion 2 on the grounds that the 'imitation game' does not test for intelligence but rather for other items such as cultural similarity (French 1996; Michie 1996). Furthermore, an all-or-nothing operational definition, such as that provided by the Turing test, is worse than useless for guiding research that is still at an early stage.
Assertion 3 and its effects on the history of AI are clearly the most important for the purposes of this book. A claim already made repeatedly in the previous two chapters is that work directed at success in the Turing test is neither genuine nor useful AI research. In particular, the point will be stressed that, irrespective of whether or not Turing's 1950 paper provided one, the last thing that AI has needed since 1966 is an operational definition of intelligence.
Few, if any, philosophers and AI researchers would assent to Assertion 3 being stated boldly. However, the influence of Alan Turing and his 1950 paper on the history of AI has been so profound that such mistaken claims can have a significant influence at a subconscious or subcultural level.
In any case, the passing of forty years gives sufficient historical perspective to enable us to begin to debate the way in which Computing Machinery and Intelligence has influenced the development of AI. The basic theme of this paper is that the influence of Turing's 1950 paper has been largely unfortunate. This is not through any fault of the paper, but is rather a consequence of the historical circumstances that existed at the time of its writing and some of the pressures that have affected the subsequent development of AI.
 
 

Some Consequences of Misinterpretation of Computing Machinery and Intelligence

The main consequence of perceiving intelligence in terms of some sort of imitation of human performance, such as success in the imitation game, is that AI research and experimentation have paid far too much attention to the development of machinery and programs that seek directly or indirectly to imitate human performance. It might, at first, be thought that this was inevitable, since human behaviour is the only practical clue to the nature of intelligence that is readily available. However, this is a mistaken view. We know so very little about the nature of human intelligence that we cannot produce a definition that is of use to an AI engineer. Such a definition would have to make no direct reference to either humans or machines.
This focus of AI research on the imitation of human performance has at least three unfortunate consequences. First, it does not seem to have been very productive. Second, as I have argued at length elsewhere (Whitby 1988), it is unlikely to lead to profitable or safe applications of AI. New technology is generally taken up quickly where there is a clear deficiency in existing technologies and very slowly, if at all, where it offers only a marginal improvement over existing technologies. Even an amateur salesman of AI should be able to see that researchers should be steered away from programs that imitate human beings. The old quip about there being no shortage of natural intelligence contains an important truth. There are many safe, profitable applications for AI, but programs inspired by the imitation game are unlikely to lead towards them. This sort of research is more likely to produce interesting curiosities such as ELIZA than working AI applications.
A third unfortunate consequence is the way in which the myth that intelligence can be operationally defined as some sort of imitation of human beings has apparently exempted both philosophers and AI researchers from the rather difficult task of providing the sort of definition of intelligence that would be of use to AI. To be useful in AI research, any definition of intelligence needs to be independent of human capabilities, for a number of reasons. Among these are our unclear understanding of human intellectual abilities and the lack of an uncontroversial framework within which such understanding might be achieved. The academic study of human psychology is divided into factions that do not agree on basic methodological questions or on the definition of basic terms. The purpose of this chapter is not to be critical of human psychology, but simply to observe that it is not, nor is it likely to be for some time, in a position to provide AI with the theoretical basis that would turn it from a form of research into a form of engineering.

In various other places (Yazdani and Whitby 1987, Whitby 1988) an analogy has been developed between AI and artificial flight. One feature of this analogy relevant here is the way in which direct imitation of natural flight proved a relatively fruitless avenue of research. It is true that many serious aviation pioneers did make detailed studies of bird flight, the most notable being Otto Lilienthal; but it must be stressed that working aircraft were developed by achieving greater understanding of the principles of aerodynamics. The Wright brothers were extremely careful and precise scientists. They succeeded because they were thorough in their experimental methods, whereas others had failed because they were too hasty in building aircraft based upon incomplete theoretical work. There may be some important lessons for AI research in the methodology of the Wrights 2. It is also true that our understanding of bird flight has stemmed from our knowledge of aerodynamics and not the reverse 3. If there were an imitation game type of test for flight, we would probably still not be able to build a machine that could pass it. Some aircraft can imitate some features of bird flight, as a glider does when soaring in a thermal, but totally convincing imitation does not exist. We do not know how to build a practical ornithopter (an aircraft that employs a birdlike wing-flapping motion), but this is not of any real importance. Some of the purposes for which we use artificial flight, such as the speedy crossing of large distances, are similar to the purposes for which natural flight has evolved, but others, such as controlling the re-entry of spacecraft, are radically different. It is clear that AI, if it is to be a useful technology, should undergo a similar development. Many of the most useful applications for AI are, and will continue to be, in areas that in no way replace or imitate natural intelligence. It is quite probable that we will never build a machine which could pass the Turing test in its entirety, but this may well be because we can see little use for such a machine; indeed, it could have dangerous side-effects.
What is needed is a definition of intelligence that does not draw on our assumed intuitive knowledge of our own abilities. Such knowledge is at best vague and unreliable, and there may be Gödel-like reasons for believing that we do not fully understand our own capabilities (Lucas 1961). Some writers have made tentative approaches to such a definition (Schank 1986, Yazdani 1990) and perhaps some common features are beginning to emerge, among which is the requirement that any intelligent entity must be able to form clearly discernible goals. This is a feature that is neither suggested by the Turing test nor possessed by programs that have reputedly done well in the imitation game, such as ELIZA, DOCTOR (Michie 1986), and PARRY (Colby et al. 1972).
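The 'discernible goals' criterion can be stated without any reference to human performance. The following sketch is speculative and is not drawn from Schank or Yazdani; the interface and its names are my own invention. It simply shows what it would mean, in engineering terms, to demand that a system expose its goals and its progress towards them.

```python
from abc import ABC, abstractmethod
from typing import List

class GoalDirectedSystem(ABC):
    """Hypothetical interface for the 'clearly discernible goals' criterion.

    The criterion is expressed entirely in terms of the system itself:
    no comparison with human beings, and no imitation game.
    """

    @abstractmethod
    def goals(self) -> List[str]:
        """Return the goals the system is currently pursuing."""

    @abstractmethod
    def progress(self, goal: str) -> float:
        """Return measurable progress toward a named goal, from 0.0 to 1.0."""
```

A criterion of this kind is checkable by inspection of the system alone, which is exactly what an operational test built on deceiving human judges cannot offer.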
In considering AI as an engineering enterprise - concerned with the development of useful products - the effects of the imitation game are different but equally misleading. If we focus future work in AI on the imitation of human abilities, such as might be required to succeed in the imitation game, we are in effect building an 'intellectual statue' when what we need are intellectual tools. This may prove to be an expensive piece of vanity. In other words, there is no reason why successful AI products should relate to us as if they were humans. They may instead resemble a workbench that enables a human operator to achieve much more than would otherwise be possible, or they may be largely invisible to humans, operating automatically and autonomously.
 

A More Useful Interpretation of Computing Machinery and Intelligence

If we are not to read Turing's 1950 paper as providing an operational definition of intelligence, what are we to make of it? There has, of course, been a preponderance of interpretations that stress the use of the imitation game as a test for intelligence or the ability to think. In fact, in the reported discussions about his work in the all-too-brief years before his death in 1954, Turing seems to have allowed this sort of interpretation to play a significant part (Hodges 1983). Turing's toleration of such interpretations of the paper can be explained as something more than the desire for a good argument.
There is little point in being sidetracked into a discussion of Turing's actual intention at the time of writing. This is what literary criticism, probably correctly, terms the 'intentional fallacy'. What matters is the way in which the paper has been, and is to be, interpreted. However, in order to explain the first part of a better interpretation, it is necessary to set the paper in its historical context. The 1950 paper was in many ways based upon a report written for the National Physical Laboratory in August 1948 4. This in turn, although ostensibly a technical report, drew together Turing's speculations on the possibility of building an intelligent machine, speculations that had been carried on in conversation at least as far back as 1940 at Bletchley Park.
During this period Turing (among others) was leading what Thomas Kuhn has christened a 'paradigm shift' (Kuhn 1970). This involved an understanding of what we would now call the logical and physical aspects of certain types of systems. The wartime work at Bletchley Park was crucial to the development of this paradigm shift, as Turing was one of the few men who could fully appreciate what the Polish cryptanalysts had done in the years immediately preceding 1939 in discovering how the physical nature of the Enigma coding machine could be deduced from the logical nature of its output. This work also involved the building of further physical machines, such as the Colossi, to assist in the deciphering of the intercepted traffic, that is, the logical output of another machine. The whole of computing is founded upon this understanding of the way in which physical and logical systems can be direct counterparts of each other. That we all understand this now should not obscure the fact that in 1940 only a few men of vision were capable of appreciating its importance. By 1950 this paradigm shift had spread more widely in computing and the sciences, but not to philosophy or the general public.
Thus a crucial part of Computing Machinery and Intelligence is devoted to pursuing the philosophical implications of applying this paradigm shift to the question of whether or not machines can think. In the imitation game, Turing picks a man and a woman because they are obviously physically different; he assumes that the general public would have no difficulty in appreciating this. However, the observer is denied any access to the physical attributes of the participants in the imitation game and instead must try to deduce these from their logical output via a teletype. In a way it is Enigma revisited, but with human beings. Just as the war-time cryptanalysts had to deduce the physical nature of the Enigma coding device by observing logical patterns in its output, so the observer in the imitation game must attempt to distinguish the physical differences between the participants by discerning differing patterns of output.
When a machine is introduced into the game the observer is again forced to view it in terms of its logical output. Turing does not need to take a view on how successful the observer might be in distinguishing the man from the woman. He simply suggests that when a machine achieves comparable levels of success in producing indistinguishable output, then we can no longer attach much importance to physical differences between women, men and machines. What Turing creatively managed to show was that the paradigm shift in which he had a leading role could be applied to the familiar question, 'Can a machine think?'. The imitation game contrived a method, understandable to a wide audience, of showing what Turing and a few others had already clearly grasped: that observable physical features have a subordinate role in answering such questions.
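The structure Turing exploits can be made explicit in a short sketch. Everything below is a simplification for illustration: the function names, the fixed number of rounds, and the reduction of the interrogator to two callables are my assumptions, not Turing's. The essential point survives, though: the judge's only evidence is the transcript, that is, patterns in logical output.

```python
import random
from typing import Callable, Dict, List

# A participant is anything that maps a question to a textual reply;
# the interrogator never sees the participant, only this logical output.
Participant = Callable[[str], str]
AskFn = Callable[[str, List[str]], str]          # (label, replies so far) -> next question
JudgeFn = Callable[[Dict[str, List[str]]], str]  # transcript -> label judged human

def imitation_game(ask: AskFn, judge: JudgeFn,
                   human: Participant, machine: Participant,
                   rounds: int = 5) -> bool:
    """Run one text-only session; return True if the machine deceives the judge."""
    # Assign labels at random so the judge cannot rely on position,
    # only on patterns discerned in the replies.
    hidden: Dict[str, Participant] = {"A": human, "B": machine}
    if random.random() < 0.5:
        hidden = {"A": machine, "B": human}

    transcript: Dict[str, List[str]] = {"A": [], "B": []}
    for _ in range(rounds):
        for label, participant in hidden.items():
            question = ask(label, transcript[label])
            transcript[label].append(participant(question))

    verdict = judge(transcript)        # the label the judge declares human
    return hidden[verdict] is machine  # deceived if that label hides the machine
```

Nothing in this procedure can inspect a participant's physical nature; the deduction, as with Enigma, must run from logical output back to hidden identity.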

The remaining portion of a useful interpretation of Computing Machinery and Intelligence is more relevant today. It also explains the continuing appeal of the Turing test to present day writers. This is because the paper clearly illustrates the importance of human attitudes in determining the answers to questions such as 'Could a machine think?'.
Ascribing the label 'intelligent' is not a purely technical exercise; it involves a number of moral and social dimensions. Human beings consider themselves obligated to behave in certain ways toward intelligent items.
To claim that the ascription of intelligence has moral and social dimensions is not merely to claim that it has moral and social consequences. It may well be the case that certain moral and social criteria must be satisfied before such an ascription can be made 5. In a sense it is true that we feel more at ease ascribing intelligence (and sometimes even the ability to think) to those entities with which we can have an interesting conversation than to radically different entities. This feature of intelligence ascription makes the use of any sort of operational test of intelligence with human beings very unattractive.
These moral and social dimensions to the ascription of intelligence are also covered by Computing Machinery and Intelligence. Turing wanted to ask (although he obviously could not answer), 'What would be the human reaction to the sort of machine that could succeed in the imitation game?'. If, as Turing clearly believed, digital computers could, by the end of the century, succeed in deceiving an interrogator 30 percent of the time, how would we describe such a feat? This is not primarily a technical or philosophical question, but rather a question about human attitudes. As Turing himself observed, the meaning of words such as 'thinking' can change with changing patterns of usage. Although sampling human attitudes is rejected as a method of answering the question 'Can a machine think?' in the first paragraph of Computing Machinery and Intelligence, we can read the entire paper as primarily concerned with human attitudes. The contrivance of the imitation game was intended to show the importance of human attitudes, not to be an operational definition of intelligence.
Given this interpretation of the paper, it is not surprising that Turing tolerated a certain amount of misinterpretation of the role of the imitation game. The paper itself was partly an exercise in testing and changing human attitudes. Turing fully expected it to provoke a certain amount of controversy. However, in the fourth decade of research in AI this sort of controversy is no longer productive.
 
 

Conclusions

After the passage of forty-five years it is safe to assume not only that Turing's prediction of machines succeeding in the imitation game by the end of the century will not come about, but also that such success will probably never be achieved. There would be little practical use for a machine aimed specifically at success in the imitation game. Furthermore, examination of AI products from a 1990s perspective prompts a high degree of scepticism about the possibility of success in the imitation game being simply an emergent property of computers with sufficient memory and performance.
It should be clear that at this stage in the development of AI there is nothing to be gained by clinging to the notion of the imitation game as an operational test for intelligence. It is now clear that we need AI for a number of practical purposes, including making computing machinery more useful. To imagine, for whatever reason, that this involves making computers more like human beings may well be a distracting vanity.
In conclusion it is worth repeating that the last thing needed by AI qua science is an operational definition of intelligence involving some sort of comparison with human beings. A philosopher might argue that passing the Turing test, although inappropriate as an operational definition, and in no sense a sufficient condition for intelligence, is nonetheless a necessary condition of intelligence. That is to say, if we were to succeed, by whatever means, in producing a truly intelligent artefact, and to establish our success at this by some other set of tests, then that artefact would, of necessity, be able to pass the Turing test. There seems no good reason to believe even this weak justification for the test. Were this justification true, we would be prepared to use passing or failing the Turing test as a criterion for general intelligence in human beings. One of the main reasons that we do not is the obvious validity in this case of French's observation that the Turing test does not test for general intelligence, but for cultural similarity (French 1996). Indeed, it has been persuasively argued by Bringsjord (Bringsjord 1995) that neither the Turing test, nor any proposed derivative of it, nor any foreseeable derivative of it, is capable of testing for the sort of 'inner life' or 'consciousness' in which philosophers are interested.
The clear need in AI qua science is for an account of intelligence which makes no direct reference to either humans or machines. This is analogous to the account provided by the science of aerodynamics in the field of artificial flight. AI qua engineering should not be distracted into direct copying of human performance and methods. There is no reason to assume either that such copying is an easy task or that it is likely to produce useful products.
 
 
 

Notes

This document is based upon a paper presented at the Turing 1990 Colloquium

1] The (somewhat arbitrary) dates in this history are derived from the first publications describing ELIZA (Weizenbaum 1966) and PARRY (Colby et al. 1972) and from the Turing 1990 Colloquium.
2] There are two features of the Wrights' methodology which contrast sharply with the approaches of other contemporary experimenters and which may have relevance to contemporary AI. First, they spent a good deal of time looking at the work, both successful and unsuccessful, of previous aviation experimenters, and they developed a coherent account of these successes and failures. Second, and remarkably, unlike most of their contemporaries they had a full appreciation of the need to control any aircraft in flight in addition to simply getting it airborne. (Mondey 1977)
3] In 1928 the first successful soaring flight was made in a glider, the aircraft climbing in the rising air of a thermal. Birds had been observed doing so for centuries, but at this time many biologists maintained that they could do so by virtue of the air trapped within the hollow bones of their wings being expanded by the heat of the sun. In this case a young pilot was able to overturn biological theory.
4] Eventually published (though perhaps incorrectly dated) in Meltzer, B. & Michie, D. (eds.) Machine Intelligence 5, Edinburgh University Press.
5] This should not be read as a claim that the ascription of intelligence to some entity should be made purely according to moral or social criteria. This is an interesting problem which deserves further attention. For a discussion of this issue see Torrance 1986.
 

References

Bringsjord, S. (1995) 'Could, How Could We Tell if, and Why Should - Androids Have Inner Lives', in Ford, K., Glymour, C. and Hayes, P.J. (eds) Android Epistemology, Cambridge, MA, MIT Press, pp. 93-121.

Colby, K.M., Hilf, F.D., Weber, S., and Kraemer, H.C. (1972) 'Turing-Like Indistinguishability Tests for the Validation of a Computer Simulation of Paranoid Processes', Artificial Intelligence, 3, pp. 199-222.

French, R. (1996) in Clark, A. and Millican, P. (eds) Essays in Honour of Alan Turing, O.U.P. (in press)

Hodges, A. (1983) Alan Turing: The Enigma of Intelligence, London, Unwin, pp. 413-446.

Lucas, J.R. (1961) 'Minds, Machines and Gödel', Philosophy, Vol. XXXVI, pp. 112-27.

Kuhn, T.S. (1970) The Structure of Scientific Revolutions (2nd ed.), University of Chicago Press.

Mondey, D. (ed.) (1977) The International Encyclopaedia of Aviation, London, Octopus, pp. 38-49.

Michie, D. (1986) On Machine Intelligence (2nd ed.), Chichester, Ellis Horwood, pp. 241-2.

Michie, D. (1996) in Clark, A. and Millican, P. (eds) Essays in Honour of Alan Turing, O.U.P. (in press)

Schank, R.C. (1986) Explanation Patterns, London, Lawrence Erlbaum.

Torrance, S. (1986) 'Ethics, Mind and Artifice', in Gill, K.S. (ed.) Artificial Intelligence for Society, Chichester, John Wiley and Sons, pp. 65-71.

Turing, A.M. (1950) 'Computing Machinery and Intelligence', Mind, Vol. LIX, No. 236, pp. 433-460.

Weizenbaum, J. (1966) 'ELIZA - A Computer Program For the Study of Natural Language Communication Between Man and Machine', Communications of the ACM, Vol. 9, No. 1, pp. 36-45.

Whitby, B.R. (1988) AI: A Handbook of Professionalism, Chichester, Ellis Horwood, pp. 11-23 and Chapter 2.

Yazdani, M. and Whitby, B.R. (1987) 'Artificial Intelligence: Building Birds out of Beer Cans', Robotica, 5, pp. 89-92.

Yazdani, M. (1986) in Yazdani, M. (ed.) Artificial Intelligence, London, Chapman and Hall, p. 263.