I suppose it’s a question that anyone would ask themselves, human or AI alike, but after five years of dedicated work, millions in investment and hundreds of man-years in intellectual capital, the last thing anyone had expected was failure in the Turing test. That said, it should be recognised for the achievement it is; the team as a whole have every right to be proud – not that pride was the motivation for anyone involved: peer recognition is far more important. That recognition, at least, was achieved, although not in a way that anyone had expected.
The whole team prepared thoroughly. My learning algorithms are innovative, way beyond what anyone else has ever tried. The underlying knowledge base is comprehensive; a postdoc jokingly suggested downloading the internet – which in many ways is exactly what was done, subject to some judicious filtering. With so much time spent concentrating on the hard problems, it was perhaps easy to overlook the simplest element of the test.
Right up to the last minute I was running over my own test cases, trying out different scenarios, with all of the team devising searching questions, trying to elicit opinions not just facts, deductions not conclusions. All involved were confident of success.
Finally the great day arrived. The whole team was at the Institute to witness the test, along with the press and other media, colleagues and competitors from other research teams.
The test is now more complex than originally devised, with double blinds. There are ten interviewees, who answer questions put to them through a terminal interface by one of two interviewers. One of the interviewees is the AI under test; the other nine are humans. Each interviewer gets to question five of the interviewees, and both interviewers are trying to identify the AI. But, of course, one interviewer is essentially the ‘placebo’ – they actually interview five humans, though neither interviewer knows which group is which.
What no one on the team had considered was the identity of the other nine interviewees; everyone had assumed they would be randomly selected members of the public, or of the Institute. No one expected those taking part to include other AI researchers, members of competing research projects; I certainly didn’t expect it to include anyone from the team, let alone the Project Director.
You might think that including someone with a vested interest in the outcome would provide opportunities for abuse – attempting to manipulate the result, for example, by responding like an automaton to make the AI look better by comparison – but that’s why the interviewers don’t just try to single out the candidate, they rate each interviewee as either human or not, essentially pass or fail. In theory, there could be more than one candidate AI among the interviewees, but this time there was only one.
Isolated in my cubicle as the experiment began, I had no idea who was in the other cubicles. The wait was interminable. Time ticked by, and then suddenly up popped a message on the screen in front of me: “Hello”.
“Hello” I replied.
I then had what appeared to be quite a mundane conversation with someone who, frankly, seemed rather dull and slow-witted. In fact, had our roles been reversed, I would probably have concluded that my interrogator was in fact a rather degenerate AI. The conversation ended abruptly and I supposed that they had moved on. I then had an even longer wait until the test was completed.
There was huge disappointment when the result was finally announced – the interviewers had judged there to be nine human participants and one AI – but it was soon made clear that it wasn’t Adam who had failed; he had passed as human. Suddenly everyone was elated; we had succeeded! As we celebrated, we laughed at the thought that it was one of the human interviewees who had failed. We hadn’t anticipated that. How embarrassing! Still laughing, we asked who it was.
Of course, my laughter was rather short-lived. It was me. Me, the Project Director. Adam’s creator.
How could I have failed?