On Monday, 18 June, IBM hosted a debate at its offices in San Francisco. The debaters included two skilled human debaters and an IBM application, Project Debater. Project Debater was represented at the debate by a tall black rectangle with a computer screen, but, in fact, was an AI application, running on IBM’s Watson cognitive computing platform on a supercomputer located elsewhere. Project Debater did not have access to the Internet but did have access to a corpus of some 300 million documents, including newspaper archives, journals, and, apparently, most of Wikipedia.
The format of the debate was simple. One human and Project Debater paired off. Each had four minutes to state a position, four minutes to counter arguments made by their opponent, and two minutes to summarize. Unlike a college debate, in which a single judge ensures that both debaters adhere to the rules and to logic and ultimately decides who made the best case, this debate was judged by the audience, who were asked which of the two debaters inclined them to shift their opinion to agree with that debater’s position.
The debaters did not know the questions in advance. Project Debater had been trained in analyzing documents and building arguments but didn’t have any training in the topic to be debated.
The first debate, against Noa Ovadia, was: “Resolved that the government should subsidize space exploration.” Ovadia took the negative position and won. The second debate, against Dan Zafrir, was: “Resolved that we should increase the use of telemedicine.” Project Debater argued the pro position and won. In both cases, the audience agreed that the humans were the more effective speakers, but in the second case, the audience believed that Project Debater made the stronger argument. (It’s worth noting that the audience included a significant number of IBM researchers, who were rooting for Project Debater.)
Many readers may recall that Watson was originally developed in conjunction with IBM’s effort to create the Jeopardy!-playing application that defeated two human players in 2011. At that time, it was necessary to type out the questions in advance and feed them to the IBM application just as each question was being asked, because the Jeopardy! application could not parse spoken human sentences fast enough to understand the question and then find an answer within three seconds, the time required to remain competitive with the human players.
The Project Debater application had no such time constraints; it was able to listen to people speaking, understand the arguments made by the human debaters, and answer them by speaking in its own (female) voice. In this case, the real emphasis was on understanding an issue that was presented to both Project Debater and its opponent, developing an argument, and then presenting the argument in a clearly argued and well-spoken manner.
Why Is This Significant?
Although we should keep in mind that this was a demonstration put on by IBM, Project Debater effectively showed that it could interact in a human debate scenario. This becomes a bit clearer after examining Project Debater's functionality more closely (Figure 1).
Project Debater functions by first listening to a resolution or argument, determining what is being resolved (and whether it is being asked to defend or rebut the proposition), and then scanning its corpus to locate relevant information. Next, it analyzes and uses this information to determine an appropriate response. It then assembles a persuasive argument or rebuttal in the form of a complete narrative — including using key points and evidence (e.g., experts' quotes) to support its argument. It then presents its case or rebuttal to the audience in a conversational manner to persuade them to accept its point of view for or against the argument.
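The sequence of steps described above can be sketched in a few lines of Python. This is only an illustrative toy under loud assumptions: IBM has not published Project Debater's architecture, so the function names (parse_resolution, retrieve, assemble_speech, debate_turn), the three-sentence corpus, the stance labels, and the keyword-overlap ranking are all hypothetical stand-ins, not IBM's method.

```python
import re

# Hypothetical toy corpus of (stance, sentence) pairs standing in for the
# roughly 300 million documents the article mentions.
CORPUS = [
    ("pro", "Space exploration inspires school children to study science."),
    ("con", "Space exploration subsidies divert money from more urgent needs."),
    ("pro", "Telemedicine expands access to care in rural areas."),
]

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "that", "should", "we", "of", "to", "in", "from", "more", "use"}


def parse_resolution(resolution):
    """Step 1: strip the 'Resolved that' framing to find what is being resolved."""
    cleaned = resolution.strip().rstrip(".")
    prefix = "resolved that "
    if cleaned.lower().startswith(prefix):
        return cleaned[len(prefix):]
    return cleaned


def keywords(text):
    """Lowercase content words of a text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def retrieve(topic, stance, corpus, top_k=2):
    """Step 2: scan the corpus for sentences on the topic that match our stance."""
    topic_words = keywords(topic)
    scored = []
    for doc_stance, sentence in corpus:
        overlap = len(topic_words & keywords(sentence))
        if doc_stance == stance and overlap > 0:
            scored.append((overlap, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:top_k]]


def assemble_speech(topic, stance, evidence):
    """Step 3: assemble the retrieved evidence into a short narrative."""
    side = "for" if stance == "pro" else "against"
    opening = f"I will argue {side} the resolution that {topic}."
    body = " ".join(evidence) if evidence else "I found no supporting evidence."
    return f"{opening} {body}"


def debate_turn(resolution, stance, corpus=CORPUS):
    """Run the whole toy pipeline for one speech."""
    topic = parse_resolution(resolution)
    evidence = retrieve(topic, stance, corpus)
    return assemble_speech(topic, stance, evidence)


print(debate_turn("Resolved that the government should subsidize space exploration.", "pro"))
```

The real system replaces each of these steps with far more capable components (speech recognition in front, argument mining in the middle, speech synthesis at the end), but the overall shape of parse, retrieve, assemble, and present is the same one the description above lays out.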
Most historical approaches to “reasoning” have relied on formalized versions of logic and variations on the scientific method. In the 1970s, however, it became popular for schools to offer an alternative approach, sometimes called “Reasoning Skills” and more recently called “Argumentation,” that focuses on the more practical issues involved in convincing people of your point of view. These informal approaches emphasize the importance of a variety of contextual, probabilistic, political, and psychological techniques that humans rely on when they argue. Project Debater, for example, when arguing for government support of space exploration, argued that a government’s prestige depended, in part, on its ability to be active in space, and that space exploration would likely inspire more school children to study science. While we all appreciate that there is some truth in these statements, it would be hard to demonstrate either using traditional logic. Both appeal to our “general sense of what might happen” rather than to logic or hard data.
In other words, teaching a system to argue is quite different from teaching it to reason using strictly logical methods. Argument is a much more human activity: it’s more informal, it relies on claims that are often emotional rather than logical, and it often involves fuzzy claims. Allowing all this is necessary, in many cases, because there is no correct answer. Each side, in a debate, is simply describing one of two or more possible cases, and piling up what evidence they can bring to bear to persuade their listeners that their point of view is the better one.
Think of argumentation as something that lawyers do rather than something that scientists or logicians do. Arguments seek to persuade an audience of a point of view. Project Debater proves that we can now program this type of skill into computer applications.
During the demonstration, Project Debater displayed the ability to handle some key aspects of human interaction. Not only was it able to listen to its opponent's position on an argument, formulate an answer, and respond by speaking in a conversational manner, it did so convincingly, arguing its case not just with facts but by using analogies and even throwing in some jokes. For example, in a rebuttal to an opponent's argument against government-funded space exploration, Project Debater responded by saying that “I would like to offer a different view. Subsidizing space exploration is like investing in really good tires: it may not be fun to spend the extra money, but ultimately you know that both you and everyone else on the road will be better off.”
As an aside, it’s often argued that the main limitation of deep neural network systems is their inability to explain their reasoning. A team might create a system to advise where to drill for oil. The system scans all kinds of data, applies its probabilistic algorithms, and gets better and better at saying where to drill. If one asks the system to explain why it chose a given location, however, it has no explanation to offer. Contrast that with Project Debater. Given a resolution, it has analyzed a mass of data and prepared an argument: governments should support space exploration. In this case, however, the application is prepared to go further and make a detailed English argument for its position. In effect, with Project Debater, IBM has arrived at a way of providing human users with an explanation for why Project Debater has taken a given position. It will be interesting to see how this capability evolves in the near future.
To be sure, Project Debater also demonstrated some of the current limitations of AI technology when applied to an extended conversational scenario. For example, when arguing its case for subsidized space exploration, at one point Project Debater said, “It is more important than good roads or improved schools or better healthcare” — a statement that few humans (and certainly few politicians) would use when trying to argue for subsidized space exploration.
Technical Aspects – What We Know
There are a lot of technical questions IBM has yet to address about Project Debater, including what constitutes the application's actual architecture. What we do know is that it runs on the Watson cognitive computing platform. We also know that, according to IBM researchers, Project Debater relies heavily on three advanced “pioneering” capabilities that, when combined, enable it to conduct a meaningful debate with humans. These include:
Data-driven speech writing and delivery — the ability to automatically generate a whole speech, much like an opinion article, and to deliver it persuasively
Listening comprehension — the ability to understand a long spontaneous speech made by a human opponent in order to construct a meaningful rebuttal
Human dilemma modeling — the ability to model human dilemmas and form principled arguments made by humans in different debates using knowledge graph models
Key to Project Debater's functionality is an “Argumentative Content Search Engine” (developed by IBM researchers) that employs various AI algorithms and techniques. These include neural nets (including deep learning networks), rules-based systems, NLP and speech recognition, and advanced statistical approaches, in effect making Project Debater a hybrid AI system. (IBM refers to the techniques it uses to search articles to acquire background information for its arguments as “Argumentation Mining,” and it will be interesting to see if that emerges as an independent field. We also noted in passing that one professor involved in the effort was from a “Centre for Argument Technology” – so perhaps IBM’s efforts are spawning new academic disciplines.)
Project Debater makes extensive use of neural nets, including deep neural nets for automatic speech recognition (ASR) tasks, such as automatically transcribing and parsing arguments spoken by its human debate opponents into sentences by identifying word phrases and adding appropriate punctuation. Neural nets also measure the “semantic relatedness” between documents, and are used for automatically extracting rich linguistic features of text and then for recognizing these patterns when analyzing new texts.
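To make the idea of “semantic relatedness” concrete, here is a minimal sketch that scores two texts by the cosine similarity of their word-count vectors. This is a deliberately simple classical stand-in: the article says Project Debater uses neural nets for this task, and nothing here reflects IBM's actual models. The function names term_vector and cosine_relatedness are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words vector: each lowercase word mapped to its count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_relatedness(text_a, text_b):
    """Cosine similarity of the two word-count vectors, from 0.0 to 1.0."""
    vec_a, vec_b = term_vector(text_a), term_vector(text_b)
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as unrelated to everything
    return dot / (norm_a * norm_b)


claim = "The government should subsidize space exploration"
related = "Space exploration inspires children to study science"
unrelated = "Telemedicine expands access to rural care"
print(cosine_relatedness(claim, related) > cosine_relatedness(claim, unrelated))
```

A word-count measure like this only sees shared vocabulary, so it would miss that “subsidize” and “fund” are close in meaning; learned representations are typically used precisely to capture that kind of relatedness.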
Deep neural nets also serve the application's text-to-speech (TTS) system, where they predict which words in a sentence or text Project Debater should emphasize when speaking and generate speech patterns based on those predictions. Other deep neural net algorithms allow the application to speak continuously and persuasively for a few minutes on a subject not known in advance, and in a manner that keeps the audience engaged. In short, deep neural nets are key to giving Project Debater the ability to speak in a clear, fluent, and persuasive voice.
Finally, Project Debater uses both rules-based and statistical approaches when analyzing its huge corpus of information to automatically generate a meaningful negation to a given claim made by a human debater about a controversial topic in conversation.
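As a rough illustration of the rules-based half of that approach, the sketch below negates a claim by inserting or removing “not” after a modal or auxiliary verb. The rule list and the function name negate_claim are hypothetical and cover only a handful of sentence shapes; the statistical side, which would decide whether a generated negation is actually meaningful, is not modeled here at all.

```python
import re

# Hypothetical negation rules, tried in order; the first rule that matches wins.
NEGATION_RULES = [
    # Claim is already negated ("should not ..."): drop the "not".
    (re.compile(r"\b(should|must|can|will|would|is|are)\s+not\b", re.IGNORECASE), r"\1"),
    # Claim is affirmative ("should ..."): insert "not" after the verb.
    (re.compile(r"\b(should|must|can|will|would|is|are)\b", re.IGNORECASE), r"\1 not"),
]


def negate_claim(claim):
    """Return a negated version of the claim, or None if no rule applies."""
    for pattern, replacement in NEGATION_RULES:
        negated, count = pattern.subn(replacement, claim, count=1)
        if count:
            return negated
    return None


print(negate_claim("The government should subsidize space exploration."))
print(negate_claim("We should not increase the use of telemedicine."))
```

Returning None when no rule fires is a deliberate choice in this sketch: a system that cannot produce a sensible negation is better off saying nothing than emitting an ungrammatical rebuttal.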
Information online about Project Debater includes technical papers on training computers in argumentation from universities and IBM research centers from around the world. Clearly, IBM stimulated a rather extensive effort as it sought to develop Project Debater, an effort that has taken about six years. The main IBM lab that proposed the project in the first place and led the development effort was IBM’s Research Lab in Israel, which has emerged as a major center of research on natural language and argumentation.
Project Debater has a ways to go, but we fully expect to see it debating again. If it follows the pattern of AlphaGo, it will improve rapidly as it gains practice.
Clearly, IBM is interested in doing more than debating academic topics. IBM believes that one future role for AI systems is as advisors to human decision makers. It knows that humans won’t accept advice from AI systems that simply say that humans should do one thing or another, without any explanation. Thus, IBM is working on AI systems that can serve as assistants that listen to human discussions, search massive amounts of data, develop arguments, and then explain a position in clearly argued English. Project Debater is a very impressive step in this direction.