
‘Seemingly Bright’

What Is the Future of the So-called ‘Watson’ Technology?

James Allan, co-director of UMass Amherst’s Center for Intelligent Information Retrieval.

The recent Jeopardy! contests featuring IBM’s Watson computer were a success on a number of levels, from television ratings to exposure for IBM and its products. In a quieter fashion, the shows and the computer have shed some light on what’s known as question-answering, or QA, technology, and the important work being done in this realm by UMass Amherst and its Center for Intelligent Information Retrieval, which is hard at work finding new and better ways to search materials, extract information, and help people make sense of what they retrieve.

The correct response, or question, in Jeopardy! parlance, was, “What is Chicago?”
The category was U.S. Cities, and the answer (paraphrasing) was ‘this city’s two airports are named after a war hero and a World War II battle.’
Watson, the IBM-designed supercomputer that cost between $100 million and $2 billion to develop, depending on who is answering that question, ‘wrote’ “what is Toronto” in its Final Jeopardy space.
Hmmmmm.
“That just goes to show that computers can’t do some things as well as humans,” said James Allan, a computer scientist at UMass Amherst and co-director, along with Bruce Croft, of the university’s Center for Intelligent Information Retrieval (CIIR). While not a real fan of the show, he watched every minute of the Jeopardy! episodes involving Watson and its rout of the show’s most accomplished human champions, because UMass — and specifically its CIIR — was one of eight universities collaborating with IBM on the question-answering, or QA, technology behind the company’s new computing system.
So how could Watson, the system named after IBM founder Thomas J. Watson, have made a mistake that most grade-school students wouldn’t have?
It’s fairly simple, said Allan, noting that the computer, in its sophisticated search of a host of databases for the answer, focused on the ‘two airports/war hero’ aspect of the query, and not as much (obviously) on the ‘U.S. Cities’ part. (For the record, the question refers to Chicago’s O’Hare and Midway airports, but one of Toronto’s airports is named after William “Billy” Bishop, a Canadian World War I fighter ace.)
“Toronto’s case is very similar, but not exactly the same as Chicago’s,” Allan explained, adding that the search, in this instance, went astray in much the same fashion as another of Watson’s few missteps.
The question (answer) from the category Alternate Meanings was ‘stylish elegance or students who all graduated together.’ Watson’s reply was ‘chic’ — other options it considered were ‘panache’ and ‘Vera Wang’ (more on how it could have arrived at such candidates later) — while the correct response was ‘class.’ “Here, ‘stylish elegance’ was obviously more important to Watson,” said Allan, adding that ‘chic’ clearly doesn’t have a definition approaching a ‘group of classmates.’
But while Watson had some wrong answers that led to some serious head-scratching, and even a snicker from Jeopardy! host Alex Trebek, the focus should certainly be on how many questions it got right, said Allan, noting that the computer exceeded the expectations of all but the most optimistic of the individuals involved in the project. And the stunning performance, coupled with vast amounts of hype — television commercials about the Jeopardy! experience were still running weeks after the shows aired — has brought QA technology and its more practical uses to the forefront.
Some of the more obvious of these are in health care, said Allan, noting that IBM, in tandem with voice-recognition software maker Nuance, is already working to produce a medical version of the computer system. It will use speech recognition, super-fast processing, and massive databases to help doctors and nurses find answers to questions from and about patients.
The intelligence sector is another logical landing place for Watson-like technology, he said, adding that such a system can and likely will be used in “any situation in which getting the answer quickly is an important step in the process.”
Meanwhile, Watson’s exploits have brought some attention — MIT received considerably more — to UMass and the CIIR. Launched in the late 1990s, the center’s work comes down to one word — search — and how to do it better, faster, and more efficiently.
“We look for ways to search for things, ways to organize materials, ways to help people build queries, ways to present what’s on there,” he said. “We’re very interested in issues that are new and interesting; more and more, people are using streaming media, stuff that comes at you all the time, like Twitter feeds and news feeds.
“We’re focusing on finding ways to use computers to help pull from that fire hose of information coming at you stuff that’s interesting to you and also different from what you’ve already seen,” he continued. “In other words, we want to answer the question, ‘how do you find new and interesting stuff in all the stuff that’s constantly arriving?’”
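To make that concrete, a common first cut at the problem is to remember what a reader has already seen and let through only the incoming items that differ enough from that memory. The sketch below is a toy illustration of the general idea, not the CIIR’s actual system; the similarity measure, threshold, and feed are all invented for the example.

# A minimal novelty filter: pass along only incoming items that differ
# enough from everything already seen. Toy illustration, not the CIIR's
# actual streaming system.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def novelty_filter(stream, threshold=0.6):
    """Yield items that are not too similar to anything seen so far."""
    seen = []
    for text in stream:
        vec = Counter(text.lower().split())
        if all(cosine(vec, old) < threshold for old in seen):
            seen.append(vec)
            yield text

feed = [
    "Quake strikes off the coast of Japan",
    "Earthquake strikes off Japan coast",   # near-duplicate: filtered out
    "Markets rally after tech earnings",    # genuinely new: passed along
]
for item in novelty_filter(feed):
    print(item)

Real systems replace each of these pieces with something stronger: richer text representations, tuned thresholds, and a memory that ages out old material.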
For this issue and its focus on technology, BusinessWest takes an in-depth look at the Watson technology and its vast potential, and also sheds some light on the ongoing work at the CIIR and how computer scientists at UMass continue to search for answers to the question of how to make computers search better and faster.

It’s Elementary
Allan admitted to BusinessWest that, deep down, he didn’t think Watson would beat his human opponents, and he never imagined the kind of drubbing the computer eventually administered.
This mindset had more to do with the quality of the computer’s opponents than any lack of confidence in the system he and his team helped create. In the end, though, he learned at least a few things — first, that Watson was indeed quite skillful in searching and then finding the right answer, and second, that it was really good at ‘buzzing in,’ as it’s called in Jeopardy!
Actually, some would say the computer had an unfair advantage in that regard, said Allan, noting that many Jeopardy! players don’t fare well on the show, not because they lack smarts, but because they lack good timing with that buzzer. Hitting it too early locks a contestant out for a costly fraction of a second, he explained, and hitting it too late isn’t good, either, obviously.
Watson, because it’s a machine, essentially had perfect timing with the buzzer, he said, adding that he, like all viewers, could see some frustration on the part of Watson’s opponents, and especially Ken Jennings, who knew many of the answers but simply couldn’t buzz in faster than the computer.
That skill — not to mention Watson’s odd ‘Daily Double’ wagers (those certainly weren’t round numbers) — came from some other contributors, said Allan, noting that the CIIR’s assistance came in the form of information retrieval, or text search. This capability of QA technology is the first step taken when looking for text that’s most likely to contain accurate answers. The system’s deep language-processing capabilities then analyze the returned information to find the actual answers within that text.
What IBM essentially borrowed from UMass and adopted for its own use is an open-source software product called Indri that effectively initiates and facilitates the computer’s search for the information that will ultimately lead to an answer, and preferably the right one.
“The question you have essentially becomes a search request,” he explained. “And a search engine, just like a Web-search engine, goes out and searches all the text, the unstructured free text we have available, to pull back portions of documents that seem likely to have an answer. The way that works in a question-answering system is that all those documents are then passed on to the next steps, which do a lot more deep processing to try to extract the specific answer.”
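In miniature, that first retrieval step can be pictured as scoring every passage in the collection by the question words it contains, weighting rare words more heavily, and keeping the top few. The sketch below is a toy stand-in for a real engine like Indri, whose ranking models are far more sophisticated:

# Toy passage retrieval: score each passage by question-word overlap,
# with rarer words weighted more heavily (an IDF-style weight). A crude
# stand-in for what a real engine like Indri does.
import math
import re

passages = [
    "Sir Danvers Carew: member of Parliament who is murdered by Hyde",
    "Mr. Hyde was pale and dwarfish",
    "Sherlock Holmes solves the mystery surrounding Jekyll and Hyde",
    "O'Hare International Airport serves the city of Chicago",
]

def tokens(text):
    """Lowercase word tokens, punctuation stripped."""
    return re.findall(r"[a-z']+", text.lower())

def idf(word, docs):
    """Rarer words get higher weight."""
    df = sum(1 for d in docs if word in tokens(d))
    return math.log((len(docs) + 1) / (df + 1)) + 1

def search(question, docs, k=2):
    """Return the top-k passages by summed weight of matched question words."""
    q = set(tokens(question))
    scored = sorted(
        ((sum(idf(w, docs) for w in q & set(tokens(d))), d) for d in docs),
        reverse=True,
    )
    return [d for score, d in scored[:k] if score > 0]

print(search("Wanted for killing Sir Danvers Carew, pale and dwarfish", passages))

The passages that come back are exactly what gets handed on to the deeper language-processing stages Allan describes.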
There were many components to Watson’s success, Allan continued, but the search software was critical.
“Search is a very important first step in the question-answering process. If we don’t find the answer, then the system can’t work,” he explained. “If the search step fails early on, all the rest of it doesn’t matter.”
The process of taking a question and arriving at an answer has several components, said Allan, all of them handled in about three seconds total. Specifically, the computer (a toy version of the full loop follows the list):
• Identifies plausible targets;
• Builds queries to find answers;
• Searches unstructured text for matching passages;
• Extracts candidates from the text;
• Looks for evidence for each candidate;
• Scores the candidates; and
• Ranks them and decides if it’s confident enough to choose one.
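Here is that loop in drastically simplified, runnable form; each function below corresponds to one bullet above and is a crude stand-in for a large subsystem in the real pipeline:

# Toy version of the question-answering loop. Each helper is a crude
# stand-in for one step in the list above; only the control flow is
# meant to be taken seriously.
import re

PASSAGES = [
    "Sir Danvers Carew: member of Parliament who is murdered by Hyde",
    "Mr. Hyde was pale and dwarfish",
    "Mr. Hyde-type split personality",
]

def build_query(clue):
    """Steps 1-2: keep content words as query terms (crude stopword removal)."""
    stop = {"for", "to", "have", "a", "seems", "wanted", "appearance"}
    return [w for w in re.findall(r"[a-z]+", clue.lower()) if w not in stop]

def search_text(terms):
    """Step 3: return passages containing at least one query term."""
    return [p for p in PASSAGES if any(t in p.lower() for t in terms)]

def extract_candidates(passages):
    """Step 4: treat capitalized tokens as candidate answers (very crude)."""
    cands = set()
    for p in passages:
        cands.update(re.findall(r"\b[A-Z][a-z]+\b", p))
    return cands

def score(candidate, passages):
    """Steps 5-6: evidence = number of retrieved passages naming the candidate."""
    return sum(candidate.lower() in p.lower() for p in passages)

clue = ("Wanted for killing Sir Danvers Carew; appearance: pale & dwarfish; "
        "seems to have a split personality")
found = search_text(build_query(clue))
ranked = sorted(extract_candidates(found), key=lambda c: score(c, found), reverse=True)
print(ranked[0])   # -> 'Hyde', the candidate with the most supporting passages

Step 7, deciding whether the top candidate is trustworthy enough to answer with, is sketched separately further below.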

Nowhere to Hyde
Using some fairly simple language, Allan explained how it all works, using a question from one of the Jeopardy! shows. From the category Literary Character APB (all points bulletin) came the question (answer) ‘Wanted for killing Sir Danvers Carew; appearance: pale & dwarfish; seems to have a split personality.’ Here’s how Watson arrived at the correct answer (question): ‘Hyde,’ as in Mr. Hyde, the alter ego of Dr. Jekyll.
First, it looked at possible targets for the answer (question), said Allan, meaning something or someone that can be wanted, has an appearance, is involved in a killing, and has a personality — more specifically, a split one. The computer then looks for strings that fill all of those roles, working on the premise that the target is probably a noun, possibly a person (though other animate entities fit), and the category’s key words are ‘literary,’ ‘character,’ and ‘APB.’
The computer then builds a query from the question (answer), Allan continued, with some words and phrases becoming important: in this case, ‘killing,’ ‘Danvers Carew,’ ‘pale,’ ‘dwarfish,’ and ‘split personality.’ Then, using the CIIR’s Indri search engine, the computer searches text sources — encyclopedia articles, dictionaries, books, newspapers, movie scripts, and some added material needed for Jeopardy!, including the complete works of William Shakespeare.
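Indri accepts structured queries, with operators such as #combine, which blends evidence from several terms, and #1(...), which requires an exact phrase match. A query built from this clue might look roughly like the snippet below; Watson’s actual queries were never published, so this is only a plausible guess at the form:

# Assembling an Indri-style structured query from the clue's key phrases.
# #combine blends evidence across terms; #1(...) matches an exact phrase.
# Illustrative only: Watson's real queries are not public.
phrases = ["killing", "#1(danvers carew)", "pale", "dwarfish", "#1(split personality)"]
query = "#combine( " + " ".join(phrases) + " )"
print(query)
# -> #combine( killing #1(danvers carew) pale dwarfish #1(split personality) )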
Next, the computer extracts candidates from the text it searches, he continued, adding that, in this case, it would come across passages such as “Sir Danvers Carew: member of Parliament who is murdered by Hyde,” “Mr. Hyde was pale and dwarfish,” “Mr. Hyde-type split personality,” and “Sherlock Holmes solves the mystery surrounding Jekyll and Hyde.” It would then identify candidates such as:
• Sir Danvers Carew, member of Parliament;
• Murdered, Hyde;
• Sherlock Holmes, mystery; and
• Jekyll.
It would then look for evidence to support candidates, or not support them, as the case may be. ‘Parliament,’ for example, has no personality, and it’s also real, not a literary character; ‘mystery’ is not a character; ‘murdered’ is not a noun; but ‘Hyde’ is a person, has a connection to Jekyll, was the killer of Carew, was wanted, had a split personality, and is fictional.
Fast-forwarding, Allan said Watson eventually came up with three leading candidates: ‘Hyde,’ ‘Sherlock Holmes,’ and ‘Dracula’ (who indeed had a split personality). It ranked the three in terms of its confidence level — 71%, 15%, and 7%, respectively — and thus chose ‘Hyde.’
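In code, that last ranking-and-deciding step can be reduced to a few lines. The raw scores below are invented to echo the article’s numbers; producing good scores is the genuinely hard part and is simply assumed here:

# Turn raw evidence scores into confidences, then decide whether to
# answer at all. Scores are invented to echo the Hyde example.

def confidences(scores):
    """Normalize raw scores so they sum to 1."""
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

raw = {"Hyde": 71, "Sherlock Holmes": 15, "Dracula": 7, "(all others)": 7}
conf = confidences(raw)

best = max(conf, key=conf.get)
THRESHOLD = 0.5   # only 'buzz in' when confidence clears the bar
if conf[best] >= THRESHOLD:
    print(f"Answering: {best} ({conf[best]:.0%} confident)")
else:
    print("Not confident enough to answer")
# -> Answering: Hyde (71% confident)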

Creating a Buzz
That lengthy tutorial explains, sort of, how and why Watson kicked ass on Jeopardy!, said Allan, but it also shows the vast potential for this technology to help users answer questions when there is much more at stake than winning a game show.
Noting that the Watson system used for Jeopardy! is about the size of 10 full-size refrigerators, Allan said that model doesn’t have very many practical, or affordable, applications. But the basic technology (not the buzzing-in capability) does.
“You can get a lot of Watson’s power without all of Watson,” he explained, adding that IBM is already marketing the technology in a smaller, slightly slower package, especially to the health care community, where there is a great deal of potential.
“What is the recommended dose of ibuprofen for a 10-year-old child? — that’s the kind of question this technology can answer and answer quickly,” he explained, adding that there are myriad other examples of medically related questions that don’t involve cause and effect, or subjective thinking, that a computer can help with.
Intelligence analysis, from both business and national-security perspectives, is another potential landing spot, he said, stressing again that the technology is most relevant in realms where correct answers — and speed — are equally critical. “‘Name the people who were seen with Gadhafi in the last year?’ — that’s the kind of question that can be answered.”
As for the CIIR, meanwhile, the Jeopardy! project may be over, but the work to find new and better ways to extract information from a host of databases goes on.
“We have a large project going on now concerning why people want to search books and how we can do that better,” he said. “Some of the early work we’re doing is in collaboration with humanities scholars who want to look at old books, read them, analyze them, and understand what’s happening.”
Meanwhile, Allan said he is spending a good deal of his time involved with something called ‘information literacy.’
Elaborating, he said this genre, if it can be called that, involves helping someone looking at a Web page decide whether — and how much — to trust the material in question.
“We don’t want to tell them whether it’s right or wrong, necessarily,” he explained. “But we want to help them look at it and be literate about material and look at it critically.”
As an example, he cited a theoretical cancer-treatment page.
“There are a lot of bogus cancer treatments out there, but the Web sites look very good; they’re beautifully crafted and seem authoritative,” he explained. “We want to help people look at something like this and decide whether it is to be believed, or how to go about deciding.”
Coming up with answers to such questions will likely take years, not a few seconds, said Allan, adding quickly that, while IBM’s computer amazed those who watched it, the realm of information retrieval and analysis is still in its infancy, and the art of the search is still a work in progress.

Class Act
Watson’s ‘Toronto’ answer shows that QA technology, while it has witnessed significant advances over the years, still has some limitations, said Allan.
But the system’s performance — not the final scores in relation to its human opponents, necessarily, but the number of questions it answered correctly — shows that great strides have been made in enhancing a computer’s ability to understand language, take a question, and efficiently search for the answer.
Where this technology will wind up and when are questions no one can fully answer at this point, he continued, but the practical applications are many.
So, for this exercise, Watson went to the head of the class — and not the ‘chic’ — and showed a good deal of style in the process.

George O’Brien can be reached at [email protected]