dtam: PowerSet - a recent talk

Recently I attended a talk by Ron Kaplan, PowerSet's chief scientist and a natural language processing guru. He explained how usage patterns in natural languages (e.g. English) cause bag-of-words search engines to suffer in both precision and recall. For example, to search for "who Obama criticized", one might use the query words "obama criticized". A bag-of-words search engine (e.g. Google) would then return documents containing the phrases "Hillary criticized Obama" and "McCain criticized Obama". These results are irrelevant, which is a precision problem. In other cases, synonyms for "criticized" are not matched, so documents containing "Obama rejected" are not returned. Relevant results are missed, which is a recall problem.
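To make the precision and recall problem concrete, here is a minimal sketch of a naive bag-of-words matcher run over a toy corpus. The corpus, the relevance judgments, and the function names are all hypothetical, made up for illustration; real engines like Google rank results in far more sophisticated ways. The point is only that discarding word order lets "McCain criticized Obama" match, while exact term matching misses "Obama rejected".

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase text and return its words as a set, discarding word order."""
    return set(re.findall(r"[a-z]+", text.lower()))


def bag_of_words_search(query: str, docs: list[str]) -> list[str]:
    """Return every document that contains all query terms, in any order."""
    terms = tokenize(query)
    return [d for d in docs if terms <= tokenize(d)]


def precision_recall(retrieved: list[str], relevant: list[str]) -> tuple[float, float]:
    """Precision = relevant hits / retrieved; recall = relevant hits / relevant."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy corpus.
docs = [
    "Obama criticized McCain",
    "Hillary criticized Obama",
    "McCain criticized Obama",
    "Obama rejected the proposal",
]
# Hypothetical relevance judgments: documents where Obama is the critic.
relevant = ["Obama criticized McCain", "Obama rejected the proposal"]

retrieved = bag_of_words_search("obama criticized", docs)
print(retrieved)
print(precision_recall(retrieved, relevant))
```

Here the matcher returns the first three documents. Two of them have Obama as the one being criticized (a precision problem), and it never returns "Obama rejected the proposal" because "rejected" is not "criticized" (a recall problem).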

He then discussed the technologies PowerSet is built on: Lexical Functional Grammar, Transfer/Glue Semantics, and finite-state morphology, all of which are used to learn and detect structure in free-form documents. Crowd-sourced, structured data sources are also used; Freebase and Wikipedia come to mind. By acquiring content (indexing terms with lexical and semantic information) and packing ambiguity into compact states, he argued, a new generation of software is ready to improve a wide range of tasks: summarization, question answering, translation, entity extraction, and more. As for search, with query response times under 1 second and indexing times under 1 second per sentence, he thinks natural language search is ready for prime time. Or at least, ready to convince investors to put in more money.

A handful of my co-workers also went to the talk. Most of them were underwhelmed. "Sure, it was a good talk, but I heard it two years ago," said one. "I tried something similar to his example and it didn't work," said another. To be fair, PowerSet has been over-hyped; some thought it would be a Google killer. And since it indexes only Wikipedia, it would not be far-fetched to say they under-delivered. However, in terms of bringing innovation to the table and trying to improve the search experience, I like their efforts. I won't put money on them yet, but I'll be watching.


Wednesday, June 11, 2008

